
Pyphe, a python toolbox for assessing microbial growth and cell viability in high-throughput colony screens

  1. Stephan Kamrad
  2. María Rodríguez-López
  3. Cristina Cotobal
  4. Clara Correia-Melo
  5. Markus Ralser (corresponding author)
  6. Jürg Bähler (corresponding author)
  1. University College London, Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, United Kingdom
  2. The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, United Kingdom
  3. Charité Universitaetsmedizin Berlin, Department of Biochemistry, Germany
Tools and Resources
Cite this article as: eLife 2020;9:e55160 doi: 10.7554/eLife.55160

Abstract

Microbial fitness screens are a key technique in functional genomics. We present an all-in-one solution, pyphe, for automating and improving data analysis pipelines associated with large-scale fitness screens, including image acquisition and quantification, data normalisation, and statistical analysis. Pyphe is versatile and processes fitness data from colony sizes, viability scores from phloxine B staining or colony growth curves, all obtained with inexpensive transilluminating flatbed scanners. We apply pyphe to show that the fitness information contained in late endpoint measurements of colony sizes is similar to maximum growth slopes from time series. We phenotype gene-deletion strains of fission yeast in 59,350 individual fitness assays in 70 conditions, revealing that colony size and viability provide complementary, independent information. Viability scores obtained from quantifying the redness of phloxine-stained colonies accurately reflect the fraction of live cells within colonies. Pyphe is user-friendly, open-source and fully documented, illustrated by applications to diverse fitness analysis scenarios.

Introduction

Colony fitness screens are a key assay in microbial genetics. The availability of knock-out libraries has revolutionised reverse genetics and enabled the field of functional genomics (Giaever and Nislow, 2014). Simultaneously, large collections of wild isolates (Jeffares et al., 2015; Peter et al., 2018), as well as synthetic populations (Bloom et al., 2013; Cubillos et al., 2013), have proven a powerful tool to study complex traits. More recently, the systematic measurement of fitness for hundreds of conditions and/or hundreds/thousands of strains in parallel is driving our systems-level understanding of gene function (Brochado et al., 2018; Costanzo et al., 2016; Kuzmin et al., 2018; Nichols et al., 2011).

Microbial phenomics screens generally follow a workflow where strains are arranged in high-density arrays (e.g. 384 or 1536 colonies per plate) and transferred using a colony-pinning robot or manual replicator. Image analysis software enables fast and precise quantification of colony sizes and other phenotypes (Bischof et al., 2016; Kritikos et al., 2017; Lawless et al., 2010; Memarian et al., 2007; Wagih and Parts, 2014). Colony-size data is prone to noise and technical variation between areas on the same plate and across plates and batches, some of which can be corrected by normalisation procedures (Baryshnikova et al., 2010; Blomberg, 2011; Zackrisson et al., 2016). Finally, differential fitness is assessed statistically, for which specialised approaches are available (Collins et al., 2010; Collins et al., 2006; Wagih and Parts, 2015).

Most screens use a single image or timepoint per plate (an endpoint measurement). Potentially more information is contained in the growth of colony sizes over time and a low-resolution time course of colony sizes can be used to fit growth models to population size data (Addinall et al., 2011; Banks et al., 2012; Shah et al., 2007). High-resolution image time series contain potentially even more information and have been used to determine lag phases (Levin-Reisman et al., 2014). Recently, highly precise fitness determination has been achieved by high-resolution, transilluminating time course imaging and growth curve analysis (Takeuchi et al., 2014) and combined with a reference grid normalisation (Zackrisson et al., 2016). The parallel use of commercially available scanners, combined with high-density arrays of colonies can enable growth curve-based phenotyping at very large scales, but poses challenges in terms of data storage, processing, equipment and the need for temperature-controlled space.

The dead-cell stain phloxine B can provide an additional phenotypic readout related to the proportion of dead cells in a colony. Phloxine B has been used to assess the viability of cells in budding yeast by microscopy (Tsukada and Ohsumi, 1993) and in fission yeast colonies (Matynia et al., 1998). When applied in a screening context, colonies are assigned a score which reflects the ‘redness’ of the colony to serve as an additional quantitative phenotype that can be used for downstream analysis (Lie et al., 2018).

Despite the popularity and importance of microbial colony screens, a consensus data framework has so far not emerged. In our laboratories, fitness screens are an essential technique used on a variety of scales, from a handful of plates to several thousand, and by researchers with varying bioinformatics skills. To enable and standardise data analysis workflows, we have developed a bioinformatics toolbox with a focus on being versatile, modular and user friendly. Pyphe (python package for phenotype analysis) consists of 6 command-line tools, each performing a different workflow step as well as the underlying functions, provided as a python package to expert users.

We illustrate the use of pyphe by investigating the growth dynamics of 57 natural S. pombe isolates. We show that the spatial correction implemented in pyphe, based on that proposed by Zackrisson et al., 2016, is effective in reducing measurement noise without overcorrection. Late endpoint measurements are shown to provide similar readouts to maximum slopes, but with lower precision. We then investigate the relationship between colony sizes and viability scores in a broad panel of S. pombe knock-out strains in 70 conditions and find that the two approaches provide orthogonal and independent information. Using imaging flow cytometry, we link colony redness scores to the percentage of dead cells in a colony and show that phloxine B staining provides results similar to those of a different live/dead stain.

Results

Pyphe enables analysis pipelines for fitness-screen data

The pyphe pipeline is designed to take different fitness proxies as input: endpoint colony sizes, colony growth curves or endpoint colony viability estimates from phloxine B staining (Figure 1). Image acquisition, image analysis, growth-curve analysis, data normalisation and statistical analysis are split into separate tools which can be assembled into a pipeline as required for each experiment and combined with other published tools, e.g. gitter (Wagih and Parts, 2014) for image quantification. Each tool takes and produces human-readable data in text/table format.

Figure 1 (with 2 supplements)
Data processing workflows using pyphe.

Pyphe is flexible and can use several fitness proxies as input. In a typical endpoint experiment, plate images are acquired using transmission scanning and colony sizes are extracted using pyphe-quantify or the R package gitter (Wagih and Parts, 2014). Alternatively or additionally, plates containing phloxine B are scanned using reflective scanning and analysed with pyphe-quantify in redness mode to obtain redness scores reflecting colony viability. Alternatively, image time series can be analysed with pyphe-quantify in timecourse mode and growth curve characteristics extracted with pyphe-growthcurves. Pyphe-analyse analyses and organises data for collections of plates. It requires an Experimental Design Table (EDT) containing a single line per plate and the path to the data file, optionally the path to the layout file, and any additional metadata the user wishes to include. Data is then loaded and the chosen normalisation procedures are performed. QC plots are produced and the entire experiment data is summarised in a single long table. This table is used by pyphe-interpret which produces a table of summary statistics and p-values for differential fitness analysis.

In a typical workflow, images are acquired using pyphe-scan which provides an interface for image acquisition using SANE (Scanner Access Now Easy) on a Linux-type operating system. It handles plate numbering, cropping and flopping, and format conversion functionality for large image stacks. Optionally, image time-series can be recorded. Pyphe-scan was written to work with EPSON V800 scanners, the newer model in the series previously used by others (Takeuchi et al., 2014; Zackrisson et al., 2016).

Colony properties are then quantified from images using pyphe-quantify, which can operate in three different modes. In batch mode (for colony-size quantification using grayscale transmission scanning) or redness mode (colony-viability estimation using phloxine B and reflective colour scanning), it separately analyses all images that match the input pattern (by default all jpg images in the working directory), producing a csv table and a QC image for each. In timecourse mode, colony positions are determined in the last image and the mask is applied to all previous images, extracting background-subtracted sums of pixel intensities for each colony/spot and producing a single table with the growth curves (one per column). Pyphe-quantify reports a wide range of colony properties: colony area, overall intensity (an estimator that reflects thickness as well as area), circularity, perimeter and centroid coordinates, making this tool useful in cases where colonies are not arrayed. Image pixel darkness is known to scale non-linearly with true colony thickness/cell number (Zackrisson et al., 2016). Fitness estimates reported by pyphe-analyse are therefore related to, but not strictly the same as, cell counts. If absolute population sizes are required for an experiment, the Scan-o-matic pipeline offers suitable calibration functionalities (Zackrisson et al., 2016). Pyphe-quantify algorithms are described in detail in Appendix 1 and Figure 1—figure supplement 1.
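
The timecourse mode described above can be illustrated with a minimal sketch. It is not the pyphe-quantify implementation (see Appendix 1 for that): the function name, the representation of images as nested lists and of masks as sets of pixel coordinates, and the use of the median of non-colony pixels as background are all assumptions made for illustration.

```python
from statistics import median


def colony_growth_curves(images, masks):
    """Return one growth curve per colony from a time-ordered image stack.

    images: list of 2D lists (rows of pixel intensities), earliest first.
    masks:  dict mapping colony id -> set of (row, col) pixels, determined
            on the LAST image and re-used unchanged for every earlier image.
    """
    colony_pixels = set().union(*masks.values())
    curves = {colony: [] for colony in masks}
    for image in images:
        # Assumption: background is the median of all non-colony pixels.
        background = median(
            value
            for r, row in enumerate(image)
            for c, value in enumerate(row)
            if (r, c) not in colony_pixels
        )
        for colony, pixels in masks.items():
            # Background-subtracted sum of pixel intensities for this colony.
            total = sum(image[r][c] - background for r, c in pixels)
            curves[colony].append(total)
    return curves
```

Each value of the returned dictionary is one growth curve, corresponding to one column of the table described in the text.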

Spatial normalisation is performed for each plate and data across all plates are aggregated using pyphe-analyse to produce a single table for downstream hit calling and further analysis. Pyphe implements a grid normalisation procedure based on the one previously described (Zackrisson et al., 2016) as well as row/column median normalisation. Both strategies produce relative fitness estimates where a value of 1 corresponds to the fitness of the grid strain or the plate median respectively. We propose an improved placement of the grids in 1536 format (Figure 1—figure supplement 2) and implemented checks for missing colonies and normalisation artefacts. The main output is a single long table, containing one row per colony, with all position-, strain-, meta- and fitness-data as well as details about the normalisation. Algorithms are further described in Appendix 2 and Figure 1—figure supplement 2.
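
As a rough illustration of row/column median normalisation, the sketch below divides each colony value by its row median and then by its column median, so that a value of 1 corresponds to a typical colony on the plate. The function name and the order of operations (rows first, then columns) are assumptions; missing colonies and the reference grid interpolation are not handled here (see Appendix 2 for the actual algorithms).

```python
from statistics import median


def row_column_median_normalise(plate):
    """Divide each colony by its row median, then by its column median.

    plate: 2D list of positive fitness values (e.g. colony sizes).
    Returns a new 2D list of relative fitness values.
    """
    # Step 1: remove row effects.
    by_row = [[value / median(row) for value in row] for row in plate]
    # Step 2: remove column effects from the row-corrected values.
    column_medians = [median(column) for column in zip(*by_row)]
    return [
        [value / col_median for value, col_median in zip(row, column_medians)]
        for row in by_row
    ]
```

On a plate whose values differ only by multiplicative row and column effects, every corrected value becomes 1. This is also why the approach assumes that most strains in each row and column have no growth effect.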

Finally, differential fitness is assessed using pyphe-interpret which produces summary statistics and p-values based on the complete data report from pyphe-analyse (Appendix 3). Pyphe-interpret gives users the option to either test for differences between strains in the same condition or between the same strain in different conditions. The latter is the recommended option for testing for condition-specific growth effects compared to a control condition.
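
The condition-versus-control comparison can be sketched as a simple effect-size calculation on the long table. This is only an illustration: the keys 'strain', 'condition' and 'fitness' are hypothetical column names, the ratio of medians is one possible summary statistic, and the statistical test that produces the p-values (Appendix 3) is deliberately not reproduced here.

```python
from collections import defaultdict
from statistics import median


def condition_effect_sizes(rows, control="control"):
    """Ratio of median fitness in each condition to the same strain's control.

    rows: iterable of dicts with hypothetical keys
          'strain', 'condition' and 'fitness' (one dict per colony).
    Returns {(strain, condition): ratio of medians}.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["strain"], row["condition"])].append(row["fitness"])

    effects = {}
    for (strain, condition), values in grouped.items():
        # Skip the control itself and strains without control measurements.
        if condition == control or (strain, control) not in grouped:
            continue
        control_values = grouped[(strain, control)]
        effects[(strain, condition)] = median(values) / median(control_values)
    return effects
```

A ratio below 1 indicates that the strain grows worse in the assay condition than in the control condition.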

Effective normalisation reduces noise and bias in data

Pyphe is designed to use different fitness proxies as input. In particular, it can use either maximum growth rates extracted from growth curves or endpoint colony size measurements. Previous studies have reported that information from growth curves is more precise (Zackrisson et al., 2016), but its acquisition requires substantially higher investment and produces large amounts of image data. While lower precision could be easily compensated by a higher number of replicates, growth curves provide the additional advantage that they capture the entire growth phase instead of a static snapshot. The results obtained in endpoint measurements might, therefore, depend on the timepoint used for the measurement. For example, the fitness of a strain with a long lag phase but high maximum growth rate may be underestimated if an early timepoint is chosen.

To assess the extent to which the choice of the timepoint matters, we recorded image time series for 57 S. pombe wild strains growing in 1536 spots per plate in approximately 20 replicates on 8 different media (Supplementary file 1). The conditions were designed to produce different growth rates and dynamics, and included mixes of different carbon sources with yeast extract as nitrogen source in rich media and different nitrogen sources in minimal media. These strains are genotypically and phenotypically diverse and display a broad range of growth characteristics (Jeffares et al., 2015). First, colony areas were extracted with gitter (Wagih and Parts, 2014), and relative, corrected colony sizes were computed for each image using the grid normalisation implemented in pyphe-analyse and averaged for each strain. We show an exemplary analysis of a single condition (standard rich media) in Figure 2 and a detailed analysis of all conditions in Figure 2—figure supplements 1 and 2. Relative colony sizes remained largely constant after the period of fast growth had come to an end at roughly 16 hr (Figure 2A). Concordantly, a correlation matrix of all timepoints showed near-perfect correlation of all timepoints from 16 hr onwards with the 48 hr endpoint (Figure 2Biii). Notably, all timepoints were correlated with the initial timepoint, albeit much more weakly, suggesting a significant bias introduced by the amount of initially deposited biomass. In our hands, this problem is more pronounced with wild strains than with knock-out collections as the former exhibit a variable degree of cell aggregation. However, we overcome this issue by reporting strain fitness as a ratio of growth in an assay condition relative to a control condition, in which case this bias is neutralised. We next analysed image time series with pyphe-quantify in timecourse mode, extracted slopes with pyphe-growthcurves (Appendix 4) and applied grid correction in pyphe-analyse. Later timepoints generally showed a much better correlation with maximum growth rate compared to early ones or those taken when growth is most rapid (Figure 2Bi and ii). Across all conditions, the median correlation of corrected maximum slopes with corrected colony sizes at the final timepoint was 0.95 (Figure 2—figure supplement 3). We conclude that late timepoints should be chosen for endpoint measurements, when the readout is stable and correlates well with the maximum growth rate.
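
One simple way to extract a maximum slope from a growth curve is to fit a straight line within a sliding window and keep the steepest fit, as sketched below. This is an illustrative approach and not necessarily what pyphe-growthcurves does (see Appendix 4); the function names and the default window of 3 timepoints are assumptions.

```python
def linear_slope(times, values):
    """Least-squares slope of values against times."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


def max_slope(times, values, window=3):
    """Steepest least-squares slope over all windows of `window` timepoints."""
    slopes = [
        linear_slope(times[i:i + window], values[i:i + window])
        for i in range(len(times) - window + 1)
    ]
    return max(slopes)
```

A wider window smooths out measurement noise at the cost of underestimating short bursts of rapid growth, so the window size would need to be matched to the imaging interval.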

Figure 2 (with 3 supplements)
Normalisation strategies for growth curves and endpoints.

(A) Growth curves of 57 wild S. pombe strains (average of approximately 20 replicates each) before (top) and after (bottom) correction. Corrected colony sizes describe the fitness relative to the standard laboratory strain (972) after grid correction. (B) Late endpoint measurements are tightly correlated with maximum slopes. (i) Average growth rates (mean difference in sum of pixel intensities between consecutive timepoints) across all strains. (ii) Pearson correlation of each individually corrected timepoint with corrected maximum slope of growth curves. The correlation increases throughout the rapid growth curve and then maintains high levels as the phase of fast growth comes to an end. (iii) Pearson correlation matrix of all corrected timepoints (averaged by strain prior to correlation analysis). (C) Coefficient of variation (CV, blue) and fraction of unexplained variance (FUV, orange) for corrected and uncorrected colony sizes throughout the growth curve. Dashed lines are the same values computed based on maximum slopes. The average growth curve of the control strain is shown in green (based on colony sizes extracted with gitter). The normalisation procedure maintains noise at low levels even in later growth. Endpoint measurements contain slightly more noise than slope measurements. (D) Scatter plots of colony fitness estimates dependent on the sum of colony fitness of its 8 neighbours. A positive correlation, such as seen for the uncorrected readouts, points to spatial biases within plates (specific regions of a plate growing slower/faster, for example due to temperature, moisture or nutrient gradients). A negative correlation would be expected for competition effects. Without correction, regional plate effects dominate over competition effects and these are efficiently removed during grid correction. 
Importantly, the correction does not result in a negative correlation, a potential side-effect of correcting colony sizes by comparing them to the size of neighbouring controls, which would lead to phenotypes becoming artificially more extreme.

The choice of timepoint also affects the level of noise. For uncorrected colony sizes, the coefficient of variation (CV, ratio of the standard deviation to the mean) of 96 replicates of the control strain, dispersed evenly in the plate, dropped steadily during the rapid growth phase, reaching a minimum around 20 hr when it started to rise again (Figure 2C). This is likely due to edge and other spatial effects which affect later growth as nutrients deplete and plates start to dry unevenly. After normalisation, the CV was generally lower, and this later rise in noise could be compensated so that the CV remained near its minimum. The CV of the maximum slopes was lower than that obtained with endpoints. However, CV values alone are insufficient to judge the effectiveness of a normalisation strategy, as they reflect the precision of the reported values but not the method’s ability to delineate differences between strains. As an additional indicator, we therefore used the ratio of the variance of the controls to the variance of the entire dataset, the fraction of unexplained variance (FUV), which indicates the level of noise relative to the biological signal in the data. Overall, the FUV behaved similarly to the CV and was at a minimum at around 20 hr for the uncorrected data. With corrections, this minimal value was largely maintained until the end of the experiment. A lower FUV can be obtained by using maximum slopes rather than individual timepoints. The other, non-standard conditions tested showed similar qualitative dynamics, but with noise levels and timings varying between conditions as expected (Figure 2—figure supplements 1–3).
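
The two noise indicators defined above can be computed as in the following sketch. The function names are illustrative, and the use of population (rather than sample) variance is an assumption, since the text does not specify which estimator is used.

```python
from statistics import mean, pstdev, pvariance


def coefficient_of_variation(control_values):
    """CV: standard deviation of the control replicates divided by their mean."""
    return pstdev(control_values) / mean(control_values)


def fraction_unexplained_variance(control_values, all_values):
    """FUV: variance of the controls divided by the variance of the whole dataset."""
    return pvariance(control_values) / pvariance(all_values)
```

For example, controls of 0.9, 1.0 and 1.1 within a dataset of 0.5, 0.9, 1.0, 1.1 and 1.5 give a CV of about 8.2% and an FUV of about 6.4%: the controls are tight relative to the spread between strains, so most of the variance reflects biological signal.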

Although correcting for position and batch effects is essential for high-throughput experiments conducted on agar plates, there is a danger that any normalisation strategy could also create false positives. Specifically, a grid colony positioned next to a rapidly growing colony will be smaller (due to nutrient competition), leading to underestimation of the expected fitness in that area which will further increase the fitness estimate of neighbouring colonies. This argument applies equally the other way around; grid colonies positioned next to slow growers have access to more nutrients. Indeed, after reference grid normalisation, we often observed a (generally weak but detectable) secondary edge effect for colonies positioned in the next inward row/column (Figure 1—figure supplement 2B). We found, however, that this effect can be remedied by an additional row/column median normalisation, if the majority of strains in each row/column has no growth effect (as is usually the case when working with knock-out collections). Being a toolbox (not a black box), pyphe requires the user to think about their strains, choice of control strains as well as plate layout and to choose a suitable normalisation. Users have the option to perform only one of the two implemented normalisations or both (in which case grid normalisation will be done before row/column median normalisation), which allows users to tailor data analysis to their experiments.

To gauge whether phenotype exaggeration presents a problem globally, in other parts of the plate, we compared the raw and final corrected size and maximum slope of each colony to the respective sum over its 8 neighbours. For uncorrected fitness values, there was generally a positive correlation (stronger for colony sizes than for slopes), indicating that regional plate effects dominate over competition between neighbouring colonies. This bias was removed after correction. Importantly, no negative correlation was observed. We conclude that grid correction does not lead to any significant effect exaggeration.
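
The neighbour comparison can be sketched as follows. For simplicity this illustration skips colonies on the plate edge, which have fewer than 8 neighbours; how edge colonies are treated in the actual analysis is not stated here, and the function names are assumptions.

```python
from math import sqrt
from statistics import mean


def neighbour_sums(plate):
    """Return (colony value, sum of its 8 neighbours) for all interior colonies."""
    pairs = []
    for r in range(1, len(plate) - 1):
        for c in range(1, len(plate[0]) - 1):
            total = sum(
                plate[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            pairs.append((plate[r][c], total))
    return pairs


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)
```

A plate with a smooth spatial gradient yields a strongly positive correlation between a colony and its neighbours, whereas competition or overcorrection would push the correlation below zero.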

Monitoring cell viability with phloxine B provides an independent and complementary phenotypic readout to growth assays

The addition of phloxine B to agar medium stains colonies in different shades of red, reflecting the fraction of dead cells, which can provide an additional phenotype readout from the same colony used for growth measurements. To investigate how colony size and redness relate, we used the pyphe pipeline to characterise 238 S. pombe single-gene deletion strains in 70 conditions in biological triplicates (n = 59,350 total colonies profiled, including controls but excluding grid colonies, Supplementary file 2). The two fitness proxies showed little correlation (Pearson r = −0.088) after correction of colony sizes using the grid approach with subsequent row/column normalisation and correction of redness scores by row/column median normalisation only (Figure 3A). Normalisation strategies for redness images are described in Figure 3—figure supplement 1. Many mutant-condition pairs showed a strong phenotype in only one of the two read-outs. Noise levels of redness scores were very low (CV = 1.04%) and the biological signal strong (FUV = 7.83%). We conclude that the phloxine B redness scores provide robust, precise information on mutant fitness, and serve as a largely orthogonal and independent measure compared to the (well correlated) growth rate or colony size measurements.

Figure 3 (with 1 supplement)
Phloxine B provides an orthogonal and independent fitness proxy.

(A) Relative colony sizes and redness scores after correction for 238 single gene knock-outs in 70 conditions (after quality filtering as described in Methods, three biological replicate colonies for each condition-gene pair are shown individually). The two read-outs are only weakly anti-correlated (r = −0.088) and many mutant-condition pairs show a strong phenotype in only one of the two fitness proxies. Axes were cut to exclude extreme outliers for visualisation. The redness score was robust with a CV of 1.04% and a FUV of 7.83% (histogram on right). For comparison, the CV and FUV of the colony size read-out were 6.1% and 31.5%, respectively (top histogram). (B) Clustered Pearson correlation matrix of averaged corrected colony sizes (n = 3) for 7 conditions with and without phloxine B. Repeats with and without dye consistently cluster together indicating general robustness of our measurements across batches and no substantial mutant-condition-dye interactions. (C) Boxplot comparing the pairwise correlation between conditions with and without phloxine B (median = 0.92) and all possible pairs from (B) (median = 0.51).

Phloxine B can be toxic if exposed to light (Qi et al., 2011), so we tested whether phloxine B changes growth parameters by determining colony sizes for our mutant set in 7 conditions. Measurements with and without phloxine B were performed in different batches and in different weeks to rule out batch effects inflating the correlation. Within the 14 phenotype vectors measured in total, identical conditions with and without phloxine clearly and consistently clustered together (Figure 3B). The median correlation for the 7 condition pairs with and without phloxine was 0.92, which was substantially higher than that of all possible pairs from the 14 phenotypes (Figure 3C). We conclude that the main driver of the biological signal is the condition and not whether phloxine B is included. We tested for specific gene-condition pairs showing differential growth on media with and without phloxine (Supplementary file 3). This analysis identified a single gene, the trehalose-6-phosphate phosphatase tpp1, as having a small slow-growth phenotype on rich media (ratio of medians of corrected colony sizes 0.89, padj = 0.028) and a moderate effect on minimal media (ratio of medians = 0.79, padj = 0.049). In order to account for such genotype-specific effects, differential fitness should generally be assessed against a control condition also containing phloxine B.

Phloxine B staining informs about fraction of live cells in colony

Finally, we tested whether and how the colony redness score relates to the viability of cells in the colony. We determined colony composition and viability status at the single-cell level using ImageStream flow cytometry. Across 23 samples, obtained from colonies with varying redness scores (Figure 4A), phloxine B staining classified cells into three populations (Figure 4B and C, Figure 4—figure supplement 1, Supplementary file 4): live cells which showed a background level of staining, dead cells which were brightly stained, and lysed or damaged cells which showed no staining. The fraction of live cells (alive/(dead+lysed+alive)) was inversely correlated (Pearson r = −0.88, with some grouping of strains) with colony redness scores obtained with pyphe-quantify and row/column median corrected by pyphe-analyse (Figure 4D). This correlation was stronger than the correlation of colony redness scores with the fraction of live and lysed cells ((lysed+alive)/(dead+lysed+alive), r = −0.78, Figure 4—figure supplement 2), suggesting that lysed cells, while not stained in the FACS, do contribute to colony redness. This is explained by the dye not being washed out in colonies, unlike in cells resuspended in PBS for flow cytometry analysis. We next asked how well phloxine B staining agrees with a distinct, established dead-cell stain (LIVE/DEAD). In wild-type cells, staining with both dyes agreed closely (accuracy 99.3% using LIVE/DEAD classification as ground truth, Figure 4E). We conclude that phloxine B staining, combined with our imaging and analysis pipeline, provides a sensitive and accurate readout reflecting the proportion of live/dead cells in a colony.
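
The quantities compared in this section can be written out explicitly. The sketch below computes the two fractions from per-sample cell counts, and the agreement between two dead/alive classifications with one of them taken as ground truth. The function names and the boolean encoding of the classifications are assumptions for illustration.

```python
def live_fraction(alive, dead, lysed):
    """alive / (dead + lysed + alive), as used for the main correlation."""
    return alive / (alive + dead + lysed)


def live_plus_lysed_fraction(alive, dead, lysed):
    """(lysed + alive) / (dead + lysed + alive), the alternative definition."""
    return (alive + lysed) / (alive + dead + lysed)


def classification_accuracy(predicted_dead, reference_dead):
    """Fraction of cells for which two dead/alive calls agree.

    predicted_dead: per-cell booleans from one stain (e.g. phloxine B).
    reference_dead: per-cell booleans from the stain taken as ground truth.
    """
    agreements = sum(p == r for p, r in zip(predicted_dead, reference_dead))
    return agreements / len(reference_dead)
```

The two fractions differ only in whether lysed cells count towards the numerator, which is what allows the comparison of correlations to indicate whether lysed cells contribute to colony redness.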

Figure 4 (with 2 supplements)
Phloxine B staining reflects percentage of dead cells.

(A) Example of colony redness score extraction by pyphe-quantify in redness mode. From the acquired input image (i), colours are enhanced and the background subtracted (ii), colonies are identified by local thresholding (iii), and redness is quantified and annotated in the original image (iv). (B) Representative cells for alive, dead and lysed cells using imaging flow cytometry (ImageStream). Lysed cells show no signal in either the phloxine B or LIVE/DEAD channels. Live cells show an intermediate signal intensity in the phloxine B channel but no LIVE/DEAD signal. Dead cells are brightly stained in both channels. (C) Histogram of intensities in phloxine B channel across 23 samples with three populations (lysed, alive and dead) clearly resolved. (D) Fraction of live cells (live/(live+lysed+dead)) by ImageStream correlates with colony redness scores (corrected by row/column median normalisation) obtained with pyphe. (E) Co-localisation of phloxine B stain with LIVE/DEAD stain for the standard lab strain 972. Both readouts agree with 99.3% accuracy using the illustrated thresholds.

Redness readouts should be obtained in stationary phase

We have shown that for colony sizes similar results are obtained even if the plates are incubated for a few days after rapid growth has ended. The same is not necessarily expected for colony redness scores. In fact, colonies might appear red due to strains producing dead cells during growth or due to death when non-dividing cells reach the end of their chronological lifespan, which is temporally decoupled from growth. Certainly, if colonies are left for a very long time, cells will age, with striking physiological adaptations and eventually cell death (Váchová and Palková, 2018). To investigate how much the choice of timepoint matters with colony redness scores, we acquired colour images every 20 min for 48 hr on standard rich media for the set of 238 S. pombe single gene knock-outs. Each image from the experiment was analysed with pyphe-quantify in redness mode. In general, we do not recommend analysing images of young, small colonies for redness. All colonies showed a background signal unspecific to the dye and this increased with colony thickness. During early timepoints, we therefore detect an increase in raw, uncorrected redness (Figure 5A).

Figure 5
Temporal dynamics of phloxine B colony redness scores.

(A) Raw redness scores over time for 96 wild-type grid colonies (dark line shows mean, shaded area shows standard deviation). The uncorrected redness increases as colonies grow as there is a background signal unrelated to cell death. (B) Correlation matrix of corrected redness scores for all 238 strains over 48 hr (3 timepoints per hour). The readout is stable from the point at which fast growth ends and remains tightly correlated for at least 24 hr. (C) CVs and FUVs during 48 hr. Grid normalisation effectively neutralises non-biological effects. (D) Redness curves for selected mutants showing the strongest red phenotype. Increased redness is visible from the start, and this further increases as colonies grow. Therefore, in this case, growth and death are not temporally decoupled.

Timepoints generally correlated well for a period of at least 24 hr after rapid growth had ended (Figure 5B). The CVs and FUVs were stable over this time as well (Figure 5C). These robust characteristics thus allow sufficient time for scanning without the need to hit a certain ‘sweet spot’. For our work with knock-out libraries, we imaged plates soon after growth had slowed down. We identified a group of mutants with the strongest redness phenotype in the set (corrected colony redness >1.05). These colonies showed a clear and strong increase in redness during growth (Figure 5D), suggesting that here redness was not temporally decoupled from growth. As for colony growth, we conclude that the exact timepoint at which colony redness is determined is not critical, as long as colonies are no longer growing rapidly.

Discussion

High-throughput colony-based screening is a powerful tool for microbiological discovery and functional genomics. Using a set of diverse wild yeast strains, we show that the fitness correction approach implemented in pyphe effectively reduces noise in the data. Importantly, for endpoint measurements the corrected fitness is independent of the exact timepoint, as long as a late timepoint is chosen, and late colony sizes are tightly correlated with maximum slopes of colony areas. This finding has two important implications. First, our results show that growth-rate measurements do not necessarily boost phenotyping experiments in the sense that they contain novel information, while one can compensate for the reduced precision of end-point measures by measuring more independent replicates. Second, little if anything is gained from precisely pre-defining incubation times of assay plates prior to scanning. Instead, plates can be simply incubated for longer (usually 2–5 days for fission yeast), especially if the assay condition slows down growth. By using genetically and phenotypically diverse wild strains for these experiments, we covered strains with diverse morphology and growth behaviour. However, we cannot exclude that this tight correlation does not hold true for other species of microbes.

Furthermore, we show that colony viability measured by phloxine B staining and image quantification by pyphe provides a largely orthogonal and independent readout to colony sizes, thus offering an additional trait for mutant profiling. Redness scores obtained with the pyphe pipeline closely reflect the number of live cells in the colony. We report that corrected (relative) redness scores are globally uncorrelated to corrected colony sizes in endpoint measurements of S. pombe knock-out mutants. The simplest explanation for how a colony can show normal growth even though a substantial fraction of its cells is dead is that growth and death are temporally uncoupled. While this does not seem to be the case for the knock-out mutants investigated, it might be the case in other scenarios, e.g. when working with wild strains. Alternatively, growth and death could be spatially decoupled. As not all cells in the colony are actively dividing, especially during later growth (Meunier and Choder, 1999), and potentially in stress conditions, a subset of cells could die without the overall colony growth being affected. This idea is supported by the observed uneven distribution of redness within the colony (which we currently do not capture with pyphe). Furthermore, colonies could sustain normal growth if viability were sacrificed for growth rate (Nakaoka and Wakamoto, 2017). Explaining the observed disparity between redness and size data should be a priority for future research and the explanation may depend on the strains, conditions, incubation times, or technical factors (or combinations thereof). Colony redness analysis opens up new avenues of investigation, for example for high-throughput chronological lifespan experiments. It will be important to examine the relationship between redness scores and live cells if the proportion of live cells drops to very low levels, as the redness signal may saturate. Potentially even more information may be contained in the distribution of dead cells within a colony, which is hard to describe quantitatively and not reported by pyphe-quantify.

The pyphe toolbox and underlying python package provide a versatile pipeline for analysing fitness-screen data. Pyphe is an all-in-one solution enabling image acquisition, quantification, batch and plate bias correction, data reporting and hit calling. Pyphe is flexible and accepts growth curves and endpoint measurements as well as colony sizes and staining as input. Pyphe functionality is provided in the form of multiple separate, simple and well-documented command line tools operating on human-readable files. Pyphe is written for the analysis of extremely large data sets (thousands of plates, millions of colonies), and its modular design allows the easy integration of other, future tools and scripting/automatisation of analysis pipelines which aids reproducibility.

Materials and methods

Key resources table
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
| --- | --- | --- | --- | --- |
| Strain, strain background (Schizosaccharomyces pombe) | 57 S. pombe wild strains | Jeffares et al., 2015 | JBxxx | These strains were identified as a set of most diverse strains from the overall collection |
| Strain, strain background (Schizosaccharomyces pombe) | 238 S. pombe knock-out strains | Bioneer and (Sideri et al., 2015) | Pombase gene IDs and names | The original library obtained from Bioneer was made prototrophic by crossing with suitable strain. Genes were selected to cover GO functional categories and include unknowns. |
| Chemical compound, drug | Phloxine B | Sigma | Cat# P2759 | Prepared as a 5 g/L (1000x) stock in water and stored at 4°C in the dark. |
| Software, algorithm | Pyphe | This publication | Pyphe provides the following tools: pyphe-scan, pyphe-scan-timecourse, pyphe-quantify, pyphe-analyse, pyphe-interpret, pyphe-growthcurves | Version 0.95 was used for preparation of this manuscript. |
| Other | Scanner | Epson | V800 Photo | |

Software availability statement

Pyphe is open software published under a permissive license. We welcome bug reports, feature requests and code contributions through https://github.com/Bahler-Lab/pyphe. Pyphe is also available through the Python Package Index at https://pypi.org/project/pyphe/.

Wild strain test data set

An overnight liquid culture of strain 972 h- in YES medium was pinned in 96-colony (8 × 12) format on YES agar medium, using a RoToR HDA pinning robot (Singer Instruments) and grown for two days at 32°C. This grid was combined with randomly arranged plates of the 57 wild strains in 1536 (32 × 48) format and grown for 2 days at 32°C. Strains were then copied onto fresh assay plates, using the 1536 short pinning tool at low pressure. Plates were placed in scanners (EPSON V800) in an incubator at 32°C and images were acquired every 20 min for 48 hr using pyphe-scan-timecourse. Growth curves were extracted using pyphe-quantify in timecourse mode with the following settings: --s 0.1. Growth curve parameters were extracted with pyphe-growthcurves with the --fitrange 12 option. Individual images were analysed with gitter using the following settings: --inverse TRUE --remove.noise TRUE. Grid correction of maximum slopes and individual timepoints was performed in pyphe-analyse.

Knock-out test data set

238 mutants, broadly spanning GO Biological Function categories plus several uncharacterised genes, were selected from a prototroph derivative of the Bioneer deletion library (Sideri et al., 2015). Strains were arranged in 384-colony (16 × 24) format with a single 96 grid placed in the top left position, so that the grid includes one colony in every fourth position within the 384-colony array. To prepare replicates, this plate was independently pinned 3 times from the cryostock on solid YES media for each batch. From these plates, colonies were then spotted on assay plates containing various toxins, drugs or nutrients. The conditions used in Figure 3B are: EtOH10 is YES+10% (v/v) ethanol, VPA10 is YES+10 mM valproic acid, MMS0.005 is YES+0.005% (v/v) methyl methanesulfonate, Forma2.5 is YES+2.5% (v/v) formamide, Diamide2 is YES+2 mM diamide, EMM is standard Edinburgh Minimal Medium, YES is standard Yeast extract with supplements and 3% glucose. Assay plates were usually grown for 2 days at 32°C but this varied according to the strength of the stress slowing the growth of the colonies. After incubation, images were acquired using EPSON V800 scanners and pyphe-scan and quantified with gitter (see options above) or pyphe-quantify in redness mode. Grid correction and subsequent row/column median normalisation of maximum slopes and individual timepoints was performed in pyphe-analyse. Row/column median normalisation was applied to redness data plates. For the size data set, 0-sized colonies and colonies with a circularity below 0.85 were set to NA. Plates with a CV > 0.2 or FUV >1 were removed as those most likely represent conditions in which the stress was too strong or where technical errors occurred.

Imaging flow cytometry

We picked 23 colonies with varying redness from the collection of 238 S. pombe deletion strains grown on solid YES with 5 mg/L phloxine B for 3 days at 32°C and resuspended in 1 mL of water. For analysis of phloxine B staining, 500 µL of this cell suspension were centrifuged at 4000 g for 2 min, the supernatant was removed and the pellet resuspended in 75 µL of PBS. For analysis of phloxine B and LIVE/DEAD co-staining, 500 µL of the same suspension were centrifuged at 4000 g for 2 min, the supernatant was removed and the pellet resuspended in 300 µL of LIVE/DEAD solution (LIVE/DEAD Fixable Far Red Dead Cell Stain Kit, for 633 or 635 nm excitation, ThermoFisher Scientific, Cat. no. L34974). LIVE/DEAD solution was prepared according to manufacturer’s instructions (1:1000 dilution in H2O from a stock solution dissolved in 50 µL of DMSO). The pellet was resuspended and incubated for 30 min in the dark. Cells were then spun down and resuspended in 75 µL of PBS.

Immediately prior to analysis, samples were sonicated for 20 s at 50W (JSP Ultrasonic Cleaner model US21), and transferred to a two-camera ImageStreamX Mk II (ISX MKII) imaging flow cytometer (LUMINEX Corporation, Austin, Texas) for automated sample acquisition and captured using the ISX INSPIRE data acquisition software. Images of 5000–12,000 single focused cells were acquired at 60x magnification and low flow rates, using the 488 nm excitation laser at 90 mW to capture phloxine B on channel 3; 642 nm excitation laser at 150 mW to capture LIVE/DEAD cells on channel 11; bright field (BF) images were captured on channels 1 and 9, and side scatter (SSC) on channel 6. For co-stained cell analysis, to generate a compensation matrix, cells stained either with phloxine B or with LIVE/DEAD dye individually were captured without brightfield illumination (BF and SSC channels were OFF). The compensation coefficients were calculated automatically using the compensation wizard in the Image Data Exploration and Analysis Software (IDEAS) package (v6.2). Populations of interest (single focused cells) were gated in IDEAS and the features of interest (dye intensities) were then exported for further analysis using Python. Intensity values were subtracted by their minimum over all samples (which was slightly below zero) and added to 1 prior to log10 transformation. Thresholds for the three populations were set manually based on the intensity histogram across all samples.
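The intensity transformation described above can be sketched as follows. This is a minimal illustration, not the analysis script used for the manuscript: the function name `shift_and_log10` and the example values are hypothetical, and the sketch assumes that intensities from all samples have already been pooled into one list.

```python
import math


def shift_and_log10(intensities):
    """Subtract the global minimum, add 1, then take log10.

    After the shift the smallest value maps to 1, so its log10 is 0 and
    slightly negative compensated intensities no longer break the logarithm.
    """
    lowest = min(intensities)
    return [math.log10(value - lowest + 1.0) for value in intensities]


# Hypothetical compensated intensities pooled over all samples
pooled = [-3.0, 0.0, 96.0, 9996.0]
transformed = shift_and_log10(pooled)
```

Thresholds separating the populations would then be chosen on the histogram of `transformed`, which in the study was done manually.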

Redness timecourse dataset

The mutant collection was woken up from the cryostock on YES media and copied onto fresh YES with 5 mg/L phloxine B. Images were acquired every 20 min with pyphe-scan-timecourse. Images were analysed with pyphe-quantify redness with --s 0.1. Timepoints were grid corrected using pyphe-analyse.

Appendix 1

Image quantification algorithms

Pyphe-quantify is a command line tool for the analysis of images containing microbial colonies based on scikit-image (van der Walt et al., 2014). By default, it analyses all .jpg image files in the directory where it is executed. Alternatively, the user can set a pattern to specify input images and all image formats supported by scikit-image can be used (e.g. tiff, jpg, png). Pyphe-quantify can operate in three distinct modes: batch (analyse colony areas in each image separately), redness (analyse colony redness in each image separately) and timecourse (analyse colony area and thickness in a stack of images from a timeseries). Pyphe-quantify produces simple csv files (one per plate for batch and redness mode) which can be directly processed further by pyphe-analyse or analysed using other software.

Common to all three modes is the need to match identified colonies in the image to their row-column position, which is configured using the --grid option. Pyphe-quantify implements an automatic grid detection (used by setting --grid auto_384 or --grid auto_1536) which identifies peaks in row and column pixel intensities. This is done using the find_peaks function from scipy’s signal module (using the image dimensions and number of colonies per row/column to define a suitable minimal distance between peaks). The distance between peaks is then determined using an outlier-robust, trimmed mean. The maximum pixel position of the first colony is then determined as (image dimension - (mean distance between colonies * (colonies per row/column - 1))). A cosine function with the same periodicity as the distance between colonies is created. The fit of that function to the data (row/column mean intensities) is then evaluated by taking the sum of squared differences for each possible pixel offset from 0 to the maximum previously determined. The expected positions of all colonies are then computed using the start position and the mean distance.
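The offset search described above can be illustrated with a short, dependency-free sketch for a single image axis. It is not the pyphe-quantify source code: the function names are hypothetical, the colony spacing is taken as given (in pyphe it comes from the trimmed mean of peak distances), and rescaling the intensity profile to the range of a cosine is an assumption made here so that the two curves are comparable.

```python
import math


def best_grid_offset(profile, n_colonies, spacing):
    """Return the pixel offset of the first colony along one image axis."""
    lo, hi = min(profile), max(profile)
    # Assumption: rescale the mean-intensity profile to [-1, 1]
    scaled = [2.0 * (v - lo) / (hi - lo) - 1.0 for v in profile]
    # Largest offset at which the last colony still lies inside the image
    max_offset = int(len(profile) - spacing * (n_colonies - 1))
    best, best_cost = 0, float("inf")
    for offset in range(max_offset + 1):
        # Sum of squared differences between profile and shifted cosine
        cost = sum(
            (s - math.cos(2.0 * math.pi * (x - offset) / spacing)) ** 2
            for x, s in enumerate(scaled)
        )
        if cost < best_cost:
            best, best_cost = offset, cost
    return best


def expected_positions(offset, n_colonies, spacing):
    """Colony centres from the start position and the mean distance."""
    return [offset + k * spacing for k in range(n_colonies)]


# Synthetic profile: 4 colonies, 20 pixels apart, first colony at pixel 7
spacing, n_colonies, true_offset = 20, 4, 7
profile = [
    math.cos(2.0 * math.pi * (x - true_offset) / spacing) for x in range(80)
]
offset = best_grid_offset(profile, n_colonies, spacing)
positions = expected_positions(offset, n_colonies, spacing)
```

Because the template is periodic, restricting the search to offsets between 0 and the maximum start position is what makes the best fit unique. A completely flat profile would need separate handling, since the rescaling step divides by its range.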

Automatic grid detection is, in our hands, the most common reason for image analysis tools to fail. We have therefore given the user full control over defining expected colony positions. If all plates were scanned with the same fixture taped firmly to the scanners, colony positions are highly consistent across images. This means grid definitions can be ‘hard-wired’. The grid for the fixture provided by us has been preconfigured and is selected using the --grid pp_384 or --grid pp_1536 options. Additionally, for maximum flexibility, pyphe-quantify offers the possibility of manually defining grid positions. In that case, the argument has to be in the form of 6 integers separated by ‘-’: <number of rows>-<number of columns>-<x position of the top left colony>-<y position of the top left colony>-<x position of the bottom right colony>-<y position of the bottom right colony>. Positions are the distance in number of pixels from the image origin in each dimension (x is the width dimension, y is the height dimension). The image's origin is, in line with scikit-image convention, in the top left corner. These coordinates are easily obtained, for example in Microsoft Paint. The option to manually define grid positions is particularly useful if plates have many missing colonies or images are rotated.
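To make the manual grid format concrete, the sketch below parses such an argument and derives the expected centre of every colony by spacing positions evenly between the two corner colonies. The function name `parse_manual_grid`, the zero-based (row, column) keys and the example coordinates are hypothetical and only illustrate the format; they are not taken from the pyphe source.

```python
def parse_manual_grid(spec):
    """Parse 'rows-cols-x_tl-y_tl-x_br-y_br' into expected colony centres.

    Returns a dict mapping zero-based (row, column) to (x, y) in pixels,
    with the image origin in the top left corner.
    """
    parts = spec.split("-")
    if len(parts) != 6:
        raise ValueError("expected 6 integers separated by '-'")
    rows, cols, x_tl, y_tl, x_br, y_br = (int(p) for p in parts)
    # Evenly spaced steps between the top left and bottom right colonies
    x_step = (x_br - x_tl) / (cols - 1)
    y_step = (y_br - y_tl) / (rows - 1)
    return {
        (r, c): (x_tl + c * x_step, y_tl + r * y_step)
        for r in range(rows)
        for c in range(cols)
    }


# Hypothetical 384-format plate (16 rows, 24 columns)
grid = parse_manual_grid("16-24-100-80-2400-1580")
```

In this example, columns and rows are both 100 pixels apart, so `grid[(0, 0)]` is `(100.0, 80.0)` and `grid[(15, 23)]` is `(2400.0, 1580.0)`.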

In batch mode, pyphe-quantify will analyse colony sizes in each image individually. First, morphological components are identified by thresholding the image. By default the Otsu method is used to find the threshold (Otsu, 1979), but this can be tuned by the user by providing a coefficient to be used with this threshold or an absolute threshold. Components are then filtered by size to exclude erroneous identification of small particles, such as dust, as colonies. Border components are removed. Components are then matched to grid positions. By default, a component is assigned to a particular position if it is less than a third of the distance between two grid positions away from that position. This threshold can be set by the user. This means that in case of missing colonies, there will be no data for the corresponding grid position, and this position will be missing from the output file (i.e. it will not be 0). Similarly, two components can be assigned to the same grid position in the case of contaminations. This can be disabled by the user to retain only the component nearest to the grid position. When reporting all colonies, pyphe-quantify can be used for plates with non-arrayed colonies. An output table is saved which contains colony area, mean intensity (an estimator that reflects thickness), circularity, perimeter and centroid coordinates. Area measurements are in very close agreement with those obtained with gitter (Figure 1—figure supplement 1A). Mean intensity measurements reported by pyphe-quantify are dependent on colony area as colonies get thicker as they grow (Figure 1—figure supplement 1B). Pyphe-quantify exports a QC image for every image analysed indicating the identified colonies and their assigned grid positions.
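The matching of components to grid positions can be sketched as below. This is an illustrative reimplementation under stated assumptions, not pyphe-quantify's code: the function name, its parameters and the toy coordinates are hypothetical, components are represented only by their centroids, and distance is taken as the Euclidean distance to the nearest expected position.

```python
import math


def assign_components(centroids, grid, spacing, max_frac=1 / 3,
                      nearest_only=False):
    """Assign component centroids to grid positions.

    A component is kept if it lies closer than max_frac * spacing to its
    nearest grid position. Positions without a component are simply
    absent from the result (they are not reported as 0).
    """
    hits = {}
    for cx, cy in centroids:
        pos, dist = min(
            ((p, math.hypot(cx - gx, cy - gy)) for p, (gx, gy) in grid.items()),
            key=lambda item: item[1],
        )
        if dist < max_frac * spacing:
            hits.setdefault(pos, []).append(((cx, cy), dist))
    if nearest_only:
        # Keep only the component closest to each grid position
        hits = {p: [min(h, key=lambda x: x[1])] for p, h in hits.items()}
    return {p: [c for c, _ in h] for p, h in hits.items()}


# Toy 2 x 2 grid with 100 pixel spacing
grid = {(0, 0): (100.0, 100.0), (0, 1): (200.0, 100.0),
        (1, 0): (100.0, 200.0), (1, 1): (200.0, 200.0)}
centroids = [(102.0, 99.0),    # colony at (0, 0)
             (190.0, 105.0),   # colony at (0, 1)
             (215.0, 95.0),    # contaminant near (0, 1)
             (150.0, 150.0),   # particle between positions, discarded
             (201.0, 203.0)]   # colony at (1, 1); (1, 0) is missing
all_hits = assign_components(centroids, grid, spacing=100.0)
nearest = assign_components(centroids, grid, spacing=100.0, nearest_only=True)
```

In the toy data, position (1, 0) has no colony and therefore does not appear in either result, while position (0, 1) carries two components unless `nearest_only` is set.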

In timecourse mode, the final image of the time course is analysed as described above and the obtained mask (indicating the position of each colony) is then applied to all previous images of the timecourse. The background subtracted sum of pixel intensities (the mean intensity times the number of pixels) for all images combined, that is the growth curves, is reported in a single file.

Finally, pyphe-quantify can analyse colony redness (Figure 1—figure supplement 1C). Phloxine B stains dead cells within the colonies and these are usually not homogeneously distributed upon close inspection. However, for simplicity, we have developed an image analysis workflow which extracts the mean redness of colonies in high-density arrays, providing a single quantitative readout. We decided to use reflective scanning with our Epson V800 scanners (implemented in pyphe-scan). This is fast and produces images with consistent properties, but with the caveat that the focus position is just above the scanner glass and colonies are therefore somewhat out of focus. Additionally, there is a strong, uneven background signal from the media and colour artefacts (appearing as bright stripes between colony columns) which required a different image analysis approach. The images are first adjusted to make colony redness more visible by multiplying the red, green and blue channels by 0, 0.5 and 1, respectively, and their sum is taken to produce single-channel/grayscale images. The background value for each pixel is estimated by blurring the image with a Gaussian filter with a standard deviation of the number of pixels in the image divided by 10000. The background is subtracted from the image which is then inverted. Colonies are detected by local thresholding and processed further as described for batch mode above. The mean intensity for each colony is computed from the processed image and reported in a similar file as described for batch mode. The QC images produced make it possible to verify grid placement and visualise the colour readout on the actual image.

Appendix 2

Spatial correction algorithms

This text describes the steps performed by pyphe-analyse during the analysis of a typical batch of plates. All functions and objects are also available for use as a python package. At the core of pyphe is the Experiment object which is initialised from the Experimental Design Table (EDT). The EDT is first checked for obvious errors, including the uniqueness of plate IDs and paths to data files and if these files exist. Data from the image analysis output files is then loaded using appropriate parsers. Layout files are loaded if set by the user.

Spatial normalisation is then performed if requested by the user. Pyphe implements a grid correction procedure similar to that used previously (Zackrisson et al., 2016). In that paper, the authors use 1536 format arrays and place a 384 grid in the top left position of the plate. This means one quarter of plate positions are taken up by the grid. It also creates a problem because the right and lower edge of the plate are not covered by the reference grid. We have developed a small improvement of this technique by placing two 96 grids in opposite corners (top left and bottom right, Figure 1—figure supplement 2A). This leaves only two small corners of the plate (bottom left and top right) not covered by the interpolated grid surface. We solve this by extrapolating the grid: the theoretical colony size of a grid strain in those corners is estimated using a linear model with the colony sizes of the two neighbouring grid colonies as input. Model parameters are determined for each experiment based on all plates using regression. We typically achieve accuracies of >90% (Figure 1—figure supplement 2C). This allows us to use grid correction over the entire plate without loss of data. Pyphe-analyse specifically looks for grid colonies with a colony size of zero (this is reported by gitter if no colony has been detected), flags those as pinning errors and marks all neighbouring colonies NA in the final output. The grid correction itself is done using scipy.interpolate’s griddata function, fitting a piecewise cubic, continuously differentiable, approximately curvature-minimizing polynomial surface to the grid positions (real grid positions and extrapolated corners). The surface created in that way represents the expected fitness for each position if the strain growing there was the same as the grid strain. The observed fitness is then divided by this expected value, producing the corrected and relative (to the grid strain) fitness of each colony.
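The core of the grid correction, interpolating a reference surface with scipy.interpolate's griddata and dividing observed by expected values, can be sketched on a toy plate. This is a simplified illustration and not pyphe-analyse itself: the function name `grid_correct` is hypothetical, a single regular reference lattice stands in for the two 96 grids, and the corner extrapolation and pinning-error handling described above are omitted.

```python
import numpy as np
from scipy.interpolate import griddata


def grid_correct(sizes, ref_positions):
    """Divide observed colony sizes by an interpolated reference surface.

    sizes: 2D array of colony sizes (rows x columns).
    ref_positions: list of (row, column) positions holding the grid strain.
    Positions outside the convex hull of the reference grid come out as NaN.
    """
    ref = np.asarray(ref_positions)
    ref_values = sizes[ref[:, 0], ref[:, 1]]
    rows, cols = np.mgrid[0:sizes.shape[0], 0:sizes.shape[1]]
    # Piecewise cubic surface through the reference colonies
    surface = griddata(ref.astype(float), ref_values, (rows, cols),
                       method="cubic")
    return sizes / surface


# Toy 9 x 9 plate: all colonies have size 100 except one slow grower
sizes = np.full((9, 9), 100.0)
sizes[2, 3] = 50.0
# Assumed layout: a grid-strain colony at every fourth row and column
ref_positions = [(r, c) for r in range(0, 9, 4) for c in range(0, 9, 4)]
corrected = grid_correct(sizes, ref_positions)
```

Here the reference surface is flat at 100, so the slow grower at row 2, column 3 receives a corrected fitness of about 0.5 and its neighbours about 1. On real plates the surface follows local trends in the grid colonies, which is what removes spatial bias.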

We have noticed that doing the grid correction this way slightly over-corrects for the edge effect for strains in the row neighbouring the edge (Figure 1—figure supplement 2B). This is because the edge effect is usually restricted to the outermost edge only. But the values of the reference grid in the next row/column will be most strongly determined by the grid colony on the edge, leading to an over-correction (underestimation of fitness) in these positions. We therefore often perform an additional row/column median normalisation after grid correction to remedy this. For this, pyphe-analyse computes median values for each row and column and divides the data by both. The data is then re-scaled to a median of 1 by dividing by the overall plate median. Note that a median correction is not valid if the median is not a good estimate of the null effect. For this reason, we strongly discourage row/column median normalisation for plates in 96-colony format (where the median is computed from only 8 or 12 values). If plates contain a large number of slow or fast growers a median normalisation is also unsuitable, especially if these are distributed non-randomly in the plate. For work with knock-out libraries, where most gene knock-outs have no effect in any given condition, and with wild strains which show a median-centred distribution of subtle growth phenotypes, the additional row/column median normalisation effectively neutralises the secondary edge effect.
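A minimal sketch of the row/column median normalisation follows. It assumes that rows are normalised first and that column medians are then computed on the row-normalised data; the text above does not specify the order, so this is one possible reading. The function name and toy plate are hypothetical, and rows or columns with a median of 0 (which pyphe detects and sets to NA) are not handled here.

```python
from statistics import median


def row_col_median_normalise(plate):
    """Divide by row medians, then column medians, then the plate median."""
    row_meds = [median(row) for row in plate]
    by_row = [[v / m for v in row] for row, m in zip(plate, row_meds)]
    # Assumption: column medians are taken after the row step
    col_meds = [median(col) for col in zip(*by_row)]
    by_col = [[v / m for v, m in zip(row, col_meds)] for row in by_row]
    # Re-scale so that the overall plate median is 1
    plate_med = median(v for row in by_col for v in row)
    return [[v / plate_med for v in row] for row in by_col]


# Toy plate: multiplicative row and column effects plus one slow grower
row_effect = [1.0, 1.2, 0.8]
col_effect = [1.0, 0.9, 1.1, 1.0]
plate = [[r * c for c in col_effect] for r in row_effect]
plate[0][1] *= 0.5  # true relative fitness of this colony is 0.5
normalised = row_col_median_normalise(plate)
```

With only three rows the medians in this toy example rest on very few values, which is exactly the situation in which the text above discourages median normalisation; the small plate is used purely to keep the arithmetic easy to follow.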

In some cases, the normalisation procedures can lead to artefacts. Grid normalisation can result in negative corrected fitness values if grid colonies are very small in a region (Figure 1—figure supplement 2D). Row/column median normalisation can produce infinite values if more than half the colonies in a row/column have size 0. These artefacts are detected by pyphe and set to NA.

Pyphe gives the option to produce QC plots for each plate in the experiment in which case a pdf file will be generated containing heat maps for all numerical data associated with that plate. Finally, all data is collated in a single table which contains position information, layout information, all metadata provided by the user in the EDT, raw and corrected fitness values, and details about the grid correction.

Appendix 3

Hit calling with pyphe-interpret

Pyphe-interpret is a tool for statistical analysis of fitness data. It takes data reports produced by pyphe-analyse (or other data in a suitable tidy format) which contain a single line per colony, listing strain, condition and fitness information. The column names in which to find each of these are set by the user. The tool first checks the input and prints a summary to the command line listing the number of strains, conditions, plates and total number of data points. Next, QC filters based on circularity and 0 fitness are applied if set by the user. The number of excluded data points is reported. The tool then produces a table which lists all replicates for each strain-condition pair in wide format (see documentation folder on GitHub for an example). This table (which is also exported in csv format) makes it possible to perform t-tests highly efficiently in a vectorised manner using the ttest_ind function for masked arrays from scipy’s mstats_basic module. Pyphe-interpret requires the user to define a grouping column (the variable to use as the grouping variable for the t-test) and the ‘axis’ column across which to apply multiple t-tests. This may initially seem complicated but enables the use of pyphe-interpret in two distinct scenarios: (1) Check for each condition separately (--grouping_column <condition_column>) if there is a significant difference in means between a mutant strain and a control strain (--axis_column <strain_id_column>); or (2) check for each strain separately (--grouping_column <strain_id_column>) if there is a significant difference in the means of the strain in the assay condition vs the control condition (--axis_column <condition_column>). We normally use the second option as it tests for condition-specific growth differences and it does not return significant results if a strain is consistently faster or slower growing than the grid strain. We use Welch’s t-test (equal_var = False) as this does not assume homogeneity of variances. P-values are corrected by the Benjamini-Hochberg method across the specified axis (i.e. across all strains in scenario 2 or across all conditions in scenario 1). The tool produces a table listing summary statistics (mean_fitness, mean_fitness_log2, median_fitness, median_fitness_log2, observation_count, stdev_fitness) and a statistical assessment of differential fitness (mean_effect_size, mean_effect_size_log2, median_effect_size, median_effect_size_log2, p_Welch, p_Welch_BH, p_Welch_BH_-log10).
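The two statistical ingredients named above can be written out in a few lines of plain Python. This sketch is for illustration only and is not how pyphe-interpret is implemented (which uses scipy's vectorised ttest_ind on masked arrays): the function names and example values are hypothetical, and the Welch function stops at the test statistic and degrees of freedom, because turning these into a p-value requires the t distribution from a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical corrected fitness of one strain: control vs assay condition
control = [1.0, 1.1, 0.9, 1.0]
assay = [0.5, 0.6, 0.4, 0.5]
t_stat, dof = welch_t(control, assay)

# Hypothetical raw p-values across four strains
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

For the raw p-values shown, the adjusted values are 0.02, 0.04, 0.04 and 0.02 in input order. In scenario 2 the adjustment would be applied across all strains tested in one condition.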

Appendix 4

Growth curve analysis with pyphe-growthcurves

Pyphe-growthcurves is a simple tool for non-parametric growth curve analysis. It was written to directly use data produced by pyphe-quantify timecourse as input but other types of growth data can normally easily be adapted. There are various packages dedicated to growth curve analysis (Fernandez-Ricaud et al., 2016; Kahm et al., 2010; Veríssimo et al., 2013) which have more functionalities, but the goal here was to provide a simple solution integrated into pyphe which works well with data typically handled within the pyphe pipeline. The input data needs to be in csv format and contain one column per growth curve containing the population sizes in the right order (top to bottom). The first column must contain the timepoints and those must be numerical (i.e. not ‘1 hr’, ‘2 hr’, but 1.0, 2.0). A single input csv is analysed every time pyphe-growthcurves is run. Maximum slopes of growth curves are determined by linear regressions to a sliding window of size d. The regression with the highest slope is retained and the slope is reported together with the timepoint at which it occurred (centre of window), the R2 of the regression, as well as its y- and x-intercepts. Lag phases can be determined by the absolute or relative method. The absolute method simply returns the timepoint at which the population size crosses the user-defined threshold. For the relative method (which is the default), the average of the first n timepoints is taken as the initial biomass. The algorithm then returns the timepoint at which the population size exceeds p*initial biomass, where p defaults to 2 (i.e. the time taken for the first population doubling is reported). Please note that no interpolation of population sizes between timepoints is currently implemented; the timepoint at which the threshold is crossed is simply returned. A table in csv format is created which lists all above-mentioned parameters and this can be read directly into pyphe-analyse. If the --plots option is set, the tool produces a pdf showing all growth curves and visualising the extracted parameters.
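The two calculations described above, the maximum slope from sliding-window regressions and the relative lag, can be sketched as follows. This is an illustrative reimplementation, not the pyphe-growthcurves source: function names are hypothetical, the default of 3 initial timepoints is an assumption (the text does not state the default for n), and the window centre is computed as the mean time of the window, which equals its centre only for evenly spaced timepoints.

```python
def max_slope(times, sizes, window):
    """Highest least-squares slope over all windows of consecutive points.

    Returns (slope, time at the centre of the best window).
    """
    best = None
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        y = sizes[start:start + window]
        t_mean, y_mean = sum(t) / window, sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        slope = sxy / sxx
        if best is None or slope > best[0]:
            best = (slope, t_mean)
    return best


def relative_lag(times, sizes, n_initial=3, p=2.0):
    """First timepoint at which size exceeds p * initial biomass.

    No interpolation between timepoints; returns None if never exceeded.
    """
    initial = sum(sizes[:n_initial]) / n_initial
    for t, y in zip(times, sizes):
        if y > p * initial:
            return t
    return None


# Hypothetical growth curve sampled at unit time intervals
times = [float(t) for t in range(10)]
sizes = [10.0, 10.0, 10.0, 12.0, 20.0, 40.0, 60.0, 70.0, 74.0, 75.0]
slope, slope_time = max_slope(times, sizes, window=3)
lag = relative_lag(times, sizes)
```

For this curve the steepest window spans timepoints 4 to 6, giving a slope of 20 centred at time 5. The initial biomass is 10, so the lag is reported as 5, the first timepoint with a size above 20.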

Appendix 5

Plate handling protocol

Materials and reagents

  • Sterile yeast medium with and without 2% agar (we preferentially use YES or EMM for S. pombe)

  • Serological pipette and pipette pump

  • Rectangular plates (PlusPlates, Singer Instruments)

  • RoToR pin pads (96 long, 96 short, 384 or 1536 short)

  • 96 well sterile plates

  • Phloxine B (Merck)

Equipment

  • Laminar Flow Cabinet

  • Microwave oven

  • Incubator

  • Pinning robot (RoToR HDA, Singer Instruments or similar)

  • Scanner (Epson Perfection V800) connected to Linux computer

  • Fixture to hold plates in place on scanner (cutting guide available at www.github.com/Bahler-Lab/pyphe)

Procedure

Overview

The grid strain is prepared to grow in 96-format plates to make grid plates. Grid plates are combined with library plates to make combined plates. Combined plates are copied onto fresh agar plates to make source plates. Assay plates (containing treatments of interest) are inoculated from source plates. Assay plates are imaged and analysed further.

1. Plate pouring
  1. Heat media in microwave with occasional mixing until completely melted. Let the media cool to approximately 60°C.

  2. Warning: Superheated agar media can pose a serious risk. Proceed carefully, never heat sealed containers and wear appropriate protective equipment.

  3. If drugs are to be added to the media mix them in the media before pouring.

  4. For phloxine B assays, add this reagent at a final concentration of 5 mg/L prior to pouring. Note that phloxine B is sensitive to light so it is advisable to store and incubate plates in the dark wherever possible. Phloxine B is also sensitive to oxidising agents and therefore incompatible with such assay conditions.

  5. Tip: A 1000x aqueous stock of phloxine B can be kept in the fridge in the dark for several weeks.

  6. Place plates on a flat surface in the sterile hood and pipette 40 ml of media in each.

  7. Tip: Take up 5 ml more than required to avoid bubble formation. If bubbles occur, remove them by sucking them back up into the pipette.

  8. Let plates dry for approximately 30 min. Correct dryness is important: if the plates are too wet, colonies will diffuse into the agar.

2. Plate storage and handling
  1. Drugless plates made to preserve or wake up collections can be stored in the fridge for a week, but they should be removed from the fridge and allowed to acclimate to room temperature prior to any experiment.

  2. Plates containing phloxine B should be stored in the dark, as phloxine B oxidizes in the presence of light.

  3. We recommend preparing assay plates containing drugs on the day of the experiment or the evening prior to the experiment taking place. If this is done, store them appropriately and keep them well wrapped to prevent uneven drying, and upside down to prevent condensation forming on the surface of the media. Let the plates reach room temperature before pinning.

3. Preparation of source plates: where to locate your grid, controls and your test strains

a. Preparation of the grid plate

  1. From a cryostock, streak out the strain that will be used as the control ‘grid strain’ on agar media (can be in a conventional Petri dish, add appropriate antibiotics if required). We advise picking a standard strain which makes the comparative fitness value obtained in the end easily interpretable, for example the background strain in the case of mutant collections. In general, the grid strain’s fitness should not be much higher or lower than that of the strains to be assayed.

  2. Grow until colonies suitable for picking have formed (approximately 2 days at 32°C for S. pombe).

  3. Inoculate one colony of the grid strain into 30 ml liquid media (e.g. YES in the case of the standard 972 S. pombe strain) and grow for ~24 hr with shaking.

  4. Pour the grid strain culture into an empty PlusPlate and use the RoToR robot to pin onto solid agar media in 96 format using 96 long pin pads.

  5. Tip: Make several copies as needed. You can pin approximately 10 times from one grid source plate.

  6. Wrap the plates in cling film and place upside-down in an incubator to avoid condensation over the colonies. If the plates are not properly wrapped they will dry unevenly on the edges and will not be suitable.

  7. Grow up for approximately 2 days until suitable colonies for pinning have formed.

b. Preparation of the library to assay

  1. If you are starting from an established library

    1. If you are using an established yeast library stored in 96- or 384- well format, wake up the library onto the appropriate selective agar media using the RoToR and let it grow until colonies are visible at the appropriate temperature (32°C for the S. pombe Bioneer deletion collection).

    2. Once the colonies are grown, refresh the plates onto selective media the same day that you prepare your grid strain plates and let them grow for up to 2 days at the appropriate temperature.

  2. If you are arranging your own library

    1. Prepare fresh colonies of the strains that will be used on solid agar media plates.

    2. Design your library layout. Every plate should contain several wild type controls (at least 10). Plates should contain no or few empty spots, but do include a footprint to mark plate number, orientation and to serve as a negative control. Fill up the rest of the positions with your assay strains and include extra replicates to fill up the plate if required.

    3. Tip: If possible, we recommend including some positive control strains (that are known to be resistant or sensitive to the stresses to be tested) in the library.

    4. The same day that you will be starting the liquid culture of your grid strain, fill a 96 well plate with the appropriate liquid media. Inoculate each well from a colony, according to your layout.

    5. This 96 well plate can be incubated in a stationary incubator with the lid on for ~24 hr at the appropriate temperature.

    6. As with the grid strain plate, use the RoToR robot to pin this plate onto solid agar media using 96 long pin pads. Use vigorous mixing (in 3D, 4 cycles) of the source plate.

    7. Wrap the plates in cling film and incubate for 2 days upside down at the appropriate temperature.
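
The layout rules from the design step above (several wild-type controls, a footprint of empty positions, and extra replicates to fill the plate) can be sketched in a few lines of Python. This is only an illustration and not part of pyphe: the function name `design_layout`, the footprint wells and the strain names are all hypothetical and should be adapted to your own library.

```python
import random
import string

ROWS, COLS = 8, 12  # dimensions of a 96-well plate


def design_layout(strains, n_wild_type=10, footprint=("A1", "A2", "B1"), seed=0):
    """Return a dict mapping well names (e.g. 'C7') to strain names.

    Footprint wells are left empty (the wells chosen here are an arbitrary
    example), wild-type controls are placed at random, and the remaining
    wells cycle through the assay strains so that extra replicates fill
    the plate.
    """
    if not strains:
        raise ValueError("at least one assay strain is required")
    wells = [f"{string.ascii_uppercase[r]}{c + 1}"
             for r in range(ROWS) for c in range(COLS)]
    free = [w for w in wells if w not in footprint]
    if n_wild_type + len(strains) > len(free):
        raise ValueError("too many strains for a single 96-well plate")

    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    rng.shuffle(free)

    layout = {w: "EMPTY" for w in footprint}
    for well in free[:n_wild_type]:
        layout[well] = "wild_type"
    for i, well in enumerate(free[n_wild_type:]):
        layout[well] = strains[i % len(strains)]  # wrap around to add replicates
    return layout


# Example: 40 hypothetical mutants on one plate
example_layout = design_layout([f"mut{i}" for i in range(40)])
```

Keeping the seed fixed means the same table can be regenerated later when preparing the layout file for the assay plates.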

c. Preparation of the pyphe-ready source plates

  1. Combine your grid plate and library plates on a solid agar medium.

    1. If your assay plates should be 1536 format

    2. Refer to Figure 1—figure supplement 2A for an illustration of the arrangement process. Using 96 short pin pads and the manual programming mode of the RoToR robot, prepare your combined plates by copying the 96 well plate containing the grid strain in the top left and bottom right corner as well as in an additional position in the middle (we normally use the C2 position). Fill the remaining 13 positions with library plates. Record exactly which library plate was used to fill each position and use this information to prepare a layout table of your assay plates.

    3. Tip: This program can be saved and reused.

    4. If your assay plates should be 384 format

    5. If you want to work in 384 format, place one grid in the top left corner of each plate. Note that you will lose grid-corrected phenotypes for colonies on the bottom and right edge because these are not covered by the grid. You will also not have a control grid to check the quality of the grid correction. It is usually preferable to use 1536 format with more repeats, even if you have few strains.

  2. Grow for 1 or 2 days, wrapped in cling film and upside down in an incubator at the appropriate temperature.

  3. Copy your combined plates onto fresh plates to make your pyphe source plates. This will even out any differences in growth from different inoculum amounts from the previous steps and create a more even spacing of colonies.

    1. Tip: You need to make several copies if you have a large number of assay plates/conditions to be tested. As a rule of thumb, you can use the same plate for pinning ~6 plates in 1536 format or ~8 plates in 384 format.

  4. Grow for 1 or 2 days at the appropriate temperature (but keep it consistent). At this stage the plates are ready to be used in the assay.
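
Recording which plate went into which of the 16 positions is easier with a small script. The sketch below builds the 1536 layout table from a position-to-plate assignment and estimates how many source-plate copies are needed from the rule of thumb above. It is not part of pyphe. It assumes that positions are labelled A1 to D4, with the letter giving the row offset and the digit giving the column offset, so that A1 is the top left and D4 the bottom right; check this against your robot's convention. The names `combine_to_1536` and `source_copies_needed` are hypothetical.

```python
import math

ROWS_96, COLS_96, N_OFFSETS = 8, 12, 4
ROW_LETTERS = "ABCDEFGH"
POSITIONS = [f"{letter}{num}" for letter in "ABCD" for num in range(1, 5)]


def combine_to_1536(assignment):
    """Interleave sixteen 96-format plates into one 32 x 48 layout.

    `assignment` maps each position label ('A1'..'D4') to a plate name.
    Returns a list of 32 rows, each a list of 48 'plate:well' strings.
    """
    if set(assignment) != set(POSITIONS):
        raise ValueError("all 16 positions must be assigned exactly once")
    layout = [[None] * (COLS_96 * N_OFFSETS) for _ in range(ROWS_96 * N_OFFSETS)]
    for label, plate in assignment.items():
        row_off = "ABCD".index(label[0])  # assumption: letter = row offset
        col_off = int(label[1]) - 1       # assumption: digit = column offset
        for r in range(ROWS_96):
            for c in range(COLS_96):
                well = f"{ROW_LETTERS[r]}{c + 1}"
                layout[r * N_OFFSETS + row_off][c * N_OFFSETS + col_off] = f"{plate}:{well}"
    return layout


def source_copies_needed(n_assay_plates, pins_per_source=6):
    """Source copies needed if one source plate serves ~6 (1536) or ~8 (384) pinnings."""
    return math.ceil(n_assay_plates / pins_per_source)


# Grid strain in the top left, bottom right and C2; 13 library plates elsewhere
grid_positions = {"A1", "C2", "D4"}
library_plates = iter(f"lib{i}" for i in range(1, 14))
assignment = {p: ("grid" if p in grid_positions else next(library_plates))
              for p in POSITIONS}
layout_1536 = combine_to_1536(assignment)
```

Writing `layout_1536` out row by row gives a layout table in the same orientation as the plate.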

4. Phenotyping with the pyphe pipeline
  1. Using the appropriate 384 or 1536 short pin pad, inoculate your source library plates onto your assay plates using the RoToR robot. Label your plates clearly with the replicate number, plate layout and condition.

  2. Tip: Use low pressure (around 10% for 384 plates and 4% for 1536 format plates) in order to get a small, consistent inoculum.

  3. Tip: Check every time that you did not miss pinning an area of your plate. If this happens, repeat the pinning using a fresh, spare assay plate. If this happens repeatedly, your assay plates were not prepared on a flat, level surface or dried out unevenly.

  4. Tip: We recommend using the random offset for picking up the colonies.

  5. Wrap the assay plates in cling film and incubate upside down at the appropriate temperature in an incubator. For 1536 plates and mild stressors, around 18 hr of incubation is enough time for phenotype observation; 384 format or stronger stressors might require longer incubation times.

  6. Proceed with image acquisition and data analysis. See manuals and help on GitHub for this. We recommend preparing the Experimental Design Table (which will be later required by pyphe-analyse) during scanning, making note of all relevant data and meta-data associated with each plate. The table should contain columns for condition, plate layout, image location, incubation time, batch and scan/pin dates. Save this table in CSV format.

  • Tip: For large screens containing several batches, consistent naming is essential. We usually define a condition shortcut in a separate table and include the dose without units for brevity, for example an entry in the condition column in the EDT may state ‘VPA10’ which is short for YES+10 mM valproic acid.

  • Tip: File paths should generally not contain any spaces, non-standard characters or characters forbidden in Unix or Windows file names. Name your condition shortcuts, layouts and replicates accordingly.

  • Tip: Comments or observations which may be important for later analysis (e.g. if there were pinning errors or other issues) should be included in an extra column.

  • Tip: Any additional (meta-)data can and should be included and will be carried through to the data report produced by pyphe.
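
As an illustration of the tips above, the following sketch writes an Experimental Design Table as CSV and rejects names that contain spaces or unusual characters. The column names, the file path and the dates are hypothetical examples chosen for this sketch. They are not the names required by pyphe-analyse, so consult the pyphe documentation on GitHub for the expected format.

```python
import csv
import io
import re

# Hypothetical column names covering the information listed in the protocol
EDT_COLUMNS = ["plate_id", "condition", "layout", "image_path",
               "incubation_hr", "batch", "pin_date", "scan_date", "comments"]

# Letters, digits, underscore, dot, hyphen and forward slash only
SAFE_NAME = re.compile(r"^[A-Za-z0-9_.\-/]+$")


def check_row(row):
    """Return a list of problems found in the name-like fields of one row."""
    problems = []
    for key in ("condition", "layout", "image_path"):
        if not SAFE_NAME.match(row[key]):
            problems.append(f"{key}={row[key]!r} contains spaces or unusual characters")
    return problems


def write_edt(rows, handle):
    """Validate each row and write the table as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=EDT_COLUMNS)
    writer.writeheader()
    for row in rows:
        problems = check_row(row)
        if problems:
            raise ValueError("; ".join(problems))
        writer.writerow(row)


# One example plate; 'VPA10' is the condition shortcut for YES+10 mM valproic acid
example_rows = [{
    "plate_id": "p001", "condition": "VPA10", "layout": "layout1",
    "image_path": "images/batch1/VPA10_layout1_rep1.jpg",
    "incubation_hr": 18, "batch": "batch1",
    "pin_date": "2020-01-14", "scan_date": "2020-01-15",
    "comments": "pinning error in top-right corner",
}]
buffer = io.StringIO()
write_edt(example_rows, buffer)
```

In practice the handle would be a file opened with `open("edt.csv", "w", newline="")`. Free-text comments are deliberately not checked, since only names that end up in file paths need to be restricted.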

References

  21. Matynia A, Mueller U, Ong N, Demeter J, Granger AL, Hinata K, Sazer S (1998) Isolation and characterization of fission yeast sns mutants defective at the mitosis-to-interphase transition. Genetics 148:1799.

Decision letter

  1. Kevin J Verstrepen
    Reviewing Editor; VIB-KU Leuven Center for Microbiology, Belgium
  2. Aleksandra M Walczak
    Senior Editor; École Normale Supérieure, France
  3. Kevin J Verstrepen
    Reviewer; VIB-KU Leuven Center for Microbiology, Belgium
  4. Jonas Warringer
    Reviewer; University of Gothenburg, Sweden

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

We believe that the Pyphe toolbox will prove a valuable tool for the community and will help set standards for image analysis of microbial growth and physiology.

Decision letter after peer review:

Thank you for submitting your article "Pyphe: A python toolbox for assessing microbial growth and cell viability in high-throughput colony screens" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Kevin J Verstrepen as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Aleksandra Walczak as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Jonas Warringer (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We appreciate your efforts to assemble a streamlined pipeline for image-based high-throughput microbial growth assays. We believe that such a pipeline would potentially be of interest to the broad readership of eLife. However, reviewer 2, who is an expert in this area and was actually one of the suggested reviewers, identified several technical issues that need to be resolved. We realize that we are requesting quite a bit of additional work to further improve the pipeline, but all reviewers unanimously agreed that these revisions are crucial, and we are not sure whether it will be possible for you to address all these comments in a timely manner. However, we decided that since the core idea of the pipeline is very valuable and would in principle be a good fit for eLife, we would leave the decision whether to address our concerns and resubmit a revised version with you.

Essential revisions:

1) All reviewers agree that the pipeline in itself is not novel, and it would be advisable to stress and acknowledge more clearly how the pipeline builds upon previous work and similar pipelines.

The following comments were directly taken from the report of reviewer 2, but reviewers 1 and 3 agree that these need to be addressed.

2) As an expert user, reviewer 2 has issues with core aspects of the method.

A) The v800 model now seems out of production. For the Pyphe-scan method to be useful for other labs, clearly it must work with other scanners. At the very least, the authors should show that the pipeline works and provides equivalent data with the 850 model.

B) In this context, it seems unfortunate that the authors have chosen not to implement the pixel calibration function of the Zackrisson et al., 2016 approach. This uses a grey-scale calibration strip attached to each fixture to ensure that registered pixel intensities become comparable across types of instruments, scanners, external conditions and time – the latter is quite important as the properties of light sources and light receptors change as a function of age.

C) The authors have chosen to, as a default, use a lossy, irreversible compression format for their images, JPG. This format uses very inexact approximations and discards much data when representing the original content of the image. JPG is not recommended when quantitative data is to be extracted from an image. The use of e.g. TIFF is much to be preferred: if the authors want to maintain the JPG format, a minimum requirement is that they show that colony delineations, pixel intensity and background intensity are not affected by the lossy compression format. Otherwise, it should be discarded and no data based on it included.

D) Quality control, QC, is normally a substantial investment of time. It was not clear from the paper how the QC or the QC interface works. If an automated, or at least a semi-automated process is not implemented this is a serious shortcoming and would be another step backwards from the Zackrisson et al., 2016 approach. The importance of a high throughput QC becomes clear when studying the example growth curves supplied together with the software: it is apparent that a large fraction of the slopes extracted are wildly inaccurate, or indeed pure noise, and do not reflect the actual growth. How is this handled? If the user has to filter these out manually, with no software guidance, the pipeline is no longer high-throughput.

E) The graphical interface provides user-friendliness. However, the GUI seems to only be applicable to the later stages of the pipeline? The initial steps are command-line based, which clearly does not square with claims of user-friendliness in general. As an example: to position the grid, the user is requested to input pixel coordinates, as integers, from the image in the command-line interface. This lack of a GUI for the early steps is another step backwards from the graphical interface of plate position or automatic grid detection implemented in Zackrisson et al., 2016?

F) How does Pyphe deal with the identification of scanners, when multiple scanners are connected to a computer? Explicitly: how does Pyphe in these conditions ensure that the images come from the scanner that the user thinks they come from? In the CLI interface the scanner number can be input, but it is stated that this may change if the scanner is turned off. Such a change will supposedly cause serious confusion if multiple scanners are connected to a computer. Are the scanners supposed to be constantly turned on? Won't this cause serious light stress to colonies? Zackrisson et al., 2016 implemented a scanner power managing system to switch off scanners when not scanning while still keeping control over time over which scans are taken by which scanner. Pyphe not handling multiple scanners or imposing light stress on colonies would either drastically reduce throughput or accuracy and increase costs (for computers)?

G) The supplementary text does not describe the operations performed or their sequencing in sufficient detail for me to confidently evaluate the analysis process, or to reconstruct what has been done. I kindly ask for more detailed information.

H) The authors argue that the lower precision when using a single time point scan can be compensated for by a higher number of replicates. However, increasing the number of replicates involves very substantial extra manual work and additional costs? E.g. in terms of plates to be poured, scanners and chemicals required, experiments to be started etc? In contrast, measuring multiple times on the same colony is not really a major cost at all? If this is supposed to be a major argument against doing time series, I think the authors should show some empirical support for their point of view.

3) Given the number of pipelines already available:

A) Outside of the viability staining (which I really would love for the authors to succeed in developing), I am not sure what is really innovative or original with Pyphe.

B) Moreover, and in addition to the shortcomings mentioned above, the authors seem to not have implemented the major advancement described in the Zackrisson et al., 2016 paper: an accurate conversion of pixel intensities into cell counts. To my mind, this is a serious shortcoming, because of the non-linearity of transmitted light and true population size. Not accounting for this non-linearity is the same as not diluting dense cell cultures when doing manual OD measurements. The consequence of not accounting for this non-linearity is that detected mid and late stage growth will be much attenuated relative to the true population size expansion in these stages. Thus, any effect, of genetic background and/or environment, on this part of the growth curve, risks being mis-quantified or completely overlooked.

C) For claims of capturing fitness to be on a more solid footing, the authors may want to mathematically derive selection coefficients from their fitness estimates (see e.g. Stenberg et al., 2016). This would add novelty to their pipeline.

4) A main conclusion is: "We apply pyphe to show that the fitness information contained in late endpoint measurements of colony sizes is similar to maximum growth slopes from time series." But:

A) The conclusion is reached by extrapolation from growth on rich YES medium, where all growth curves are canonical, to growth in general. I am not sure this is appropriate. Different growth environments clearly have disparate effects on the growth curve (e.g. Warringer et al., 2008). For this conclusion to be of any value, a large number of environments with very different curve behaviors should be considered. Warringer et al., 2011 reached a more convincing conclusion using S. cerevisiae natural strains in hundreds of environments.

B) The authors use colony area for this comparison. I am not sure this is ideal. The horizontal expansion rate of a colony is not necessarily a good proxy for the population size expansion rate, and the ratio between the two is rarely constant over time. Early population size expansion is often predominantly horizontal while later expansion is often vertical. Thus, only considering horizontal expansion may lead to mid or late growth being underestimated? This ties into the non-linearity issue.

C) For meaningful biological interpretations to be made, the authors may want to compare the actual growth yield, i.e. the population size change to the end of growth, to the maximum growth rate. This will require some mathematical operations to ensure that growth has indeed ended at the later timepoints and potentially extending the cultivation time beyond 48h. The authors can then link their finding to major microbiological topics, such as r and k selection theory (see e.g. Wei et al., 2019, where the opposite conclusion to that reported here is reached) and thermodynamic considerations of the rate versus the yield of energy limited microbial growth reactions, e.g. MacLean et al., 2008. The activity of many cellular processes changes quite dramatically during the growth curve. For example, our understanding of diauxic shifts, glucose and nitrogen catabolite repression, and the different affinities of alternative nutrient influx systems is hard to reconcile with the conclusion that population size at a single time point can well capture the total biology of a growth curve. The authors may want to discuss this. Ibstedt et al., 2015 showed strong correlations in natural strains, such as the ones used here, between effects on maximum growth and yield. However, they demonstrated that this is due to natural co-selection on these fitness components, rather than pleiotropy. Consequently, the conclusion reported here may not necessarily be valid for e.g. gene deletions, which is the major application envisioned by the authors?

5) The normalization procedure here employed and highlighted is a combination of the Zackrisson et al., 2016 and Baryshnikova et al., 2010 approaches, with a minor modification. I am not sure that it brings anything really novel to the table. Moreover:

A) The authors claim that their approach is superior to the Zackrisson and Baryshnikova approaches but do not empirically show this. The Baryshnikova approach – correcting for row and column based bias – makes sense if the error follows such a row – column wise pattern. This is the case for population size at late time points in rich 2% glucose medium – because nothing but the local glucose content is limiting for growth rates at this time. Thus, edge colonies, which experience less competition for the local glucose, expand faster at later stages. However, as Zackrisson et al., 2016 showed, the maximum slope on a rich medium is taken before the local glucose becomes growth limiting. The associated error at this time point therefore does not well follow a row-column wise pattern. Moreover, in harsher environments than 2% glucose YES (i.e. most other environments), where growth at all stages of the growth curve may be limited by other factors than local competition for limited glucose, the same reasoning applies (also to some extent shown by Zackrisson et al., 2016). I think the authors should show error distributions across a plate, in many environments, before and after the Zackrisson, Baryshnikova and Pyphe normalization. Moreover they should show that Pyphe normalization, across many environments and timepoints, results in more favorable ROC curves. Now it is not clear that the Pyphe normalization approach represents an advancement.

B) The authors do not show the effect of normalization on the growth curves, which I think should be done, e.g. in Figure 2A. Moreover, growth curves should be shown on a log-scale such that the exponential phase can be readily distinguished.

C) From a more general perspective: the whole of Figure 2 is based on a single, ideal environment (2% glucose YES) with canonical growth curves, but the conclusions are generalized to the method as such. I am not convinced that this is sufficient. I believe that it is fair to ask for expansion such that also a broad range of non-canonical growth curves are covered – preferentially from environments where growth is limited by other factors than the local glucose concentration. For comparison, Zackrisson et al., 2016 considered six different environments.

6) I really, really appreciate the intentions of the authors in trying to extend their set-up to viability staining, which has the potential to count dead cells and resolve birth and death rates. However, I am not convinced that they yet have succeeded in their intentions.

A) The authors do not really consider, or show, deaths over time and do not estimate death rates. Illuminating how death rates change as a function of growth and environment, or really just highlighting how often they are substantially above zero at different parts of the growth curve, could advance the field substantially, as non-negligible death rates have confounding effects on key microbiological properties (e.g. Frenoy et al., 2018). A time resolved view on death would much improve the paper and we need to see it.

B) None of the method evaluation that is done for growth (Figure 2) is repeated for the viability staining. The reader has no real clue what precision and accuracy looks like over time, how errors are distributed across and within plates, in different environments, at different time points, how well the normalization works, how false positive rates compare to false negative rates etc. Since this is a method paper, I think this is an essential component.

C) From Figure 4D, E, it seems there is a huge variation in the registered colour intensity that is unrelated to whether cells are dead and alive. For much of the dynamic range, the registered redness seems to only reflect noise? And the fraction of live cells in a colony, in this span, has no real quantitative interpretation. The only reliable distinction seems to be the qualitative separation of colonies with many and few dead cells. This drastically reduces the usefulness of the method?

D) If I understand Figure 4C-E right, there seems to be a strong confounding effect of lysed cells when considering Pyphe colony redness: both lysed and alive cells reduce the redness. Hence, colonies with a high fraction of alive cells and colonies with a high fraction of lysed cells can show similar redness? In harsh environments, where a high fraction of cells are first killed and then lysed I imagine that results therefore will be very hard to interpret? This must be a serious shortcoming as compared to e.g. flow cytometry where lysed and alive cells seem to be well separated?

E) The authors’ general conclusion, that there is no overlap between the fraction of dead cells and colony growth, is conceptually very troubling. How can this not be the case, if they really capture population size growth and dead cells respectively? Surely, dead cells do not reproduce: growth, as a consequence, must slow? Or? For example, from Figure 3A, it seems that many colonies grow at a normal rate (i.e. reach an intermediate size), even though only 25% of cells are alive (i.e. the colony redness is 1.25). If, as stated, the detected death is completely disconnected from the detected population size growth, something is fundamentally strange with the detection.

F) From Figure 3B it is clear that there often is a growth impact of phloxine B and that it depends heavily on the environment. One also wonders how genotype-dependent the impact of phloxine B on growth is? If phloxine B has a large impact on growth in many environments and on many genotypes – isn't there a serious risk for confounding effects when measuring both in parallel?

For your reference, we are also providing the individual reviews below; it may be worthwhile to also have a look at these and regard them as suggestions that could help to further improve the paper.

Reviewer #1:

This resources paper describes a modular Python pipeline for automated analysis of microbial colony growth and viability (color). The pipeline integrates basic image analysis, correction and statistics.

Whereas many different research teams have independently developed similar pipelines, it is definitely useful to make a standardized and somewhat user-friendly pipeline available to the broad community, especially for those colleagues lacking the expertise to develop similar pipelines. In addition, if Pyphe becomes a success, it could help standardize colony image analysis, and, by extension, fitness data.

The authors mention position effects, but it is unclear to me how they really correct for these. The text mentions "... i.e. cells positioned next to slow growers have better access to nutrients. Indeed, after reference grid normalisation, we often observed a (generally weak but detectable) secondary edge effect for colonies positioned in the next inward row/column (Figure 1—figure supplement 2B right). We found however, that this effect can easily be remedied by an additional row/column median normalization". How can normalization over a complete row or column remove the effects of neighboring cells, especially since the number of fast or slow growers can be very different across different rows and columns? Or are rows/columns at the plate's extremities compared to the inner colonies? Probably very simple, but please explain more clearly.

Instead of using the term "corrected growth rate" values to refer to fitness/growth relative to the WT, perhaps it is better to call this "relative growth rate". When I read "corrected", I am thinking of the removal of non-biological noise such as positional effects.

Figure 2: To me, the color scale is misleading. When I first looked at it, I interpreted the dark red color as being high, only to realize that this is in fact low. Consider re-coloring (e.g. blue-red is color-blindness-friendly, with blue intuitively meaning low).

Reviewer #2:

Kamrad et al. introduce a data acquisition and analysis pipeline for high-throughput microbial growth data: Pyphe. They use Pyphe to analyze colony growth, and the death component of growth, using moderate scale S. pombe experiments. Microbial growth is a central phenotype in microbiology; if correctly measured it can be used as a proxy for fitness. I appreciate the efforts and intentions by the authors, but there is quite a substantial number of similar pipelines available. The authors base their approach on the Zackrisson et al., 2016 pipeline, incorporating some concepts from Wagih and Parts, 2014 and Baryshnikova et al., 2010. But they introduce few novel developments. Moreover, in several critical respects the Pyphe pipeline seems like a step backwards. I have some technical and conceptual concerns with the pipeline and I am not sure that the authors yet have benchmarked and evaluated their pipeline to a sufficient extent. The main conclusion highlighted, a correlation between the maximum growth rate and late stage yield data, is not convincing or well illuminated and has been reported before, using a much broader empirical basis. The most exciting part of this paper, the ambition of which I much appreciate, is the expansion of the growth platform to also incorporate viability staining to measure cell death. However, I am not sure that the method is yet put on a sufficiently sound empirical footing. There are question marks concerning what is actually captured, whether the method achieves more than a qualitative resolution and to what extent the staining as such impacts on the growth of cells. Moreover, the main conclusion from this section, that the fraction of dead cells is disconnected from colony growth, is conceptually quite troubling and hints at underlying serious issues with the method.
While I am very positively disposed towards the intentions of the authors, I am afraid that quite substantial work remains before Pyphe can be regarded as a robust and innovative data analysis pipeline.

1) As an expert user, I have issues with core aspects of the method.

A) The v800 model now seems out of production. For the Pyphe-scan method to be useful for other labs, clearly it must work with other scanners. At the very least, the authors should show that the pipeline works and provides equivalent data with the 850 model.

B) In this context, it seems unfortunate that the authors have chosen not to implement the pixel calibration function of the Zackrisson et al., 2016 approach. This uses a grey-scale calibration strip attached to each fixture to ensure that registered pixel intensities become comparable across types of instruments, scanners, external conditions and time – the latter is quite important as the properties of light sources and light receptors change as a function of age.

C) The authors have chosen to, as a default, use a lossy, irreversible compression format for their images, JPG. This format uses very inexact approximations and discards much data when representing the original content of the image. JPG is not recommended when quantitative data is to be extracted from an image. The use of e.g. TIFF is much to be preferred: if the authors want to maintain the JPG format, a minimum requirement is that they show that colony delineations, pixel intensity and background intensity are not affected by the lossy compression format. Otherwise, it should be discarded and no data based on it included.

D) Quality control, QC, is normally a substantial investment of time. It was not clear from the paper how the QC or the QC interface works. If an automated, or at least a semi-automated process is not implemented this is a serious shortcoming and would be another step backwards from the Zackrisson et al., 2016 approach. The importance of a high throughput QC becomes clear when studying the example growth curves supplied together with the software: it is apparent that a large fraction of the slopes extracted are wildly inaccurate, or indeed pure noise, and do not reflect the actual growth. How is this handled? If the user has to filter these out manually, with no software guidance, the pipeline is no longer high-throughput.

E) The graphical interface provides user-friendliness. However, the GUI seems to only be applicable to the later stages of the pipeline? The initial steps are command-line based, which clearly does not square with claims of user-friendliness in general. As an example: to position the grid, the user is requested to input pixel coordinates, as integers, from the image in the command-line interface. This lack of a GUI for the early steps is another step backwards from the graphical interface of plate position or automatic grid detection implemented in Zackrisson et al., 2016?

F) How does Pyphe deal with the identification of scanners, when multiple scanners are connected to a computer? Explicitly: how does Pyphe in these conditions ensure that the images come from the scanner that the user thinks they come from? In the CLI interface the scanner number can be input, but it is stated that this may change if the scanner is turned off. Such a change will supposedly cause serious confusion if multiple scanners are connected to a computer. Are the scanners supposed to be constantly turned on? Won't this cause serious light stress to colonies? Zackrisson et al., 2016 implemented a scanner power managing system to switch off scanners when not scanning while still keeping control over time over which scans are taken by which scanner. Pyphe not handling multiple scanners or imposing light stress on colonies would either drastically reduce throughput or accuracy and increase costs (for computers)?

G) The supplementary text does not describe the operations performed or their sequencing in sufficient detail for me to confidently evaluate the analysis process, or to reconstruct what has been done. I kindly ask for more detailed information.

H) The authors argue that the lower precision when using a single time point scan can be compensated for by a higher number of replicates. However, increasing the number of replicates involves very substantial extra manual work and additional costs? E.g. in terms of plates to be poured, scanners and chemicals required, experiments to be started etc? In contrast, measuring multiple times on the same colony is not really a major cost at all? If this is supposed to be a major argument against doing time series, I think the authors should show some empirical support for their point of view.

2) Given the number of pipelines already available:

A) Outside of the viability staining (which I really would love for the authors to succeed in developing), I am not sure what is really innovative or original with Pyphe.

B) Moreover, and in addition to the shortcomings mentioned above, the authors seem to not have implemented the major advancement described in the Zackrisson et al., 2016 paper: an accurate conversion of pixel intensities into cell counts. To my mind, this is a serious shortcoming, because of the non-linearity of transmitted light and true population size. Not accounting for this non-linearity is the same as not diluting dense cell cultures when doing manual OD measurements. The consequence of not accounting for this non-linearity is that detected mid and late stage growth will be much attenuated relative to the true population size expansion in these stages. Thus, any effect, of genetic background and/or environment, on this part of the growth curve, risks being mis-quantified or completely overlooked.

C) For claims of capturing fitness to be on a more solid footing, the authors may want to mathematically derive selection coefficients from their fitness estimates (see e.g. Stenberg et al., 2016). This would add novelty to their pipeline.

3) A main conclusion is: "We apply pyphe to show that the fitness information contained in late endpoint measurements of colony sizes is similar to maximum growth slopes from time series." But:

A) The conclusion is reached by extrapolation from growth on rich YES medium, where all growth curves are canonical to growth in general. I am not sure this is appropriate. Different growth environments clearly have disparate effects on the growth curve (e.g. Warringer et al., 2008). For this conclusion to be of any value, a large number of environments with very different curve behaviors should be considered. Warringer et al., 2011 reached a more convincing conclusion using S. cerevisiae natural strains in hundreds of environments.

B) The authors use colony area for this comparison. I am not sure this is ideal. The horizontal expansion rate of a colony is not necessarily a good proxy for the population size expansion rate, and the ratio between the two is rarely constant over time. Early population size expansion is often predominantly horizontal while later expansion is often vertical. Thus, only considering horizontal expansion may lead to mid or late growth being underestimated? This ties into the non-linearity issue.

C) For meaningful biological interpretations to be made, the authors may want to compare the actual growth yield, i.e. the population size change to the end of growth, to the maximum growth rate. This will require some mathematical operations to ensure that growth has indeed ended at the later timepoints and potentially extending the cultivation time beyond 48h. The authors can then link their finding to major microbiological topics, such as r and k selection theory (see e.g. Wei et al., 2019, where the opposite conclusion to that here reported is reached) and thermodynamic considerations of the rate versus the yield of energy limited microbial growth reactions, e.g. MacLean et al., 2008. The activity of many cellular process changes quite dramatically during the growth curve. For example, our understanding of diauxic shifts, glucose and nitrogen catabolite repression, and the different affinities of alternative nutrient influx systems are hard to reconcile with the conclusion that population size at a single time point can well capture the total biology of a growth curve. The authors may want to discuss this. Ibstedt et al., 2015 showed strong correlations in natural strains, such as the ones here used, between effects on maximum growth and yield. However, they demonstrated that this is due to natural co-selection on these fitness components, rather than pleiotropy. Consequently, the here reported conclusion may not necessarily be valid for e.g. gene deletions, which is the major application envisioned by the authors?

4) The normalization procedure here employed and highlighted is a combination of the Zackrisson et al., 2016 and Baryshnikova et al., 2010 approaches, with a minor modification. I am not sure that it brings anything really novel to the table. Moreover:

A) The authors claim that their approach is superior to the Zackrisson and Baryshnikova approaches but does not empirically show this. The Baryshnikova approach – correcting for row and column based bias – makes sense if the error follows such a row – column wise pattern. This is the case for population size at late time points in rich 2% glucose medium – because nothing but the local glucose content is limiting for growth rates at this time. Thus, edge colonies, which experience less competition for the local glucose, expand faster at later stages. However, as Zackrisson et al., 2016 showed, the maximum slope on a rich medium is taken before the local glucose becomes growth limiting. The associated error at this time point therefore does not well follow a row-column wise pattern. Moreover, in harsher environments than 2% glucose YES (i.e. most other environments), where growth at all stages of the growth curve may be limited by other factors than local competition for limited glucose , the same reasoning applies (also to some extent shown by Zackrisson et al., 2016). I think the authors should show error distributions across a plate, in many environments, before and after the Zackrisson, Baryshnikova and Pyphe normalization. Moreover they should show that Pyphe normalization, across many environments and timepoints, results in more favorable ROC curves. Now it is not clear that the Pyphe normalization approach represents an advancement.

B) The authors do not show the effect of normalization on the growth curves, which I think should be done, e.g. in Figure 2A. Moreover, growth curves should be shown on a log-scale such that the exponential phase can be readily distinguished.

C) From a more general perspective: the whole of Figure 2 is based on a single, ideal environment (2% glucose YES) with canonical growth curves, but the conclusions are generalized to the method as such. I am not convinced that this is sufficient. I believe that it is fair to ask for expansion such that also a broad range of non-canonical growth curves are covered – preferentially from environments where growth is limited by other factors than the local glucose concentration. For comparison, Zackrisson et al., 2016 considered six different environments.

5) I really, really appreciate the intentions of the authors in trying to extend their set-up to viability staining, which has the potential to count dead cells and resolve birth and death rates. However, I am not convinced that they yet have succeeded in their intentions.

A) The authors do not really consider, or show, deaths over time and do not estimate death rates. Illuminating how death rates change as a function of growth and environment, or really just highlighting how often they are substantially above zero at different parts of the growth curve, could advance the field substantially, as non-negligible death rates have confounding effects on key microbiological properties (e.g. Frenoy et al., 2018). A time resolved view on death would much improve the paper and we need to see it.

B) None of the method evaluation that is done for growth (Figure 2) is repeated for the viability staining. The reader has no real clue what precision and accuracy looks like over time, how errors are distributed across and within plates, in different environments, at different time points, how well the normalization works, how false positive rates compare to false negative rates etc. Since this is a method paper, I think this is an essential component.

C) From Figure 4D, E, it seems there is a huge variation in the registered colour intensity that is unrelated to whether cells are dead and alive. For much of the dynamic range, the registered redness seems to only reflect noise? And the fraction of live cells in a colony, in this span, has no real quantitative interpretation. The only reliable distinction seems to be the qualitative separation of colonies with many and few dead cells. This drastically reduces the usefulness of the method?

D) If I understand Figure 4C-E right, there seems to be a strong confounding effect of lysed cells when considering Pyphe colony redness: both lysed and alive cells reduce the redness. Hence, colonies with a high fraction of alive cells and colonies with a high fraction of lysed cells can show similar redness? In harsh environments, where a high fraction of cells are first killed and then lysed I imagine that results therefore will be very hard to interpret? This must be a serious shortcoming as compared to e.g. flow cytometry where lysed and alive cells seems to be well separated?

E) The authors general conclusion, that there is no overlap between the fraction of dead cells and colony growth, is conceptually very troubling. How can this not be the case, if they really capture population size growth and dead cells respectively? Surely, dead cells do not reproduce: growth, as a consequence, must slow? Or? For example, from Figure 3A, it seems that many colonies grow at a normal rate (i.e. reach an intermediate size), even though only 25% of cells are alive (i.e. the colony redness is 1.25). If, as stated, the detected death is completely disconnected from the detected population size growth, something is fundamentally strange with the detection.

F) From Figure 3B it is clear that there often is a growth impact of phloxine B and that it depends heavily on the environment. One also wonders how genotype-dependent the impact of phloxine B on growth is? If phloxine B has a large impact on growth in many environments and on many genotypes – isn't there a serious risk for confounding effects when measuring both in parallel?

Reviewer #3:

The current manuscript describes a comprehensive pipeline that is a wrapper around already available tools and implements already described approaches (e.g. grid normalisation) and which can be used to analyse imaging data collected using flatbed scanners for high-throughput fitness screens. While the paper is very clear and well written and the code deposited in a public repository appears to be well crafted and documented, I am unsure there is enough novelty in this tool or in the experimental validation reported in the current manuscript to be of interest for the general readership of eLife. If I understand this correctly, there aren't any critical steps implemented in this pipeline which had not been reported or implemented before, which makes me think that a journal where this kind of tools are reported might be a better home for the current manuscript. Unfortunately I lack the expertise in the specific area of high-throughput phenotypic screens to be able to judge whether the substantial work presented here constitutes a technical improvement and a practical tool that might be widely used or rather a step change in the field. Without more competing arguments in favour of the latter and without a clear indication of a novel approach rather than implementation of already described tools and techniques I cannot fully support this manuscript for publication in eLife.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Pyphe, a python toolbox for assessing microbial growth and cell viability in high-throughput colony screens" for further consideration by eLife. Your revised article has been evaluated by Aleksandra Walczak as the Senior Editor and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

All reviewers agree that you and your co-authors have responded adequately to the concerns that were raised and that the paper has matured significantly. That said, we would still recommend addressing two specific points in a bit more detail in the paper, so that the readers are at the very least made aware that these might be potential concerns. Firstly, we think it is important for growth measures to be as good proxies for population size as possible. Accounting for the non-linearity of optical and cell density is important, even if this is often ignored (because the difference is not huge, or because of technical issues). Second, there still is some concern about the lack of correlation between measured death and the measured fitness proxy – one reviewer is not sure that this is not due to one or both measures being afflicted by a large error. We understand that this is not easily solved, but believe it is fair to mention it explicitly in the paper as a potential concern.

This resources paper describes a modular Python pipeline for automated analysis of microbial colony growth and viability (color). The pipeline integrates basic image analysis, correction and statistics. The pipeline will be a useful tool for the broad community and may help standardize high-throughput automated growth measurements for microbes.

https://doi.org/10.7554/eLife.55160.sa1

Author response

Reviewer #1:

[…]

The authors mention position effects, but it is unclear to me how they really correct for these. The text mentions "... i.e. cells positioned next to slow growers have better access to nutrients. Indeed, after reference grid normalisation, we often observed a (generally weak but detectable) secondary edge effect for colonies positioned in the next inward row/column (Figure 1—figure supplement 2B right). We found however, that this effect can easily be remedied by an additional row/column median normalization". How can normalization over a complete row or column remove the effects of neighboring cells, especially since the number of fast or slow growers can be very different across different rows and columns? Or are rows/columns at the plate's extremities compared to the inner colonies? Probably very simple, but please explain more clearly.

Thank you for highlighting the need for additional explanation and more clarity here. All arrayed yeast colonies will show differences in growth based on where they are located in the plate. This is partly due to the classical edge effect reflecting that colonies on the edge are exposed to a larger volume of agar without competition/detoxification effects from neighbours. This affects entire rows and columns largely uniformly so could be corrected with a simple row/column median normalisation. But in addition to the edge effect, there are other spatial effects with no defined patterns. These present themselves in the form of growth differences in different areas of the plate, without a clear pattern or size, and originate from uneven plate pouring, uneven heating or drying of the plate, uneven distribution of nutrients/toxins in the agar or pinning errors. Both these effects can efficiently be corrected using a reference grid normalisation which essentially compares the size of any given colony to those of wild type control colonies arrayed around it. We have expanded Appendix 2 with additional details and explanations.
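The idea of comparing each colony to the wild-type control colonies arrayed around it can be illustrated with a minimal sketch. This is not pyphe's implementation: the function name `grid_normalise`, the square `radius` window and the use of a local median are assumptions made purely for illustration, and the plate data are made up.

```python
import numpy as np

def grid_normalise(sizes, is_control, radius=2):
    """Divide each colony size by the median size of nearby control colonies.

    sizes      : 2D array of colony sizes (rows x columns of the plate)
    is_control : 2D boolean array, True where a wild-type control was pinned
    radius     : half-width of the square neighbourhood (hypothetical parameter)
    """
    sizes = np.asarray(sizes, dtype=float)
    is_control = np.asarray(is_control, dtype=bool)
    n_rows, n_cols = sizes.shape
    relative = np.full(sizes.shape, np.nan)
    for r in range(n_rows):
        for c in range(n_cols):
            r0, r1 = max(r - radius, 0), min(r + radius + 1, n_rows)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, n_cols)
            window = sizes[r0:r1, c0:c1]
            controls = window[is_control[r0:r1, c0:c1]]
            if controls.size:  # leave NaN if no control is in reach
                relative[r, c] = sizes[r, c] / np.median(controls)
    return relative

# Made-up 8 x 12 plate: controls on every second row and column,
# all colonies of size 200 except one slow-growing mutant.
sizes = np.full((8, 12), 200.0)
is_control = np.zeros((8, 12), dtype=bool)
is_control[::2, ::2] = True
sizes[3, 3] = 100.0

relative = grid_normalise(sizes, is_control)
# A plate-wide multiplicative factor (e.g. a different scanner) cancels out.
relative_scaled = grid_normalise(3.0 * sizes, is_control)
```

In this toy example the mutant receives a relative fitness of 0.5, and scaling the whole plate leaves every ratio unchanged, which is the property that makes results comparable across plates as long as the same control strain is used.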

“Or are rows/columns at the plate's extremities compared to the inner colonies?”

Yes. While the grid correction is very good at removing most technical noise (it typically reduces the CV by 4-fold), it can potentially introduce artefacts in the form of negative fitness values and a secondary edge effect. In the case of the secondary edge, this is because the edge effect only affects the outermost rows/columns. But in order to compute the relative fitness, colonies in the second row/column are compared to neighbouring colonies, which includes colonies on the edge (which increases the value that is being compared to). Since this affects entire rows/columns uniformly, it can be corrected by dividing by row/column medians.
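A row/column median division of this kind could be sketched as follows. The function name and the order of operations (rows first, then columns) are assumptions for illustration and are not taken from the pyphe source; the plate values are made up.

```python
import numpy as np

def row_col_median_normalise(fitness):
    """Divide each value by its row median, then by its column median."""
    fitness = np.asarray(fitness, dtype=float)
    row_median = np.nanmedian(fitness, axis=1, keepdims=True)
    by_row = fitness / row_median
    col_median = np.nanmedian(by_row, axis=0, keepdims=True)
    return by_row / col_median

# Made-up 6 x 8 plate of relative fitness 1.0 in which the second row and
# the second column are uniformly inflated (a secondary edge effect).
plate = np.ones((6, 8))
plate[1, :] *= 1.1
plate[:, 1] *= 1.2

corrected = row_col_median_normalise(plate)
```

Because the inflation affects whole rows and columns uniformly, the medians capture it and every value returns to 1.0. The sketch also shows why the approach fails when the median is not a good estimate of the null effect, for example if most colonies in a row are genuinely slow growers.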

“How can normalization over a complete row or column remove the effects of neighboring cells, especially since the number of fast or slow growers can be very different across different rows and columns?”

As pointed out, a median correction is not valid if the median is not a good estimate of the null effect. For this reason, we strongly discourage row/column median normalisation for plates in 96-colony format (where the median is computed from only 8 or 12 values). If plates contain a large number of slow or fast growers, a median normalisation is also unsuitable as pointed out, especially if these are distributed non-randomly in the plate. For work with knockout libraries, where most gene knock-outs have no effect in any given condition, the additional row/column median normalisation effectively neutralises the secondary edge effect. Being a toolbox, pyphe requires the user to think about their experimental design, choice of control strains, and plate layout to choose a suitable normalisation strategy from the options provided. Although this will require some extra time from the user, we believe that this level of understanding is necessary in order to obtain reliable, interpretable results. This flexibility will allow researchers to tailor data analysis to their experiment and will foster the uptake of pyphe in diverse labs and settings. We have improved and expanded Appendix 2, following on from the above arguments.

Instead of using the term "corrected growth rate" values to refer to fitness/growth relative to the WT, perhaps it is better to call this "relative growth rate". When I read "corrected", I am thinking of the removal of non-biological noise such as positional effects.

The grid correction takes care of three things simultaneously: (1) it converts colony size into an easily interpretable value by reporting a ratio relative to WT; (2) it makes results comparable across different plates/batches as long as the same WT strain is used and grown in the same way; and (3) it corrects for within-plate positional effects which become apparent due to the same WT grid strain showing different fitness in different plate positions. We agree that "relative" reflects the nature of the normalisation better and have followed your suggestion. We now explicitly point out the relative nature of the corrected fitness score (subsection “Pyphe enables analysis pipelines for fitness-screen data”) and have adopted the term “relative growth rate” in several places in the manuscript.

Figure 2: To me, the color scale is misleading. When I first looked at it, I interpreted the dark red color as being high, only to realize that this is in fact low. Consider re-coloring (e.g. blue-red is color-blindness-friendly, with blue intuitively meaning low).

Thank you for this suggestion. We agree that an inversion of the colour scale is more intuitive and have changed heatmaps. We have opted against using a two-colour scheme which would imply a neutral mid-point or inflection point not present in correlation data.

Reviewer #2:

Kamrad et al. introduce a data acquisition and analysis pipeline for high-throughput microbial growth data: Pyphe. […] While I am very positively disposed towards the intentions of the authors, I am afraid that quite substantial work remains before Pyphe can be regarded as a robust and innovative data analysis pipeline.

Before the point-by-point discussion, we would like to describe some fundamental differences between scan-o-matics and pyphe to demonstrate that pyphe is not a step backwards but a unique, novel approach with a distinct philosophy and streamlined implementation. Work on this project has evolved for almost four years and the initial goal was not to develop a new software solution. Indeed, it is a guiding philosophy of our labs to not reinvent the wheel, but to develop new tools only when they are needed and useful for the community. We have summarised key differences between scan-o-matics and pyphe in Author response table 1. These have been motivated by: (1) users conducting screens of different sizes (ranging from 3 to 3000 plates) with different questions and methods; (2) the need for users to have full control and insight into different steps of the analysis pipeline; (3) limited space in temperature-controlled rooms and incubators; (4) the need for simplified hardware set-ups not requiring scanner modifications (safety and warranty issues), dedicated LAN networks or power switches.

Author response table 1

Our preference for using endpoints and redness estimates as fitness inputs was a result, and not an initial expectation. Indeed we were for a long time very skeptical about end-point measurements, but the data of over half a million colony sizes recorded in our labs so far has shown that they can be used as a fitness proxy in a vast majority of cases. As resources are always limiting, we feel these are very important points to report.

For scientific questions and projects for which the priority lies in throughput, the flexible use of different fitness proxies, and streamlined scripted analysis, pyphe is a highly efficient and precise workflow that in our hands performs significantly better than its predecessors. There are other applications, however, like those that require the estimation of precise cell numbers in colonies for which pyphe is not designed and where scan-o-matics is the method of choice.

1) As an expert user, I have issues with core aspects of the method.

A) The v800 model now seems out of production. For the Pyphe-scan method to be useful for other labs, clearly it must work with other scanners. At the very least, the authors should show that the pipeline works and provides equivalent data with the 850 model.

We agree and had in fact already ordered a V850 scanner. The new model is supported by SANE, so we fully expect this to be straightforward. Unfortunately, with the current closure of our institutions due to the pandemic, we are at this time not able to validate it for use with pyphe in the lab. We will validate and update pyphe and the GitHub documentation for the new scanner model, as soon as we can.

B) In this context, it seems unfortunate that authors have chosen not to implement the pixel calibration function of the Zackrisson et al., 2016 approach. This uses a grey-scale calibration strip attached to each fixture to ensure that registered pixel intensities becomes comparable across types of instruments, scanners, external conditions and time – the latter is quite important as the properties of light sources and light receptors change as a function of age.

We have been working with pixel calibration strips during the development of pyphe (the fixture we use and have published cutting vectors for has a space to accommodate it) but have moved away from them for most daily uses (an exception are colony ageing experiments where the same plate is scanned once daily over several weeks). They are somewhat expensive and difficult to procure, and we have found them unnecessary for most of our purposes. A different calibration does not affect the binary classification of colonies and background which is all we need to quantify areas with high precision. We illustrate this in Author response image 1 by transforming the same image with 2 different hypothetical calibration functions and then analysing colony sizes with pyphe-quantify. The results are highly consistent, even in the case of the second transformation which is extreme and causes visible distortion of the image.

Author response image 1
Pixel calibration is not required for accurate determination of colony sizes.

Top row: calibration functions applied to the original scanned image. The first function is a linear transformation that scales the image to fill the entire 8bit range. We apply this to images in batch (but not timecourse) analysis by default. The other functions are 3rd-degree polynomials (as used by scan-o-matics). Middle row: Transformed images with upper right corner magnified. The third function has strong non-linear components which result in visible distortion of the image. Bottom row: colony sizes obtained with pyphe-quantify batch of the transformed images versus the original image. The median of the relative error abs(size(transformed)-size(original))/size(original) across the plate is noted. This is negligibly small when compared to the variation of colony sizes.

For virtually all applications of pyphe, we compare colony areas or redness scores within the same plate to produce corrected fitness values relative to the control strain. Differences between different scanners or the same scanner over a long time will thereby be corrected as they affect all colonies on a plate equally.

Special consideration needs to be given to growth curves, where uncorrected information from multiple images of the same plate is used. As described in Appendix 2, pyphe-quantify timecourse identifies colony positions in the last plate of the timecourse and then applies this mask to all background-subtracted images, each time summing up the intensities of all pixels in the masked areas. Zackrisson et al., 2016 noticed some variation in scanner calibration between consecutive scans (Supplementary Figure 3D). Somewhat surprisingly, we have found that our scans obtained with the V800 model are consistent enough over a period of a few days (scanners are undisturbed, light-protected and scanner age changes over much larger time-scales) to produce smooth growth curves (Author response image 2). We suspect that part of the variation/noise in the growth curves observed by Zackrisson et al., 2016 comes from the hard reboot of the scanner between every image, which we do not do (see also point 1F).
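The masking step described above can be sketched in a few lines. This is an illustrative simplification rather than the pyphe-quantify code: it assumes that a label mask (0 for background, 1..N for colonies) has already been derived from the final image, that colony signal is positive after any inversion, and that the background is estimated as the median of the unmasked pixels. The function name and the toy images are hypothetical.

```python
import numpy as np

def timecourse_intensities(images, labels):
    """Sum background-subtracted pixel intensities per colony for each image.

    images : sequence of 2D arrays, all images of the same plate over time
    labels : 2D integer array from the final image (0 = background)
    Returns an array of shape (n_timepoints, n_colonies).
    """
    labels = np.asarray(labels)
    n_labels = labels.max() + 1
    curves = []
    for img in images:
        img = np.asarray(img, dtype=float)
        background = np.median(img[labels == 0])  # assumed background estimate
        signal = img - background
        sums = np.bincount(labels.ravel(), weights=signal.ravel(),
                           minlength=n_labels)
        curves.append(sums[1:])  # drop the background label
    return np.array(curves)

# Toy 4 x 4 "plate" with two colonies of 2 x 2 pixels each.
labels = np.zeros((4, 4), dtype=int)
labels[:2, :2] = 1
labels[2:, 2:] = 2

first = np.full((4, 4), 10.0)          # nothing has grown yet
last = np.full((4, 4), 10.0)
last[:2, :2] = 12.0                    # colony 1: +2 per pixel
last[2:, 2:] = 15.0                    # colony 2: +5 per pixel

curves = timecourse_intensities([first, last], labels)
```

Each column of `curves` is then one raw growth curve, to which a maximum-slope fit can be applied.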

Author response image 2
Examples of raw growth curves obtained with pyphe setup.

Shown are 12 growth curves from the first row of a 1536 plate of 57 S. pombe wild strains (same data as Figure 2 in manuscript) analysed with pyphe-quantify in timecourse mode.

Fitting of lines to determine the maximum slope is an additional step to compensate for noise in the data (more noisy data can be compensated by fitting over more timepoints, and the user has the option to do so). After grid normalisation and reporting fitness results relative to the control strain, data are again comparable across days, instruments etc.

C) The authors have chosen to, as a default, use a lossy, irreversible compression format for their images, JPG. This format uses very inexact approximations and discards much data when representing the original content of the image. JPG is not recommended when quantitative data is to be extracted from an image. The use of e.g. TIFF is much to be preferred: if the authors want to maintain the JPG format, a minimum requirement is that they show that colony delineations, pixel intensity and background intensity are not affected by the lossy compression format. Otherwise, it should be discarded and no data based on it included.

Thank you for raising this point. Our choice of image format was not by accident but a carefully considered trade-off to reduce storage space requirements. A tiff image of a single plate scanned at 600dpi is approximately 4MB large, whereas the corresponding jpg (converted using ImageMagick’s default parameters) is usually only in the range of 180-580KB, reducing image storage needs by a factor of ~20. This is relevant, as pyphe is designed specifically for high-throughput pipelines.

In order to address your concern, we have re-analysed images from the 57 S. pombe wild strain growth curve experiment on rich medium (Figure 2). This experiment contained 145 images of the same plate, scanned every 20 minutes. We have analysed each image separately in tiff and jpg format, both with gitter and with pyphe-quantify batch. We have computed the Pearson correlation between results obtained with both image formats and achieved an overall correlation of 0.999976 for analysis with gitter and 0.999964 with pyphe-quantify batch. We have also computed the relative error introduced by conversion to jpg, defined as abs(size(jpg)-size(tiff_image))/size(tiff), which has a median of 0.00245 for gitter and 0.00333 for pyphe. We have computed both of these measures plate-wise (Author response image 3) and find that the error introduced by conversion is consistently low across the growth curve when considering that early images with smaller, fainter colonies are harder to analyse. These relative errors need to be put into perspective by comparison of the biological signal (here estimated by the median absolute deviation of all colonies in each plate). The error introduced by conversion is negligible compared to the biological signal.
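For readers who wish to repeat this kind of comparison on their own images, the two measures can be computed as in the following sketch. The function name and the four colony sizes are made up for illustration; only the definitions (Pearson correlation, and median of the absolute size difference divided by the tiff size) follow the text above.

```python
import numpy as np

def compare_formats(size_tiff, size_jpg):
    """Return (Pearson r, median relative error) between two size vectors."""
    size_tiff = np.asarray(size_tiff, dtype=float)
    size_jpg = np.asarray(size_jpg, dtype=float)
    pearson_r = np.corrcoef(size_tiff, size_jpg)[0, 1]
    relative_error = np.abs(size_jpg - size_tiff) / size_tiff
    return pearson_r, np.median(relative_error)

# Hypothetical colony sizes (pixels) for the same four colonies.
size_tiff = np.array([400.0, 500.0, 250.0, 800.0])
size_jpg = np.array([401.0, 499.0, 250.0, 802.0])

pearson_r, median_rel_err = compare_formats(size_tiff, size_jpg)
```

With these toy values the median relative error is 0.00225. On real data the same function would be applied per plate, and the result compared with the median absolute deviation of colony sizes on that plate.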

Author response image 3
Image conversion to jpg has negligible impact on results.

Each image of a growth curve consisting of 145 images (shown on x-axis) was analysed in the original tiff format and in the converted jpg, using gitter (right) and pyphe-quantify in batch mode (left). The correlation (blue line) is extremely high for all images (>0.996) and increases as colonies get larger and darker. The median relative error abs(size(jpg)-size(tiff_image))/size(tiff) is shown (orange) and is practically 0 compared to the biological signal (median absolute deviation of all colony sizes per plate).

We also confirmed that image conversion makes no difference in the case of growth curves, where pyphe reports the sum of pixel intensities in the footprint of the colony in the final image. Using pyphe-quantify in timecourse mode on the entire image series in jpg and tiff format produces an overall Pearson correlation of 0.999998 with a median relative error of 0.00087 across all colonies and timepoints.

However, we fully accept that some researchers may prefer to work with lossless image formats and now offer the option in pyphe-scan to produce images in tiff. Pyphe-quantify is already flexible with regards to image format. Please note that pyphe-scan, in any case, saves the original scans in tiff format (which we usually delete or archive once image processing is complete, depending on the project).

D) Quality control, QC, is normally a substantial investment of time. It was not clear from the paper how the QC or the QC interface works. If an automated, or at least a semi-automated process is not implemented this is a serious shortcoming and would be another step backwards from the Zackrisson et al., 2016. The importance of a high throughput QC becomes clear when studying the example growth curves supplied together with the software: it is apparent that a large fraction of the slopes extracted are wildly inaccurate, or indeed pure noise, and does not reflect the actual growth. How is this handled? If the user has to filter these out manually, with no software guidance, the pipeline is no longer high-throughput.

Pyphe-growthcurves, our tool for growth curve analysis, is written to be as flexible as possible, and we use it for liquid growth curve analysis as well. The example data supplied is from liquid cultures from a plate reader experiment and is indeed of much poorer quality than what we would expect from solid growth curves. We have now provided more appropriate example data (from the 57 wild strain experiment, Figure 2). We have additionally improved the tool to perform some automated growth curve QC and tag spurious curves in a new column of the output file if the R² of the fitted line is <0.95 or a significantly negative slope is detected in the growth curve (this happens quite frequently for plate reader growth curves). We consciously avoid fitting parametric growth models to colony area data and believe this should be left to expert users, if required, who can easily access pyphe-growthcurves data thanks to the use of standardised, simple, human-readable intermediate steps.
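The QC rule described above can be illustrated with a short sketch. It is not the pyphe-growthcurves implementation: the sliding-window line fit, the window length, the function name and the `neg_tol` threshold used to decide what counts as a "significantly negative" slope are all assumptions made for this example. Only the R² cut-off of 0.95 comes from the text.

```python
import numpy as np

def max_slope_with_qc(t, y, window=5, r2_min=0.95, neg_tol=1e-3):
    """Fit lines over sliding windows; report the steepest one and a QC flag."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    fits = []
    for i in range(len(t) - window + 1):
        tw, yw = t[i:i + window], y[i:i + window]
        slope, intercept = np.polyfit(tw, yw, 1)
        residuals = yw - (slope * tw + intercept)
        ss_tot = np.sum((yw - yw.mean()) ** 2)
        r2 = 1.0 - np.sum(residuals ** 2) / ss_tot if ss_tot > 0 else 0.0
        fits.append((slope, r2))
    slopes = np.array([f[0] for f in fits])
    best = int(np.argmax(slopes))
    max_slope, r2 = fits[best]
    # Flag if the best fit is poor or the curve declines anywhere.
    flag = bool(r2 < r2_min or slopes.min() < -neg_tol)
    return {"max_slope": max_slope, "r2": r2, "flag": flag}

t = np.arange(20)
clean_curve = 1.0 / (1.0 + np.exp(-(t - 10.0)))   # smooth sigmoid
declining_curve = 1.0 - 0.05 * t                  # signal decreases over time

clean_result = max_slope_with_qc(t, clean_curve)
declining_result = max_slope_with_qc(t, declining_curve)
```

In this toy example the smooth sigmoid passes, whereas the declining curve is tagged. Fitting over more timepoints (a larger `window`) is the corresponding way to compensate for noisier data.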

For endpoint measurements, we recommend a QC strategy based on colony circularities which is clearly described in the manuscript. We have now implemented these two key steps in pyphe-interpret and updated the documentation, so no manual QC is required by the user for typical endpoint experiments.

E) The graphical interface provides user-friendliness. However, the GUI seems to only be applicable to the later stages of the pipeline? The initial steps are command-line based, which clearly does not square with claims of user-friendliness in general. As an example: to position the grid, the user is requested to input pixel coordinates, as integers, from the image in the command-line interface. This lack of a GUI for the early steps is another step backwards from the graphical interface of plate position or automatic grid detection implemented in Zackrisson et al., 2016?

Thank you for raising this point. We agree that this had not been documented very well in the previous manuscript. We initially have not implemented an automatic grid correction since all our plates were scanned with the same fixture taped to the scanners, so colony positions were highly consistent across images. Users actually do not need to input pixel coordinates if they are using the fixture and scanning parameters provided by us (this is done using the --grid pp_384 or --grid pp_1536 option). This information has been updated. Simultaneously, with the goal of maximum flexibility, pyphe-quantify offers the option of manually defining grid positions. Getting those coordinates is trivial and can be done, for example, in Microsoft Paint by hovering the cursor over the colony (we have now pointed this out in the tool manual). The option to manually define grid positions is important in our experience, as automatic gridding is the step where most image analysis tools typically fail (especially if plates have many missing colonies, images are rotated or otherwise of low quality). However, we fully agree that manual entering of colony coordinates is awkward and have now implemented automatic grid detection functionality. It is based on detecting peaks in image pixel rows/columns and is used by setting the --grid argument to auto_96, auto_384 or auto_1536.
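The principle of locating a grid from peaks in the pixel rows and columns can be sketched as follows. This is a deliberately simple illustration on a synthetic, noise-free image and does not reproduce pyphe's detection code: the function names and the peak-picking rule are assumptions, and real scans would at least need smoothing and a minimum peak spacing.

```python
import numpy as np

def strongest_peaks(profile, n_peaks):
    """Return the sorted positions of the n highest local maxima of a 1D profile."""
    p = np.asarray(profile, dtype=float)
    interior = np.arange(1, len(p) - 1)
    is_peak = (p[interior] > p[interior - 1]) & (p[interior] >= p[interior + 1])
    candidates = interior[is_peak]
    order = np.argsort(p[candidates])[::-1][:n_peaks]
    return np.sort(candidates[order])

def detect_grid(image, n_rows, n_cols):
    """Estimate colony row and column positions from summed pixel intensities."""
    image = np.asarray(image, dtype=float)
    row_positions = strongest_peaks(image.sum(axis=1), n_rows)
    col_positions = strongest_peaks(image.sum(axis=0), n_cols)
    return row_positions, col_positions

# Synthetic image with a 3 x 5 grid of small, centre-weighted "colonies".
kernel = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float)
image = np.zeros((40, 60))
for r in (10, 20, 30):
    for c in (10, 20, 30, 40, 50):
        image[r - 1:r + 2, c - 1:c + 2] = kernel

rows, cols = detect_grid(image, n_rows=3, n_cols=5)
```

The sketch also makes the failure mode mentioned above concrete: if many colonies are missing from a row or column, the corresponding peak becomes weak or disappears, which is why a manual fallback remains useful.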

Secondly, we do not agree with the claim that GUIs are more user-friendly in general. They can be useful in many instances and, as pointed out, we do provide a GUI for one of our tools but have otherwise moved away from this for three reasons. First, GUIs struggle with cross-platform compatibility (esp. without browser-based implementation) and are time-consuming to build. Second, GUIs only really make sense if they have interactive/dynamic components, which our tools don’t require. Using a pipeline for data analysis, where tools with simple, well-defined purposes operate on human-readable files, is in our opinion a preferable solution to integrating all functionality in a complicated GUI with tabs, menus and submenus. All pyphe tools require only a small set of parameters to start the analysis, which then proceeds without user-input. The well-documented command-line interfaces follow the same scheme and are straightforward to use without any knowledge of computer programming. They are based on the powerful argparse package which checks user inputs carefully. Third, using command line calls allows our pipeline to be scriptable. It is therefore easy to document its use exactly, reproduce results and re-run analyses quickly if the input data has changed/expanded.
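To illustrate how argparse validates user input, here is a minimal sketch of a command-line interface. The program name and the --out option are hypothetical and do not reproduce the interface of any pyphe tool; only the --grid values are taken from the text above.

```python
import argparse

# Grid presets mentioned in this response.
GRID_CHOICES = ["pp_384", "pp_1536", "auto_96", "auto_384", "auto_1536"]


def build_parser():
    """Build a small parser that rejects unknown grid presets."""
    parser = argparse.ArgumentParser(
        prog="quantify-sketch",  # hypothetical tool name
        description="Illustrative stand-in for a colony quantification tool.",
    )
    parser.add_argument("--grid", choices=GRID_CHOICES, required=True,
                        help="grid preset or automatic detection mode")
    parser.add_argument("--out", default="results",  # hypothetical option
                        help="folder for output tables")
    return parser


# Passing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(["--grid", "auto_384"])
```

With this setup, an invalid value such as --grid auto_100 makes argparse print a usage message and exit, without any extra validation code.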

F) How does Pyphe deal with the identification of scanners, when multiple scanners are connected to a computer? Explicitly: how does Pyphe in these conditions ensure that the images come from the scanner that the user thinks the image comes from? In the CLI interface the scanner number can be input, but it is stated that this may change if the scanner is turned off. Such a change will supposedly cause serious confusion if multiple scanners are connected to a computer. Are the scanners supposed to be constantly turned on? Won't this cause serious light stress to colonies? Zackrisson et al., 2016 implemented a scanner power managing system to switch off scanners when not scanning while still keeping control over time over which scans are taken by which scanner. Pyphe not handling multiple scanners or imposing light stress on colonies would either drastically reduce throughput or accuracy and increase costs (for computers)?

The implementation in scan-o-matics is a clever and well-written solution to both the backlight and scanner identification problems. We initially implemented the complete setup from Zackrisson et al., 2016 using the LAN power switcher and scanners controlled by the scan-o-matics software interface. We have moved away from this for the following reasons:

1) We have been using only V800 scanners, the replacement model of the V700 used by Zackrisson et al., 2016, and could not reproduce the problem with the light staying on. In our hands, the light switches off promptly after each scan.

2) We suspect that part of the variation/noise in the growth curves observed by Zackrisson et al., 2016 comes from the hard reboot of the scanner between every image, and we do not seem to have these issues (see point 1B).

3) The power switch has not been easy to buy through our procurement system as it is non-standard electrical equipment. It takes time to set up (requiring a dedicated router and IP address configuration) and has not been entirely stable in our hands.

4) Using the power switcher requires a hardware modification of the scanner which many users will be uncomfortable with and which voids the warranty. It was further not compatible with the UK fire safety/electrical safety regulations for us to modify the scanner without obtaining a certificate of the modification. We found nobody that was willing, or legally able, to certify the electrical safety of a scanner modified by ourselves.

5) Scan-o-matics avoids having to turn on two scanners at once by dynamically moving timepoints. This can lead to an uneven spacing of timepoints which complicates downstream analysis.

Setting up an experiment with multiple scanners with pyphe-scan-timecourse is straightforward. One simply needs to prepare and connect the first scanner and start scanning with --scanner 1, then connect the second scanner to the computer and start scanning with --scanner 2, etc. This is now more clearly documented in the tool’s help page.

G) The supplementary text does not describe the operations performed or their sequencing in sufficient detail for me to confidently evaluate the analysis process, or to reconstruct what has been done. I kindly ask for more detailed information.

We have expanded Appendices 1 and 2 to describe pyphe’s algorithms in more detail. We have now also added Appendix 3 (description of pyphe-growthcurves) and Appendix 4 (description of pyphe-interpret). Information on how to use each tool may change in future versions and is given in the tools’ inbuilt help (accessible by calling the tool with the -h option) and on GitHub.

H) The authors argue that the lower precision when using a single time point scan can be compensated for by a higher number of replicates. However, increasing the number of replicates involves very substantial extra manual work and additional costs? E.g. in terms of plates to be poured, scanners and chemicals required, experiments to be started etc? In contrast, measuring multiple times on the same colony is not really a major cost at all? If this is supposed to be a major argument against doing time series, I think the authors should show some empirical support for their point of view.

We agree that this point will be stronger with supporting evidence which we now provide. First, we have conducted a power analysis illustrating the trade-off between more replicates and higher measurement precision (Author response image 4). Using a CV of 2% for scan-o-matic, as reported by Zackrisson et al., and a CV of 6%, as reported in our knock-out screen (Figure 2), we have calculated the statistical power (1 – chance of type II error, i.e. non-rejection of a false null hypothesis) dependent on the difference in means of the input phenotypes. We achieve similar power using the number of replicates shown below and note that both methods are able to detect even subtle (10%) growth differences reliably.

Author response image 4
The analysis shows the number of replicates required with scan-o-matics and with pyphe in order to achieve the same statistical power.
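The trade-off between replicate number and measurement precision can be sketched with a normal approximation to a two-sided, two-sample test. This is not the calculation behind Author response image 4: the function names, the significance level of 0.05 and the target power of 0.9 are assumptions, and the normal approximation is optimistic for very small replicate numbers.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(diff, cv, n, alpha=0.05):
    """Approximate power to detect a relative difference in means `diff`
    with `n` replicates per group and relative standard deviation `cv`."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = diff / (cv * sqrt(2 / n))  # effect size in standard errors
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


def replicates_needed(diff, cv, target=0.9, max_n=1000):
    """Smallest n per group reaching the target power, or None."""
    for n in range(2, max_n + 1):
        if power_two_sample(diff, cv, n) >= target:
            return n
    return None


# CVs quoted above: 2% (scan-o-matic) and 6% (our knock-out screen).
n_low_cv = replicates_needed(diff=0.10, cv=0.02)
n_high_cv = replicates_needed(diff=0.10, cv=0.06)
```

Under these assumptions, a higher CV is compensated by a larger number of replicates for the same difference in means; a t-based calculation would give somewhat larger numbers at small n.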

We further illustrate our response to this point using the example of an experiment we recently conducted, where we measured ~90 non-coding RNA mutants in 9 replicates across ~140 conditions (Rodriguez-Lopez et al., in preparation). This experiment comprises 3 plate layouts per condition (to accommodate all 9 replicates), which amounts to 420 plates in total. Assuming we could have obtained the same statistical power with 3 replicates in scan-o-matics, we have compiled Author response table 2 that breaks down costs and other requirements:

Author response table 2

2) Given the number of pipelines already available:

A) Outside of the viability staining (which I really would love for the authors to succeed in developing), I am not sure what is really innovative or original with Pyphe.

Our earlier responses have outlined how we have established a new, common framework for phenotyping analysis. It is important to note that gitter (Wagih and Parts, 2014) and grofit (Kahm et al., 2010), two popular R packages for image and growth curve analysis, are now archived and no longer installable via install.packages(). So despite the number of publications on the topic, the tools actually available to potential users are very few in practice. Researchers want and need tools which are straightforward to install and use and which fit into their existing workflow and data. Pyphe was designed with this in mind and is a unique, comprehensive end-to-end solution for various phenotyping scenarios.

Comparing pyphe specifically to scan-o-matics, several points of fundamental difference are highlighted by the reviewer. Moreover, pyphe has substantially expanded functionality, being able to process endpoints and growth curves as well as colony sizes and colony redness within the same framework. It further implements downstream statistical analysis and hit calling. Pyphe is, to our knowledge, the first platform with such a scope.

Besides pyphe itself, this manuscript contains a substantial amount of biological data and new findings and makes wide-ranging conclusions that will be important for everyone working on colony-based screens, regardless of whether they use pyphe or not. Briefly, these are (1) the observation that endpoints are highly correlated with growth rate and can be used as a fitness proxy which is much easier to obtain, (2) that colony redness is a reproducible, orthogonal and independent fitness readout easily obtained from the same colony, and (3) that redness scores reflect colony viability.

B) Moreover, and in addition to the shortcomings mentioned above, the authors seem to not have implemented the major advancement described in the Zackrisson et al., 2016 paper: an accurate conversion of pixel intensities into cell counts. To my mind, this is a serious shortcoming, because of the non-linearity of transmitted light and true population size. Not accounting for this non-linearity is the same as not diluting dense cell cultures when doing manual OD measurements. The consequence of not accounting for this non-linearity is that detected mid and late stage growth will be much attenuated relative the true population size expansion in these stages. Thus, any effect, of genetic background and/or environment, on this part of the growth curve, risk being mis-quantified or completely overlooked.

We agree that the measurement of true cell numbers is a distinguishing feature of the scan-o-matics pipeline. As pyphe is not meant at all to be a simplified clone of scan-o-matics, it has different feature sets, strengths and weaknesses. We now explicitly state this limitation of pyphe in the main text. If high-throughput measurements of true cell numbers should really be required for an experiment, we recommend scan-o-matics to potential users. However, obtaining true cell numbers as implemented in scan-o-matics assumes that the relationship between pixel intensity and cell number does not change between conditions and strains, and it is unclear to which extent this is normally valid.

Furthermore, it complicates the analysis considerably, effectively restricting it to specialist laboratories. But most importantly, we think that true cell numbers are not actually required to answer the vast majority of questions investigated with colony-based screens. The true “fitness” in a natural setting is rarely ever measured but approximated through a linked readout in the laboratory. Colony footprints are an intuitive fitness proxy reflecting how well a strain/colony has performed in the environment. This readout has served biologists incredibly well for decades and is well suited for answering the questions posed in most laboratories. These questions are normally one of the following two types. Type 1 requires a relative quantitative fitness readout and/or classification of “faster/slower growing than another strain in the same experiment”. For example, for GWAS, phenotype vectors are often centred to mean 0, scaled to variance 1 and transformed to normal shape by box-cox or similar before analysis. For this approach, obtaining a readout which is increasing with cell number (even if not in a strictly linear fashion) will result in similar outcomes.

Type 2 concerns profiling approaches for functional genomics and these require a reproducible readout which reflects aspects of physiology. Profile vectors are then used in multivariate analysis, such as clustering, which reveals similarities between genes. This approach requires no mechanistic understanding of what the readout means; it is even blind to the conditions used to obtain them. Instead, they have to be reproducible, precise and measurable in high numbers. Colony sizes are ideally suited to both of these types of questions, supported by the remarkable recent discoveries made using colony-size screens (e.g. Kuzmin et al., 2018 or Galardini et al., 2019).
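The argument for Type 1 questions can be illustrated with a minimal sketch. It swaps in a rank-based inverse normal transform for Box-Cox, because a rank-based transform makes the point exactly: any strictly increasing readout yields the same transformed phenotype vector. The cell numbers and the square-root readout are hypothetical, and with Box-Cox the two results would not be guaranteed to be identical.

```python
from statistics import NormalDist, mean, pstdev


def standardise(values):
    """Centre to mean 0 and scale to variance 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def rank_inverse_normal(values):
    """Map ranks to normal quantiles (ties are not handled in this sketch)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order, start=1):
        out[i] = NormalDist().inv_cdf((rank - 0.5) / n)
    return out


cells = [1.0, 4.0, 9.0, 16.0, 25.0]    # hypothetical true cell numbers
readout = [c ** 0.5 for c in cells]    # hypothetical non-linear, increasing readout

same_phenotypes = rank_inverse_normal(cells) == rank_inverse_normal(readout)
z_scores = standardise(readout)
```

Here same_phenotypes is True because the readout preserves the ordering of strains, which is the only information the rank-based transform uses.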

To validate this conclusion, we would be keen to directly compare the results obtained with pyphe to those obtained with the full scan-o-matics setup. To this end, we have tried to analyse the example image data set provided by Zackrisson et al., 2016 (https://github.com/Scan-oMatic/scanomatic/wiki/Example-experimental-data). We have, however, run into difficulties with that data and think it would be best to open a direct dialogue about this issue and whether or how best to pursue the comparison.

C) For claims of capturing fitness to be on a more solid footing, the authors may want to mathematically derive selection coefficients from their fitness estimates (see e.g. Stenberg et al., 2016). This would add novelty to their pipeline.

We could not find this publication. There does not seem to be a publication from this first author in that year. More generally, we do not feel that deriving selection coefficients would be useful at this time, as it is not the core area of expertise of our labs and we have no immediate application for it. However, pyphe is set up to become a collaborative project and welcomes code contributions from the community.

3) A main conclusion is: "We apply pyphe to show that the fitness information contained in late endpoint measurements of colony sizes is similar to maximum growth slopes from time series." But:

A) The conclusion is reached by extrapolation from growth on rich YES medium, where all growth curves are canonical to growth in general. I am not sure this is appropriate. Different growth environments clearly have disparate effects on the growth curve (e.g. Warringer et al., 2008). For this conclusion to be of any value, a large number of environments with very different curve behaviors should be considered. Warringer et al., 2011 reached a more convincing conclusion using S. cerevisiae natural strains in hundreds of environments.

We agree and have now collected a bigger data set for cell growth in 8 additional conditions. These conditions have been specifically designed to produce diverse growth dynamics, using combinations of different carbon sources, salt stress, and different nitrogen sources. We show plots for each individual condition in the new Figure 2—figure supplement 1. We have calculated for each condition the correlation between endpoint and maximum slope of the growth curve (new Figure 2—figure supplement 2) and obtain a median correlation of 0.95, thus confirming our earlier conclusion.
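The comparison between endpoints and maximum slopes can be sketched as follows. This is a simplified illustration and not the pyphe-growthcurves algorithm: the sliding-window regression, the window size and the toy curves are assumptions.

```python
def max_slope(times, sizes, window=3):
    """Largest least-squares slope over any `window` consecutive time points."""
    best = float("-inf")
    for start in range(len(times) - window + 1):
        t = times[start:start + window]
        y = sizes[start:start + window]
        t_mean, y_mean = sum(t) / window, sum(y) / window
        num = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
        den = sum((a - t_mean) ** 2 for a in t)
        best = max(best, num / den)
    return best


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


times = [0, 1, 2, 3, 4, 5]
# Hypothetical colony-size curves for three strains (arbitrary units).
curves = {
    "strain_a": [0, 1, 3, 6, 8, 9],
    "strain_b": [0, 0.5, 1.5, 3, 4, 4.5],
    "strain_c": [0, 2, 6, 12, 16, 18],
}
slopes = [max_slope(times, y) for y in curves.values()]
endpoints = [y[-1] for y in curves.values()]
r = pearson(endpoints, slopes)
```

In the real analysis, one such correlation is computed per condition and the median across conditions is reported.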

B) The authors use colony area for this comparison. I am not sure this is ideal. The horizontal expansion rate of a colony is not necessarily a good proxy for the population size expansion rate, and the ratio between the two is rarely constant over time. Early population size expansion is often predominantly horizontal while later expansion is often vertical. Thus, only considering horizontal expansion may lead to mid or late growth being underestimated? This ties into the non-linearity issue.

We agree that these are valid and important theoretical considerations. In fact, we have carefully considered these very same points when we set out to investigate the relationship between endpoints and other growth-curve parameters. Please note that pyphe-quantify in timecourse mode reports the sum of pixel intensities, so it does take into account thickness as well as area. We agree with your observations above and with the argument that these could complicate our analysis. Yet, the fact that colony areas do correlate so well with maximum slopes indicates that these points have a comparatively minor impact and can be ignored in practice. By using genetically diverse wild strains for these experiments, we covered strains with highly diverse morphology and growth behaviour. However, such considerations might matter more for other microbial species, and we have added this potential caveat in the main text. Pyphe-quantify in batch mode also reports the average pixel intensity as well as the colony area by default (the relationship between which we show in Figure 1—figure supplement 1B), so the user has the option to use those instead.

C) For meaningful biological interpretations to be made, the authors may want to compare the actual growth yield, i.e. the population size change to the end of growth, to the maximum growth rate. This will require some mathematical operations to ensure that growth has indeed ended at the later timepoints and potentially extending the cultivation time beyond 48h. The authors can then link their finding to major microbiological topics, such as r and k selection theory (see e.g. Wei et al., 2019, where the opposite conclusion to that here reported is reached) and thermodynamic considerations of the rate versus the yield of energy limited microbial growth reactions, e.g. MacLean et al., 2008. The activity of many cellular process changes quite dramatically during the growth curve. For example, our understanding of diauxic shifts, glucose and nitrogen catabolite repression, and the different affinities of alternative nutrient influx systems are hard to reconcile with the conclusion that population size at a single time point can well capture the total biology of a growth curve. The authors may want to discuss this. Ibstedt et al., 2015 showed strong correlations in natural strains, such as the ones here used, between effects on maximum growth and yield. However, they demonstrated that this is due to natural co-selection on these fitness components, rather than pleiotropy. Consequently, the here reported conclusion may not necessarily be valid for e.g. gene deletions, which is the major application envisioned by the authors?

Thank you for raising this interesting point. We fully agree with everything stated in principle but do not believe that this is relevant here. Endpoint colony sizes on solid media cannot be used to determine growth yields. Colonies are densely arrayed and keep growing until the agar is depleted of the limiting nutrient. This competition for resources means that rather than each strain having the same amount of nutrients to grow (as would be required to determine yield), each strain has roughly the same amount of time to grow, which means that endpoints largely reflect growth rate, as our analyses show. This is fundamentally different from work in liquid media, as used by Ibstedt et al., 2015, where each strain grows in its own well/flask without competition from other strains.

“For example, our understanding of diauxic shifts, glucose and nitrogen catabolite repression, and the different affinities of alternative nutrient influx systems are hard to reconcile with the conclusion that population size at a single time point can well capture the total biology of a growth curve.”

We agree that a single data point cannot capture the total biology of a growth curve and we certainly do not claim that it does so. But in the end, most quantitative analyses require a simple numerical input to be extracted from the growth curves. We do show that endpoint colony sizes are an accurate reflection specifically of the maximum slope so these both reflect the key growth-curve parameter. By providing easy access to the raw growth-curve data and plotting all curves as pdf, users can detect unusual growth dynamics and specialised users can easily perform additional, specific analyses.

4) The normalization procedure here employed and highlighted is a combination of the Zackrisson et al., 2016 and Baryshnikova et al., 2010 approaches, with a minor modification. I am not sure that it brings anything really novel to the table. Moreover:

A) The authors claim that their approach is superior to the Zackrisson and Baryshnikova approaches but do not empirically show this. The Baryshnikova approach – correcting for row and column based bias – makes sense if the error follows such a row – column wise pattern. This is the case for population size at late time points in rich 2% glucose medium – because nothing but the local glucose content is limiting for growth rates at this time. Thus, edge colonies, which experience less competition for the local glucose, expand faster at later stages. However, as Zackrisson et al., 2016 showed, the maximum slope on a rich medium is taken before the local glucose becomes growth limiting. The associated error at this time point therefore does not well follow a row-column wise pattern. Moreover, in harsher environments than 2% glucose YES (i.e. most other environments), where growth at all stages of the growth curve may be limited by other factors than local competition for limited glucose, the same reasoning applies (also to some extent shown by Zackrisson et al., 2016). I think the authors should show error distributions across a plate, in many environments, before and after the Zackrisson, Baryshnikova and Pyphe normalization. Moreover they should show that Pyphe normalization, across many environments and timepoints, results in more favorable ROC curves. Now it is not clear that the Pyphe normalization approach represents an advancement.

We fully agree with these arguments. A row/column median normalisation only makes sense if (a) the error follows a row/column pattern, and (b) the error can be estimated reliably. Except for edge effects, errors do not normally follow row/column patterns, so a different approach is required. One approach applied previously to combat non-row/column variation is to create a normalisation surface by convolving the image with a median filter. This assumes that most colonies show no response in the condition and creates problems at the edge where the surface is undefined. A normalisation surface based on control strains is a great solution which makes fewer assumptions and makes results intuitively interpretable. Our implementation of the reference grid normalisation is similar in that we use scipy’s interpolate.griddata function with a cubic interpolation. We do not claim that our implementation is better in the sense that it delivers lower noise levels, and we refer frequently to Zackrisson et al., 2016. However, we have made changes to the original implementation which improve data completeness, slightly increase throughput, and facilitate quality control. These are:

1) We recommend placing two 96 grids on each 1536 plate: one in the top left position and one in the bottom right. This leaves 192 more positions per plate to be filled by strains to be assayed.

2) We have implemented a statistical prediction method for predicting colony sizes in the two missing corners of the plate (bottom left and top right), which allows us to extrapolate the grid to cover the entire plate. Together with point (1), this means that we can predict null-effects for the entire plate without loss of data (key for large libraries which would otherwise have to be re-arranged).

3) We recommend including a third 96 grid of control strains (same as the grid or a mix of a few strains for use as positive controls). This enables easy plate-level quality control.

4) We actively look for grid positions where colonies are missing due to pinning errors, throw a warning and set all neighbouring colonies to NA.
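Point 4 above can be sketched as follows. This is an illustration rather than pyphe's code: the function name, the min_size threshold, the use of None for NA and the choice of the eight surrounding positions as "neighbours" are all assumptions.

```python
import warnings


def mask_missing_grid_neighbours(sizes, grid_positions, min_size=1):
    """Return a copy of `sizes` in which the neighbours of every missing
    reference-grid colony are set to None (NA)."""
    n_rows, n_cols = len(sizes), len(sizes[0])
    masked = [row[:] for row in sizes]
    for r, c in grid_positions:
        value = sizes[r][c]
        if value is None or value < min_size:
            warnings.warn(f"Reference colony missing at row {r}, column {c}")
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    # Skip the grid position itself and positions off the plate.
                    if (dr or dc) and 0 <= rr < n_rows and 0 <= cc < n_cols:
                        masked[rr][cc] = None
    return masked


# Hypothetical 3x3 plate section with a failed reference colony in the centre.
plate = [[5, 5, 5],
         [5, 0, 5],
         [5, 5, 5]]
masked = mask_missing_grid_neighbours(plate, grid_positions=[(1, 1)])
```

Masking avoids normalising neighbouring colonies against a reference value that reflects a pinning error rather than growth.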

During testing of our normalisation strategy, we noticed the secondary edge effect (Figure 1—figure supplement 2D). This is essentially due to colonies in the second row/column being compared to colonies on the edge. This is an error which follows a clear row/column pattern and can easily be corrected with a row/column median normalisation. But this requires that most of the strains in each row/column show no phenotype (as is usually the case when working with knock-out collections, we specifically now point this out in Appendix 2). Generally, pyphe gives the user the choice of using both normalisations alone or sequentially (or none at all). We think that showing the requested direct comparison between row/column median normalisation and grid normalisation is not necessary, as the superiority of the latter has been well documented in the previous work by Zackrisson et al., 2016. As mentioned above, we would like to directly compare pyphe to scan-o-matics if we can make it work. But generally, we would not expect pyphe to outperform scan-o-matics in terms of an easily measurable parameter like CV, since our changes address other aspects. We have expanded our analysis of wild strain growth curves using an additional 8 conditions as requested, and show details of the normalisation procedure for each (Figure 2—figure supplement 1).
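A row/column median normalisation of the kind described above can be sketched as follows. Whether values are divided by or subtracted from the medians, and how NA values are handled, are assumptions here; the sketch divides and expects a complete plate of relative fitness values.

```python
from statistics import median


def rc_median_normalise(plate):
    """Divide each value by its row median, then by its column median."""
    by_row = [[v / median(row) for v in row] for row in plate]
    col_medians = [median(col) for col in zip(*by_row)]
    return [[v / m for v, m in zip(row, col_medians)] for row in by_row]


# Hypothetical grid-corrected plate section in which the second column is
# inflated by 20%, mimicking a secondary edge effect.
plate = [[1.0, 1.2, 1.0, 1.0],
         [1.0, 1.2, 1.0, 1.0],
         [1.0, 1.2, 1.0, 1.0]]
corrected = rc_median_normalise(plate)
```

The correction is only meaningful if most strains in each row and column show no phenotype, as noted above, because the median is then a good estimate of the positional bias.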

B) The authors do not show the effect of normalization on the growth curves, which I think should be done, e.g. in Figure 2A. Moreover, growth curves should be shown on a log-scale such that the exponential phase can be readily distinguished.

Growth curves are not normalised, only the extracted maximum slopes are. We now show heatmaps of maximum slopes before correction, after grid correction and after additional rcmedian correction for 8 conditions in Figure 2—figure supplement 1. We do not show growth curves in Figure 2 on a log scale because they do not show cell numbers and do not follow an exponential pattern.

C) From a more general perspective: the whole of Figure 2 is based on a single, ideal environment (2% glucose YES) with canonical growth curves, but the conclusions are generalized to the method as such. I am not convinced that this is sufficient. I believe that it is fair to ask for expansion such that also a broad range of non-canonical growth curves are covered – preferentially from environments where growth is limited by other factors than the local glucose concentration. For comparison, Zackrisson et al., 2016 considered six different environments.

We have now expanded this dataset considerably using 8 additional and diverse conditions (Figure 2—figure supplement 1). These have been selected specifically to challenge our hypothesis and to result in as diverse growth dynamics as possible. We use rich media with mixed carbon sources and salt stress. We also use 4 different nitrogen sources of different quality (where glucose is not limiting for growth in poor nitrogen conditions). The correlation between endpoint colony sizes and maximum slopes is >0.9 for all conditions, with a median of 0.947.

5) I really, really appreciate the intentions of the authors in trying to extend their set-up to viability staining, which has the potential to count dead cells and resolve birth and death rates. However, I am not convinced that they yet have succeeded in their intentions.

A) The authors do not really consider, or show, deaths over time and do not estimate death rates. Illuminating how death rates change as a function of growth and environment, or really just highlighting how often it is substantially above zero at different parts of the growth curve, could advance the field substantially, as non-negligible death rates have confounding effects on key microbiological properties (e.g. Frenoy et al., 2018). A time resolved view on death would much improve the paper and we need to see it.

Thank you for raising this point. We initially used colony redness as a readout for strain profiling, which can be obtained from the same plate with little extra investment. As such, we are more interested in how the readout can be used (e.g. as an input for clustering), which does not require mechanistic knowledge of how exactly this readout manifests itself. However, we agree that this is an interesting question and have now performed new timecourse experiments (new Figure 5). In summary, we find that redness scores are stable for at least 1 day after rapid growth has ended and that knock-out mutants with high final redness are already redder at earlier time points.

B) None of the method evaluation that is done for growth (Figure 2) is repeated for the viability staining. The reader has no real clue what precision and accuracy looks like over time, how errors are distributed across and within plates, in different environments, at different time points, how well the normalization works, how false positive rates compare to false negative rates etc. Since this is a method paper, I think this is an essential component.

In addition to the timecourse experiment described above, we have added Figure 2—figure supplement 1, which describes within- and between-plate variation and normalisation strategies specifically for redness data.

C) From Figure 4D, E, it seems there is a huge variation in the registered colour intensity that is unrelated to whether cells are dead and alive. For much of the dynamic range, the registered redness seems to only reflect noise? And the fraction of live cells in a colony, in this span, has no real quantitative interpretation. The only reliable distinction seems to be the qualitative separation of colonies with many and few dead cells. This drastically reduces the usefulness of the method?

We respectfully disagree with this interpretation of the data. It is surprising to us that a simple scan of the bottom of a whole colony can detect viability to such high accuracy (r=0.88). This simple approach has possible confounding effects, such as distribution of dead cells within the colony (dead cells in thicker parts or on top are less easy to detect). Considering that other methods to assess viability of cells in a colony would usually require picking and resuspension of that colony and assessment by CFU counting, flow cytometry or microscopy, our approach is a game-changer which enables obtaining viability estimates at unprecedented throughput. We discuss caveats in the Discussion and are currently working on solutions to improve the sensitivity of this readout, using other imaging and quantification strategies. However, despite the imperfect correlation with the fraction of viable cells obtained by flow cytometry, colony viability scores show lower CVs and lower FUVs than colony sizes in our hands, making them a reliable and biologically meaningful readout for strain profiling and other functional genomics approaches.

“The only reliable distinction seems to be the qualitative separation of colonies with many and few dead cells.”

Both variables on the x- and y-axis cover the range of intermediate values so no strict binary grouping is apparent to us. We have further checked whether the data is compatible with the reviewer’s interpretation by dividing them into two groups (low and high viability based on FACS) and have computed the correlation to colony redness scores for both separately. Within both groups, the two readouts are still correlated (albeit to a lower extent; Author response image 5). We now mention this grouping in the main text.

Author response image 5
Subgroup analysis of colony staining.

We divided the data into two groups and computed the correlation separately. Both groups still show clear correlation (0.41 and -0.33) which is incompatible with the claim that the method allows a binary classification only. However, the within-group correlation is substantially lower than the overall correlation. Regardless, redness scores in themselves are highly reproducible and precise and therefore present an attractive fitness readout, the use of which does not require a detailed mechanistic understanding.
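The subgroup analysis can be sketched as follows. The viability fractions, redness scores and threshold are hypothetical values chosen for illustration and are not the data shown in Author response image 5.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def subgroup_correlations(viability, redness, threshold):
    """Correlate redness with viability separately below and above a threshold."""
    pairs = list(zip(viability, redness))
    low = [p for p in pairs if p[0] < threshold]
    high = [p for p in pairs if p[0] >= threshold]
    return tuple(pearson(*zip(*group)) for group in (low, high))


# Hypothetical fractions of live cells and colony redness scores.
viability = [0.1, 0.2, 0.3, 0.4, 0.7, 0.8, 0.9, 1.0]
redness = [1.9, 1.8, 1.85, 1.6, 1.2, 1.25, 1.1, 1.0]

r_all = pearson(viability, redness)
r_low, r_high = subgroup_correlations(viability, redness, threshold=0.5)
```

In this toy example both within-group correlations remain clearly non-zero but are weaker than the overall correlation, which is the pattern expected when a readout is quantitative rather than purely binary.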

D) If I understand Figure 4C-E right, there seems to be a strong confounding effect of lysed cells when considering Pyphe colony redness: both lysed and alive cells reduce the redness. Hence, colonies with a high fraction of alive cells and colonies with a high fraction of lysed cells can show similar redness? In harsh environments, where a high fraction of cells are first killed and then lysed I imagine that results therefore will be very hard to interpret? This must be a serious shortcoming as compared to e.g. flow cytometry where lysed and alive cells seems to be well separated?

Indeed, in the flow cytometer, the redness is dead>alive>lysed, with three distinct populations visible. If this was similar in colonies, this would indeed be a problem. However, we propose that the redness in colonies is dead=lysed>alive. Dead cells stain bright red because phloxine B passively enters the cell and is not pumped out as it is in live cells. Lysed cells also cannot pump out the dye so they will be stained in the colony. When running samples on the flow cytometer, cells are resuspended in a buffer which quickly washes the dye out of the lysed cells. For these reasons, the colony redness largely reflects the number of dead and lysed cells. This conclusion is supported by the good correlation we observe between colony redness and the fraction of live cells [alive/(alive+lysed+dead)] by FACS. We also show in Author response image 6 that the correlation of colony redness to (alive+lysed)/(alive+lysed+dead) is weaker, suggesting that lysed cells do contribute to colony redness. We have now explained this issue more clearly in the manuscript text and added Figure 4—figure supplement 2.
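To make the two ratios concrete, the following minimal sketch computes both from flow cytometry event counts. The function names and the counts are hypothetical and are not taken from pyphe or from the study.

```python
def live_fraction(alive, lysed, dead):
    """alive / (alive + lysed + dead): lysed cells are counted as non-viable."""
    return alive / (alive + lysed + dead)


def unstained_fraction(alive, lysed, dead):
    """(alive + lysed) / (alive + lysed + dead): lysed cells are grouped with
    live cells, as they appear in the cytometer once the dye is washed out."""
    return (alive + lysed) / (alive + lysed + dead)


# Hypothetical event counts for one colony, NOT data from the study.
counts = {"alive": 600, "lysed": 200, "dead": 200}

print(live_fraction(**counts))       # 0.6
print(unstained_fraction(**counts))  # 0.8
```

The two ratios differ only in where lysed cells are placed, so comparing how well each one tracks colony redness indicates whether lysed cells stain in the colony.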

Author response image 6
The fraction of live cells (neither burst nor strongly stained in flow cytometer, left panel) better explains the colony redness score than the fraction of strongly stained cells only (right panel).

This suggests that burst cells do contribute to staining in the colony (while being unstained in the flow cytometer). Note that the correlation breaks down for colonies with higher redness scores (which have more burst cells).

E) The authors' general conclusion, that there is no overlap between the fraction of dead cells and colony growth, is conceptually very troubling. How can this not be the case, if they really capture population size growth and dead cells respectively? Surely, dead cells do not reproduce: growth, as a consequence, must slow? Or? For example, from Figure 3A, it seems that many colonies grow at a normal rate (i.e. reach an intermediate size), even though only 25% of cells are alive (i.e. the colony redness is 1.25). If, as stated, the detected death is completely disconnected from the detected population size growth, something is fundamentally strange with the detection.

We agree that these are interesting considerations. The simplest explanation is that growth and death are temporally uncoupled. While this does not seem to be the case for the knock-out mutants we investigated it might be the case in other scenarios, e.g. when working with wild strains. Similarly, they could be spatially decoupled. As not all cells in the colony are actively dividing, especially during later growth (Meunier and Choder, 1999) (and most likely in stress conditions), a subset of cells can die without overall colony growth being affected. This is supported by the uneven distribution of redness within the colony (which we currently do not capture with pyphe). Furthermore, colonies could sustain normal growth if viability is sacrificed for growth rate (akin to cells going into ‘overdrive’) (Nakaoka and Wakamoto, 2017). Which of these explanations is true will depend on the strains, conditions and incubation times and they can, of course, occur in combination. We have improved the Discussion based on the above points. We have several ongoing projects investigating these questions (with wild strains and mutants) and believe that pyphe, as it is currently presented, is well suited to give users the tools to explore these questions. Note that we have not calibrated redness scores to absolute viabilities. A redness score of 1.25 therefore does not mean that 25% of the cells are dead.

F) From Figure 3B it is clear that there often is a growth impact of phloxine B and that it depends heavily on the environment. One also wonders how genotype-dependent the impact of phloxine B on growth is? If phloxine B has a large impact on growth in many environments and on many genotypes – isn't there a serious risk for confounding effects when measuring both in parallel?

We do not agree with this interpretation of Figure 3B (and D). We observe that otherwise identical conditions with and without phloxine cluster very closely together, meaning that the observed patterns in the data are clearly dominated by the condition (and not by whether phloxine is used). Figure 3D further shows that the correlation between conditions with and without phloxine is consistently very strong (as strong as repeats of identical conditions in different batches). We already point out the caveat that this does not prove that phloxine has no effect for a few gene-condition pairs or for other conditions not tested.

Furthermore, one should consider that a general genotype-dependent impact of phloxine is conceptually not a problem. If phloxine is included in the assay condition and the control condition (as should be done), the genotype-dependent effect of phloxine will be normalised out. Conceptually, the addition of phloxine is just like adding/changing any other component in the media. This can and often will affect growth but is not a problem as long as the media used for the same experiment is the same.
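The cancellation argument can be illustrated with a toy calculation. This sketch assumes that fitness is expressed as a ratio of assay to control colony size and that phloxine acts as a multiplicative, genotype-specific factor. Both are simplifying assumptions made for illustration, and all values are hypothetical.

```python
def relative_fitness(size_assay, size_control):
    """Colony size in the assay condition relative to the control condition."""
    return size_assay / size_control


# Hypothetical colony sizes for one strain, NOT data from the study.
size_control = 100.0
size_assay = 60.0
phloxine_factor = 0.9  # assumed genotype-specific effect of the dye

# No dye in either condition.
without_dye = relative_fitness(size_assay, size_control)

# Dye in both conditions: the factor appears in numerator and denominator.
with_dye_both = relative_fitness(
    size_assay * phloxine_factor, size_control * phloxine_factor
)

# Dye in the assay condition only: the factor no longer cancels.
with_dye_assay_only = relative_fitness(size_assay * phloxine_factor, size_control)

print(without_dye, with_dye_both, with_dye_assay_only)
```

Under these assumptions the dye effect drops out only when assay and control media match, which is why the control should contain phloxine whenever the assay does.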

We have now conducted an additional statistical analysis, testing for strain-condition pairs which have different fitness in the same condition with and without phloxine. We have an extremely high number of replicates for rich and minimal media control conditions with and without phloxine as they were repeated across many batches. We find one mutant, trehalose-6-phosphate phosphatase Tpp1, which has lower fitness in media with phloxine than without. We now report this hit in the main text, also noting that the statistical non-rejection of the null hypothesis (that phloxine B has no effect) cannot be used as an indication that the null hypothesis is true.
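The response does not state which statistical test was used. Purely as a generic sketch of how one strain could be compared between media with and without the dye, the following two-sided permutation test on the difference in means uses only the Python standard library. The function name, the parameters and the fitness values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical replicate fitness values for one strain, NOT data from the study.
with_phloxine = [0.80, 0.82, 0.78, 0.81, 0.79, 0.83]
without_phloxine = [1.00, 0.98, 1.02, 0.99, 1.01, 0.97]

p_value = permutation_test(with_phloxine, without_phloxine)
print(f"p = {p_value:.4f}")
```

When many strain-condition pairs are screened this way, the resulting p-values would normally need a multiple-testing correction, and a large p-value does not show that the dye has no effect.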

Reviewer #3:

The current manuscript describes a comprehensive pipeline that is a wrapper around already available tools and implements already described approaches (e.g. grid normalisation) and which can be used to analyse imaging data collected using flatbed scanners for high-throughput fitness screens. While the paper is very clear and well written and the code deposited in a public repository appears to be well crafted and documented, I am unsure there is enough novelty in this tool or in the experimental validation reported in the current manuscript to be of interest for the general readership of eLife. If I understand this correctly, there aren't any critical steps implemented in this pipeline which had not been reported or implemented before, which makes me think that a journal where this kind of tool is reported might be a better home for the current manuscript. Unfortunately I lack the expertise in the specific area of high-throughput phenotypic screens to be able to judge whether the substantial work presented here constitutes a technical improvement and a practical tool that might be widely used or rather a step change in the field. Without more compelling arguments in favour of the latter and without a clear indication of a novel approach rather than implementation of already described tools and techniques I cannot fully support this manuscript for publication in eLife.

Many earlier sections in this response letter contain detailed comparisons of our strategy against others and several other points specifically deal with the question of novelty. We are confident that these make a convincing case for pyphe.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

All reviewers agree that you and your co-authors have responded adequately to the concerns that were raised and that the paper has matured significantly. That said, we would still recommend addressing two specific points in a bit more detail in the paper, so that the readers are at the very least made aware that these might be potential concerns.

Firstly, we think it is important for growth measures to be as good proxies for population size as possible. Accounting for the non-linearity of optical and cell density is important, even if this is often ignored (because the difference is not huge, or because of technical issues).

We now state: "Image pixel darkness is known to scale non-linearly with true colony thickness/cell number (Zackrisson et al., 2016). Fitness estimates reported by pyphe-analyse are therefore related but not strictly the same as cell counts. If absolute population sizes are required for an experiment, the Scan-o-matic pipeline offers suitable calibration functionalities (Zackrisson et al., 2016)."

Second, there still is some concern about the lack of correlation between measured death and the measured fitness proxy – one reviewer is not sure that this is not due to one or both measures being afflicted by a large error. We understand that this is not easily solved, but believe it is fair to mention it explicitly in the paper as a potential concern.

In the Conclusions we now state: "Explaining the observed disparity between redness and size data should be a priority for future research and the explanation may depend on the strains, conditions, incubation times, or technical factors (or combinations thereof)."

https://doi.org/10.7554/eLife.55160.sa2

Article and author information

Author details

  1. Stephan Kamrad

    1. University College London, Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, London, United Kingdom
    2. The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, United Kingdom
    Contribution
    Conceptualization, Resources, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5957-4661
  2. María Rodríguez-López

    University College London, Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, London, United Kingdom
    Contribution
    Conceptualization, Resources, Validation, Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-2066-0589
  3. Cristina Cotobal

    University College London, Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, London, United Kingdom
    Contribution
    Resources, Validation, Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-5877-2228
  4. Clara Correia-Melo

    The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, United Kingdom
    Contribution
    Investigation, Methodology
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-6062-1472
  5. Markus Ralser

    1. The Francis Crick Institute, Molecular Biology of Metabolism Laboratory, London, United Kingdom
    2. Charité Universitaetsmedizin Berlin, Department of Biochemistry, Berlin, Germany
    Contribution
    Conceptualization, Supervision, Funding acquisition, Project administration, Writing - review and editing
    For correspondence
    markus.ralser@crick.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-9535-7413
  6. Jürg Bähler

    University College London, Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, London, United Kingdom
    Contribution
    Conceptualization, Supervision, Funding acquisition, Methodology, Project administration, Writing - review and editing
    For correspondence
    j.bahler@ucl.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4036-1532

Funding

Wellcome (095598/Z/11/Z)

  • Stephan Kamrad
  • María Rodríguez-López
  • Cristina Cotobal
  • Jürg Bähler

Wellcome (200829/Z/16/Z)

  • Stephan Kamrad
  • Clara Correia-Melo
  • Markus Ralser

Biotechnology and Biological Sciences Research Council (BB/R009597/1)

  • María Rodríguez-López
  • Jürg Bähler

Medical Research Council (Francis Crick Institute FC001134)

  • Stephan Kamrad
  • Clara Correia-Melo
  • Markus Ralser

Wellcome Trust (Francis Crick Institute FC001134)

  • Stephan Kamrad
  • Clara Correia-Melo
  • Markus Ralser

Cancer Research UK (Francis Crick Institute FC001134)

  • Stephan Kamrad
  • Clara Correia-Melo
  • Markus Ralser

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

Mimoza Hoti helped with the wet lab part of phenotyping the 238 knock-out mutants. This research was funded by Wellcome Trust Senior Investigator Awards to JB [grant number 095598/Z/11/Z] and to MR [grant number 200829/Z/16/Z], as well as a BBSRC Project Grant to JB [grant number BB/R009597/1]. This work was also supported by the Francis Crick Institute which receives its core funding from Cancer Research UK (FC001134), the UK Medical Research Council (FC001134) and the Wellcome Trust (FC001134).

Senior Editor

  1. Aleksandra M Walczak, École Normale Supérieure, France

Reviewing Editor

  1. Kevin J Verstrepen, VIB-KU Leuven Center for Microbiology, Belgium

Reviewers

  1. Kevin J Verstrepen, VIB-KU Leuven Center for Microbiology, Belgium
  2. Jonas Warringer, University of Gothenburg, Sweden

Publication history

  1. Received: January 14, 2020
  2. Accepted: May 21, 2020
  3. Version of Record published: June 16, 2020 (version 1)

Copyright

© 2020, Kamrad et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
