1. Cell Biology
  2. Computational and Systems Biology
Download icon

Integrating images from multiple microscopy screens reveals diverse patterns of change in the subcellular localization of proteins

Research Article
  • Cited 0
  • Views 1,712
  • Annotations
Cite as: eLife 2018;7:e31872 doi: 10.7554/eLife.31872

Abstract

The evaluation of protein localization changes on a systematic level is a powerful tool for understanding how cells respond to environmental, chemical, or genetic perturbations. To date, work in understanding these proteomic responses through high-throughput imaging has catalogued localization changes independently for each perturbation. To distinguish changes that are targeted responses to the specific perturbation or more generalized programs, we developed a scalable approach to visualize the localization behavior of proteins across multiple experiments as a quantitative pattern. By applying this approach to 24 experimental screens consisting of nearly 400,000 images, we differentiated specific responses from more generalized ones, discovered nuance in the localization behavior of stress-responsive proteins, and formed hypotheses by clustering proteins that have similar patterns. Previous approaches aim to capture all localization changes for a single screen as accurately as possible, whereas our work aims to integrate large amounts of imaging data to find unexpected new cell biology.

https://doi.org/10.7554/eLife.31872.001

eLife digest

Cells are busy structures in which proteins move from one place to the other to perform their roles. To survive, organisms must constantly adapt to changes, such as mutations, new environments, or encountering drugs. Often, these adaptive responses involve groups of proteins traveling to new locations in the cell. Tracking these movements is useful to understand which biological processes cells activate to cope with modifications.

To study these mechanisms, scientists conduct experiments where they expose cells to one type of change – for example they mutate one gene, or give one drug. They then follow the proteins in the cell using powerful microscopes.

These instruments can take thousands of pictures of cells every day, which results in large amounts of data. The images are then analyzed, which is often subjective, time-consuming or cannot easily be repeated on other experiments. This prevents researchers from considering more than one experiment at the time, and it makes it difficult to compare how cells respond across many different types of changes.

Here, Lu et al. combine and analyze 400,000 images from 24 experiments conducted on yeasts; they use a computational method that automatically measures the differences in the locations of the proteins between experiments. This new large-scale approach reveals aspects of cell biology that are not obvious from the results of any one experiment alone. For example, responses that are specific to one type of change can be distinguished from the ones that occur each time cells are exposed to new conditions. It also becomes possible, for the first time, to group together proteins that move in the same way across different types of changes, and to infer their roles. For instance, a protein was shown to be ‘pulsing’ – moving quickly between two specific compartments in the cell – based on how it shared certain features with other pulsing proteins.

This computational approach is not limited to experiments on yeasts, and could be used on studies in human cells as well. Drug companies could then compare at a large scale how different treatments affect protein localization.

https://doi.org/10.7554/eLife.31872.002

Introduction

The ability to control the subcellular localization of proteins has long been understood as an important component of a cell’s regulatory toolkit in response to perturbations (Cyert, 2001; Bauer et al., 2015; Protter and Parker, 2016), such as drug treatments, genetic mutations, or environmental stressors. Towards the goal of systematically characterizing these proteome dynamics, high-throughput technologies have been employed to gather data about protein localization in cells (Yuet and Tirrell, 2014; Dephoure and Gygi, 2012; Nagaraj et al., 2012). Here, we focus on the analysis of images from screens of libraries of yeast strains expressing green fluorescent protein (GFP)-tagged proteins generated using automated high-throughput microscopy (Mattiazzi Usaj et al., 2016; Caicedo et al., 2016). These experiments yield terabyte-scale image datasets (Koh et al., 2015; Riffle and Davis, 2010; Breker et al., 2014) that show changes in the subcellular localization of the proteome in response to varied perturbations (Chong et al., 2015; Tkach et al., 2012; Kraus et al., 2017; Breker et al., 2013).

The growing quantity and diversity of imaging experiments examining protein localization in yeast open the possibility of integrating proteomic information from different image screens. Integrative strategies are widely appreciated for the analysis of high-throughput transcript profiles, such as RNA-seq or microarray data, providing mutual context and hypothesis discovery that would have not been evident in independent analyses of the datasets (reviewed in Dolinski and Troyanskaya, 2015). One well-known example is the analysis of the environmental stress response in yeast; by combining the results of 142 different microarray experiments, Gasch et al. (2000) identified a large set of genes whose transcript levels changed under most environmental stress perturbations as part of a general program. However, transcriptome profiling experiments capture only part of the overall biomolecular activity in a cell. To extend these highly successful strategies for genomic data to a wider perspective, researchers are also interested in integrating information about the proteome. Protein interaction (Greene et al., 2015) and protein expression data (Cenik et al., 2015) have been integrated to reveal insights about tissue- and individual-specific regulatory variation in humans, but the integration of data about subcellular protein localization changes is rare despite the important regulatory role of these changes. Although protein localizations can be observed from fluorescent cell micrographs and have been catalogued qualitatively in databases such as the Human Cell Atlas (Thul et al., 2017) and Uniprot (UniProt Consortium, 2015), the automated comparison of localization changes from these images is challenging.

To integrate localization patterns from multiple screens, scalable data acquisition and analysis methods that produce consistent and quantitative summaries of protein localization changes are required. Using automated fluorescence microscopy, images can be rapidly acquired from many different yeast cell cultures grown in defined growth conditions (Chong et al., 2015; Tkach et al., 2012; Kraus et al., 2017). However, to date, analysis methods based on visual inspection or supervised machine learning have been applied one screen at a time (Chong et al., 2015; Tkach et al., 2012; Kraus et al., 2017). Although these approaches can discover differences in the abundance and localization of proteins in response to perturbations, they have not been applied in integrative analyses: visual inspection is not scalable, and supervised machine learning usually requires at least some retraining for application to new datasets (although recent work endeavours to reduce this [Kraus et al., 2017]). On the other hand, unsupervised analysis of protein localization is expected to be scalable and does not require retraining. Indeed, unsupervised cluster analysis was applied to spatial mass spectrometry data following a single drug treatment to capture changes in relative distribution between the cytoplasm, nucleus, and nucleolus (Boisvert et al., 2010).

Recently, we developed an unsupervised localization change detection algorithm (Lu and Moses, 2016) that requires no training of parameters on each screen, easily scales to large image collections, and describes relative patterns of change quantitatively rather than relying on predefined classes, and is therefore applicable to all subcellular localization patterns. Here, we use this method to analyze datasets generated by the large-scale analysis of the yeast GFP collection (Huh et al., 2003). We exploit the scalability of this method to perform an integrative analysis that combines information about protein localization changes from multiple datasets. We reasoned that this analysis would provide contextual information that would allow us to determine whether a protein localization change is specific to a perturbation or might reflect a more general response, leading to a better understanding of how protein localization changes allow cells to adapt to a wide range of genetic and environmental situations. Using this context, we demonstrate that we can infer novel biology from our analysis.

We present, to our knowledge, the first quantitative integrative cluster analysis of protein localization changes. We show that our data acquisition and analysis methods can scale to analyses of hundreds of thousands of images. We find that protein localization changes are varied in their degree of specialization: some occur only in one perturbation tested, whereas others appear to be more general stress responses. Moreover, we find functionally specific themes for many clusters of protein localization change profiles, consistent with the known biological effects of the perturbations. We show that although some protein localization changes are accompanied by transcriptional or protein abundance changes, subcellular localization is in general an independent layer of regulation, and that shared patterns of localization changes cannot be explained simply by physical interactions or subcellular compartments.

Results

An unsupervised analysis of protein localization changes in more than 280,000 images

In this work, we sought to understand protein localization changes across a wide range of perturbations. To achieve this, we made use of datasets that are available in the CYCLoPs database (Koh et al., 2015): 15 image-based screens of the yeast GFP-fusion collection (Huh et al., 2003). These images show protein localization in untreated wild-type cells (three replicates), in an rpd3Δ deletion strain (three replicates), and three time-points each of wild-type cells subjected to rapamycin (RAP), hydroxyurea (HU), and α-factor (αF) treatment (Chong et al., 2015; Kraus et al., 2017). We also included data from two independent screens of the GFP-fusion collection in strains deleted for IKI3, a component of the elongator complex and a histone acetyltransferase that functions as part of the RNA polymerase II holoenzyme (Wittschieben et al., 1999). Altogether, this dataset encompasses 4143 individual yeast GFP-fusion proteins for a total of 281,724 images and a total of 15.5 million cells used.

Figure 1 summarizes our method for representing protein localization changes across image screens. First, we start with two sets of micrographs of yeast cells, one of cells under untreated wild-type conditions and another under perturbation, where the protein of interest has been tagged with GFP, and where a cytosolic RFP has been expressed to facilitate cell identification. These images have been segmented into single cells (Figure 1(i)). We bin single cells into five bins by their stage in the cell cycle (Figure 1(ii)), and then by whether the cell is a mother or a bud cell (Figure 1(iii)), resulting in 10 independent bins in total; this binning strategy allows us to determine whether protein localization changes are dependent on cell cycle or type.

Figure 1 with 1 supplement see all
Summary of the methods used in this work.

In step (i), we segment micrographs into single cells. We divide single cells into 10 bins in step (ii), first into five bins by cell cycle stage, and then by mother or bud cell type. We calculate five features for all single cells in step (iii), and report the truncated average of each bin for each feature to form the protein localization profile (see Materials and methods). To compare the untreated wild-type protein localization profile to the perturbation protein localization profile, we apply an unsupervised localization change detection algorithm in step (iv), which outputs a protein localization change profile comprised of a z-score for each feature in the protein localization profile, reporting how significant the difference between the wild-type and perturbation is relative to expectation for each feature. Finally, we concatenate different protein localization change profiles with each other in step (v) to integrate profiles from different image screens. Steps (i) to (iv) are previously published methods, and are described in the Materials and methods section. We represent profiles as heat maps in this work (see the Figure 2 legend for details).

https://doi.org/10.7554/eLife.31872.003
Figure 2 with 3 supplements see all
A clustered heat map of protein change profiles.

The right panel shows the 1159 protein change profiles with the strongest values, grouped by similarity. In this visualization, rows represent the protein change profiles for individual proteins, whereas columns represent feature measurements, with each set of 50 measurements corresponding to an image screen. Positive values are blue, whereas negative values are yellow, with the intensity of color corresponding to the magnitude of the value, as shown in the color bar. Grey values are missing data. The headers indicate the image screens that sets of features correspond to, abbreviated using WT to indicate the untreated wild-type screens, ΔRPD3 for the rpd3Δ replicates, RAP for time points of the rapamycin treatment, HU for time points of the hydroxyurea treatment, αF for time points of the α-factor treatment, and ΔIKI for the iki3Δ replicates. The dendrogram depth shows similarity between connected protein profiles or groups of profiles. We highlight examples of strong patterns of protein change profiles in yellow, with some clusters that we have annotations for labelled from A to T, with labels and enrichments for some clusters presented in Table 1. In the four boxes on the left, we show examples of localization changes found in our clusters of protein change profiles. The images are representative cropped micrographs of yeast cells, where the protein named in the top left corner of each box has been tagged with GFP (shown as the green channel). The blue lines in the images show the boundaries drawn between cells by our single-cell segmentation algorithm, the small white circles between cells indicate mother-bud relations, and the white meshed regions indicate areas that have been ignored by our image analysis because they are likely to be artifacts or mis-segmented cells.

https://doi.org/10.7554/eLife.31872.005
Table 1
Annotations for some labelled clusters presented in Figure 1.

The letter in the cluster column corresponds to the letters assigned to clusters in Figure 2. Selected significant gene ontology enrichments are listed with their respective GO accession numbers and p-values, as well as some arbitrarily chosen examples of proteins. Clusters that are not associated with a GO accession number are manual annotations. Under the ‘Proteins with annotation’ column, we report the number of proteins with the GO annotation versus the total number of proteins in that cluster.

https://doi.org/10.7554/eLife.31872.010
ClusterPerturbation(s)AnnotationP-valueProteins with annotationExamples
Aα-factor, iki3ΔCytoskeleton-dependent cytokinesis [GO:0061640]3.77E–078/12Bud3, Bud4, Cdc10, Cdc11, Hof1, Myo1
Cell cycle
[GO:0007049]
9.21E–0511/12
DHydroxyureaCell cycleCdc28, Cdc7, Clb2
ERapamycin, hydroxyureaRNA metabolic process [GO:0016070]6.70E–0415/18Msn2, Dot6, Rtg1, Rtg3, Enp1, Tsr1, Stb3, Utp20
Stress-responsive transcription factorsMsn2, Dot6, Rtg1, Rtg3
FHydroxyurea, α-factorExtrinsic component of vacuolar membrane [GO:0000306]0.0288233/19Iml1, Vac14, Tco89
DNA replicationPol12, Mcm7, Ctf18
Grpd3Δ, hydroxyureaTriglyceride lipase activity [GO:0004806]0.0292392/6Tgl4, Tgl5
Metabolic activityTgl4, Tgl5, Rnr3, Gdb1
HRapamycinNegative regulator of hydrolase activity [GO:0051346]2.70E-044/9Stf1, Inh1, Yhr138c, Pbi2
Enzyme inhibitor activity [GO:0004857]0.0014414/9
Jrpd3ΔAnnotated by Chong et al. (2015) to localize away from cytoplasm in rpd3ΔAcs1, Rad7, Mni1, Yjr008w, Ycr061w, Pab1, Pus4, Yor292c
Liki3ΔRibosomal subunits--Rps9a, Rpl13b, Rps0b
MRapamycinCytosolic ribosome [GO:0022626]0.0108794/5Rpl23a, Rpl40a, Rpl40b, Rps10a
Ribosomal subunits
Qα-factorGolgi vesicle transport0.0054355/7Apl2, Avl9, Gyl1, Bch1, Gyp5
Cellular bud tip9.28E–065/7Rgd2, Yhr182w, Avl9 Gyl1, Gyp5
Rrpd3Δ, rapamycin, iki3ΔVacuole [GO:0005773]9.39E–057/7Bap2, Itr1, Hxt2, Aqr1, Dip5
Ion transmembrane transporter activity [GO:0015075]0.0205855/7
SRapamycin, hydroxyureaBounding membrane of organelle [GO:0098588]0.0277618/12Ams1, Kch1, Ymd8, Pho91, Mnn2, Och1, Kre2, Gnt1
Protein glycosylation [GO:0006486]0.0207864/12Mnn2, Och1, Kre2, Gnt1
Tα-factorCellular response to pheromone [GO:0071444]5.37E–045/9Fus1, Kar4, Ste2, Aga2, Crz1

For each single cell, we measure a set of five interpretable features (Figure 1[iv]) that describe the average distribution of GFP-tagged proteins relative to certain cell landmarks such as the cell center or cell edge (Handfield et al., 2013). We average these measurements for single cells within each of the ten bins described above, resulting in a 50-feature vector that we term the ‘protein localization profile’. As the name suggests, the protein localization profile represents the average localization of the GFP-tagged protein inside of a cell for a given image screen.

Previously, we have shown that these protein localization profiles are not directly comparable from image screen to image screens because of the presence of ‘global effects’, or systematic biases in image screens (Lu and Moses, 2016). To correct for this, we use an unsupervised localization change detection method (Lu and Moses, 2016) that compares the difference between a pair of protein localization profiles to a local expectation of the global effect for each protein (Figure 1[v]). The unsupervised localization change detection method converts a pair of protein localization profiles into a ‘protein localization change profile’ (or more simply a ‘protein change profile’), which represents the expectation of protein localization change for a protein between two image screens. In this work, the protein localization change profile is a vector of 50 z-scores, each corresponding to a feature in the protein localization profile and describing the direction and magnitude by which each feature deviates from the expectation of there being no localization change.

We apply this method to all proteins and image screens. For the work that we present here, we compared all other image screen to an untreated wild-type screen that we designated as our reference untreated wild-type (labelled as WT2 in figures, corresponding to its database label in CYCLoPS [Koh et al., 2015]). For each protein, we concatenated protein localization change profiles to integrate results from different image screens (Figure 1 [vi]). We found that the unsupervised localization change detection method predicts reproducible localization changes between replicate image screens: the magnitude of the protein change profile is strongly correlated between replicates of the same perturbation (Figure 1—figure supplement 1A and B), but weakly correlated between different perturbations (Figure 1—figure supplement 1C and D).

Finally, we grouped proteins by clustering our concatenated protein change profiles. The heat map in Figure 2 shows clusters formed by the 1159 profiles with strong signals in at least three features. We found that the concatenated vectors are readily interpretable. Because the protein localization change profile is a relative measure, the profiles generally have large values for image screens in which the protein changes localization, and low values for those in which it does not.

We manually evaluated images for localization changes predicted by our protein change profiles for five distinct clusters and recorded the proportion of proteins with localization changes visible to a human evaluator (Supplementary file 1). In general, we observed that clusters that had strong signals in their protein change profiles had a higher ratio of visible localization changes; Clusters E and J in Figure 2 have an average protein change profile magnitude of 33.5 and 34.5 for the perturbations that are predicted to change their localization, and contain 9/17 and 18/25 visible localization changes, respectively. Proteins that did not have visible localization changes within these clusters tended to have weaker signals (proteins without an asterisk in Figure 2—figure supplement 1). By contrast, clusters with more moderate signals in their protein change profiles contain fewer localization changes that are visible to the human eye, or more ambiguous or borderline protein localization changes. Cluster F in Figure 2 has an average protein change profile magnitude of 14.3, and contains 2/19 visible localization changes, with a further 9/19 localization changes being too ambiguous to determine. The localization changes within this cluster were visually subtle, involving changes from a round nuclear phenotype in the untreated wild-type to an irregular ‘fluffy’ nuclear phenotype in the perturbation (Tma16 and Ysh1 in Figure 2—figure supplement 2B).

Clustering of multi-perturbation z-score vectors reveals shared and specific localization changes that are supported by the literature

To investigate the hypothesis that our clusters of protein localization change profiles contain proteins that are linked functionally, we evaluated clusters that showed strong signals for perturbations. We ignored clusters that had strong signals in the other untreated wild-type screens (WT3 and WT1 in Figure 2), as these proteins were predicted to change localization between untreated wild-type screens. These localization changes can occur for a variety of reasons, including inherent protein variability (Handfield et al., 2015), day-to-day variation, or strain-specific variation; because localization changes for these proteins may not be specifically caused by the perturbation, we exclude them from our analysis. Table 1 shows GO ontology enrichments for clusters from Figure 2. In this section, we provide examples in which our clustering reveals known biology.

We reasoned that the rapamycin and hydroxyurea treatment profiles would share localization changes, because both drugs induce yeast stress responses (Loewith and Hall, 2011; Koç et al., 2004). We found a cluster of protein change profiles (Figure 2E, Table 1E) that predicted localization changes in both drug treatments. Within this cluster, we found stress-responsive proteins such as Msn2, a master regulator of the yeast general environmental stress response (Gasch et al., 2000). In addition, this cluster also contained proteins that are involved in ribosomal biogenesis and rRNA processing, consistent with previous work showing repression of ribosomal subunit gene transcription following the environmental stress response (Gasch et al., 2000). We observed some subtle differences in the protein change profiles of particular proteins in this cluster; for example, while the relocalization of Msn2 was shared between the rapamycin and hydroxyurea perturbations, the relocalization of Tsr1, a protein predicted to move from the cytoplasm to the nucleolus following stress (Lee et al., 2014), was shared between the rpd3Δ, rapamycin, and hydroxyurea perturbations. In another example, Msn2 was predicted to differ in localization relative to the reference untreated wild-type in all three time points of hydroxyurea treatment but in only the first time point following growth in the presence of rapamycin. By contrast, Stb3, a repressor of growth in response to stress that is known to be regulated by relocalization between the nucleus and cytoplasm (Liko et al., 2010), was predicted to differ in all time points of both perturbations.

Not all stress-related proteins clustered together; for example, Lap4, a hydrolase that relocates from the cytoplasm to the vacuole under starvation conditions (Suzuki et al., 2002), was predicted to relocalize in the rapamycin and rpd3Δ perturbations, but not in the hydroxyurea perturbation. For this reason, Lap4 clustered with other proteins that were predicted to relocalize in the rapamycin and rpd3Δ perturbations instead (Figure 2R, Table 1R). Our profile analysis of Lap4 is consistent with the image data. In the untreated wild-type, the protein is weakly cytoplasmic, with a small fraction of cells showing the protein in cytoplasmic foci. Under rapamycin and rpd3Δ perturbation, the protein is localized to the vacuole with a visually noticeable increase in the frequency of cells that have cytoplasmic foci (see Lap4 images in Figure 1).

We also found that sets of cell-cycle proteins could be distinguished by their protein change profiles. Treatment of yeast cells with hydroxyurea or α-factor induces cell cycle arrest during the S- and G1-phases, respectively (Day et al., 2004; Udden and Finkelstein, 1978). We found that some proteins involved in DNA replication (Pol12, Mcm7) and in the regulation of transcription at the G1/S transition (Swi6) had strong patterns of signals in their protein change profiles for both the hydroxyurea and α-factor perturbations (Figure 2F, Table 1F). The Mcm complex, including Mcm7 which we show as an example in Figure 1, is known to relocalize according to cell-cycle stage (Nguyen et al., 2000). Consistent with this prior knowledge and the predictions made by our protein change profiles, images of Mcm7 in the untreated wild-type cells showed a heterogeneous mixture of cytoplasmic or nuclear-localized protein. In both hydroxyurea and α-factor perturbations, the localization was almost entirely nuclear, with the distribution of the nuclear-localized protein becoming irregular compared to the circular untreated wild-type nuclear distribution. A different cluster of proteins with strong patterns of signals for the α-factor perturbation also includes cell cycle proteins (Figure 2A, Table 1A); instead of being shared with the hydroxyurea perturbations, some of these localization changes are predicted to occur under iki3Δ perturbation too, consistent with the role of iki3Δ in regulating sensitivity to G1 cell-cycle arrest (Butler et al., 1994).

In addition to clusters that have patterns that are shared between multiple perturbations, we also found clusters with patterns of strong signals that are specific to only one perturbation. α-factor is a pheromone that induces the yeast mating response. We find a cluster of proteins involved in the mating response (Figure 2T, Table 1T), all of which have a protein change profile that predicts a relocalization specific to α-factor treatment, such as changes in the localization of the transcription factor Kar4, which is induced as a regulator of downstream events in the mating response (Lahav et al., 2007), and of the transcription factor Crz1, which has been shown to relocate from the cytoplasm to the nucleus during bursts of calcium induced by the mating pheromone (Carbó et al., 2017). Images for Crz1 in the untreated wild-type versus the α-factor screens (shown in Figure 1) confirm that although the protein is predominantly cytoplasmic in the untreated wild-type, a proportion of the cells under α-factor perturbation have Crz1 that is also localized to the nucleus. As another example of a cluster with profiles that predict relocalizations under one perturbation only, we found a cluster specific to rapamycin perturbation that includes proteins that regulate hydrolase activity (Figure 2H; Table 1H), which may relate to the autophagic processes induced by rapamycin (Alvers et al., 2009).

A list of proteins that change for each perturbation is available in Supplementary file 2, curated by selecting clusters of protein change profiles that have strong coordinated signals from the heat map in Figure 2.

Clustering of protein localization changes reveals new patterns of protein relocalization and enables development of hypotheses about protein function

The extent to which we can interpret our clusters within known information from the literature varies. For example, while the mating response is well-studied and provides a clear basis to validate localization changes predicted by our method, other clusters of predictions were relatively novel. By integrating the observations made possible by our exploratory analysis of protein change profiles with a closer qualitative analysis of images of the proteins implicated in these clusters, the data can be mined to develop informed hypotheses for experimental follow-up. In this section, we describe three examples of this process.

Stress-responsive transcription factors that exhibit stochastic pulsatility may be controlled by the same mechanism

We observed a small cluster of three transcription factors, Msn2, Dot6, and Rtg3, with patterns shared between the hydroxyurea and rapamycin perturbations. All three proteins exhibited a protein change profile that indicates that they were becoming more compact in distribution and closer to the cell center following both hydroxyurea and rapamycin treatment (Figure 3A). Evaluating the images (Figure 3B) confirmed that these proteins moved from the cytoplasm to the nucleus following these perturbations. The import of Msn2, Dot6, and Rtg3 into the nucleus is expected, as previous work has shown that all three move to the nucleus following inhibition of TORC1 kinase by rapamycin treatment (Loewith and Hall, 2011). Furthermore, Msn2 and Dot6 were found to localize to the nucleus in previous manual assessment of images following hydroxyurea exposure (Tkach et al., 2012).

(A) Heat map of the protein change profiles and (B) cropped micrographs of the GFP collections strains for stress-responsive transcription factors Msn2, Dot6, and Rtg3.

Refer to Figure 2 for information on the heat map visualization of the protein change profiles and micrograph images. In (B), we show cropped micrographs of yeast cells in which Msn2, Dot6, and Rtg3 have been fused with GFP (green), under standard medium (WT2), rapamycin treatment for 60 min (RAP 60m) and 220 min (RAP 220m), and hydroxyurea treatment for 80 min (HU 80m) and 160 min (HU 160m). Cropped micrographs labelled with an asterisk (*) have had their image intensities clipped at their lower range to remove noise.

https://doi.org/10.7554/eLife.31872.012

We detected interesting coordinated time dependencies in these responses. All three transcription factors were localized to the nucleus throughout all three time points of hydroxyurea perturbation, but strongly localized to the nucleus at the first time point of rapamycin perturbation before returning closer to the phenotype for untreated cells in later time points. Although there were subtle differences between the localization patterns of the three proteins (for example, Msn2 only diminishes in relative localization to the nucleus rather than returning fully to the cytoplasm as we see in the other proteins), the general trend was preserved.

These similarities in their protein change profile patterns led us to hypothesize that Msn2, Dot6, and Rtg3 may be regulated by a common mechanism. Msn2 is the most studied of the three transcription factors, and exhibits a stochastic oscillation (or pulsing) between the cytoplasm and nucleus that is regulated by the cAMP-protein Kinase A (PKA) pathway (Jacquet et al., 2003). Moreover, Msn2 pulsing responds to stress in a quantitative manner, with low stress levels inducing a cytoplasmic steady state, high stress inducing a nuclear steady state, and only intermediate levels of stress inducing pulsing (Jacquet et al., 2003); rapamycin treatment specifically induces nuclear accumulation of Msn2 (Beck and Hall, 1999). Proteome-wide screens have revealed comparable dynamics for Dot6; whereas Rtg3 was not found to have pulsatile dynamics (Dalal et al., 2014).

Given our discovery that Rtg3 shares a protein change profile with Dot6 and Msn2, we decided to revisit the condition-specific dynamics of Rtg3 nuclear localization. To test whether Rtg3 also shows condition-specific pulsing dynamics, we produced time-lapse movies of Rtg3 in standard growth media and in medium containing rapamycin.

We observed Rtg3-GFP protein pulses (in both the GFP-collection strain and in an independently constructed Rtg3-GFP fusion strain, see Materials and methods) during growth in standard medium. We found that, as predicted by our cluster analysis, rapamycin treatment increased the duration of Rtg3 accumulation in the nucleus. In standard medium, single cells showed frequent oscillation of Rtg3 between the nucleus and cytoplasm, while under rapamycin treatment, single cells typically showed a prolonged period of nuclear localization in our movies. Quantification of the dynamics of Rtg3-GFP pulsing (Figure 4) revealed that the average period of nuclear localization for rapamycin-treated cells (175 min) was significantly greater (p=0.0003, single-tailed t-test) than the average for untreated cells (123 min). We repeated this experiment with a lower exposure time (400 ms rather 700 ms) and similarly found that the average period of nuclear localization for rapamycin-treated cells (216 min, n = 53) was also significantly greater (p=0.0103, single-tailed t-test) than the average for untreated cells (168 min, n = 41).

Single cell localization dynamics for Rtg3 for (A) a cell with frequent oscillations between the nucleus and cytoplasm, representative of typical untreated cells, and for (B) a cell with a single prolonged pulse, representative of typical rapamycin-treated cells.

(C) A histogram of the average duration of nuclear localization of Rtg3 for single cells treated with rapamycin versus untreated cells. For rapamycin-treated cells, rapamycin is added at 0 min. To quantify the localization of Rtg3-GFP in single cells, we compute a localization score (expected nuclear intensity, see Materials and methods), which is higher when there is more nuclear-localized protein and lower when there is more cytoplasmic protein. We track these scores for each single cell over each frame of our movie to track the cells over time. Plots show the localization score over time. We show cropped stills from the time-lapse movie showing the state of the cell at various timepoints (labels show the timepoint in minutes).

https://doi.org/10.7554/eLife.31872.013

Sets of ribosomal subunits exhibit perturbation-specific responses

We found evidence of localization changes for many ribosomal subunits, with their protein change profiles predicting that subsets of ribosomal proteins may respond specifically to one perturbation. As an example, we show a set of ribosomal protein subunits in Figure 5 that had a pattern of strong signals in its protein change profile in the last two timepoints following rapamycin treatment, with some proteins showing a redistribution from the nucleus to the cytoplasm in the images. These perturbation-specific localization changes for particular ribosomal protein subunits contrast with the more general transcriptional response to stress of ribosomal protein genes (Gasch et al., 2000).

(A) Heat map showing protein change profiles, and (B) cropped micrographs for ribosomal subunits belonging to a cluster that responded specifically in rapamycin.

Refer to Figure 2 for information on the heat map representation of the protein change profiles and micrograph images. In (B), we show cropped micrographs where the labelled protein has been respectively fused with GFP (green), in standard media (WT2) and after 220 min of rapamycin treatment for the proteins highlighted in blue in (A). Cropped micrographs labelled with an asterisk (*) have had their image intensities clipped at their lower range to remove noise.

https://doi.org/10.7554/eLife.31872.014

A growing body of research suggests that specific ribosomal subunits may have extra-ribosomal functions ranging from ribosomal biogenesis to DNA repair and adhesive growth (Lu et al., 2015). For example, Rpl40a and Rpl40b are ubiquitin-ribosomal protein fusions that contribute to Rps27b pre-rRNA maturation in the nucleolus by the cleaving of the ubiquitin-fusion precursor and to Rps60 maturation by export of the protein into the cytoplasm (Fernández-Pevida et al., 2012). We found that Rpl40a and Rpl40b both redistributed from the nucleus to the cytoplasm following prolonged rapamycin treatment (Rpl40a is shown as an example in Figure 4b), which may reflect the reduction in ribosomal biogenesis and rRNA synthesis and the processing activity caused by rapamycin (Stauffer and Powers, 2015).

Membrane proteins exhibit a reciprocal pattern of localization changes following hydroxyurea or rapamycin treatment

We observed a cluster of proteins that had strong protein change profile patterns in both the rapamycin and hydroxyurea perturbations, but with different patterns in each perturbation. Following rapamycin treatment, the direction of localization changes indicated that the proteins were becoming closer to the cell center and further from the cell edge relative to those in the untreated cell, whereas in following hydroxyurea treatment, the same proteins were predicted to relocalize further from the cell center and closer to the cell edge. Evaluation of the images of cells expressing these proteins (Figure 6) showed that most were either plasma-membrane- or Golgi-localized proteins. In untreated cells, these proteins were localized to both the membrane or Golgi and the vacuole; following hydroxyurea treatment, the distribution of protein shifts towards the membrane, whereas following rapamycin treatment, the distribution shifted towards the vacuole, confirming the prediction of our protein change profiles.

(A) Heat map of the protein change profiles and (B) cropped micrographs for a cluster of proteins with reciprocal protein change profiles between hydroxyurea and rapamycin treatment.

Refer to Figure 2 for information on the heat map representation of the protein change profiles and micrograph images. In (B), we show cropped micrographs of yeast cells in which Tat2, Och1, and Flc1 (highlighted in blue in [A]) have been tagged with GFP (green), after 220 min of rapamycin treatment (RAP 220m), growth in standard media (WT2), and after 160 min of hydroxyurea treatment (HU 160m). Cropped micrographs labelled with an asterisk (*) or a plus sign (+) have had their image intensities clipped at their lower range or higher range, respectively, to remove noise and to better visualize protein localization.

https://doi.org/10.7554/eLife.31872.015

Cluster associations found by our unsupervised exploration of localization changes are complementary to other high-throughput experiments

Next, we explored the relationship between our protein change profile clusters and other high-throughput datasets.

First, we compared the associations between proteins found in our protein localization change analysis to those found in transcript and protein abundance data, by comparing the protein profile similarities in each data source. We expect our clusters to overlap partially with the clusters retrieved from these other datasets, such as in cases where protein localization changes are being driven by or forming a feedback loop with transcript and abundance changes, but we anticipated that at least some protein localization changes are an independent layer of regulation. We constructed profiles for transcript changes and protein abundance changes for the perturbations that we tested in Figure 2, and compared the pairwise distance between proteins for these profiles with those for our protein localization change profiles. Figure 7A and Figure 7B show scatterplots of the transcript and protein abundance profilesimilarities, respectively, against the protein localization change profile similarity.

Figure 7 with 1 supplement see all
Scatterplots of the protein change profile similarity compared to (A) the transcript profile similarity and (B) the protein abundance profile similarity.

We show protein–protein pairs between all proteins that had strong signals in either profile (see Materials and methods for details). Protein–protein pairs are shown as grey dots, and we circle all protein–protein pairs that belonged to a cluster identified in Figure 2. Some circled protein–protein pairs are color-coded depending on the cluster they belonged to, for some clusters of interest (refer to legend in figures). For the transcript profile similarity scatterplot in (A), we also visualize all protein–protein pairs between ribosomal subunits as blue dots.

https://doi.org/10.7554/eLife.31872.016

We found little correlation between protein localization change profile similarity and transcript profile similarity (Figure 7A). Some clusters had more proteins with relatively small changes (low distances) in their transcription change profiles. Cluster T, which is enriched in proteins involved in the pheromone response (Table 1), had several protein–protein pairs with low localization change profile and transcript change profile distances (orange circles in Figure 7A), possibly due to the induction of these genes or proteins as a specific response to α-factor treatment. By contrast, Cluster S, which contains membrane and Golgi proteins with reciprocal localization changes, had higher transcription change profile distances for all pairs of proteins (purple circles in Figure 7A), suggesting that the concerted protein localization change behavior of these proteins may not occur at the transcriptional level. Finally, because we had previously detected unusual nuclear localizations for ribosomal subunits under specific perturbations, we decided to evaluate specifically the ribosomal subunits in this comparison (blue points in Figure 7A). In general, most pairs of ribosomal subunits have similar transcript change profiles, consistent with the transcriptional downregulation of ribosomal subunits globally as part of the environmental stress response (Gasch et al., 2000). By contrast, very few pairs of ribosomal subunits have low protein localization change profile distances, supporting our suggestion that localization changes in ribosomal subunits may be a more specific layer of regulation.

We found a greater correlation between protein localization change similarity and protein abundance change profile similarity, although some of this correlation may stem from the calculation of both profiles from the same dataset (see Materials and methods). Although many proteins that show similar localization change profiles have similar protein abundance change profiles, there are vast numbers of proteins with similar abundance change profiles that do not show similar localization change profiles. This suggests that similarity in protein localization changes is much more specific than similarity in protein abundance. We found that two clusters of protein localization change profiles had low distances in their protein abundance change profiles relative to the distribution of protein abundance change profile distances: Cluster J, which contains proteins that change abundance specifically in the Δrpd3 perturbation, and Cluster E, which contains stress-response proteins (Table 1).

To investigate the co-occurrence of protein localization changes and protein abundance changes directly, we plotted the magnitude of the protein change profile against the average protein abundance change for various screens in Figure 7—figure supplement 1. For all screens, we observed that in general, the strongest magnitude protein localization changes had low protein abundance changes; however, the Δrpd3 screen has a higher ratio of proteins with both large protein abundance and protein localization changes than the other perturbations. Finally, we directly compared the protein localization change profile with the transcript change and protein abundance change profiles for three clusters (Figure 1—figure supplement 1), and observed that except for Cluster J, most proteins with high protein localization change profile signals have weak transcription change and protein abundance change profiles. We note that the lack of correlation between the profiles cannot be explained by false positives alone, because the lack of correlation persists even for proteins for which we could manually evaluate a protein localization change (denoted with an asterisk (*) in Figure 1—figure supplement 1).

Next, we asked whether our clusters of protein localization change profiles were dominated by interacting proteins from the same complex. We hypothesized that the underlying causes behind localization changes were diverse, and that coordinated localization changes would not necessarily require that the participating proteins had to interact directly. To test overlap with physical protein–protein interactions, we evaluated the enrichment of our clusters of protein localization changes with high-quality physical interactions from the BIOGRID database (Oughtred et al., 2016), and all ‘binding’ protein associations in the STRING database (Szklarczyk et al., 2017) (Figure 8A). We observed that some clusters were enriched in protein–protein interactions in these databases: for example, Cluster A, which is enriched for cell-cycle proteins involved in cytokinesis (Table 1), was strongly enriched in both databases, and Cluster M, which contains ribosomal subunits, was strongly enriched in STRING. However, most clusters were not enriched, or only weakly enriched for protein–protein interactions.

(A) Comparison of protein localization change profile clusters with physical protein–protein interactions and (B) localization in untreated wild-type cells.

(A) The enrichment of each protein localization change profile cluster that we identified in Table 1 for physical interactions in the BIOGRID and the STRING databases, measured as z-scores indicating degree of enrichment relative to null expectation. (B) The distribution of manually annotated subcellular localization for untreated wild-type proteins for proteins in Clusters A, E, F, J, S, and T.

https://doi.org/10.7554/eLife.31872.018

Finally, we evaluated the degree to which proteins in our clusters of protein change profiles shared subcellular localizations in untreated wild-type cells. Previously, we observed a cluster of proteins in different compartments that exhibited similar behavior in response to perturbation: in the cluster of proteins that had strong protein change profile patterns in both the rapamycin and hydroxyurea perturbations, despite proteins being localized to the cell membrane and Golgi respectively, these proteins all move towards the vacuole under rapamycin treatment, and more to their specific localization under hydroxyurea treatment. These results suggested to us that our analysis was grouping proteins on the basis of shared trends in protein movement rather than simply grouping co-localized proteins moving from one compartment to another. To investigate the prevalence of clusters formed of proteins with different untreated wild-type localizations, we show the distribution of protein localization in the untreated wild-type for several clusters as pie graphs in Figure 8B, using manual annotations for localization classes for yeast proteins (Huh et al., 2003).

We find that clusters vary in their degree of heterogeneity in untreated wild-type localization. Some clusters are homogeneous: Cluster A, which was enriched for cell-cycle proteins, consists primarily of bud-neck localized proteins, and Cluster E, which included stress-responsive transcription factors (Table 1), consists primarily of proteins localized to the cytoplasm, nucleus, or cytoplasm or nucleus. As expected, Cluster S, which included the membrane and Golgi proteins that had strong protein change profile patterns in both the rapamycin and hydroxyurea perturbations, is highly heterogeneous. Cluster J, which contains proteins that change localization specifically in the Δrpd3 perturbation, is also highly heterogeneous. We also observed that Cluster T, enriched for proteins involved in pheromone response, consisted of an even mix of nuclear- and vacuole-localized proteins; this result reflects the biological process of mating, which includes both proteins that mediate the initial transcriptional response to pheromone, such as Kar4, as well as downstream factors such as Fus1, a cell fusion protein that facilitates vacuole mixing (Nolan et al., 2006), which is a later step in the mating process.

Exploratory analysis of kinase deletion mutant screens identifies new patterns of localization change

Because the regulation of cellular processes by protein kinases is an area of major research focus in cell biology, we created a new set of image screens for seven kinase deletion mutants: elm1Δ, hal5Δ, hsl1Δ, kin1Δ, kin2Δ, mck1Δ, and vhs1Δ. Phosphorylation is a well-characterized mechanism for regulating protein localization (Bauer et al., 2015), so we anticipated that we would identify localization changes in these mutants. We found technical challenges in this dataset. First, the screens did not proliferate well, resulting in the inclusion of fewer cells (the average number of cells used for these screens was ~630,000, compared to an average of 1.1 million cells for the perturbed screens in our previous analysis), and the elm1Δ showed a dramatic morphologic change, such that cells grew in long chains (Edgington et al., 1999). Second, we did not collect replicates for these screens, making it more challenging to confirm the reproducibility of the signals that we observed. However, we were still able to find clusters containing proteins of interest, and we highlight two such clusters below. Data for all protein localization change profiles for the kinase deletion screens can be found in Supplementary file 3.

We found a cluster of proteins predicted to have localization changes for elm1Δ only (Figure 9A). This cluster included several stress-responsive proteins, such the proteins Apj1 and Pph21, both of which have been found to form nuclear foci under DNA replication stress (Tkach et al., 2012). We found that a nuclear foci localization is observable for these proteins under elm1Δ, which is not seen in the wild-type (Figure 9B). Moreover, we also found that Gre3, which has been found to increase in abundance in response to DNA replication stress (Tkach et al., 2012), localized to the nucleus in elm1Δ relative to its cytoplasmic localization in the wild-type.

(A) Heat map showing protein change profiles, and (B) cropped micrographs for proteins in a cluster that responded specifically in elm1Δ.

Refer to Figure 2 for information on the heat map representation of the protein change profiles and micrograph images. In (B), we show cropped micrographs in which the labelled protein has been respectively fused with GFP (green), and the elm1Δ mutant for proteins Apj1, Gre3 and Pph21. Cropped micrographs labelled with an asterisk (*) have had their image intensities clipped at their lower range to remove noise.

https://doi.org/10.7554/eLife.31872.019

A cluster of proteins predicted to change localization predominantly in hal5Δ (Figure 10A) included some ribosomal subunits. Parallel to the observation described above in which we found that some ribosomal subunits showed specialized behavior in response to rapamycin treatment, we find that a different set of ribosomal subunits exhibit a stronger localization in the nucleus under hal5Δ perturbation relative to wild-type. Rps23a and Rpl7b are shown as examples in Figure 9B. We also observed that some nuclear transporters increased in their distribution to the nucleus specifically in the hal5Δ perturbation. Syo5 specifically facilitates the nuclear import of Rpl5 and Rpl11 (Kressler et al., 2012), whereas Nmd5 imports various proteins into the nucleus, including the transcription factors TFIIS (Albertini et al., 1998) and Crz1 (Polizotto and Cyert, 2001); both are shown as examples in Figure 10B.

(A) Heat map showing protein change profiles, and (B) cropped micrographs for proteins in a cluster that responded specifically in hal51Δ.

Refer to Figure 2 for information on the heat map representation of the protein change profiles and micrograph images. In (B), we show cropped micrographs in which the labelled protein has been fused with GFP (green), and the hal5Δ mutant for proteins Rps23a, Rpl7b, Syo1, and Nmd5. Cropped micrographs labelled with an asterisk (*) have had their image intensities clipped at their lower range to remove noise.

https://doi.org/10.7554/eLife.31872.020

Taken together, these results show that our integrative approach can be applied in a case in which there is little prior biological knowledge, and there are morphological differences between cells. This highlights the generalizability of the approach to a more realistic exploratory data analysis situation.

Discussion

Here, we define quantitative associations between proteins on the basis of shared localization change behavior obtained through a systematic comparison of microscopy images, a data source that is infrequently exploited for integrative analysis. Although our study focuses on localization changes, images contain other information, including information about cell morphology (Ohya et al., 2005) and protein abundance (Albert et al., 2014). In principle, this information could also be incorporated into high-throughput integrative analysis.

This integrative approach contrasts with those used in previous studies that have produced lists of discrete protein localization changes that are readily interpretable by biologists, but not necessarily comparable or quantitative. For example, in our previous supervised approaches (Chong et al., 2015; Kraus et al., 2017), localization changes following individual perturbations were represented as flux networks, which show snapshots of protein movement between organelles following individual perturbations. However, it is not clear how the flux network obtained from one perturbation can be compared to that obtained from another perturbation. The value of integrating datasets is illustrated by the cluster of proteins involved in the mating response (Cluster T) and another cluster containing cell-cycle proteins (Cluster A). The former had localization changes in just α-factor, whereas the latter had changes in both α-factor and iki3Δ perturbations. Flux maps of the α-factor (Kraus et al., 2017) identify proteins from both sets; here, we separate these proteins into specific and functionally enriched groups. Thus, increased resolution offered by a multi-perturbation context empowers the identification of functional relationships between proteins.

Moreover, a comparative representation of protein localization changes between perturbations permits the detection of subtle differences in protein responses. We showed that regulators of the yeast stress response (such as Stb3 and Msn2) could be differentiated in subtle ways. Our work integrates multiple datasets as a compact visualization, allowing for a deeper understanding of the subtle conditional and time-dependent dynamics of proteins.

We addressed the question of how to scale analyses to tens of high-throughput microscopy screens. Previous approaches act on a single screen, scaling to tens of thousands of images. By integrating data from multiple screens, we show how to scale a single analysis to hundreds of thousands of images, an order of a magnitude increase in the scale of data.

Our approach detects a novel reciprocal pattern of localization changes for membrane proteins following hydroxyurea or rapamycin treatment. These protein responses are notable because they exhibit different localization changes depending upon the perturbation. Although we do not have a clear hypothesis for these changes, the observation was made clear by our visualization, which shows trends in localization changes simultaneously for both perturbations. That this pattern occurred in multiple membrane or Golgi-localized proteins, increases our confidence that it reflects important, yet currently unknown biology. Without clustering based on quantitatively comparable localization change profiles, the prevalence of the pattern would not have been apparent.

Our analysis is unbiased by prior biological knowledge. Despite this, we find many specific examples of known responses to perturbations, and thus strong evidence that the patterns we identify reflect underlying biology. The unbiased nature of our approach permits the association of new genes with previously known responses. We found that Crz1 was clustered with proteins in the mating response in the α-factor perturbation, a result that was initially unexpected given that Crz1 is not part of the canonical mating response (Bardwell, 2005). However, during the course of our research, another study independently linked Crz1 to the mating response (Carbó et al., 2017), confirming our finding. Thus, unbiased, high-throughput exploratory analyses of protein localization changes can reveal connections that are not expected on the basis of prior knowledge.

Similarly, we identified pulsatile dynamics in the transcription factor Rtg3 in standard media in this study, and confirmed that these dynamics were altered by rapamycin. Our results contradict a previous proteome-wide screen that suggested that Rtg3 does not pulse (Dalal et al., 2014). However, the highly variable dynamics of pulsing from cell-to-cell (Jacquet et al., 2003; Dalal et al., 2014) and the large number of proteins independently tested in the screen, as well as differences in microscopy and imaging conditions, are plausible reasons for the discrepancy. In contrast to the time-lapse movie data used to analyse pulsing, our image data is much less comprehensive, consisting of still images of just three coarse time-points. Rather, our discovery was powered by associating protein behavior, by observing that Rtg3 had a protein change profile similar to those of other pulsatile transcription factors whose dynamics were affected by rapamycin in similar ways. That we could make findings missed by stronger experimental approaches demonstrates that looking for protein properties by association can be powerful.

As a method that exploits high-throughput imaging data, our results must be interpreted with the understanding that false positives and negatives may emerge from various steps in our data acquisition and analysis. First, occasional contamination and failures in the transformation process may occur. In this work, we found some proteins with bimodal expression in their population of cells. Our change detection algorithm correctly detects the resulting difference in protein expression in the images, but there is a possibility that these images are of mixed genetic strains. Determining whether variability in protein expression is biological or technical from the images alone is non-trivial: a previous quality-control experiment on six proteins with cell-to-cell variability in the GFP collection indicated that only two had mixed genotypes, while two had variability despite identical genotypes and another two were inconclusive (Handfield et al., 2015). In addition, as the GFP collection does not yet fully cover the proteome (Chong et al., 2015), and as cell samples must proliferate to generate an adequate sample size, our experimental method will miss some proteins because of missing data.

Second, segmentation and other image transformations can introduce artifacts. We found a cluster of bud tip proteins whose localization was predicted to change in the α-factor perturbation (Cluster Q). On manual evaluation, we observed that our segmentation algorithm incorrectly segmented the bud tip of cells in these images as a bud cell (Figure 2—figure supplement 3); while these proteins do exhibit a localization change, this change was incorrectly assigned to the bud cells rather than the mother.

Third, we observed that our clustering method can introduce false positives into our clusters. We observed that some of our clusters of protein change profiles had a few protein change profiles with weaker signals (e.g. Nab2 in Cluster E in Figure 1—figure supplement 1). Our data analysis uses distance-based clustering methods, with decrease in performance as data dimensionality increases (Steinbach et al., 2004). Adaptations that deal with higher-dimensionality data may be required as the number of image screens increases.

To handle false positives, we recommend that results are validated by human evaluation of the images, because an expert evaluator can determine whether the signals found by our unsupervised method are biological or technical. Our method complements human evaluation, which is too laborious to be applied to hundreds of thousands of images, many of which may only have subtle trends; our method drastically reduces this search space, reducing inspection to potentially only hundreds of images rather than hundreds of thousands. We focus on proteins with strong and interesting signals, equipped with knowledge of which specific screens to look at and the general nature of the putative change.

While our approach is based upon direct imaging of cells, predictive multi-perturbation approaches to study protein localization are also available (Lee et al., 2006, 2014; Gardy et al., 2003; Horton et al., 2007). While the latter approach is a valuable alternative when experimental data is not readily generated, we believe that predictions that are based on protein–protein interactions and mRNA expression patterns (Lee et al., 2014) will miss many of the localization dynamics. We found that not all clusters of protein change profiles overlap well with either of these types of datasets, suggesting that the underlying mechanisms that drive localization changes are diverse. Indeed, in recent predictions of localization change under stress (Lee et al., 2014), mitochondrion to nucleus and ER to Golgi were predicted to be the most frequent type of change. We observe that transitions between the nucleus and cytoplasm were common in the localization changes that we looked at. This could be a technical effect because our change detection method is more sensitive to localization changes that reflect larger distances in our feature space (discussed in [Lu and Moses, 2016]), but we also observe that many of the transcription factors that we observed as changing under stressful conditions were not predicted to have localization changes by a previous predictive method that relied upon protein-interaction and mRNA expression datasets (Lee et al., 2014).

Integrative proteomic analyses of image screens should become increasingly possible in the future thanks to the efforts of public resources such as the Image Data Resource (Williams et al., 2016). Strategies like ours should provide the comparative context to understand the diversity of experiments contained within these databases. In medical domains, our general strategy of viewing localization changes in a multi-perturbation context may be applicable to screens of human cells. Methods are emerging for scalable GFP tagging of human proteins (Leonetti et al., 2016); given that our strategy can show protein pathway responses and compensatory rearrangements of the proteome in response to drugs, it could identify potential protein targets for knockout or combination therapies. Importantly, as our approach can separate more general responses from those more specific to a drug, it may permit selection of protein targets that minimize side effects. Furthermore, while we frame our work as relevant to perturbations, our approach can easily be extended to study proteomic differences in tissue or cell types (Uhlén et al., 2015).

Materials and methods

Experimental strains and image acquisition

Image data for untreated wild-type GFP-tagged yeast cells, in addition to the rpd3Δ, rapamycin, hydroxyurea, and α-factor perturbations, were taken from the CYCLoPs database (Koh et al., 2015).

The iki3Δ, elm1Δ, hal5Δ, hsl1Δ, kin1Δ, kin2Δ, mck1Δ and vhs1Δ strains were constructed and imaged as described by Chong et al. (2015). Fluorescent micrographs were acquired using a high-throughput spinning-disc confocal microscope (Opera, PerkinElmer) with a water-immersion 60X objective (NA 1.2, image depth 0.6 µm and lateral resolution 0.28 µm). Acquisition settings included using a 405/488/561/640 nm primary dichroic, a 568 nm detector dichroic, a 520/35 nm filter in front of camera 1 (12-bit CCD) and a 600/40 nm filter in front of camera 2 (12-bit CCD). Excitation was conducted using 488 nm (blue) and 561 nm (green) lasers at maximum power. Eight images were acquired for each well, four in the red channel and four in the green channel with simultaneous acquisition of red and green channels (binning = 1, focus height 2 µm) and an 800 ms exposure time per site.

For images that we present in this study, we cropped a 300 × 300 pixel box from the respective images showing representative cells for the sample. These images are outputted by our single cell segmentation program (Handfield et al., 2013), and show blue lines around the cells indicating the results of the segmentation, and white circles connecting mother and bud cells. Cells that were considered to be artifacts are indicated by white crossed-out regions in the images. These images were rescaled in contrast using the highest and lowest intensity pixels in the crop to better show subcellular localization patterns for proteins expressed at lower abundances.

The GFP-tagged Rtg3 strain used to evaluate nuclear localization in Figure 4 was generated as a fusion product via homologous recombination of the native genomic sequence. Direct transformation of a linear PCR product containing codon-optimized GFP coding sequence and a selectable resistance marker flanked by gene-specific sequence yielded carboxy-terminally tagged fusion products, which were then isolated by the inferred drug resistance. To prepare strains for microscopy, strains were inoculated into synthetic minimal media and grown overnight at 30˚C. Prior to imaging, stationary cultures were diluted 1/10 in fresh media and grown at 30˚C for 4 hr to ensure log-phase growth and proper expression of GFP fusion products.

To produce time-lapse movies of the Rtg3 strains, cells were dosed with 200 ng/ml of rapamycin for the rapamycin treatment perturbation, and imaged at 22 ˚C for 16 hr at 2.5 min intervals for four z-stacks at one micrometer intervals, using a 700 ms exposure time. Cells were imaged using a Nikon spinning disk confocal microscope using a 60X oil-immersion objective. GFP excitation was at 488 nm. To analyse frames of the codon-optimized GFP strain, segmentation and tracking were conducted using Matlab on the brightfield image to identify cell peripheries. To quantify nuclear localization, we modeled the pixel intensities in each frame of the movie using a two-component generative model that uses a Gaussian distribution to model the cytoplasmic GFP signal, and a uniform distribution to model the nuclear GFP signal. Using this model, we simultaneously inferred both whether the nucleus is visible in an image, and if so, which pixels are within the nucleus. The localization score that we computed is the expected nuclear signal in each frame given these assumptions. A higher localization score indicates that the protein is distributed more to the nucleus, whereas a lower localization score indicates that the protein is distributed more to the cytoplasm. To quantify the duration of nuclear localization, we applied the ‘findpeaks’ function in Matlab to our localization scores.

To ensure that the consistency of protein abundance across screens was controlled, we measured the correlation of GFP-intensity across all screens. All screens have R2 >0.79 with the untreated reference wild-type.

Image analysis and quantification of localization changes

To quantify the patterns of GFP in the cells in our images, we used the single-cell segmentation and feature extraction program as described by Handfield et al. (2013), specialized for the segmentation and measurement of GFP-tagged yeast cells. This method estimates the cell-cycle stage of cells using the size of the bud as a heuristic, and describes a series of 10 bins of equidistant cell-stage keypoints; for this work, we reduced the number of bins from 10 to 5 by merging adjacent bins to have more cells in each bin.

We modeled GFP-expression using quantitative, biophysically motivated features, rather than standard computer vision texture measurements (Haralick et al., 1973). An important property of these features is that they track the concentration and distribution of GFP-tagged proteins relative to certain cellular landmarks. As opposed to features that only track the shape of the GFP pattern, these features allow us to track relative distributions between localizations when a protein is localized to two or more compartments, and shifts the ratio of protein between these compartments.

To conduct localization change detection on these features, we applied the method that we described previously in Lu and Moses (2016). This unsupervised change detection method constructs a conditional expectation for each protein and reports the direction and magnitude of deviation of the protein’s change compared to this expectation; because these conditional expectations are constructed locally for each protein, differences in feature measurements between samples explained by simple variation in the morphology of an organelle or systematic biases between image screens are de-emphasized, allowing for the fairer comparison of localizations that are impacted differently by these effects. The unsupervised change detection method quantifies the putative localization change for each protein with a shared set of features. This representation encodes the nature of the localization change in the pattern across our vector of features; for instance, cytoplasm to nucleus movements are characterized by strongly positive z-scores for ‘distance to cell center’ and ‘distance between proteins’ features, which indicate that the untreated wild-type values for these features are higher than the perturbation values. Using this method, localization changes are presented in a quantitative and comparable way that summarizes each localization change as a relative measure between the untreated wild-type and the perturbed cells.

The unsupervised change detection method requires a parameterization for the number of proteins used as examples to construct a local estimate of the conditional expectation of each protein; we set this to 50 proteins, as this was shown to be optimal in our previous work. We increased the leniency of our filters for sample size reliability compared to the method described in (Lu and Moses, 2016), permitting vectors that have at least one cell in each bin rather than requiring at least five as originally described; this permits for the retention of more data. In practice, the lowest number of cells we observed in the reference untreated wild-type with this filter is 18, and only 13 of 3955 proteins have fewer than 25 cells. We used the more robust truncated mean profiling method described in (Lu and Moses, 2016). We applied the image segmentation and feature extraction method to each image screen in our dataset, and then the change detection method between each image screen paired against the WT2 screen, which served as our common reference untreated wild-type screen for all perturbations and all other untreated wild-types.

Our unsupervised localization change detection method reports proteins that change from not being expressed (or being expressed at a level too low to confidently assign a localization) in one screen to being expressed in a recognizable localization in the other screen as localization changes. We elected to retain these signals. This strategy is consistent with previous work that has generally considered these cases to be localization changes (Chong et al., 2015; Tkach et al., 2012).

Clustering and visualization of protein change profile

The protein change profile for individual perturbations is concatenated into a concatenated profile. We clustered these profiles using the open-source Cluster 3.0 package (de Hoon et al., 2004), with hierarchical agglomerative clustering with uncentered correlation distance and average linkage. This operation can be performed on the entire set of profiles, but for the heat map that we present in Figure 2, we reduced the number of profiles visualized to just the strongest signals prior to clustering. We required that the concatenated vector have at least three z-scores over an absolute value of 5.0 and greater than 80% of data present, resulting in 1159 proteins displayed in Figure 2. 46% of the proteins among the 1159 strongest localization change patterns are among the brightest 50% of proteins in the collection, indicating that the proteins with strong patterns of localization change are not simply the strongest or weakest GFP signals.

To visualize these resultant clustered matrices as heat maps, we used Java Treeview (Saldanha, 2004). Blue values indicate positive z-scores and yellow values indicate negative z-scores, with the intensity of the color indicating the magnitude. Grey values indicate missing data.

Comparison with other high-throughput data sources

Enrichment analyses of the clusters in Table 1 were conducted using the 1159 proteins visualized in Figure 2 as the background population. We looked for gene ontology-enriched terms (Gene Ontology Consortium, 2015) in biological process, cellular component, and molecular function, with Benjamini-Hochberg correction (Benjamini and Hochberg, 1995) at a 0.05 FDR, using the YeastMine tool (Balakrishnan et al., 2012) on the Saccharomyces Genome Database. We report a sample of the significant terms that we found for some clusters in Table 1; full lists of enrichment for each cluster can be found in Table 1—source data 1.

To compare protein localization change profile similarity with transcript change profile similarity for these perturbations, we constructed transcript change profiles using four separate genomic expression microarray experiments for the rpd3Δ (Bernstein et al., 2000), hydroxyurea (Dubacq et al., 2006), rapamycin (Hardwick et al., 1999), and α-factor (Roberts et al., 2000) perturbations in yeast. We only included time points from these microarray experiments earlier or equivalent to the corresponding time points in our image screen data: this resulted in transcript data consisting of two replicates of rpd3Δ; three timepoints of rapamycin treatment at 15 min, 45 min, and 60 min; two timepoints of hydroxyurea treatment at 60 min and 120 min; and six timepoints of α-factor treatment at 15 min, 30 min, 45 min, 60 min, 90 min, and 120 min. Because showing all protein–protein pairs would have been prohibitive in terms of visualization (generating ~16 million points), we show protein–protein pairs formed between the proteins with the strongest changes in either protein localization changes or transcript changes. We required that a protein have either two log-fold changes above an absolute value of 2.0 in the transcript change profile, or eight values above an absolute value of 5.0 in the localization change profile. This filter resulted in 310 proteins with a transcript change, and 440 proteins with a localization change, for a total of 655 non-redundant proteins between the two lists. We then computed the correlation distance between the transcript change profile and the protein localization change profile for each pair of proteins in this list, which we visualized as the scatterplot in Figure 7A.

To compare protein localization change profile similarity with protein abundance change profile similarity, we constructed protein abundance change profiles using the same image data that we used to construct the protein localization change profiles. For each protein in each image screen, we measured the average intensity of the GFP signal in single cells. We calculated the difference between this measurement and that in the WT2 screen, and concatenated these differences into a profile. Similar to the transcript change profile similarity, we required that a protein have either two changes above an absolute value of 50 in the protein abundance change profile, or eight values above an absolute value of 5.0 in the localization change profile. This filter resulted in 318 proteins with a protein abundance change, and 440 proteins with a localization change, for a total of 686 non-redundant proteins between the two datasets. We then computed the correlation distance between the protein abundance change profile and the protein localization change profile for each pair of proteins in this list, which we visualized as the scatterplot in Figure 7B.

Protein–protein interactions for the Sacchromyces cerevisiae proteome were taken from the Biogrid (Oughtred et al., 2016) and STRING (Szklarczyk et al., 2017) databases. For the Biogrid database, we compared our data with physical interactions from low-throughput experimental sources or supported by at least two PMIDs in high-throughput experimental sources. For the STRING database, we compared our data with all protein–protein associations labeled as ‘binding’. We used the clusters of proteins previously defined in Figure 2. We counted the number of interactions in each dataset, counting interactions between the proteins of each cluster individually. To generate null expectations, we randomly shuffled all proteins within our defined clusters while retaining the sizes of each cluster, and counted the resulting number of interactions within each cluster. We repeated this simulation 10,000 times to produce a mean and standard deviation for number of interactions for randomized clusters, and report the true number of interactions for each cluster as a z-score against these expectations.

We use the manual annotations for the untreated wild-type yeast-GFP collection by Huh et al. (2003) to assess the distribution of untreated wild-type protein localizations within our clusters (presented in Figure 8B).

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
    Controlling the false discovery rate: a practical and powerful approach to multiple testing
    1. Y Benjamini
    2. Y Hochberg
    (1995)
    Journal of the Royal Statistical Society. Series B 57:289–300.
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
    Yeast cell synchronization
    1. A Day
    2. C Schneider
    3. BL Schneider
    (2004)
    Methods in Molecular Biology 241:55–76.
  21. 21
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
    New Directions in Statistical Physics
    1. M Steinbach
    2. L Ertöz
    3. V Kumar
    (2004)
    273–309, The Challenges of Clustering High Dimensional Data, New Directions in Statistical Physics, Berlin, Heidelberg, Springer Berlin Heidelberg.
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
    Reaction order of Saccharomyces cerevisiae alpha-factor-mediated cell cycle arrest and mating inhibition
    1. MM Udden
    2. DB Finkelstein
    (1978)
    Journal of Bacteriology 133:1501–1507.
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72

Decision letter

  1. Emmanuel Levy
    Reviewing Editor; Weizmann Institute of Science, Israel

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Patterns of protein subcellular localization change inferred by integrating 100,000s of images from microscopy screens" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Emmanuel Levy as a Reviewing Editor and Naama Barkai as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another, and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Lu et al. introduce the concept of "localization profile" representing a protein's localization signature across multiple environmental conditions. The use of protein profiles has proved powerful as an approach to study protein function as well as to characterize functional relationships between proteins. It has been employed in numerous contexts and includes, for example, phylogenetic, mRNA expression, physical interactions or genetic interaction profiles.

Here, the authors characterize protein localization profiles using images from several proteome–wide fluorescence microscopy screens of the yeast GFP library. Inferring protein localization from microscopy images normally involves human intervention to train classifiers that are hardly generalizable. As a result, it has been difficult to carry out systematic and quantitative comparisons of protein localization across multiple screens carried out at different times or involving different physiological conditions. To overcome this difficulty, the authors employed an unsupervised approach to analyzing protein localization, which they applied to several proteome–wide microscopy screens.

The localization features measured across conditions formed protein localization signatures, which the authors clustered to identify groups of similarly behaved proteins. Clusters showed enrichment for functionally related proteins. Numerous changes detected were validated by known literature, and novel changes were observed. Most of the clusters of proteins with similar signature in localization–change were not associated to similar transcriptional regulation nor were they enriched for physically interacting proteins.

Essential revisions:

Globally, all reviewers found that the method shows an important potential. Numerous comments point to analyses needing to be more thorough, sometimes requiring a more systematic examination. Considering that the data underlying the kinase deletions appear to be of lower quality, I suggest cutting that part. Figures are often hard to follow, so particular attention needs to be given to the presentation of the results. Please find below a consolidated list of the reviewer's comments.

1) The authors indicate they derive descriptors even when a single cell is available. Was this decision taken based on a general optimization scheme or is it arbitrary? What fraction of the data corresponds to 1 to 5 cells? One should show that this decision brings more signal than noise.

2) A significant fraction of strains in the GFP library show fluorescence levels that are close to auto–fluorescence, and in that respect can be hard to analyze. How do such strains affect this work? What fraction of the strains in the matrix of Figure 1 are in the "top–50% abundance class?"

3) More generally, to what extent does profile similarity depend on signal intensity or protein abundance?

4) How was the number of features decided upon? Each feature described in Figure S1 has 10 pixels associated with it (and therefore I assume 10 parameters). Why are 10 features necessary when, e.g., considering "distance to the cell edge." Could one feature, or perhaps an average of the features facilitate interpretation while being equally informative in the profile comparison? It would be necessary to establish an objective criterion that can be optimized – such as "number of changes detected across different conditions versus across replicates of the same condition." (this objective criterion can also be used to set the minimum number of cells to be used, see point #1.). A detailed depiction of the features employed is needed.

5) The approach is validated using a set of 20 pictures, which may not account for global effects. A more general benchmark would, therefore, make sense in that respect too (see comment #4).

6) In the validation, the authors infer a false positive rate of ~50%. They should provide a sense of what the origin of these false positives is, and how it may impact the conclusions. For example, that proteins in the same clusters to not exhibit similar transcription profile, physical interactions, or localization.

7) An analysis of a limited number of clusters is presented in Table 1, where GO enrichment is carried out. Are these the only clusters showing functional enrichment? The global nature of this work begs for a more general analysis of GO enrichment, i.e., by defining clusters across the entire matrix and analyzing GO enrichment for all of them, not a handful only. Even if a few are described in Table 1, the results can be provided as Supplementary data. In addition, "numbers" can be added to Table 1 (number of proteins in the cluster and number corresponding to the annotation being enriched).

8) Figure 4C contains cells that do not appear to contain any green fluorescence, while others show a strong signal. Such bimodal behavior is worrying as it may not be biological. Perhaps that some cells have lost their fluorescence during the SGA process for example. Is the localization profile affected by such cases? Is such bimodal behavior widespread? If so it should be accounted for in the analyses, and such cases would need to be filtered or flagged.

9) Global "quality control" figures should be shown, suggestions for this are given below:

– Subsection “An unsupervised analysis of protein localization changes in over 280,000 images”: More details should be presented regarding this sentence. Can a distribution of "localization change" score be shown for replicates of the same condition and different conditions? More of such "sanity check" plots would help get a better sense of how the method is doing. (See comment #1)

– The consistency of protein abundance (or rank abundance) across the different screen should be controlled.

– The association between change in localization, expression–change, and PPIs is shown in Figure 6 for a few cases, but it would be more valuable to show it in a systematic fashion, for example by showing (i) how Z–score for profile change in the data from Figure 1 correlate with a Z–score for abundance change, or (ii) How profile similarity relates to other measures of functional relatedness? E.g., protein interaction profile similarity, genetic interaction profile similarity, expression–profile similarity, etc. Such a bird's eye view of the data is lacking and would add much to this work.

– The last analysis suggests that proteins with different localizations can exhibit a similar change. This is counter–intuitive, and more data could be provided to examine the origin of such cases

– Looking at Figure 6C, why is cluster I, which contains ribosomal proteins, not enriched in physical interactions? If using BioGrid one does not need to limit oneself to low–throughput data. A large and high–quality dataset can be obtained by taking all physical interactions supported by more than one PMID. Lastly, the "clusters circles" of Figure 6C do not overlay with points.

10) The authors identified Rtg3 to be a pulsing transcription factor through cluster analysis and conducted time–lapse movies to confirm such dynamics. In the cited reference (Dalal et al., 2014), pulsing was defined as TFs that exhibit dynamic shuttling between nucleus and cytoplasm during the steady–state response to added stresses. In Figure 3, the authors didn't indicate when the stress was added and whether the cells on average have reached steady–state. The authors didn't specify what stress condition each cell represent in Figure 3. Lastly, to analyze "longest pulse duration" (Figure 3C) does not provide a robust measure. The mean/median duration should be presented.

11) Analysis of protein level change. In Figure 6A, the authors compared localization pattern with microarray pattern. However, it would be more natural to compare the protein expression pattern with microarray pattern, and more importantly protein expression with localization changes. It is indeed always difficult to merge different data from different laboratories, but protein expression is readily available from the images as the sum of GFP intensities in the cell.

https://doi.org/10.7554/eLife.31872.026

Author response

Essential revisions:

Globally, all reviewers found that the method shows an important potential. Numerous comments point to analyses needing to be more thorough, sometimes requiring a more systematic examination. Considering that the data underlying the kinase deletions appear to be of lower quality, I suggest cutting that part. Figures are often hard to follow, so particular attention needs to be given to the presentation of the results. Please find below a consolidated list of the reviewer's comments.

1) The authors indicate they derive descriptors even when a single cell is available. Was this decision taken based on a general optimization scheme or is it arbitrary? What fraction of the data corresponds to 1 to 5 cells? One should show that this decision brings more signal than noise.

We believe that this comment was based on a lack of clarity in the description of our previously published methods. We require at least one mother–bud pair in each of the 10 bins that we define, which track mother and bud cells separately over 5 stages in the cell cycle. This means that we require a minimum of 10 cells per sample, distributed across the cell cycle. In practice, the lowest number of cells we observe in the reference untreated wild–type with this filter is 18, and only 13 of 3955 proteins have fewer than 25 cells.

We have edited subsection “An unsupervised analysis of protein localization changes in over 280,000 images” and included a new figure (Figure 1) to better explain our image analysis pipeline and included the details about the numbers of cells in subsection “Image Analysis and Quantification of Localization Changes”. We hope that this is clearer in the revised manuscript.

These decisions are based on the desire to avoid filtering out proteins from the analysis because of missing data as we add more screens to the analysis. We believe that including as many proteins as possible is important because we wish to provide as much as possible a proteome–scale analysis. In addition, for technical reasons, our localization change detection algorithm (Lu and Moses, 2016) actually works better when we have more proteins in the analysis. For example, changing the minimum requirement from 1 to 5 cells per bin filters out an additional 730 (18.4%) proteins in the reference untreated wild–type.

2) A significant fraction of strains in the GFP library show fluorescence levels that are close to auto–fluorescence, and in that respect can be hard to analyze. How do such strains affect this work? What fraction of the strains in the matrix of Figure 1 are in the "top–50% abundance class?"

Our localization change algorithm automatically handles low intensity proteins where the GFP signal is not recognizable. Low intensity proteins show a distinctive pattern in our feature space (see the cluster labelled as “ambiguous” in Figure 6 of Handfield et al., 2013) and can be handled identically to proteins with a distinct localization: If the protein remains low intensity in the perturbation, the protein will have the same pattern of features and will not be identified as having a localization change.

On the other hand, if proteins change from being low intensity in the wild–type to having a recognizable localization in the perturbation (or vice versa), these will be identified as having a localization change. We elected to retain these signals. There is some ambiguity on whether these changes are “truly” localization changes or instead should be thought of as “abundance” changes, but we believe that this shows that the distinction between “abundance” and “localization” is somewhat arbitrary.

In addition, we believe our perspective is in line with previous localization change detection work has generally considered these cases to be localization changes: Tkach et al., (2012) includes localization changes where the protein abundance is at auto–fluorescence levels in untreated cells, Chong et al., (2015) also includes localization changes of this nature. We have added a brief discussion of abundance and localization changes to the revised manuscript to subsection “Image Analysis and Quantification of Localization Changes”.

For all proteins that we analyzed in this study, 50% of proteins have a protein abundance in the reference untreated wild–type greater than 41.0 A.U. For the 1159 proteins that we show in Figure 2, 529 (or 45.6%) of proteins have a protein abundance greater than 41.0 A.U. We have added these numbers to subsection “Clustering and Visualization of Protein Change Profile.”

3) More generally, to what extent does profile similarity depend on signal intensity or protein abundance?

For most screens, the strongest localization change signals do not co–occur with a significant abundance change, and the strongest abundance changes do not co–occur with a significant localization change. We have added Figure 7—figure supplement 1, which shows scatterplots of the protein localization change profile magnitude compared to the average protein abundance change for some screens.

The rpd3Δ perturbation is an exception. This screen shows more localization changes that co–occur with a strong change in abundance. These results are consistent with previous work; while Chong et al., (2015) report little correlation between abundance and localization changes for the hydroxyurea and rapamycin perturbations, 17 of the 31 localization changes that they detect for the rpd3Δ perturbation also significantly change in abundance.

We have reported these findings in subsection “Cluster associations found by our unsupervised exploration of localization changes are complementary to other high–throughput experiments” of the revised manuscript.

4) How was the number of features decided upon? Each feature described in Figure S1 has 10 pixels associated with it (and therefore I assume 10 parameters). Why are 10 features necessary when, e.g., considering "distance to the cell edge." Could one feature, or perhaps an average of the features facilitate interpretation while being equally informative in the profile comparison? It would be necessary to establish an objective criterion that can be optimized – such as "number of changes detected across different conditions versus across replicates of the same condition." (this objective criterion can also be used to set the minimum number of cells to be used, see point #1.). A detailed depiction of the features employed is needed.

Once again, we believe there is confusion due to the lack of clarity in the description of the methods in the submitted version of the manuscript.

The 10 features result from the separation our cells into 10 bins (5 cell cycle stages for the bud and 5 cell cycle stages for the mother cells). The features shown for proteins are an average of the single cells in each bin. The repeated features in the protein localization change profile are designed to show if the protein localization change is dependent on cell cycle and/or on mother/bud cell type. We have clarified this in the revised manuscript (Figure 1).

Feature selection and optimization are key areas for research in automated microscopy image analysis. It is likely that they could lead to easier interpretation of localization changes. However, since our current manuscript is focused on data integration from multiple screens, we believe that it is outside the scope to begin consideration of these challenging data analysis problems.

5) The approach is validated using a set of 20 pictures, which may not account for global effects. A more general benchmark would, therefore, make sense in that respect too (see comment #4).

We take the point that 20 pictures were not convincing. We have therefore expanded our manual validation set to 80 pairs of images, recorded in Supplementary Data 2. We analyzed the images in Clusters E, F, J, Q, and S, shown in Figure 1. For each set of images, we have reported whether a localization change was visible to the human observer, and quantified false positive rates with respect to these observations.

We thank the reviewer for the suggestion to develop a general benchmark for protein localization changes under perturbations. This would be useful to the community to develop new statistical approaches. The images that we have manually inspected as a part of this study could form the seed for such a data resource, and we have made them available in Supplementary Data 2. However, we believe that further development of this resource is beyond the scope of the current manuscript.

6) In the validation, the authors infer a false positive rate of ~50%. They should provide a sense of what the origin of these false positives is, and how it may impact the conclusions. For example, that proteins in the same clusters to not exhibit similar transcription profile, physical interactions, or localization.

To address the origin of the false positives, we used the set of manually inspected images (discussed in response to point #5). We found at least three causes for false positives: (1) protein change profiles with weaker signals that are clustered among the profiles with stronger signals. These are true “statistical” false positives where the noise passes the threshold that we chose and happens to fall in a pattern similar enough to the strong signals. (2) localization changes are too subtle, complex, or borderline to identify consistently by human eye. These often include cases where only a subset of cells in the image show the localization change, and/or the localization changes in in the quantitative appearance of an organelle. While these are false positives in the sense that a human observer would not call them localization changes, it is likely that they are simply subtle biological effects. (3) technical issues, particularly related to mis–segmentation of cells with previously unseen morphology. We believe that our methods to a surprisingly good job at dealing with changes in cell morphology, but they are not perfect.

A table of all our manual annotations can be found in Supplementary Data 2, and images of these proteins are available on CYCLoPS. We have added some supplementary figures to address this question and discussed all of these cases along with specific examples and images in the revised manuscript. Figure 2—figure supplement 1 shows examples of case 1 and Figure 2—figure supplement 2 shows examples of case 2; we discuss these in the result section entitled “An unsupervised analysis of protein localization changes in over 280,000 images”. Supplementary Data 5 shows an example of case 3, and we discuss this as a source of false positives in the Discussion section.

To address whether the results of the global comparisons with other data could be explained by the presence of false positives, we examined cases where localization change patterns did not co–occur with transcript or protein abundance patterns, or with protein interactions and shared subcellular localization, in the context of the clusters that we had manually identified the false positives. We found that the false positives could not completely explain the lack of co–occurrence for any cluster. We report these results in the result section entitled “Cluster associations found by our unsupervised exploration of localization changes are complementary to other high–throughput experiments” with reference to Figure 2—figure supplement 2; we believe that verification in this limited set is the best we can do to eliminate the possibility that our global results are explained by false positives.

7) An analysis of a limited number of clusters is presented in Table 1, where GO enrichment is carried out. Are these the only clusters showing functional enrichment? The global nature of this work begs for a more general analysis of GO enrichment, i.e., by defining clusters across the entire matrix and analyzing GO enrichment for all of them, not a handful only. Even if a few are described in Table 1, the results can be provided as sup. Data. In addition, "numbers" can be added to Table 1 (number of proteins in the cluster and number corresponding to the annotation being enriched).

We have added the numbers of proteins in the table as requested. We have now included any significant GO terms for all of the manually identified clusters in Source Data 1.

Presenting GO enrichments for automatically chosen clusters could be misleading because global effects (such as changes in morphology, growth or imaging conditions, etc.) that differentially effect proteins with different subcellular localization patterns might still be significant. For example, clusters showing differences between untreated wild–type screens show strong GO enrichments. We have therefore decided to stick with manually identified clusters, staying true to the use of clustering as exploratory data analysis to identify patterns.

8) Figure 4C contains cells that do not appear to contain any green fluorescence, while others show a strong signal. Such bimodal behavior is worrying as it may not be biological. Perhaps that some cells have lost their fluorescence during the SGA process for example. Is the localization profile affected by such cases? Is such bimodal behavior widespread? If so it should be accounted for in the analyses, and such cases would need to be filtered or flagged.

We agree that some of our detected localization changes may be due to contamination or technical issues in the transformation: occasional errors are inevitable in high–throughput screens. We have discussed contamination as a source of false positives in the Discussion section of the revised manuscript. We note that this issue is a limitation of the GFP collection. In these cases, our analysis correctly identifies that there is a localization change in the images.

Unfortunately, it is not possible to confirm from the images alone if bimodal behavior is caused by a technical issue. In a previous study, we performed quality control experiments on 6 proteins with highly variable expression in the GFP collection indicated that only 2 had mixed genotypes, while 2 had variability despite identical genotypes, and another 2 were inconclusive (Handfield et al., 2014).

We therefore decided not to filter bimodally–expressed proteins because this may unnecessarily exclude real and interesting biological findings. Instead, we advocate generation of independent GFP–tagged strains to verify the localization change, as we have done for our Rtg3 experiment. Indeed, the specific case of some ribosomal subunits being expressed at very low levels in some cells in the iki3Δ perturbation needs further confirmation and we have removed these examples from our results in subsection “Stress–responsive transcription factors that exhibit stochastic pulsatility may be controlled by the same mechanism”.

9) Global "quality control" figures should be shown, suggestions for this are given below:

– Subsection “An unsupervised analysis of protein localization changes in over 280,000 images”: More details should be presented regarding this sentence. Can a distribution of "localization change" score be shown for replicates of the same condition and different conditions? More of such "sanity check" plots would help get a better sense of how the method is doing. (See comment #1)

We thank the reviewers for this useful comment. We have now included a comparison of localization change profiles for replicates as Supplementary Data 1 and discuss these results in subsection “An unsupervised analysis of protein localization changes in over 280,000 images”. We analyse the correlation between the rpd3Δ and the iki3Δ replicates, as well as comparisons to different conditions. We find that the replicates exhibit strong correlation, particularly for proteins that show strong signals, while different perturbations are poorly correlated.

– The consistency of protein abundance (or rank abundance) across the different screen should be controlled.

We compared the protein abundance across all the screens and we find R2 > 0.79 with the untreated wildtype. We have reported this result in subsection “Experimental Strains and Image Acquisition”.

– The association between change in localization, expression–change, and PPIs is shown in Figure 6 for a few cases, but it would be more valuable to show it in a systematic fashion, for example by showing (i) how Z–score for profile change in the data from Figure 1 correlate with a Z–score for abundance change, or (ii) How profile similarity relates to other measures of functional relatedness? E.g., protein interaction profile similarity, genetic interaction profile similarity, expression–profile similarity, etc. Such a bird's eye view of the data is lacking and would add much to this work.

We thank the reviewers for this excellent suggestion. The suggested analysis of comparing profile similarity better demonstrates the point that we were trying to make in this section, which was that the associations that we found through clustering protein localization change profiles could not entirely be found by clustering other similar high–throughput datasets. We have now included the suggested analysis as Figure 7, which compares protein localization change profile similarity with both protein abundance change, and transcriptional profile similarity, in the revised manuscript.

– The last analysis suggests that proteins with different localizations can exhibit a similar change. This is counter–intuitive, and more data could be provided to examine the origin of such cases

Because our analysis captures relative movements in protein localization, it does not depend upon the protein sharing a localization in the untreated wild–type. For example, we show that a subset of proteins moved to the vacuole under rapamycin treatment in subsection “Membrane proteins exhibit a reciprocal pattern of localization changes following hydroxyurea or rapamycin treatment”, despite proteins showing different localizations in the untreated wild–type. We have now provided a clearer representation of this phenomenon using pie charts in Figure 8B, and discussed the origin of heterogeneity for three clusters, S, J and T in the text.

– Looking at Figure 6C, why is cluster I, which contains ribosomal proteins, not enriched in physical interactions? If using BioGrid one does not need to limit oneself to low–throughput data. A large and high–quality dataset can be obtained by taking all physical interactions supported by more than one PMID. Lastly, the "clusters circles" of Figure 6C do not overlay with points.

We thank the reviewer for pointing out this problem.

We have expanded the enrichment analysis for BIOGRID to incorporate all physical interactions supported by more than one PMID. We note that this does not change the number of interactions substantially: we originally analyzed 47,498 non–redundant low–throughput physical interactions and incorporating high–throughput physical interactions supported by more than one PMID only adds 7,656 more interactions.

Upon closer analysis, we found that BIOGRID contains very few interactions between ribosomal subunits: we found only 41 physical interactions between ribosomal subunits even before filtering for redundancy. We therefore decided to also check for enrichment for protein associations that had a “binding” interaction type in the STRING database. The ribosomal clusters are now enriched with the STRING database, but other clusters have lower enrichment in STRING compared to BIOGRID. In the revised manuscript, we show enrichment in both databases side–by–side in Figure 8A.

As we are unable to explain these inconsistencies in BIOGRID and STRING, we have downgraded our claims about protein–protein interactions. Instead of stating that our clusters are not enriched for physical protein–protein interactions in general, we state that we cannot find enrichment in these databases.

Finally, figure 6C no longer appears in the revised manuscript.

10) The authors identified Rtg3 to be a pulsing transcription factor through cluster analysis and conducted time–lapse movies to confirm such dynamics. In the cited reference (Dalal et al., 2014), pulsing was defined as TFs that exhibit dynamic shuttling between nucleus and cytoplasm during the steady–state response to added stresses. In Figure 3, the authors didn't indicate when the stress was added and whether the cells on average have reached steady–state. The authors didn't specify what stress condition each cell represent in Figure 3. Lastly, to analyze "longest pulse duration" (Figure 3C) does not provide a robust measure. The mean/median duration should be presented.

We have the edited the subsection “Stress–responsive transcription factors that exhibit stochastic pulsatility may be controlled by the same mechanism” to clarify this.

We have now indicated when stress (Rapamycin treatment) was added in the caption of Figure 4. Since we observe pulsing in standard media (unstressed cells) over many hours, we are confident that these cells have reached steady state, and that pulsing occurs at steady–state.

Our finding is that Rapamycin alters the pulsing behavior by inducing accumulation of protein in the nucleus, and we have clarified this in the new manuscript. Our observations suggest that the pulsatile dynamics of Rtg3 are similar to Msn2, which also exhibits pulsing under many constant conditions, but accumulates in the nucleus under rapamycin treatment.

Finally, we have now presented an analysis that compares the average duration of nuclear localization, instead of the largest pulse in Figure 4C and find similar results. In order to do so, we needed to reliably identify all the pulses in each trace, and we therefore devised a more robust method for estimating nuclear localization. We describe this briefly in the Materials and methods section; the traces in Figure 4 have been updated with measurements using this new method.

11) Analysis of protein level change. In Figure 6A, the authors compared localization pattern with microarray pattern. However, it would be more natural to compare the protein expression pattern with microarray pattern, and more importantly protein expression with localization changes. It is indeed always difficult to merge different data from different laboratories, but protein expression is readily available from the images as the sum of GFP intensities in the cell.

We have now incorporated protein expression into our analyses. We now show a comparison of protein abundance profile similarity to protein localization similarity in Figure 7 and discussed the results in the test. Our cluster–by–cluster comparison of protein localization change profiles also now shows both the transcript and the protein abundance profiles (Figure 2—figure supplement 1 and Figure 2—figure supplement 2).

We have also included scatterplots comparing the magnitude of the protein localization change profile with the protein abundance change for some screens in Figure 7—figure supplement 1. For most screens, the strongest localization change signals do not co–occur with a significant abundance change, and the strongest abundance changes do not co–occur with a significant localization change.

https://doi.org/10.7554/eLife.31872.027

Article and author information

Author details

  1. Alex X Lu

    Department of Computer Science, University of Toronto, Toronto, Canada
    Contribution
    Software, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
  2. Yolanda T Chong

    Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  3. Ian Shen Hsu

    Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
    Contribution
    Investigation
    Competing interests
    No competing interests declared
  4. Bob Strome

    Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
    Contribution
    Resources, Investigation
    Competing interests
    No competing interests declared
  5. Louis-Francois Handfield

    Department of Computer Science, University of Toronto, Toronto, Canada
    Contribution
    Software, Investigation, Methodology
    Competing interests
    No competing interests declared
  6. Oren Kraus

    Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization
    Competing interests
    No competing interests declared
  7. Brenda J Andrews

    1. Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
    2. Department of Molecular Genetics, University of Toronto, Toronto, Canada
    Contribution
    Funding acquisition, Project administration, Writing—review and editing
    Competing interests
    No competing interests declared
  8. Alan M Moses

    1. Department of Computer Science, University of Toronto, Toronto, Canada
    2. Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
    3. Center for Analysis of Genome Evolution and Function, University of Toronto, Toronto, Canada
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Project administration, Writing—review and editing
    For correspondence
    alan.moses@utoronto.ca
    Competing interests
    No competing interests declared
    ORCID icon 0000-0003-3118-3121

Funding

National Science and Engineering Research Council (Pre-Doctoral Award)

  • Alex X Lu

Canada Research Chairs (Tier II Chair)

  • Alan M Moses

Canada Foundation for Innovation

  • Brenda J Andrews
  • Alan M Moses

Canadian Institutes of Health Research (FDN-143265)

  • Brenda J Andrews

Canadian Institute for Advanced Research (Senior Fellow)

  • Brenda J Andrews

Canadian Institutes of Health Research (MOP-97939)

  • Brenda J Andrews

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Helena Friesen, Taraneh Zarin, and Muluye Liku for valuable comments on the manuscript. Helena Friesen and Matej Usaj provided assistance in retrieving image datasets.

Reviewing Editor

  1. Emmanuel Levy, Reviewing Editor, Weizmann Institute of Science, Israel

Publication history

  1. Received: September 10, 2017
  2. Accepted: March 30, 2018
  3. Accepted Manuscript published: April 5, 2018 (version 1)
  4. Version of Record published: May 4, 2018 (version 2)

Copyright

© 2018, Lu et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,712
    Page views
  • 260
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Cell Biology
    2. Developmental Biology
    Hannah J Green et al.
    Research Article Updated
    1. Cell Biology
    2. Stem Cells and Regenerative Medicine
    Aurore Lhonore et al.
    Research Article