1. Computational and Systems Biology
Download icon

A large accessory protein interactome is rewired across environments

  1. Zhimin Liu
  2. Darach Miller
  3. Fangfei Li
  4. Xianan Liu
  5. Sasha F Levy  Is a corresponding author
  1. Department of Biochemistry, Stony Brook University, United States
  2. Laufer Center for Physical and Quantitative Biology, Stony Brook University, United States
  3. Joint Initiative for Metrology in Biology, United States
  4. Department of Genetics, Stanford University, United States
  5. Department of Applied Mathematics and Statistics, Stony Brook University, United States
  6. SLAC National Accelerator Laboratory, United States
Research Article
  • Cited 4
  • Views 2,173
  • Annotations
Cite this article as: eLife 2020;9:e62365 doi: 10.7554/eLife.62365

Abstract

To characterize how protein-protein interaction (PPI) networks change, we quantified the relative PPI abundance of 1.6 million protein pairs in the yeast Saccharomyces cerevisiae across nine growth conditions, with replication, for a total of 44 million measurements. Our multi-condition screen identified 13,764 pairwise PPIs, a threefold increase over PPIs identified in one condition. A few ‘immutable’ PPIs are present across all conditions, while most ‘mutable’ PPIs are rarely observed. Immutable PPIs aggregate into highly connected ‘core’ network modules, with most network remodeling occurring within a loosely connected ‘accessory’ module. Mutable PPIs are less likely to co-express, co-localize, and be explained by simple mass action kinetics, and more likely to contain proteins with intrinsically disordered regions, implying that environment-dependent association and binding is critical to cellular adaptation. Our results show that protein interactomes are larger than previously thought and contain highly dynamic regions that reorganize to drive or respond to cellular changes.

Introduction

As environmental conditions change, cells undergo programmed alterations that ultimately rewire PPI networks to execute different biological processes. Numerous examples of localized PPI rewiring have been found (Balajee and Geard, 2001; Celaj et al., 2017; Mailand et al., 2013; Marles et al., 2004; Rochette et al., 2014). Yet, little is known about how PPI networks reorganize on a global scale or what drives these changes. One challenge is that commonly-used high-throughput PPI screening technologies are geared toward PPI identification (Gavin et al., 2002; Ito et al., 2001; Tarassov et al., 2008; Uetz et al., 2000; Yachie et al., 2016; Yu et al., 2008), not a quantitative analysis of relative PPI abundance that is necessary to determine if changes in the PPI network are occurring. The murine dihydrofolate reductase (mDHFR)‐based protein-fragment complementation assay (PCA) provides a viable path to characterize PPI abundance changes because it is a sensitive test for PPIs in the native cellular context and at native protein expression levels (Freschi et al., 2013; Remy and Michnick, 1999; Tarassov et al., 2008). Indeed, moderate-scale mDHFR-PCA studies have characterized the dynamics of a subset of known PPIs in yeast, finding that between 15% and 55% change across environments and are frequently driven by changes in protein abundance (Celaj et al., 2017; Rochette et al., 2014; Schlecht et al., 2012; Schlecht et al., 2017).

However, because mDHFR-PCA studies have only considered the dynamics of PPIs that have been identified under standard laboratory growth conditions (rich complete media), they may be providing an incomplete view of global PPI rewiring (Celaj et al., 2017). One possibility is that PPIs identified from a single condition are biased toward ‘immutable’ PPIs that are found in all conditions, and ‘mutable’ PPIs that are present in only some conditions have a large impact on global PPI network dynamics. If true, mutable PPIs would also be expected to be underrepresented in current PPI networks, with important consequences to our understanding of how the protein interactome is organized and how proteins typically interact. Previous work supports the idea that proteins that participate in mutable PPIs have different properties. For example, protein hubs predicted by mRNA co‐expression data to participate in mutable PPIs (‘date’ hubs) have been found to have more genetic interactions and to bridge tightly connected modules in the PPI network (Han et al., 2004). However, the robustness of conclusions drawn from co‐expression data has undergone a vigorous debate (Agarwal et al., 2010; Batada et al., 2006; Batada et al., 2007; Bertin et al., 2007; Yu et al., 2008).

Here, we combine the mDHFR-PCA assay with a double barcoding system (Liu et al., 2019; Schlecht et al., 2017) to quantify the relative in vivo PPI abundance of 1.6 million protein pairs across nine growth conditions. We find that a large majority of PPIs detected in our screen have not been identified as PPIs in standard growth conditions, providing us with a new view of how mutable PPIs contribute to PPI network rewiring.

Results

Defining a multi-condition PPI network

In mDHFR-PCA, two proteins of interest are fused to complementary mDHFR fragments. An interaction between the proteins reconstitutes mDHFR, providing resistance to the drug methotrexate and a growth advantage that is monotonically related to the PPI abundance (Celaj et al., 2017; Freschi et al., 2013; Rochette et al., 2014; Schlecht et al., 2012; Schlecht et al., 2017). We have previously shown that mDHFR-PCA can be adapted into a pooled barcode sequencing assay by fusing two genomic barcodes in vivo using a method called PPiSeq (Schlecht et al., 2017). To generate a large PPiSeq library, all strains from the protein interactome (mDHFR-PCA) collection that were found to contain a protein likely to participate in at least one PPI (1742 X 1130 protein pairs) (Tarassov et al., 2008) were barcoded in duplicate using the double barcoder iSeq collection (Liu et al., 2019), and mated together in a single pool (Figure 1A). Double barcode sequencing revealed that the PPiSeq library contained 1.79 million protein pairs and 6.05 million double barcodes (92.3% and 78.1% of theoretical, respectively, 1741 × 1113 protein pairs), with each protein pair represented by an average of 3.4 unique double barcodes (Figure 1—figure supplement 1). The library was grown under mild methotrexate selection in nine environments for 12–18 generations in serial batch culture, diluting 1:8 every ~3 generations, with a bottleneck population size greater than 2 × 109 cells (Appendix 2 Table S1). Double barcodes were enumerated over 4–5 timepoints by sequencing, and the resulting frequency trajectories (Appendix 2 Table S2) were used to estimate the relative fitness (Appendix 2 Table S3) of each strain, which is a rough measure of the average PPI abundance over a growth cycle (Figure 1B; Levy et al., 2015; Li et al., 2018; Schlecht et al., 2017). We recovered a minimum of two reliable replicate fitness estimates for 1.6 million protein pairs, and downstream analysis was limited to this set. We examined the reproducibility of fitness estimation within an environment by plotting the standard deviation by the mean of replicate fitness measures for each protein pair (Figure 1C and Figure 1—figure supplement 2). We found that fitness estimates are precise for high-fitness strains (putative PPIs), but less precise for low-fitness strains, which are more subject to noise stemming from growth bottlenecks, PCR, and sequencing (Li et al., 2018).

Figure 1 with 5 supplements see all
PPiSeq.

(A) A cartoon of PPiSeq yeast library construction. Strains from the protein interactome collection are individually mated to strains from the double barcoder collection and sporulated to recover haploids that contain a mDHFR-tagged protein and a barcode. Haploids are mated as pools. In diploids, expression of Cre recombinase causes recombination between homologous chromosomes at the loxP locus, resulting in a contiguous double barcode that marks the mDHFR-tagged protein pair. (B) Representative double barcode frequency trajectories over twelve generations of competitive growth. Trajectories are used to calculate a quantitative fitness for each double barcoded strain. (C) Standard error of fitness estimates of protein pairs. The blue and red lines represent the median standard error for a sliding window (width = 0.05) of all fitness-ranked protein pairs and of only the positive protein-protein interactions, respectively. (D) Estimated fitness of strains with different double barcodes representing the same protein pair in the same pooled growth. Positive protein pairs are randomly selected within a fitness window. ORF x Null is a violin plot of the fitness distribution of all interactions with a mDHFR fragment that is not tethered to a yeast protein. DHFR(-) is yeast strains that lack any mDHFR fragment. DHFR(+) is yeast strains that contain a full length mDHFR under a strong promoter. (E) Density plot of the fitness of double barcodes that represent the same putative PPI in the same pooled growth. In (B–E), the data in SD environment are used. (F) Density plot of the normalized mean fitness of the same PPI between two pooled growth cultures in SD environment. PPIs detected in either one growth culture are included. (G) Venn diagram of the number of PPIs identified within our search space by PPiSeq in nine environments (magenta), PPiSeq in SD environment (pink), the interactome-scale protein-fragment complementation screen (PCA, yellow), and the BioGRID database excluding any PPIs previously detected by PCA (blue).

To identify protein pairs that interact, we compared replicate fitness scores for each protein pair against a set of ~17,000 negative control strains that were included in the pool (Figure 1D, ORF x Null). Using putatively positive and negative reference sets, we empirically determined a statistical threshold for each environment with the best balance of precision and recall (positive predictive value (PPV) >61% and true positive rate >41% in SD media, Appendix 1—table 1). In general, protein pairs required replicated high fitness measures (mean >0.18 in SD) and low variance (standard deviation <0.02 in SD) to be identified as a PPI (Figure 1C and D). However, due in part to differences in the strength of methotrexate selection across environments, the minimal fitness of a PPI varied by environment (Figure 1—figure supplement 5 and Appendix 2 Table S4). Quantitative fitness measures between different barcodes marking the same PPI within a growth pool correlate well (0.67 < r < 0.92 for all environments, Spearman’s correlation, Figure 1E and Figure 1—figure supplement 3), as do fitness measures of the same PPI assayed in replicate growth cultures (Spearman’s r = 0.73, Figure 1F).

In total, we identified 13,764 PPIs across nine environments, a 2.9-fold increase over PPIs identified under standard growth conditions (SD media), and a 5.6-fold increase over colony-based mDHFR-PCA (Figure 1G and Figure 1—figure supplement 4). Within our search space, PPIs identified by liquid-growth-based PPiSeq encompass 62% (1544 of 2476 PPIs, PPV > 98.2% in Tarassov et al., 2008, see Appendix 1 for differences between PPVs) of those identified by mDHFR-PCA, but only 7% (925 of 13,259 PPIs) of those identified by other methods. In addition, PPiSeq identified 34% of PPIs (1838 of 5347) that mDHFR-PCA had identified as likely to be PPIs (80% < PPV < 98.2% in Tarassov et al., 2008) but had not called.

Further highlighting similarities to colony-based mDHFR-PCA, PPiSeq is enriched for PPIs within and between membranous compartments (Figure 2A, blue box) and those involved in cell division (purple box), and between related biological processes (Figure 2B, green box). By examining how much PPIs change across environments, we found that some cellular compartments and biological processes were more likely to gain or lose PPIs. For example, the number of PPIs associated with the chromosome are more variable (Figure 2A, orange triangles), as are those involved in transcription (Figure 2B, black triangles), which is consistent with a role for PPI-level regulation of gene expression. Also changing are PPIs involved in protein translation, RNA processing, and ribosome regulation (Figure 2B, brown triangles), which could reflect a global regulation of ribosome production rates in different growth or stress conditions (Brauer et al., 2008; Gasch et al., 2000). However, some processes are less likely to change in PPI number, such as endocytosis, exocytosis, vacuolar organization, and transport of amino acids, lipids, carbohydrates and endosomes.

Functional enrichment of PPIs detected by PPiSeq.

PPI enrichment (red) and variability (blue) across environments of gene ontology cellular compartments (A) and biological processes (B). Red node size is the percent of interacting protein pairs (interaction density) observed for a given pair of GO terms and the node color is the p-value of this percent over a random expectation. Blue node size and color are the variability (coefficient of variation, CV) of interaction densities across nine environments tested. GO terms are hierarchically clustered by the interaction density (red dots). Boxes mark frequently interacting and invariable cellular compartments and biological processes involved in membrane transport and protein maturation (blue and green) and cell division (purple). Barplots show the mean CV of interaction densities for each GO term across all other GO terms. Orange, black and brown triangles highlight three different groups of related GO terms: chromosome, transcription, and translation, respectively.

A large dynamic accessory protein interactome

We partitioned PPIs by the number of environments in which they were identified and defined PPIs at opposite ends of this spectrum as ‘mutable’ PPIs (identified in only 1–3 environments) and ‘immutable’ PPIs (identified in 8–9 environments). Mutable PPIs far outnumbered immutable PPIs, with PPIs identified in only one environment outnumbering all other PPIs combined (Figure 3A and Figure 3—figure supplement 1). Immutable PPIs were likely to have been previously reported by colony-based mDHFR-PCA or other methods, while the PPIs found in the fewest environments were not. One possible explanation for this observation is that previous PPI assays, which largely tested in standard laboratory growth conditions, and variations thereof, are biased toward identification of the least mutable PPIs. That is, since immutable PPIs are found in nearly all environments, they are more readily observed in just one. However, another possible explanation is that, in our assay, mutable PPIs are more likely to be false positives in environment(s) in which they are identified or false negatives in environments in which they are not identified. To investigate this second possibility, we first asked whether PPIs present in very few environments have lower fitnesses, as this might indicate that they are closer to our limit of detection. We found no such pattern: mean fitnesses were roughly consistent across PPIs found in 1 to 6 conditions, although they were elevated in PPIs found in 7–9 conditions (Figure 3—figure supplement 2A). To directly test the false-positive rate stemming from pooled growth and barcode sequencing, we validated randomly selected PPIs within each mutability bin by comparing their optical density growth trajectories against controls (Figure 3B). We found that mutable PPIs did indeed have lower validation rates in the environment in which they were identified, yet putative false positives were limited to ~50%, and, within a bin, do not differ between PPIs that have been previously identified and those that have been newly discovered by our assay (Figure 3—figure supplement 2B). We also note mutable PPIs might be more sensitive to environmental differences between our large pooled PPiSeq assays and clonal 96-well validation assays, indicating that differences in validation rates might be overstated. To test the false-negative rate, we assayed PPIs identified in only SD by PPiSeq across all other environments by optical density growth and found that PPIs can be assigned to additional environments (Figure 3—figure supplement 2C). However, the number of additional environments in which a PPI was detected was generally low (2.5 on average), and the interaction signal in other environments was generally weaker than in SD (Figure 3—figure supplement 2D). To better estimate how the number of PPIs changes with PPI mutability, we used these optical density assays to model the validation rate as a function of the mean PPiSeq fitness and the number of environments in which a PPI is detected. This accurate model (Spearman's r = 0.98 between predicted and observed, Figure 3—figure supplement 3) provided confidence scores (predicted validation rates) for each PPI (Appendix 2 Table S5) and allowed us to adjust the true positive PPI estimate in each mutability bin. Using this more conservative estimate, we still found a preponderance of mutable PPIs (Figure 3—figure supplement 3D). Finally, we used a pair of more conservative PPI calling procedures that either identified PPIs with a low rate of false positives across all environments (FPR <0.1%, see Materials and methods, Appendix 2 Table S6) or removed data from the most extreme environmental perturbation (16°C). Using these higher confidence sets, we still found that mutable PPIs far outnumbered others in the multi-condition PPI network (Figure 3—figure supplement 4).

Figure 3 with 6 supplements see all
A large accessory protein interactome.

(A) Barplot of PPI number binned by the number of environments in which a PPI is observed. Colors indicate PPIs called by both PPiSeq and BioGRID inclusive of mDHFR-PCA (red), PPIs called by PPiSeq that scored high but were not called by mDHFR-PCA (yellow), and PPIs called by PPiSeq that scored low by mDHFR-PCA (blue). (B) Validation rates of PPIs binned by the number of environments in which a PPI is observed. Validations were performed using OD600 trajectories of clones grown in multi-well plates. (C) Mutable and less mutable PPIs form distinct modules in the network. PPIs that are detected in at least five environments (red edges) form two tight core modules. PPIs that are detected in fewer than five environments (blue edges) form a less connected accessory module. Proteins in different modules are labeled with different shapes and colors. The network uses an edge-weighted spring embedded layout. (D) Number of PPIs within and between each community. PPIs detected in at least or fewer than five environments are shown in red and blue, respectively. The size of the square or circle is proportional to the number of PPIs. The number below each community is the number of proteins within each community. (E) Scatter plot of degrees and mutability scores of proteins in each community.

We next examined if mutable PPIs can be distinguished from less mutable PPIs in the PPI network. We first asked whether proteins that participate in less or more mutable PPIs differed in their connectivity (degree distribution) and found that proteins from less mutable PPIs are more likely to be and interact with hubs in the multi-environment PPI network (Figure 3—figure supplement 5A and B). One possible explanation for these findings is that proteins in less mutable PPIs form a ‘backbone’ in the PPI network and that proteins in more mutable PPIs decorate this backbone. Alternatively, proteins that participate in less and more mutable PPIs may be forming distinct modules in the protein interactome, with modules of less mutable PPIs being more highly connected. To distinguish between these two possibilities, we calculated a mutability score for each protein based on the variability in fitness measures across environments for PPIs in which it participates, and compared the mutability score of each protein against the mean mutability score of its neighbors (Materials and methods, Figure 3—figure supplement 5C and D). We found that the neighbors of proteins in less mutable PPIs tend to be in other PPIs with low mutability, suggesting that less mutable PPIs are forming tight ‘core’ modules in the PPI network and that more mutable PPIs form a distinct ‘accessory’ module. To verify this, we applied three network community detection algorithms to our multi-environment PPI network (Clauset et al., 2004; Pons and Latapy, 2005; Rosvall et al., 2009). Each found three major communities that differed by the mean mutability score of their constituents: two core modules composed with a greater proportion of highly connected (hub) proteins with low or intermediate mutability scores, and an accessory module composed of less connected proteins with high mutability scores (Figure 3C–E and Figure 3—figure supplement 6). Using gene ontology term enrichment analysis, we found that proteins in each module were enriched for different cellular compartments. The less mutable core modules are associated with membranous compartments (low mutability) and the actin cytoskeleton (intermediate mutability), while the highly mutable accessory module is associated with the chromosome (Figure 3E and Appendix 2 Table S7).

Properties of mutable PPIs

Previous work has used patterns of mRNA co‐expression to predict which hub proteins are likely to participate in PPIs that are mutable across environments (calling these proteins ‘date hubs’), and found that they have several distinguishing features (Han et al., 2004). However, the robustness of these conclusions has been vigorously debated (Agarwal et al., 2010; Batada et al., 2006; Batada et al., 2007; Bertin et al., 2007; Yu et al., 2008). Here, because we have a direct measure of the mutability of each PPI across environments, we are able to confidently validate or reject predictions made from co-expression data, in addition to testing new hypotheses.

We first asked whether co-expression is indeed a predictor of PPI mutability and found that it is: co-expression mutual rank (which is inversely proportional to co-expression across thousands of microarray experiments) declined with PPI mutability (Figure 4A and Figure 4—figure supplement 1; Obayashi and Kinoshita, 2009; Obayashi et al., 2019). Confirming a second prediction from co-expression data, we found that proteins from less mutable PPIs are more likely to co-localize to the same cellular compartment in standard laboratory conditions than those from more mutable PPIs (Chong et al., 2015), suggesting localization changes drive some changes (Figure 4B; Levy et al., 2014; Rochette et al., 2014). Both the co-expression and co-localization patterns were also apparent in our higher confidence PPI sets (Figure 4—figure supplements 2A, B, 3A and B), indicating that they are not caused by different false positive rates between the mutability bins.

Figure 4 with 6 supplements see all
Properties of mutable and less mutable PPIs.

(A) The co-expression mutual rank for PPIs binned by the number of environments in which the PPI is detected. A higher mutual rank means worse co-expression. Notches are the 95% confidence interval for the median, hinges correspond to the first and third quartiles, and whiskers extend 1.5 times the interquartile range. (B) The percent of protein pairs that have been found colocalized by gene ontology (GO Slim, dashed line) and fluorescence (solid line) (Chong et al., 2015). (C) Spearman correlation between the PPI mutability score and other gene features, binned a gene’s PPI degree. In (B) and (C), the error bars are the standard deviation from 1000 bootstrapped data sets. (D) Pearson correlation between a PPI’s fitness and geometric mean abundance of two interacting proteins in Ho et al., 2018, binned by the number of environments in which a PPI is detected. (E) Examples of non-significant (Erv25 x Shr3) and significant (Akr1 x Any1) predictions. Observed heterodimer fitness (SAB) is plotted against the expectation based on the geometric mean of the two constituent homodimer fitnesses (SAA and SBB). (F) Percent of heterodimers whose fitness changes can be significantly predicted by the geometric mean of the two constituent homodimers, binned by the number of environments in which a PPI is observed.

We next asked what features from existing genome-wide data sets correlate with proteins that are involved in less or more mutable PPIs. We binned proteins by their PPI degree, and, within each bin, determined the correlation between the mutability score and another gene feature (Figure 4C and Figure 4—figure supplement 4, Appendix 2 Table S8) (Costanzo et al., 2016; Finn et al., 2014; Gavin et al., 2006; Holstege et al., 1998; Krogan et al., 2006; Levy and Siegal, 2008; Myers et al., 2006; Newman et al., 2006; Ostlund et al., 2010; Rice et al., 2000; Stark et al., 2011; Wapinski et al., 2007; Ward et al., 2004; Yang, 2007; Yu et al., 2008). These correlations were also calculated using our higher confidence PPI sets, confirming results from the full data set (Figure 4—figure supplements 2C, D, 3C and D). We found that mutable hubs (>15 PPIs) have more genetic interactions, in agreement with predictions from co-expression data (Bertin et al., 2007; Han et al., 2004), and that their deletion tends to cause larger fitness defects. However, these two correlations were weaker or not seen with non-hub proteins. Contradicting predictions from co-expression data, we did not find that mutable hubs are more quickly evolving, as measured by dN/dS (Costanzo et al., 2016; Yang, 2007), but that non-hub proteins (the vast majority of mutable PPIs) are. In addition, we found that proteins that participate in mutable PPIs tend to have lower mRNA and protein expression levels than those in less mutable PPIs (Holstege et al., 1998; Newman et al., 2006), perhaps because highly abundant proteins are more difficult to post-translationally regulate. As might be expected, we also found that mutable hubs, but not non-hubs, are more likely to participate in multiple protein complexes than less mutable proteins (Figure 4—figure supplement 5; Costanzo et al., 2016). Finally, confirming another prediction from co-expression data (Singh et al., 2007), we found that proteins in mutable PPIs are more likely to contain intrinsically disordered regions, suggesting that they may adopt different conformations in some environments or intracellular contexts to promote a PPI. Taken together, these correlations largely confirm predictions from co-expression data, but highlight differences between hub and non-hub proteins in the core and accessory PPI networks.

Given the above results, we suspected that changes in mutable PPIs may be more likely to be driven by post-translational regulation than protein abundance changes (Taylor et al., 2009). We first tested if the relationship between protein abundance and PPI abundance changes with PPI mutability. For each PPI, we compared our estimated PPI abundance in standard rich media (fitness in SD) to a meta-analysis estimate of mean protein abundance of each protein pair (Ho et al., 2018). These two measures correlate weakly (Pearson's r = 0.16, Figure 4—figure supplement 6A). However, when binned by PPI mutability, we find that less mutable PPIs are more strongly correlated than highly mutable PPIs (Figure 4D). Second, we asked if quantitative changes in PPI abundance across conditions are better predicted by changes in protein abundance for less mutable PPIs. Using homodimer PPI abundance estimates (fitnesses) as a proxy for protein abundance (Stynen et al., 2018) in a mass-action kinetics model (Materials and methods), we used linear regression to test if changes in the estimated abundance of a set of PPIs (1212 heterodimers) are explained by our proxy measure of the constituent protein abundances (180 homodimers). We find that some heterodimers fit this simple model well (Figure 4E, right), while others fit poorly (Figure 4E, left), with 19% of heterodimers explained by the model (Figure 4—figure supplement 6B, Materials and methods). We next used logistic regression to determine what features may underlie a good or poor fit to the model (Figure 4—figure supplement 6C) and found that PPI mutability was the best predictor, with more mutable PPIs being less frequently explained (Figure 4F). Unexpectedly, mean protein abundance was the second best predictor, with high abundance predicting a poor fit to the model, particularly for less mutable PPIs (Figure 4—figure supplement 6D and E). Taken together, these data suggest that mutable PPIs are subject to more post-translational regulation across environments and that high basal protein abundance may saturate the binding sites of their partners, limiting the ability of gene expression changes to regulate PPIs.

Rewiring of the glucose transporter network

We have shown above that most protein network rewiring occurs within a dynamic accessory module and that changes in the module are more likely to reflect condition-specific protein localization, binding, or other mechanisms, rather than by changes in protein abundance. To explore whether coordinated changes are also occurring in the core module and what drives these changes, we examined how core PPIs involved in carbohydrate transport change across environments (Figure 5A). Glucose is transported into the cell using membrane-spanning hexose transporters (HXT genes) (Boles and Hollenberg, 1997; Kruckeberg, 1996; Özcan and Johnston, 1999). The major Hxt transporters can be divided by their extracellular glucose binding into low affinity (Hxt1, Hxt3), moderate affinity (Hxt5), and high affinity (Hxt2, Hxt4, Hxt6, Hxt7) (Ozcan and Johnston, 1995; Reifenberger et al., 1995; Verwaal et al., 2002), two of which were excluded from our screen (Hxt4, Hxt6). For PPIs involving at least one HXT gene product, we found that there were only minor differences in PPI abundance between most environments that contained glucose as their sugar source. However, several environments showed marked differences (NaCl, 16°C, Doxorubicin, Raffinose), presumably reflecting how transport is altered in these environments. We examined changes in the Raffinose (low glucose [Özcan and Johnston, 1999]) and NaCl (high glucose, osmotic stress [Verwaal et al., 2002]) environments more closely by plotting a carbohydrate transport subnetwork in each environment (Figure 5B and C) and validating differences using optical density growth trajectories of 90 randomly chosen PPIs (Figure 5D). Most Hxt PPIs that were detectable in SD were lost in the low-glucose Raffinose environment. However, Hxt5, the only glucose transporter expressed during the stationary phase (Verwaal et al., 2002; Wu et al., 2004), maintained most of its PPIs (Figure 5B and C). Surprisingly, the high-affinity glucose transporter Hxt7, whose mRNA and protein expression has been reported to increase in Raffinose (Lai et al., 2007; Ye et al., 2001), lost most of its PPIs, suggesting other factors such as protein endocytosis or degradation could be important to regulating its activity over our growth cycles (Ye et al., 2001). In NaCl, fewer PPIs were detectable than in SD, but some PPIs, especially those involving Hxt5 increased in relative abundance (Figure 5B and C). Previous work has found that HXT5 expression increases during salt stress (Verwaal et al., 2002), suggesting the change in Hxt5 PPIs may be a direct consequence of its change in abundance. Taken together and consistent with mechanisms of core network regulation described above, these results suggest that the changes in the glucose transport subnetwork are mainly driven by changes in protein expression (Celaj et al., 2017).

Carbohydrate transport network rewiring as captured by PPiSeq.

(A) Heatmap of abundances (fitnesses) of PPIs involved in carbohydrate transport across different environments. (B) Boxplots of fitnesses of PPIs involving Hxt proteins in SD, Raffinose and NaCl environments. The bottom of each box, the line drawn in the box, and the top of the box represent the 1st, 2nd, and 3rd quartiles, respectively. The whiskers extend to ±1.5 times the interquartile range. (C) Circular network plots of PPIs containing Hxt proteins in SD, Raffinose, and NaCl environments. Nodes are proteins and colors are as in (A). Node size is proportional to its degree in the multi-environment PPI network. Edge width is proportional to abundance in each environment. (D) Scatter plot of fitness changes relative to SD as measured by PPiSeq and clonal growth dynamics for randomly chosen carbohydrate-transport PPIs in Raffinose (80 PPIs) and NaCl (90 PPIs).

Size of the pan-environment protein interactome

The yeast protein interactome has been previously estimated to contain 37,600 to 75,500 detectable interactions in standard growth conditions (Hart et al., 2006; Sambourg and Thierry-Mieg, 2010). We have shown here that much of the PPI network is rewired across conditions (Figure 3A), suggesting that the pan-environment PPI network is likely to be larger than these projections. To estimate how many PPIs remain to be discovered within our search space (~9% of protein pairs), we constructed several hypothetical PPI-by-environment observation matrices by randomly assigning a PPI observation at a rate proportional to its confidence score within an environment (Materials and methods, Appendix 2 Table S5). We then plotted PPI accumulation curves across permuted orders of environment addition (Oksanen et al., 2019) and found that the number of observed PPIs is beginning to approach saturation, with fewer PPIs accumulating with the addition of each new environment (Figure 6). Borrowing a species richness estimator from ecology (Oksanen et al., 2019), we estimate that there are ~10,840 true interactions within our search space across all environments, ~3 fold more than are detected in SD (note difference to Figure 3, which counts observed PPIs). This analysis shows that the number of PPIs present across all environments is much larger than the number observed in a single condition, but that it is feasible to discover most of these new PPIs by sampling a limited number of conditions.

The estimated number of true PPIs discovered by PPiSeq using repeated sampling of data in permuted orders of environment addition.

Boxplots summarize the distribution of the number of unique PPIs across permutations. The bottom of each box, the line drawn in the box, and the top of the box represent the 1st, 2nd, and 3rd quartiles, respectively. The whiskers extend to ±1.5 times the interquartile range. Overlayed solid red lines and dashed red lines are the Kindt exact accumulation curves and the bootstrap estimators of the total number of unique PPIs across infinite environments for each simulation, respectively.

Discussion

We developed a massively parallel quantitative PPI assay to characterize how the yeast protein interactome changes across conditions. We found a large, previously undersampled accessory protein interactome that changes across conditions and is enriched for proteins involved in transcription, RNA processing, and translation. Mutable PPIs are less likely to co-express, co-localize, and be explained by simple mass action kinetics, and more likely to contain intrinsically disordered regions, evolve quickly, and be of low abundance in standard conditions. Taken together, these results suggest that major rewiring within the protein interactome is driven to a larger extent by post-translational regulation, with protein abundance changes being more likely to tune levels of relatively immutable PPIs in the core interactome.

Results presented here and elsewhere (Huttlin et al., 2020) suggest that PPIs discovered under a single condition or cell type are a small subset of the full protein interactome emergent from a genome. We sampled nine diverse environments and found approximately threefold more interactions than in a single environment. However, the discovery of new PPIs began to saturate, indicating that most condition-specific PPIs can be captured in a limited number of conditions. Testing in many more conditions and with PPI assays orthogonal to PPiSeq will undoubtedly identify new PPIs, however a more important outcome could be the identification of coordinated network changes across conditions. Using a test set of ~1.6 million (of ~18 million) protein pairs across nine environments, we find that specific parts of the protein interactome are relatively stable (core modules) while others frequently change across environments (accessory modules). However, two important caveats of our study must be recognized before extrapolating these results to the entire protein interactome across all environment space. First, we tested for interactions between a biased set of proteins that have previously been found to participate in at least one PPI as measured by mDHFR-PCA under standard growth conditions (Tarassov et al., 2008). Thus, proteins that are not expressed under standard growth conditions are excluded from our study, as are PPIs that are not detectable by mDHFR-PCA or PPiSeq. It is possible that a comprehensive screen using multiple orthogonal PPI assays would alter our observations related to the relative dynamics of different regions of the protein interactome and the features of mutable and immutable PPIs. Second, we tested a limited number of environmental perturbations under similar growth conditions (batch liquid growth). It is possible that more extreme environmental shifts (e.g. growth as a colony, anareobic growth, pseudohyphal growth) would introduce new accessory modules or alter the mutability of the PPIs we detect. Nevertheless, results presented here provide a new mechanistic view of how the cell changes in response to environmental challenges, building on the previous work that describes coordinated responses in the transcriptome (Brauer et al., 2008; Gasch et al., 2000) and proteome (Breker et al., 2013; Chong et al., 2015).

We have shown that our iSeq platform is capable of building libraries that exceed one billion interactions (Liu et al., 2019; Schlecht et al., 2017), making it feasible to expand assay coverage to an entire protein interactome. While we used reconstruction of C-terminal-attached mDHFR fragments as a reporter for PPI abundance, similar massively parallel assays could be constructed with different PCA reporters or tagging configurations to validate our observations and overcome false negatives that are specific to our reporter. Indeed, the recent development of ‘swap tag’ libraries, where new markers can be inserted C- or N-terminal to most genes (Weill et al., 2018; Yofe et al., 2016), in combination with our iSeq double barcoder collection (Liu et al., 2019), makes extension of our approach eminently feasible.

Our assay detected subtle fitness differences across environments (Figure 3—figure supplement 1B and C), which we used as a rough estimate for changes in relative PPI abundance. While it would be tempting to use fitness as a direct readout of absolute PPI abundance within a cell, non-linearities between fitness and PPI abundance may be common and PPI dependent. For example, the relative contribution of a reconstructed mDHFR molecule to fitness might diminish at high PPI abundances (saturation effects) and fitness differences between PPIs may be caused, in part, by differences in how accessible a reconstructed mDHFR molecule is to substrate. In addition, environmental shifts might impact cell growth rate, initiate a stress response, or result in other unpredictable cell effects that impact the selective pressure of methotrexate and thereby fitness (Figure 1—figure supplements 2, 3 and 5). Finally, our assays were performed across cycles of batch growth meaning that changes in PPI abundance across a growth cycle (e.g. lag, exponential growth, saturation) are coarse grained into one measurement. While this method potentially increases our chance of discovering a diverse set of PPIs, it might have an unpredictable impact on the relationship between fitness and PPI abundance (Li et al., 2018). To overcome these issues, strains containing natural or synthetic PPIs with known abundances and intracellular localizations could be spiked into cell pools to calibrate the relationship between fitness and PPI abundance in each environment. In addition, continuous culturing systems may be useful for refining precision of growth-based assays such as ours.

PPIs have often been reported as a qualitative presence or absence of a PPI, but the propensity of two proteins expressed in the same cell to form a complex is a continuum. For some analyses in this work, we use qualitative statistical calls to identify ‘positive’ PPIs, but our approach preserves and analyzes a quantitative signal for each PPI. Quantitative assays presented here and elsewhere (Celaj et al., 2017; Diss and Lehner, 2018; Rochette et al., 2014; Schlecht et al., 2012; Schlecht et al., 2017) hold promise to shift the paradigm of high-throughput PPI assays from PPI detection to in vivo PPI characterization. This will require novel analyses to re-conceptualize the PPI network as a continuum of interaction probabilities that are dependent not only on changes in protein abundance, but also on post-translational modifications, intracellular localization, steric effects, and competitive binding (Stynen et al., 2018). One important goal would be to estimate an in vivo ‘functional binding affinity’ for each PPI — an important analog to in vitro binding affinity that reports how PPI abundance scales with the abundance of its constituent proteins (Kastritis and Bonvin, 2013). Here, we use homodimer abundance as a proxy for protein abundance. However, genome-wide mRNA abundance measures could be used as a proxy for protein abundance or protein abundance could be measured directly in the same pool (Levy et al., 2014) by, for example, attaching a full length mDHFR to each gene using ‘swap tag’ libraries mentioned above (Weill et al., 2018; Yofe et al., 2016). Changes in the functional affinity across environments would point to other mechanisms of in vivo regulation that could be dissected in high-throughput by combining PPiSeq with mutagenesis of interacting proteins or trans-acting factors (Diss and Lehner, 2018).

Materials and methods

PPiSeq library construction

Construction of diploid PPiSeq library

Request a detailed protocol

A interactome-scale protein-fragment complementation assay (PCA) screen (Tarassov et al., 2008), found 2770 PPIs with a positive predictive value (PPV) of 98.2%. In this study, MATa bait (F[1,2], 1757 strains) and MATα prey (F[3], 1135 strains) PCA strains that participated in a PPI with > 80% PPV (~10,000 PPIs) in Tarassov et al., 2008 were selected for barcoding. Each strain was barcoded in duplicate, pooled with strains of the same mating type, and mated to opposite mating type strains as part of a pool to generate ~4 double barcoded strains per PPI. Strains lost at each stage of this process are detailed in Supplementary file 1. The final diploid PPiSeq pool contained ~6 million double barcodes, representing ~1.6 million protein pairs. Controls added to this pool are discussed later.

Construction of haploid PPiseq libraries

Request a detailed protocol

Haploid PCA strains were picked from Yeast-Interactome Collection (Dharmacon, YSC5849) using the ROTOR HDA (SINGER instruments) and mated as arrays to the double barcoder collection on YPD agar plates (Liu et al., 2019). Following a 24 hr incubation at 30°C, diploids were selected by replicating colonies onto YPD + Nat + G418 plates (bait-F[1,2]) or YPD + Hyg + G418 plates (prey-F[3]) and incubating at 30°C for 48 hr. Selected diploids were replicated onto the sporulation plates (Baryshnikova et al., 2010), sealed with parafilm, and incubated at room temperature for a week. Sporulated bait-F[1,2] diploids were replicated onto SC - Met - Lys - Cys - Arg - His + Canavanine + G418 + Nat plates and incubated at 30°C for 96 hr to select MATa barcoded bait (F[1,2]) strains. Sporulated prey-F[3] diploids were replicated onto SC - Met - Lys - Cys - Arg - Leu + Canavanine + G418 + Hyg plates and incubated at 30°C for a week to select for MATα barcoded prey (F[3]) strains. To further purify barcoded haploids, they were replicated again onto the same selection plates and incubated at 30°C for another 48 hr. Purified haploids were stored at -80°C in 384-well plates.

Generation of the diploid PPiSeq library by bulk mating

Request a detailed protocol

Frozen barcoded F[1,2] and F[3] strains were thawed and replicated into 384-well plates with SC - Met - Lys - Cys - Arg - His + Canavanine + G418 + Nat and SC - Met - Lys - Cys - Arg - Leu + Canavanine + G418 + Hyg liquid media, respectively. The cells were grown to saturation at 30°C for 96 hr. All F[1,2] and F[3] clones were pooled by pipetting, resulting in 2.13 X 1010 and 3.36 X 1010 cells, respectively. F[1,2] and F[3] pools were each transferred to independent flasks of 1 L YPD + G418 liquid media and grown at 30°C for 24 hr. The two cell pools were mixed (2 L, 5.36 X 1011 cells), pelleted, resuspended in 50 mL water, plated onto 46 YPD plates at a density of ~1.15 X 1010 cells per plate, and incubated for 24 hr at 30°C to mate. All cells were scraped from YPD plates and pooled in ~250 mL of water. The number of cells in the pool was counted (7 X 1011 cells) and all cells were plated onto 265 YPD + Nat + Hyg + G418 plates at equal cell densities (~2.5 X 109 cells per plate). Cells were incubated at 30°C for 48-h and then replica plated onto another 265 YPD + Nat + Hyg + G418 plates. After another 48 hr incubation at 30°C, cells were scraped from the 265 plates and pooled in ~1.3 L of water. All cells (~1.67 X 1012) were spun down at 1500 g, resuspended with 8.5 L YP + 2% galactose liquid media, and grown at 30°C for 48 hr. Cells were counted (~2.88 X 1012) and ~44.4% of cells were transferred into 16 L SC-Ura liquid media, and incubated for 72 hr at 30°C. Cells were 1:10 diluted into another 16 L SC-Ura liquid media and grown for another 48-h. Finally, all the cells were collected to form the pooled diploid PPiSeq library.

Construction of barcoded DHFR-fragment control strains

Overview

Bait or prey PCA constructs that nonspecifically bind to other tagged proteins will result in false positive interactions calls. To screen for these promiscuous constructs (Tarassov et al., 2008), we constructed barcoded strains that constitutively express one of four unlinked DHFR fragments under the TEF1 promoter at the HO locus: linker-F[1,2], F[1,2], linker-F[3], and F[3], where the linker codes for a 10 amino acid (Gly.Gly.Gly.Gly.Ser)2 flexible polypeptide. These strains were mated to the haploid PPiSeq libraries to generate diploid DHFR-fragment control (ORF x Null) strains. ORF x Null control strains were spiked into the final diploid PPiSeq library. Identification and removal of promiscuous bait or prey constructs because of interactions with DHFR-fragment controls is described in Appendix 1.

Construction and barcoding haploid strains that express DHFR fragments

Request a detailed protocol

The linker-DHFR[1,2]-NatMx and DHFR[1,2]-NatMX fragments were cloned from the plasmid pAG25 linker-DHFR[1,2]-NatMX (Tarassov et al., 2008) with primers oSL368 and oSL373, and oSL369 and oSL373, respectively (Supplementary file 2). Similarly, the linker-DHFR[3]-HygMx and DHFR[3]-HygMX fragments were cloned from the plasmid pAG32 linker-DHFR[3]-HygMX (Tarassov et al., 2008) with primers oSL368 and oSL373, and oSL370 and oSL373, respectively. These primers add BamHI and XhoI restriction sites on either end of each amplicon. BamHI and XhoI restriction sites were used to subclone amplicons downstream of the TEF1pr sequence in the pS413 TEF1pr-His3 plasmid. These plasmids were used as a PCR template to construct homologous recombination cassettes for integration at the HO locus. PCR was carried out using primers oSL378 and oSL379, which add 40-nucleotides of homology to the HO locus to either end of the aplicons. Amplicons were integrated into the HO locus of BY4741 (MATα, his3Δ1, leu2Δ0, met15Δ0, ura3Δ0) or BY4742 (MATα, his3Δ1, leu2Δ0, lys2Δ0, ura3Δ0) using standard lithium acetate transformation and selected on YPD + Nat or YPD + Hyg, respectively. Each of these strains was barcoded twice, as described above, resulting in strains ySL235-241 (Supplementary file 3).

Bulk mating of barcoded DHFR-fragment control strains with haploid PPiSeq libraries

Request a detailed protocol

Pooled F[1,2] and F[3] haploid PPiSeq libraries were constructed as described above. Approximately 1.5 × 109 cells of the pooled MATa PPiSeq library (~2×1010 cells) and ~1.5×109 cells of the pooled MATα PPiseq library (~3×1010 cells) were inoculated into 75 mL YPD + Nat + G418 and 75 mL YPD + Hyg + G418 liquid media, respectively, and grown for 16 hr. Meanwhile, barcoded MATa and MATα fragment control strains were each inoculated into 3 mL YPD + Nat + G418 or YPD + Hyg + G418 liquid media, respectively, and grown for 24 hr. 1 mL of each of the four barcoded MATa fragment control strains (ySL235-238) was added to 80 mL YPD + Nat + G418 liquid media and grown for 16 hr. Similarly, 1 mL of each of the four barcoded MATα fragment control strains (ySL239-242) was added to 80 mL YPD + Hyg + G418 liquid media and grown for 16 hr. These fragment control strain libraries were each mixed with the corresponding haploid PPiSeq library at equal cell numbers to yield mating pools of 2 × 1010 cells. Mating pools were pelleted, resuspended in water and plated on two YPD plates at a density of 1010 cells/plate. Cells on mating plates were incubated for 24 hr at 30°C, scraped and pooled in water (~3.6×1010 cells). Cells were plated onto 10 YPD + Nat + Hyg + G418 plates at equal cell densities, incubated for 48 hr at 30°C, and replica plated onto another 10 YPD + Nat + Hyg + G418 plates. After another 48 hr incubation at 30°C, cells were scraped, pooled in water, counted (~6×1010 for each mating), spun down, transferred to 350 mL YP + Galactose liquid media, and grown for 24 hr at 30°C. Next, 120 mL cells from each mated pool were transferred into 1 L SC - Ura liquid media and incubated for 48 hr at 30°C. Cells from each pool were diluted 1:10 into another 1 L of SC - Ura liquid media and grown for another 48 hr. The two polls were mixed at a ratio such that the final pool contains ~2000 copies of each genotype.

Adding spike-in controls to the diploid PPiSeq library

Request a detailed protocol

We spiked-in several different barcoded control strains into the diploid PPiSeq library at various frequencies to aid in the downstream data analysis (Supplementary file 4). In addition to DHFR-fragment controls developed here, several other previously developed controls were spiked-in: 10 positive control strains containing a full length of mDHFR (DHFR+), 100 negative control strains lacking a mDHFR (DHFR-), a set of 70 likely protein-protein interaction pairs (positive reference set or PRS) (Liu et al., 2019; Yu et al., 2008), and a set of 67 random pairs (random reference set or RRS) (Liu et al., 2019; Yu et al., 2008). After mixing, the pool of 1.5 × 1011 cells was centrifuged, washed with SC-Ura liquid media, transferred into 6 L SC-Ura liquid media, and grown at 30°C for 24 hr to form the final PPiSeq library.

Cell growth

Request a detailed protocol

The final PPiSeq library was grown in serial batch culture in 14 environments with 0.5 μg/mL methotrexate, 9 of which were used for PPI detection (Supplementary file 5, see below for a discussion why some environments could not be used). The PPiSeq library was grown in the SD environment twice, with replicates being performed in different labs and at different times (Figure 1F). For each environment, ~7.4×109 of frozen cells were inoculated into 1.2 L media in a 2 L Delong flask (Bellco). The cells were grown for a total of 21 generations, bottlenecking 1:8 every 48 hr (~3 generations between transfers). Bottlenecks were performed by pelleting 150 mL of the evolution and then transferring the pellet into 1.2 L of fresh media. At each bottleneck, the cell concentration was measured to estimate the number of generations that passed. The ~1.05 L of cells remaining after bottlenecking was centrifuged, resuspended in water (~1×109 cells/mL) and then stored in −20 °C for downstream processing.

The methotrexate concentration chosen (0.5 μg/mL) was previously determined to cause a ~30% fitness deficit for cells that lack DHFR in SD (Schlecht et al., 2017). Prior to barcode sequencing, we assessed whether this methotrexate concentration causes a similar fitness deficit in the other environments by monitoring growth of cells that lack DHFR (DHFR- control stains) in that environment with or without the addition of methotrexate (Figure 1—figure supplement 5). To our surprise, the fitness cost of 0.5 μg/mL methotrexate varied considerably between environments, with some environments showing similar growth curves of the DHFR- cells in the presence and absence of methotrexate. Because our ability to detect PPIs depends on methotrexate exerting a large fitness cost in the absence of any DHFR molecules, five environments with small fitness costs were not processed further. In future studies, we recommend first tuning the methotrexate concentration in each environment so that all environments cause a similar fitness deficit.

Barcode sequencing

Request a detailed protocol

Genomic DNA was extracted using the MasterPure Yeast DNA purification Kit (epicentre) with modifications, as described (Schlecht et al., 2017). Double barcodes were then amplified using a two-step PCR protocol (Levy et al., 2015; Liu et al., 2019). First, a four-cycle PCR was performed with OneTaq polymerase (New England Biolabs) in 200 reactions (125 μL/reaction), with 500 ng of genomic DNA template per reaction (~500 copies per double barcode total). Primers for this reaction follow this format:

  • Forward:

  • ACACTCTTTCCCTACACGACGCTCTTCCGATCTNNNNNNNNXXXXXTTAATATGGACTAAAGGAGGCTTTT

  • Reverse:

  • CTCGGCATTCCTGCTGAACCGCTCTTCCGATCTNNNNNNNNXXXXXXXXXTCGAATTCAAGCTTAGATCTGATA.

The Ns in these sequences correspond to any random nucleotide and are used as Unique Molecular Identifiers (UMIs) in the downstream analysis to remove skew in the counts caused by PCR jackpotting. The Xs correspond to a one of several multiplexing tags, which allows different samples to be distinguished when loaded on the same sequencing flow cell. The complete list of all primers used here are in Appendix 2 Table S3 of Liu et al., 2019. The PCR products were pooled and purified with NucleoSpin Gel and PCR Clean-up columns (Macherey-Nagel) at 20 reactions per column. A second 22-cycle PCR was performed with PrimeSTAR HS polymerase (Takara) in 20 X125 μL reactions. For each reaction, 30 μL of purified product from the previous PCR was used as template, and primers were standard Illumina paired-end ligation primers (pE1 and pE2 [Liu et al., 2019]). PCR products were pooled and purified by NucleoSpin Gel and PCR Clean-up columns (Macherey-Nagel) at five reactions per column. Amplicons were further purified by agarose gel electrophoresis. Because barcode amplicons may form heteroduplex DNA, with a different electrophoretic mobility, the complex PCR product typically formed a smear with apparent sizes between 400 bp and 3 kb. To prevent biases that may be caused by differences between barcodes in the propensity to form heteroduplex molecules, we cut out an agarose chunk that spanned this entire smear and used this for sequencing. Purified amplicons were quantified by Qubit fluorometry (Life Technologies, Q32854), pooled, and pair-end sequenced on Illumina Hiseq 4000 with 25% balanced DNA spike-in.

Barcode sequencing analysis

Barcode counting

Request a detailed protocol

Barcode reads were processed with custom written software in Python, as described (Levy et al., 2015; Schlecht et al., 2017), with modifications (available in Github https://github.com/sashaflevy/PPiSeq). Briefly, sequences were first sorted by their multiplexing tags, and then parsed to isolate the two barcodes (26 base pairs each) and two UMIs (eight base pairs each). Barcode reads were removed if they failed to pass any of three quality filters: (1) The average Illumina quality score for both barcode regions must be greater than 30, (2) the first barcode must match the regular expression: '\D*?(.ACC|T.CC|TA.C|TAC.)\D{4,7}?AA\D{4,7}?TT\D{4,7}?TT\D{4,7}?(.TAA|A.AA|AT.A|ATA.)\D*', (3) the second barcode must match the regular expression: '\D*?(.ACC|T.CC|TA.C|TAC.)\D{4,7}?AA\D{4,7}?AA\D{4,7}?TT\D{4,7}?(.TAC|T.AC|TT.C|TTA.)\D*'. Each barcode (of 2) was independently clustered with Bartender (default settings except z = −1) (Zhao et al., 2018). Bartender cluster centroids (with associated read indices) were matched to the known barcode sequences (Liu et al., 2019) if they were less than two mismatches away (Levenshtein distance). These matches provided a list of all read indices for a particular known barcode. The read indices for all unique double barcodes were identified, and the number of unique UMI combinations at these indices was used as the ultimate count for each double barcode. For all time points and conditions, the average double barcode read count was greater than 30.

Correction for putative PCR chimeras

Request a detailed protocol

PCR chimeras are two barcodes that stem from two different templates that are merged during PCR or sequencing (Jaffe et al., 2019; Schlecht et al., 2017; Sinha et al., 2017). PCR chimeras are most easily detected by the presence of an erroneous double barcode (that is not in the pool) made from two single barcodes that are in the pool, but paired with other barcodes. However, PCR chimeras can also inflate counts of double barcodes that are in the pool. We have previously found that the number of PCR chimeras for a barcode pair (BC1-BC2) scales linearly with the product of the total count of each constituent barcode in the pool across all double barcodes (BC1*BC2) (Schlecht et al., 2017). This linear relationship can vary subtlety from sample to sample. To determine the relationship for each sample, we therefore fit a line (using the lm() function in R) for each BC1-BC2 combination that did exist in the experimental pool at each time point in each environment. This line was used to estimate the expected number of PCR chimeras for each BC1-BC2 combination that did exist in the pool. The expected number of PCR chimeras was subtracted from the observed BC1-BC2 read count to generate a corrected read count.

Fitness estimation

Request a detailed protocol

Fitness of each double barcode was estimated by likelihood maximization using Fit-Seq (default settings) (Li et al., 2018). Some double barcodes were present in the pool at low or undetectable frequencies, making fitness estimation unreliable. Prior to performing Fit-Seq, we therefore merged all lineages that were likely to result in a poor fitness estimate into a single large lineage. If a lineage met any of the following criteria, it was merged: (1) it contains less than three times points with a read count greater than zero, (2) it has no time point with greater than four reads, or (3) it has less than 10 reads across all time points. In all environments, the merged lineage trajectory was similar to negative control lineages, indicating that merged lineages are likely to be negative for PPIs. To remove poor fitness estimates that were not initially merged, we calculated the difference (d) between the observed trajectory and the predicted trajectory (from Fit-Seq) given the estimated fitness:

d = (Predicted counti  Observed countiObserved counti)2

where i is the generation number.

We defined lineages with poor fitness estimates as those with d >= 19 and removed them from downstream analyses.

Identification of PPIs

Request a detailed protocol

In each environment and for each protein pair, we calculated a mean fitness value (f) and a p-value (p) against negative controls from its replicate fitness measurements. We called PPIs using a combination of f and p by using a dynamic threshold that rendered the best balance between precision and recall, as assessed using reference sets constructed from the BioGRID database. We removed PPIs containing ‘promiscuous proteins’ by identifying proteins that interact with untethered mDHFR fragment controls. Details can be found in Appendix 1.

PPI validation by split mDHFR clonal growth dynamics

Request a detailed protocol

MATa (BAIT-DHFR-F[1,2]-NatMX) and MATα (PREY-DHFR-F[3]-HgyMX) PCA strains were cherry-picked from the Yeast-Interactome Collection (Dharmacon, YSC5849) and mated on YPD plates using a ROTOR HDA (SINGER instruments). A negative control diploid strain that lacks DHFR was generated by mating a MATa (HO::NatMX) strain with a MATα (HO::HygMx) strain. Following mating, cells were plated onto YPD + Nat + Hyg agar and grown for 48 hr at 30°C. Each diploid colony was re-arrayed into 3 wells of YPD + Nat + Hyg liquid media in the same 96-well plate. Negative control diploids were manually inoculated into 6 wells of each 96-well plate. Diploids (PPIs) on each plate were randomly picked using ‘sample’ function in R (replace = FALSE) from PPIs that meet specific requirements. Plates were grown for 24 hr at 30°C, and then stored in 15% glycerol at −80°C. Frozen cells were thawed and inoculated into a new 96-well plate filled with yeast nitrogen base + ammonium sulfate + His + Leu + Ura + glucose and grown for 48 hr at 30°C. Cells were next inoculated into a fresh 96-well plate containing the condition media and grown for 48 hr at 30°C. Growth conditions are as listed in Supplementary file 5, plus an additional 72 mg/L uracil, since these diploids do not contain URA3. For growth curve measurements, cells were re-inoculated into fresh media, and the optical density (OD600) of each culture was monitored every 15 mins for 48 hr using a GENios microplate reader (Tecan). The optical density trajectory of each strain was also measured in the absence of methotrexate (MTX-) as a control. The area under the curve (AUC) for each well was calculated as the sum of all OD readings right after saturation (40 hr for MTX + and 25 hr for MTX-). We compared the replicate AUCs of each protein pair (AUCtarget MTX + - AUCtarget MTX -) against those of negative controls (AUCcontrol MTX + - AUCcontrol MTX -) using a one-sided Welch’s t-test, and calculated an adjusted p-value (Q-value) for each protein pair. A protein pair was considered as a validated PPI if Q-value <= 0.05. Since there is a marginal difference of AUC between positive PPIs and the negative controls in the absence of MTX, we did not measure AUCs without MTX when validating PPI dynamics across different conditions. The change in AUC for a protein pair in a specific condition compared to SD was calculated as follows:

AUC changecondition =(AUCtarget MTX+  AUCcontrol MTX+AUCcontrol MTX+)condition  (AUCtarget MTX+  AUCcontrol MTX+AUCcontrol MTX+)SD

Network density within Gene Ontology terms

Request a detailed protocol

Gene Ontology (GO) terms of each protein were obtained from SGD (20190405, https://downloads.yeastgenome.org/curation/literature/go_slim_mapping.tab). Non-informative GO terms (‘cellular_component’, ‘biological_process’, ‘molecular_function’, ‘not_yet_annotated’, ‘other’) were removed. GO terms cover three domains: cellular compartment (CC), biological process (BP), and molecular function (MF). In this analysis, a PPI is considered as an interaction between two GO terms. For each GO term interaction, the interaction density was calculated as the ratio of the number of PPIs identified over the number of protein pairs assayed. To estimate GO term enrichment in our PPI network, we constructed 1000 random networks by replacing each bait or prey protein that was involved in a PPI with a randomly chosen protein from all proteins in our screen. This randomization preserves the degree distribution of the network. The interaction density for a GO term pair was calculated across all random networks and the distribution of network densities was used to assess statistical significance of the real network using a one-way Fisher-Pitman permutation test (Hothorn et al., 2006). The coefficient of variation (CV) of interaction density is the variability in PPI number for each GO term pair across all nine environments tested. The variability for an individual GO term was calculated by averaging the CVs for all GO term interactions in which it participates.

Modeling validation rate

Request a detailed protocol

For each PPI, we obtained three measurements: the mean fitness value (f), the log10(p-value) against negative controls (p), and the number of environments in which the PPI is detected (n). In Appendix 1, we identified positive PPIs by using a combination of f and p. In the section on PPI validation, we re-tested 502 PPIs by comparing their optical density growth trajectories against controls. With this validation data, we aimed to predict a validation rate (v) for each PPI based on f and n. Each of these two features was a good predictor by itself (Figure 3—figure supplement 3A and B). To further improve predictions, we split the 502 re-tested PPIs into different bins of f and n. When binning by f for example, if there are 50 PPIs with 0.25 < f <= 0.26, and 45 of them re-tested as positive, then vbin is 0.9 (45/50), fbin is the mean f of the 50 PPIs, and nbin is the mean n of the 50 PPIs. Similarly, when binning by n, we calculated vbin and fbin by taking the mean v and f of PPIs within that bin. By following this procedure, we accumulated two datasets (one binned on f, and the other binned on n) of how the vbin depends on fbin and nbin. We trained a linear model (lm(vbin ~fbin + nbin) in R) with one dataset, and examined the accuracy of the model with the other dataset. The model performed well with the validated data (Figure 3—figure supplement 3C). We then applied this model to each PPI in each environment to calculate a predicted validation score.

Calculating mutability scores

Request a detailed protocol

We obtained replicate fitness values for each PPI in each environment. Due to the differences in the range of fitness values in different environments (Figure 1—figure supplement 5), we first normalized fitness values for each PPI replicate (barcode) using the following formula:

Normalized Fitnessbarcode=Fitnessbarcode - FitnessDHFR(-)FitnessDHFR(+) - FitnessDHFR(-)

where Fitnessbarcode is the fitness of each double barcoded PPiSeq strain, FitnessDHFR(-) is the mean fitness of 100 control strains that lack a DHFR reporter, and FitnessDHFR(+) is the mean fitness of 10 control strains that have a full-length DHFR reporter under the TEF1 promoter. From these replicate normalized fitness values, we calculated an average fitness value for each protein pair in each environment. For protein pairs with a different protein chimera acting the bait protein (i.e. ORF1-DHFR[1,2] X ORF2-DHFR[3] and ORF1-DHFR[3] X ORF2-DHFR[1,2]), only one version was kept and the average fitness was used. Negative mean fitness values, which are likely due to measurement error of a non-interacting protein pair, were replaced with 0. Using these normalized fitness values, we calculated the coefficient of variation (CV) for each PPI using its fitness values across all environments. The mutability score for each protein was then calculated by averaging the CVs for PPIs in which it participates. The mutability score for a target protein’s neighbor was calculated as above, only the interaction between the target protein and its neighbor was first removed.

Network visualization and community detection

Request a detailed protocol

The comprehensive PPI interaction network was generated with Cytoscape (v3.7.2) (Shannon et al., 2003) using the edge-weighted spring embedded layout with the default setting. Glucose transport related PPI networks were generated by igraph (R package) (Csárdi and Nepusz, 2006) using the layout_in_circle layout. Communities were identified with three algorithms: Fast-Greedy (Clauset et al., 2004), Walktrap (Pons and Latapy, 2005), InfoMAP (Rosvall et al., 2009), which were implemented by igraph (R package) (Csárdi and Nepusz, 2006) with default settings. Fast-Greedy is a bottom-up hierarchical approach, which tries to optimize modularity in a greedy manner. Initially, every vertex belongs to a separate community, and communities are merged iteratively so that each merge yields the largest increase in modularity. The algorithm stops when the maximum modularity is reached. Walktrap is an approach based on random walks, with the rationale that walks are more likely to be trapped within the same community because there are only a few edges that lead outside a community. It uses the results of these random walks to merge separate communities in a bottom-up manner like the Fast-Greedy algorithm. Fast-Greedy and Walktrap suffer from a resolution limit (small communities below a size threshold will always be merged to neighboring communities), and thus cannot detect small communities. The InfoMap algorithm uses community partitions of the graph as a Huffman code that compresses the information about a random walker exploring the network. It finds an optimal partition that assigns nodes to modules such that the information needed to compress the movement of the random walkers is minimized. The three major communities detected by the InfoMAP algorithm (Rosvall et al., 2009) are shown in Figure 3C.

Gene Ontology enrichment of network communities

Request a detailed protocol

We examined whether proteins in three major communities detected by infoMAP (Rosvall et al., 2009) were overrepresented in specific cellular compartments or biological processes. Gene Ontology enrichment was implemented with a conditional hypergeometric algorithm using the GOstats R package (Falcon and Gentleman, 2007). The set of proteins used for enrichment comparison are proteins that are involved in at least one PPI as determined by PPiSeq.

Co-expression analysis

Request a detailed protocol

Co-expression mutual rank scores were downloaded from COXPRESdb v7.3 (https://coxpresdb.jp/) and used directly to determine if differences exist between PPI groups (Obayashi et al., 2019).

Colocalization rate analysis

Request a detailed protocol

Two proteins were defined as colocalized by fluorescence if they localize to at least one cellular compartment in common in synthetic medium, as identified from high-throughput microscopy of GFP-fusion proteins (Chong et al., 2015). The colocalization rate for a PPI set was calculated by only considering proteins pairs for which localization is reported for both proteins. Two proteins were defined as colocalized by gene ontology if they are annotated to the same GO-slim cellular compartment annotation, not counting co-annotation to the general terms ‘other’, ‘membrane’, or ‘cellular_component’. Bootstrapping was performed by sampling with replacement from all detected PPIs prior to binning PPIs (by number of environments detected) and determining the colocalization rates within those bins.

Features of genes that participate in mutable and less mutable PPIs

Request a detailed protocol

Genes were binned by their PPI degree, as reported in BioGRID (Stark et al., 2006), and protein mutability scores were correlated with various previously defined gene features (Costanzo et al., 2016; Finn et al., 2014; Gavin et al., 2006; Holstege et al., 1998; Krogan et al., 2006; Levy and Siegal, 2008; Myers et al., 2006; Newman et al., 2006; Ostlund et al., 2010; Rice et al., 2000; Stark et al., 2011; Wapinski et al., 2007; Ward et al., 2004; Yang, 2007; Yu et al., 2008). Gene features are from Costanzo et al., 2016. Correlations binned by PPI degree are reported in Figure 4C and unbinned correlations are reported in Figure 4—figure supplement 4. Bootstrapping was performed by sampling with replacement from all genes that participate in at least one PPI prior to determining the correlation between the PPI variability score and a gene feature.

Protein and PPI abundance analysis

Request a detailed protocol

Protein abundance was taken from a meta-analysis study (Ho et al., 2018) and compared to PPI fitness in the SD environment, under the assumption that it is the most comparable to standard lab growth conditions. The base R ‘cor’ function was used to calculate the Pearson correlation of the PPiSeq signal of each PPI to the geometric mean abundance of the two constituent proteins. We then repeated this analysis on sets of PPIs binned by the number of environments in which a PPI was detected.

Homodimer/heterodimer mass-action kinetics model

Request a detailed protocol

To estimate how variation in protein abundance across environments affects abundance of the PPI complex, we used a simple mass-action kinetics model of two proteins, of concentrations A and B, binding to form a dimer AB. This relationship can be expressed as AB=kfABAB, where kfAB is a constant that reflects the population-average functional affinity of the two proteins. We assume that this constant also encompasses effects from heterozygosity (only 1/4 of heterodimers contain complementary mDHFR tags) and that cell volume does not change. We assume that the fitness F of a PPiSeq strain depends linearly on the concentration of a dimer of complementary tagged proteins, such as FAB=kmDHFRAB. In order to model the expected AB, we used the fitness of the homodimers (FAA and FBB) as a proxy for abundance of the constituent proteins (A and B), assuming that most of the proteins are dissociated. Getting a relationship like [A]FAAkmDHFRkfAA for both homodimers AA and BB, we use this to model the fitness of the strain corresponding to the heterodimer AB as FABkfABkfAAkfBBFAAFBB. Thus, the fitness of the heterodimer strain should correspond to the geometric mean of the fitnesses of the homodimers of the constituent proteins, scaled by a term relating the functional affinities of each dimer. This model does not capture the complexity of in vivo cellular biology, but serves as a simple quantitative tool to dissect the contribution of these two factors (abundance and functional affinity) with respect to each other.

We selected data from all heterodimer PPIs amongst all homodimer-participating proteins, requiring that each PPI be quantified in at least four conditions and be considered positive in at least one condition. We pooled these measurements of PPIs from both tag configurations (F[1,2]-F[3] or F[3]-F[1,2]), if available, collecting a dataset of 1,212 PPIs amongst 180 proteins. With this, we used ordinary least-square linear regression in R to fit a model of the geometric mean of the homodimer signals multiplied by a free constant and plus a free intercept. Significantly explained heterodimer PPIs were judged by a significant coefficient (FDR < 0.05) of homodimer fitnesses predicting the heterodimer fitness (slope) and an insignificant intercept (p-value>0.05, single-test). This criteria was used to identify PPIs for which protein expression does or does not appear to play as significant of a role as other post-translational mechanisms.

To select protein/PPI features that may be associated with being explained or not explained by this expression-variation mass-action model, we collected features with sufficient data from Byrne and Wolfe, 2005; Cherry, 2015; Chong et al., 2015; Costanzo et al., 2016; Marchant et al., 2019; Stark et al., 2006. We constructed a dataset from these by treating each heterodimer as a separate observation for each of the two constituent proteins, thus associating the features of a protein with the modeling result for each considered heterodimer in which it participates. We then used the 'glm' function in R to fit a logistic model. We adjusted the resulting p-values for each feature's coefficient by the Benjamini-Hochberg correction.

Estimating the size of the pan-environment PPI network

Request a detailed protocol

To estimate the size of the pan-environment PPI network, we used a bootstrap approach from the ecology R package vegan (Oksanen et al., 2019). However, false positives will inflate this estimate, so we first sought to obtain a more confident estimate of the true PPIs detected. We took our dataset of each PPI observed in each condition, and then sub-sampled each PPI in each environment at a rate of the modeled validation rates (as calculated above, Appendix 2 Table S5). We repeated this procedure 32 times, and for each of these trials then calculated the Kindt exact accumulation curve and the bootstrapped species richness estimate, treating each PPI analogous to a species. Then for each sub-sample, we randomly permuted the order of environment addition and calculated the curve of the accumulated number of unique PPIs 10,000 times.

Data and software availability

Request a detailed protocol

Raw barcode sequencing data are available from the NIH Sequence Read Archive as accession PRJNA630095 (https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP259652). Barcode sequences, counts, fitness values, and PPI calls are available in the Supplementary Tables (https://osf.io/jmhrb/). Additional data to make figures are available in Mendeley data (https://data.mendeley.com/datasets/9ygwhk5cs3/2) and Open Science Framework (https://osf.io/7yt59/) as detailed in code repository README files. Analysis scripts are written in R and Python. All code used to analyze data, perform statistical analyses, and generate figures is available at Github (Liu, 2020a; copy archived at https://github.com/elifesciences-publications/PPiSeq).

Appendix 1

Identification of PPIs

In each environment and for each protein pair, we calculated a mean fitness value (f) and a p-value (p) against negative controls from its replicate fitness measurements. We called PPIs using a combination of f and p by using a dynamic threshold that rendered the best balance between precision and recall, as assessed using reference sets constructed from the BioGRID database. We removed PPIs containing ‘promiscuous proteins’ by identifying proteins that interact with untethered mDHFR fragment controls.

Merging two replicate SD datasets

Since we collected fitness measurements from two independent growth cultures in the SD environment, we merged the data from these two experiments prior to PPI calling. Because the range of fitness values of the two SD datasets differed slightly, we first normalized the fitness value for each barcode:

NormalizedFitnessbarcode=FitnessbarcodeFitnessDHFR()FitnessDHFR(+)FitnessDHFR(),

where Fitnessbarcode is the fitness of each double barcoded PPiSeq strain, FitnessDHFR(-) is the mean fitness of 100 control strains that lack a mDHFR reporter, and FitnessDHFR(+) is the mean fitness of 10 control strains that have a full-length mDHFR reporter under the TEF1 promoter. We considered all barcodes that mark the same PPI from two datasets as replicate measurements for that PPI. To compare PPIs called in the SD environment with other environments (Figure 2 and after), we removed 783 PPIs that were called using a single barcode (measured twice in SD, but only once in all other environments).

Calculating a p-value for each protein-protein pair

To identify positive PPIs in each environment, we first generated a null fitness distribution from control strains (ORF x Null; 17,594 double barcodes) that contain a DHFR-fragment control (F[1,2] or F[3]) that is not fused with an ORF paired with an ORF-DHFR fusion (F[3] or F[1,2], respectively). For each protein pair with at least 2 fitness scores (2 double barcodes), we compared the replicate fitness scores against this null distribution using a one-sided Welch’s t-test and obtained a p-value for each protein pair. Protein pairs with only one fitness score were not considered for PPI detection

Construction of positive reference sets

A positive reference set in each environment was constructed by identifying high-confidence PPIs from the BioGRID database (BIOGRID-organism-Saccharomyces_cerevisiae_S288C-3.4.160) (Stark et al., 2006). We defined any protein pair identified as a PPI by at least three separate methods, and at least one binary method (‘Two-hybrid’, ‘FRET’, ‘PCA’), to be a high-confidence PPI. For each environment, we selected this high-confidence set from all protein pairs with at least two fitness scores in that environment (i.e. it has a p-value determined in section of calculating a p-value for each protein-protein pair) as the positive reference set (SD1: 580 protein pairs; SD2: 586; SD-merge: 633; Forskolin: 534; FK506: 517; NaCl: 563; Raffinose: 432; Hydroxyurea: 576; H2O2: 585; Doxorubicin: 561; 16°C: 578). We note that most previously defined PPIs have been observed in standard growth media (e.g. SD), so we expect that some fraction of positive reference PPIs may not be present in other environments. In the absence of any previous environment-specific PPI calls, these positive reference sets were nevertheless useful for selecting thresholds for our PPI calling (see below).

Construction of random reference sets

Fifty random reference sets were constructed in each environment by sampling from all protein pairs that we assayed with no evidence of being a PPI in the BioGRID database. Because most protein protein pairs are not expected to physically interact, we used this random reference set as putative negative PPIs for selecting thresholds for our PPI calling. The number of protein pairs selected for the random reference set was chosen to reflect the percentage of protein pairs in our screen that have been identified as a PPI in BioGRID by any form of evidence (~1%). That is, in an environment with 600 protein pairs in its positive reference set, we selected ~60,000 (100x) protein pairs for each random reference set. Choosing a random reference set of this size allows us to more accurately report positive predictive values (Jensen and Bork, 2008).

We note that the number of PPIs included in the positive and negative reference sets impact positive predictive values (PPVs), and thus PPVs between studies that use different sized reference sets are not directly comparable (Jensen and Bork, 2008). For example, the ratio of number of PPIs in the positive reference set over the negative reference set is ~40 times smaller in the proteome-scale PCA study, resulting in higher PPVs (Jensen and Bork, 2008; Tarassov et al., 2008). If the proteome-scale PCA study used similar sized reference sets as we do here, the PPV for positive calls would be comparable to this study.

Calling positive PPIs using a dynamic threshold on mean fitness and p-value

PPIs were identified in each environment using a combination of the mean fitness (f) and p-value (Log10(p-value): p) for each protein pair in that environment. To determine dynamic f and p thresholds for a PPI call, we first chose many discrete threshold combinations of f (0 to 0.5) and p (−5 to 0) and determined the PPV for each using the positive and random reference sets to estimate true and false positives:

PPV = TPTP + FP

where TP is the number of true positives. FP is the number of false positives.

Many combinations of discrete f and p values have a similar PPV (Appendix 1—figure 1A). However, the PPV for a given f and p can vary when using different random reference sets (Appendix 1—figures 1B and 2). Therefore, to determine the mean relationship between f and p for each PPV in an environment, we fit a sigmoidal curve [nls(f ~ SSlogis(p, Asym, xmid, scal)) in R] using data from all 50 random reference sets in that environment. Data at p < −4 (p-value<10−4) was generally too sparse to confidently fit a regression. We therefore applied a more conservative constant f threshold at these p (threshold equals the minimum f when p >= −4). We next considered each PPV regression line as a potential dynamic threshold for PPI calling. To determine the optimal dynamic threshold for each environment (the PPV regression that renders the best balance of precision and recall), we calculated the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The threshold with the maximal Matthews correlation coefficient (MCC) (Matthews, 1975) was chosen.

MCC = TP × TN - FP × FN (TP + FP) (TP + FN) (TN + FP) (TN + FN)

The chosen threshold and its performance in each environment are shown in Appendix 1—figure 2 and Appendix 1—table 1. The precision and recall plots are shown in Appendix 1—figure 3. In all cases, using a dynamic threshold showed superior performance over any combination of discrete thresholds (Appendix 1—figure 4).

Appendix 1—figure 1
Defining a dynamic threshold for PPI calling.

(A) A discrete combination of a fitness threshold (f) and a p-value threshold (p) results in a PPV. Colored lines are fitness and p-value thresholds that result in the same PPV in SD. (B) Density plot of all f and p combinations that result in a PPV of 0.7 using 50 different random reference sets in SD. The black line is the fitted sigmoid model that is used for the dynamic threshold.

Appendix 1—figure 2
Density plots of the dynamic thresholds in each environment.

Data were split into two groups: p < −4 and p >= −4. For p >= −4, as in Appendix 1—figure 1B, a sigmoidal function was fit to f and p combinations that result in the same PPV value. For p < −4, the fitness threshold was set to equal the minimum fitness value when p >= −4. The dynamic threshold that results in the maximum MCC in each environment was shown in the plot.

Appendix 1—figure 3
The precision-recall curves of dynamic thresholds in each environment.

Red asterisks mark the thresholds with maximal Matthews correlation coefficients.

Appendix 1—figure 4
Dynamic thresholds (red) of f and p have a higher positive predictive value (PPV) than most discrete combinations (blue).

Points represent the PPVs for dynamic thresholding and for all combinations of discrete fitness and p-value thresholds underlying a constant range of false positive rates obtained from the optimal dynamic threshold in each environment.

To examine whether our analyses were robust to different thresholds, we also defined a conservative dynamic threshold with an FPR <0.1%. This strict threshold identifies ‘higher confidence’ PPIs in each environment (Appendix 1—table 1).

Appendix 1—table 1
Metrics for the dynamic thresholds used in each environment.

‘FPR’: false positive rate; ‘TPR’: true positive rate; ‘PPV’: positive predictive value; ‘MCC’: Matthews correlation coefficient; ‘Detected_PRS(70)”: 70 likely protein interaction pairs in a positive reference set; ‘Detected_RRS(67)”: 67 random pairs in a random reference set (Liu et al., 2019; Yu et al., 2008).

Optimal dynamic threshold based on the best balance between precision and recall
EnvironmentOptimal_thresholdFPRTPRPPVMCCF1_ScoreDetected_PRS(70)Detected_RRS(67)
SD10.70.002830.46470.60750.52740.5266203
SD20.730.0023150.43170.63860.52140.5152202
SD-merge0.70.0024730.41240.61870.50120.4949193
FK5060.720.001960.43520.65510.53070.523203
H2O20.730.0022320.43420.65170.52830.5212203
Hydroxyurea0.740.002220.45690.66130.54610.5405191
NaCl0.730.001450.28950.64620.42920.3999181
Forskolin0.640.0037270.54240.56080.54760.5514222
Raffinose0.480.0081050.56330.40010.46880.4679202
Doxorubicin0.770.0017650.3440.64250.46670.4481182
16 °C0.410.002810.13670.31640.2030.1909213
Arbitrary strict dynamic threshold in each environment 
EnvironmentOptimal_thresholdFPRTPRPPVMCCF1_ScoreDetected_PRS(70)Detected_RRS(67)
SD-merge0.790.0009452010.2839494470.7453517470.4570892530.411234763172
FK5060.780.0009783460.3384912960.7475739790.5003650020.465989091182
H2O20.80.0008730360.3060512820.7716751720.483121280.438278516172
Hydroxyurea0.80.0009848720.3221180560.756605230.4907675920.45186047171
NaCl0.760.0009650720.2373712260.6927896780.4025794520.353591157181
Forskolin0.770.0009214970.3026966290.7427091670.4714209050.430101999171
Raffinose0.560.0044538940.4264814810.4792530190.4470399930.451329915181
Doxorubicin0.820.0009893510.269162210.715580610.4359290610.391182865172
16 °C0.510.0009001860.0713840830.4322812110.1725182620.122533747173

Detection and removal of promiscuous proteins

We excluded from our PPiSeq library any proteins that have been previously found to promiscuously form (likely spurious) PPIs in standard growth conditions (Tarassov et al., 2008). However, new promiscuous proteins may arise in other conditions screened here. We detected these promiscuous bait or prey constructs by determining if a PPI was called when it was paired with a DHFR-fragment control . If a bait or prey construct was identified as promiscuous in one environment, all PPIs containing that construct were removed in that environment. If a bait or prey construct was identified as promiscuous in two or more environments, all PPIs containing that construct were removed from all environments. Promiscuous proteins are summarized in Appendix 1—table 2.

Appendix 1—table 2
Summary of promiscuous proteins that interact with an mDHFR fragment that is not tethered to any protein.

Promiscuous and non-promiscuous proteins are represented by 1 and 0, respectively, in each environment.

PPIPositive_environme_
number
SD_mergeH2O2HydroxyureaDoxorubicinForskolinRaffinoseNaClFK50616 °C
YMR120C6111011010
YIL143C6111011100
YLL034C5110100101
YPL139C4110000110
YGR278W201000100
YPL112C2000100100
YIL070C2000100010
YHL007C2000100010
YDR452W2000100010
YOL147C1010000000
YER087W1000100000
YER063W1000100000
YOR323C1000100000
YNL064C1000100000
YKR080W1000010000
YJL153C1000010000
YDL208W1000000100
YLR182W1000000100
YPR124W1000000100
YHR114W1000000100
YDR381C-A1000000100
YGR198W1000000100
YDR171W1000000100
YGR130C1000000100
YLL022C1000000100
YMR136W1000000100
YKL010C1000000100
YCR033W1000000100
YPL083C1000000100
YOR360C1000000100
YOR393W1000000100
YNL026W1000000100
YGR195W1000000100
YOR306C1000000100
YDL093W1000000100
YCR059C1000000100
YOL081W1000000100
YGR140W1000000100
YKL139W1000000100
YEL017C-A1000000010
YDL112W1000000010
YDR057W1000000010
YFR001W1000000001
YJL124C1000000001
YDR151C1000000001
YBR057C1000000001
YHR146W1000000001
YDR379W1000000001
YDR513W1000000001
YMR227C1000000001
YDR420W1000000001

Appendix 2

Supplementary tables

Tables can be downloaded from: https://osf.io/jmhrb/ (Liu, 2020b).

Table S1. Bottleneck cell size and generation number of serial batch culture in different environments. ‘T-1’ is frozen stock from which all competitions start. ‘T0’ to ‘T7’ are the serial population bottlenecks.

Table S2. Read count of each double barcode at different time points in each environment. Four or five time points were sequenced per environment. Spike-in control PPIs were named as follows in the column ‘PPI’: ‘ORF_HO:TEF1pr-DHFR1-2’, ‘ORF_HO:TEF1pr-linker-DHFR1-2’, ‘HO:TEF1pr-DHFR3_ORF’, and ‘HO:TEF1pr-linker-DHFR3_ORF’ are interactions with a DHFR fragment that is not tethered to a yeast protein (ORF X Null in Figure 1D); ‘positive_DHFR’ are yeast strains that contain a full length mDHFR under a strong promoter (DHFR + in Figure 1D); ‘negative_non_DHFR’ are yeast strains that lack any mDHFR fragment (DHFR- in Figure 1D); ‘Pos_PPI_number-first(ORF1 ~Pos_ORF2 ~Pos)’ and ‘Pos_PPI_number-second(ORF2 ~Pos_ORF1 ~Pos)’ are 70 likely protein interaction pairs; ‘Neg_PPI_number-first(ORF1 ~Neg_ORF2 ~Neg)’ and ‘Neg_PPI_number-second(ORF2 ~Neg_ORF1 ~Neg)’ are 67 random pairs. These likely protein interaction pairs and random pairs were chosen from the previously constructed reference sets (Liu et al., 2019; Yu et al., 2008). The suffixes of ‘first’ and ‘second’ stand for the same protein pair with a different protein chimera acting the bait protein (i.e. ORF1-DHFR[1,2] X ORF2-DHFR[3] and ORF1-DHFR[3] X ORF2-DHFR[1,2]).

Table S3. Estimated fitness and estimation error of each double barcode in each environment. Spike-in controls are as in Appendix 2 Table S2. The estimation error, d, describes the deviation of the predicted counts from the observed counts at different time points for each double barcode (see Materials and methods). High d indicates higher error in fitness estimation.

Table S4. Number of double barcodes found in the pool, mean fitness, p-value, and whether or not a PPI is called. For PPI calling, one represents a positive PPI and 0 represents a negative PPI. Spike-in controls are as in Appendix 2 Table S2.

Table S5. Predicted validation rates for identified PPIs in each environment.

Table S6. High-confidence PPIs in each environment. In each environment, one represents a positive PPI and 0 represents a negative PPI.

Table S7. Significant gene ontology terms enriched for proteins in each community of the PPI network as detected by the InfoMAP algorithm.

Table S8. Protein features derived from the PPiSeq multi-environment network and other gene features. ‘PPI.degree.PPiSeq’ is the number of PPIs detected for each protein. ‘Mean.positive.environment.number.PPiSeq’ is the mean number of environments in which those PPIs are detected. ‘Standard.deviation.positive.environment.number.PPiSeq’ is the standard deviation of the number of environments in which those PPIs are detected.

Data availability

Raw barcode sequencing data are available from the NIH Sequence Read Archive as accession PRJNA630095 (https://trace.ncbi.nlm.nih.gov/Traces/study/?acc=SRP259652). Barcode sequences, counts, fitness values, and PPI calls are available in the Supplementary Tables (https://osf.io/jmhrb/). Additional data to make figures are available in Mendeley data (https://data.mendeley.com/datasets/9ygwhk5cs3/2) and Open Science Framework (https://osf.io/7yt59/) as detailed in code repository README files. Analysis scripts are written in R and Python. All code used to analyze data, perform statistical analyses, and generate figures is available at Github (https://github.com/sashaflevy/PPiSeq; copy archived at https://github.com/elifesciences-publications/PPiSeq).

The following data sets were generated
    1. Liu Z
    2. Miller D
    3. Levy S
    (2020) Mendeley Data
    Protein-protein interaction network rewiring across environments.
    https://doi.org/10.17632/9ygwhk5cs3.2
    1. Liu Z
    2. Miller D
    3. Levy S
    (2020) NIH Sequence Read Archive
    ID PRJNA630095. Protein-protein interaction network dynamics across environments by DNA barcode sequencing.
    1. Liu Z
    (2020) Open Science Framework
    ID jmhrb. Protein-protein interaction network rewiring across environments.

References

    1. Csárdi G
    2. Nepusz T
    (2006)
    The igraph software package for complex network research
    InterJournal, Complex Systems 1695:1–9.
    1. Jensen LJ
    2. Bork P
    (2008)
    Not comparable, but complementary
    Science 322:56–57.
    1. Rosvall M
    2. Axelsson D
    3. Bergstrom CT
    (2009) The map equation
    The European Physical Journal Special Topics 178:13–23.
    https://doi.org/10.1140/epjst/e2010-01179-1

Decision letter

  1. Christian R Landry
    Reviewing Editor; Université Laval, Canada
  2. Naama Barkai
    Senior Editor; Weizmann Institute of Science, Israel

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors' note: this paper was reviewed by Review Commons.]

Acceptance summary:

The organization of protein-protein interaction networks has mainly been described in standard laboratory conditions. Because of this, we do not know how static and how complete these networks are. Here, the authors combine an in vivo protein-protein interaction detection assay with barcode sequencing to interrogate more than one million potential interactions across multiple environments. They show that some interactions are stable while others are condition-dependent, and they identify protein properties associated with each category. Importantly, they show that our current knowledge of these networks cannot be complete unless we study their organization in multiple conditions.

https://doi.org/10.7554/eLife.62365.sa1

Author response

Reviewer #1 (Evidence, reproducibility and clarity):

The manuscript is clearly written and the figures appropriate and informative. Some descriptions of data analyses are a little dense but reflect what would appear long hard efforts on the part of the authors to identify and control for possible sources of misinterpretation due to sensitivities of parameters in their fitness model. The authors efforts to retest interactions under non-competition conditions allay fears of most concerns that I would have. One problem though that I could not see explicitly addressed was that of potential effects of interactions between methotrexate and the other conditions and how this is controlled for. Specifically, I could be argued that the fact that a particular PPI is observed under a specific condition could have more to do with a synthetic effect of treatment of cells with a drug plus methotrexate. Is this controlled for and how? I raise this because in a chemical genetic screen for fitness it was shown that methotrexate is particularly promiscuous for drug-drug interactions (Hillenmeyer ME et al., Science 2008). I tried to think of how this works but couldn't come up with anything immediately. I'd appreciate if the authors would take a crack at resolving this issue. Otherwise I have no further concerns about the manuscript.

We thank the reviewer for the kind comments. We agree with the reviewer’s point that methotrexate could be interacting with drugs or other perturbagens, similar to how the chosen nitrogen source, carbon source, or other growth conditions may interact with a drug. However, the methotrexate concentration is held constant across all conditions, as is the rest of the media components such as the nitrogen and carbon source (with the exception of the raffinose perturbation). Any interactions with methotrexate, or other media components, is undetectable without systematically varying all components for all stressors. Therefore, we use the typical experimental design of measuring molecular variation from a reference, holding invariant media components (such as methotrexate, glucose, or vitamins) fixed between conditions. This is a general practice, and we describe that every condition contains methotrexate:

“The library was grown under mild methotrexate selection in 9 environments for 12-18 generations in serial batch culture, diluting 1:8 every ~3 generations, with a bottleneck population size greater than 2 x 109 cells (Table S1).”

We also list the full details of each environment in Table S1.

Reviewer #1 (Significance):

Lui et al. expand on previous work from the Levy group to explore a massive in vivo protein interactome in the yeast S. cerevisiae. They achieve this by performing screens cross 9 growth conditions, which, with replication, results in a total of 44 million measurements. Interpreting their results based on a fitness model for pooled growth under methotrexate selection, they make the key observation that there is a vastly expanded pool of protein-protein interactions (PPI) that are found under only one or two condition compared to a more limited set of PPI that are found under a broad set of conditions (mutable versus immutable interactors). The authors show that this dichotomy suggests some important features of proteins and their PPIs that raise important questions about functionality and evolution of PPIs. Among these are that mutable PPIs are enriched for cross-compartmental, high disorder and higher rates of evolution and subcellular localization of proteins to chromatin, suggesting roles in gene regulation that are associated with cellular responses to new conditions. At the same time these interactions are not enriched for changes in abundance. These results are in contrast to those of immutable PPIs, which seem to form a core background noise, more determined by changes in abundance than what the authors interpret must be post-translational processes that may drive, for instance, changes in subcellular localization resulting in appearance of PPIs under specific conditions. The authors are also able to address a couple of key issues about protein interactomes, including the controversial Party-date Hub hypothesis of Vidal, in which they could now affirm support for this hypothesis based on their results and notably negative correlation of PPIs to protein abundance for mutable PPIs. Finally, they also addressed the problem of predicting the upper limit of PPIs in yeast, showing the remarkable results that it may be no more than about 2 times the number of proteins expressed by yeast. Such an upper limit is profoundly important to modelling cellular network complexity and, if it holds up, could define a general upper limit on organismal complexity.

This manuscript is a very important contribution to understanding dynamics of molecular networks in living cells and should be published with high priority.

Reviewer #2 (Evidence, reproducibility and clarity):

Report on Liu et al. "A large accessory protein interactome is rewired across environments" Liu et al. use a mDHFR-based, pooled barcode sequencing / competitive growth / mild methotrexate selection method to investigate changes of PPI abundance of 1.6 million protein pairs across different 9 growth conditions. Because most PPI screens aim to identify novel PPIs in standard growth conditions, the currently known yeast PPI network may be incomplete. The key concept is to define immutable PPIs that are found in all conditions and "mutable" PPIs that are present in only some conditions.

The assay identified 13764 PPIs across the 9 conditions, using optimized fitness cut offs. Steady PPI i.e. across all environments, were identified in membrane compartments and cell division. Processes associated with the chromosome, transcription, protein translation, RNA processing and ribosome regulation were found to change between conditions. Mutable PPIs are form modules as topological analyses reveals.

Interestingly, a correlation on intrinsic disorder and PPI mutability was found and postulated as more flexible in the conformational context, while at the same time they are formed by less abundant proteins.

I appreciate the trick to use homodimerization as an abundance proxy to predict interaction between heterodimers (of proteins that homodimerize). This "mass-action kinetics model" explains the strength of 230 out of 1212 tested heterodimers.

A validation experiment of the glucose transporter network was performed and 90 "randomly chosen" PPIs that were present in the SD environment were tested in NaCl (osmotic stress) and Raffinose (low glucose) conditions through recording optical density growth trajectories. Hxt5 PPIs stayed similar in the tested conditions, supported by the current knowledge that Hxt5 is highly expressed in stationary phase and under salt stress. In Raffinose, Hxt7, previously reported to increase the mRNA expression, lost most PPIs indicating that other factors might influence Hxt7 PPIs.

Points for consideration:

A clear definition of mutable and immutable is missing or could not be found.

We thank the reviewer for pointing this out. We have now added better definition of mutable and immutable in subsection “A large dynamic accessory protein interactome”:

“We partitioned PPIs by the number of environments in which they were identified and defined PPIs at opposite ends of this spectrum as “mutable” PPIs (identified in only 1-3 environments) and “immutable” (identified in 8-9 environments).”

Approximately half of the PPIs have been identified in one environment. Many of those mutable PPIs were detected in the 16°C condition. Is there an explanation for the predominance of this specific environment? What are these PPIs about?

The reviewer is correct that ~40% of the PPIs identified in only one environment were found in the 16℃ environment. One reason for this could be technical: the positive predictive value (PPV) is the lowest amongst the conditions (16℃: 31.6%, mean: 57%, Appendix 1—table 1). It must be noted, however, that PPVs are calculated using reference data that has generally been collected in standard growth conditions. So, it might be expected that the most divergent environment from standard growth conditions (resulting in the most differences in PPIs) would result in a lower PPV in our study even if the true frequency of false positives was equivalent across environments. We have attempted to be transparent about the quality of the data in each environment by reporting PPVs and other metrics in Appendix 1—table 1. However, we suspect that the large number of PPIs unique to 16°C is due in part to the fact that it causes the largest changes in the protein interactome, and believe that it should be included, even at the risk of lowering the overall quality of the data. The main reason for this is that this data is likely to contain valuable information about how the cell copes with this stress. For example, we find, but do not highlight in the manuscript, that 16°C-specific PPIs contain two major hubs (DID4: 285 PPIs involved in endocytosis and vacuolar trafficking, and DED1: 102 PPIs involved in translation), both of which are reported to be associated with cold adaptation in yeast (Hilliker et al., 2011; Isasa et al., 2015).

To assess whether the potentially higher false-positive rate in 16°C could be impacting our conclusions related to PPI network organization and features of immutable and mutable PPIs, we repeated these analyses leaving out the 16°C data and found that our main conclusions did not change. This new analysis is now presented in Figure 3—figure supplement 6 and described in subsection “A large dynamic accessory protein interactome”.

“Finally, we used a pair of more conservative PPI calling procedures that either identified PPIs with a low rate of false positives across all environments (FPR < 0.1%) or removed data from the most extreme environmental perturbation (16°C, see Materials and methods, Table S6). Using these higher confidence sets, we still found that mutable PPIs far outnumbered others in the multi-condition PPI network (Figure 3—figure supplement 4).”

We have also added references to other panels in Figure 3—figure supplement 4 throughout the manuscript, where appropriate.

50 % overall retest validation rate is fair and reflects a value comparable to other large-scale approaches. However, what is the actual variation, e.g. between mutable PPIs and immutable or between condition. e.g. at 16°C.

We validated 502 PPIs present in the SD environment and an additional 36 PPIs in the NaCl environment. As the reviewer suggests, we do indeed observe differences in the validation rate across mutability bins. This data is reported in Figures 3B and Figure 3—figure supplement 4B, and we use this information to provide a confidence score for each PPI in subsection “A large dynamic accessory protein interactome”.

“To better estimate how the number of PPIs changes with PPI mutability, we used these optical density assays to model the validation rate as a function of the mean PPiSeq fitness and the number of environments in which a PPI is detected. […] Using this more conservative estimate, we still found a preponderance of mutable PPIs (Figure 3—figure supplement 4).”

The validation rate in NaCl is similar to SD (39%, 14/36), suggesting that validation rates do not vary excessively across environments. Because validation experiments are time consuming (we performed 6 growth experiments per PPI), performing a similar scale of validations in all environments as in SD would be resource intensive. Instead, we report a number of metrics (true positive rate, false positive rate, positive predictive value) in Appendix 1—table 1 using large positive and random reference sets. We believe these metrics are sufficient for readers to compare the quality of data across environments.

What is the R correlation cut-off for PPIs explained in the mass equilibrium model vs. not explained?

We do not use an R correlation cut-off to assess if a PPI is explained by the mass-action equilibrium model. We instead rely on ordinary least-squares regression as detailed in the Materials and methods.

“…we used ordinary least-squares linear regression in R to fit a model of the geometric mean of the homodimer signals multiplied by a free constant and plus a free intercept.

[…] This criteria was used to identify PPIs for which protein expression does or does not appear to play as significant of a role as other post-translational mechanisms.”

The first criterion identifies a quantitative fit to the model of variation being related. The second criterion is used to filter out PPIs for which the relationship appears to be explained by more than just the homodimer signals. This approach is more stringent, but we believe this is the most appropriate statistical test to assess fit to this linear model.

90 "randomly chosen" PPIs for validation. It needs to be demonstrated that these interactions are a random subset otherwise is could also mean cherry picked interactions.

We selected 90 of the 284 glucose transport-related PPIs for validation using the “sample” function in R (replace = FALSE). We have now included text that describes this in the Materials and methods:

“Diploids (PPIs) on each plate were randomly picked using the “sample” function in R (replace = FALSE) from PPIs that meet specific requirements.”

Figure 4 provides interesting correlations with the goal to reveal properties of mutable and less mutable PPIs. PPIs detected in the PPIseq screen can partially be correlated to co-expression (4A) as well as co-localization. Does it make sense to correlate the co-expression across number of conditions? Are the expression correlation conditions specific? In this graph it could be that expression correlation stems from condition 1 and 2 and the interaction takes place in 4 and 5 still leading to the same conclusion.… Is the picture of the co-expression correlation similar when you simply look at individual environments like in S4A?

We use co-expression mutual rank scores from the COXPRESdb v7.3 database (Obayashi et al., 2019). These mutual rank scores are derived from a broad set of 3593 environmental perturbations that are not limited to the environments we tested here. By using this data, we are asking if co-expression in general is correlated with mutability and report that it is in Figure 4A. We thank the reviewer for pointing out that this was not clear and have now added text to clarify that the co-expression analysis is derived from external data:

“We first asked whether co-expression is indeed a predictor of PPI mutability and found that it is: co-expression mutual rank (which is inversely proportional to co-expression across thousands of microarray experiments) declined with PPI mutability (Figures 4A and Figure 4—figure supplement 1) (Obayashi and Kinoshita, 2009; Obayashi et al., 2019).”

The new Figure 4—figure supplement 1 examines how the co-expression mutual rank changes with PPI mutability for PPIs identified in each environment, as the reviewer suggested. For each environment, we find the same general pattern as in Figure 4A (which considers PPIs from all environments).

Figure 4C: Interesting, how dependent are the various categories?

It is well known that many of these categories are correlated (e.g. mRNA expression level and protein abundance, and deletion fitness effect and genetic interaction degree). However, we believe it is most valuable to report the correlation of each category with PPI mutability independently in Figures 4C and Figure 4—figure supplement 4, since similar correlations with related categories provide more confidence in our conclusions.

Figure 4 F: When binned in the number of environments in which the PPI was found, the distribution peaks at 6 environments and decreases with higher and lower number of environments. The description /explanation in the text clearly says something else.

We reported in subsection:

“We next used logistic regression to determine what features may underlie a good or poor fit to the model (Figure 4—figure supplement 6C) and found that PPI mutability was the best predictor, with more mutable PPIs being less frequently explained (Figure 4F). Unexpectedly, mean protein abundance was the second-best predictor, with high abundance predicting a poor fit to the model, particularly for less mutable PPIs (Figure 4—figure supplement 6D and Figure 4—figure supplement 6E).”

As the reviewer notes, Figure 4F shows that the percent of heterodimers explained by the model does appear to decrease for PPIs observed in the most environments. We suspect that the reviewer is correct that something more complicated is going on. One possibility is that extraordinarily stable PPIs (stable in all conditions) would have less quantitative variation in protein or PPI abundance across environments. If this is true, it would be statistically difficult to fit the mass action kinetics model for these PPIs (lower signal relative to noise), thereby resulting in the observed dip.

A second possibility is that multiple correlated factors are associated with contributing positively or negatively to a good fit, and the simplicity of Figure 4F or a Pearson correlation does not capture this interplay. This second possibility is why we used multivariate logistic regression (Figure 4—figure supplement 6C) to dissect the major contributing factors. In the text quote above, we report that high abundance is anti-correlated with a good fit to the model (Figure 4—figure supplement 6D, Figure 4—figure supplement 6E). Figure 4C shows that immutable PPIs tend to be formed from highly abundant proteins. One possible explanation is that highly abundant proteins saturate the binding sites of their binding partners, breaking from the assumptions of mass action kinetics model. We have now changed the word “limit” to “saturate” in subsection “Properties of mutable PPIs” to make this concept more explicit.

“Taken together, these data suggest that mutable PPIs are subject to more post-translational regulation across environments and that high basal protein abundance may saturate the binding sites of their partners, limiting the ability of gene expression changes to regulate PPIs.”

A third possibility is that the dip is simply due to noise. Given the complexity of the possible explanations and our uncertainty about which is more likely, we chose to leave this description out of the main text and focus on the major finding: that PPIs detected in more environments are generally associated with a better fit to the mass action kinetics model.

Figure 6: I apologize, but for my taste this is not a final Figure 6 for this study. Investigation of different environments increases the PPI network in yeast, yes, yet it is very well known that a saturation is reached after testing of several conditions, different methods and even screening repetition (sampling). It does not represent an important outcome. Move to supplement or remove.

We included Figure 6 to summarize and illustrate the path forward from this study. This is an explicit reference to impactful computational analyses done using earlier generations of data to assess the completeness of single-condition interaction networks (Hart et al., 2006; Sambourg and Thierry-Mieg, 2010). Here, we are extending PPI measurement of millions-scale networks across multiple environments and are using this figure to extend these concepts to multi-condition screens. We agree that the property of saturation in sampling is well known, but it is surprising that we can quantitatively estimate convergence of this expanded condition-specific PPI set using only 9 conditions. Thus, we agree with reviewer 1 that these are “remarkable results” and that the “upper limit is profoundly important to modelling cellular network complexity and, if it holds up, could define a general upper limit on organismal complexity.” We think this is an important advance of the paper, and this figure is useful to stimulate discussion and guide future work.

Reviewer #2 (Significance):

Liu et al. increase the current PPI network in yeast and offer a substantial dataset of novel PPIs seen in specific environments only. This resource can be used to further investigate the biological meaning of the PPI changes. The data set is compared to previous DHFR providing some sort of quality benchmarking. Mutable interactions are characterized well. Clearly a next step could be to start some "orthogonal" validation, i.e. beyond yeast growth under methotrexate treatment.

The reviewer makes a great point that we also discuss in the Discussion:

“While we used reconstruction of C-terminal-attached mDHFR fragments as a reporter for

PPI abundance, similar massively parallel assays could be constructed with different PCA reporters or tagging configurations to validate our observations and overcome false negatives that are specific to our reporter. Indeed, the recent development of “swap tag” libraries, where new markers can be inserted C- or N-terminal to most genes (Weill et al., 2018; Yofe et al., 2016), in combination with our iSeq double barcoder collection (Liu et al., 2019), makes extension of our approach eminently feasible.”

Reviewer #3 (Evidence, reproducibility and clarity):

Summary:

The manuscript "A large accessory protein interactome is rewired across environments" by Liu et al. scales up a previously described method (PPiSeq) to test a matrix of ~1.6 million protein pairs of direct protein-protein interactions in each of 9 different growth environments.

While the study found a small fraction of immutable PPIs that are relatively stable across environments, the vast majority were “mutable” across environments. Surprisingly, PPIs detected only in one environment made up more than 60% of the map. In addition to a false positive fraction that can yield apparently mutable interactions, retest experiments demonstrate (not surprisingly) that environment-specificity can sometimes be attributed to false-negatives. The study authors predict that the whole subnetwork within the space tested will contain 11K true interactions.

Much of environment-specific rewiring seemed to take place in an “accessory module”, which surrounds the core module made of mostly immutable PPIs. A number of interesting network clustering and functional enrichment analyses are performed to characterize the network overall and “mutable” interactions in particular. The study report other global properties such as expression level, protein abundance and genetic interaction degree that differ between mutable and immutable PPIs. One of the interesting findings was evidence that many environmentally mutable PPI changes are regulated post-translationally. Finally, authors provide a case study about network rewiring related to glucose transport.

Major issues:

The Results section should more prominently describe the dimensions of the matrix screen, both in terms of the set of protein pairs attempted and the set actually screened (I think this was 1741 x 1113 after filtering?). More importantly, the study should acknowledge in the Introduction that this was not a random sample of protein pairs, but rather focused on pairs for which interaction had been previously observed in the baseline condition. This major bias has a potentially substantial impact on many of the downstream analyses. For example, any gene which was not expressed under the conditions of the original Tarrasov et al. study on which the screening space was based will not have been tested here. Thus, the study has systematically excluded interactions involving proteins with environment-dependent expression, except where they happened to be expressed in the single Tarrasov et al. environment. Heightened connectivity within the “core module” may result from this bias, and if Tarrasov et al. had screened in hydrogen peroxide (H2O2) instead of SD media, perhaps the network would have exhibited a code module in H2O2 decorated by less-densely connected accessory modules observed in other environments. The paper should clearly indicate which downstream analyses have special caveats in light of this design bias.

We have now added text the matrix dimensions of our study in the Results:

“To generate a large PPiSeq library, all strains from the protein interactome

(mDHFR-PCA) collection that were found to contain a protein likely to participate in at least one PPI (1742 X 1130 protein pairs), (Tarassov et al., 2008) were barcoded in duplicate using the double barcoder iSeq collection (Liu et al., 2019), and mated together in a single pool (Figure 1A). Double barcode sequencing revealed that the PPiSeq library contained 1.79 million protein pairs and 6.05 million double barcodes (92.3% and 78.1% of theoretical, respectively, 1741 X 1113 protein pairs), with each protein pair represented by an average of 3.4 unique double barcodes (Figure 1—figure supplement 1).”

We agree with the reviewer that our selection of proteins from a previously identified set can introduce bias in our conclusions. Our research question was focused on how PPIs change across environments, and thus we chose to maximize our power to detect PPI changes by selecting a set of protein pairs that are enriched for PPIs. We have now added a discussion of the potential caveats of this choice to the Discussion:

“Results presented here and elsewhere (Huttlin et al., 2020) suggest that PPIs discovered under a single condition or cell type are a small subset of the full protein interactome emergent from a genome. […] Nevertheless, results presented here provide a new mechanistic view of how the cell changes in response to environmental challenges, building on the previous work that describes coordinated responses in the transcriptome (Brauer et al., 2007; Gasch et al., 2000) and proteome (Breker et al., 2013; Chong et al., 2015).”

Related to the previous issue, a quick look at the proteins tested (if I understood them correctly) showed that they were enriched for genes encoding the elongator holoenzyme complex, DNA-directed RNA polymerase I complex, membrane docking and actin binding proteins, among other functional enrichments. Genes related to DNA damage (endonuclease activity and transposition), were depleted. It was unclear whether the functional enrichment analyses described in the paper reported enrichments relative to what would be expected given the bias inherent to the tested space?

We did two functional enrichment analyses in this study: network density within Gene Ontology terms (related to Figure 2) and gene ontology enrichment of network communities (related to Figure 3). For both analyses, we performed comparisons to proteins included in PPiSeq library. This is described in the Materials and methods:

“To estimate GO term enrichment in our PPI network, we constructed 1000 random networks by replacing each bait or prey protein that was involved in a PPI with a randomly chosen protein from all proteins in our screen. This randomization preserves the degree distribution of the network.”

And:

“The set of proteins used for enrichment comparison are proteins that are involved in at least one PPI as determined by PPiSeq.”

Re: data quality. To the study's great credit, they incorporated positive and random reference sets (PRS and RRS) into the screen. However, the results from this were concerning: Appendix 1—table 1 shows that assay stringency was set such that between 1 and 3 out of 67 RRS pairs were detected. This specificity would be fine for an assay intended for retest or validate previous hits, where the prior probability of a true interaction is high, but in large-scale screening the prior probability of true interactions that are detectable by PCA is much lower, and a higher specificity is needed to avoid being overwhelmed by false positives. Consider this back of the envelope calculation: Let's say that the prior probability of true interaction is 1% as the authors' suggest, and if PCA can optimistically detect 30% of these pairs, then the number of true interactions we might expect to see in an RRS of size 67 is 1% * 30% * 67 = 0.2. This back of the envelope calculation suggests that a stringency allowing 1 hit in RRS will yield 80% [ (1 – 0.2) / 1 ] false positives, and a stringency allowing 3 hits in RRS will yield 93% [ (3 – 0.2) / 3] false positives. How do the authors reconcile these back of the envelope calculations from their PRS and RRS results with their estimates of precision?

We thank the reviewer for bringing up with this issue. We included positive and random reference sets (PRS:70 protein pairs, RRS:67 protein pairs) to benchmark our PPI calling (​Yu et al., 2008). The PRS reference lists PPIs that have been validated by multiple independent studies and is therefore likely to represent true PPIs that are present in some subset of the environments we tested. For the PRS set, we found a rate of detection that is comparable to other studies (PPiSeq in SD: 28%, Y2H and yellow fluorescent protein-PCA: ~20%) (Yu et al., 2008). The RRS reference, developed ten years ago, is randomly chosen protein pairs for which there was no evidence of a PPI in the literature at the time (mostly in standard growth conditions). Given the relatively high rate of false negatives in PPI assays, this set may in fact contain some true PPIs that have yet to be discovered. We could detect PPIs for four RRS protein pairs in our study, when looking across all 9 environments. Three of these (Grs1_Pet10, Rck2_Csh1, and YDR492W_Rpd3) could be detected in multiple environments (9, 7, and 3, respectively), suggesting that their detection was not a statistical or experimental artifact of our bar-seq assay (see Author response table 1 derived from Table S4). The remaining PPI detected in the RRS, was only detected in SD (standard growth conditions) but with a relatively high fitness (0.35), again suggesting its detection was not a statistical or experimental artifact. While we do acknowledge it is possible that these are indeed false positives due to erroneous interactions of chimeric DHFR-tagged versions of these proteins, the small size of the RRS combined with the fact that some of the protein pairs could be true PPIs, did not give us confidence that this rate (4 of 70) is representative of our true false positive rate. To determine a false positive rate that is less subject to biases stemming from sampling of small numbers, we instead generated 50 new, larger random reference sets, by sampling for each set ~60,000 protein pairs without a reported PPI in BioGRID. Using these new reference sets, we found that the putative false positive rate of our assay is generally lower than 0.3% across conditions for each of the 50 reference sets. We therefore used this more statistically robust measure of the false positive rate to estimate positive predictive values (PPV = 62%, TPR = 41% in SD). We detail these statistical methods in the Appendix and report all statistical metrics in Appendix 1—table 1.

Author response table 1
Mean fitness in each environment.
PPIEnviron ment_n umberSDH2O2HydroxyureaDoxorubicinForskolinRaffinoseNaCl16℃FK506
Rck2_Csh170.350.3500.200.540.7400.170.59
Grs1_Pet1090.440.390.340.250.651.190.20.160.95
YDR492W_R
pd3
300.18000000.170.61
Mrps35_Bub
3
10.3500000000
Positive_cont
rol
910.80.730.621.42.440.40.281.8

Methods for estimating precision and recall were not sufficiently well described to assess. Precision vs recall plots would be helpful to better understand this tradeoff as score thresholds were evaluated.

We describe in detail our approach to calling PPIs in the Appendix, including Appendix 1—table 1, and Appendix 1—figures 1, 2, 4, and now Figure 3—figure supplement 3. We identified positive PPIs using a dynamic threshold that considers the mean fitness and p-value in each environment. For each dynamic threshold, we estimated the precision and recall based on the reference sets. We then chose the threshold with the maximal Matthews correlation coefficient (MCC) to obtain the best balance between precision and recall. We have now added an additional plot (Appendix 1—figure 4) that shows the precision and recall for the chosen dynamic threshold in each environment.

Within the tested space, the Tarassov et al. map and the current map could each be compared against a common “bronze standard” (e.g. literature curated interactions), at least for the SD map, to have an idea about how the quality of the current map compares to that of the previous PCA map. Each could also be compared with the most recent large-scale Y2H study (Yu et al.).

We thank the reviewer for this suggestion. We have now added a figure panel (Figure 1—figure supplement 4) that compares PPiSeq in SD (2 replicates) to mDHFR PCA (Tarassov et al., 2008), Y2H (Yu et al., 2008), and our newly constructed “bronze standard” high-confidence positive reference set (Appendix 1 subsection “Construction of positive reference sets”).

Experimental validation of the network was done by conventional PCA. However, it should be noted that this is a form of technical replication of the DHFR-based PCA assay, and not a truly independent validation. Other large-scale yeast interaction studies (e.g., Yu et al., Science 2008) have assessed a random subset of observed PPIs using an orthogonal approach, calibrated using PRS and RRS sets examined via the same orthogonal method, from which overall performance of the dataset could be determined.

We appreciate the reviewer’s perspective, since orthogonal validation experiments have been a critical tool to establish assay performance following early Y2H work. We know from careful work done previously that modern orthogonal assays have a low cross validation rate (< 30%) (Yu et al., 2008) and that they tend to be enriched for PPIs in different cellular compartments (Jensen and Bork, 2008), indicating that high false negative rates are the likely explanation. High false negative rates have been confirmed here and elsewhere using positive reference sets (e.g. Y2H 80%, PCA 80%, PPiSeq 74% using the PRS in (Yu et al., 2008)). Therefore, the expectation is that PPiSeq, as with other assays, will have a low rate of validation using an orthogonal assay – although we would not know if this rate is 10%, 30% or somewhere in between without performing the work. However, the exact number – whether it be 10% or 30% – has no practical impact on the main conclusions of this study (focused on network dynamics rather than network enumeration). Neither does that number speak to the confidence in our PPI calls, since a lower number may simply be due to less overlap in the sets of PPIs that are callable by PPiSeq and another assay. Our method uses bar-seq to extend an established mDHFR-PCA assay (Tarassov et al., 2008). The validations we performed were aimed at confirming that our sequencing, barcode counting, fitness estimation, and PPI calling protocols were not introducing excessive noise relative to mDHFR-PCA that resulted in a high number of PPI miscalls. Confirming this, we do indeed find a high rate of validation by lower throughput PCA (50-90%, Figure 3B). Finally, we do include independent tests of the quality of our data by comparing it to positive and random reference sets from literature curated data. We find that our assay performs extremely well (PPV > 61%, TPR > 41%) relative to other high-throughput assays.

The Venn diagram in Figure 1G was not very informative in terms of assessing the quality of data. It looks like there is a relatively little overlap between PPIs identified in standard conditions (SD media) in the current study and those of the previous study using a very similar method. Is there any way to know how much of this disagreement can be attributed to each screen being sub-saturation (e.g. by comparing replica screens) and what fraction to systematic assay or environment differences?

We have now added a figure panel (Figure 3—figure supplement 1) that compares PPiSeq in SD (2 replicates) to mDHFR-PCA (Tarassov et al., 2008), Y2H (Yu et al., 2008), and our newly constructed “bronze standard” high-confidence positive reference sets (Appendix 1 subsection “Construction of positive reference sets”). We find that SD replicates have an overlap coefficient of 79% with each other, ~45% with mDHFR-PCA, ~45% the “bronze standard” PRS, and ~13% with Y2H. Overlap coefficients between the SD replicates and mDHFR-PCA are much higher than those found between orthologous methods (< 30%) or even Y2H replicates (< 35%) (Yu et al., 2008), indicating that these two assays are identifying a similar set of PPIs. We do note that PPiSeq and mDHFR-PCA do screen for PPIs under different growth conditions (batch liquid growth vs. colonies on agar), so some fraction of the disagreement is due to environmental differences. PPIs that overlap between the two PPiSeq SD replicates are more likely to be found in mDHFR-PCA, PRS, and Y2H, indicating that PPIs identified in a single SD replicate are more likely to be false positives. However, we do find (a lower rate of) overlaps between PPIs identified in only one SD replicate and other methods, suggesting that a single PPiSeq replicate is not finding all discoverable PPIs.

In Figure 3—figure supplement 2C, the environment-specificity rate of PPIs might be inflated due to the fact that authors only test for the absence of SD hits in other conditions, and the SD condition is the only condition that has been sampled twice during the screening. What would be the environment-specific verification rate if sample hits from each environment were tested in all environments? This seems important, as robustly detecting environment-specific PPIs is one of the key points of the study.

We use PPIs found in the SD environment to determine the environment-specificity because this provides the most conservative (highest) estimate of the number of PPIs found in other environments that were not detectable by our bar-seq assay. To identify PPIs in the SD environment, we pooled fitness estimates across the two replicates (~4 fitness estimates per replicate, ~ 8 total). The higher number of replicates results in a reduced rate of false positives (an erroneous fitness estimate has less impact on a PPI call), meaning that we are more confident that PPIs identified in SD are true positives. Because false positives in one environment (but not other environments) are likely to erroneously contribute to the environment-specificity rate, choosing the environment with the lowest rate of false positives (SD) should result in the lowest environment-specificity rate (highest estimate of PPIs found in other environments that were not detectable by our bar-seq assay).

Reviewer #3 (Significance):

Knowledge of protein-protein interactions (PPIs) provides a key window on biological mechanism, and unbiased screens have informed global principles underlying cellular organization. Several genome-scale screens for direct (binary) interactions between yeast proteins have been carried out, and while each has provided a wealth of new hypotheses, each has been sub-saturation. Therefore, even given multiple genome-scale screens our knowledge of yeast interactions remains incomplete. Different assays are better suited to find different interactions, and it is now clear that every assay evaluated thus far is only capable (even in a saturated screen) of detecting a minority of true interactions. More relevant to the current study, no binary interaction screen has been carried out at the scale of millions of protein pairs outside of a single “baseline” condition.

The study by Liu et al. is notable from a technology perspective in that it is one of several recombinant-barcode approaches have been developed to multiplex pairwise combinations of two barcoded libraries. Although other methods have been demonstrated at the scale of 1M protein pairs, this is the first study using such a technology at the scale of >1M pairs across multiple environments.

A limitation is that this study is not genome-scale, and the search space is biased towards proteins for which interactions were previously observed in a particular environment. This is perhaps understandable, as it made the study more tractable, but this does add caveats to many of the conclusions drawn. These would be acceptable if clearly described and discussed. There were also questions about data quality and assessment that would need to be addressed.

Assuming issues can be addressed, this is a timely study on an important topic, and will be of broad interest given the importance of protein interactions and the status of S. cerevisiae as a key testbed for systems biology.

References

Brauer, M.J., Huttenhower, C., Airoldi, E.M., Rosenstein, R., Matese, J.C., Gresham, D., Boer, V.M., Troyanskaya, O.G., and Botstein, D. (2007). Coordination of Growth Rate, Cell Cycle, Stress Response, and Metabolic Activity in Yeast. Mol. Biol. Cell 19, 352–367.

Breker, M., Gymrek, M., and Schuldiner, M. (2013). A novel single-cell screening platform reveals proteome plasticity during yeast stress responses. J. Cell Biol. 200, 839–850.

Chong, Y.T., Koh, J.L.Y., Friesen, H., Kaluarachchi Duffy, S., Cox, M.J., Moses, A., Moffat, J., Boone, C., and Andrews, B.J. (2015). Yeast Proteome Dynamics from Single Cell Imaging and Automated Analysis. Cell 161, 1413–1424.

Gasch, A.P., Spellman, P.T., Kao, C.M., Carmel-Harel, O., Eisen, M.B., Storz, G., Botstein, D., and Brown, P.O. (2000). Genomic Expression Programs in the Response of Yeast Cells to Environmental Changes. Mol. Biol. Cell 11, 4241–4257.

Hart, G.T., Ramani, A.K., and Marcotte, E.M. (2006). How complete are current yeast and human protein-interaction networks? Genome Biol. 7, 120.

Hilliker, A., Gao, Z., Jankowsky, E., and Parker, R. (2011). The DEAD-box protein Ded1 modulates translation by the formation and resolution of an eIF4F-mRNA complex. Mol. Cell 43, 962–972.

Isasa, M., Suñer, C., Díaz, M., Puig-Sàrries, P., Zuin, A., Bichmann, A., Gygi, S.P., Rebollo, E., and Crosas, B. (2015). Cold Temperature Induces the Reprogramming of Proteolytic Pathways in Yeast. J. Biol. Chem. jbc.M115.698662.

Jensen, L.J., and Bork, P. (2008). Not Comparable, But Complementary. Science 322, 56–57.

Lahtvee, P.-J., Sánchez, B.J., Smialowska, A., Kasvandik, S., Elsemman, I.E., Gatto, F., and Nielsen, J. (2017). Absolute Quantification of Protein and mRNA Abundances Demonstrate Variability in Gene-Specific Translation Efficiency in Yeast. Cell Syst. 4, 495-504.e5.

Obayashi, T., Kagaya, Y., Aoki, Y., Tadaka, S., and Kinoshita, K. (2019). COXPRESdb v7: a gene coexpression database for 11 animal species supported by 23 coexpression platforms for technical evaluation and evolutionary inference. Nucleic Acids Res. 47, D55–D62.

Sambourg, L., and Thierry-Mieg, N. (2010). New insights into protein-protein interaction data lead to increased estimates of the S. cerevisiae interactome size. BMC Bioinformatics 11, 605.

Tarassov, K., Messier, V., Landry, C.R., Radinovic, S., Molina, M.M.S., Shames, I., Malitskaya, Y., Vogel, J., Bussey, H., and Michnick, S.W. (2008). An in Vivo Map of the Yeast Protein Interactome. Science 320, 1465–1470.

Yu, H., Braun, P., Yıldırım, M.A., Lemmens, I., Venkatesan, K., Sahalie, J., Hirozane-Kishikawa, T., Gebreab, F., Li, N., Simonis, N., et al. (2008). High-Quality Binary Protein Interaction Map of the Yeast Interactome Network. Science 322 , 104–110.

https://doi.org/10.7554/eLife.62365.sa2

Article and author information

Author details

  1. Zhimin Liu

    1. Department of Biochemistry, Stony Brook University, Stony Brook, United States
    2. Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, United States
    Contribution
    Resources, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9333-8101
  2. Darach Miller

    1. Joint Initiative for Metrology in Biology, Stanford, United States
    2. Department of Genetics, Stanford University, Stanford, United States
    Contribution
    Formal analysis, Investigation, Visualization, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2264-7900
  3. Fangfei Li

    1. Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, United States
    2. Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, United States
    Contribution
    Software, Formal analysis
    Competing interests
    No competing interests declared
  4. Xianan Liu

    1. Department of Biochemistry, Stony Brook University, Stony Brook, United States
    2. Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, United States
    Contribution
    Methodology
    Competing interests
    SFL and XL have filed a patent application (WO2017075529A1) on the double barcoding platform used in this manuscript.
  5. Sasha F Levy

    1. Department of Biochemistry, Stony Brook University, Stony Brook, United States
    2. Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, United States
    3. Joint Initiative for Metrology in Biology, Stanford, United States
    4. Department of Genetics, Stanford University, Stanford, United States
    5. Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, United States
    6. SLAC National Accelerator Laboratory, Menlo Park, United States
    Contribution
    Conceptualization, Formal analysis, Supervision, Funding acquisition, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    sflevy@stanford.edu
    Competing interests
    SFL and XL have filed a patent application (WO2017075529A1) on the double barcoding platform used in this manuscript.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-0923-1636

Funding

National Institutes of Health (R01HG008354)

  • Sasha F Levy

Louis and Beatrice Laufer Center

  • Sasha F Levy

New York State Center for Biotechnology

  • Sasha F Levy

Joint Initiative for Metrology in Biology

  • Sasha F Levy

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are grateful to Stephen Michnick for providing the pAG25 linker-DHFR[1,2]-NatMX and pAG32 linker-DHFR[3]-HygMX plasmids, to Charlie Boone and Michael Costanzo for providing a dataset of different gene features, to Aaron Neiman for providing the pS413-TEF1pr-His3 plasmid, to Robert St. Onge and Ron Davis for providing the equipment to measure yeast growth curves, to Xiaoyu Zhao, Takeshi Matsui, Jamie Blundell, David Catoe, Adam Dziulko, Danielle Francois, and Richard Bennett for the assistance or suggestions on the experiments and data analysis, to Robert St. Onge and Guillaume Diss for suggestions on the manuscript. This work was supported by a grant from the US National Institutes of Health (R01HG008354 to SL), the Louis and Beatrice Laufer Center, the New York State Center for Biotechnology, and the Joint Initiative for Metrology in Biology.

Senior Editor

  1. Naama Barkai, Weizmann Institute of Science, Israel

Reviewing Editor

  1. Christian R Landry, Université Laval, Canada

Publication history

  1. Received: August 21, 2020
  2. Accepted: September 4, 2020
  3. Accepted Manuscript published: September 14, 2020 (version 1)
  4. Version of Record published: October 21, 2020 (version 2)

Copyright

© 2020, Liu et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,173
    Page views
  • 252
    Downloads
  • 4
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Computational and Systems Biology
    2. Structural Biology and Molecular Biophysics
    Cathrine Bergh et al.
    Research Article

    Ligand-gated ion channels conduct currents in response to chemical stimuli, mediating electrochemical signaling in neurons and other excitable cells. For many channels the details of gating remain unclear, partly due to limited structural data and simulation timescales. Here, we used enhanced sampling to simulate the pH-gated channel GLIC, and construct Markov state models (MSMs) of gating. Consistent with new functional recordings we report in oocytes, our analysis revealed differential effects of protonation and mutation on free-energy wells. Clustering of closed- versus open-like states enabled estimation of open probabilities and transition rates, while higher-order clustering affirmed conformational trends in gating. Furthermore, our models uncovered state- and protonation-dependent symmetrization. This demonstrates the applicability of MSMs to map energetic and conformational transitions between ion-channel functional states, and how they reproduce shifts upon activation or mutation, with implications for modeling neuronal function and developing state-selective drugs.

    1. Computational and Systems Biology
    2. Epidemiology and Global Health
    Olivier Thomine et al.
    Research Article

    Simulating nationwide realistic individual movements with a detailed geographical structure can help optimize public health policies. However, existing tools have limited resolution or can only account for a limited number of agents. We introduce Epidemap, a new framework that can capture the daily movement of more than 60 million people in a country at a building-level resolution in a realistic and computationally efficient way. By applying it to the case of an infectious disease spreading in France, we uncover hitherto neglected effects, such as the emergence of two distinct peaks in the daily number of cases or the importance of local density in the timing of arrival of the epidemic. Finally, we show that the importance of super-spreading events strongly varies over time.