A large accessory protein interactome is rewired across environments

  1. Zhimin Liu
  2. Darach Miller
  3. Fangfei Li
  4. Xianan Liu
  5. Sasha F Levy (corresponding author)
  1. Department of Biochemistry, Stony Brook University, United States
  2. Laufer Center for Physical and Quantitative Biology, Stony Brook University, United States
  3. Joint Initiative for Metrology in Biology, United States
  4. Department of Genetics, Stanford University, United States
  5. Department of Applied Mathematics and Statistics, Stony Brook University, United States
  6. SLAC National Accelerator Laboratory, United States
10 figures, 3 tables and 6 additional files

Figures

Figure 1 with 5 supplements
PPiSeq.

(A) A cartoon of PPiSeq yeast library construction. Strains from the protein interactome collection are individually mated to strains from the double barcoder collection and sporulated to recover haploids that contain an mDHFR-tagged protein and a barcode. Haploids are mated as pools. In diploids, expression of Cre recombinase causes recombination between homologous chromosomes at the loxP locus, resulting in a contiguous double barcode that marks the mDHFR-tagged protein pair. (B) Representative double barcode frequency trajectories over twelve generations of competitive growth. Trajectories are used to calculate a quantitative fitness for each double barcoded strain. (C) Standard error of fitness estimates of protein pairs. The blue and red lines represent the median standard error for a sliding window (width = 0.05) of all fitness-ranked protein pairs and of only the positive protein-protein interactions, respectively. (D) Estimated fitness of strains with different double barcodes representing the same protein pair in the same pooled growth. Positive protein pairs are randomly selected within a fitness window. ORF x Null is a violin plot of the fitness distribution of all interactions with an mDHFR fragment that is not tethered to a yeast protein. DHFR(-) denotes yeast strains that lack any mDHFR fragment. DHFR(+) denotes yeast strains that contain a full-length mDHFR under a strong promoter. (E) Density plot of the fitness of double barcodes that represent the same putative PPI in the same pooled growth. In (B–E), the data from the SD environment are used. (F) Density plot of the normalized mean fitness of the same PPI between two pooled growth cultures in the SD environment. PPIs detected in either growth culture are included. (G) Venn diagram of the number of PPIs identified within our search space by PPiSeq in nine environments (magenta), PPiSeq in the SD environment (pink), the interactome-scale protein-fragment complementation screen (PCA, yellow), and the BioGRID database excluding any PPIs previously detected by PCA (blue).
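
The sliding-window summary described in panel (C) can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: it assumes the window width of 0.05 is measured in fitness units, and the function name `sliding_median_se` and the `step` parameter are hypothetical.

```python
from statistics import median


def sliding_median_se(fitness, std_err, width=0.05, step=0.01):
    """Median standard error inside fitness windows of the given width.

    fitness and std_err are parallel sequences, one entry per protein pair.
    Returns (window_center, median_se) tuples for windows that hold data.
    The window definition is an assumption made for illustration.
    """
    pairs = sorted(zip(fitness, std_err))
    if not pairs:
        return []
    lowest, highest = pairs[0][0], pairs[-1][0]
    n_steps = int((highest - lowest) / step) + 1
    summary = []
    for i in range(n_steps):
        center = lowest + i * step
        # Collect the standard errors of all pairs whose fitness is in the window.
        in_window = [se for f, se in pairs if abs(f - center) <= width / 2]
        if in_window:
            summary.append((center, median(in_window)))
    return summary
```

Calling the function once on all protein pairs and once on only the positive PPIs would give the two curves that the caption describes as the blue and red lines.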

Figure 1—figure supplement 1
Double barcodes and protein pairs in the PPiSeq library.

(A) Distribution of the initial double barcode count of the PPiSeq library in SD environment at a sequencing depth of 209,899,687 reads. (B) Number of barcodes per protein pair in the PPiSeq library. Spike-in control protein pairs are not included in the plot.

Figure 1—figure supplement 2
Standard error of fitness estimates of protein pairs in each environment.

The blue and red lines represent the median standard error for a sliding window (width = 0.05) of all fitness ranked protein pairs and of only the positive protein-protein interactions, respectively.

Figure 1—figure supplement 3
Density plot of the fitness of double barcodes that represent the same positive PPI in the same pooled growth of each environment.
Figure 1—figure supplement 4
Comparison of PPiSeq data in SD condition to other PPI datasets.

(A) Heatmap of overlap coefficients across different datasets. PPiSeq SD1 and PPiSeq SD2 are PPIs identified from two replicate growth cultures. PPiSeq SD-merge is PPIs identified from the merged data of two replicate growth cultures. PCA is PPIs identified from mDHFR-PCA colony screening (Tarassov et al., 2008). Y2H is PPIs identified from the latest large-scale yeast-two-hybrid screen (Yu et al., 2008). PRS is a newly constructed ‘bronze standard’ high-confidence positive reference set. Only PPIs within our search space are considered and the numbers are shown under each dataset. (B) Venn diagram of the number of PPIs identified in the two SD PPiSeq replicates and in each of PCA, PRS, or Y2H. (C) Barplot of the overlap coefficients between PPiSeq and other methods for PPIs that do or do not replicate across the two SD PPiSeq replicates.
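
For readers unfamiliar with the overlap coefficient used in this caption, the sketch below uses its standard definition: the size of the intersection divided by the size of the smaller set. The helper `canonical_pair` and the toy protein names are hypothetical, and treating a PPI as an unordered pair is an assumption.

```python
def canonical_pair(pair):
    """Order-independent key so (A, B) and (B, A) count as the same PPI."""
    return tuple(sorted(pair))


def overlap_coefficient(set_a, set_b):
    """|A ∩ B| / min(|A|, |B|); returns 0.0 if either set is empty."""
    a, b = set(set_a), set(set_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Toy example with made-up protein names.
dataset_1 = {canonical_pair(p) for p in [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]}
dataset_2 = {canonical_pair(p) for p in [("P2", "P1"), ("P7", "P8")]}
shared_fraction = overlap_coefficient(dataset_1, dataset_2)
```

Because the denominator is the smaller set, a small reference set that is fully contained in a large screen scores 1.0, which makes the coefficient suitable for comparing datasets of very different sizes.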

Figure 1—figure supplement 5
OD600 trajectories of the DHFR(-) strain in various conditions, with and without 0.5 μg/mL methotrexate.

Figure 2
Functional enrichment of PPIs detected by PPiSeq.

PPI enrichment (red) and variability (blue) across environments of gene ontology cellular compartments (A) and biological processes (B). Red node size is the percent of interacting protein pairs (interaction density) observed for a given pair of GO terms, and the node color is the p-value of this percent relative to a random expectation. Blue node size and color are the variability (coefficient of variation, CV) of interaction densities across the nine environments tested. GO terms are hierarchically clustered by the interaction density (red dots). Boxes mark frequently interacting and invariable cellular compartments and biological processes involved in membrane transport and protein maturation (blue and green) and cell division (purple). Barplots show the mean CV of interaction densities for each GO term across all other GO terms. Orange, black and brown triangles highlight three different groups of related GO terms: chromosome, transcription, and translation, respectively.
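
The two quantities plotted in this figure, interaction density and its coefficient of variation across environments, can be sketched as below. This is an illustration under stated assumptions: the function names are hypothetical, and the use of the population standard deviation is a choice made here, not something the caption specifies.

```python
from statistics import mean, pstdev


def interaction_density(n_interacting, n_tested):
    """Percent of tested protein pairs between two GO terms that interact."""
    if n_tested == 0:
        return 0.0
    return 100.0 * n_interacting / n_tested


def coefficient_of_variation(densities):
    """Standard deviation divided by the mean of densities across environments.

    Uses the population standard deviation (an assumption); returns NaN
    when the mean is zero.
    """
    m = mean(densities)
    if m == 0:
        return float("nan")
    return pstdev(densities) / m
```

A GO-term pair whose density is the same in every environment has a CV of zero, which corresponds to the "invariable" compartments and processes marked by boxes in the figure.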

Figure 3 with 6 supplements
A large accessory protein interactome.

(A) Barplot of PPI number binned by the number of environments in which a PPI is observed. Colors indicate PPIs called by both PPiSeq and BioGRID inclusive of mDHFR-PCA (red), PPIs called by PPiSeq that scored high but were not called by mDHFR-PCA (yellow), and PPIs called by PPiSeq that scored low by mDHFR-PCA (blue). (B) Validation rates of PPIs binned by the number of environments in which a PPI is observed. Validations were performed using OD600 trajectories of clones grown in multi-well plates. (C) Mutable and less mutable PPIs form distinct modules in the network. PPIs that are detected in at least five environments (red edges) form two tight core modules. PPIs that are detected in fewer than five environments (blue edges) form a less connected accessory module. Proteins in different modules are labeled with different shapes and colors. The network uses an edge-weighted spring embedded layout. (D) Number of PPIs within and between each community. PPIs detected in at least five environments are shown in red, and PPIs detected in fewer than five environments are shown in blue. The size of the square or circle is proportional to the number of PPIs. The number below each community is the number of proteins within each community. (E) Scatter plot of degrees and mutability scores of proteins in each community.

Figure 3—figure supplement 1
PPIs across conditions.

(A) Barplot of number of PPIs in each environment binned by the number of environments in which a PPI is observed. (B) Heatmap of fitness values of all detected PPIs across different environments. PPIs (rows) and environments (columns) are hierarchically clustered by the fitness values across environments. (C) Scatter plot of mean fitness values of the same PPI across two different growth conditions. Colors indicate in which condition(s) PPIs are called. PPIs that have been detected in at least one environment are shown in (B) and (C). Negative values and missing measurements are replaced with zeros.

Figure 3—figure supplement 2
Validating PPIs.

(A) Boxplots and univariate scatterplots of fitness values of PPIs binned by the number of environments in which a PPI is observed. The bottom of each box, the line drawn in the box, and the top of the box represent the 1st, 2nd, and 3rd quartiles, respectively. The whiskers extend to 1.5 times the interquartile range (from the 1st to 3rd quartile). The fitness for each PPI is calculated by taking the mean of the fitness values for all environments where that PPI was detected. (B) Validation rates of PPIs binned by the number of environments in which a PPI is observed. Validations were performed using OD600 trajectories of clones grown in multi-well plates. PPIs that have been previously reported in BioGRID (red) or are previously unreported (blue) are shown. (C) Validation of 20 randomly chosen PPIs that were only detected in SD by PPiSeq. Validations use OD600 trajectories of clones grown in different environments. Red and blue boxes represent positive and negative PPI detection, respectively. (D) Density plot of the increase in the relative area under the growth curve (AUC) against a negative control strain for the 20 PPIs shown in (C). Dashed vertical lines represent the mean AUC increase for an environment.
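
The relative AUC increase in panel (D) can be illustrated with a trapezoidal area under an OD600 growth curve. This is a sketch under assumptions: the caption does not state the integration rule or the exact normalization, so the trapezoid rule and the formula (AUC of strain minus AUC of control, divided by AUC of control) are choices made here, and the function names are hypothetical.

```python
def growth_curve_auc(times, ods):
    """Trapezoidal area under an OD600 trajectory sampled at the given times."""
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for t0, t1, y0, y1 in zip(times, times[1:], ods, ods[1:])
    )


def relative_auc_increase(times, od_strain, od_control):
    """Fractional AUC gain of a PPI strain over a negative control strain.

    Assumes both curves share the same time points and that the control
    AUC is non-zero.
    """
    control_auc = growth_curve_auc(times, od_control)
    return (growth_curve_auc(times, od_strain) - control_auc) / control_auc
```

A value near zero means the clone grows like the negative control in methotrexate, which would correspond to a negative PPI call in that environment.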

Figure 3—figure supplement 3
Predicting validation rates.

(A and B) Distributions of mean fitness value (f) and number of environments in which the PPI is detected (n) for validated and unvalidated PPIs. (C) Comparison between observed validation rates and predicted validation rates for 502 PPIs binned by n. Predicted validation rates were calculated using the mean f and n within each bin. (D) Barplot of the PPI number and the predicted true positive PPI number binned by the number of environments in which a PPI is observed.

Figure 3—figure supplement 4
Mutable PPIs outnumber immutable PPIs in higher confidence PPI networks.

Barplot of the PPI number binned by the number of environments in which a PPI is observed in the multi-condition network made by either higher confidence PPI calls (A) or by excluding the 16°C condition (B). Colors indicate PPIs called by both PPiSeq and BioGRID inclusive of mDHFR-PCA (red), PPIs called by PPiSeq that scored high but were not called by mDHFR-PCA (yellow), and PPIs called by PPiSeq that scored low and were not called by mDHFR-PCA (blue).

Figure 3—figure supplement 5
PPIs with a similar mutability are more likely to be connected.

(A) Degree density of proteins binned by the number of environments in which a PPI is detected. (B) Degree density of all proteins that are neighbors of proteins binned by the number of environments in which a PPI is detected. (C) Mutability score density of proteins binned by the number of environments in which a PPI is detected. (D) Mutability score density of all proteins that are neighbors of proteins binned by the number of environments in which a PPI is detected. Any unique protein that participates in a PPI within each bin was counted. The degree (A) and mutability score (C) of a protein were obtained from a multi-environment network that includes all PPIs detected in at least one environment. The degree (B) and mutability score (D) of a target protein’s neighbor were calculated as above, except that the interaction between the target protein and its neighbor was first removed.

Figure 3—figure supplement 6
The multi-environment PPI network contains three major communities with different mutability scores.

Boxplots and univariate scatterplots of mutability scores of communities with at least 10 proteins identified by the (A) Fast-Greedy, (B) Walktrap, and (C) InfoMAP algorithms. The bottom of each box, the line drawn in the box, and the top of the box represent the 1st, 2nd, and 3rd quartiles, respectively. The whiskers extend to 1.5 times the interquartile range.

Figure 4 with 6 supplements
Properties of mutable and less mutable PPIs.

(A) The co-expression mutual rank for PPIs binned by the number of environments in which the PPI is detected. A higher mutual rank means worse co-expression. Notches are the 95% confidence interval for the median, hinges correspond to the first and third quartiles, and whiskers extend 1.5 times the interquartile range. (B) The percent of protein pairs that have been found colocalized by gene ontology (GO Slim, dashed line) and fluorescence (solid line) (Chong et al., 2015). (C) Spearman correlation between the PPI mutability score and other gene features, binned by a gene’s PPI degree. In (B) and (C), the error bars are the standard deviation from 1000 bootstrapped data sets. (D) Pearson correlation between a PPI’s fitness and the geometric mean abundance of the two interacting proteins in Ho et al., 2018, binned by the number of environments in which a PPI is detected. (E) Examples of non-significant (Erv25 x Shr3) and significant (Akr1 x Any1) predictions. Observed heterodimer fitness (S_AB) is plotted against the expectation based on the geometric mean of the two constituent homodimer fitnesses (S_AA and S_BB). (F) Percent of heterodimers whose fitness changes can be significantly predicted by the geometric mean of the two constituent homodimers, binned by the number of environments in which a PPI is observed.
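
The expectation used in panels (E) and (F), where heterodimer fitness is predicted from the geometric mean of the two homodimer fitnesses, can be sketched as below. This is a simplified illustration rather than the authors' model: it computes a plain Pearson correlation across environments instead of the significance-tested fit described in the Materials and methods. Clamping negative fitness values to zero before taking the square root is an assumption, and the function names are hypothetical.

```python
import math


def expected_heterodimer_fitness(homodimer_a, homodimer_b):
    """Geometric mean of two homodimer fitnesses (negative values clamped to 0)."""
    return math.sqrt(max(homodimer_a, 0.0) * max(homodimer_b, 0.0))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def prediction_correlation(homo_a_by_env, homo_b_by_env, hetero_by_env):
    """Correlate observed heterodimer fitness with its expectation across environments."""
    expected = [
        expected_heterodimer_fitness(a, b)
        for a, b in zip(homo_a_by_env, homo_b_by_env)
    ]
    return pearson(expected, hetero_by_env)
```

A correlation near 1 would correspond to a heterodimer whose changes across environments track the abundances of its constituents, as in the Akr1 x Any1 example.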

Figure 4—figure supplement 1
The co-expression mutual rank for PPIs detected in each condition binned by the number of environments in which the PPI is detected.

Notches are the 95% confidence interval for the median, hinges correspond to the first and third quartiles, and whiskers extend 1.5 times the interquartile range.

Figure 4—figure supplement 2
Mutable PPIs and their properties for higher confidence PPI calls.

(A) The co-expression mutual rank for PPIs binned by the number of environments in which the PPI is detected. A higher mutual rank means worse co-expression. Notches are the 95% confidence interval for the median, hinges correspond to the first and third quartiles, and whiskers extend 1.5 times the interquartile range. (B) The percent of protein pairs that have been found colocalized by gene ontology (GO Slim, dashed line) and fluorescence (solid line) (Chong et al., 2015). (C) Spearman correlation between the protein’s mutability score and other gene features. (D) Spearman correlation between the PPI mutability score and other gene features, binned by a gene’s PPI degree. In (B–D), the error bars are the standard deviation from 1000 bootstrapped data sets.

Figure 4—figure supplement 3
Mutable PPIs and their properties, excluding the 16°C condition.

(A) The co-expression mutual rank for PPIs binned by the number of environments in which the PPI is detected. A higher mutual rank means worse co-expression. Notches are the 95% confidence interval for the median, hinges correspond to the first and third quartiles, and whiskers extend 1.5 times the interquartile range. (B) The percent of protein pairs that have been found colocalized by gene ontology (GO Slim, dashed line) and fluorescence (solid line) (Chong et al., 2015). (C) Spearman correlation between the protein’s mutability score and other gene features. (D) Spearman correlation between the PPI mutability score and other gene features, binned by a gene’s PPI degree. In (B–D), the error bars are the standard deviation from 1000 bootstrapped data sets.

Figure 4—figure supplement 4
Spearman correlation between the protein’s mutability score and other gene features.

The error bars are the standard deviation from 1000 bootstrapped data sets.

Figure 4—figure supplement 5
Proteins that participate in multiple complexes are distributed over a wide range of complexes.

(A) Barplot of the number of proteins binned by the number of protein complexes in which a protein participates. Color represents a protein’s PPI degree. (B) Histogram of number of genes per protein complex (Costanzo et al., 2016). (C) Histogram of number of genes per protein complex for genes included in the PPiSeq screen and involved in at least two protein complexes (Costanzo et al., 2016).

Figure 4—figure supplement 6
Exploring the relationship of protein abundance, PPI abundance, and PPI mutability.

(A) Density plot of the fitness of a PPI strain in the SD environment against the geometric mean protein abundance from Ho et al., 2018. Colors are the density of points in hexagonal bins. (B) Linear fits to a mass-action kinetic model where the x-axis is the heterodimer PPI fitness expected from the homodimer fitnesses of the constituent proteins, and the y-axis is the measured heterodimer fitness. The left panel contains heterodimers significantly explained by the model that do not require a significant intercept (FDR < 0.05, see Materials and methods). Colors are the R² of the mass-action kinetics model fit. (C) The coefficients of each scaled feature in a logistic model predicting a good fit to the mass-action kinetics model, as fit by the ‘glm’ function in R. (D) Density plot of the geometric mean abundance of a heterodimer pair against the Pearson correlation between the predicted and observed heterodimer fitness across conditions. Colors are the density of points in hexagonal bins. The red line is a Deming regression; r is the Pearson correlation. (E) Explained PPIs are composed of less abundant proteins. Box and dot plot of the mean protein abundance of a heterodimer for PPIs that are explained and not explained by the mass-action kinetics model. The boxplot summarizes the first, second, and third quartiles. ***p < 10⁻⁹, Wilcoxon signed-rank test.

Figure 5
Carbohydrate transport network rewiring as captured by PPiSeq.

(A) Heatmap of abundances (fitnesses) of PPIs involved in carbohydrate transport across different environments. (B) Boxplots of fitnesses of PPIs involving Hxt proteins in SD, Raffinose and NaCl environments. The bottom of each box, the line drawn in the box, and the top of the box represent the 1st, 2nd, and 3rd quartiles, respectively. The whiskers extend to ±1.5 times the interquartile range. (C) Circular network plots of PPIs containing Hxt proteins in SD, Raffinose, and NaCl environments. Nodes are proteins and colors are as in (A). Node size is proportional to its degree in the multi-environment PPI network. Edge width is proportional to abundance in each environment. (D) Scatter plot of fitness changes relative to SD as measured by PPiSeq and clonal growth dynamics for randomly chosen carbohydrate-transport PPIs in Raffinose (80 PPIs) and NaCl (90 PPIs).

Figure 6
The estimated number of true PPIs discovered by PPiSeq using repeated sampling of data in permuted orders of environment addition.

Boxplots summarize the distribution of the number of unique PPIs across permutations. The bottom of each box, the line drawn in the box, and the top of the box represent the 1st, 2nd, and 3rd quartiles, respectively. The whiskers extend to ±1.5 times the interquartile range. Overlaid solid red lines and dashed red lines are the Kindt exact accumulation curves and the bootstrap estimators of the total number of unique PPIs across infinite environments for each simulation, respectively.
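
The permutation procedure behind the boxplots can be sketched as below: environments are added in random order and the number of unique PPIs seen so far is recorded after each addition. This is a minimal illustration only. It counts called PPIs rather than estimated true PPIs, and it does not implement the Kindt exact accumulation curve or the bootstrap estimator. The function name and the toy data are hypothetical.

```python
import random


def accumulation_curves(ppis_by_env, n_perm=100, seed=0):
    """Cumulative unique-PPI counts for random orders of environment addition.

    ppis_by_env maps an environment name to the set of PPIs called in it.
    Returns one curve (list of counts) per permutation.
    """
    rng = random.Random(seed)
    environments = list(ppis_by_env)
    curves = []
    for _ in range(n_perm):
        order = environments[:]
        rng.shuffle(order)
        seen, curve = set(), []
        for env in order:
            seen |= set(ppis_by_env[env])
            curve.append(len(seen))
        curves.append(curve)
    return curves


# Toy data with made-up PPI labels.
toy = {"SD": {"a", "b"}, "NaCl": {"b", "c"}, "Raffinose": {"c", "d", "e"}}
toy_curves = accumulation_curves(toy, n_perm=20, seed=1)
```

The spread of counts at each position across permutations is what a boxplot at that number of environments would summarize.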

Appendix 1—figure 1
Defining a dynamic threshold for PPI calling.

(A) A discrete combination of a fitness threshold (f) and a p-value threshold (p) results in a PPV. Colored lines are fitness and p-value thresholds that result in the same PPV in SD. (B) Density plot of all f and p combinations that result in a PPV of 0.7 using 50 different random reference sets in SD. The black line is the fitted sigmoid model that is used for the dynamic threshold.

Appendix 1—figure 2
Density plots of the dynamic thresholds in each environment.

Data were split into two groups: p < −4 and p ≥ −4. For p ≥ −4, as in Appendix 1—figure 1B, a sigmoidal function was fit to the f and p combinations that result in the same PPV value. For p < −4, the fitness threshold was set to the minimum fitness value for p ≥ −4. The dynamic threshold that results in the maximum MCC in each environment is shown in the plot.
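
The piecewise rule in this legend can be sketched as below. Several assumptions are made and none comes from the source: p is treated as a log-scaled p-value, the sigmoid is a generic four-parameter logistic that increases with p, and all parameter values and names are hypothetical. Under an increasing sigmoid, the minimum fitness threshold for p at or above −4 occurs at p = −4, so the threshold is held at that value for smaller p.

```python
import math


def sigmoid_threshold(log_p, lower, upper, midpoint, slope):
    """Hypothetical logistic curve giving a fitness threshold for a log p-value."""
    return lower + (upper - lower) / (1.0 + math.exp(-slope * (log_p - midpoint)))


def dynamic_fitness_threshold(log_p, params, cutoff=-4.0):
    """Sigmoid threshold above the cutoff; constant at the cutoff value below it."""
    return sigmoid_threshold(max(log_p, cutoff), **params)


def is_called(fitness, log_p, params):
    """Call a PPI when its fitness exceeds the threshold for its p-value."""
    return fitness > dynamic_fitness_threshold(log_p, params)


# Illustrative parameters only; not fitted to any data.
example_params = {"lower": 0.2, "upper": 0.8, "midpoint": -2.0, "slope": 2.0}
```

With a rule of this shape, a pair with a very small p-value needs less fitness to be called than a pair with weaker statistical support.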

Appendix 1—figure 3
The precision-recall curves of dynamic thresholds in each environment.

Red asterisks mark the thresholds with maximal Matthews correlation coefficients.

Appendix 1—figure 4
Dynamic thresholds (red) of f and p have a higher positive predictive value (PPV) than most discrete combinations (blue).

Points represent the PPVs for dynamic thresholding and for all combinations of discrete fitness and p-value thresholds underlying a constant range of false positive rates obtained from the optimal dynamic threshold in each environment.

Tables

Appendix 1—table 1
Metrics for the dynamic thresholds used in each environment.

‘FPR’: false positive rate; ‘TPR’: true positive rate; ‘PPV’: positive predictive value; ‘MCC’: Matthews correlation coefficient; ‘Detected_PRS(70)’: 70 likely protein interaction pairs in a positive reference set; ‘Detected_RRS(67)’: 67 random pairs in a random reference set (Liu et al., 2019; Yu et al., 2008).

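Optimal dynamic threshold based on the best balance between precision and recall

| Environment | Optimal_threshold | FPR | TPR | PPV | MCC | F1_Score | Detected_PRS(70) | Detected_RRS(67) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SD1 | 0.7 | 0.00283 | 0.4647 | 0.6075 | 0.5274 | 0.5266 | 20 | 3 |
| SD2 | 0.73 | 0.002315 | 0.4317 | 0.6386 | 0.5214 | 0.5152 | 20 | 2 |
| SD-merge | 0.7 | 0.002473 | 0.4124 | 0.6187 | 0.5012 | 0.4949 | 19 | 3 |
| FK506 | 0.72 | 0.00196 | 0.4352 | 0.6551 | 0.5307 | 0.523 | 20 | 3 |
| H2O2 | 0.73 | 0.002232 | 0.4342 | 0.6517 | 0.5283 | 0.5212 | 20 | 3 |
| Hydroxyurea | 0.74 | 0.00222 | 0.4569 | 0.6613 | 0.5461 | 0.5405 | 19 | 1 |
| NaCl | 0.73 | 0.00145 | 0.2895 | 0.6462 | 0.4292 | 0.3999 | 18 | 1 |
| Forskolin | 0.64 | 0.003727 | 0.5424 | 0.5608 | 0.5476 | 0.5514 | 22 | 2 |
| Raffinose | 0.48 | 0.008105 | 0.5633 | 0.4001 | 0.4688 | 0.4679 | 20 | 2 |
| Doxorubicin | 0.77 | 0.001765 | 0.344 | 0.6425 | 0.4667 | 0.4481 | 18 | 2 |
| 16 °C | 0.41 | 0.00281 | 0.1367 | 0.3164 | 0.203 | 0.1909 | 21 | 3 |

Arbitrary strict dynamic threshold in each environment

| Environment | Optimal_threshold | FPR | TPR | PPV | MCC | F1_Score | Detected_PRS(70) | Detected_RRS(67) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SD-merge | 0.79 | 0.000945201 | 0.283949447 | 0.745351747 | 0.457089253 | 0.411234763 | 17 | 2 |
| FK506 | 0.78 | 0.000978346 | 0.338491296 | 0.747573979 | 0.500365002 | 0.465989091 | 18 | 2 |
| H2O2 | 0.8 | 0.000873036 | 0.306051282 | 0.771675172 | 0.48312128 | 0.438278516 | 17 | 2 |
| Hydroxyurea | 0.8 | 0.000984872 | 0.322118056 | 0.75660523 | 0.490767592 | 0.45186047 | 17 | 1 |
| NaCl | 0.76 | 0.000965072 | 0.237371226 | 0.692789678 | 0.402579452 | 0.353591157 | 18 | 1 |
| Forskolin | 0.77 | 0.000921497 | 0.302696629 | 0.742709167 | 0.471420905 | 0.430101999 | 17 | 1 |
| Raffinose | 0.56 | 0.004453894 | 0.426481481 | 0.479253019 | 0.447039993 | 0.451329915 | 18 | 1 |
| Doxorubicin | 0.82 | 0.000989351 | 0.26916221 | 0.71558061 | 0.435929061 | 0.391182865 | 17 | 2 |
| 16 °C | 0.51 | 0.000900186 | 0.071384083 | 0.432281211 | 0.172518262 | 0.122533747 | 17 | 3 |
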
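
The metrics reported in the two threshold tables above follow standard confusion-matrix definitions, sketched below for reference. The function names are hypothetical, and the code assumes all denominators are non-zero. As a consistency check, the F1 score recomputed from the PPV and TPR listed for SD1 (0.6075 and 0.4647) rounds to the tabulated 0.5266.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from counts of true/false positives and negatives."""
    tpr = tp / (tp + fn)  # true positive rate (recall)
    fpr = fp / (fp + tn)  # false positive rate
    ppv = tp / (tp + fp)  # positive predictive value (precision)
    f1 = 2 * ppv * tpr / (ppv + tpr)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"TPR": tpr, "FPR": fpr, "PPV": ppv, "F1": f1, "MCC": mcc}


def f1_from_ppv_tpr(ppv, tpr):
    """F1 score as the harmonic mean of precision (PPV) and recall (TPR)."""
    return 2 * ppv * tpr / (ppv + tpr)


# Recompute the SD1 F1 score from its tabulated PPV and TPR.
sd1_f1 = f1_from_ppv_tpr(0.6075, 0.4647)
```

MCC uses all four cells of the confusion matrix, which makes it informative when, as here, true non-interacting pairs vastly outnumber interacting ones.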
Appendix 1—table 2
Summary of promiscuous proteins that interact with an mDHFR fragment that is not tethered to any protein.

Promiscuous and non-promiscuous proteins are represented by 1 and 0, respectively, in each environment.

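| PPI | Positive_environment_number | SD_merge | H2O2 | Hydroxyurea | Doxorubicin | Forskolin | Raffinose | NaCl | FK506 | 16 °C |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| YMR120C | 6 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
| YIL143C | 6 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| YLL034C | 5 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| YPL139C | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| YGR278W | 2 | 01000100 [one value missing in source] |
| YPL112C | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| YIL070C | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| YHL007C | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| YDR452W | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| YOL147C | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| YER087W | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| YER063W | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| YOR323C | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| YNL064C | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| YKR080W | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| YJL153C | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| YDL208W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YLR182W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YPR124W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YHR114W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YDR381C-A | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YGR198W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YDR171W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YGR130C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YLL022C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YMR136W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YKL010C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YCR033W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YPL083C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YOR360C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YOR393W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YNL026W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YGR195W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YOR306C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YDL093W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YCR059C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YOL081W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YGR140W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YKL139W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| YEL017C-A | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| YDL112W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| YDR057W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| YFR001W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| YJL124C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| YDR151C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| YBR057C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| YHR146W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| YDR379W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| YDR513W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| YMR227C | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| YDR420W | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
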
Author response table 1
Mean fitness in each environment.
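
| PPI | Environment_number | SD | H2O2 | Hydroxyurea | Doxorubicin | Forskolin | Raffinose | NaCl | 16 °C | FK506 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rck2_Csh1 | 7 | 0.35 | 0.35 | 0 | 0.20 | 0.54 | 0.74 | 0 | 0.17 | 0.59 |
| Grs1_Pet10 | 9 | 0.44 | 0.39 | 0.34 | 0.25 | 0.65 | 1.19 | 0.2 | 0.16 | 0.95 |
| YDR492W_Rpd3 | 3 | 0 | 0.18 | 0 | 0 | 0 | 0 | 0 | 0.17 | 0.61 |
| Mrps35_Bub3 | 1 | 0.35 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Positive_control | 9 | 1 | 0.8 | 0.73 | 0.62 | 1.4 | 2.44 | 0.4 | 0.28 | 1.8 |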

Additional files

Supplementary file 1

Strain losses during barcoding and pool construction.

https://cdn.elifesciences.org/articles/62365/elife-62365-supp1-v2.xlsx
Supplementary file 2

Primers used in the construction of DHFR-fragment control strains.

https://cdn.elifesciences.org/articles/62365/elife-62365-supp2-v2.xlsx
Supplementary file 3

Barcoded haploid DHFR-fragment control strains.

https://cdn.elifesciences.org/articles/62365/elife-62365-supp3-v2.xlsx
Supplementary file 4

Strains in the PPiSeq library.

https://cdn.elifesciences.org/articles/62365/elife-62365-supp4-v2.xlsx
Supplementary file 5

Description of the environmental conditions tested.

Cells were shaken at 220 rpm. In SD, 0.2% DMSO was added as a vehicle control.

https://cdn.elifesciences.org/articles/62365/elife-62365-supp5-v2.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/62365/elife-62365-transrepform-v2.docx

  1. Zhimin Liu
  2. Darach Miller
  3. Fangfei Li
  4. Xianan Liu
  5. Sasha F Levy
(2020)
A large accessory protein interactome is rewired across environments
eLife 9:e62365.
https://doi.org/10.7554/eLife.62365