Emergence of power law distributions in protein-protein interaction networks through study bias

  1. David B Blumenthal (corresponding author)
  2. Marta Lucchetta
  3. Linda Kleist
  4. Sándor P Fekete
  5. Markus List (corresponding author)
  6. Martin H Schaefer (corresponding author)
  1. Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
  2. Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Italy
  3. Department of Computer Science, TU Braunschweig, Germany
  4. Braunschweig Integrated Centre of Systems Biology (BRICS), Germany
  5. Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Germany
  6. Munich Data Science Institute (MDSI), Technical University of Munich, Germany
9 figures and 1 additional file

Figures

Figure 1
Study overview.

(A) We seek to answer the question of how much we can learn about the topology of ground truth networks from the topology of observed and aggregated protein-protein interaction (PPI) networks and …

Figure 2 with 1 supplement
A large aggregated PPI network shows PL behavior while individual studies often do not.

(A) The black dots represent the degree distribution of our aggregated network, and the red line shows the fitted power law (PL) distribution with parameters k_min=278 and α=3.3, plotted on a log-log scale. (B)…
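As a rough illustration of how an exponent such as α can be estimated once k_min is fixed, the sketch below uses the standard maximum-likelihood approximation α ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)) for discrete data. The function name `fit_alpha` and the toy degree list are hypothetical, and nothing here is taken from the authors' fitting pipeline, which may differ (for example in how k_min is selected and how the goodness-of-fit p-value is obtained).

```python
import math


def fit_alpha(degrees, k_min, discrete=True):
    """Approximate maximum-likelihood exponent of a power law tail.

    Only degrees >= k_min enter the fit. With discrete=True the usual
    half-unit shift for integer data is applied; with discrete=False the
    continuous formula is used. Hypothetical helper for illustration only.
    """
    tail = [k for k in degrees if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    shift = 0.5 if discrete else 0.0
    log_sum = sum(math.log(k / (k_min - shift)) for k in tail)
    if log_sum == 0:
        raise ValueError("all tail degrees equal k_min; exponent undefined")
    return 1.0 + len(tail) / log_sum


# Toy degrees (not from the aggregated network); the value 3 lies below k_min
# and is ignored by the fit.
toy_degrees = [10, 20, 40, 3]
alpha_hat = fit_alpha(toy_degrees, k_min=10, discrete=False)
```

The estimate alone says nothing about whether a power law is a plausible model; that question needs a separate goodness-of-fit test.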

Figure 2—figure supplement 1
Histogram of protein-protein interaction (PPI) number per study in the aggregated network.

Figure 3
Aggregating more PPI networks increases the mean probability of obtaining a PL fit, potentially due to bait usage and study biases.

(A) Distribution of p-values obtained through the aggregation of 100, 200, and 300 random non-power law (PL) studies. The numbers on the top of each boxplot represent the fraction of PL networks …

Figure 4
Exemplification of Proposition 1 for an empty ground truth interactome, a small positive error rate, and the real-world bait distribution b obtained from IntAct.

The simulated observed degree distribution follows a power law (PL) with parameters k_min=64 and α=3.63.
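The setting described in this caption can be sketched in a few lines. Proposition 1 itself is not reproduced on this page, so the model below is only an assumed reading: the ground truth contains no interactions, each protein is tested as bait a given number of times, and every bait-prey test reports a false positive with a small probability. The function name, the error rate, and the synthetic heavy-tailed bait counts are all hypothetical; the real bait distribution b from IntAct is not used.

```python
import random


def simulate_observed_degrees(bait_counts, fp_rate, seed=0):
    """Observed degrees for an EMPTY ground truth network.

    bait_counts[i] is how often protein i is used as bait. Each bait test
    screens all other proteins as preys, and each pair is reported with
    probability fp_rate. Edges are aggregated as a set, so repeated
    detections of the same pair count once.
    """
    rng = random.Random(seed)
    n = len(bait_counts)
    edges = set()
    for bait, n_tests in enumerate(bait_counts):
        for _ in range(n_tests):
            for prey in range(n):
                if prey != bait and rng.random() < fp_rate:
                    edges.add((min(bait, prey), max(bait, prey)))
    degrees = [0] * n
    for u, v in edges:
        degrees[u] += 1
        degrees[v] += 1
    return degrees


# Synthetic heavy-tailed bait usage (an assumption, NOT the IntAct data):
# most proteins are rarely used as bait, a few are used many times.
bait_rng = random.Random(1)
toy_bait_counts = [min(50, int(bait_rng.paretovariate(1.5))) for _ in range(200)]
toy_degrees = simulate_observed_degrees(toy_bait_counts, fp_rate=0.01)
```

In this toy model every observed edge is an error, so any structure in the degree distribution comes from how unevenly the baits are used.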

Figure 5
After correcting for bait or prey usage, a third of the PL networks lose the PL property.

(A) Scheme to illustrate how the degree is recalculated when the number of baits is smaller than the number of preys. (B) Distribution of the size balance (ratio between the number of baits and …

Figure 6 with 3 supplements
Gene set enrichment analysis of hub proteins after bias correction yields biologically plausible terms that differ from uncorrected analysis.

(A) Gene Ontology enrichment analysis of the top 50 corrected hubs (prey hubs, normalized hubs, and Y2H hubs) and uncorrected hubs (hubs of the uncorrected aggregated network). (B) Disease Ontology enrichment …

Figure 6—source data 1

Detailed results of gene set enrichment analysis.

https://cdn.elifesciences.org/articles/99951/elife-99951-fig6-data1-v2.xlsx
Figure 6—figure supplement 1
Gene Ontology enrichment analysis results of the top-200, top-500, top-1000, top-2000, and top-3000 most abundant proteins.
Figure 6—figure supplement 2
Reactome enrichment analysis of the top 50 corrected and uncorrected hubs.

The numbers in brackets represent the number of hubs included in the reference databases, and the ‘Gene ratio’ represents the fraction of hubs included in the corresponding pathway.

Figure 6—figure supplement 3
Overlaps of chaperones with genes related to schizophrenia and with genes related to psychotic disorders.

(A) Overlap between schizophrenia-related genes and chaperones (p=0.01; one-sided Fisher test). (B) Overlap between psychotic disorder-related genes and chaperones (p=0.005; one-sided Fisher test).
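For readers unfamiliar with the test named in these panels, a one-sided Fisher test for an overlap amounts to a hypergeometric tail probability: the chance of seeing at least the observed overlap if one gene set were drawn at random from the gene universe. The sketch below computes that tail directly. The function name and all counts in the example are hypothetical, since the caption reports only the p-values and not the underlying set sizes.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, overlap):
    """One-sided (enrichment) Fisher test p-value for a set overlap.

    universe: number of genes considered in total
    size_a:   size of the first gene set (e.g. disease-related genes)
    size_b:   size of the second gene set (e.g. chaperones)
    overlap:  number of genes shared by both sets
    Returns P(X >= overlap) under the hypergeometric null model.
    """
    max_overlap = min(size_a, size_b)
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts for illustration only (not the values behind p=0.01).
p_toy = overlap_p_value(universe=10, size_a=5, size_b=5, overlap=5)
```

With these toy counts only one of the 252 possible draws gives a complete overlap, so `p_toy` equals 1/252.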

Figure 7
Conceptual overview of simulated aggregated protein-protein interaction (PPI) testing under study bias, together with the downstream analyses that assess whether the empirical aggregated PPI network G_IntAct obtained from IntAct is more likely to have emerged from a power law (PL)-distributed or from a binomially distributed true biological interactome.

The colored dots in the gray area represent degree distributions; dissimilarity between degree distributions is quantified using the earth mover’s distance.
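A minimal sketch of the earth mover's distance for two degree samples is given below, assuming the one-dimensional form, which is the area between the two empirical cumulative distribution functions. The function name and the toy degree lists are hypothetical, and the authors may normalize or bin the degree distributions differently before comparing them.

```python
from bisect import bisect_right


def earth_movers_distance(degrees_u, degrees_v):
    """1D earth mover's distance between two empirical degree samples.

    Computed as the integral of |F_u(x) - F_v(x)| over x, where F_u and F_v
    are the empirical cumulative distribution functions of the samples.
    """
    if not degrees_u or not degrees_v:
        raise ValueError("both samples must be non-empty")
    u_sorted = sorted(degrees_u)
    v_sorted = sorted(degrees_v)
    support = sorted(set(u_sorted) | set(v_sorted))
    distance = 0.0
    for left, right in zip(support, support[1:]):
        # Both CDFs are constant on [left, right), so evaluate them at left.
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        distance += abs(cdf_u - cdf_v) * (right - left)
    return distance


# Toy degree samples for illustration only (not taken from IntAct).
d_toy = earth_movers_distance([0, 1, 3], [5, 6, 8])
```

Here every value in the first sample has to move by 5 to match the second, so `d_toy` is 5.0; identical samples give a distance of 0.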

Figure 8 with 2 supplements
Simulations show that, in the presence of study bias and small non-zero false positive rates in affinity purification-mass spectrometry (AP-MS) studies, binomially and PL-distributed ground truth interactomes are equally likely origins of observed aggregated PPI networks.

(A) Histogram of earth mover’s distances between the degree distribution of the observed protein-protein interaction (PPI) network G_IntAct obtained via aggregation of all AP-MS studies annotated in …

Figure 8—figure supplement 1
Degree distributions of all affinity purification-mass spectrometry (AP-MS) and yeast-2-hybrid (Y2H) studies annotated in IntAct.

(A) Degree distribution for observed protein-protein interaction (PPI) network obtained via aggregation of all AP-MS studies annotated in IntAct. Node degrees are power law (PL)-distributed (p=0.36). (B)…

Figure 8—figure supplement 2
Results of simulation study for yeast-2-hybrid (Y2H) testing.

(A) Histogram of earth mover’s distances between the degree distribution of the observed protein-protein interaction (PPI) network G_IntAct obtained via aggregation of all Y2H studies annotated in IntAct …

Appendix 1—figure 1
Protein mass and abundance are significantly positively correlated with the node degree in aggregated PPI networks.

(A) Spearman correlation between degree and protein mass within affinity purification-mass spectrometry (AP-MS) studies of the aggregated network. (B) Spearman correlation between degree and protein …

Additional files
