Defining hierarchical protein interaction networks from spectral analysis of bacterial proteomes

  1. Mark A Zaydman (corresponding author)
  2. Alexander S Little
  3. Fidel Haro
  4. Valeryia Aksianiuk
  5. William J Buchser
  6. Aaron DiAntonio
  7. Jeffrey I Gordon
  8. Jeffrey Milbrandt
  9. Arjun S Raman (corresponding author)
  1. Department of Pathology and Immunology, Washington University School of Medicine, United States
  2. Duchossois Family Institute, University of Chicago, United States
  3. Department of Genetics, Washington University School of Medicine, United States
  4. Department of Developmental Biology, Washington University School of Medicine, United States
  5. The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, United States
  6. Department of Pathology, University of Chicago, Chicago, United States
  7. Center for the Physics of Evolving Systems, University of Chicago, Chicago, United States
8 figures and 8 additional files

Figures

Figure 1
Shallow components of covariation measured across bacterial orthologs reflect broad phylogenetic relationships.

(A) DOGG. Rows are 7047 bacterial proteomes, columns are 10,177 orthologous gene groups (OGGs), entries are the number of annotations of an OGG within a bacterial proteome (Figure 1—source data 1). …
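As a rough illustration of the matrix layout described in panel A, the sketch below builds a small count matrix from per-proteome OGG annotations. The proteome names, OGG identifiers, and the helper `build_count_matrix` are hypothetical and are not taken from the source data.

```python
from collections import Counter

# Hypothetical annotations: each proteome maps to the OGGs annotated in it.
# An OGG listed twice means two annotations of that OGG in the proteome.
annotations = {
    "proteome_A": ["OGG1", "OGG1", "OGG3"],
    "proteome_B": ["OGG2", "OGG3"],
}

def build_count_matrix(annotations):
    """Return (row labels, column labels, matrix of annotation counts)."""
    proteomes = sorted(annotations)
    oggs = sorted({ogg for hits in annotations.values() for ogg in hits})
    counts = {p: Counter(annotations[p]) for p in proteomes}
    # Rows are proteomes, columns are OGGs, entries are annotation counts.
    matrix = [[counts[p][ogg] for ogg in oggs] for p in proteomes]
    return proteomes, oggs, matrix

proteomes, oggs, matrix = build_count_matrix(annotations)
```

In the full dataset the same layout would have 7047 rows and 10,177 columns.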

Figure 2 with 2 supplements
Workflow for relating patterns of ortholog covariation with phylogeny and protein interactions.

(A) Singular value decomposition (SVD) performed on DOGG yields UOGG (rows are proteomes, columns are ‘left singular vectors’ [LSVs]) and VOGG (columns are OGGs, rows are ‘right singular vectors’ …
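The decomposition in panel A can be sketched on a toy matrix with NumPy. The matrix values are invented, and the absence of centering or scaling before SVD is an assumption of this sketch rather than a statement about the published pipeline.

```python
import numpy as np

# Toy stand-in for the proteome-by-OGG count matrix:
# 4 proteomes (rows) x 5 OGGs (columns). Values are invented.
D = np.array([
    [1, 0, 2, 1, 0],
    [1, 0, 2, 0, 0],
    [0, 3, 0, 1, 1],
    [0, 2, 0, 1, 1],
], dtype=float)

# Thin SVD: D = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# U: rows are proteomes, columns are left singular vectors.
# Vt: rows are right singular vectors, columns are OGGs.
# Share of the total squared singular values carried by each component.
percent_share = 100 * s**2 / np.sum(s**2)

# Sanity check: the factors reproduce the original matrix.
reconstructed = U @ np.diag(s) @ Vt
```

NumPy returns singular values in descending order, so the first components are the "shallow" ones and later components are progressively "deeper".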

Figure 2—figure supplement 1
Computing spectral correlations between two proteins.

Shown here is an example of computing protein-protein spectral correlations using ArgA and ArgH in Escherichia coli K12. (A,B) The orthologous gene group (OGG) structures of E. coli K12 ArgA (panel …
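One plausible reading of a spectral correlation, offered here only as a sketch, is the Pearson correlation between two proteins' projections onto a window of SVD components. The loading values and the function `windowed_correlation` are hypothetical; the caption above is truncated, so the exact published procedure may differ.

```python
from math import sqrt

def windowed_correlation(x, y, start, stop):
    """Pearson correlation of two loading vectors over components [start, stop)."""
    xs, ys = x[start:stop], y[start:stop]
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("window must cover at least two shared components")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # no variation in the window, so correlation is undefined
    return cov / sqrt(var_x * var_y)

# Hypothetical loadings of two proteins' OGGs on six SVD components.
loadings_a = [0.9, -0.1, 0.3, 0.4, -0.2, 0.1]
loadings_b = [-0.8, 0.2, 0.6, 0.8, -0.4, 0.2]

r_all = windowed_correlation(loadings_a, loadings_b, 0, 6)     # all components
r_window = windowed_correlation(loadings_a, loadings_b, 2, 6)  # skip the first two
```

In this toy case the two proteins disagree on the shallowest components but covary perfectly on the deeper ones, which is why the choice of window changes the result.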

Figure 2—figure supplement 2
Computing background mutual information (MI) between spectral correlations and a benchmark.

(Top) Mdata consists of contributions of six variables (rows) onto six singular value decomposition (SVD) components (columns). If the variables correspond to the rows or columns of DOGG, the …

Figure 3 with 1 supplement
Shallow to deep spectral components of ortholog covariation reflect global to local biological ‘scales’.

(A) Distribution of information (y-axis, ‘mutual information [MI] density’) for each benchmark (see legend) measured across the singular value decomposition (SVD) spectrum (x-axis, ‘spectral …

Figure 3—source data 1

NCBI taxonomic strings for each organism used to generate phylogenetic benchmarks.

https://cdn.elifesciences.org/articles/74104/elife-74104-fig3-data1-v2.xlsx

Figure 3—source data 2

Benchmarks of protein-protein interactions (PPIs) in Escherichia coli K12.

https://cdn.elifesciences.org/articles/74104/elife-74104-fig3-data2-v2.xlsx

Figure 3—figure supplement 1
Impact of down-sampling overrepresented phyla on results shown in Figure 3.

(A) Histogram of the number of proteomes belonging to each of the top 15 out of a total 116 phyla in the data matrix DOGG. Inset is the four most abundant phyla. (B) Percent variance versus …

Figure 4 with 1 supplement
Workflow for computing the ‘spectral depth’ between pairs of proteins.

(A) Spectral components enriched for indirect and direct protein interactions (25th to 75th interquartile range of cumulative mutual information [MI] density) are selected, thereby filtering …
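The component selection in panel A can be sketched as picking the span of components over which the cumulative MI density rises from 25% to 75% of its total. The density values and the function `interquartile_window` are hypothetical, and the boundary convention (first component reaching each quantile) is an assumption of this sketch.

```python
from itertools import accumulate

def interquartile_window(mi_density, lower=0.25, upper=0.75):
    """Return (first, last) 1-based component indices spanning the
    lower-to-upper quantile range of cumulative MI density."""
    total = sum(mi_density)
    if total <= 0:
        raise ValueError("MI density must have a positive total")
    cumulative = [c / total for c in accumulate(mi_density)]
    first = next(i for i, c in enumerate(cumulative, start=1) if c >= lower)
    last = next(i for i, c in enumerate(cumulative, start=1) if c >= upper)
    return first, last

# Hypothetical MI density over eight spectral components.
mi_density = [1, 1, 2, 4, 4, 2, 1, 1]
window = interquartile_window(mi_density)  # (3, 5)
```

Components outside the returned window would be the ones filtered out.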

Figure 4—figure supplement 1
Determining a threshold for statistically significant spectral correlations.

(A) Cumulative distribution functions (cdfs) for spectral correlations between all proteins in Escherichia coli K12 across windows of different widths (legend) centered on SVD component 1001. (B) …

Figure 5
Pattern of spectral correlations with flagellar filament, FliC, in Escherichia coli K12.

(A) Proteins that shared significant spectral correlations with FliC after filtering for phylogeny and noise. (B) Hierarchically clustered spectral depth matrix for all pairs of proteins in panel A. …

Figure 6 with 4 supplements
A statistically derived hierarchical model of Escherichia coli K12 motility.

(A) Statistical interaction networks defined at spectral depths 50 (top), 300 (middle), and 1000 (bottom). Nodes (yellow circles) are proteins; edges (red lines) reflect statistical interactions …
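A minimal sketch of how thresholding spectral depth could yield nested networks follows. It assumes an edge is kept when a pair's spectral depth is at least the threshold. The protein labels and depth values are invented, and `edges_at_depth` and `connected_components` are hypothetical helpers.

```python
def edges_at_depth(depths, threshold):
    """Keep protein pairs whose spectral depth is at least `threshold`."""
    return sorted(pair for pair, depth in depths.items() if depth >= threshold)

def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(neighbours[node] - seen)
        components.append(sorted(group))
    return components

# Hypothetical pairwise spectral depths for four proteins.
depths = {
    ("P1", "P2"): 1200,
    ("P1", "P3"): 400,
    ("P2", "P3"): 350,
    ("P3", "P4"): 80,
}
nodes = ["P1", "P2", "P3", "P4"]

networks = {
    t: connected_components(nodes, edges_at_depth(depths, t))
    for t in (50, 300, 1000)
}
```

Under this assumption, raising the threshold removes edges, so a single broad group at depth 50 splits into smaller groups at depths 300 and 1000.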

Figure 6—figure supplement 1
Protein interaction networks of spectrally correlated proteins with FliC in Escherichia coli K12 at spectral depths of 225, 500, and 750.

Nodes, edges, and contours are defined in the same manner as described in Figure 6A.

Figure 6—figure supplement 2
A statistically derived hierarchical model of motility in Escherichia coli K12 using MotB as a query protein.

(A) Statistical interaction networks defined by thresholding spectral depth at 50 (top panel), 300 (middle panel), and 1000 (bottom panel) (Supplementary file 2). Nodes, edges, and contours are …

Figure 6—figure supplement 3
A statistically derived hierarchical model of motility in Bacillus subtilis 168 using Hag as a query protein.

(A) Statistical interaction networks defined by thresholding spectral depth at 50 (top panel), 300 (middle panel), and 1000 (bottom panel; Supplementary file 3). Nodes, edges, and contours are …

Figure 6—figure supplement 4
A statistically derived hierarchical model of amino acid metabolism in Escherichia coli K12 using HisG as a query protein.

(A) Statistical interaction networks defined by thresholding spectral depth at 300 and 1000. Nodes, edges, and contours are defined in the same manner as described in Figure 6—figure supplement 1, Su…

Figure 7 with 1 supplement
Prediction and validation of a novel effector of twitch motility in Pseudomonas aeruginosa.

(A) Statistical network derived by applying a spectral depth threshold of 300 to the set of 141 proteins in P. aeruginosa (strain PAO1) that were significantly correlated with PilA across SVD34 to …

Figure 7—figure supplement 1
Statistically derived hierarchical model of directed motility in Pseudomonas aeruginosa using PilA as a query.

(A) Statistical interaction network defined by thresholding spectral depth at 50. The inset illustrates significantly enriched terms resulting from gene-set enrichment analysis (GSEA) of the entire …

Figure 8 with 3 supplements
Mutual information (MI) windowed spectral correlations (MIWSCs) enable accurate classification of indirect and direct protein-protein interactions (PPIs).

See Figure 8—figure supplement 1 for definition of ‘MIWSCs’. (A) F-scores for predicting interaction classes for Escherichia coli K12 protein pairs using random forest (RF) models trained on MIWSCs …
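For readers unfamiliar with the metric, the sketch below computes a per-class F-score for a three-class problem. It assumes the F1 form (harmonic mean of precision and recall); the caption does not state which variant was used. The labels and predictions are invented and do not come from any random forest model.

```python
def f1_per_class(y_true, y_pred, classes):
    """Per-class F1 = 2*TP / (2*TP + FP + FN); 0.0 when the class never appears."""
    scores = {}
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores

# Hypothetical true and predicted interaction classes for six protein pairs.
classes = ["none", "indirect", "direct"]
y_true = ["direct", "direct", "indirect", "indirect", "none", "none"]
y_pred = ["direct", "indirect", "indirect", "indirect", "none", "direct"]

scores = f1_per_class(y_true, y_pred, classes)
```

Reporting one score per class, rather than a single accuracy, shows whether a model separates indirect from direct interactions or only separates interacting from non-interacting pairs.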

Figure 8—source data 1

Data and statistical support for random forest (RF) model validation studies, related to Figure 8.

https://cdn.elifesciences.org/articles/74104/elife-74104-fig8-data1-v2.xlsx

Figure 8—figure supplement 1
Workflow for training and validating random forest (RF) models on mutual information windowed spectral correlations (MIWSCs).

The five-step process described here yielded RF models trained to predict protein-protein interaction (PPI) class (either not-interacting, indirect PPIs, or direct PPIs) from the set of three MIWSC …

Figure 8—figure supplement 2
Workflow for training and validating various random forest (RF) models designed to predict non-interacting proteins, indirect protein-protein interactions (PPIs), and direct PPIs across the proteome of Escherichia coli K12.

A gold-standard dataset of well-characterized E. coli K12 protein pairs was assembled and partitioned into training and validation datasets. The labeled examples from the training set were used to …

Figure 8—figure supplement 3
F-scores for predicting interaction classes for out-of-bag examples in the training datasets (A) and four additional comprehensive benchmarks (B).

The violin plots describe the distribution of F-scores for models trained and validated on 50 random partitions of the gold-standard dataset (Figure 8—figure supplement 1). Numbering indicates the …
