(A) DOGG. Rows are 7,047 bacterial proteomes, columns are 10,177 orthologous gene groups (OGGs), entries are the number of annotations of an OGG within a bacterial proteome (Figure 1—source data 1). …
DOGG matrix shown in Figure 1A.
(A) Singular value decomposition (SVD) performed on DOGG yields UOGG (rows are proteomes, columns are ‘left singular vectors’ [LSVs]) and VOGG (columns are OGGs, rows are ‘right singular vectors’ …
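The decomposition in panel A can be sketched with NumPy on a small count matrix standing in for DOGG. The toy values and variable names below are illustrative assumptions, not the authors' data or code.

```python
import numpy as np

# Toy stand-in for DOGG: 4 proteomes (rows) x 5 OGGs (columns).
# Entries are made-up annotation counts, for illustration only.
D = np.array([
    [1, 0, 2, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 1, 3, 0],
    [2, 0, 1, 0, 1],
], dtype=float)

# Thin SVD: D = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(D, full_matrices=False)

# U:  rows are proteomes, columns are left singular vectors (LSVs).
# Vt: rows are right singular vectors, columns are OGGs.
reconstructed = U @ np.diag(s) @ Vt
```

Multiplying the three factors back together recovers the original matrix, which is a quick sanity check on the orientation of U and Vt.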
Shown here is an example of computing protein-protein spectral correlations using ArgA and ArgH in Escherichia coli K12. (A,B) The orthologous gene group (OGG) structures of E. coli K12 ArgA (panel …
(Top) Mdata consists of contributions of six variables (rows) onto six singular value decomposition (SVD) components (columns). If the variables correspond to the rows or columns of DOGG, the …
(A) Distribution of information (y-axis, ‘mutual information [MI] density’) for each benchmark (see legend) measured across the singular value decomposition (SVD) spectrum (x-axis, ‘spectral …
NCBI taxonomic strings for each organism used to generate phylogenetic benchmarks.
Benchmarks of protein-protein interactions (PPIs) in Escherichia coli K12.
(A) Histogram of the number of proteomes belonging to each of the top 15 out of a total 116 phyla in the data matrix DOGG. Inset is the four most abundant phyla. (B) Percent variance versus …
(A) Spectral components enriched for indirect and direct protein interactions (those within the interquartile range, 25th to 75th percentile, of cumulative mutual information [MI] density) are selected, thereby filtering …
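One possible reading of this selection step is sketched below: accumulate MI density across components and keep those whose cumulative share falls between 0.25 and 0.75. The function name `select_components` and the toy density values are hypothetical, and the exact rule used in the study may differ.

```python
def select_components(mi_density, lower=0.25, upper=0.75):
    """Return indices of components whose cumulative MI share lies in [lower, upper]."""
    total = sum(mi_density)
    selected = []
    running = 0.0
    for index, value in enumerate(mi_density):
        running += value
        # Keep the component if the cumulative fraction is inside the band.
        if lower <= running / total <= upper:
            selected.append(index)
    return selected


# Hypothetical MI density across eight spectral components.
toy_density = [1, 1, 2, 4, 4, 2, 1, 1]
window = select_components(toy_density)
```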
(A) Cumulative distribution functions (cdfs) for spectral correlations between all proteins in Escherichia coli K12 across windows of different widths (legend) centered on SVD component 1001. (B) …
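A minimal sketch of correlating two proteins over a window of spectral components follows. It assumes a Pearson correlation over a symmetric window around a zero-based center index; the caption does not specify the statistic, so the function names and toy vectors are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def windowed_correlation(vec_a, vec_b, center, width):
    """Correlate two spectral vectors over `width` components centered on `center`."""
    half = width // 2
    start = max(0, center - half)
    stop = min(len(vec_a), center + half + 1)
    return pearson(vec_a[start:stop], vec_b[start:stop])


# Hypothetical projections of two proteins onto six SVD components.
protein_a = [5, 1, 2, 3, 4, 0]
protein_b = [0, 2, 4, 6, 8, 9]
r = windowed_correlation(protein_a, protein_b, center=2, width=3)
```

Repeating the call with different `width` values gives one correlation per window, which is the kind of quantity whose distribution panel A summarizes.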
(A) Proteins that shared significant spectral correlations with FliC after filtering for phylogeny and noise. (B) Hierarchically clustered spectral depth matrix for all pairs of proteins in panel A. …
(A) Statistical interaction networks defined at spectral depths 50 (top), 300 (middle), and 1000 (bottom). Nodes (yellow circles) are proteins; edges (red lines) reflect statistical interactions …
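Thresholding pairwise spectral depth into a network can be sketched as below. The sketch assumes that an edge is drawn when a pair's spectral depth is at least the threshold, which the caption does not state explicitly; the protein labels and depth values are made up.

```python
def threshold_network(depths, threshold):
    """Return sorted edges for protein pairs whose spectral depth is at least `threshold`."""
    return sorted(pair for pair, depth in depths.items() if depth >= threshold)


# Hypothetical spectral depths for four protein pairs.
toy_depths = {
    ("P1", "P2"): 1200,
    ("P1", "P3"): 350,
    ("P2", "P3"): 60,
    ("P3", "P4"): 20,
}

# One edge list per threshold used in the figure (50, 300, 1000).
networks = {t: threshold_network(toy_depths, t) for t in (50, 300, 1000)}
```

Under this assumption the networks are nested: raising the threshold can only remove edges.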
Nodes, edges, and contours are defined in the same manner as described in Figure 6A.
(A) Statistical interaction networks defined by thresholding spectral depth at 50 (top panel), 300 (middle panel), and 1000 (bottom panel) (Supplementary file 2). Nodes, edges, and contours are …
(A) Statistical interaction networks defined by thresholding spectral depth at 50 (top panel), 300 (middle panel), and 1000 (bottom panel; Supplementary file 3). Nodes, edges, and contours are …
(A) Statistical interaction networks defined by thresholding spectral depth at 300 and 1000. Nodes, edges, and contours are defined in the same manner as described in Figure 6—figure supplement 1, Su…
(A) Statistical network derived by applying a spectral depth threshold of 300 to the set of 141 proteins in P. aeruginosa (strain PAO1) that were significantly correlated with PilA across SVD34 to …
(A) Statistical interaction network defined by thresholding spectral depth at 50. The inset illustrates significantly enriched terms resulting from gene-set enrichment analysis (GSEA) of the entire …
See Figure 8—figure supplement 1 for definition of ‘MIWSCs’. (A) F-scores for predicting interaction classes for Escherichia coli K12 protein pairs using random forest (RF) models trained on MIWSCs …
Data and statistical support for random forest (RF) model validation studies, related to Figure 8.
The five-step process described here yielded RF models trained to predict protein-protein interaction (PPI) class (either not-interacting, indirect PPIs, or direct PPIs) from the set of three MIWSC …
A gold-standard dataset of well-characterized E. coli K12 protein pairs was assembled and partitioned into training and validation datasets. The labeled examples from the training set were used to …
The violin plots describe the distribution of F-scores for models trained and validated on 50 random partitions of the gold-standard dataset (Figure 8—figure supplement 1). Numbering indicates the …
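For reference, a per-class F-score can be computed as sketched below. The sketch assumes the balanced F1 form (harmonic mean of precision and recall) and uses made-up labels for the three interaction classes; it is not the authors' validation code.

```python
def f_score(true_labels, predicted_labels, target):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical validation labels for six protein pairs.
truth = ["none", "none", "indirect", "indirect", "direct", "direct"]
predicted = ["none", "indirect", "indirect", "indirect", "direct", "none"]

scores = {label: f_score(truth, predicted, label) for label in ("none", "indirect", "direct")}
```

Computing these scores once per random partition gives the per-class distributions that a violin plot would display.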
Data pertaining to Figure 6A.
Data related to Figure 6—figure supplement 2.
Data related to Figure 6—figure supplement 3.
Data related to Figure 6—figure supplement 4.
Data related to Figure 7A,B and Figure 7—figure supplement 1.
Data related to gene co-occurrence, gene fusion, gene neighborhood, and co-expression, using PilA in Pseudomonas aeruginosa (PAO1) as the query protein.
Data related to Figure 7C.