A computational interactome and functional annotation for the human proteome

Abstract

We present a database, PrePPI (Predicting Protein-Protein Interactions), of more than 1.35 million predicted protein-protein interactions (PPIs). Of these, at least 127,000 are expected to constitute direct physical interactions, although the actual number may be much larger (~500,000). The current PrePPI, which contains predicted interactions for about 85% of the human proteome, is related to an earlier version but is based on additional sources of interaction evidence and is far larger in scope. The use of structural relationships allows PrePPI to infer numerous previously unreported interactions. PrePPI has been subjected to a series of validation tests, including reproducing known interactions, recapitulating multi-protein complexes, analyzing disease-associated SNPs, and identifying functional relationships between interacting proteins. We show, using Gene Set Enrichment Analysis (GSEA), that predicted interaction partners can be used to annotate a protein's function. We provide annotations for most human proteins, including many annotated as having unknown function.

Data availability

The following previously published data sets were used:

(2004) The Database of Interacting Proteins:
Available at DIP.

http://dip.doe-mbi.ucla.edu/dip/Download.cgi?SM=10
(2007) MINT: the Molecular INTeraction database
Available at IntAct.

ftp://ftp.ebi.ac.uk/pub/databases/intact/current/all.zip
C Stark, B-J Breitkreutz, A Chatr-aryamontri, L Boucher, R Oughtred, MS Livstone, J Nixon, K Van Auken, X Wang, X Shi, T Reguly, JM Rust, A Winter, K Dolinski and M Tyers
(2011) The BioGRID Interaction Database
Available at BioGRID.

https://thebiogrid.org/downloads/archives/Release%20Archive/BIOGRID-3.4.141/BIOGRID-ALL-3.4.141.tab.zip
(1997) MIPS
Available at MIPS.

http://mips.helmholtz-muenchen.de/proj/ppi/data/mppi.gz
TS Keshava Prasad, R Goel, K Kandasamy, S Keerthikumar, S Kumar, S Mathivanan, D Telikicherla, R Raju, B Shafreen, A Venugopal, L Balakrishnan, A Marimuthu, S Banerjee, DS Somanathan, A Sebastian, S Rani, S Ray, CJ Harrys Kishore, S Kanth, M Ahmed, MK Kashyap, R Mohmood, YL Ramachandra, V Krishna, BA Rahiman, S Mohan, P Ranganathan, S Ramabadran, R Chaerkady and A Pandey
(2009) Human Protein Reference Database
Available at HPRD.

http://hprd.org/download
A Franceschini, D Szklarczyk, S Frankild, M Kuhn, M Simonovic, A Roth, J Lin, P Minguez, P Bork, C von Mering and LJ Jensen
(2013) HumanNet
Available at HumanNet.

http://www.functionalnet.org/humannet/HumanNet.v1.join.txt
(2009) PIPs
Available at PIPs.

http://www.compbio.dundee.ac.uk/www-pips/downloads/PredictedInteractions100.txt
KR Brown and I Jurisica
(2007) OPHID
Available at I2D.

http://ophid.utoronto.ca/ophidv2.204/downloads.jsp
M Kanehisa and S Goto
(2000) KEGG
Available at KEGG.

http://www.genome.jp/kegg-bin/download_htext?htext=ko00001.keg&format=htext&filedir=
J Huerta-Cepas, D Szklarczyk, K Forslund, H Cook, D Heller, MC Walter, T Rattei, DR Mende, S Sunagawa, M Kuhn, LJ Jensen, C von Mering and P Bork
(2016) eggNOG
Available at eggNOG.

http://eggnog.embl.de/version_3.0/data/downloads/all.members.tar.gz
S Penel, A-M Arigon, J-F Dufayard, A-S Sertier, V Daubin, L Duret, M Gouy and G Perrière
(2009) Hogenom
Available at Hogenom.

ftp://pbil.univ-lyon1.fr/pub/hogenom/release_06/hogenom6_prot.tar.gz
AM Altenhoff, N Škunca, N Glover, C-M Train, A Sueki, I Piližota, K Gori, B Tomiczek, S Müller, H Redestig, GH Gonnet and C Dessimoz
(2015) OMA
Available at OMA.

http://omabrowser.org/All/oma-pairs.txt.gz
(2015) OrthoDB
Available at OrthoDB.

http://www.orthodb.org/v9/download/odb9_OGs.tab.gz
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for C. elegans
Publicly available at Coexpressdb (accession no: Cel.c2-0).

http://coxpresdb.jp/download/Cel.v12-08.G17256-S1034.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for dog
Publicly available at Coexpressdb (accession no: Cfa.c1-0).

http://coxpresdb.jp/download/Cfa.v12-08.G16211-S377.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for fruit fly
Publicly available at Coexpressdb (accession no: Dme.c2-0).

http://coxpresdb.jp/download/Dme.v12-08.G12626-S3336.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for zebrafish
Publicly available at Coexpressdb (accession no: Dre.c2-0).

http://coxpresdb.jp/download/Dre.v12-08.G10112-S1126.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for chicken
Publicly available at Coexpressdb (accession no: Gga.c2-0).

http://coxpresdb.jp/download/Gga.v12-08.G13757-S1024.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for human
Publicly available at Coexpressdb (accession no: Hsa.c4-0).

http://coxpresdb.jp/download/Hsa.v12-08.G19803-S73083.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for human
Publicly available at Coexpressdb (accession no: Hsa2.c1-0).

http://coxpresdb.jp/download/hsa.v14-04.G19816-S5626.quantile.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for rhesus monkey
Publicly available at Coexpressdb (accession no: Mcc.c1-0).

http://coxpresdb.jp/download/Mcc.v12-08.G15779-S675.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for mouse
Publicly available at Coexpressdb (accession no: Mmu.c3-0).

http://coxpresdb.jp/download/Mmu.v12-08.G20403-S31479.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for Norway rat
Publicly available at Coexpressdb (accession no: Rno.c2-0).

http://coxpresdb.jp/download/Rno.c2-0.G13751-S3526.tar.gz
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for budding yeast
Publicly available at Coexpressdb (accession no: Sce.c1-0).

http://coxpresdb.jp/download/Sce.v12-08.G4461-S2693.rma.mrgeo.d.tar.bz2
Y Okamura, Y Aoki, T Obayashi, S Tadaka, S Ito, T Narise and K Kinoshita
(2015) Coexpression data for fission yeast
Publicly available at Coexpressdb (accession no: Spo.c1-0).

http://coxpresdb.jp/download/Spo.v14-08.G4881-S224.rma.mrgeo.d.tar.bz2
N Kolesnikov, E Hastings, M Keays, O Melnichuk, YA Tang, E Williams, M Dylag, N Kurbatova, M Brandizi, T Burdett, K Megy, E Pilicheva, G Rustici, A Tikhonov, H Parkinson, R Petryszak, U Sarkans and A Brazma
(2015) Coexpression data for human
Publicly available at Array Express.

https://www.ebi.ac.uk/arrayexpress/files/E-MTAB-62/E-MTAB-62.processed.2.zip

Article and author information

Author details

José Ignacio Garzón

Center for Computational Biology and Bioinformatics, Department of Systems Biology, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Lei Deng

Center for Computational Biology and Bioinformatics, Department of Systems Biology, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Diana Murray

Center for Computational Biology and Bioinformatics, Department of Systems Biology, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Sagi Shapira

Center for Computational Biology and Bioinformatics, Department of Systems Biology, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Donald Petrey

Center for Computational Biology and Bioinformatics, Department of Systems Biology, Columbia University, New York, United States

Competing interests
The authors declare that no competing interests exist.

Barry Honig

Center for Computational Biology and Bioinformatics, Department of Systems Biology, Columbia University, New York, United States

For correspondence
bh6@cumc.columbia.edu

Competing interests
The authors declare that no competing interests exist.

ORCID iD: 0000-0002-2480-6696

Funding

National Institutes of Health (GM030518)

Barry Honig

National Institutes of Health (S10OD012351)

José Ignacio Garzón
Lei Deng
Diana Murray
Sagi Shapira
Donald Petrey
Barry Honig

National Institutes of Health (S10OD021764)

José Ignacio Garzón
Lei Deng
Diana Murray
Sagi Shapira
Donald Petrey
Barry Honig

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.