Systematic integration of biomedical knowledge prioritizes drugs for repurposing

  1. Daniel Scott Himmelstein
  2. Antoine Lizee
  3. Christine Hessler
  4. Leo Brueggeman
  5. Sabrina L Chen
  6. Dexter Hadley
  7. Ari Green
  8. Pouya Khankhanian
  9. Sergio E Baranzini (corresponding author)

  1. University of California, San Francisco, United States
  2. University of Pennsylvania, United States
  3. University of Nantes, France
  4. University of Iowa, United States
  5. Johns Hopkins University, United States
  6. University of California, San Francisco, United States
6 figures, 4 tables and 1 additional file


Figure 1
Hetionet v1.0.

(A) The metagraph, a schema of the network types. (B) The hetnet visualized. Nodes are drawn as dots and laid out orbitally, thus forming circles. Edges are colored by type. (C) Metapath counts by path length. The number of different types of paths of a given length that connect two node types is shown. For example, the top-left tile in the Length 1 panel denotes that Anatomy nodes are not connected to themselves (i.e. no edges connect nodes of this type between themselves). However, the bottom-left tile of the Length 4 panel denotes that 88 types of length-four paths connect Symptom to Anatomy nodes.
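
The metapath counting described for panel C can be illustrated on a toy metagraph. The sketch below is not the authors' code: it assumes a reduced metagraph containing only the six metaedges listed in Table 2, and it counts the distinct metaedge sequences of a given length between two metanodes. The names `METAEDGES`, `build_steps`, and `count_metapaths` are hypothetical, and the resulting counts apply to this reduced metagraph only, not to the full 24-metaedge Hetionet v1.0.

```python
from collections import defaultdict

# Toy metagraph: (source metanode, abbreviation, target metanode, directed?)
# Assumption for illustration: only the six metaedges shown in Table 2.
METAEDGES = [
    ("Compound", "CcSE", "Side Effect", False),
    ("Gene", "GpBP", "Biological Process", False),
    ("Gene", "GpCC", "Cellular Component", False),
    ("Gene", "GpMF", "Molecular Function", False),
    ("Gene", "Gr>G", "Gene", True),
    ("Pharmacologic Class", "PCiC", "Compound", False),
]


def build_steps(metaedges):
    """Map each metanode to the (label, next metanode) steps leaving it."""
    steps = defaultdict(set)
    for source, abbrev, target, directed in metaedges:
        steps[source].add((abbrev, target))
        if directed:
            # Walking a directed metaedge backwards is a distinct step.
            steps[target].add((abbrev + " (inverse)", source))
        elif source != target:
            # An undirected metaedge can be walked from either end.
            steps[target].add((abbrev, source))
    return steps


def count_metapaths(steps, source, target, length):
    """Count metaedge sequences of the given length from source to target."""
    if length == 0:
        return 1 if source == target else 0
    return sum(
        count_metapaths(steps, nxt, target, length - 1)
        for _, nxt in steps.get(source, ())
    )


steps = build_steps(METAEDGES)
gene_gene_1 = count_metapaths(steps, "Gene", "Gene", 1)  # 2: regulates and its inverse
gene_gene_2 = count_metapaths(steps, "Gene", "Gene", 2)  # 7 in this toy metagraph
```

In this reduced metagraph, Compound and Gene are not connected at any length, so `count_metapaths(steps, "Compound", "Gene", 4)` returns 0, which is the same kind of empty tile the caption describes for Anatomy at length 1.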

Figure 2
Performance by type and model coefficients.

(A) The performance of the DWPCs for 1206 metapaths, organized by their composing metaedges. The larger dots represent metapaths that were significantly affected by permutation (false discovery rate < 5%). Metaedges are ordered by their best performing metapath. Since a metapath’s performance is limited by its least informative metaedge, the best performing metapath for a metaedge provides a lower bound on the pharmacologic utility of a given domain of information. (B) Barplot of the model coefficients. Features were standardized prior to model fitting to make the coefficients comparable (Himmelstein and Lizee, 2016a).
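
Standardizing features is what makes the coefficients in panel B comparable across metapaths. The caption does not say which transformation was used, so the sketch below assumes plain z-score standardization (subtract the mean, divide by the standard deviation). The function name and the feature values are invented for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - center) / spread for v in values]


# Hypothetical DWPC values for one metapath across five compound-disease pairs
dwpc = [0.0, 0.5, 1.0, 1.5, 2.0]
scaled = standardize(dwpc)
```

After this step every feature is on the same scale, so a larger coefficient reflects a stronger contribution rather than a feature that merely takes larger raw values.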

Figure 3
Prediction performance on four indication sets.

We assess how well our predictions prioritize four sets of indications. (A) The y-axis labels denote the number of indications (+) and non-indications (−) composing each set. Violin plots with quartile lines show the distribution of indications when compound–disease pairs are ordered by their prediction. In all four cases, the actual indications were ranked highly by our predictions. (B) ROC Curves with AUROCs in the legend. (C) Precision–Recall Curves with AUPRCs in the legend.
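
The AUROC reported in panel B can be read as the probability that a randomly chosen indication receives a higher prediction than a randomly chosen non-indication. A minimal sketch of that interpretation follows. It is not the authors' evaluation code, the function name is hypothetical, and the scores and labels are made up.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for five compound-disease pairs (1 = indication)
scores = [0.9, 0.8, 0.35, 0.3, 0.1]
labels = [1, 0, 1, 0, 0]
area = auroc(scores, labels)  # 5 of the 6 positive-negative pairs are ordered correctly
```

The pairwise comparison is quadratic in the number of pairs, which is fine for a demonstration. A rank-based formula gives the same value more efficiently on large sets.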

Figure 4
Evidence supporting the repurposing of bupropion for smoking cessation.

This figure shows the 10 most supportive paths (out of 365 total) for treating nicotine dependence with bupropion, as available in this prediction’s Neo4j Browser guide. Our method detected that bupropion targets the CHRNA3 gene, which is also targeted by the known-treatment varenicline (Mihalak et al., 2006). Furthermore, CHRNA3 is associated with nicotine dependence (Thorgeirsson et al., 2008) and participates in several pathways that contain other nicotinic-acetylcholine-receptor (nAChR) genes associated with nicotine dependence. Finally, bupropion causes terminal insomnia (Boshier et al., 2003) as does varenicline (Hays et al., 2008), which could indicate an underlying common mechanism of action.

Figure 5
Top 100 epilepsy predictions.

(A) Compounds — ranked from 1 to 100 by their predicted probability of treating epilepsy — are colored by their effect on seizures (Khankhanian and Himmelstein, 2016). The highest predictions are almost exclusively anti-ictogenic. Further down the prediction list, the prevalence of drugs with an ictogenic (contraindication) or unknown (novel repurposing candidate) effect on epilepsy increases. All compounds shown received probabilities far exceeding the null probability of treatment (0.36%). (B) A chemical similarity network of the epilepsy predictions, with each compound’s 2D structure (Himmelstein et al., 2017a). Edges are Compound–resembles–Compound relationships from Hetionet v1.0. Nodes are colored by their effect on seizures. (C) The relative contribution of important drug targets to each epilepsy prediction (Himmelstein et al., 2017a). Specifically, pie charts show how the eight most-supportive drug targets across all 100 epilepsy predictions contribute to individual predictions. Other Targets represents the aggregate contribution of all targets not listed. The network layout is identical to B.

Figure 6
The growth of the Project Rephetio corpus on Thinklab over time.

This figure shows Project Rephetio contributions by user over time. Each band represents the cumulative contribution of a Thinklab user to discussions in Project Rephetio (Himmelstein and Lizee, 2016v). Users are ordered by date of first contribution. Users who contributed over 4500 characters are named. The square root transformation of characters written per user accentuates the activity of new contributors, thereby emphasizing collaboration and diverse input.
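
A small numeric sketch shows why a square root transformation accentuates smaller contributors. The character counts are hypothetical, the helper `band_shares` is invented for illustration, and how exactly the figure applies the transformation is not stated here.

```python
import math


def band_shares(chars_by_user, transform=lambda x: x):
    """Return each user's share of the total band height after a transform."""
    heights = {user: transform(n) for user, n in chars_by_user.items()}
    total = sum(heights.values())
    return {user: h / total for user, h in heights.items()}


# Hypothetical character counts for two users (not taken from the project data)
chars = {"early_contributor": 40000, "new_contributor": 400}

raw = band_shares(chars)                # new contributor: 400 / 40400
rooted = band_shares(chars, math.sqrt)  # new contributor: 20 / 220
```

On the raw scale the new contributor occupies about 1% of the stacked height. After the square root the share rises to about 9%, so small bands stay visible next to large ones.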


Table 1

Hetionet v1.0 includes 11 node types (metanodes). For each metanode, this table shows the abbreviation, number of nodes, number of nodes without any edges, and the number of metaedges connecting the metanode.
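Metanode | Abbr. | Nodes | Disconnected | Metaedges
Biological process | BP | 11,381 | 0 | 1
Cellular component | CC | 1391 | 0 | 1
Molecular function | MF | 2884 | 0 | 1
Pharmacologic class | PC | 345 | 0 | 1
Side effect | SE | 5734 | 33 | 1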
Table 2

Hetionet v1.0 contains 24 edge types (metaedges). For each metaedge, the table reports the abbreviation, the number of edges, the number of source nodes connected by the edges, and the number of target nodes connected by the edges. Note that all metaedges besides Gene→regulates→Gene are undirected.
Metaedge | Abbr. | Edges | Source nodes | Target nodes
Compound–causes–Side Effect | CcSE | 138,944 | 1071 | 5701
Gene–participates–Biological Process | GpBP | 559,504 | 14,772 | 11,381
Gene–participates–Cellular Component | GpCC | 73,566 | 10,580 | 1391
Gene–participates–Molecular Function | GpMF | 97,222 | 13,063 | 2884
Gene→regulates→Gene | Gr>G | 265,672 | 4634 | 7048
Pharmacologic Class–includes–Compound | PCiC | 1029 | 345 | 724
Table 3
The predictiveness of select metapaths.

A small selection of interesting or influential metapaths is provided (complete table online). Len. refers to the number of metaedges composing the metapath. Δ AUROC and −log10(p) assess the performance of a metapath’s DWPC in discriminating treatments from non-treatments (in the all-features stage as described in Materials and methods). p assesses whether permutation affected AUROC. For reference, p = 0.05 corresponds to −log10(p) = 1.30. Note that several metapaths shown here provided little evidence that Δ AUROC ≠ 0, underscoring their poor ability to predict whether a compound treated a disease. Coef. reports a metapath’s logistic regression coefficient as seen in Figure 2B. Metapaths removed in feature selection have missing coefficients, whereas metapaths given zero weight by the elastic net have Coef. = 0.0.
Abbrev. | Len. | Δ AUROC | −log10(p) | Coef. | Metapath
CcSEcCtD | 3 | 14.0% | 6.8 | 0.08 | Compound–causes–Side Effect–causes–Compound–treats–Disease
CiPCiCtD | 3 | 23.3% | 7.5 | 0.16 | Compound–includes–Pharmacologic Class–includes–Compound–treats–Disease
CbGpBPpGaD | 4 | 4.9% | 3.8 | 0.00 | Compound–binds–Gene–participates–Biological Process–participates–Gene–associates–Disease
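
The −log10(p) values in Table 3 can be converted to and from ordinary p-values with a single logarithm. The sketch below reproduces the caption's reference point (p = 0.05 gives about 1.30) and inverts one table entry. The helper names are hypothetical.

```python
import math


def neg_log10(p):
    """Return -log10(p) for a p-value in (0, 1]."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -math.log10(p)


def p_from_neg_log10(value):
    """Invert the transformation: recover p from -log10(p)."""
    return 10 ** (-value)


threshold = neg_log10(0.05)        # about 1.30, the reference value in the caption
p_ccsecctd = p_from_neg_log10(6.8)  # about 1.6e-7 for the CcSEcCtD row
```

Any row whose −log10(p) exceeds the 1.30 threshold corresponds to p < 0.05. All three rows shown in the table clear it.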
Table 4
The 29 public data resources integrated to construct Hetionet v1.0.

Components notes which types of nodes and edges in Hetionet v1.0 are derived from the resource (as per the abbreviations in Tables 1 and 2). Cat. notes the general category of license (Himmelstein et al., 2015i). Category 1 refers to United States government works that we deemed were not subject to copyright. Category 2 refers to resources with licenses that allow use, redistribution, and modification (although some restrictions may still exist). The subset of category 2 licenses that we deemed to meet the Open Definition are denoted with OD. Category 4 refers to resources without a license, hence with all rights reserved. References provides Research Resource Identifiers as well as citations to resource publications and related Project Rephetio materials. For information on license provenance, institutional affiliations, and funding for each resource, see the online table.
Resource | Components | License | Cat. | References
Entrez Gene | G | custom | 1 | RRID:SCR_002473 (Maglott et al., 2011; Himmelstein et al., 2015h; Himmelstein, 2016l)
LabeledIn | CtD, CpD | custom | 1 | RRID:SCR_015667 (Khare et al., 2014; Khare et al., 2015; Himmelstein and Khare, 2015s)
MEDLINE | DlA, DpS, DrD | custom | 1 | RRID:SCR_002185 (Himmelstein and Pankov, 2015a; Himmelstein, 2016u)
MeSH | S | custom | 1 | RRID:SCR_004750 (Himmelstein and Pankov, 2015a; Himmelstein, 2016h)
Pathway Interaction Database | PW, GpPW | | 1 | RRID:SCR_006866 (Schaefer et al., 2009; Pico and Himmelstein, 2015; Himmelstein and Pico, 2016a)
Disease Ontology | D | CC BY 3.0 | 2 OD | RRID:SCR_000476 (Schriml et al., 2012; Kibbe et al., 2015; Himmelstein and Li, 2015d; Himmelstein, 2016g)
DISEASES | DaG | CC BY 4.0 | 2 OD | RRID:SCR_015664 (Himmelstein and Jensen, 2015l; Himmelstein and Jensen, 2016c; Pletscher-Frankild et al., 2015)
DrugCentral | PC, CbG, PCiC | CC BY 4.0 | 2 OD | RRID:SCR_015663 (Ursu et al., 2017; Himmelstein et al., 2016d)
Gene Ontology | BP, CC, MF, GpBP, GpCC, GpMF | CC BY 4.0 | 2 OD | RRID:SCR_002811 (Ashburner et al., 2000; Huntley et al., 2015; Himmelstein et al., 2015g; Himmelstein et al., 2015f)
GWAS Catalog | DaG | custom | 2 OD | RRID:SCR_012745 (Himmelstein and Baranzini, 2016b; MacArthur et al., 2017; Himmelstein, 2015h; Himmelstein et al., 2015v)
Reactome | PW, GpPW | custom | 2 OD | RRID:SCR_003485 (Fabregat et al., 2016; Cerami et al., 2011; Pico and Himmelstein, 2015; Himmelstein and Pico, 2016a)
LINCS L1000 | CdG, CuG, Gr>G | custom | 2 OD | (Himmelstein and Chung, 2015q; Himmelstein et al., 2016k; Himmelstein, 2015k)
TISSUES | AeG | CC BY 4.0 | 2 OD | RRID:SCR_015665 (Santos et al., 2015; Himmelstein and Jensen, 2015g; Himmelstein and Jensen, 2015h)
Uberon | A | CC BY 3.0 | 2 OD | RRID:SCR_010668 (Mungall et al., 2012; Malladi et al., 2015; Himmelstein, 2016m)
WikiPathways | PW, GpPW | CC BY 3.0/custom | 2 OD | RRID:SCR_002134 (Kutmon et al., 2016; Pico et al., 2008; Pico and Himmelstein, 2015; Himmelstein and Pico, 2016a)
BindingDB | CbG | mixed CC BY 3.0 and CC BY-SA 3.0 | 2 OD | RRID:SCR_000390 (Chen et al., 2001; Gilson et al., 2016; Himmelstein and Gilson, 2015i; Himmelstein et al., 2015d)
DisGeNET | DaG | ODbL | 2 OD | RRID:SCR_006178 (Himmelstein, 2015f; Himmelstein and Piñero, 2016d; Piñero et al., 2015; Piñero et al., 2017)
DrugBank | C, CbG, CrC | custom | 2 | RRID:SCR_002700 (Law et al., 2014; Himmelstein, 2015b; Himmelstein, 2016i; Himmelstein et al., 2016r)
MEDI | CtD, CpD | CC BY-NC-SA 3.0 | 2 | RRID:SCR_015668 (Himmelstein et al., 2015e; Wei et al., 2013)
PREDICT | CtD, CpD | CC BY-NC-SA 3.0 | 2 | (Gottlieb et al., 2011; Himmelstein et al., 2015e)
SIDER | SE, CcSE | CC BY-NC-SA 4.0 | 2 | RRID:SCR_004321 (Kuhn et al., 2016; Himmelstein, 2015c; Himmelstein, 2016j)
Bgee | AeG, AdG, AuG | | 4 | RRID:SCR_002028 (Himmelstein et al., 2016f; Himmelstein and Bastian, 2015e; Himmelstein and Bastian, 2015f; Bastian et al., 2008)
DOAF | DaG | | 4 | RRID:SCR_015666 (Himmelstein, 2015g; Himmelstein, 2016s; Xu et al., 2012)
ehrlink | CtD, CpD | | 4 | (McCoy et al., 2012; Himmelstein, 2015j)
Evolutionary Rate Covariation | GcG | | 4 | RRID:SCR_015669 (Priedigkeit et al., 2015; Himmelstein and Partha, 2015r; Himmelstein, 2016w)
hetio-dag | GiG | | 4 | (Himmelstein and Baranzini, 2015a; Himmelstein et al., 2015z; Himmelstein and Baranzini, 2016e)
Incomplete Interactome | GiG | | 4 | (Himmelstein et al., 2015z; Himmelstein and Baranzini, 2016e; Menche et al., 2015; Himmelstein, 2015a)
Human Interactome Database | GiG | | 4 | RRID:SCR_015670 (Himmelstein et al., 2015z; Himmelstein and Baranzini, 2016e; Rual et al., 2005; Venkatesan et al., 2009; Yu et al., 2011; Rolland et al., 2014)
STARGEO | DdG, DuG | | 4 | (Himmelstein et al., 2015a; Himmelstein et al., 2016j; Hadley et al., 2017)


Daniel Scott Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini. Systematic integration of biomedical knowledge prioritizes drugs for repurposing. eLife 6:e26726.