A human-based multi-gene signature enables quantitative drug repurposing for metabolic disease

  1. James A Timmons  Is a corresponding author
  2. Andrew Anighoro
  3. Robert J Brogan
  4. Jack Stahl
  5. Claes Wahlestedt
  6. David Gordon Farquhar
  7. Jake Taylor-King
  8. Claude-Henry Volmar
  9. William E Kraus
  10. Stuart M Phillips
  1. William Harvey Research Institute, Queen Mary University of London, United Kingdom
  2. Augur Precision Medicine LTD, United Kingdom
  3. Relation Therapeutics LTD, United Kingdom
  4. Fiona Stanley Hospital, Australia
  5. Center for Therapeutic Innovation, Miller School of Medicine, University of Miami, United States
  6. School of Medicine, Duke University, United States
  7. Faculty of Science, Kinesiology, McMaster University, Canada

Abstract

Insulin resistance (IR) contributes to the pathophysiology of diabetes, dementia, viral infection, and cardiovascular disease. Drug repurposing (DR) may identify treatments for IR; however, barriers include uncertainty whether in vitro transcriptomic assays yield quantitative pharmacological data, or how to optimise assay design to best reflect in vivo human disease. We developed a clinical-based human tissue IR signature by combining lifestyle-mediated treatment responses (>500 human adipose and muscle biopsies) with biomarkers of disease status (fasting IR from >1200 biopsies). The assay identified a chemically diverse set of >130 positively acting compounds, highly enriched in true positives, that targeted 73 proteins regulating IR pathways. Our multi-gene RNA assay score reflected the quantitative pharmacological properties of a set of epidermal growth factor receptor-related tyrosine kinase inhibitors, providing insight into drug target specificity; an observation supported by deep learning-based genome-wide predicted pharmacology. Several drugs identified are suitable for evaluation in patients, particularly those with either acute or severe chronic IR.

Editor's evaluation

This study reports the discovery of EGFR related tyrosine kinase inhibitors as agents that could potentially be repurposed to counteract metabolic disturbances arising from insulin resistance. The authors have used a computational approach to define a gene signature, which was then inputted to identify 130 compounds that interacted with pathways involved in insulin resistance. Important clinical implications may eventually follow from these studies.

https://doi.org/10.7554/eLife.68832.sa0

eLife digest

Developing a new drug that is both safe and effective is a complex and expensive endeavor. An alternative approach is to ‘repurpose’ existing, safe compounds – that is, to establish if they could treat conditions others than the ones they were initially designed for. To achieve this, methods that can predict the activity of thousands of established drugs are necessary.

These approaches are particularly important for conditions for which it is hard to find promising treatment. This includes, for instance, heart failure, dementia and other diseases that are linked to the activity of the hormone insulin becoming modified throughout the body, a defect called insulin resistance. Unfortunately, it is difficult to model the complex actions of insulin using cells in the lab, because they involve intricate networks of proteins, tissues and metabolites.

Timmons et al. set out to develop a way to better assess whether a drug could be repurposed to treat insulin resistance. The aim was to build a biological signature of the disease in multiple human tissues, as this would help to make the findings more relevant to the clinic. This involved examining which genes were switched on or off in thousands of tissue samples from patients with different degrees of insulin resistance. Importantly, some of the patients had their condition reversed through lifestyle changes, while others did not respond well to treatment. These ‘non-responders’ provided crucial new clues to screen for active drugs.

Carefully piecing the data together revealed the molecules and pathways most related to the severity of insulin resistance. Cross-referencing these results with the way existing drugs act on gene activity, highlighted 138 compounds that directly bind 73 proteins responsible for regulating insulin resistance pathways. Some of the drugs identified are suitable for short-term clinical studies, and it may even be possible to rank similar compounds based on their chemical activity.

Beyond giving a glimpse into the complex molecular mechanisms of insulin resistance in humans, Timmons et al. provide a fresh approach to how drugs could be repurposed, which could be adapted to other conditions.

Introduction

Systemic insulin resistance (IR) is a multi-organ pathophysiological state and an early characteristic of type 2 diabetes mellitus (T2DM). IR contributes to the pathobiology of neurodegeneration (Norambuena et al., 2017), heart failure (Wamil et al., 2021) and viral infections, such as COVID-19 (Ceriello et al., 2020; Donath, 2021). Several T2DM drug treatments indirectly reduce IR following improved metabolic homeostasis, making them candidate treatments for various diseases (Donath, 2021; Everett et al., 2018; Norambuena et al., 2017). Drug repurposing (DR) aims to accelerate the discovery and reduce the costs of new treatments. Multiple evolving strategies are being trialled, including mining of medical records, development of large databases of drug-gene interactions (Subramanian et al., 2017) and virtual compound screening (Himmelstein et al., 2017). Drug transcriptome responses in cells represent one of the most extensive resources (Subramanian et al., 2017), while transcriptomics is also an ideal technology to capture complex biological processes in human tissues (Jenkinson et al., 2016; Timmons et al., 2018; Timmons et al., 2005). Effectiveness at reversing of the molecular responses to disease (Wagner et al., 2015) helps to predict drug efficacy in cancer (Brown and Patel, 2018; Iorio et al., 2018; Karatzas et al., 2017; Wang et al., 2016). Successful application of DR specifically to oncology may reflect that drug profiles are typically generated in tumour cell lines (Subramanian et al., 2017) and that barriers to clinical validation can be lower compared with many other diseases.

Identifying informative disease signatures for DR in cells is challenging (Chen et al., 2020; Karatzas et al., 2017; Regan-Fendt et al., 2019), particularly when no positive controls exist (Williams et al., 2019). Currently, there are no reliable human cellular models for systemic IR, while it remains unclear if multi-gene assays can capture quantitative pharmacological relationships suitable for optimising drug design. However, clinically effective drugs typically target several proteins, many of which are unknown (Keenan et al., 2018), highlighting the limitations of single-target drug development programmes. Network modelling and deep learning (DL) have been utilised to connect the pharmacological properties of active drugs to their protein targets (Woo et al., 2015; Zeng et al., 2020). For IR we also have effective non-drug treatments (Nakhuda et al., 2016; Slentz et al., 2016; Timmons et al., 2018; Phillips et al., 2017), and this enabled production of a novel human-based IR-DR assay – using more than 2000 tissue profiles generated in our laboratories (Nakhuda et al., 2016; Slentz et al., 2016; Timmons et al., 2018) and one other (Civelek et al., 2017). Performance of the present RNA-based multi-gene assays was judged against positive control in vivo drug signatures (Stathias et al., 2020), genome-wide association (Lotta et al., 2017; Vujkovic et al., 2020) and blood proteome-based assays (Gudmundsdottir et al., 2020). Validation of the in vitro results for >2500 drugs (Subramanian et al., 2017) relied on a variety of protein, drug- and disease-centric (Parisi et al., 2020) criteria: DL-based modelling of drug-to-protein interactions; targeted gene knock down; and published evidence that the drug reduced IR in vivo (Figure 1).

The project analysis process.

The three major phases of the project are defined by the grey boxes. A limited number of gene signatures were considered (four) to limit false-positive associations. The compound (CMPD) selection phase first confirmed that the drug repurposing signature provided valid matches with in vivo positive control drugs, and then a full list of in vitro active drug matches was generated. The third phase was an iterative process in that validation was considered on several levels. We utilised four main independent validation strategies, incorporating multiple data sources, to demonstrate that the insulin resistance drug repurposing (IR-DR) signature produced a high rate of likely true-positive drugs that would reverse IR.

Results and discussion

Feature selection and in vivo validation of novel IR-DR assays

Homeostasis model assessment version 2 (HOMA2-IR) was used to quantify IR (Wallace et al., 2004). RNA biomarkers consistently related to fasting IR (‘disease’) across tissues were combined with those regulated in common across tissues following lifestyle-based reversal of IR (‘treatment’). Biomarkers were ranked based on consistent direction and strength of association across two major human organs targeted by insulin (human adipose and muscle) because most orally dosed drugs will act systemically. Quantitative network modelling (Song and Zhang, 2015) was used to rank genes for their tissue-based hub connectedness (Appendix 1—figure 1). In the present study, we considered the performance of only four RNA-based IR-DR assays (Appendix 1—figure 1; Ganter et al., 2006); testing their ability to match the in vivo directionality of positive controls, thiazolidinedione (TZD) and oestrogen, expression signatures correctly (Hevener et al., 2018; Sears et al., 2009). The top-scoring RNA signature (Signature 3A from Appendix 1—figure 1) was a statistically ranked combination of disease- and treatment-associated genes (n = 120 genes) outranked selection by hub connectedness, recapitulated cellular gene expression patterns indicative of TZD treatment responses in vivo in muscle (moderated Z-score, p<0.0000008, Appendix 1—figure 2) and is referred to as the IR-DR signature/assay hereafter (Appendix 1—figure 1). Lack of superiority for the assay designed using hub connectedness may be considered at odds with other studies (Cheng et al., 2012) but could reflect that inclusion of multi-tissue treatment response biomarkers supersedes any benefit of using network weighting. Several Genome-wide Association Study (GWAS) IR and T2DM (Lotta et al., 2017; Vujkovic et al., 2020)-derived signatures (e.g. Signature 4, Appendix 1—figure 1) were considered but were unable to match positive control drugs in vivo. The T2DM blood proteome signature (Gudmundsdottir et al., 2020) had a weak association with one positive control drug. Protein-level network interactions formed by each list (Appendix 1—figure 1) were distinct (Appendix 1—figure 2) and were only possible to partially recreate from existing databases (Li et al., 2018a).

IR-DR assay identifies drugs and pathways with established links with insulin signalling

The largest available database of in vitro drug signatures (Subramanian et al., 2017) was used to identify cell-type agnostic drug responses. To achieve this, we utilised aggregated scores (the maximum quantile statistic from the within-cell line-normalised scores) from across nine human cell lines. This approach also increases the sample size per drug by at least ninefold, making any inferences more reliable (Subramanian et al., 2017). At the request of a reviewer, we provide results from individual cells (Table S3 Appendix 1—figure 3); however, we caution that these within-cell rank-order values are known to be less robust (Subramanian et al., 2017; Xu et al., 2018). Critically, we noted that members of each drug ‘class’ (drugs sharing a nominal primary protein target in common) were segregated with either active or neutral IR-DR scores, with extremely few drug classes having both positive- and negative-scoring compounds. Only 10% of the database matched the IR-DR signature (n = 254, Appendix 1—figure 3 and Table S3), and 138 compounds (after excluding assay codes with ambiguous compound labels) positively regulated the IR-DR signature (potential treatments), 45% of which were kinase inhibitors (Appendix 1—figure 4). Most negative acting drugs targeted tubulin and cell cycle proteins or were pro-inflammatory agents (Appendix 1—figure 5). Positive and negative acting compounds did not differ in average physiochemical properties (Appendix 1—figure 6), while assays based on GWAS-selected genes for IR (Lotta et al., 2017; Vujkovic et al., 2020) and T2DM produced no discernible pattern of in vitro hits.

The pharmacology of the 138 positive compounds indicated that a substantial number of targeted aspects of insulin signalling were known, empirically, to reverse IR in vivo (Table 1). Compounds identified varied in nature from inhibitors of glucosylceramide synthase, which reverses IR and fatty liver disease (Aerts et al., 2007; Herrera Moro Chao et al., 2019), to 10 mTOR inhibitors. The mTOR complex, mTORC1, coordinates a negative feedback loop on insulin signalling, for example, through activation of GRB10 or via S6K1 (Um et al., 2004). mTORC1 signalling is also regulated by protein kinase C (PKC) (Zhan et al., 2019), and specific PKC isoforms are dysregulated in ageing, metabolic, neurodegenerative and inflammatory diseases (Li et al., 2015; Sajan et al., 2018; Sharma et al., 2019). We observed that the broad-spectrum PKC inhibitor, bisindolylmaleimide I, induced a strong positive IR signature score (+87) and the related compound, ruboxistaurin, reverses IR in vivo (Guo et al., 2020; Naruse et al., 2006). In contrast, bisindolylmaleimide IX, a 20-fold more potent broad-spectrum PKC inhibitor, was inactive in the IR-DR assay, probably reflecting its greater non-specific pharmacology (against other kinase families). Three so-called PKC activators (French et al., 2020; Lee et al., 2020) induced negative IR-DR scores (phorbol-12-myristate-13-acetate = –87, ingenol = –96 and prostratin = –97). RNAi targeting of individual PKC isoforms (clue.io) demonstrated that the IR-DR assay was sensitive to specific PKC isoformactivity. While >95% of all RNAi assays produced no significant scores, knock-down (KD) of PKC-beta (+74) and PKC-theta (+97) yielded positive IR-DR scores, while loss of PKC-alpha (–75) and -eta (–75) produced negative IR-DR scores and overexpression (OE) of PKC-alpha was positive scoring (+85). The multi-gene IR-DR assay therefore identifies numerous true-positive drugs (Table 1) and reflects isoform-specific activity, strongly validating the cell-agnostic aggregation methodology.

Table 1
Examples of the major drug classes producing a positive insulin resistance-drug repurposing (IR-DR) score and associated literature evidencing efficacy.

In vivo refers to evidence for in vivo validation of the drug and/or its target proteins.

PathwayExample drugBiology narrativeIn vivoExample literature
ATPase/cardiac glycosideProscillaridin, digoxinHeart failure drug; possibly mimicking the action of metformin on mitochondria in vitro; senolytic.NoFürstenwerth, 2012; Triana-Martínez et al., 2019
Calcium channelNifedipineRestores autophagy, improves glucose tolerance and insulin action.YesIwai et al., 2011; Koyama et al., 2002; Lee et al., 2019; Sheu et al., 1991
Calcium/calmodulin signallingNM-PP1Insulin signalling upstream of p38; restores ATF6-related autophagy; insulin resistance, diabetes and Alzheimer’s pathophysiology.YesAlfazema et al., 2019; Ozcan et al., 2015; Ozcan et al., 2013; Yin et al., 2017
DopamineL-741626Central and peripheral role in regulation of glucose tolerance – contradictory/paradoxical behavioural/hepatic agonist/antagonist activity.YesAmamoto et al., 2006; Fontaine et al., 2020; Kellar and Craft, 2020; Park et al., 2007; Stoelzel et al., 2020
Tyrosine kinase/ERBB receptor inhibitorsCanertinib, gefitinib, afatinibInhibition of EGFR, DDR1, ABL1 and related kinases produces a positive IR-DR score. Extensive data link EGFR and inhibitors of EGFR to insulin resistance and neurodegeneration. Pro-inflammatory signalling via iRHOM2 and MAP3K7; circulating biomarker of insulin resistance and hepatic metabolic disease.YesChen et al., 2019; Chiu et al., 2020; Fowler et al., 2020; Kyohara et al., 2020; Li et al., 2018b; Skurski et al., 2020; Vella et al., 2019; Wang et al., 2012; Wu et al., 2017
Glucocorticoid/anti-inflammatoryValdecoxib, Spectrum_001832Anti-inflammatory; various steroidal and non-steroidal anti-inflammatory drugs reduce IR in a variety of models of diabetes/obesity. Excess corticosteroids induce IR.YesChakraborti et al., 2010; Chan et al., 2018; Reading et al., 2013
Glucosylceramide synthaseBRD-K88761633, AMP-DNMGlycosphingolipid biosynthesis – inhibition treats insulin resistance and fatty liver disease.YesAerts et al., 2007; Herrera Moro Chao et al., 2019
Heat-shock protein 90LuminespibATPase cycle and chaperone function – inhibition improves insulin sensitivity; Hsp90 activated in dementia. Role in INSR turnover and protein phosphatase 5 activation.YesImamura et al., 1998; Jing et al., 2018; Shelton et al., 2017; Yang et al., 2005
MAPK/MEK/ERK inhibitorsPD-184352, PD-0325901, XMD-892Multiple roles in insulin signalling and metabolism; inhibitors target multiple kinases.YesOzaki et al., 2016; Sharma et al., 2014; Tarragó et al., 2018; Wauson et al., 2013
mTOR relatedAZD-8055, WYE-354, torin-2Inhibition of mTORC1 activity – including knock-down of RAPTOR – produces a strong positive IR-DR score. In multiple studies, mTOR inhibition reduces age-related metabolic dysfunction.YesHowell et al., 2017; Jahng et al., 2019; Morita et al., 2013; Nie et al., 2018a; Norambuena et al., 2017; Zhan et al., 2019
Nicotinamide phosphoribosyltransferaseCAY-10618 (GPP78)NAMPT (or visfatin) inhibitor which attenuates atherosclerosis in the high-fat-induced insulin resistance model and is anti-inflammatory.YesLi et al., 2016; Lockman et al., 2010; Travelli et al., 2017
Phosphodiesterase 5AMBCQ, sidenafilPDE5A is negative regulator of insulin, aspects of ageing – potentially via miR-22-3p.YesBlagosklonny, 2017; Fiore et al., 2018; Fiore et al., 2016; Liu et al., 2019
Phosphoinositide 3-kinaseAZD-6482, PI-103, GDC-0941Multiple PI3K inhibitors produce strong positive IR-DR scores. In multiple studies PI3K varies with metabolic dysfunction; however, all kinase inhibitors target multiple related kinases, so specific target unclear.NoChiu et al., 2020; Copps et al., 2016; Wang et al., 2020; Zou et al., 2004
RAF kinaseAZ-628, vemurafenibRAF1 is increased in obesity-induced IR, inhibitors can block insulin/AKT1/MAPK signalling in a context-specific manner. AZ-628 also RIP3 inhibitor – anti-arthritis strategy.NoMacLaren et al., 2008; Osrodek et al., 2020; Sun et al., 2016
  1. EGFR, epidermal growth factor receptor.

Identification of the active drug-protein targets through single-gene targeting and network biology

The 138 positive scoring IR-DR drugs target 1007 proteins (Mendez et al., 2019; Moret et al., 2019). Of these, 465 genes had single-gene KD or OE scores, aggregated across 6–9 cell lines (Appendix 1—figure 7). Seventy-three targets (15.7%) yielded a significant IR-DR score; double the assay hit rate (p<0.0001, see Methods and Figure 2). Predictably, due to input bias (Timmons et al., 2015), these targets regulated ‘peptidyl-serine phosphorylation-related processes’ (q-value <1 × 10–23). None belonged to the IR-DR gene signature (Appendix 1—figure 8), but they did belong to numerous common pathways (Appendix 1—figure 8). These observations are consistent with the idea that an effective DR signature captures the pathway biology of the disease and/or treatment (Brown and Patel, 2018; Chen et al., 2020; Karatzas et al., 2017; Keenan et al., 2018; Regan-Fendt et al., 2019; Wagner et al., 2015; Woo et al., 2015) but does not necessarily include the nominal drug targets (Figure 2).

The overlap between protein targets of positively acting drugs and the insulin resistance-drug repurposing (IR-DR) input signature.

(A) A network of significant pathways coloured by p-values, derived from the IR-DR input genes and the 73 validated protein targets of the 150 positively acting drugs. (B) Edges represent connected Gene Ontology (GO) biological processes (>0.3), and nodes within each cluster are coloured/named by their most statistically enriched GO term. (C) Each node is presented as a pie chart, scaled in size by the total number of terms represented by that (top-scoring) ontology, and with the ‘slices’ coloured to indicate which gene list the terms originate from. The same network structure is separately colour-coded by list membership to identify when pathways include members of Signature 3A (red), or protein targets which are negative acting genes (blue, where inhibition yields a positive and overexpression yields a negative IR-DR score) or genes appear to be positively acting (green).

Network representation of how these 73 independently validated target proteins of active drugs (Appendix 1—figure 4) interact with IR-DR signature at a pathway level is presented in Figure 2A (Benjamini-Hochberg corrected p-values). The GO terms are scaled by the total number of significant terms and labelled by the top-level category (Figure 2B). Coloured orange (Figure 2B), a module of the ‘carboxylic acid biosynthetic process’ pathway contains genes that, when more ‘active,’ positively modulate the IR-DR signature (Z = 8.3, p=1 × 10–9). Each pathway is also coloured (Figure 2C) to indicate whether it contains a known drug target or was part of the IR-DR signature. The IR-DR assay genes formed eight pathway clusters, of which the majority directly contain some RNAi validated protein targets, for example, ‘negative regulation of phosphate metabolic process’ (coloured brown, q-value 1 × 10–7). As with the analysis of the individual PKC isoforms, there are compelling examples of proteins contributing to metabolic disease, for example, SMAD3 (+87 IR-DR score from OE and –95 IR-DR score from RNAi) is an in vivo-validated IR pathway (Budi et al., 2019; Sun et al., 2015; Tan et al., 2011). However, if multi-gene DR assays are to be used for optimising drug properties, it is critical to establish that they can produce quantitative pharmacological feedback when comparing related drugs (Hopkins, 2008).

An analysis of the relationship between the insulin resistance-drug repurposing (IR-DR) score and laboratory-based pharmacological potency and selectivity or deep learning-based predictions of compound potency and selectivity.

(A) Inhibitory constants (nM) derived from laboratory assays against top-ranked targets for a series of epidermal growth factor receptor (EGFR) inhibitors. (B) Relationship between IR-DR score (100 = best score) and log potency against EGFR. (C) Expanded range of known targets, for at least one of the inhibitors, helps identify potentially positive (red box) and negative (blue box) off-target inhibitory actions. (D) Rank order score (RS, 1–19211) of predicted compound binding for all protein-coding genes using the DeepPurpose ML model; lab-validated targets feature in top 0.15% of target predictions. Log rank order (‘predicted potency’) for EGFR, over the protein-coding genome, partly predicts efficacy in IR-DR assay, confirming that the ML model matches the relationship observed using the laboratory pharmacology. (E) Using the predicted protein targets and the DeepPurpose rank order scores, it is possible to cluster positively acting ‘EGFR’ compounds from less active or negatively acting compounds.

Aggregated IR-DR assay score directly relates to pharmacologically derived in vitro potency

The relationship between in vitro drug potency and IR-DR assay score for 37 compounds (Appendix 1—figure 5), nominally targeting epidermal growth factor receptor (EGFR or HER1) tyrosine kinase, was investigated. This stress-induced inflammatory protein has recently emerged as a target for treating metabolic disease and neurodegeneration (Donath, 2021; Menden et al., 2019; Norambuena et al., 2017). One endogenous EGFR ligand, amphiregulin, is induced by high-fat feeding to drive TNF-mediated IR (Skurski et al., 2020), while EGFR is overexpressed in astrocytes of Alzheimer’s disease (AD). The EGFR inhibitor afatinib (IR-DR score = 77) attenuates astrocyte activation (Chen et al., 2019) while inhibition of EGFR can reduce FOXK1 and FOXK2 phosphorylation (Klaeger et al., 2017) to normalise mTORC1-regulated autophagy and reduce IR (Bowman et al., 2014; Jahng et al., 2019). Nine EGFR inhibitors have been screened against >300 kinases (Davis et al., 2011; Klaeger et al., 2017), which enabled us to directly contrast laboratory-derived kinase selectivity with the IR-DR score. Potency versus EGFR directly related to IR-DR assay score (Figure 3A), yet this could not fully explain why certain compounds were inactive. Cluster analysis of the most targeted proteins (<300 nM potency for at least one compound) illustrated that alisertib, the only potent EGFR inhibitor with a negative IR-DR score (–90), inhibited PLK4, AURKB and AURKA (Figure 3B). Orantinib, a 24 nM inhibitor of AURKB, also had a negative IR-DR score (–77), as did MK-5108, an inhibitor of AURKB and AURKA (–90), indicating that alisertib’s profile reflects pharmacology beyond EGFR (probably AURKB as AURKB KD had an IR-DR score of –83, Table S3).

Multiple protein targets help explain the positive activity of EGFR targeting drugs

A broader exploration of ‘EGFR’ inhibitor targets provides a better understanding of the activity of this group of compounds in the IR-DR assay. For example, neutral scoring bosutinib and neratinib target several mitogen-activated protein kinase (MAPK) family members, and some of these oppose positive IR-DR scoring, for example, MAP2K2 (–78, Table S4). Neutral scoring, yet potent EGFR inhibitors (e.g. neratinib, bosutinib and lapatinib) also inhibit the related proteins, ERBB2, ERBB3 and ERBB4 (<5 nM, HER2-4). Some of these family members may represent beneficial ‘off targets’ while others may be detrimental (Moret et al., 2019). For example, hyperglycaemia induces erbb4 in mice and erbb4 expression is increased in AD (Huh et al., 2016; Woo et al., 2011), where OE increases tau phosphorylation via mTOR activation (Nie et al., 2018b). Gefitinib (an EGFR inhibitor) reduces IR-mediated glucose excursions in vivo in a RIPK2-dependent manner (Duggan et al., 2020) and rescues memory deficits in mice at a very low chronic dose of 0.01 mg/kg (Wang et al., 2012). Loss of RIPK2 (or a dominant-negative mutant of RIPK2) prevents excessive NFκB activation (Chin et al., 2002), and RIPK2 is a downstream effector of innate immunity (TLR signalling).

Some EGFR targeting drugs also potently inhibit the tyrosine kinase ABL1, and loss of adipose ABL1 reduces obesity-induced IR in the mouse (Wu et al., 2017). Erlotinib (IR-DR score = +72), which also inhibits ABL1, reduces kidney inflammation and preserves pancreas function and insulin sensitivity in a mouse model of diabetes (Li et al., 2018b). Furthermore, Aβ activates neuronal ABL1 in vitro, intra-hippocampal injection of Aβ fibrils increases expression of ABL1 in vivo and imatinib (STI571), a 90 nM inhibitor of ABL1, inhibits ABL1-mediated Aβ neurodegenerative pathways (Cancino et al., 2011; Cancino et al., 2008; Gutierrez et al., 2019). However, imatinib does not yield a significant IR-DR score (nor inhibit EGFR), indicating that targeting ABL1 alone might be insufficient to treat human IR. Importantly, potency of the EGFR inhibitors against ABL1 also correlates with their potency against several ephrin receptors (EPHA5 R = 0.74, EPHA6 R = 0.95 and EPHA8 R = 0.76), as well as with RIPK2 (R = 0.77). Oral dosing of the ephrin A receptor inhibitor, UniPR500, reverses high-fat feeding-induced glucose intolerance without changes in plasma insulin (Giorgio et al., 2019), and these benefits likely reflect UniPR500 targeting proteins in common with our top-ranked ‘EGFR’ kinase inhibitors. Thus, while drug potency against EGFR quantitatively tracks with the IR-DR score (Figure 3A), this may reflect binding affinity at other related protein kinases, and identification of these additional targets is important (Redhead et al., 2021).

A DL model of genome-wide binding affinity ranks compound affinity in the IR-DR assay

Thus we find that (Figure 3A–C) the best scoring potent ‘EGFR’ kinase inhibitors reflect a balance of activities against positively and negatively acting kinases and that this is partly interpretable versus the extensive in vitro screening data for those drugs (Davis et al., 2011; Klaeger et al., 2017). Genome-wide pharmacological profiles are however prohibitively costly and thus often unavailable. Predicted drug-protein interactions, using emerging techniques from graph machine learning (Sturm et al., 2020), aim to overcome this lack of laboratory data. Using the DeepPurpose DL suite of algorithms (Huang et al., 2021), we modelled EGFR family kinase inhibitors as simplified molecular-input line-entry system (SMILES) string and all proteins by their amino acid sequence (a strategy that obfuscates the need for 3D structures obtained by costly experimental models). This expanded the scope of drug target information of the EGFR inhibitors discussed above to a genome-wide level (Tables S6-S8). Each of the EGFR inhibitors was scored against 19,211 proteins using 14 pre-trained models (Table S9), and we relied on a fusion of ranking scores across models to identify the top protein targets of each compound.

The nine EGFR inhibitors described above inhibit 25 proteins with nanomolar potency (<300 nM), and the DL model accurately ranked these proteins in the top 0.1–1.7% of all 2,420,586 predictions (median = 0.15%, Table S6). The DL-predicted rank score (‘potency’) against EGFR strongly related to the measured IR-DR score (Figure 3D), replicating the potency-activity relationship noted using laboratory data (Figure 3A) correctly clustering the nine compounds (Figure 3C, Appendix 1—figure 9A). DL also ranked several proteins that we already identified may compromise a positive IR-DR score (Figure 3E, Appendix 1—figure 9). For example, AURKB was a top-ranked predicted target for alisertib (26/19211), aligning with the data that inhibition of AURKB drives a negative IR-DR score. Additional predicted protein targets (Table S8) will also influence the IR-DR score, independently of EGFR, for example, MERTK, KCNH6 and PTK2B (Appendix 1—figure 9). Loss of PTK2B (focal adhesion kinase 2 [FAK2]), a risk gene for the development of tauopathy in AD (Tan et al., 2021), can promote the development of IR in vivo and in adipocytes (Luk et al., 2017; Yu et al., 2005) while inhibition of KCNH6 should probably be avoided,as it regulates insulin secretion (Yang et al., 2018). In contrast, a genetic loss-of-function variation in MERTK appears protective against IR, fatty liver disease and pro-inflammatory mediators in humans (Musso et al., 2017), and thus it represents a potential protein target against which current ‘EGFR’ inhibitors should be screened against. Therefore, we applied the same modelling strategy to 16 of the 28 less well-characterised EGFR inhibitors with proven sub-micromolar activity. Each was ranked highly by the model against EGFR (Table S7), with MAP3K19 being one of the highest ranked additional targets (Appendix 1—figure 10), and there was a negative correlation between predicted MAP3K19 binding and IR-DR score (Appendix 1—figure 10). Little is known about MAP3K19 (a ‘dark’ kinase) other than that it may contribute to ERK pathway activation (Hoang et al., 2020) – and some ERK inhibitors proved positive scoring in the IR-DR assay (Appendix 1—figure 3) – and thus MAP3K19 may be a novel positive effector of insulin signalling. MAP3K19 is not abundantly expressed in adipose or muscle tissue (lowest 10th percentile of gene expression in our studies) such that net compound efficacy in vivo may also reflect tissue-specific patterns of protein activity.

General conclusions and limitations

We illustrate that a cell-line transcriptome-based high-throughput DR assay yields interpretable and quantitative pharmacological data when designed around robust clinical RNA signatures, and that DL-based drug target predictions can be used to interpret assay scores. Some of the drugs we identified may be suitable for treating acute IR, such as occurring during infection (Ceriello et al., 2020; Donath, 2021), and encouragingly several positive scoring drugs appear tolerable in longer-term preclinical models of metabolic or neurogenerative disease (Li et al., 2018b; Wang et al., 2012). The present approach could be extended to include a stratified medicine component, where evaluation of positively acting compounds is first trialled in T2DM patients with extreme IR (Choi et al., 2019). A number of positively acting IR-DR compounds, including selected mTOR inhibitors (Appendix 1—figure 3), are able to mimic a longevity-related RNA signature (Timmons et al., 2019) and thus may be potential geroprotectors (Fuentealba et al., 2019). A more extensive multi-disease signature approach could ultimately help tailor the DR process to the individual patient. We do acknowledge that some IR-DR assay negative scores may be false negatives, for example, selective HDAC inhibition (HDACi) can, through lower and shorter daily exposure (Sartor et al., 2019; Volmar et al., 2017), be beneficial (although positive attributes of HDACi on IR appear to reflect non-specific actions; Martins et al., 2019). Furthermore, despite correctly matching with nuclear receptor-induced muscle transcriptome signatures in vivo, there was a dearth of matches in vitro indicating that further optimisation of the IR-DR assay format is merited. It can be the case that certain classes of ligand require more sophisticated assay condition or require the use of primary cells. In conclusion; human transcriptome signatures, classic pharmacological assays, drug action in vivo and DL-based target prediction consistently link with drug transcriptional profiles in cell lines, establishing that expansion of such resources represents an important strategy for future DR efforts.

Materials and methods

Key resources table
Reagent type (species) or resourceDesignationSource or referenceIdentifiersAdditional information
Software, algorithmRhttps://www.r-project.org/3.6.3 and 4.04
Software, algorithmPythonhttps://www.python.org/D1306
Software, algorithmDeepPurposehttps://github.com/kexinhuang12345/DeepPurpose.git;
Huang et al., 2021
2020
Software, algorithmVennyhttps://bioinfogp.cnb.csic.es/tools/venny/2.10
Software, algorithmMetascapehttp://metascape.org/gp/index.html#/main/step12020
Software, algorithmCLUEhttps://clue.io/March 2020
Software, algorithmPubChemhttps://pubchem.ncbi.nlm.nih.gov/December 2020
Software, algorithmPubMedhttps://pubmed.ncbi.nlm.nih.gov/December 2020
Software, algorithmSMShttps://labsyspharm.shinyapps.io/smallmoleculesuite/December 2020
Software, algorithmiLINCShttp://www.ilincs.org/ilincs/signatures/search/March 2020
Software, algorithmMorpheushttps://clue.io/morpheus2021
Software, algorithmCodeSource_code_file.docx-R/Python code used in project

We utilised human muscle and adipose tissue transcriptome profiles from multiple large studies (Civelek et al., 2017; Nakhuda et al., 2016; Slentz et al., 2016; Timmons et al., 2018). The individual sample identifiers utilised in this study are reported (Appendix 1—figure 10) and deposited online at GEO. Gene expression (IRON-normalised data; Welsh et al., 2013) was contrasted against log-transformed HOMA2-IR values (measured in fasting blood [Wallace et al., 2004] using the Excel-based version of http://www.dtu.ox.ac.uk/homacalculator, adjusting for patient age) using ANOVA and linear regression (Timmons et al., 2018). A robust transcriptional signature of IR shared across two human insulin-targeted tissues was identified (from a total of 337 genes significantly regulated in muscle, FDR < 5%, absolute correlation-coefficient (CC) >0.15); this represented our ‘disease signature’ (Appendix 1—figure 11). Gene expression responses that are proportional to treatment efficacy (reduced IR) and consistent across two human tissue types have not been previously investigated. Change in gene expression was derived from biopsy samples obtained before and following supervised lifestyle intervention. Lifestyle intervention ranged from aerobic to resistance training with modest calorie restriction, and varied in duration from 15 min (e.g. high-intensity cycle-based exercise) to >1 hr 3 days a week, as previously detailed (AbouAssi et al., 2015; Nakhuda et al., 2016; Phillips et al., 2017; Slentz et al., 2011). In short, we located genes tracking with efficacy, regardless of clinical protocol or ‘dosage’. Change in gene expression was related to change in HOMA2-IR identifying a consistent ‘treatment signature’ for muscle (mean q-value < 0.08, CC values consistent in 3/4 of muscle studies), and then those similarly regulated in adipose tissue were retained.

Feature selection and in vivo DR validation step

Request a detailed protocol

The disease and the treatment genes lists represent the pool of features from which we selected IR-DR signatures. Quantitative network modelling (Song and Zhang, 2015) was applied to tissue expression values of these genes, as previously described (Timmons et al., 2019), to identify hub genes; genes with greater connectivity (Appendix 1—figure 1). Four alternative similarly sized sets of genes were selected for validation, comprising 60 positively associated and 60 negatively associated RNAs. The final models shared only two genes named as candidates from genome-wide IR association studies (Lotta et al., 2017), and those genes (INSR and GRB14) were not essential for our analysis. To rank the performance of each of our four RNA assays (Table S1), we utilised DrugMatrix, a database of in vivo rodent tissue drug-response signatures (http://www.ilincs.org/ilincs/) that includes TZD and oestrogen-related molecules known to reverse IR in vivo (Hevener et al., 2018; Sears et al., 2009) and target IR pathways in cell models (Sood et al., 2016). Signatures validated at this stage were considered suitable for further use. Lists that failed this step (lists 4 and 5) included genes inferred from genome-wide IR association studies, and T2DM proteome biomarkers (Gudmundsdottir et al., 2020; Lotta et al., 2017; Vujkovic et al., 2020) were modelled in vitro only to produce summary statistics for Table S1.

In vitro DR analysis

Request a detailed protocol

The in vivo-validated IR signatures were screened using the largest public database of in vitro drug signatures (Subramanian et al., 2017) via the clue.io resource (version 1, 2020). These drug transcriptional signatures were generated in nine cell lines, and while each cell line captures some unique signals from each compound (Baillif et al., 2020), part of this will be noise, reflecting the small sample size (typically n = 3). Therefore, we used aggregated signature matching across the nine human cell lines (PC3, VCAP, A375, A549, HA1E, HCC515, HT29, MCF7 and HEPG2) to both deliberately reduce the influence of cell line-specific effects (Xu et al., 2018) and to increase the sample size ninefold (Subramanian et al., 2017). Active compounds were those with scores exceeding ~10th percentile of positive and negative scores (Appendix 1—figure 3), a value that represents the mean threshold (±1 standard deviation) of scores exceeding the assay scoring threshold (Subramanian et al., 2017). The use of aggregated scores across cell types was validated using an extensive validation process (Figure 1). In addition to our IR-DR assay, we considered two additional in vivo RNA models. One is a novel 141-gene human-derived muscle growth signature (Stokes et al., 2020), which demonstrated – as expected – that the clue.io database contains a sizeable number of compounds known to inhibit cell growth (negative-scoring compounds, Appendix 1—figure 2 and Appendix 1—figure 3). The second was an IR-adjusted longevity-associated signature (Timmons et al., 2019), which identified relevant drug matches from the cell-line perturbagen database (Appendix 1—figure 2 and ‘Discussion’). The output from each assay was a list of >2500 DR scores, each assigned to a particular assay ID and drug name. There then followed a laborious manual annotation process, reflecting that study of drugs is challenging (Christmann-Franck et al., 2016), and annotation errors populate all databases, including iLINCS. For all of the active compounds, we carried out a manual check to ensure that compound labels in CLUE were verifiable in Chembl (Mendez et al., 2019), and that both were consistent with the data deposited in the small molecule suit (Moret et al., 2019). The manually confirmed data progressed to the next phase of the analysis.

Characterisation of active compounds

Request a detailed protocol

Active compounds belonged to a wide range of distinct pharmacological classes (Appendix 1—figures 4 and 5). To identify if positive and negatively acting compounds (from IR-DR score) could be easily distinguished from each other, we calculated simple molecular descriptors. A set of 2837 compounds for which in vitro results were available was considered, and chemical structures were extracted from the CLUE database (https://clue.io/) as SMILES strings. These were parsed with RDKit (version 2020.03.6), with 14 compounds failing to parse correctly. For each compound, a set of 13 physicochemical descriptors was calculated with RDKit including molecular weight, heavy atom count, number of heteroatoms, LogP, number of rotatable bonds, topological polar surface area (TPSA), number of rings, number of aromatic rings, number of saturated rings, number of aliphatic rings, Balaban’s J index, number of hydrogen bond donors and number of hydrogen bond acceptors. Positive acting compounds were coloured in orange, and negative acting in blue (Appendix 1—figure 6). For each compound active against list 3A IR-DR signature (Appendix 1—figure 3), we identified their protein targets using a variety of resources (Christmann-Franck et al., 2016, https://www.ebi.ac.uk/chembl/, https://clue.io/, and https://pubchem.ncbi.nlm.nih.gov/). The small-molecule suite was used to extract laboratory-derived potency values against each protein (https://labsyspharm.shinyapps.io/smallmoleculesuite/). Very few compounds are profiled against the majority (>300) kinases. For those that had values, in vitro potency values (log10) were plotted against the IR-DR scores and Pearson’s correlation coefficients calculated in Excel, allowing us to establish the relationship between potency against a protein and the IR-DR score. Very extensive manual PubMed searches were then undertaken to characterise the positive and negative acting drugs using the terms ‘drug name’ ‘alternative drug name’ with the following terms used in sequence until either relevant publications were identified or no hits were obtained; insulin, diabetes, obesity, dementia, Alzheimer’s disease, COVID-19 and inflammation (during 2020).

Single-gene manipulation and network biology

Request a detailed protocol

The in vitro drug signatures in clue.io are the most robust data from that project (Subramanian et al., 2017). However, the database also contains gene KD (n = 3799 genes) and OE (n = 2161 genes) data, making it possible to link the IR-DR signature with specific protein targets. The KD or OE activity score was again averaged across 6–9 cell lines, and we considered protein targets derived from the known targets of the active compound list (to limit the known higher false-positive rate with the KD/OE data; Subramanian et al., 2017). As stated in the results, >15% of IR-active drug targets yielded a significant IR-DR score. In comparison, a total of 459 genes were associated with an IR-DR score at a threshold of >70 or <-70, representing 265 positive associations (mean = 85) and 194 negative IR-DR scores (mean = –83), equalling only ~7% of all 5954 genomic assays. The pathway biology of the 73 independently validated protein targets was analysed using Metascape (Zhou et al., 2019), along with the IR-DR signature (Table 1, Appendix 1—figure 2), where ‘Combo_sig’ represents the IR-DR pathways, and the 73 genes split into two lists depending on how KD or OE impacted on the IR-DR signature match (Table S4). For the data plot, edges represent connected GO biological processes (>0.3), and nodes within each cluster are coloured/named by their most statistically enriched GO term (Figure 3B). Each node is presented as a pie chart, scaled in size by the total number of terms represented by that (top-scoring) ontology, and with the ‘slices’ coloured to indicate from which gene list the terms originate. The network structure is separately colour-coded (Figure 3C) by list membership to identify when drugs directly target IR-DR assay pathways (red); when KD or OE genes negatively correlated (blue) with the IR-DR score (inhibition yields a positive or OE yields a negative IR-DR score) or positively correlated (green) with the IR-DR score.

Drug-target interaction prediction with DeepPurpose

Request a detailed protocol

To investigate the mechanisms of action of positively acting IR-DR compounds, we extended the existing pharmacological data with computationally derived drug-target interaction (DTI) predictions. Publicly available chemogenomic databases are very far from complete, and therefore, ML modelling approaches can be used to provide estimates for missing data. DL-based models have shown promise in this context (Gaudelet et al., 2021; James et al., 2020). We used DeepPurpose, a DL library for DTI prediction (Huang et al., 2021) that takes as an input SMILES of the small molecules of interest and the amino acid sequences of the protein-coding genome. Different encoders were implemented to provide a compound and a protein embedding. The small molecule and protein embeddings are concatenated and fed to a multi-layer perceptron that predicts the binding affinity as a dissociation constant (Kd). DeepPurpose provides a set of pre-trained models that can be used ‘off-the-shelf’. We used 14 pre-trained models that were available as of 01/12/2020. Those models differ from one another depending on the encoders and on the DTI training set. Drug encoders included convolutional neural network (CNN), daylight fingerprints, Morgan fingerprints and message-passing neural network (MPNN), while protein encoders amino acid composition (AAC) and CNN were used. The training sets were BindingDB (Liu et al., 2007), DAVIS (Davis et al., 2011) and KIBA (Tang et al., 2014). A list of the models used is shown in Table S9. All DTI models described above were applied to obtain 14 rankings of 19,211 human proteins as potential targets for each compound. A final score was obtained by an average ranking of each protein across 14 models, with the final top-ranking targets predicted to be the most likely protein targets of the input drug list. Comparable consensus-oriented strategies are often applied in virtual screening to exploit the strengths of multiple models (Gaudelet et al., 2021; James et al., 2020) and achieve improved performance (Palacio-Rodríguez et al., 2019; Perez-Castillo et al., 2017). DeepPurpose models showed promising performance in various testing scenarios, and we refer to the original publication for further details. The code used for the entire analysis can be located in the supplemental document.

Appendix 1

Appendix figures (Appendix 1—figures 111).

Appendix 1—figure 1
Network analysis statistics from MEGENA.

Analysis of the combined ‘disease’ insulin resistance (IR) and ‘treatment’ response IR genes using tissue gene expression data (Timmons et al., 2018; Timmons et al., 2019) identified those genes that had the greatest connectivity in tissue. These values were used to re-rank and select the top 60 upregulated (positively correlating) and 60 downregulated (negatively correlating) genes regulated in common in adipose and muscle tissue (see Methods).

Appendix 1—figure 2
Network-based analysis of the gene lists from Table 1 analysed using protein-protein interaction data.

A notable characteristic of these gene lists was that prior knowledge using multi-omic interaction databases (http://apps.broadinstitute.org/genets) was unable to connect the members of the list, indicating that previously unknown information was contained within each list and our novel analysis. There were <3% overlap of genes across the signatures considered in this study (Table S1) and only two genes common between signatures 3A and 4 (INSR, GRB14), and these were not essential for the performance of the RNA-based insulin resistance-drug repurposing (IR-DR) signature. See Figure S3 and ‘Discussion’ for reference to additional human signatures beyond IR.

Appendix 1—figure 3
Distribution of scores from clinical input signatures.

Using the input gene lists from Table 1 and the CLUE dataset of >2500 compound signatures, the degree of match (-100 to +100) was established across nine cell types. (A) Distribution of scores was plotted and demonstrates that the majority of compounds are not significantly active. (B) For comparison, it can also be observed that a human muscle in vivo pro-growth signature (Stokes et al., 2020) yields an excess of negatively acting compounds, confirming as expected that the drug database may be biased for anti-tumour/anti-growth compounds.

Appendix 1—figure 4
Breakdown of the drug classes of positive scoring insulin resistance-drug repurposing (IR-DR) compounds based on their known primary pharmacological actions.

From >250 active compounds, 140 were associated with a positive action on the IR-DR signature and thus predicted to treat IR. Using the pre-assigned pharmacological descriptors of each compound, they were groups into general classes of compound and represented by a pie-chart. Over 45% of the compounds were classed as kinase inhibitors, while the remaining positively acting compounds belonged to a broad range of pharmacological classes

Appendix 1—figure 5
Breakdown of the drug classes of negatively scoring insulin resistance-drug repurposing (IR-DR) compounds.

From >250 active compounds, ~100 were associated with a negative action on the IR-DR signature and thus predicted to aggravate IR. Using the pre-assigned pharmacological descriptors of each compound, they were grouped into general classes of compound. 17% of the compounds were classed as tubulin inhibitors, while the remaining compounds belonged to a relatively narrow range of pharmacological classes including compounds associated with activation of pro-inflammatory pathways and disruption of cell cycle.

Appendix 1—figure 6
Calculated physicochemicals do not distinguish between positive and negative acting drugs.

Calculated physicochemical descriptors (RDKit) were used to compare positively or negatively acting on the insulin resistance-drug repurposing (IR-DR) signature for systemic properties that might contribute to assay score (MolWt, HeavyAtomCount, NumHeteroatoms, MolLogP, NumRotatableBonds, TPSA, NumAromaticRings, NumSaturatedRings, NumAliphaticRings, RingCount, BalabanJ, NumHAcceptors, NumHDonors). The results were plotted using mean and standard deviation.

Appendix 1—figure 7
Venn diagram overlap of putative drug targets.

Comparison of drug signature, estimated protein targets of active drugs and knock-down (KD) and overexpression (OE) resources in CLUE that could be cross-compared with the insulin resistance-drug repurposing (IR-DR) signature.

Appendix 1—figure 8
Contrast between gene list and pathway-level connections.

Comparison of insulin resistance-drug repurposing (IR-DR) drug signature genes with 73 knock-down/overexpression (KD/OE) validated proteins from the positively acting IR-DR drug list. This demonstrates that while (A) no individual validated protein targets were in the drug repurposing signature, many belonged to pathways that contained the known drug targets (B). Blue lines connect common pathways.

Appendix 1—figure 9
DeepPurpose-based target protein predictions.

(A) Identification of predicted positive mediators of a positive insulin resistance-drug repurposing (IR-DR) score agrees with the pharmacological analysis. (B) Identification of predicted negatively acting proteins, diminishing the strength of a positive IR-DR score, or cancelling out any positive activity, agrees with and extends the known pharmacology.

Appendix 1—figure 10
Exploratory analysis of predicted protein target affinity.

(A) Drugs that positively or negatively associate with the insulin resistance-drug repurposing (IR-DR) score applied to compounds with limited or no existing pharmacological data (other than for epidermal growth factor receptor [EGFR]). (B) Correlation between fusion rank score and IR-DR assay score for individual protein targets. For example, the more potent the predicted action was against MAP3K19 (smaller value) the poorer the IR-DR score was.

Appendix 1—figure 11
Transcripts related to HOMA2-IR, independent of donor chronological age, in muscle and adipose tissue.

Primary analysis relied on identifying RNA that tracked with HOMA2-IR in muscle as this tissue represented the largest number of independent data sets – for both fasting tissue status and response to lifestyle intervention. Thereafter the candidates identified in muscle were examined in adipose tissue. The statistical ‘significance’ of the relationship (e.g. FDR < 5%), the magnitude and direction of the linear relationship (correlation coefficient) all informed the final selection of marker genes. As can be observed, in this analysis, the fasting HOMA2-IR genes and the treatment response HOMA2-IR genes represent a largely independent pool of genes (only VCL, GSTO1, SEC31B, FERMT2, OGFOD3, CENPV and NDUFAF5 were common to both states).

Data availability

The gene expression data sets are available at GSE154846, GSE58559 and GSE70353.

The following previously published data sets were used
    1. Timmons JA
    (2018) NCBI Gene Expression Omnibus
    ID GSE154846. META-PREDICT: Dynamic responses of the global human skeletal muscle coding and noncoding transcriptome to exercise.
    1. Civelek M
    (2017) NCBI Gene Expression Omnibus
    ID GSE70353. Subcutaneous adipose tissue gene expression from men that are part of the METSIM study.
    1. Josse AR
    2. Timmons JA
    (2016) NCBI Gene Expression Omnibus
    ID GSE58559. Human adipose tissue profiled before and after 16 weeks of intense exercise training with a modest energy deficit.

References

Decision letter

  1. Mone Zaidi
    Reviewing Editor; Icahn School of Medicine at Mount Sinai, United States
  2. Carlos Isales
    Senior Editor; Medical College of Georgia at Augusta University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Decision letter after peer review:

Thank you for submitting your article "A human multi-tissue and dynamic transcript signature enables drug repurposing for metabolic disease" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Nancy Carrasco as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Reviewer #1:

Pure drug repurposing (e.g. moving a treatment for cancer into clinical evaluation in diabetes) is a controversial yet attractive path toward increasing our treatment options for diseases in need of effective therapies. Critics point out that this type of 'true repurposing' is rarely successful. The concept remains popular given the cost, time and risk associated with de novo drug development. Methods continue to evolve and the study by Timmons et al. advances a computational approach that aligns the transcriptional signatures associated with insulin resistance (a common pathogenesis of multiple diseases) to the reported transcriptional responses to drug exposures.

Reviewer #2 (Public Review):This article provides insights into drug repurposing for metabolic disease.

https://doi.org/10.7554/eLife.68832.sa1

Author response

Essential revisions:

1) The paper is relatively hard to follow with the sequential description of figures, supplemental figures, tables and supplemental tables occupying multiple sections of the manuscript. Most of these figures and tables are not highly informative and required significant effort to relate back to key points being made in the manuscript. The authors would be well served to attempt to make their figures more descriptive.

The article is formatted in the “results-discussion” format due to the complexity of the topic, the broad range of pharmacological properties to discuss and the multiple disciplines used in generating the data for each subtopic. We would like to emphasis the section headers as important information that help walk the reader through the article. We have made significant efforts to edit and simplify the manuscript.

The figures are data-driven and not illustrations and as such the format is often determined by the underlying raw data and clustering methodologies. We assume Figure 1 is acceptable as it provides an over-view to the methods and flow of data through the analysis steps.

We have modified Figure 2 to make it more descriptive, and to deal with a misunderstanding regarding our presentation of compound physico-chemical data (which is now placed in the supplement, as those data are ‘negative’ control data). [THIS IS NOW FIG S6]

Figure 3 [THIS IS NOW FIG 2] is complex, but then it is a summary of the entire biological pathways targeted by all positive acting drug targets and it traverses many biological processes. It is however a very convention data plot for these types of data. It demonstrates that the in vivo IR genes do not always reside with pathways targeted by the drugs (based on known protein targets and pathway membership). Critically, the 73 protein targets were individually validated as being active in the IR screen using the RNAi data. Figure 4 and 5 [NOW FIG 3] are parallel and relatively simple drug vs target plots and we provide clearer titles. One set derived from laboratory pharmacology data (blue) and the other replicating the general wet-lab findings using a deep learning model of drug target predictions (purple).

Apologies, we do not understand what is confusing about either Table 1 or Table 2.

2) A major issue is the free-flowing nature of the manuscript's description of the data signatures, drug signatures, experimental outcomes and speculation around relevance of each of these elements. The section describing the potential role of PKC is a good illustration where the authors implicate the role of the various PKC isoforms as possibly playing a role as either promotors or suppressors of IR. They highlight the scores of two PKC activators (both based on the naturally occurring phorbal esters scaffolds) and two PKC inhibitors (both based on the bisindolylmaleimide family of kinase inhibitors). The latter (the inhibitors) show divergent outcomes in the IR-DR scoring system yet somehow seem to reinforce the authors interpretation that PKC is a highly interconnected target within the broader world of insulin resistance. There is a lot to unpack in all this including the broad polypharmacology of the bisindolylmaleimides and a vast literature involving PKC as metabolic diseases. However, the authors move past the potential relevance without any real examination and notably dismiss the highly divergent outcome of the two PKC inhibitors. This is an odd choice as it immediately causes the reader to question the value of the authors up-front assumptions and all of the outcomes that follow.

We do not agree that there is a great deal of speculation – we focus on explaining the known pharmacology of the positive acting compounds, illustrating a novel work-flow to explore their selectivity.

The representation of our PKC data inaccurate however we recognise we contributed to this confusion with some of our descriptions. We also feel that the placement of the section of the PKC text was suboptimal – as PKC topic was not meant to represent a benchmark for our work – and have revised that aspect of the article.

To address their specific concerns about PKC.

Firstly, we demonstrate that RNAi directed against specific isoforms produce some expected positive and negative IR assay scores, while there are no isoform selective PKC drugs to model specificity. We show that “knockdown of PKC-β (+74) and PKC-theta (+97) yielded positive IR-DR scores; while loss of PKC-α (-75) and -Eta (-75) produced poor IR-DR scores and overexpression of PKC-α was positive (+85).”.

This data is very striking and an important contribution to the PKC field. These isoform specific assay scores are consistent with the field of work on PKC in Diabetes (See citations within Roden and Shulman, 2019). So, for example inhibition of PKC-β and -theta appears beneficial while loss of -α is detrimental. We complete the comment on the known roles of PKC isoforms by referring to PKC-δ’s detrimental role. These data are novel and clearly supportive of the importance of our IR-DR model. These data also explain why predicting the net activity of non-specific PKC inhibitors is challenging.

Regarding the pharmacology – in total from >2500 drugs, only 27 are listed as targeting PKC, 4 of which are listed as activators. Unfortunately, most of the so-called PKC inhibitors demonstrate poly-pharmacology, targeting proteins beyond kinases and most are inactive in our assay so there is no ‘pharmacology vs selectivity’ pattern to model (or contradict our other PKC data). Still, the pharmacology of the active compounds related to PKC were not contradictory (the available compounds do not have good isoform specific properties). The four compounds credited with global “PKC activator” properties, all score negatively in our assay, including the two we mentioned (Prostratin, Ingenol, phorbol-12-myristate-13-acetate and thapsigargin (non-significantly negative, -57)). There is no good isoform specific data but it is quite plausible that excess α or eta activation is sufficient to yield a negative score.

For the well described PKC inhibitors, one inhibitor was active, another was inactive (not negative-acting as the reviewer implies). Bisindolylmaleimide has ~0.2uM activity against -α, -δ, -theta, -eta and -epsilon and yields a net positive score (+87). One rational for its net positive score is simple; it inhibits more of the negatively acting PKC isoforms, on balance, in the cells studied. The related high molecular mass Bisindolylmaleimide IX was inactive. It also targets -α, -eta, -iota, -δ, -theta and -γ in vitro but with far greater (<10nM) potency but is also a <10nM inhibitor of other kinases. A simple explanation why Bisindolylmaleimide IX was inactive in the IR-DR assay is that it targets a wider range of ATP binding sites in a variety of kinases. Either way there is no contradiction in the scores, as there were no score reversals for PKC inhibitors.

There is no doubt that studies have revealed that “PKC is a highly interconnected target within the broader world of insulin resistance” and we would direct you to the work of Shulman et al., and other laboratories as the basis for that statement. It would be churlish of us to not cite this body of work as our isoform specific RNAi analysis significantly adds to the IR literature as we link the isoform knock-down to a robust novel human assay for IR. Our quote of literature reports that “Bisindolylmaleimide is a potent PKC-δ inhibitor” is however a mistake (as it is equally active against other isoforms) and have amended that statement along with moving the PKC section to page 7.

3) Papers that provide new hypothesis-generating data (and methods) are great. But this work also purports to identify immediate options for patients. The sections on PKC and TOR do not advance the goal of highlighting translationally promising outcomes from this study.

We agree, and they both belong to the category of chemicals previously associated with insulin signalling and insulin resistance rather than key examples we pick to illustrate a pathway of translation to the clinic.

As the reviewer recognises in their following comment (4) we focus on the class “EGFR inhibitors” – but this is because we had sufficient data (range of active and inactive compounds) to form some position on potential mechanisms of actions; while independent laboratories have chronically doses some of this class, to demonstrate efficacy in models of metabolic disease. It would seem that lower-dose members of this class have obvious potential utility at reversing IR especially in shorter term clinical conditions (e.g. viral/inflammation induced severe IR) and this hypothesis could be tested immediately, as trial-ready pharmacological tools exist.

4) The authors ultimately land on EGFR inhibitors (mostly) and spend much of the paper trying to deconvolute the role of each agents polypharmacology and potency as a contributing factor in the overall score in the IR-DR scale. Ultimately highlighting that 'multiple protein targets of EGFR inhibitors explain the IR-DR assay performance'. Mostly this section names the proteins and refers to a published association with some form of relevant biology. Again, this type of result is categorized as 'hypothesis generating' not an explanation.

We do not agree with this representation of our approach or the implications made.

Our manuscript cannot be an in-depth analysis of every class of chemical identified as we have >200 active drugs. We focus on the potential of large-scale OMIC driven pharmacology to produce detailed quantitative relationships within a large class of chemical inhibitor and their protein targets which is why we ‘land’ on EGFR. It is only possible to contrast the OMIC driven scores when there is matching large scale primary in vitro pharmacological potency and plentiful selectivity data (i.e. off target data). This was plausible for EGFR inhibitors and the sort of quantitative relationship we present is highly novel.

In parallel to our project (running from 2017 to date) independent laboratories have demonstrated efficacy of selected EGFR inhibitors in a number of animal models of metabolic disease (See manuscript citations). Their recently published work was discovered during the literature analysis of our unbiased modelling and does not represent “hypothesis generation” but rather represents independent confirmation that illustrates a degree of potential efficacy. Critically, our entirely independent and human data-driven evidence provides novel connection to this class of drug, which indicates it merits clinical evaluation in metabolic disease.

The most efficient way to progress the use of this new human-data driven knowledge is to publish it. If we were a pharmaceutical company, and had >$10M to spend, we would re-screen and progress multiple analogues in vivo; and target of every combination of the protein kinases we rank within the EGFR analysis, to clarify specificity. It is also worth noting that our use of transcriptomics proved more impactful/actionable than the comparable high profile GWAS studies – which also claim to provide targets for treatment in metabolic disease.

5) Overall, the field of drug repurposing would benefit from new methods. Pharmacological transcriptional signature manipulation has real potential. But in its current form the manuscript doesn't organize its methods, data or outcomes in a manner that provides the reader confidence in the method or outcomes. Further, the high degree of speculation and the frequent use of literature association to derive conclusion is ultimately a distraction from the potential findings.

We agree that ‘pharmacological transcriptional signature manipulation has real potential’.

We present the data in a “results and discussion” format because we have >10 nominal classes of drugs and their numerous targets. To scatter the multiple levels of new data and supportive evidence across the manuscript would seem to us to make the article more difficult to assimilate, not less. Sadly, multi-disciplinary projects do not necessarily benefit from traditional publishing formats.

We do not agree with the critic of our methods – we provide all relevant methods in a substantial methods section and provide the computational code used to implement the analysis, as well as a list of the primary data files used and a full table of the results. The placement of the methods section is dictated by the journal format.

We do not recognise the claim that we include a ‘high degree of speculation’. It is not speculation to demonstrate a direct quantitative relationship between drug potency (Ki) and OMIC assay score. It is not speculation that most OMIC drug repurposing articles present a detailed examination of just 1 drug and fail to establish if the input transcriptional signature provides a substantial number of true positives – while we provide a set of results that contains a substantial number of true positives. It is not speculation to rely on entirely independent in vivo and clinical data to indicate that a compound from our analysis already demonstrates in vivo efficacy. It also seems unreasonable to criticise use of pre-existing in vivo efficacy from the literature as if this is a weakness. We have made numerous stylistic modifications to simplify the text and improve the flow of each section.

6) There were several instances where the authors implicated the possibility that alignment in 'physiochemical' properties drive the in vitro activity that they note. First, the authors are certainly referring to 'physicochemical' properties which are descriptors of a small molecule which govern how small molecules interact with biological constructs like membranes and proteins (TPSA, MW, LogP, etc). Next, they pass by their analysis entirely – simply saying its not the issue (they present a tSNE plot in Figure 2A but don't explain it). It is highly unlikely that these agents are working through a non-specific pharmacology that is driven by physicochemical properties. But if the authors are going to introduce the concept they have to explain it.

We apologise for the confusion created; we do NOT propose that ‘physicochemical' profiles explain our cell-based activity scores. Quite the opposite.

It is well accepted that simple caveats (such as drug solubility or cell penetrance) contribute to variation in high through put screens and impact on compound potency. We considered the ‘physicochemical' profiles of the ‘active’ vs ‘inactive’ compounds (or positive or negative scoring) to examine if any simple unfortunate bias related to the observed pharmacology. Not because we believed that this was a factor (other attempts to look at this have also considered it a to have limited influence on the OMIC driven results) but rather just as part of a standard quality control process of looking at this type of large scale pharmacology data. We have modified the text to clarify this point.

7) The authors place great importance on using data from multiple tissues, more specifically – adipose and muscle. However, it is not clear how significantly the pool of the reported compounds was reduced, relative to the union of the compounds that could be detected if either one tissue was used. And more importantly, would these compounds display similar properties and statistics as the reported list of compounds?

The project aimed to compare the utility of a ‘fasting disease’ or ‘treatment response’ signature versus the combination (Table 1 in the manuscript) and used much larger (and far more costly) GWAS data as one comparator.

The reviewer raises a very interesting additional line of investigation, that would require additional adipose biopsy treatment responses studies (to match the number of muscle treatment response studies). This would allow comparable muscle and adipose signatures for comparison. Instead, by combining all our available tissue treatment responses data we focus on the true-positive within the OMIC data while this ignores tissue-specific processes. The fact that there are many common features is also a very novel observation.

8) At the very least, the "disease signature" of 337 genes should be put in the perspective of the total number of genes differentially expressed in adipose or muscle tissues. A figure similar to Figure S6 or a funnel diagram would greatly help in following the flow of the experiment.

We agree and we have produced the Venn diagram as request and placed it within the method description (Figure S13).

9) Compounds that negatively regulate the IR-DR signature (Figure S4) include HDAC inhibitors; however, beneficial effects of HDAC inhibitors in terms of IR have been demonstrated in multiple settings, including in vivo models and clinical trials (Lee et al. 2020) (Sharma and Taliyan 2016) (Dewanjee et al. 2021). Therefore, it's not clear why HDAC inhibitors may aggravate IR, according to negative scores. It would be interesting to investigate, if this result may be explained by the off-target effects of compounds in this list.

We agree with the reviewer that HDAC inhibition is an interesting area for metabolic disease.

We and others have recently reported that low-dose HDAC inhibition, including HDAC3i, has a beneficial profile in neurodegenerative disease models(Janczura et al., 2018; Sartor et al., 2019). We have come to the general conclusion, based on those studies, that lower intermittent exposure (due to limited t/2) of HDACi was potentially the key. This and the high concentration of HDACi in the cell studies has implications for the present IR study.

As you mention (Lee et al., 2020) it was reported that MS-275 (Entinostat, Class I) positively treated IR in the high-fat fed model yet we find that high dose Entinostat yields a negative assay score (consistently with several other HDACi, e.g. SAHA (Vorinostat) a Class I, II and IV inhibitor).

We believe this could relate to the high dose of HDACi used in the large-scale drug-screen – potentially reflecting an inappropriate impact on tubulin biology (majority of TUBB inhibitors also scored negatively) and their general lack of selectivity. However, it may also be more complex. Recent studies by the Schenk Lab, using scriptaid – which was inactive in our assay –illustrated that Scriptaid’s use to supporting the potential for HDAC5 inhibition in muscle insulin resistance, may reflect coincidental non-HDAC activities (Martins et al., 2019). A broader role for the known actions of HDACs in β-cell function add to the complexity. – see updated general discussion.

When carrying out large scale screening, ensuring a high true positive rate is key as it would be intractable to sort through a high false positive rate and completely limit the approach for novel diseases with few bench markers. In that sense our priority is to demonstrate that a high true positive rate is possible, and one that yields some relationship to potency across a family of compounds (e.g. EGFR family). We have updated the manuscript to more clearly emphasise this point (Page 6).

10) The explanation for Alisertib-mediated inhibition of AURKB, which drives a detrimental IR-DR score is controversial since Alisertib treatment has anti-inflammatory properties and significantly reduced HOMA-IR index in the STZ+HFD diabetic mice model (Meng et al. 2020). This result suggests that the positive effects mediated by Alisertib targets, AURKA and EGFR, outweigh the detrimental ones from AURKB inhibition, which is not reflected in terms of IR-DR score. Notably, Imatinib, which lacks a significant IR- DR score, was also capable to improve IR in HFD murine model (Pichavaram et al. 2021). The following observations indicate, that more precise balance is required in terms of compound targets, as in its current setting, the application of scores may lead to the potential loss of candidates for drug repurposing.

While we expect any high throughput assay to generate some false negative results, the aim is to have a high true positive rate as this is the only manageable way to progress ‘hits’ to the clinic.

However, for the two examples provided we do not think these are false-negative results – as both examples include serious contradictions, and neither provide evidence for a direct reversal of IR. Indeed, Imatinib made IR worse in chow fed animals, Table 2, Page 6.

Regarding Alisertib: Meng et al. 2020(Meng et al., 2020) used Alisertib in a model where insulin production was removed with Streptozotocin (STZ) and a HFD fed. STZ ablates insulin production and is not a model of insulin resistance. The reported benefits from Alisertib were also unclear – for example fasting glucose and weight-gain remained unchanged. Alisertib did reduce serum insulin during an insulin tolerance test, but this implies enhanced clearance of injected insulin. HOMA values were also calculated from ITT data (injected exogenous insulin), but this is not a valid use of the HOMA model. An indirect impact on IR by targeting macrophage inflammation was proposed but it is not an insulin resistance model and the efficacy data are contradictory.

Regarding Imatinib: As Imatinib is not an EGFR inhibitor we did not consider from the perspective of targeting EGFR – but rather it targets a kinase that some EGFR inhibitors also targeted. Indeed, Imatinib was reported to improve vascular responses to insulin (CVJ et al., 2021) and this is ascribed to PDGF signalling.

In the study highlighted by the reviewer, Pichavaram et al. 2021, vascular hyperplasia was studied in a high fat fed mode of insulin resistance. Imatinib (25mg/kg daily IP) was dosed 3 weeks into the 6-week high fat diet (HFD) and insulin was studied at the end of the intervention. Imatinib prevented HFD weight gain (a phenotype driven by greater calorie intake over controls), yet Imatinib was reported to neither change food nor calorie intake but yet ‘abolished’ the HFD feed efficiency calculation indicating an inconsistent data set.

Nevertheless, Imatinib reversed the HFD weight gain first, and this led to lower fasting insulin (after the prevention of weight-gain). This is not direct reversal of insulin resistance, but a ‘removal’ of the experimental intervention that helped cause insulin resistance. Interestingly, Imatinib ~ doubled fasting HOMA-IR from 1.5 to 2.89 in the control chow fed animals, indicating that it does not directly improve insulin sensitivity.

129 – "because most drugs act systemically" – Although that is true, a drug's action in different tissues is not always uniform. See PMID:26626077 for reference. The authors are advised to discuss the issue of off-target tissue-specific interactions of the method used in their project.

The present project is not designed to examine tissue specific negative interactions. As mentioned above, to build additional tissue-specific models additional human clinical intervention data is needed to make robust statistical models for those conditions.

158 – "We checked whether the 254 active compounds had any simple physicochemical properties driving in vitro activity" – This sentence is followed by the biological annotations of the compounds, although their common chemical properties are just as important. Although the reader is advised to inspect Figure S5, this is a matter that deserves more coverage in the main text. Do these 254 compounds share any common chemical properties? Can they be separated into clearly defined structural clusters? How many of them comply with Lipinski's rule? Are any of these compounds protected by a patent? Although Figure 2A displays clusters of compounds grouped by chemical properties, it is unclear if any of these clusters can be characterized by certain chemical properties.

The analysis presented in original Figure S5 [NOW FIG S6] indicates that positive or negative activity is NOT associated with any of the chemical features (some of which relate to Lipinski’s rule of 5 (which also inferences oral bioavailability, something not captured by a cell based in vitro screen). We are unsure why the reviewer would like the relationship between patent status and drug action discussed.

Further, the original Figure 2A [NO LONGER INCLUDED] shows no pattern that segregates active drugs from those that were inactive. While interesting, a more in-depth analysis of the relationship between active drugs in cells and physicochemical properties should really be based on multiple bio-assays and represents a focus and project out-with the aims of the present study. The present physicochemical analysis was simply there to ensure that no obvious simple confounding properties distinguish active drugs from inactive (Figure 2A), nor positive from negatively scoring active drugs (Figure S5). The data presented indicate there were no obvious confounding physicochemical properties – to limit confusion we have moved both data plots to the supplement. We have revised the text to make this analysis clear and fit better within the flow of the manuscript.

188 – "PKC-δ also increases with age and is associated with age-related IR" – It is interesting to see if the authors tried finding the links between the identified compounds and their potential as geroprotectors? Are any of the compounds listed in a database such as geroprotectors.org? A lot of attention is paid to the metabolic pathways IR-DR molecules may be involved in. Perhaps, the authors should make a small subsection about aging-related pathways, such as for example, NAD-salvage.

This is an interesting suggestion. We examined this idea by contrasting a robust multi-tissue human age gene expression signature (of the same size as our IR signature) using the same methods as employed in the present study. This age signature represents one that responds in primary cells to drugs, shown to induce an increase in life-span in model systems (Timmons Age Cell, 2019). We then examined the response of each of our significant IR active drugs and found 64 of the IR drugs were also able to regulate the Age signature. Somewhat predictably, the vast majority belonged to the IGF1/AKT/mTOR axis. All but 1 EGFR compound was inactive in the ‘age’ assay. We have referred to the idea of using multiple bio-assays to triangulate or prioritise compounds in the final discussion.

433 – "gold-standard lifestyle intervention" – The authors are advised to briefly describe the lifestyle IR treatment. Is the identified "treatment signature" of 148 signatures characteristic of IR people only? Do healthy controls under this kind of intervention experience similar expression shifts?

Revised as requested (description of lifestyle treatments). Please not that IR is a continuum rather than a classification, and treatment responses are also on a continuum. We integrated data from a wide range of life-style intervention protocols applied to sedentary and/or over-weight individuals. Our work on this topic, so far, indicates that molecular markers for the variable outcome response can be agnostic to precise life-style protocol. Indeed, our focus was to identify genes that contribute to treatment responses irrespective of the precise life-style protocol as these are more likely to be general regulators of metabolism.

456 – "We averaged signature matching across the 9 cell lines" – It is implied that all cell lines are equal, although some of them are probably more similar to the adipose/muscle domain of the study.

Similarity between cell lines and primary tissues can only be true to a point, however our interpretation of the present analysis is that the cell lines – rather distinct from mature muscle or adipocytes – do reflect the major biochemical processes sufficiently to capture/represent relevant insulin signalling pathway responses. Critically, averaging across all cell types helps remove cell specific noise/responses and increases the sample size – providing the most statistically robust and cell agnostic profile possible. We think it is, from a statistical perspective, an error to look at single cell type data with the existing assay format.

Thank you for resubmitting the paper entitled "A human multi-tissue and dynamic transcript signature enables drug repurposing for metabolic disease" for further consideration by eLife. Your revised article has been evaluated by a Senior Editor and a Reviewing Editor. We are sorry to say that we have decided that this submission will not be considered further for publication by eLife.

[Editors’ note: The authors appealed the second decision. What follows is the authors’ response to the second round of review]

– The whole manuscript is dependent on the assumption that the IR drug response signature derived from the Connectivity Map can be generated by averaging data across 9 cell lines (listed on page 15 of the resubmitted manuscript: PC3 and VCAP are prostate cancer cells. A375 are melanoma cells. A549 is an epithelial lung cancer line. HA1E is an epithelial kidney line. HCC515 is a lung adenocarcinoma. HT29 is a colon adenocarcinoma. MCF-7 is a breast carcinoma line. HEPG2 is a hepatocellular carcinoma.).

This is an unfortunate misrepresentation of the project. We do not assume what the reviewer is stating. In fact, the reviewer is stating part of our research hypothesis. We were evaluating if a novel multi-gene DR model, developed from our extensive clinical studies, can match relevant drugs signatures with a high true positive rate compared with alternative signatures generated using alternative OMIC strategies (as one set of controls).

The assumption is that the transcriptional responses to a class of drugs (say EGFR inhibitors) can be collected from each of these divergent cell types and worked into a set of common transcriptional responses and subsequently we assume that all cells respond similarly when exposed to that drug class. Doesn't this require that the 9 cell lines express EGFR (the correct isoforms) and/or the correct off-target kinases that are contributing to the signature? Even a cursory review of the literature will tell us that A549 and HepG2 are known to overexpress and have a growth dependence on EGFR. Inhibiting a target that a cell line is dependent on will have broad transcriptional consequences relative to inhibiting the same target in a cell that has no dependency.

We used the term “averaging” to convey a simple idea and have now reworded it to “aggregate” across cell lines as this might help clarify any misunderstanding on the influence of ranking across cells.

Firstly, the cell normalised scores considers the tendency for a cell type to respond more or less frequently to all drugs. Scores are summarised using a method that would not be influenced by a few cells with negative responses as the aggregation method uses the maximum quantile statistic from the within cell line normalised scores. Thus a few inactive cells don’t influence the outcome.

Secondly, rather than being an assumption, our research strategy stated up front that we were seeking cell agnostic drug responses. We also reflected on the fact that moving from n=3 to n=27 replicates per drug adds robustness (based on the published analyses of the database).

From first principals each drug targets multiple proteins, that contribute to its net activity, one cannot reject or include individual cell lines in the manner described by the reviewer; nor does it require the exact same conditions for each protein per cell type for several reasons. We narrow the potential targets using three distinct methods – kinase panels with clustering, deep learning predicted target ranking and the RNAi data from proteins known to be bound by active drugs and this modelling and the discussion illustrate that these choices were valid.

For example, we show that EGFR inhibitors are not “just” EGFR inhibitors, and thus the net response per cell can’t be predicted just based only on EGFR status (we find up to 40% shared variance but as we illustrate this can be due to shared potency relationships across related kinases not just EGFR, as primary potency vs EGFR co-varies with some other related kinases).

– Insulin responses in the key tissues/cells which are susceptible to disease causing resistance mechanisms we hope to abrogate are dependent on multiple, diverse and complex autocrine, paracrine and endocrine factors which are almost certainly not present in cell cultures. How do the authors account and correct for the lack of these critical elements of the insulin response? Ultimately, the IR-DR scoring system relies solely on cell intrinsic factors. Possibly this is enough, but the authors would need to demonstrate that fact.

The reviewer appears to be overlooking our results, while reflecting on some valid but ultimately incorrect assumptions. Of course, we agree that in vivo metabolic signalling is complex. However, it’s not possible to screen millions of compounds in vivo in animals (nor desirable). Steppingstones are needed and we demonstrate that the present novel IR assay, and our approach, selects a substantial number of diversely acting true positive compounds. The broader context is that there is no other valid cell-based insulin resistance screen published at this time.

Thus, our results demonstrate that the approach is in fact useful, and the preconceptions mentioned, while very understandable, are wrong. We find that up to 50% of our drugs and drug families (Table 1) are functionally linked to modulating insulin signalling and treating metabolic disease, many in vivo. Critically, we also demonstrate a quantitative relationship between binding affinity and multi-gene score illustrating the potential utility to facilitate drug design – such as those used by companies employing AI multi-target drug design.

– The authors do highlight studies that suggest EGFR inhibitors have a positive effect on various IR related disease states. When reviewing these studies, the mechanistic aspects are not fully explained so we can't directly compare with the IR-DR signatures posited in this study. Again, it's an interesting hypothesis that IR phenotypes in diverse cells/tissues in organs ranging from the kidney to the pancreases to the liver could all be responding to EGFR inhibitors through a reversion of a transcriptional program uncovered by averaging the transcriptional response nine immortalized cancer cell lines to EGFR inhibition. But that's honestly a big theoretical jump that must be backed up with some data.

We feel that it is odd to refer to a “theoretical jump” when we have clearly done the “jump” and shown that in practice a robust clinical RNA signature identifies a set of diverse classes of compound, many of which directly modulate IR and metabolic disease in vivo.

The fact that a medicinal chemist at one time in their life called some of these drugs “EGFR” inhibitors is unfortunate. We would call them “multi-gene inhibitors targeting the EGFR tyrosine kinase like proteins”.

Evaluation of a drugs mechanism of action (MOA) is secondary connecting a novel clinical IR signature directly with drugs that have efficacy. Selecting an active drug from thousands of options is the key aim of DR, not establishing a specific MOA. That is a secondary aim and we do go on to illustrate that our analysis helps to provide greater detail on how some drugs, which are active in vivo, may be working.

– Essential Revisions 7 Author's Reply: The project aimed to compare the utility of a 'fasting disease' or 'treatment response' signature versus the combination (Table 1 in the manuscript) and used much larger (and far more costly) GWAS data as one comparator. The reviewer raises a very interesting additional line of investigation, that would require additional adipose biopsy treatment responses studies (to match the number of muscle treatment response studies).

New Comment:

Why is it impossible to perform the same kind of analysis without more biopsies? Although the statistical power of the findings will not be the same, there should be significant hits. It is beneficial to demonstrate how much the double tissue approach has allowed shrinking the search space.

The actual unknown here is whether the search space would be increased with a larger adipose data-set (more detectable in common with muscle) and not as the reviewer is implying. This is not a topic we can explore without additional clinical data. We do show how the double tissue approach shrinks the search space by showing the gene list size is reduced by including adipose constraints on muscle (shared genes). Adipose responses to treatment are less well-defined in our clinical studies. But a gene passes the filter only if in both. If we take the adipose list and ‘shrink’ it by the muscle list’ we get the exact same answer- the subset of adipose in muscle.

– Additional Reviewer Comments – 158 Author's Reply: The analysis presented in original Figure S5 indicates that positive or negative activity is NOT associated with any of the chemical features

New Comment:

i) tSNE is highly sensitive to minor changes in initial settings, so there is no association between activity and the position of a compound in this particular tSNE implementation hardly proves the point. There might be a tSNE implementation in which the compounds are neatly separated by activity.

ii) The illustration of Figure S5 is not self-evident. Some might say that there are actually clusters enriched in active/inactive compounds. To demonstrate that there is no association between tSNE position and activity, the observations should be separated into clusters followed by a comparison of odds ratios (for example, using the chi2 test).

These tSNE results should have been placed in the Supplementary section.

We have removed the tSNE plot, as the options are too numerous to systematically explore what the reviewer is referring to. There is no information, within in the current tSNE plot that supports the view that there is physical-chemical bias between active and inactive drugs and thus nothing to direct the exploration of many thousands of permutations.

The purpose of the tSNE plot was to illustrate the properties of the clue.io database and not specifically address our data and so we have removed this plot from the manuscript. Notably, the mean physical chemical properties for the active vs inactive compounds speaks more directly to our data and that plot is retained.

– Additional Reviewer Comments – 188 Author's Reply: This age signature represents one that responds in primary cells to drugs, shown to induce an increase in lifespan in model systems (Timmons Age Cell, 2019). We then examined the response of each of our significant IR active drugs and found 64 of the IR drugs were also able to regulate the Age signature. Somewhat predictably, the vast majority belonged to the IGF1/AKT/mTOR axis. All but 1 EGFR compound was inactive in the 'age' assay. We have referred to the idea of using multiple bio-assays to triangulate or prioritize compounds in the final discussion.

New Comment: This information should be discussed in the main text. Perhaps, other works on data-driven search for geroprotectors and assembling the gene expression signatures of tissue-specific aging should be mentioned in the discussion.

We have cited a key geroprotection drug repurposing article in the discussion however we are over the length limits for the present manuscript format.

– Additional Reviewer Comments – 456 Author's Reply: We think it is, from a statistical perspective, an error to look at single-cell type data with the existing assay format.

New Comment: Adipose and muscle tissues were chosen for being directly involved in energy storage affected by IR. But the cell lines chosen for verification contain epithelium and cancer cell lines. The authors should clearly state the limitations of this methodology. Some truly effective compounds might not be able to generate the desired signature in these cell lines due to their intrinsic differences. While it is presented to decrease the number of "false positive" compounds, this procedure may increase the number of "false negative" compounds. In addition, we would suggest reviewing the popular methods in computational workflows of drug repurposing.

The reviewer rather misrepresents our project with this new comment. We do not “verify” in cancer cell lines, as active scores in these cells per se does not validate the novel input signatures. Validation reflects that compounds we identified include numerous true positives or that they target pathways known to control insulin sensitivity, and thus illustrating that our novel IR clinical input signature is valid. It is important to note our RNA models were successful while other OMIC models were not.

We agree that all assays will yield false negatives and false positives – and we discuss the reasons for this throughout the article and in the limitations of our study. A key property of a good DR assay is one that produces a low rate of false positives while delivering a substantial number of chemically diverse potential leads, as only a finite number of small molecules can be further screened in animal models or taken into clinical development. Unfortunately, space limits us to producing a more comprehensive review of the DR literature but it is something we agree is needed in the literature as not all available approaches are genuinely showing promise.

https://doi.org/10.7554/eLife.68832.sa2

Article and author information

Author details

  1. James A Timmons

    1. William Harvey Research Institute, Queen Mary University of London, London, United Kingdom
    2. Augur Precision Medicine LTD, Stirling, United Kingdom
    Contribution
    Conceptualization, Formal analysis, Funding acquisition, Investigation, Project administration, Resources, Supervision, Visualization, Writing – original draft, Writing – review and editing, Data curation, Methodology
    For correspondence
    jamie.timmons@gmail.com
    Competing interests
    received consultancy fees from McMaster University, and has a majority share of the stock in Augur Precision Medicine LTD. A patent may be processed for the drug screen
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-2255-1220
  2. Andrew Anighoro

    Relation Therapeutics LTD, London, United Kingdom
    Contribution
    Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review and editing
    Competing interests
    is an employee of Relation Therapeutics LTD and owns stocks in the company. The author has no other competing interests to declare
  3. Robert J Brogan

    Fiona Stanley Hospital, Perth, Australia
    Contribution
    Data curation, Writing – review and editing, Formal analysis, Validation
    Competing interests
    is affiliated with Augur Precision Medicine LTD. An application for the drug signature may be processed by the lead parties in the project
  4. Jack Stahl

    Center for Therapeutic Innovation, Miller School of Medicine, University of Miami, Miami, United States
    Contribution
    Data curation, Writing – review and editing
    Competing interests
    An application for the drug signature may be processed by the lead parties in the project
  5. Claes Wahlestedt

    Center for Therapeutic Innovation, Miller School of Medicine, University of Miami, Miami, United States
    Contribution
    Supervision, Writing – review and editing
    Competing interests
    An application for the drug signature may be processed by the lead parties in the project
  6. David Gordon Farquhar

    Augur Precision Medicine LTD, Stirling, United Kingdom
    Contribution
    Funding acquisition, Project administration, Resources, Writing – review and editing
    Competing interests
    is affiliated with Augur Precision Medicine LTD and owns stocks in the company. An application for the drug signature may be processed by the lead parties in the project
  7. Jake Taylor-King

    Relation Therapeutics LTD, London, United Kingdom
    Contribution
    Conceptualization, Supervision, Writing – review and editing
    Competing interests
    is an employee of Relation Therapeutics LTD and owns stocks in the company. The author has no other competing interests to declare
  8. Claude-Henry Volmar

    Center for Therapeutic Innovation, Miller School of Medicine, University of Miami, Miami, United States
    Contribution
    Supervision, Writing – review and editing
    Competing interests
    An application for the drug signature may be processed by the lead parties in the project
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9437-051X
  9. William E Kraus

    School of Medicine, Duke University, Durham, United States
    Contribution
    Funding acquisition, Resources, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-1930-9684
  10. Stuart M Phillips

    Faculty of Science, Kinesiology, McMaster University, Hamilton, Canada
    Contribution
    Conceptualization, Data curation, Funding acquisition, Investigation, Resources, Writing – original draft
    Competing interests
    received travel fees and honoraria from the US National Dairy Council and the US Dairy Export Council. SP also received fees from Enhanced Recovery which were donated to charity. SP has a Canadian patent issued (Canadian patent 3052324 issued to Exerkine) and a US patent pending US patent 16/182891 pending to Exerkine, with no financial gains associated with these patents. The author has no other competing interests to declare
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-1956-4098

Funding

Natural Sciences and Engineering Research Council of Canada (RGPIN-2020)

  • Stuart M Phillips

National Institute on Aging (R56AG061911)

  • Claude-Henry Volmar
  • James Timmons
  • Claes Wahlestedt

Queen Mary University London (BHF Senior Fellowship)

  • James A Timmons

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We acknowledge the contributions of current and former members of our research groups to the clinical studies and the funding associated with those published projects. Aspects of the present work were supported by a grant from the National Science and Engineering Research Council (NSERC) of Canada (RGPIN-2020) and R56AG061911 (NIA). JAT was supported by the British Heart Foundation. SMP was supported by the Canada Research Chairs funding scheme. The funders were not involved in the decision to submit the work for publication. Additional informatics analysis costs were supported by APM and RT LTD. A patent application for the use of RNA signatures to discover treatments for metabolic disease may be filed. JAT and DGF are shareholders in APM LTD, while AA and JPTK are shareholders in RT LTD. RJB is affiliated with APM LTD on a non-commercial basis. There are no other competing interests to declare.

Senior Editor

  1. Carlos Isales, Medical College of Georgia at Augusta University, United States

Reviewing Editor

  1. Mone Zaidi, Icahn School of Medicine at Mount Sinai, United States

Publication history

  1. Received: March 26, 2021
  2. Accepted: November 26, 2021
  3. Version of Record published: January 17, 2022 (version 1)
  4. Version of Record updated: January 20, 2022 (version 2)

Copyright

© 2022, Timmons et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,247
    Page views
  • 134
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. James A Timmons
  2. Andrew Anighoro
  3. Robert J Brogan
  4. Jack Stahl
  5. Claes Wahlestedt
  6. David Gordon Farquhar
  7. Jake Taylor-King
  8. Claude-Henry Volmar
  9. William E Kraus
  10. Stuart M Phillips
(2022)
A human-based multi-gene signature enables quantitative drug repurposing for metabolic disease
eLife 11:e68832.
https://doi.org/10.7554/eLife.68832

Further reading

    1. Cancer Biology
    2. Computational and Systems Biology
    Iurii Petrov, Andrey Alexeyenko
    Research Article

    Late advances in genome sequencing expanded the space of known cancer driver genes several-fold. However, most of this surge was based on computational analysis of somatic mutation frequencies and/or their impact on the protein function. On the contrary, experimental research necessarily accounted for functional context of mutations interacting with other genes and conferring cancer phenotypes. Eventually, just such results become 'hard currency' of cancer biology. The new method, NEAdriver employs knowledge accumulated thus far in the form of global interaction network and functionally annotated pathways in order to recover known and predict novel driver genes. The driver discovery was individualized by accounting for mutations' co-occurrence in each tumour genome - as an alternative to summarizing information over the whole cancer patient cohorts. For each somatic genome change, probabilistic estimates from two lanes of network analysis were combined into joint likelihoods of being a driver. Thus, ability to detect previously unnoticed candidate driver events emerged from combining individual genomic context with network perspective. The procedure was applied to ten largest cancer cohorts followed by evaluating error rates against previous cancer gene sets. The discovered driver combinations were shown to be informative on cancer outcome. This revealed driver genes with individually sparse mutation patterns that would not be detectable by other computational methods and related to cancer biology domains poorly covered by previous analyses. In particular, recurrent mutations of collagen, laminin, and integrin genes were observed in the adenocarcinoma and glioblastoma cancers. Considering constellation patterns of candidate drivers in individual cancer genomes opens a novel avenue for personalized cancer medicine.

    1. Computational and Systems Biology
    2. Evolutionary Biology
    Louisa Gonzalez Somermeyer et al.
    Research Article Updated

    Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited due to scarceness of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design – instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work paves insights for practical application of fitness landscape heterogeneity in protein engineering.