
Genetic association and causal inference converge on hyperglycaemia as a modifiable factor to improve lung function

  1. William R Reay
  2. Sahar I El Shair
  3. Michael P Geaghan
  4. Carlos Riveros
  5. Elizabeth G Holliday
  6. Mark A McEvoy
  7. Stephen Hancock
  8. Roseanne Peel
  9. Rodney J Scott
  10. John R Attia
  11. Murray J Cairns (corresponding author)
  1. School of Biomedical Sciences and Pharmacy, The University of Newcastle, Australia
  2. Hunter Medical Research Institute, Australia
  3. School of Medicine and Public Health, The University of Newcastle, Australia
Research Article
Cite this article as: eLife 2021;10:e63115 doi: 10.7554/eLife.63115

Abstract

Measures of lung function are heritable, and thus, we sought to utilise genetics to propose drug-repurposing candidates that could improve respiratory outcomes. Lung function measures were found to be genetically correlated with seven druggable biochemical traits, with further evidence of a causal relationship between increased fasting glucose and diminished lung function. Moreover, we developed polygenic scores for lung function specifically within pathways with known drug targets and investigated their relationship with pulmonary phenotypes and gene expression in independent cohorts to prioritise individuals who may benefit from particular drug-repurposing opportunities. A transcriptome-wide association study (TWAS) of lung function was then performed which identified several drug–gene interactions with predicted lung function increasing modes of action. Drugs that regulate blood glucose were uncovered through both polygenic scoring and TWAS methodologies. In summary, we provided genetic justification for a number of novel drug-repurposing opportunities that could improve lung function.

eLife digest

Chronic respiratory disorders like asthma affect around 600 million people worldwide. Although these illnesses are widespread, they can have several different underlying causes, making them difficult to treat. Drugs that work well on one type of respiratory disorder may be completely ineffective on another. Understanding the biological and environmental factors that cause these illnesses will allow them to be treated more effectively by tailoring therapies to each patient.

Reduced lung function is a factor in respiratory disorders and it can have many genetic causes. Studying the genes of patients with reduced lung function can reveal the genes involved, some of which may already be targets of existing drugs for other illnesses. So, could a patient’s genetics be used to repurpose existing drugs to treat their respiratory disorders?

Reay et al. combined three methods to link genetics and biological processes to the causes of reduced lung function. The results reveal several factors that could lead to new treatments. In one example, reduced lung function showed a link to genes associated with high blood sugar. As such, treatments used in diabetes might help improve lung function in some patients. Reay et al. also developed a scoring system that could predict the efficacy of a treatment based on a patient’s genetics. The study suggests that COVID-19 infection could be affected by blood sugar levels too.

Chronic respiratory disorders are a critical issue worldwide and have proven difficult to treat, but these results suggest a way to identify new therapies and target them to the right patients. The findings also support a connection between lung function and blood sugar levels. This implies that perhaps existing diabetes treatments – including diet and lifestyle changes aimed at reducing or limiting blood sugar – could be repurposed to treat respiratory disorders in some patients. The next step will be to perform clinical trials to test whether these therapies are in fact effective.

Introduction

Optimal lung (pulmonary) function is vital for the ongoing maintenance of homeostasis, with reduced pulmonary function associated with a marked increase in the risk of mortality (Vasquez et al., 2017; Young et al., 2007). This is particularly critical due to the considerable number of disorders for which diminished pulmonary function is a clinical hallmark. For instance, chronic obstructive pulmonary disease (COPD), characterised by an irreversible limitation of airflow, is one of the leading causes of death worldwide (Quaderi and Hurst, 2018). Pulmonary manifestations are also common amongst disorders not directly classified as respiratory conditions, including diabetes (Pitocco et al., 2012; Walter et al., 2003), congenital heart disease (Alonso-Gonzalez et al., 2013), and inflammatory bowel disease (Ji et al., 2016; Yilmaz et al., 2010). Bacterial and viral infections, such as those caused by Streptococcus pneumoniae, Mycobacterium tuberculosis, influenza, and coronaviruses, also cause severe declines in respiratory function. In order to better manage the spectrum of respiratory disorders, there is a desperate need for new interventions, including those that can be targeted to an individual’s heterogeneous risk factors. While the development pathway for new compounds is difficult, there are likely to be opportunities for precision repurposing of existing drugs to enhance lung function and improve patient outcomes.

Spirometry measures of pulmonary function have been shown to display significant heritability both in twin designs and genome-wide association studies (GWAS) (Palmer et al., 2001; Ingebrigtsen et al., 2011; Shrine et al., 2019). Genomics may reveal clinically relevant insights into the biology underlying lung function, and thus, could be leveraged for drug repurposing. We sought to interrogate the genomic architecture of three spirometry indices to propose drug-repurposing candidates which could be used to improve lung function: forced expiratory volume in 1 s (FEV1), forced vital capacity (FVC), and their ratio (FEV1/FVC). Firstly, we assessed each lung function trait for evidence of genetic correlation with biochemical traits that could be pharmacologically modulated, followed by models to investigate whether there was evidence of causation. The previously developed pharmagenic enrichment score (PES) framework was then implemented to identify druggable pathways enriched with lung function-associated variation and calculate pathway-specific polygenic scores (PGS) to prioritise individuals who may benefit from a repurposed compound which interacts with the pathway (Reay et al., 2020). A transcriptome-wide association study (TWAS) of FEV1 and FVC was also undertaken to reveal genes which could be targeted by existing drugs that may increase pulmonary function. Finally, we considered the repurposing candidates proposed by these strategies in the context of three respiratory viruses (severe acute respiratory syndrome coronavirus 2 [SARS-CoV2], influenza [H1N1], and human adenovirus [HAdV]), specifically analysing the interactions between viral and human proteins. An overview schematic of this study is detailed in Figure 1 and Figure 1—figure supplement 1.

Figure 1 (with 1 supplement)
Overview of strategies for genetically informed drug repurposing to improve lung function.

The left flow chart outlines our workflow for using causal inference to identify drug targets, while the right flow chart shows the workflow for functionally partitioning the heritable component into drug targets. In both cases, we utilise or integrate genome-wide association studies (GWAS) data for lung function (including three spirometry phenotypes: forced expiratory volume in 1 s [FEV1], forced vital capacity [FVC], and their ratio [FEV1/FVC]) and quantitative biochemical traits (e.g. hormones and metabolites) which can be pharmacologically modulated. Using these data, we established genetic correlation between lung function and the biochemical traits using linkage disequilibrium score regression (LDSC). We then constructed a latent causal variable (LCV) model to investigate evidence of causality for significantly correlated biochemical–lung function trait pairs. To further support causal inference between significant pairs, we implemented Mendelian randomisation. Where a causal relationship between a modifiable biochemical trait and lung function is established, we can infer a novel treatment. The right flow chart shows the workflow for utilising heritable components for drug repurposing. Specifically, polygenic scores for lung function were calculated using lung function GWAS single nucleotide polymorphisms (SNPs) within biological pathways that can be targeted by approved drugs, rather than a genome-wide score. Individuals with low genetically predicted lung function by a pharmagenic enrichment score (PES) (low PES) relative to a reference population may benefit from a compound which modulates said pathway. To further support putative genetically predicted targets for drug repositioning, a transcriptome-wide association study of lung function was performed. This identified druggable genes for which genetically predicted expression was correlated with a spirometry measure. Genes with positive genetic covariance between imputed expression and lung function (i.e. increased expression associated with increased lung function) could be modulated by an agonist compound, whilst genes for which decreased predicted expression is associated with improved lung function could be targeted by an antagonist compound.

Results

Measures of lung function were genetically correlated with clinically significant metabolites and hormones

We assessed genetic correlation between three pulmonary function measurements (FEV1, FVC, and FEV1/FVC) and 172 GWAS summary statistics of European ancestry using bivariate linkage disequilibrium score regression (LDSC) (Bulik-Sullivan et al., 2015; Zheng et al., 2017). A number of clinically significant traits displayed significant genetic correlation with FEV1, FVC, and/or FEV1/FVC after correcting for the number of tests performed (p<2.9×10−4, Figure 2a, Supplementary file 1a–c). FVC had the largest number of genetic correlations which surpassed Bonferroni correction (N = 35), followed by FEV1 and FEV1/FVC, for which 25 and 8 traits survived multiple testing correction, respectively. The trait most significantly correlated with both FEV1 and FVC was waist circumference (FEV1: rg = −0.19, SE = 0.02, p=5.71×10−20; FVC: rg = −0.24, SE = 0.02, p=9.54×10−33), whilst asthma demonstrated the most significant correlation with FEV1/FVC (rg = −0.35, SE = 0.05, p=3.49×10−12).
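The multiple testing threshold quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes a conventional family-wise error rate of 0.05 divided by the 172 traits tested; the function name `survives_correction` is illustrative only.

```python
# Bonferroni-corrected threshold for the genetic correlation screen.
n_traits = 172   # GWAS summary statistics tested against each lung function trait
alpha = 0.05     # assumed conventional family-wise error rate

bonferroni_threshold = alpha / n_traits
print(f"{bonferroni_threshold:.1e}")  # 2.9e-04

def survives_correction(p_value, threshold=bonferroni_threshold):
    """Return True when a p-value is below the corrected threshold."""
    return p_value < threshold

# p-values reported in the text for waist circumference
waist_fev1 = survives_correction(5.71e-20)
waist_fvc = survives_correction(9.54e-33)
```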

Figure 2 (with 1 supplement)
Genome-wide investigation of biochemical traits related to lung function.

(a) Heatmap of genetic correlations (rg) between three spirometry measures (forced expiratory volume in 1 s [FEV1], forced vital capacity [FVC], and their ratio [FEV1/FVC]) and a number of European ancestry genome-wide association studies. Genetic correlation estimates were plotted if the trait was significantly correlated with at least one of the lung function traits after Bonferroni correction. Hierarchical clustering was applied to the rows and utilised Pearson’s correlation distance. (b) Latent causal variable models between correlated biochemical traits (selected by linkage disequilibrium score regression) that are potentially drug targets (metabolite or hormone traits) and each measure of lung function. The posterior mean genetic causality proportion (GCP) is plotted, with the error bars representing the upper and lower limits defined by its posterior mean standard error. A positive GCP estimate significantly different than zero indicates partial genetic causality of the biochemical trait on the spirometry measure.

Interestingly, there was evidence of genetic correlation between measures of lung function and circulating levels of both metabolites and hormones. This is notable as these molecules can be pharmacologically modulated, potentially informing novel therapeutic strategies and drug-repurposing opportunities to improve lung function. Significant genetic correlations were observed with four metabolites (fasting glucose, high-density lipoprotein [HDL], triglycerides, and urate) and two hormones (fasting insulin and leptin) for at least one measure of lung function (Table 1).

Table 1
Significant genetic correlations between lung function measures and metabolite and hormone GWAS.
Lung function trait | Biochemical trait | Genetic correlation (rg)* | p-Value
FEV1 | Fasting insulin | −0.23 (0.04) | 6.61 × 10−8
FEV1 | Leptin (BMI unadjusted) | −0.25 (0.05) | 3.74 × 10−7
FEV1 | Leptin (BMI adjusted) | −0.24 (0.05) | 9.13 × 10−7
FEV1 | Urate | −0.12 (0.03) | 9.46 × 10−6
FEV1 | Fasting glucose | −0.13 (0.03) | 1 × 10−4
FVC | Fasting insulin | −0.31 (0.04) | 6.98 × 10−14
FVC | Leptin (BMI unadjusted) | −0.33 (0.05) | 2.85 × 10−12
FVC | Leptin (BMI adjusted) | −0.27 (0.05) | 1.21 × 10−8
FVC | HDL cholesterol | 0.14 (0.03) | 9.97 × 10−7
FVC | Urate | −0.12 (0.02) | 9.54 × 10−7
FVC | Triglycerides | −0.11 (0.03) | 1.53 × 10−5
FVC | Fasting glucose | −0.12 (0.03) | 1 × 10−4
FEV1/FVC | HDL cholesterol | −0.11 (0.03) | 2 × 10−4

*Genetic correlations which survived multiple testing correction for each lung function trait individually are reported with their respective standard error.
FEV1: forced expiratory volume in 1 s; FVC: forced vital capacity; FEV1/FVC: ratio of FEV1 to FVC; HDL: high-density lipoprotein; BMI: body mass index; GWAS: genome-wide association studies.

Evidence of a causal relationship between fasting glucose and lung function supports antihyperglycaemic compounds as drug-repurposing candidates

The genetic correlations observed between lung function measures and metabolite/hormone traits may be clinically actionable; however, a significant estimate of genetic correlation does not imply causality (O'Connor and Price, 2018). In response, we constructed a latent causal variable (LCV) model to estimate mean posterior genetic causality proportion (GCP^) for each metabolite or hormone trait and the lung function measure with which it is genetically correlated (Figure 2b, Supplementary file 1d). The LCV method assumes that a latent variable mediates the genetic correlation between two traits and tests whether this latent variable displays stronger correlation with either of the traits. We used the recommended threshold for partial genetic causality of |GCP| > 0.6 as this has been demonstrated in simulations to appropriately guard against false positives (O'Connor and Price, 2018). There was strong evidence of partial genetic causality of fasting glucose on FVC: GCP^ = 0.77, SE = 0.15, P(H0: GCP = 0) = 1.32 × 10−56. Importantly, the posterior mean GCP estimate for the relationship between fasting glucose and FVC remained strong (|GCP| > 0.6) using a fasting glucose GWAS additionally adjusted for body mass index (BMI): GCP^ = 0.63, SE = 0.22, p=1.67×10−56. The LCV model constructed between fasting glucose and FEV1 did not surpass the threshold of |GCP| > 0.6 we use to designate partial genetic causality; however, it was directionally consistent with the fasting glucose to FVC estimate and closely approached this threshold, FEV1: GCP^ = 0.57, SE = 0.18, P(H0: GCP = 0) = 7.18 × 10−12. A strong posterior GCP estimate was observed for urate and FVC (GCP^ = 0.73), although the relatively low heritability z score as calculated by the LCV framework (z < 7) may lead to a biased estimate. As a result, the relationship between urate and FVC should be treated with caution and further study would be needed to replicate this finding in a urate GWAS with a more precise heritability estimate. The estimate between HDL cholesterol and FEV1/FVC (GCP^ = 0.59, SE = 0.26, P(H0: GCP = 0) = 4.12 × 10−7) was also close to the GCP threshold but we do not denote this as strong evidence of genetic causality given the 0.6 threshold was not exceeded. There was no strong evidence of genetic causality between any of the remaining LDSC-prioritised hormone or metabolite traits and FEV1, FVC, or FEV1/FVC.
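The decision rule described in this paragraph can be written out explicitly. The sketch below applies the |GCP| > 0.6 threshold to the posterior mean estimates reported in the text; the dictionary layout and function name are illustrative. The rule alone does not capture the caveat about the low heritability z score for urate, which still needs to be judged separately.

```python
GCP_THRESHOLD = 0.6  # recommended cut-off for partial genetic causality cited in the text

# (biochemical trait, lung function trait) -> posterior mean GCP reported in the text
gcp_estimates = {
    ("fasting glucose", "FVC"): 0.77,
    ("fasting glucose (BMI adjusted)", "FVC"): 0.63,
    ("fasting glucose", "FEV1"): 0.57,
    ("urate", "FVC"): 0.73,
    ("HDL cholesterol", "FEV1/FVC"): 0.59,
}

def exceeds_threshold(gcp, threshold=GCP_THRESHOLD):
    """True when the absolute posterior mean GCP is above the threshold."""
    return abs(gcp) > threshold

flagged = [pair for pair, gcp in gcp_estimates.items() if exceeds_threshold(gcp)]
for trait, outcome in flagged:
    print(f"{trait} -> {outcome}")
```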

As it was the most significant LCV model, the causal effect of fasting glucose on FEV1 and FVC was further investigated utilising a Mendelian randomisation (MR) approach. MR differs from an LCV model as it exploits genome-wide significant variants as genetic instrumental variables (IVs) to calculate a causal estimate of an exposure (fasting glucose) on an outcome (lung function). Given genetic correlation may bias MR due to pleiotropy, we implement MR here as a validation of the LCV results as it uses a different set of statistical parameters and assumptions; however, the estimates derived from MR should be viewed cautiously in light of the genetic correlation which exists between fasting glucose and lung function (O'Connor and Price, 2018). We selected 32 genome-wide significant variants associated with glucose in approximate linkage equilibrium as IVs (p<5×10−8, r2 < 0.001) to ensure that variants were both rigorously associated with the exposure and independent from one another. A 1 mmol/L increase in fasting glucose was associated with a −0.088 (95% confidence interval [CI]: −0.17, −0.01) standard deviation decline in FVC using an inverse variance weighted (IVW) estimator with multiplicative random effects. Similarly, elevated fasting glucose was also shown to have a negative effect on FEV1: βIVW = −0.096 (95% CI: –0.18, –0.01). The IVW estimate for fasting glucose was only nominally significant for both FVC and FEV1 (p=0.033 and 0.023, respectively), with relatively wide confidence intervals that approach zero, and thus, the estimate should be treated with appropriate caution. We implemented a number of sensitivity analyses to test the rigour of our causal estimate of the effect of fasting glucose on lung function (Figure 2, figure supplement 1, Supplementary file 1e–g). Firstly, we obtained an analogous, and statistically significant, causal estimate using the weighted median method (FVC: βWeighted median = −0.09 [95% CI: –0.16, –0.04], p=1.87×10−3, FEV1: βWeighted median = −0.07 [95% CI: –0.13, –0.01], p=0.025). The weighted median method relaxes the assumption that all IVs must be valid, as described elsewhere (Bowden et al., 2016). An MR–Egger model was then constructed, which includes a non-zero intercept term that can be used as a measure of unbalanced pleiotropy (Bowden et al., 2015). The causal estimate using MR–Egger was in the same direction for FEV1 and FVC; however, it was non-significant (FVC: βMR Egger = –0.13 [95% CI: −0.30, 0.04], p=0.148, FEV1: βMR Egger = –0.12 [95% CI: −0.30, 0.06], p=0.21). Importantly, the MR–Egger intercept was not significantly different from zero in the FEV1 or FVC model, indicating no evidence of unbalanced pleiotropy. This was supported by a non-significant global test of pleiotropy implemented as part of the MR-Pleiotropy Residual Sum and Outlier (MR PRESSO) framework (Supplementary file 1f; Verbanck et al., 2018). Furthermore, we evaluated whether there was any evidence of reverse causation, that is, FEV1 and FVC exerting a causal effect on fasting glucose, using the MR Steiger directionality test, with our observed direction of causation from glucose to lung function supported.
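To make the IVW estimator concrete, the following is a minimal sketch of an inverse variance weighted estimate with multiplicative random effects computed from per-variant summary statistics. It is not the study's code: the input values are hypothetical, and the choice to inflate but never shrink the standard error under heterogeneity is one common convention that is assumed here.

```python
import numpy as np

def ivw_mre(beta_exposure, beta_outcome, se_outcome):
    """Inverse variance weighted MR estimate with multiplicative random effects."""
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    w = 1.0 / se**2                        # first-order inverse variance weights
    estimate = np.sum(w * bx * by) / np.sum(w * bx**2)
    # Cochran's Q measures heterogeneity between the per-variant estimates.
    q = np.sum(w * (by - estimate * bx) ** 2)
    # Assumption: over-dispersion inflates the SE; under-dispersion does not shrink it.
    phi = max(1.0, float(np.sqrt(q / (len(bx) - 1))))
    se_estimate = phi / np.sqrt(np.sum(w * bx**2))
    return estimate, se_estimate

# Hypothetical summary statistics for four instruments (not the study's data).
bx = np.array([0.10, 0.20, 0.30, 0.40])    # variant effect on the exposure
by = -0.1 * bx                             # variant effect on the outcome
se = np.full(4, 0.01)                      # standard error of the outcome effect

beta, beta_se = ivw_mre(bx, by, se)
ci = (beta - 1.96 * beta_se, beta + 1.96 * beta_se)
```

In this toy input every variant implies the same ratio, so the heterogeneity term is zero and the standard error is not inflated.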

Finally, we successively recalculated the IVW causal estimate for the effect of fasting glucose on FEV1 and FVC by removing one IV at a time in a ‘leave-one-out’ analysis (Supplementary file 1g; Burgess et al., 2017). An analogous causal estimate was derived regardless of which IV was removed; however, there were five IVs (FEV1 model = two outlier single nucleotide polymorphisms [SNPs], FVC model = four outlier SNPs [two outlier SNPs shared]) for which the estimate was marginally non-significant after exclusion (maximum p=0.11, IVW with multiplicative random effects). We then used a phenome-wide association approach to demonstrate that these five SNPs were (i) annotated to genes with important roles in glycaemic homeostasis and (ii) almost exclusively associated with glycaemic traits or diabetes (Supplementary file 1g–l). As a result, we concluded that these IVs did not likely represent horizontal pleiotropy, which would bias the causal estimate, but instead were biologically salient IVs with large effects (Supplementary file 1g).
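A leave-one-out analysis of this kind can be sketched as follows. The helper recomputes a fixed-weight IVW point estimate after dropping each instrument in turn; the summary statistics are hypothetical and the function names are illustrative, so this shows the procedure rather than reproducing the reported results.

```python
import numpy as np

def ivw_estimate(bx, by, se):
    """IVW point estimate from per-variant exposure and outcome effects."""
    w = 1.0 / se**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)

def leave_one_out(bx, by, se):
    """Recompute the IVW estimate with each instrument removed in turn."""
    bx, by, se = (np.asarray(a, dtype=float) for a in (bx, by, se))
    masks = ~np.eye(len(bx), dtype=bool)   # row i keeps every instrument except i
    return np.array([ivw_estimate(bx[m], by[m], se[m]) for m in masks])

# Hypothetical instruments; the last one has an unusually large outcome effect.
bx = np.array([0.10, 0.20, 0.30, 0.40])
by = np.array([-0.01, -0.02, -0.03, -0.20])
se = np.full(4, 0.01)

loo = leave_one_out(bx, by, se)
for dropped, estimate in enumerate(loo):
    print(f"without instrument {dropped}: {estimate:.3f}")
```

Dropping the last instrument moves the estimate the most, which is the pattern a leave-one-out analysis is designed to expose.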

Whilst smoking status (ever vs. never smoked) was a covariate in the lung function GWAS, we sought to assess whether the relationship between blood glucose and lung function could be driven by residual effects of smoking. There was a significant genetic correlation between the number of cigarettes smoked per day and fasting glucose (rg = 0.16, SE = 0.043), although this was not observed with the ‘ever vs. never smoked’ phenotype (rg = 0.007, SE = 0.039). However, an LCV model constructed for fasting glucose and cigarettes smoked per day did not indicate evidence of genetic causality, in contrast to the glucose/lung function models: GCP^ = −0.47, SE = 0.33, P(H0: GCP = 0) = 0.25. The MR IVs for glucose were further checked for association with either ‘ever vs. never smoked’ or ‘cigarettes per day’, with none of the IVs demonstrating any association with either smoking phenotype at a genome-wide (p<5×10−8) or suggestive (p<1×10−5) significance threshold (Supplementary file 1m, n). Moreover, we investigated the possibility that our results may be impacted by collider bias given the lung function GWAS we utilised was phenotypically covaried for smoking status. We leveraged a smaller UK Biobank GWAS of FEV1 and FVC from the Neale Lab that did not adjust for smoking (N = 272,338). The posterior mean GCP and IVW estimates were in the same direction and relatively analogous for both spirometry measures to those observed using the larger GWAS covaried for smoking status, with no apparent evidence that the negative relationship between glucose and lung function was influenced by smoking as a collider variable. In summary, these data suggested that there is an effect of fasting glucose on lung function beyond what is directly attributable to a residual impact of smoking.
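The contrast between the two smoking phenotypes can be illustrated by converting each genetic correlation and its standard error into a z score and a two-sided p-value. This sketch assumes a normal approximation for rg/SE; because the inputs are the rounded values quoted above, the output is approximate.

```python
import math

def rg_z_and_p(rg, se):
    """z score and two-sided normal p-value for a genetic correlation estimate."""
    z = rg / se
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p

# Estimates reported in the text for fasting glucose versus each smoking phenotype
z_cpd, p_cpd = rg_z_and_p(0.16, 0.043)     # cigarettes smoked per day
z_ever, p_ever = rg_z_and_p(0.007, 0.039)  # ever vs. never smoked

print(f"cigarettes per day: z = {z_cpd:.2f}, p = {p_cpd:.1e}")
print(f"ever vs. never:     z = {z_ever:.2f}, p = {p_ever:.2f}")
```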

Implementation of the pharmagenic enrichment score for genetically informed drug repurposing in respiratory distress

We aimed to further expand drug-repurposing opportunities for lung function using the PES approach (Reay et al., 2020). Briefly, PES aims to implement genetically informed drug repurposing with PGS calculated using genetic variants specifically within druggable pathways (Figure 3a). In the context of this study, individuals with a depleted PES for lung function (lower genetically predicted lung function) mapped to pathways with known drug targets may specifically benefit from drugs which modulate these pathways. Firstly, we performed gene-set association of FEV1 and FVC using a collection of high-quality gene-sets from the molecular signatures database (MSigDB). These sets contain at least one gene which is modulated by an approved pharmacological agent (NSets = 1030). The FEV1/FVC phenotype is less directly interpretable in this context, given that it is used primarily as a diagnostic tool rather than as a quantitative measure, and thus, we focused on repurposing candidates for FEV1 and FVC individually. Variants were annotated to genes using genomic proximity, with both conservative and liberal upstream and downstream boundary definitions.
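The core idea of a pathway-restricted score can be sketched in a few lines: sum effect-weighted allele counts over only those variants that map to the druggable gene-set and pass the p-value threshold. This is a simplified illustration with hypothetical inputs; it assumes the variants are already independent and omits the LD clumping, boundary definitions, and standardisation that a full pipeline would include.

```python
import numpy as np

def pathway_score(dosages, effect_sizes, p_values, in_pathway, p_threshold):
    """Effect-weighted allele count over pathway variants below a p-value threshold."""
    dosages = np.asarray(dosages, dtype=float)          # individuals x variants, 0 to 2
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    keep = np.asarray(in_pathway, dtype=bool) & (np.asarray(p_values) < p_threshold)
    return dosages[:, keep] @ effect_sizes[keep]

# Hypothetical data: three individuals, four variants.
dosages = [[0, 1, 2, 1],
           [2, 2, 0, 0],
           [1, 0, 1, 2]]
effect_sizes = [0.10, -0.20, 0.05, 0.30]    # GWAS effect on the spirometry measure
p_values = [0.001, 0.20, 0.04, 0.60]        # GWAS association p-values
in_pathway = [True, True, False, True]      # variant annotated to a gene in the set

scores = pathway_score(dosages, effect_sizes, p_values, in_pathway, p_threshold=0.5)
```

Here the third variant is excluded because it lies outside the gene-set and the fourth because it fails the threshold, so only the first two contribute.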

Figure 3 (with 2 supplements)
The pharmagenic enrichment score (PES) framework to identify and implement drug-repurposing candidates for lung function.

(a) Overview of the PES approach, whereby polygenic scores of lung function measures are constructed using variants specifically within druggable pathways. Individuals with a depleted PES, that is, lower genetically predicted spirometry measures using variants in the gene-set, may benefit from a drug which modulates the pathway in question. (b) The number of U.S. Food and Drug Administration-approved drugs with overrepresented targets in at least one candidate PES gene-set per Anatomical Therapeutic Chemical (ATC) level 1 code. Each ATC level 1 code is shaded a different colour with its frequency on the x-axis. (c) The phenotypic association between a polygenic score (PGS) of forced vital capacity (FVC) and an FVC PES which was nominally significant (p<0.05) but did not survive multiple testing correction after adjustment for genome-wide PGS. The relationship between the PES/PGS and normalised residual FVC in an independent cohort is plotted, with 95% confidence intervals of the regression trendline indicated by shading. (d) Significant correlations between the expression of genes in a candidate PES and three lung function PES (FVC): class B/2 secretin family receptors, circadian clock, and pathways in cancer. The relationship between PES and gene expression is presented as a volcano plot, where the x-axis is the t value (coefficient divided by standard error) and the y-axis is the –log10 p-value, with higher points more significant. Genes which are associated after multiple testing correction for the number of genes in the pathway are coloured blue (strict FDR < 0.05) or red (lenient FDR < 0.1). The dotted line denotes an uncorrected nominally significant association (p<0.05).

Gene-set association using the FEV1 and FVC GWAS was undertaken at each p-value threshold (PT) with both conservative and liberal genic boundaries. If a gene-set was significant at multiple PT, the most significantly associated PT was retained. The conservative genic boundaries only yielded one druggable gene-set enriched with FEV1-associated variants after multiple testing correction (q < 0.05): signalling events mediated by the Hedgehog family (β = 0.973, SE = 0.2, p=9.3×10−7, PT < 0.5, NGenes = 22; Supplementary file 2a). There were no gene-sets with known drug targets using conservative genic boundaries which survived multiple testing correction (false discovery rate [FDR] < 0.05) for association with FVC. Extending the genic boundaries to capture more regulatory variation (liberal boundaries) uncovered more druggable gene-sets (Supplementary file 2b). Specifically, there were seven and nine unique gene-sets which survived correction for FEV1 and FVC, respectively (q < 0.05, Table 2).
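The q < 0.05 criterion refers to a false discovery rate adjustment. The text does not state which procedure was used, so the sketch below assumes the widely used Benjamini-Hochberg step-up adjustment and applies it to hypothetical p-values.

```python
import numpy as np

def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

# Hypothetical gene-set association p-values (not the study's results).
p_values = [0.01, 0.04, 0.03, 0.005]
q_values = bh_q_values(p_values)
significant = q_values < 0.05
```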

Table 2
Gene-sets with known drug targets enriched with lung function-associated common variation after the application of multiple testing correction (FDR < 0.05).
Phenotype | Gene-set | Lowest p* | Genic boundaries
FVC | Hedgehog signalling pathway (KEGG) | 6.66 × 10−9 | Liberal
FVC | BMP receptor signalling | 4.08 × 10−7 | Liberal
FEV1 | Signalling events mediated by the Hedgehog family | 9.30 × 10−7 | Conservative
FEV1 | Hedgehog signalling pathway (KEGG) | 3.45 × 10−6 | Liberal
FVC | ALK in cardiac myocytes | 4.57 × 10−6 | Liberal
FVC | Pathways in cancer | 5.43 × 10−6 | Liberal
FEV1 | Basal cell carcinoma | 8.86 × 10−6 | Liberal
FVC | TGF-β signalling pathway | 1.21 × 10−5 | Liberal
FVC | Circadian clock | 3.00 × 10−5 | Liberal
FVC | Class B/2 (secretin family receptors) | 8.08 × 10−5 | Liberal
FEV1 | TGF-β signalling pathway | 8.15 × 10−5 | Liberal
FEV1 | Extension of telomeres | 8.59 × 10−5 | Liberal
FEV1 | Pathways in cancer | 8.94 × 10−5 | Liberal
FEV1 | Dilated cardiomyopathy | 9.54 × 10−5 | Liberal
FVC | ECM/ECM-associated proteins | 2.28 × 10−4 | Liberal

*The lowest p is the most significant gene-set association p-value across all the p-value thresholds (PT) and genic boundary configurations tested.
ECM: extracellular matrix; FVC: forced vital capacity; FEV1: forced expiratory volume in 1 s; TGF: transforming growth factor; ALK: activin receptor-like kinase; BMP: bone morphogenetic protein.

It should be noted that there were two pathways related to Hedgehog signalling; however, as these were from different annotation sources and had a different number of genes, we considered them separately. A number of biological processes were encompassed by these prioritised gene-sets, such as cancer (pathways in cancer, basal cell carcinoma), transforming growth factor (TGF)-β superfamily signalling (TGF-β signalling pathway, bone morphogenetic protein [BMP] receptor signalling, activin receptor-like kinase [ALK] in cardiac myocytes), and cardiac function (dilated cardiomyopathy).

For each candidate PES gene-set, we performed computational drug selection to identify approved compounds predicted to modulate the enriched pathway. Firstly, we investigated U.S. Food and Drug Administration (FDA)-approved pharmacological agents with a statistically significant overrepresentation of target genes in each of these sets (NOverlap ≥3, q < 0.05). Drugs which target (i) multiple gene-set members and (ii) more genes than expected by chance were assumed to be particularly relevant for a biological pathway. There were six such gene-sets from the PES candidates which survived multiple testing correction enriched with the targets of an FDA-approved compound (pathways in cancer, dilated cardiomyopathy, class B/2 [secretin family receptors], circadian clock, extension of telomeres, and extracellular matrix [ECM]/ECM-associated proteins, Supplementary file 2c). Notable drugs included the anti-mineralocorticoid spironolactone, antihyperglycaemic compounds (rosiglitazone, pramlintide), antihypertensives (e.g. verapamil and felodipine), antineoplastic agents (e.g. bexarotene and sunitinib), and nutraceuticals (zinc, vitamin E, and doconexent). Each compound was annotated with its Anatomical Therapeutic Chemical (ATC) classification; the most common first-level ATC code amongst these compounds was antineoplastic and immunomodulating agents (L, N = 16), followed by cardiovascular system (C, N = 15), and alimentary tract and metabolism (A, N = 12; Figure 3b). Each of these compounds was subjected to expert curation by a pharmacist in relation to side effects and prior literature evidence as detailed in Supplementary file 2d (Figure 3, figure supplement 1). Single drug–gene matching was undertaken for the remaining PES candidate gene-sets lacking an approved compound with statistically overrepresented targets, retaining drug–gene interactions with at least two lines of evidence from the Drug–Gene Interaction Database (DGIdb) (Supplementary file 2e–p).
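Overrepresentation of a drug's targets within a gene-set is often assessed with a hypergeometric tail probability. The text does not name the test used, so the following is a sketch under that assumption, with hypothetical counts and an illustrative function name.

```python
from math import comb

def overrepresentation_p(n_universe, n_targets, n_set, n_overlap):
    """P(overlap >= n_overlap) when n_set genes are drawn from n_universe genes,
    of which n_targets are targets of the drug (hypergeometric upper tail)."""
    upper = min(n_targets, n_set)
    tail = sum(
        comb(n_targets, k) * comb(n_universe - n_targets, n_set - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_universe, n_set)

# Hypothetical counts: 20,000 genes, a drug with 10 targets, a 30-gene set, 3 shared genes.
p_example = overrepresentation_p(20_000, 10, 30, 3)

# Small case that can be checked by hand: 7 favourable draws out of 210.
p_small = overrepresentation_p(10, 3, 4, 3)
```

The resulting p-values would then be corrected across all drug and gene-set pairs before applying the q < 0.05 and minimum overlap criteria.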

In order to test the phenotypic relevance of FEV1 and FVC PES profiles, we utilised an independent genotyped cohort from the Hunter Community Study (HCS, N = 1804). Firstly, we constructed a genome-wide PGS for FEV1 and FVC at six different p-value thresholds (Supplementary file 3a). The optimum FEV1 genetic score explained approximately 6.4% of the variance in FEV1 measured in the HCS cohort, whilst the FVC PGS explained approximately 5.7% of variance in FVC. Each of the seven PES profiles was tested for association with FEV1 and/or FVC both with and without adjustment for genome-wide PGS. Four of the PES considered had at least a nominally significant association with their respective spirometry measure (p<0.05, Table 3, Supplementary file 3b), whilst three survived correction for the number of tests (p<7.14×10−3). The variance explained by the significant PES was between 0.4 and 0.7%, with the number of independent SNPs in these scores ranging from 76 to 16,390.
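The variance explained by a score is the gain in R2 when the score is added to a null model containing only covariates. The sketch below shows that calculation with ordinary least squares on simulated data; the covariate, effect sizes, and sample are hypothetical, and the Bonferroni threshold for seven tests is reproduced at the end.

```python
import numpy as np

def r_squared(y, predictors):
    """R2 of an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def incremental_r2(y, covariates, score):
    """Full-model R2 minus null-model R2, where the full model adds the score."""
    return r_squared(y, covariates + [score]) - r_squared(y, covariates)

# Simulated cohort (hypothetical): one covariate and one weakly predictive score.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(size=n)
score = rng.normal(size=n)
lung_function = 0.5 * age + 0.08 * score + rng.normal(size=n)

delta_r2 = incremental_r2(lung_function, [age], score)

# Bonferroni threshold for the seven PES tested
bonferroni_threshold = 0.05 / 7
```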

Table 3
The association between lung function PES and spirometry measures in the Hunter Community Study cohort.
Phenotype | PES | Z value | p | PES R2 | NSNP
FEV1 | Dilated cardiomyopathy | 0.15 | 0.889 | 1.3 × 10−5 | 2404
FEV1 | Extension of telomeres | −0.18 | 0.861 | 1.7 × 10−5 | 44
FEV1 | Pathways in cancer | 2.98 | 0.003 | 0.005 | 6214
FVC | Circadian clock | 2.14 | 0.033 | 0.003 | 230
FVC | Class B/2 secretin family receptors | 3.14 | 0.002 | 0.005 | 76
FVC | Extracellular matrix proteins | 3.50 | 5 × 10−4 | 0.007 | 16,390
FVC | Pathways in cancer | 2.64 | 0.008 | 0.004 | 6212

PES: pharmagenic enrichment score; FVC: forced vital capacity; FEV1: forced expiratory volume in 1 s.
The Z value is the PES model coefficient divided by its standard error. The variance explained (R2) was the null model R2 subtracted from the R2 of the full model with the PES as a predictor. The number of independent SNPs used to calculate the PES in this cohort is reported in the NSNP column. The reported results are from models unadjusted for genome-wide PGS.

We then constructed a model which was adjusted for genome-wide PGS at the same PT as the PES and found that only the class B/2 secretin family receptor FVC PES remained nominally significant (β = 0.047, SE = 0.022, p=0.038, Figure 3c, Supplementary file 3c), although we acknowledge this association does not survive correction for the seven tests performed. This PES did not display any association with smoking status in this cohort (β = −0.014, SE = 0.047, p=0.758), whilst the signal remained nominally significant upon removing HCS participants with self-reported respiratory illness (N = 1433, β = 0.052, SE = 0.025, p=0.042). Furthermore, there was a significant depletion of FVC within the 10th percentile (low genetically predicted FVC) of the class B/2 secretin receptor family FVC PES in the HCS cohort, with the odds of being in the lowest decile decreasing by around 20% per standard deviation increase in FVC (OR = 0.80 [95% CI: 0.68, 0.93], p=4.7×10−3). All of the PES tested demonstrated small albeit significant correlations with genome-wide PGS at the same PT in the HCS cohort, with the exception of the extracellular matrix PES for which the correlation was relatively large (r = 0.33, Figure 3, figure supplement 2). The higher correlation in this gene-set was probably due to the large number of genes involved (>1000). Interestingly, there were still a number of individuals with high genetically predicted lung function using a genome-wide PGS (90th percentile of HCS cohort) but low genetically predicted lung function using one of the PES (10th percentile). Specifically, 12.17% and 12.05% of the HCS participants in the 90th percentile PGS for FVC and FEV1, respectively, had a depleted PES (10th percentile, low predicted lung function by PES). Taken together, this suggests that pathway-based PGS provide distinct biological insights for some individuals with otherwise high genetic load of lung function increasing alleles, although the association between the class B/2 secretin family receptor PES and FVC after covariation for PGS did not survive multiple testing correction, and thus, these data require replication.

The correlation between the expression of genes within each pathway encompassed by the PES and the PES profiles themselves could provide further support for their biological impact. We investigated the association between lung function PES and gene expression using RNA sequencing (RNAseq) on transformed lymphoblastoid cell lines (LCL) from 357 European individuals for whom phase 3 whole-genome sequencing data were available from the 1000 genomes project (Figure 3d, Supplementary file 3d–j; Lappalainen et al., 2013). We identified a significant association between the FVC PES class B/2 [secretin family receptors] and the expression of WNT3 using a strict FDR threshold q < 0.05 (t = −3.53, p=4.71×10−4, q = 0.028); a more lenient FDR cut-off (q < 0.1) yielded two more significant PES–gene expression correlations: FVC circadian clock PES and PPARA: t = −3.23, p=1.37×10−3, q = 0.07; FVC pathways in cancer and HSP90AB1: t = 3.72, p=2.33×10−4, q = 0.066. Expression of WNT3 and PPARA was not associated with genome-wide PGS at the same p-value threshold (p=0.63 and 0.29), whilst the PGS exhibited a weaker, nominal relationship with HSP90AB1 (p=0.04). The remaining four PES tested (FEV1 or FVC) all demonstrated at least one nominal, uncorrected association (p<0.05). The observed effects of PES on gene expression at the population level were subtle; this is not surprising as each PES profile will encompass heterogeneous variants for each individual, and thus, impacts on gene expression may be greater within specific genomic contexts.

Transcriptome-wide association identifies putative targets for pharmacological modulation of lung function

We performed a TWAS of the three lung function measures using SNP weights from lung and blood tissue. TWAS leverages models of genetically regulated expression to test for a correlation between predicted expression and a phenotype (Gusev et al., 2016). Models of imputed expression derived from cis-eQTLs are generated from genes for which expression displays significant cis-heritability, that is, a significant genetic contribution to expression variance. We aimed to identify genes for which increased or decreased expression was associated with increased lung function and had approved compounds available which could improve lung function based on their mechanism of action (Figure 4a). For instance, if increased expression of a gene was associated with improved lung function, then an agonist of that gene may be clinically useful or vice versa in the case of decreased expression. Using a Bonferroni threshold for the number of genes tested in lung and blood individually, we identified a number of transcriptome-wide significant genes as follows – FEV1: NGenes [Lung]=232, NGenes [Whole blood]=201; FVC: NGenes [Lung]=222, NGenes [Whole blood]=167 (Supplementary file 3k–n, Figure 4b). The number of significant genes remained very similar using a more conservative threshold for Bonferroni correction that accounted for all genes in both tissues (p<3.6×10−8), which is conservative due to correlation between imputed models (Supplementary file 3k–n). Transcriptome-wide associated genes were only retained if they were not also associated with a smoking phenotype to minimise residual smoking-related confounding. Specifically, we tested whether predicted expression of the genes which survived correction in the FEV1 or FVC TWAS was associated with smoking behaviour (‘ever vs. 
never smoked’ and ‘cigarettes per day’) in a TWAS using SNP weights from lung, blood, and two brain regions implicated in nicotine addiction (dorsolateral prefrontal cortex and nucleus accumbens; Supplementary file 3m–v; Goldstein and Volkow, 2011; Scofield et al., 2016). We searched each of these significant genes in the DGIdb v3.0.2 to ascertain compounds which may improve lung function based on the direction of effect from the TWAS analyses. In accordance with the PES analyses, FEV1/FVC was not directly considered and we focused on FEV1 and/or FVC-associated genes which could be pharmacologically modulated (Supplementary file 3w).
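The direction-of-effect logic described above can be illustrated with a minimal sketch. It assumes a simplified two-way classification of drug action ("activator" or "inhibitor"); the function names and the placeholder compound are hypothetical, and this is not the DGIdb interface. The TWAS Z values and the two named drug–gene pairs are those reported in this section.

```python
from typing import List, Tuple


def desired_action(twas_z: float) -> str:
    """Drug action predicted to raise lung function given the sign of the TWAS Z."""
    if twas_z > 0:
        # Higher predicted expression tracks with higher lung function.
        return "activator"
    if twas_z < 0:
        # Higher predicted expression tracks with lower lung function.
        return "inhibitor"
    return "none"


def matching_drugs(twas_z: float, interactions: List[Tuple[str, str]]) -> List[str]:
    """Keep only compounds whose action matches the direction implied by the TWAS."""
    wanted = desired_action(twas_z)
    return [drug for drug, action in interactions if action == wanted]


# KCNJ1 (Z = -4.60) with glimepiride, described in the text as an inhibitor.
kcnj1 = matching_drugs(-4.60, [("glimepiride", "inhibitor")])

# PYGB (Z = -6.98) with its putative inhibitor plus a hypothetical activator
# that is included only to show a direction mismatch being filtered out.
pygb = matching_drugs(
    -6.98, [("sivelestat", "inhibitor"), ("hypothetical_activator", "activator")]
)
```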

The application of transcriptome-wide association to identify drug-repurposing candidates for lung function.

(a) Schematic outlining the use of transcriptome-wide association study (TWAS) to reveal clinically actionable drug–gene interactions. Druggable genes with lung function-associated imputed expression can be finemapped to prioritise a credible set of causal genes at the TWAS locus, that is, genes with a high posterior inclusion probability (PIP). We seek to identify drugs with a mode of action which matches the TWAS Z value, that is, compounds which may increase lung function. (b, c) Miami plots of a TWAS of forced expiratory volume in 1 s (left) and forced vital capacity (right) using whole blood (b) and lung (c) SNP weights. TWAS Z > 0 denotes a gene for which increased predicted expression is associated with increased lung function and vice versa. The highlighted genes survived multiple testing correction for the number of genes tested. (d) Probabilistic finemapping of the PYGB TWAS locus. The points denoting each gene are sized and coloured by their PIP for causality, with higher PIP denoted by larger, darker points as represented on the scale. The correlation plot below each region represents the covariance of predicted expression between genes.

Four candidate genes were identified satisfying tier one criteria: PPARD, ADORA2B, KCNJ1, and AMT. For instance, decreased expression of potassium channel gene KCNJ1 was associated with FVC (ZTWAS = −4.60), and this channel can be inhibited by approved compounds such as the antidiabetic drug glimepiride. There were an additional seven genes with tier 2 investigational targets: PYGB, PIK3C2B, LINGO1, APH1A, OPRL1, MST1R, and ACVR2B. Probabilistic finemapping of these transcriptome-wide significant regions using a multi-tissue reference panel was then performed to prioritise whether these genes are likely causal at that locus. A credible set with 90% probability of containing the causal gene was computed for each locus utilising the marginal posterior inclusion probability (PIP) calculated from the observed TWAS statistics. We did not proceed with finemapping the PPARD locus due to its proximity to the defined boundaries of the major histocompatibility complex (MHC) region. Two FEV1-associated genes with tier 1 and/or tier 2 drug interactions, AMT and PYGB, were included in the credible set with a PIP >0.9 or nearing that threshold. Tetrahydrofolate is a co-factor for AMT (ZTWAS = 5.96, PIP = 0.893, whole blood SNP weights), which has been previously implicated as having a beneficial effect on lung function. PYGB (ZTWAS = −6.98, PIP = 0.999, lung SNP weights) encodes a protein involved in glycogenolysis and can be putatively inhibited by the new exploratory treatment for respiratory failure, sivelestat (Figure 4c). We acknowledge that the interaction between PYGB and sivelestat was derived from two public databases curated by DGIdb v.3.0.2 (Supplementary file 3w), and appropriate caution should be exercised in interpreting this relationship given that PYGB is not the primary target of sivelestat. 
Interestingly, there is evidence of a high-confidence biological interaction between PYGB and the gene that encodes the principal target of sivelestat via the STRING database (ELA2, neutrophil elastase).
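One common way to build a credible set from marginal PIPs is to rank genes by PIP and accumulate normalised PIP until the requested coverage is reached. The sketch below illustrates that idea under this assumption; it is not necessarily the exact procedure of the finemapping software used here, and the gene names and PIP values are hypothetical.

```python
from typing import Dict, List


def credible_set(pips: Dict[str, float], coverage: float = 0.9) -> List[str]:
    """Smallest PIP-ranked set of genes whose normalised PIPs sum to >= coverage."""
    total = sum(pips.values())
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for gene, pip in ranked:
        chosen.append(gene)
        cumulative += pip / total  # normalise so the PIPs at the locus sum to 1
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical loci, for illustration only.
single_gene_locus = credible_set({"GENE_A": 0.95, "GENE_B": 0.04, "GENE_C": 0.01})
shared_locus = credible_set({"GENE_A": 0.6, "GENE_B": 0.5, "GENE_C": 0.1})
```

In the first locus a single gene carries almost all of the posterior mass, so the set contains only that gene; in the second, two genes are needed to reach 90% coverage.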

In addition, we tested a more conservative Bernoulli prior for each causal indicator (p=1×10−5) but this only had a negligible effect on the PIP for either AMT (PIP = 0.87) or PYGB (PIP = 0.994). Whilst there is a plausible role for AMT in respiratory biology (aminomethyltransferase, involved in glycine cleavage), it should be noted that decreased predicted expression of AMT also trended towards the Bonferroni threshold for a significant association with smoking status (ZTWAS = −4.33, p=1.46×10−5), although this was weaker for the cigarettes per day phenotype (ZTWAS = −2.97, p=2.94×10−3). As a result, the association of this region with FEV1 should be treated cautiously until its biological relevance can be clarified to ensure that this signal is not driven by a residual effect of smoking.

Host–viral interactomes suggested proposed pulmonary drug-repurposing candidates may be significant for respiratory virus infection

Respiratory viruses are an important contributor to acute, and potentially fatal, declines in lung function. We sought to investigate whether our proposed drug-repurposing candidates for lung function may also exhibit antiviral properties against these pathogens. The host–virus interactome was analysed for three respiratory viruses to perform computational drug repurposing – SARS-CoV2, H1N1, and the HAdV family (Supplementary file 4a–c; Gordon et al., 2020; Watanabe et al., 2014; Martinez-Martin et al., 2016). Specifically, human proteins which are predicted to interact with virally expressed proteins (‘prey proteins’) were investigated to identify those which could be inhibited by existing drugs to potentially disrupt the progression of infection. Approved inhibitors or antagonists of proteins in each respective host–virus interactome were sourced using DGIdb and compared to our candidate compounds for lung function from the PES approach. Furthermore, we investigated the reported drug-label side-effect frequencies of each of these overlapping pharmacological agents and retained only candidates with no commonly reported (>1% frequency) respiratory adverse effects. There were three inhibitors of human proteins with evidence of interaction with a viral protein that also targeted a gene which was a member of a PES candidate gene-set. Vorinostat (HDAC2 inhibitor) and aminocaproic acid (PLAT inhibitor) both inhibited a SARS-CoV2 ‘prey protein’ and targeted a gene within the pathways in cancer and extracellular matrix (ECM)/ECM-associated proteins PES pathways, respectively. Similarly, ruxolitinib inhibits the influenza prey protein JAK1, a part of the pathways in cancer gene-set. We caution that these pathways are quite broad in the biology that they encompass, and, as a result, the relevance of these drug–gene interactions to the pathways of interest warrants further study.

We demonstrated using multiple lines of evidence a putative relationship between increased fasting blood glucose and lung function; therefore, we investigated whether any of the host–viral interactome members were enriched within biological pathways involved in glycaemic homeostasis. Interestingly, there was an overrepresentation of SARS-CoV2 ‘prey proteins’ amongst four gene-sets related to glucose metabolism, along with insulin and glucagon signalling pathways (Table 4). Fourteen SARS-CoV2 ‘prey proteins’ were members of at least one of these gene-sets, with a greater number of interactions amongst these genes than expected by chance (p=4.42×10−12, Supplementary file 4d). We outline evidence for the potential role of these viral prey genes in glycaemic homeostasis in Supplementary file 4d. These data support emerging evidence that SARS-CoV2-infected patients with hyperglycaemia are at higher risk of morbidity and mortality (Kumar et al., 2020).

Table 4
Overrepresentation of proteins which interact with viral severe acute respiratory syndrome coronavirus 2-expressed proteins within glycaemic-related pathways.
Glycaemic gene-set | p-Value
Glucagon-like peptide-1 regulates insulin secretion | 7.02 × 10−4
Glucagon signalling in metabolic regulation | 2.33 × 10−4
Glucose metabolism | 2.69 × 10−5
Regulation of insulin secretion | 2.13 × 10−3

None of the glycaemic ‘prey proteins’ were direct targets of antidiabetic compounds; however, 57% of these proteins had a high-confidence protein–protein interaction with an antidiabetic target gene (Supplementary file 4e–f). For instance, GNB1 putatively binds a SARS-CoV2 non-structural protein (Nsp7) that forms part of the replicase/transcriptase complex, whilst this protein also demonstrated evidence of interacting with 15 proteins modulated by an antidiabetic compound, such as GLP1R, which is the primary target of GLP-1 analogues, including exenatide. Pharmacological interventions which seek to control blood glucose may have positive implications both in terms of improving baseline lung function and reducing the risk of adverse consequences after SARS-CoV2 exposure.

Discussion

We revealed candidate drug-repurposing opportunities to potentially improve pulmonary function and provide the means for aligning their application in individuals that carry a high relative burden of variants associated with their function. Through this process we identify glycaemic interventions in particular as being potentially beneficial in the context of respiratory infection. Our study suggests a causal relationship between blood glucose and lung function using a genome-wide (LCV) and IV (MR) approach, whilst downregulation of the glycogen phosphorylase PYGB was also associated with FEV1 after probabilistic finemapping of TWAS loci. These data support previous literature suggesting that declines in pulmonary function are overrepresented amongst individuals with diabetes and correlate with poor glycaemic control (Walter et al., 2003; Davis et al., 2004; Gutiérrez-Carrasquilla et al., 2019; van den Borst et al., 2010); a phenomenon which has also been reported in non-diabetics (McKeever et al., 2005; Barrett-Connor and Frette, 1996). There are a number of pathophysiological mechanisms postulated to underlie this relationship, including fibrosis mediated by hyperglycaemia-accelerated epithelial-to-mesenchymal transition (Talakatta et al., 2018) and aberrant inflammatory responses to dysglycaemia (Mohanty et al., 2000; Sun et al., 2014). Importantly, our data extend these previous observational studies to provide novel evidence for a causal relationship. Respiratory sequelae after infection may also be significantly affected by dysregulation of glycaemic control. Acute hyperglycaemia is associated with a significant increase in morbidity and mortality amongst non-diabetic community-acquired pneumonia (CAP) patients, which further supports its utility as a treatment target (Lepper et al., 2012; Jensen et al., 2017; Kornum et al., 2007; McAlister et al., 2005). 
Notably, even patients with mild hyperglycaemia (serum glucose 6–10.99 mmol/L) have a purported elevated risk of death at 90 days following CAP diagnosis (Lepper et al., 2012), whilst the association between type 2 diabetes and poor pneumonia outcomes appears to be driven by glycaemic control (McAlister et al., 2005). Inflammation is likely to be an important component of glycaemic-influenced adverse effects; for instance, the intracellular carbohydrate O-linked β-N-acetylglucosamine has been recently linked to influenza-associated cytokine storms (Wang et al., 2020). Future work should focus on the relevance of glycaemic biology to specific respiratory illnesses like asthma and COPD. Our findings supported the relevance of glycaemia to respiratory infection through demonstrating that proteins which putatively interact with the SARS-CoV2 virus were overrepresented in glycaemic pathways. Whilst the viral prey proteins we identified as members of glycaemic pathways were not the direct targets of antihyperglycaemic agents, some interact with these compounds, although the biological saliency of these interactions warrants future investigation. The presence of a viral–prey protein interaction also does not necessarily support its essentiality in the viral life cycle, and further data are needed to support this. Furthermore, the viral prey proteins overrepresented in the glycaemic pathways were mostly genes such as nucleoporins and cAMP-dependent protein kinases which have pleiotropic regulatory roles spanning a number of biological systems. It would also be interesting to further explore the relationship between the genetic architecture of fasting glucose and expression of these SARS-CoV2 prey proteins. These data taken together support the utility of managing blood glucose in the clinical improvement of respiratory outcomes.

Targeted drug application and repurposing is by its very nature confounded by biological heterogeneity amongst individuals. This is likely particularly true in the case of complex traits as their polygenic genetic architecture provides the substrate for each individual to display a unique profile of trait-associated variation. In the second stream of this study, we stratified the polygenic architecture of lung function into a series of druggable pathways to provide a framework for pathway-specific genetic scores we designate the PES. We suggest that leveraging inter-individual genetic heterogeneity in this way will improve the precision application of novel drug repurposing. A number of interesting drug-repositioning candidates had overrepresented targets amongst the candidate PES gene-sets. For example, magnesium sulfate had enriched targets in the dilated cardiomyopathy PES and has previously shown promise as a repurposing candidate to improve pulmonary function in asthma (Okayama, 1987; Hossein et al., 2016). Using an independent cohort, several PES profiles tested explained a small, but significant, percentage of variance in FEV1 and/or FVC. The class B/2 secretin family receptors score for FVC was noteworthy given that it remained nominally significant after an adjustment for genome-wide PGS. However, this did not survive multiple testing correction, and thus, further replication is needed to confirm this signal. Interestingly, this gene-set features a number of proteins involved with glycaemic homeostasis, including antidiabetic drug targets glucagon-like peptide receptor 1 (GLP1R) and amylin receptors (RAMP1, RAMP2, and RAMP3). While all of the PES demonstrated significant correlation with genome-wide PGS, in the majority of cases it was small (r < 0.2), suggesting that most of these functionally relevant foci of genomic risk in lung function GWASs were relatively independent of the total PGS. 
Importantly, we still identified individuals with high genetically predicted lung function using a genome-wide PGS but observed low predicted lung function with a pathway-specific PES. This was supported by the observed correlation between the PES and related mRNA expression which was distinct from a genome-wide PGS. Collectively, these data are consistent with the hypothesis that important treatment-related biology could be captured at a pathway level for individuals with or at risk of respiratory illness. The specific data from this study require future replication and validation in independent cohorts in order to provide greater support to our observed relationships between PES and spirometry measures. In addition, further study is warranted to dissect the signals encompassed by pathway-specific PGS, particularly in light of what would be observed amongst other pathways without drug targets or in gene-sets associated with other related traits.

Taken together, our approach provides a template for genetically informed precision drug repositioning to improve lung function. The clinical implementation in its most basic form would involve common variant genotyping using a commercial SNP array followed by imputation and lung function PES-based stratification of treatment options. This would be combined with other biochemical exposure measures, such as fasting glucose, that are causal risk factors and have approved treatments. To illustrate the clinical implementation of our strategy, we generated a schematic representation of individual heterogeneity in biochemical and genetic components of risk in lung function and related them to candidates for precision drug repositioning (Figure 5). We envisage that our approach to variant and exposure risk stratification can be applied more broadly to identify and implement precision drug repositioning in a range of complex traits.

Schematic representation of drug repositioning and precision implementation in lung function deficits directed by causal enrichment of environmental and genetic risk factors.

Each row represents a simulated individual with a heterogeneous presentation of risk factors related to lung function. Case 1 (top row) represents an individual with good lung function (pink lung tissue) and genomic and environmental components consistent with healthy lung function (grey to red nodes). These have a neutral to positive influence on lung function represented by the grey and red edges (arrow), respectively. Case 2 has high fasting glucose and neutral (grey) loading of genetic variants (pharmagenic enrichment score [PES]) associated with lung function pathways. After treatment with antihyperglycaemic agents, or some other intervention to lower blood glucose, lung function is improved (red edge) sufficiently for therapeutic effect, represented by pink lungs. Case 3 has enrichment of genetic variants (PES) associated with poorer lung function in the class b2 secretin pathway. To improve lung function, they are treated with drugs, such as pramlintide (which targets RAMP1, RAMP2, and RAMP3) and exenatide (GLP1R agonist), which work by modulating genes in the class b2 secretin pathway to ameliorate the enrichment of poor lung function variants in that pathway. The broken edge between fasting glucose and the class b2 secretin pathway represents the probable connection or shared genes between these nodes as receptors in this pathway are involved in glycaemic regulation. Case 4 also presents with poor lung function (blue lung tissue) and enrichment of poor lung function-associated variants in the circadian clock pathway (blue node). This individual was then treated with compounds, such as doconexent, which act on the circadian clock pathway. This schematic is only representative of many thousands of treatment scenarios potentially informed by this treatment decision tool, which could be applied to any phenotype with large genome-wide association studies available.

Whilst there are some potential confounds in the use of GWAS data for causal inference via both LCV models and MR, such as measurement error, population stratification, and horizontal pleiotropy, we are confident that the relationship between glycaemia and lung function presented in this study is robust given the multiple lines of support. Replicated, well-powered randomised controlled trials, however, are needed to fully resolve the clinical benefit of repurposing antihyperglycaemic compounds to improve lung function and in the context of viral infection. We also acknowledge that the direction of suitable pharmacological intervention is not inherently clear, such that whether an agonist or antagonist of genes within a pathway implicated by the PES approach is appropriate is an important consideration (Reay et al., 2020). Careful curation of the proposed repurposing candidates will therefore be critical, particularly in the context of pulmonary traits where a variety of currently approved compounds have adverse respiratory effects. We suggest that TWAS could be utilised to help overcome these issues by identifying druggable genes which are members of candidate PES gene-sets for which a clinically beneficial impact on expression can be predicted. These candidate genes derived from TWAS could in future be explored further using well-powered cohorts with genetic and transcriptomic data that also recorded spirometry measures. Interestingly, we also saw some evidence of cross-talk between heritable risk at genes associated with lung function and fasting glucose, with the downregulation of the glycogen phosphorylase PYGB (associated with FEV1) observed through the probabilistic finemapping of TWAS loci.

This study demonstrated a variety of methods by which genomic data could be utilised to propose drug-repurposing candidates, ranging from approaches which exploit genome-wide variant effects to the identification of candidate clinically significant drug–gene interactions. Lung function is a particularly relevant phenotype to study in this context as its aetiology is influenced by a variety of complex biological factors, and it is a significant contributor to global morbidity and mortality. Genetics-informed approaches will likely be increasingly useful to target novel respiratory interventions and reposition existing compounds. In future, genetics-based methods could be integrated with other clinical information to further enhance precision drug repurposing, whilst further consideration could be given to experimental compounds to enhance the number of repurposing opportunities. Our data strongly supported the efficacy of antihyperglycaemic compounds as repurposing candidates which could act as the impetus for further clinical investigation via randomised controlled trials.

Materials and methods

Lung function GWAS

We obtained GWAS summary statistics for FEV1, FVC, and their ratio from a meta-analysis of the UK Biobank sample with the SpiroMeta consortium cohorts as outlined extensively elsewhere (N = 400,102) (Shrine et al., 2019). Phenotypes were adjusted for age, age², sex, height, smoking status (ever vs. never smoked), and genotyping array before the residuals were subjected to rank inverse-normal transformation. This GWAS was performed using European ancestry individuals.

Genetic correlation

Bivariate LDSC regression was performed between each lung function trait and a variety of GWAS as implemented by LDhub v1.9.3 (Zheng et al., 2017). Lung function summary statistics were cleaned (‘munged’) prior to LDSC using munge_sumstats.py and merged with common HapMap3 SNPs excluding the MHC region due to its LD complexity, as is usual practice (Bulik-Sullivan et al., 2015). We retained estimates of genetic correlation (rg) for GWAS (N = 172) with European ancestry and a heritability Z value >4, as calculated by LDhub. When a phenotype had multiple GWAS, the GWAS with largest sample size was retained. The Bonferroni method was utilised for multiple testing correction with the significance threshold set as p<2.9×10−4 (α = 0.05/172). A heatmap was constructed using the ComplexHeatmap package (Gu et al., 2016).
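The Bonferroni threshold quoted above follows directly from dividing α by the number of GWAS retained. A minimal sketch of that arithmetic is given below; the function and variable names are illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# alpha = 0.05 across the 172 GWAS retained for genetic correlation.
threshold = bonferroni_threshold(alpha=0.05, n_tests=172)


def survives_correction(p_value: float) -> bool:
    """True if a genetic correlation p-value passes the corrected threshold."""
    return p_value < threshold
```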

LCV models

LCV models were constructed between each measure of lung function which displayed a significant genetic correlation with a hormone or metabolite trait. The RunLCV.R and MomentFunctions.R scripts were leveraged to perform these analyses (https://github.com/lukejoconnor/LCV). The LCV framework assumes that a latent variable, L, mediates the genetic correlation between two traits (trait 1, trait 2) and uses the mixed fourth moments of the bivariate effect size distribution to estimate the mean posterior GCP as described in detail by O'Connor and Price, 2018. The GCP estimate quantifies the magnitude of genetic causality between the two traits. GCP values range from −1 to 1 (full genetic causality); within these limits, positive values indicate greater partial genetic causality of trait 1 on 2, and vice versa for negative values. All traits were munged prior to LCV analyses, with only HapMap3 SNPs (minor allele frequency [MAF] >0.05) outside the MHC region retained in accordance with the LDSC analyses. We utilised the baseline 1000 genomes phase 3 LD scores for HapMap3 SNPs (MHC excluded). A two-sided t-test was used to assess whether the estimated GCP was significantly different from zero.

Mendelian randomisation

We investigated the causal effect of fasting glucose on both FEV1 and FVC using two-sample MR. MR is underpinned by the use of genetic variants as IVs, with the random inheritance of these IVs as per Mendel’s laws facilitating the use of IVs to perform causal inference between an exposure and outcome, providing a series of assumptions are met (Burgess et al., 2017). We defined IVs as independent variants which are associated with fasting glucose using the traditional GWAS genome-wide significance threshold (p<5×10−8, r2 <0.001, palindromic SNPs removed). A different GWAS of fasting glucose was utilised for MR than for LDSC and LCV. Scott et al. performed a replication of ~66,000 Illumina CardioMetabochip variants following the Manning et al. GWAS for which more complete summary statistics were available, and thus, the latter was included in the LDhub catalogue instead of the former (Manning et al., 2012; Scott et al., 2012). We required only genome-wide significant SNPs for MR; therefore, the Scott et al. CardioMetabochip replication was more suitable as this was a larger sample size than the Manning et al. GWAS. Fasting glucose data for GWAS were obtained from either plasma or whole blood of non-diabetic individuals of European ancestry and corrected to plasma levels (N = 133,310, unit of effect = mmol/L) (Scott et al., 2012). Our primary MR model was an IVW effect model with multiplicative random effects (Burgess et al., 2013). Further, we implemented a weighted median model which takes the median of the ratio estimates (as opposed to the mean in the IVW model), such that upweighting was applied to ratio estimates with greater precision (Bowden et al., 2016). An MR–Egger model was then constructed; an adaptation of Egger regression wherein the outcome effect is regressed on the exposure effect with an intercept term added to represent the average pleiotropic effect (Bowden et al., 2015). 
In addition, we examined evidence of reverse causality by using the MR Steiger directionality test (Hemani et al., 2017).
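The three estimators described above can be sketched from summary statistics as follows. This is a simplified illustration rather than the TwoSampleMR implementation: it assumes first-order weights (1/SE² of the SNP–outcome effect), one common convention for the multiplicative random-effects standard error (the fixed-effect SE is scaled up when the residual dispersion exceeds 1), and no standard errors for the weighted median or MR–Egger. All effect sizes are hypothetical toy values.

```python
import math
from typing import List, Tuple


def ivw(bx: List[float], by: List[float], se_y: List[float]) -> Tuple[float, float]:
    """Inverse-variance weighted estimate; SE inflated for over-dispersion."""
    w = [1.0 / s ** 2 for s in se_y]
    den = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / den
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    scale = max(1.0, math.sqrt(q / (len(bx) - 1))) if len(bx) > 1 else 1.0
    return beta, math.sqrt(1.0 / den) * scale


def weighted_median(bx: List[float], by: List[float], se_y: List[float]) -> float:
    """Median of the ratio estimates, upweighting the more precise ratios."""
    order = sorted(range(len(bx)), key=lambda i: by[i] / bx[i])
    ratios = [by[i] / bx[i] for i in order]
    weights = [(bx[i] / se_y[i]) ** 2 for i in order]
    total, running, mid = sum(weights), 0.0, []
    for wi in weights:
        running += wi
        mid.append((running - 0.5 * wi) / total)
    below = max(i for i, m in enumerate(mid) if m < 0.5)
    frac = (0.5 - mid[below]) / (mid[below + 1] - mid[below])
    return ratios[below] + (ratios[below + 1] - ratios[below]) * frac


def mr_egger(bx: List[float], by: List[float], se_y: List[float]) -> Tuple[float, float]:
    """Weighted regression of outcome on exposure effects; returns (intercept, slope)."""
    # Orient each SNP so that its exposure effect is positive.
    y = [yi if xi >= 0 else -yi for xi, yi in zip(bx, by)]
    x = [abs(xi) for xi in bx]
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical SNP-exposure (bx) and SNP-outcome (by) effects; true effect is 0.5.
bx = [0.10, 0.20, 0.15, 0.30]
se_y = [0.02, 0.03, 0.025, 0.04]
by = [0.5 * x for x in bx]

ivw_beta, ivw_se = ivw(bx, by, se_y)
median_beta = weighted_median(bx, by, se_y)
egger_intercept, egger_slope = mr_egger(bx, by, se_y)
```

With no pleiotropy in the toy data all three estimators agree; adding a constant to every SNP–outcome effect shifts the MR–Egger intercept away from zero while leaving its slope unchanged.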

We also tested whether the Egger intercept is significantly different from zero as a measure of unbalanced pleiotropy. In addition, heterogeneity amongst the IV ratio estimates was quantified using Cochran’s Q statistic, given that horizontal pleiotropy may be one explanation for significant heterogeneity. A global pleiotropy test was also implemented via the MR PRESSO framework (Verbanck et al., 2018). Leave-one-out analyses were then performed to assess whether causal estimates are biased by a single IV, which may indicate the presence of outliers, and the sensitivity of the estimate to said outliers. However, outliers may not necessarily be evidence of horizontal pleiotropy. We performed a phenome-wide association study for each of these ‘outlier’ SNPs using summary data collated by GWAS atlas v20191115 to assess evidence of horizontal pleiotropy, that is, acting through non-glycaemic pathways to influence lung function (Watanabe et al., 2019). All MR analyses were performed in R version 3.6.0 using the TwoSampleMR v0.4.25 and MRPRESSO v1.0 packages.
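Two of the sensitivity analyses above, Cochran’s Q and leave-one-out estimation, can be sketched around a fixed-effect IVW estimate. This is a simplified illustration using first-order weights and hypothetical toy effects in which the final SNP is a deliberate outlier; it omits the chi-squared p-value for Q and is not the TwoSampleMR or MR PRESSO code.

```python
from typing import List


def ivw_beta(bx: List[float], by: List[float], se_y: List[float]) -> float:
    """Fixed-effect inverse-variance weighted estimate."""
    w = [1.0 / s ** 2 for s in se_y]
    num = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    den = sum(wi * x * x for wi, x in zip(w, bx))
    return num / den


def cochran_q(bx: List[float], by: List[float], se_y: List[float]) -> float:
    """Weighted squared deviation of each SNP from the IVW fit (df = n - 1)."""
    beta = ivw_beta(bx, by, se_y)
    return sum(((y - beta * x) / s) ** 2 for x, y, s in zip(bx, by, se_y))


def leave_one_out(bx: List[float], by: List[float], se_y: List[float]) -> List[float]:
    """IVW estimate recomputed with each SNP removed in turn."""
    estimates = []
    for drop in range(len(bx)):
        keep = [i for i in range(len(bx)) if i != drop]
        estimates.append(
            ivw_beta([bx[i] for i in keep], [by[i] for i in keep], [se_y[i] for i in keep])
        )
    return estimates


# Hypothetical effects: four SNPs with ratio 0.5 and one outlier with ratio 1.2.
bx = [0.10, 0.20, 0.15, 0.30, 0.25]
by = [0.05, 0.10, 0.075, 0.15, 0.30]
se_y = [0.02, 0.02, 0.02, 0.02, 0.02]

q_stat = cochran_q(bx, by, se_y)
loo = leave_one_out(bx, by, se_y)
```

Removing the outlier returns the estimate to 0.5, whereas every other leave-one-out estimate stays inflated, which is the pattern used to flag influential IVs.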

Investigating residual confounding from smoking on the relationship between fasting glucose and lung function

We investigated whether a residual effect of smoking could confound the link between glucose and lung function. Firstly, we selected two well-powered GWAS of smoking behaviours: ever vs. never smoked (N = 385,013) (Watanabe et al., 2019) and cigarettes smoked per day (N = 263,954) (Liu et al., 2019). Genetic correlation between these two smoking phenotypes and fasting glucose was estimated as described above, followed by the construction of an LCV model. The MR IVs utilised for fasting glucose were also checked for association with each smoking GWAS. We also probed whether there could be an effect of collider bias in the event smoking does indeed exert an effect on fasting glucose. To this end, the LCV and MR analyses were repeated for fasting glucose using smaller UK Biobank GWAS of FEV1 and FVC from the Neale Lab, as these were not adjusted for smoking (N = 272,338, http://www.nealelab.is/uk-biobank).

Generation of PES candidate gene-sets

We implemented gene-set association using the MAGMA method (MAGMA v1.06b), with some customisations to the framework to identify candidate PES gene-sets (Reay et al., 2020; de Leeuw et al., 2015). These gene-sets became the basis to calculate pathway-specific polygenic scores (PES). MAGMA aggregates SNP-wise p-values for trait association into a gene-based p-value and, thereafter, tests whether a set of genes is more strongly associated with the phenotype than all other genes. Gene-based test statistics were calculated analogous to Brown’s method, which is applicable to dependent p-values with known covariance (as common SNPs display through the phenomenon of linkage disequilibrium [LD], which can be quantified at a population level). p-Value thresholding (PT) was utilised for the gene test statistic calculation; four thresholds were selected: all SNPs, PT <0.5, PT <0.05, and PT <0.005, meaning only SNPs below these thresholds were included in the gene-based model. We argue that distinct biological processes in individuals may only be captured when the optimal spectrum of polygenic variation is included in the model. A variety of PT could be utilised; for simplicity, we selected the four p-value thresholds described, as per our previous work (Reay et al., 2020). We mapped variants to 18,297 autosomal genes in the hg19 assembly, as defined by NCBI and obtained from the MAGMA website; genes within the MHC were removed due to the complexity of LD within this region. The 1000 genomes phase 3 European reference panel was utilised to define LD for input into MAGMA. Genic boundaries were extended to capture regulatory variation, with both conservative and liberal upstream and downstream boundary definitions implemented. An extension of 5 kb upstream of the gene and 1.5 kb downstream was the conservative construct, whilst a larger 35 kb upstream and 10 kb downstream was the liberal construct. Boundaries were longer upstream of the gene in both instances to capture more promoter-related variation, as is usual practice (Wray et al., 2018; Kunkle et al., 2019; Reay and Cairns, 2020).
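
As an illustration of the two boundary definitions, the following minimal sketch extends a gene's coordinates by the window sizes stated above. It assumes that 'upstream' is interpreted relative to the gene's strand; the function name and example coordinates are hypothetical and are not taken from the study's code.

```python
from typing import Tuple

# Window sizes in base pairs (upstream, downstream), as stated in the text.
WINDOWS = {"conservative": (5_000, 1_500), "liberal": (35_000, 10_000)}


def extend_gene(start: int, end: int, strand: str, config: str) -> Tuple[int, int]:
    """Extend genic coordinates, treating upstream as relative to the strand."""
    upstream, downstream = WINDOWS[config]
    if strand == "+":
        new_start, new_end = start - upstream, end + downstream
    elif strand == "-":
        # On the reverse strand, upstream lies beyond the larger coordinate.
        new_start, new_end = start - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Coordinates cannot fall below the first base of the chromosome.
    return max(1, new_start), new_end


if __name__ == "__main__":
    print(extend_gene(100_000, 110_000, "+", "conservative"))
    print(extend_gene(100_000, 110_000, "-", "liberal"))
```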

Genic p-values were transformed to Z-scores with the probit function for input into the gene-set association model. Competitive gene-set association was undertaken using a linear regression model in which genic Z-scores are the outcome and confounders, including gene size and genic minor allele count, are included as covariates. When these models are constructed at different PT, this approach tests whether the gene-set is more associated than the other genes for which test statistics were calculated using only SNPs below the threshold. We selected pathways that survived multiple testing correction for an enrichment of lung function-associated variation relative to all other genes at that threshold by applying correction via the Benjamini–Hochberg (BH) method (FDR < 0.05) to all thresholds combined. These associations can be interpreted based on the p-value threshold for the model; for example, a gene-set that survives FDR correction in a model including only variants which displayed a nominally significant univariable association with lung function (p<0.05) is indicative of a set of genes that are more associated with lung function than all other genes with at least one SNP that had p<0.05 in the GWAS. The BH approach was implemented rather than Bonferroni as several gene-sets will be tested multiple times at different p-value thresholds, and thus, given the assumption of independence underlying Bonferroni correction, it would likely be overly conservative. In summary, gene-based test statistics were constructed at four different p-value thresholds, whereby only SNPs below the said threshold were included in the gene-based test statistic. Thereafter, competitive gene-set association was conducted for each druggable pathway at the different thresholds, with the null hypothesis being that the druggable pathway is no more associated with the trait (enriched with association) than all other genes for which gene-based p-values could be calculated by virtue of having a SNP below the threshold annotated to them. The concept underlying this is that distinct pathways may be enriched with common variants at differing levels of the polygenic signal; for example, a model including all SNPs will identify gene-sets enriched with association relative to all other genes, whilst a less polygenic model, like a threshold of p<0.05, will capture gene-sets enriched with association relative to genes with at least one SNP mapped to them with a univariate association p<0.05. We defined gene-sets with known drug targets by sourcing hallmark and canonical (BioCarta, KEGG, PID, and Reactome) gene-sets from the Molecular Signatures Database (MSigDB) (Liberzon et al., 2015) and retaining those with at least one gene with a high-confidence interaction with at least one approved pharmacological agent (TClin genes), as annotated using the Target Central Resource Database (TCRD v6.1, NGenes = 613) (Oprea et al., 2018).
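
Two steps in this procedure lend themselves to a short sketch: the probit transformation of genic p-values and BH correction applied to gene-set p-values pooled across all thresholds. The code below is a simplified illustration, not MAGMA's implementation. It assumes the convention that smaller p-values map to larger positive Z-scores, and the pathway names and p-values are invented.

```python
from statistics import NormalDist
from typing import Dict, List, Sequence, Tuple


def probit_z(p: float) -> float:
    """Probit transform; -inv_cdf(p) equals inv_cdf(1 - p) but is stable for tiny p."""
    return -NormalDist().inv_cdf(p)


def bh_adjust(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def significant_sets(pooled: Dict[Tuple[str, str], float],
                     fdr: float = 0.05) -> Dict[Tuple[str, str], float]:
    """Apply BH once to every (gene-set, threshold) test pooled together."""
    keys = list(pooled)
    qvals = bh_adjust([pooled[k] for k in keys])
    return {k: q for k, q in zip(keys, qvals) if q < fdr}


# Invented gene-set association p-values, pooled across thresholds.
POOLED = {
    ("PATHWAY_A", "all SNPs"): 0.01,
    ("PATHWAY_A", "PT<0.05"): 0.04,
    ("PATHWAY_B", "PT<0.5"): 0.03,
    ("PATHWAY_B", "PT<0.005"): 0.005,
}

if __name__ == "__main__":
    print(probit_z(0.025))
    print(significant_sets(POOLED))
```

Pooling every (gene-set, threshold) pair into one correction mirrors the description of applying BH 'to all thresholds combined'.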

PES candidate gene-set drug repurposing

We tested each candidate PES gene-set for overrepresentation of DrugBank compound targets using WebGestaltR v0.4.2 (Liao et al., 2019). Compounds were retained for each pathway if they survived FDR correction (q < 0.05) and were FDA approved. Single-drug gene matching was performed using the DGIdb v.3.02, with a minimum of three lines of supporting evidence as the criterion for selection (Cotto et al., 2018).

The list of FDA-approved DrugBank compounds whose targets were overrepresented in a PES candidate gene-set was reviewed by a pharmacist to prioritise potentially useful compounds for lung function. A total of eight topical compounds were excluded. The remaining 55 oral and/or parenteral compounds were investigated for lung function-related adverse events (including all of dyspnoea, abnormal breath sounds, decreased respiratory rate, orthopnoea, shallow breathing, respiratory distress, respiratory depression, or any other related term), other alarming adverse events, important precautions, black-box warnings, or any contraindication that might prohibit use of the drug in our study population. These data were reviewed for each compound using the following databases: drugs.com, Medscape, SIDER v4.1, and the summaries of each product’s characteristics. We also searched for articles that discussed either an improvement or worsening in lung function for each compound, along with the permitted age for paediatric use.

The drugs were then categorised into one of five categories (Figure 3—figure supplement 1, Supplementary file 2d). Level 1 was assigned for an oral or parenteral formulation, with no documented respiratory side effects and with positive evidence of prior use for lung function in the literature. Level 2 was assigned for an oral or parenteral formulation, with no documented or rare (<1%) respiratory side effects and with/without positive evidence but no negative evidence of prior use for lung function in the literature. Level 3 was assigned to an oral or parenteral formulation, with common (1–15%) respiratory side effects and with/without positive evidence but no negative evidence of prior use for lung function in the literature. Level 4 compounds were those oral or parenteral formulations with very common (16–50%) respiratory side effects or other alarming adverse effects unrelated to respiratory function, without positive evidence but with/without negative evidence of prior use for lung function in the literature. Finally, level 5 was assigned when the drug was associated with a serious adverse event (including a black-box warning or an absolute contraindication).

The PES model for individuals

We defined the model to calculate PES profiles for individuals as follows (Equation 1). Consider j SNPs for i individuals, wherein the SNPs are those physically mapped to genes which are members of a candidate PES gene-set (m). Let $\hat{\beta}_j$ denote the statistical effect size for each variant from the GWAS, which is multiplied by its dosage $G_{ij}$. The SNPs included were those below the p-value threshold utilised to discover the gene-set.

(1) $\mathrm{PES}_i = \sum_{j=1}^{m} \hat{\beta}_j G_{ij}$

We averaged these scores by the number of SNPs carried by each individual and scaled them using the scale() function in R. PES profiles were generated in all instances by first filtering the GWAS summary statistics for common variants (MAF >0.01) within the genic boundaries of the genes which comprise the PES gene-set. The genic boundaries were extended using the liberal or conservative configuration, dependent on which boundary definition was utilised in the gene-set association for that pathway. PRSice v2.2.12 calculated the respective PES, along with genome-wide PGS (using the same additive model but genome wide) for FEV1 and FVC (Choi and O'Reilly, 2019).
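
The additive model in Equation 1, followed by averaging and scaling, can be sketched as below. This is a simplified illustration rather than PRSice's implementation. It assumes the average is taken over each individual's non-missing SNPs and that scaling means centring and dividing by the sample standard deviation, as R's scale() does by default. The function names, effect sizes, and dosages are invented.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence


def raw_pes(betas: Sequence[float],
            dosages: Sequence[Optional[float]]) -> float:
    """Sum of effect size x dosage over non-missing SNPs, averaged by their count."""
    pairs = [(b, g) for b, g in zip(betas, dosages) if g is not None]
    if not pairs:
        raise ValueError("individual has no genotyped SNPs in the gene-set")
    return sum(b * g for b, g in pairs) / len(pairs)


def scaled_pes(betas: Sequence[float],
               cohort: Sequence[Sequence[Optional[float]]]) -> List[float]:
    """Centre scores to mean 0 and scale to unit sample standard deviation."""
    scores = [raw_pes(betas, dosages) for dosages in cohort]
    centre, spread = mean(scores), stdev(scores)
    return [(s - centre) / spread for s in scores]


# Invented GWAS effect sizes for two SNPs and dosages for three individuals.
BETAS = [0.1, -0.2]
COHORT = [[0, 1], [2, 0], [1, 1]]

if __name__ == "__main__":
    print(scaled_pes(BETAS, COHORT))
```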

Lung function PES in the HCS cohort

We utilised an independent, genotyped cohort for which spirometry measures were recorded to investigate the phenotypic relevance of PES profiles for lung function. Participants were drawn from the HCS, a population-based cohort of individuals aged between 55 and 85 years, predominantly of European ancestry and residing in Newcastle, New South Wales, Australia. All work was conducted in accordance with ethics committee approvals. Consenting participants completed a series of questionnaires, attended a clinic visit, and provided blood samples. Individuals were recruited by random selection from the New South Wales State electoral roll, with detailed recruitment and data collection methods for the HCS described elsewhere (McEvoy et al., 2010). Participants provided blood samples from which DNA was extracted and genotyped using the Affymetrix Axiom Kaiser array. Quality control excluded SNPs with genotype call rate of <0.95, deviation from Hardy–Weinberg equilibrium (p<1×10⁻⁶) or MAF of <0.01. Relatedness testing and removal of population outliers used autosomal, common (MAF >0.05), physically genotyped SNPs in relative linkage equilibrium (r² <0.02), with regions of long-range LD removed, as is usual practice (Price et al., 2008). We used PLINK 1.9 to retain only unrelated individuals (pairs with pi_hat >0.185 were treated as related), with one participant from each related pair excluded blind to phenotype information. Population outliers were determined by performing principal component analysis (PCA) using PLINK 1.9. We clustered individuals in the HCS with the first two principal components from each 1000 genomes phase 3 superpopulation using k-means clustering. Thereafter, we conservatively excluded any HCS individual with a first or second principal component above or below the maximum or minimum 1000 genomes European values for these eigenvectors. PCA was repeated in the filtered European ancestry HCS subset such that eigenvectors could be used as downstream covariates. Imputation to the Haplotype Reference Consortium panel involved additional data clean-up, reference lift-over to hg19/GRCh37, and data submission to the Michigan imputation server, as specified in the submission guidelines (Loh et al., 2016; Das et al., 2016). Post-imputation quality control retained common variants (MAF >0.01) with high imputation quality (R² >0.8) and missingness <0.02.
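
The range-based exclusion of population outliers can be sketched as follows. This covers only the final min/max filter on the first two principal components, not the k-means clustering step, and the identifiers and component values are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple

PCPair = Tuple[float, float]


def within_reference_range(samples: Dict[str, PCPair],
                           reference: Sequence[PCPair]) -> List[str]:
    """Keep samples whose PC1 and PC2 both lie inside the reference min/max."""
    pc1_lo = min(p[0] for p in reference)
    pc1_hi = max(p[0] for p in reference)
    pc2_lo = min(p[1] for p in reference)
    pc2_hi = max(p[1] for p in reference)
    return [
        sample_id
        for sample_id, (pc1, pc2) in samples.items()
        if pc1_lo <= pc1 <= pc1_hi and pc2_lo <= pc2 <= pc2_hi
    ]


# Invented principal components for a European reference panel and a cohort.
EUR_REFERENCE = [(-0.02, 0.01), (0.01, 0.03), (0.00, -0.01)]
COHORT_PCS = {
    "ID1": (0.00, 0.00),    # inside both ranges
    "ID2": (0.05, 0.00),    # PC1 above the reference maximum
    "ID3": (-0.01, -0.02),  # PC2 below the reference minimum
}

if __name__ == "__main__":
    print(within_reference_range(COHORT_PCS, EUR_REFERENCE))
```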

Spirometry data from the HCS were then processed by selecting individuals with non-missing FEV1 and FVC. We utilised the maximum FEV1 and FVC from four attempts and fitted a linear model which covaried for sex, age, age², height, height², smoking status, self-reported asthma status, and self-reported bronchitis/emphysema status. The phenotype for association testing was the residuals from these models transformed via inverse-rank normalisation (Blom transformation) using the RNOmni package. We tested the association between a genome-wide PGS for FEV1 and FVC (PT <1, 0.5, 0.05, 0.005, 5 × 10⁻⁵, 5 × 10⁻⁸) with their respective transformed spirometry indices, adjusted for the first five SNP-derived principal components, using PRSice v2.2.12. Similarly, the association between each of the PES profiles with an overrepresentation of FDA-approved drug targets and FEV1 and/or FVC was investigated using the same approach; however, we only constructed the PES at the p-value threshold for which it demonstrated the strongest association signal after multiple testing correction in the GWAS. We further adjusted each of these models for genome-wide PGS at the same PT for which the PES was calculated.
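
For readers unfamiliar with the Blom transformation, the sketch below applies a rank-based inverse normal transformation with an offset of 3/8 and average ranks for ties. It is a plain-Python illustration of the general technique, not the RNOmni code, and the input values are invented residuals.

```python
from statistics import NormalDist
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def blom_transform(values: Sequence[float], offset: float = 0.375) -> List[float]:
    """Map ranks to normal quantiles: inv_cdf((rank - offset) / (n - 2*offset + 1))."""
    n = len(values)
    normal = NormalDist()
    return [
        normal.inv_cdf((rank - offset) / (n - 2 * offset + 1))
        for rank in average_ranks(values)
    ]


if __name__ == "__main__":
    # Invented model residuals.
    print(blom_transform([3.0, 1.0, 2.0]))
```

The transformed values depend only on the ranks of the residuals, so skew or heavy tails in the raw spirometry residuals do not carry over to the association test.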

The relationship between PES and mRNA expression

We obtained RNAseq normalised read counts (PEER normalised RPKM) for 23,723 genes which survived QC in the Geuvadis dataset (https://www.ebi.ac.uk/arrayexpress/experiments/E-GEUV-1/files/analysis_results/?ref=E-GEUV-1). The Geuvadis project performed RNAseq on transformed LCL for participants in the 1000 genomes project (Lappalainen et al., 2013). We retained 357 European individuals in this dataset for whom phase 3 sequencing data were available from the 1000 genomes. The association between normalised mRNA expression for genes that are part of the candidate gene-set and each PES was tested using a linear model, adjusted for sex, the first three SNP-derived principal components, and genome-wide PGS at the same PT utilised to calculate the PES. Multiple testing correction was applied for the number of genes in each set via the BH method using the p.adjust() function.

Transcriptome-wide association studies

A TWAS of each lung function measure was performed using the FUSION software (Gusev et al., 2016). SNP weights were derived for genes with a significant contribution of cis-acting SNPs to expression variability (cis-h² p<0.01) using lung and whole blood RNAseq GTEx v7 data. A transcriptome-wide significant gene was defined by accounting for the number of genes with models of genetically regulated expression in lung and whole blood, respectively: lung, p<6.43×10⁻⁶ (α = 0.05/7776); whole blood, p<8.32×10⁻⁶ (α = 0.05/6007). A more conservative threshold could be applied which corrects for all models in both tissues (p<3.6×10⁻⁶); however, given the correlation between models and the discovery nature of this study, we chose the more liberal correction threshold. We excluded genes within the MHC region due to its LD complexity. Furthermore, we subjected two smoking behaviour phenotypes to TWAS to uncover associations which could be driven by residual effects of smoking. This is inherently conservative as it is possible that genes associated with both lung function and smoking behaviours could exhibit pleiotropic effects; however, as we wish to define drug targets relevant to lung function, the exclusion of these shared genes is warranted. The smoking phenotypes were ‘ever vs. never smoked’ and ‘cigarettes smoked per day’, and TWAS was performed using lung and blood for consistency, along with SNP weights from the dorsolateral prefrontal cortex and nucleus accumbens, as these brain regions have been implicated in nicotine addiction. Genes which survived the above were searched using DGIdb, with the following criteria utilised to define gene-target pairs, where the drug mode of action matched the sign of the TWAS Z value: (i) tier 1, an FDA-approved compound with at least two lines of evidence for interacting with the target gene; (ii) tier 2, an investigational compound (not FDA approved) with at least two lines of evidence for interacting with the target gene.
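
The significance thresholds quoted above follow from Bonferroni correction for the number of expression models per tissue. The short sketch below reproduces that arithmetic using the model counts given in the text; the function name is illustrative.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Numbers of genes with expression models, as reported in the text.
N_LUNG = 7776
N_BLOOD = 6007

LUNG = bonferroni_threshold(N_LUNG)
BLOOD = bonferroni_threshold(N_BLOOD)
COMBINED = bonferroni_threshold(N_LUNG + N_BLOOD)

if __name__ == "__main__":
    print(f"lung:        {LUNG:.2e}")
    print(f"whole blood: {BLOOD:.2e}")
    print(f"both:        {COMBINED:.2e}")
```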

The TWAS Miami plots were generated using an edited version of the TWAS-plotter.V1.0.R script (https://github.com/opain/TWAS-plotter). A Bayesian method, FOCUS, was utilised to finemap the TWAS associations which could be therapeutically useful (tier 1 or 2) (Mancuso et al., 2019). Given observed TWAS statistics, the marginal posterior inclusion probability (PIP) was calculated and subsequently used to compute a credible set with 90% probability (ρ) of containing the causal gene ($c_i = 1$). As FOCUS allows the null model to be predicted as a possible member of the credible set, we excluded any genes for which that occurred. The credible set (S) was defined by sorting the genes by normalised PIP and then including genes until at least ρ of the normalised posterior mass was explained (Equation 2).

(2) $S = \{\mathrm{Gene}_1, \ldots, \mathrm{Gene}_k\}, \quad \sum_{i=1}^{k} \mathrm{PIP}(c_i = 1 \mid Z_{\mathrm{TWAS}}) \geq \rho$

The Bernoulli prior for each causal indicator was set as the default p = 1×10⁻³, with a default prior variance for effects at causal genes set as 40 (nσ_c² = 40). Previous work has demonstrated that FOCUS-computed PIPs were robust to different specified prior variances (Mancuso et al., 2019); however, we further utilised a more conservative prior of p = 1×10⁻⁵ to assess the effect on the PIP calculated for candidate druggable genes. In all instances, we utilised a multi-tissue panel obtained from the FOCUS GitHub repository which combines GTEx v7 SNP-weights with other FUSION TWAS weights (https://github.com/bogdanlab/focus/wiki, GTEx v7 with METSIM, CMC, YFS, and NTR). The marginal TWAS Z to use for finemapping for each locus was selected in the tissue for which the gene was found to be associated via the FUSION TWAS methodology (lung or blood), if available, otherwise by predictive accuracy (cross-validated R²).
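
The credible-set rule in Equation 2 can be sketched as follows. This is an illustration of the sorting and accumulation logic only, not the FOCUS implementation; the gene labels, PIP values, and the 'NULL' key used to represent the null model are hypothetical.

```python
from typing import Dict, List, Tuple


def credible_set(pips: Dict[str, float], rho: float = 0.9,
                 null_label: str = "NULL") -> Tuple[List[str], bool]:
    """Return the smallest PIP-ranked set reaching rho, and whether it holds the null."""
    total = sum(pips.values())
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    mass = 0.0
    for label, pip in ranked:
        chosen.append(label)
        mass += pip / total  # normalise so the posterior mass sums to 1
        if mass >= rho:
            break
    return chosen, null_label in chosen


# Invented marginal PIPs at two loci, including the null model.
LOCUS_1 = {"GENE_A": 0.85, "GENE_B": 0.10, "GENE_C": 0.03, "NULL": 0.02}
LOCUS_2 = {"GENE_D": 0.50, "NULL": 0.45, "GENE_E": 0.05}

if __name__ == "__main__":
    print(credible_set(LOCUS_1))
    print(credible_set(LOCUS_2))
```

In the second toy locus the null model enters the credible set, which corresponds to the situation in which candidate genes were excluded.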

Host–viral interactome data

We selected three respiratory viruses for which host–viral protein interaction data were previously published: SARS-CoV2, H1N1, and the HAdV family. The host–SARS-CoV2 interactome was defined using affinity-purification mass spectrometry (NGenes = 332, MiST score ≥0.7, SAINTexpress BFDR ≤0.05) (Gordon et al., 2020). We selected 91 proteins which both interact with viral proteins expressed by influenza (mass spectrometry) and whose siRNA-mediated downregulation reduced viral replication in cultured cells by at least three log10 units while retaining >80% cell viability (Watanabe et al., 2014). Finally, the HAdV–host interactome was defined using a protein microarray platform (NGenes = 24), which encompasses 20 viral proteins encoded by five HAdV species (Martinez-Martin et al., 2016). We investigated approved inhibitors or antagonists of these genes using DGIdb as described above in the PES candidate gene-set drug repurposing section. The sets of genes which interact with viral proteins for each virus (‘viral prey proteins’) were subjected to overrepresentation analysis using the GENE2FUNC function of FUMA (Watanabe et al., 2017). We selected gene-sets which survived multiple testing correction (q < 0.05) and which contained at least one of the following key terms related to glycaemic biology: glucose, insulin, diabetes, or glucagon. Further, we investigated whether there was a significant overrepresentation of interactions amongst these viral prey proteins overlapping a glycaemic pathway using STRING v11.0 (Szklarczyk et al., 2019). We assembled a list of antidiabetic drug targets by searching compounds annotated with the level 2 ATC code A10 (drugs used in diabetes) in DGIdb, retaining drug–gene interactions with two or more lines of evidence. The interactions between these drug target proteins and the glycaemic SARS-CoV2 prey proteins were investigated once more using STRING, with only interactions scoring >0.75 considered.

Data availability

All data are publicly available from the references described in the manuscript. Code related to this study can be found at the following link: https://github.com/Williamreay/Lung_function_drug_repurposing_manuscript (copy archived at https://archive.softwareheritage.org/swh:1:rev:01aef11a0cc0c7f897c9497126d2ce454108eff1/).

The following previously published data sets were used

References

    1. Kunkle BW
    2. Grenier-Boley B
    3. Sims R
    4. Bis JC
    5. Damotte V
    6. Naj AC
    7. Boland A
    8. Vronskaya M
    9. van der Lee SJ
    10. Amlie-Wolf A
    11. Bellenguez C
    12. Frizatti A
    13. Chouraki V
    14. Martin ER
    15. Sleegers K
    16. Badarinarayan N
    17. Jakobsdottir J
    18. Hamilton-Nelson KL
    19. Moreno-Grau S
    20. Olaso R
    21. Raybould R
    22. Chen Y
    23. Kuzma AB
    24. Hiltunen M
    25. Morgan T
    26. Ahmad S
    27. Vardarajan BN
    28. Epelbaum J
    29. Hoffmann P
    30. Boada M
    31. Beecham GW
    32. Garnier JG
    33. Harold D
    34. Fitzpatrick AL
    35. Valladares O
    36. Moutet ML
    37. Gerrish A
    38. Smith AV
    39. Qu L
    40. Bacq D
    41. Denning N
    42. Jian X
    43. Zhao Y
    44. Del Zompo M
    45. Fox NC
    46. Choi SH
    47. Mateo I
    48. Hughes JT
    49. Adams HH
    50. Malamon J
    51. Sanchez-Garcia F
    52. Patel Y
    53. Brody JA
    54. Dombroski BA
    55. Naranjo MCD
    56. Daniilidou M
    57. Eiriksdottir G
    58. Mukherjee S
    59. Wallon D
    60. Uphill J
    61. Aspelund T
    62. Cantwell LB
    63. Garzia F
    64. Galimberti D
    65. Hofer E
    66. Butkiewicz M
    67. Fin B
    68. Scarpini E
    69. Sarnowski C
    70. Bush WS
    71. Meslage S
    72. Kornhuber J
    73. White CC
    74. Song Y
    75. Barber RC
    76. Engelborghs S
    77. Sordon S
    78. Voijnovic D
    79. Adams PM
    80. Vandenberghe R
    81. Mayhaus M
    82. Cupples LA
    83. Albert MS
    84. De Deyn PP
    85. Gu W
    86. Himali JJ
    87. Beekly D
    88. Squassina A
    89. Hartmann AM
    90. Orellana A
    91. Blacker D
    92. Rodriguez-Rodriguez E
    93. Lovestone S
    94. Garcia ME
    95. Doody RS
    96. Munoz-Fernadez C
    97. Sussams R
    98. Lin H
    99. Fairchild TJ
    100. Benito YA
    101. Holmes C
    102. Karamujić-Čomić H
    103. Frosch MP
    104. Thonberg H
    105. Maier W
    106. Roshchupkin G
    107. Ghetti B
    108. Giedraitis V
    109. Kawalia A
    110. Li S
    111. Huebinger RM
    112. Kilander L
    113. Moebus S
    114. Hernández I
    115. Kamboh MI
    116. Brundin R
    117. Turton J
    118. Yang Q
    119. Katz MJ
    120. Concari L
    121. Lord J
    122. Beiser AS
    123. Keene CD
    124. Helisalmi S
    125. Kloszewska I
    126. Kukull WA
    127. Koivisto AM
    128. Lynch A
    129. Tarraga L
    130. Larson EB
    131. Haapasalo A
    132. Lawlor B
    133. Mosley TH
    134. Lipton RB
    135. Solfrizzi V
    136. Gill M
    137. Longstreth WT
    138. Montine TJ
    139. Frisardi V
    140. Diez-Fairen M
    141. Rivadeneira F
    142. Petersen RC
    143. Deramecourt V
    144. Alvarez I
    145. Salani F
    146. Ciaramella A
    147. Boerwinkle E
    148. Reiman EM
    149. Fievet N
    150. Rotter JI
    151. Reisch JS
    152. Hanon O
    153. Cupidi C
    154. Andre Uitterlinden AG
    155. Royall DR
    156. Dufouil C
    157. Maletta RG
    158. de Rojas I
    159. Sano M
    160. Brice A
    161. Cecchetti R
    162. George-Hyslop PS
    163. Ritchie K
    164. Tsolaki M
    165. Tsuang DW
    166. Dubois B
    167. Craig D
    168. Wu CK
    169. Soininen H
    170. Avramidou D
    171. Albin RL
    172. Fratiglioni L
    173. Germanou A
    174. Apostolova LG
    175. Keller L
    176. Koutroumani M
    177. Arnold SE
    178. Panza F
    179. Gkatzima O
    180. Asthana S
    181. Hannequin D
    182. Whitehead P
    183. Atwood CS
    184. Caffarra P
    185. Hampel H
    186. Quintela I
    187. Carracedo Á
    188. Lannfelt L
    189. Rubinsztein DC
    190. Barnes LL
    191. Pasquier F
    192. Frölich L
    193. Barral S
    194. McGuinness B
    195. Beach TG
    196. Johnston JA
    197. Becker JT
    198. Passmore P
    199. Bigio EH
    200. Schott JM
    201. Bird TD
    202. Warren JD
    203. Boeve BF
    204. Lupton MK
    205. Bowen JD
    206. Proitsi P
    207. Boxer A
    208. Powell JF
    209. Burke JR
    210. Kauwe JSK
    211. Burns JM
    212. Mancuso M
    213. Buxbaum JD
    214. Bonuccelli U
    215. Cairns NJ
    216. McQuillin A
    217. Cao C
    218. Livingston G
    219. Carlson CS
    220. Bass NJ
    221. Carlsson CM
    222. Hardy J
    223. Carney RM
    224. Bras J
    225. Carrasquillo MM
    226. Guerreiro R
    227. Allen M
    228. Chui HC
    229. Fisher E
    230. Masullo C
    231. Crocco EA
    232. DeCarli C
    233. Bisceglio G
    234. Dick M
    235. Ma L
    236. Duara R
    237. Graff-Radford NR
    238. Evans DA
    239. Hodges A
    240. Faber KM
    241. Scherer M
    242. Fallon KB
    243. Riemenschneider M
    244. Fardo DW
    245. Heun R
    246. Farlow MR
    247. Kölsch H
    248. Ferris S
    249. Leber M
    250. Foroud TM
    251. Heuser I
    252. Galasko DR
    253. Giegling I
    254. Gearing M
    255. Hüll M
    256. Geschwind DH
    257. Gilbert JR
    258. Morris J
    259. Green RC
    260. Mayo K
    261. Growdon JH
    262. Feulner T
    263. Hamilton RL
    264. Harrell LE
    265. Drichel D
    266. Honig LS
    267. Cushion TD
    268. Huentelman MJ
    269. Hollingworth P
    270. Hulette CM
    271. Hyman BT
    272. Marshall R
    273. Jarvik GP
    274. Meggy A
    275. Abner E
    276. Menzies GE
    277. Jin LW
    278. Leonenko G
    279. Real LM
    280. Jun GR
    281. Baldwin CT
    282. Grozeva D
    283. Karydas A
    284. Russo G
    285. Kaye JA
    286. Kim R
    287. Jessen F
    288. Kowall NW
    289. Vellas B
    290. Kramer JH
    291. Vardy E
    292. LaFerla FM
    293. Jöckel KH
    294. Lah JJ
    295. Dichgans M
    296. Leverenz JB
    297. Mann D
    298. Levey AI
    299. Pickering-Brown S
    300. Lieberman AP
    301. Klopp N
    302. Lunetta KL
    303. Wichmann HE
    304. Lyketsos CG
    305. Morgan K
    306. Marson DC
    307. Brown K
    308. Martiniuk F
    309. Medway C
    310. Mash DC
    311. Nöthen MM
    312. Masliah E
    313. Hooper NM
    314. McCormick WC
    315. Daniele A
    316. McCurry SM
    317. Bayer A
    318. McDavid AN
    319. Gallacher J
    320. McKee AC
    321. van den Bussche H
    322. Mesulam M
    323. Brayne C
    324. Miller BL
    325. Riedel-Heller S
    326. Miller CA
    327. Miller JW
    328. Al-Chalabi A
    329. Morris JC
    330. Shaw CE
    331. Myers AJ
    332. Wiltfang J
    333. O'Bryant S
    334. Olichney JM
    335. Alvarez V
    336. Parisi JE
    337. Singleton AB
    338. Paulson HL
    339. Collinge J
    340. Perry WR
    341. Mead S
    342. Peskind E
    343. Cribbs DH
    344. Rossor M
    345. Pierce A
    346. Ryan NS
    347. Poon WW
    348. Nacmias B
    349. Potter H
    350. Sorbi S
    351. Quinn JF
    352. Sacchinelli E
    353. Raj A
    354. Spalletta G
    355. Raskind M
    356. Caltagirone C
    357. Bossù P
    358. Orfei MD
    359. Reisberg B
    360. Clarke R
    361. Reitz C
    362. Smith AD
    363. Ringman JM
    364. Warden D
    365. Roberson ED
    366. Wilcock G
    367. Rogaeva E
    368. Bruni AC
    369. Rosen HJ
    370. Gallo M
    371. Rosenberg RN
    372. Ben-Shlomo Y
    373. Sager MA
    374. Mecocci P
    375. Saykin AJ
    376. Pastor P
    377. Cuccaro ML
    378. Vance JM
    379. Schneider JA
    380. Schneider LS
    381. Slifer S
    382. Seeley WW
    383. Smith AG
    384. Sonnen JA
    385. Spina S
    386. Stern RA
    387. Swerdlow RH
    388. Tang M
    389. Tanzi RE
    390. Trojanowski JQ
    391. Troncoso JC
    392. Van Deerlin VM
    393. Van Eldik LJ
    394. Vinters HV
    395. Vonsattel JP
    396. Weintraub S
    397. Welsh-Bohmer KA
    398. Wilhelmsen KC
    399. Williamson J
    400. Wingo TS
    401. Woltjer RL
    402. Wright CB
    403. Yu CE
    404. Yu L
    405. Saba Y
    406. Pilotto A
    407. Bullido MJ
    408. Peters O
    409. Crane PK
    410. Bennett D
    411. Bosco P
    412. Coto E
    413. Boccardi V
    414. De Jager PL
    415. Lleo A
    416. Warner N
    417. Lopez OL
    418. Ingelsson M
    419. Deloukas P
    420. Cruchaga C
    421. Graff C
    422. Gwilliam R
    423. Fornage M
    424. Goate AM
    425. Sanchez-Juan P
    426. Kehoe PG
    427. Amin N
    428. Ertekin-Taner N
    429. Berr C
    430. Debette S
    431. Love S
    432. Launer LJ
    433. Younkin SG
    434. Dartigues JF
    435. Corcoran C
    436. Ikram MA
    437. Dickson DW
    438. Nicolas G
    439. Campion D
    440. Tschanz J
    441. Schmidt H
    442. Hakonarson H
    443. Clarimon J
    444. Munger R
    445. Schmidt R
    446. Farrer LA
    447. Van Broeckhoven C
    448. C O'Donovan M
    449. DeStefano AL
    450. Jones L
    451. Haines JL
    452. Deleuze JF
    453. Owen MJ
    454. Gudnason V
    455. Mayeux R
    456. Escott-Price V
    457. Psaty BM
    458. Ramirez A
    459. Wang LS
    460. Ruiz A
    461. van Duijn CM
    462. Holmans PA
    463. Seshadri S
    464. Williams J
    465. Amouyel P
    466. Schellenberg GD
    467. Lambert JC
    468. Pericak-Vance MA
    469. Alzheimer Disease Genetics Consortium (ADGC), European Alzheimer’s Disease Initiative (EADI), Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium (CHARGE), Genetic and Environmental Risk in AD/Defining Genetic, Polygenic and Environmental Risk for Alzheimer’s Disease Consortium (GERAD/PERADES),
    (2019) Genetic meta-analysis of diagnosed Alzheimer's disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing
    Nature Genetics 51:414–430.
    https://doi.org/10.1038/s41588-019-0358-2
    1. Liu M
    2. Jiang Y
    3. Wedow R
    4. Li Y
    5. Brazel DM
    6. Chen F
    7. Datta G
    8. Davila-Velderrain J
    9. McGuire D
    10. Tian C
    11. Zhan X
    12. Choquet H
    13. Docherty AR
    14. Faul JD
    15. Foerster JR
    16. Fritsche LG
    17. Gabrielsen ME
    18. Gordon SD
    19. Haessler J
    20. Hottenga JJ
    21. Huang H
    22. Jang SK
    23. Jansen PR
    24. Ling Y
    25. Mägi R
    26. Matoba N
    27. McMahon G
    28. Mulas A
    29. Orrù V
    30. Palviainen T
    31. Pandit A
    32. Reginsson GW
    33. Skogholt AH
    34. Smith JA
    35. Taylor AE
    36. Turman C
    37. Willemsen G
    38. Young H
    39. Young KA
    40. Zajac GJM
    41. Zhao W
    42. Zhou W
    43. Bjornsdottir G
    44. Boardman JD
    45. Boehnke M
    46. Boomsma DI
    47. Chen C
    48. Cucca F
    49. Davies GE
    50. Eaton CB
    51. Ehringer MA
    52. Esko T
    53. Fiorillo E
    54. Gillespie NA
    55. Gudbjartsson DF
    56. Haller T
    57. Harris KM
    58. Heath AC
    59. Hewitt JK
    60. Hickie IB
    61. Hokanson JE
    62. Hopfer CJ
    63. Hunter DJ
    64. Iacono WG
    65. Johnson EO
    66. Kamatani Y
    67. Kardia SLR
    68. Keller MC
    69. Kellis M
    70. Kooperberg C
    71. Kraft P
    72. Krauter KS
    73. Laakso M
    74. Lind PA
    75. Loukola A
    76. Lutz SM
    77. Madden PAF
    78. Martin NG
    79. McGue M
    80. McQueen MB
    81. Medland SE
    82. Metspalu A
    83. Mohlke KL
    84. Nielsen JB
    85. Okada Y
    86. Peters U
    87. Polderman TJC
    88. Posthuma D
    89. Reiner AP
    90. Rice JP
    91. Rimm E
    92. Rose RJ
    93. Runarsdottir V
    94. Stallings MC
    95. Stančáková A
    96. Stefansson H
    97. Thai KK
    98. Tindle HA
    99. Tyrfingsson T
    100. Wall TL
    101. Weir DR
    102. Weisner C
    103. Whitfield JB
    104. Winsvold BS
    105. Yin J
    106. Zuccolo L
    107. Bierut LJ
    108. Hveem K
    109. Lee JJ
    110. Munafò MR
    111. Saccone NL
    112. Willer CJ
    113. Cornelis MC
    114. David SP
    115. Hinds DA
    116. Jorgenson E
    117. Kaprio J
    118. Stitzel JA
    119. Stefansson K
    120. Thorgeirsson TE
    121. Abecasis G
    122. Liu DJ
    123. Vrieze S
    124. 23andMe Research Team
    125. HUNT All-In Psychiatry
    (2019) Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use
    Nature Genetics 51:237–244.
    https://doi.org/10.1038/s41588-018-0307-5
    1. Manning AK
    2. Hivert MF
    3. Scott RA
    4. Grimsby JL
    5. Bouatia-Naji N
    6. Chen H
    7. Rybin D
    8. Liu CT
    9. Bielak LF
    10. Prokopenko I
    11. Amin N
    12. Barnes D
    13. Cadby G
    14. Hottenga JJ
    15. Ingelsson E
    16. Jackson AU
    17. Johnson T
    18. Kanoni S
    19. Ladenvall C
    20. Lagou V
    21. Lahti J
    22. Lecoeur C
    23. Liu Y
    24. Martinez-Larrad MT
    25. Montasser ME
    26. Navarro P
    27. Perry JR
    28. Rasmussen-Torvik LJ
    29. Salo P
    30. Sattar N
    31. Shungin D
    32. Strawbridge RJ
    33. Tanaka T
    34. van Duijn CM
    35. An P
    36. de Andrade M
    37. Andrews JS
    38. Aspelund T
    39. Atalay M
    40. Aulchenko Y
    41. Balkau B
    42. Bandinelli S
    43. Beckmann JS
    44. Beilby JP
    45. Bellis C
    46. Bergman RN
    47. Blangero J
    48. Boban M
    49. Boehnke M
    50. Boerwinkle E
    51. Bonnycastle LL
    52. Boomsma DI
    53. Borecki IB
    54. Böttcher Y
    55. Bouchard C
    56. Brunner E
    57. Budimir D
    58. Campbell H
    59. Carlson O
    60. Chines PS
    61. Clarke R
    62. Collins FS
    63. Corbatón-Anchuelo A
    64. Couper D
    65. de Faire U
    66. Dedoussis GV
    67. Deloukas P
    68. Dimitriou M
    69. Egan JM
    70. Eiriksdottir G
    71. Erdos MR
    72. Eriksson JG
    73. Eury E
    74. Ferrucci L
    75. Ford I
    76. Forouhi NG
    77. Fox CS
    78. Franzosi MG
    79. Franks PW
    80. Frayling TM
    81. Froguel P
    82. Galan P
    83. de Geus E
    84. Gigante B
    85. Glazer NL
    86. Goel A
    87. Groop L
    88. Gudnason V
    89. Hallmans G
    90. Hamsten A
    91. Hansson O
    92. Harris TB
    93. Hayward C
    94. Heath S
    95. Hercberg S
    96. Hicks AA
    97. Hingorani A
    98. Hofman A
    99. Hui J
    100. Hung J
    101. Jarvelin MR
    102. Jhun MA
    103. Johnson PC
    104. Jukema JW
    105. Jula A
    106. Kao WH
    107. Kaprio J
    108. Kardia SL
    109. Keinanen-Kiukaanniemi S
    110. Kivimaki M
    111. Kolcic I
    112. Kovacs P
    113. Kumari M
    114. Kuusisto J
    115. Kyvik KO
    116. Laakso M
    117. Lakka T
    118. Lannfelt L
    119. Lathrop GM
    120. Launer LJ
    121. Leander K
    122. Li G
    123. Lind L
    124. Lindstrom J
    125. Lobbens S
    126. Loos RJ
    127. Luan J
    128. Lyssenko V
    129. Mägi R
    130. Magnusson PK
    131. Marmot M
    132. Meneton P
    133. Mohlke KL
    134. Mooser V
    135. Morken MA
    136. Miljkovic I
    137. Narisu N
    138. O'Connell J
    139. Ong KK
    140. Oostra BA
    141. Palmer LJ
    142. Palotie A
    143. Pankow JS
    144. Peden JF
    145. Pedersen NL
    146. Pehlic M
    147. Peltonen L
    148. Penninx B
    149. Pericic M
    150. Perola M
    151. Perusse L
    152. Peyser PA
    153. Polasek O
    154. Pramstaller PP
    155. Province MA
    156. Räikkönen K
    157. Rauramaa R
    158. Rehnberg E
    159. Rice K
    160. Rotter JI
    161. Rudan I
    162. Ruokonen A
    163. Saaristo T
    164. Sabater-Lleal M
    165. Salomaa V
    166. Savage DB
    167. Saxena R
    168. Schwarz P
    169. Seedorf U
    170. Sennblad B
    171. Serrano-Rios M
    172. Shuldiner AR
    173. Sijbrands EJ
    174. Siscovick DS
    175. Smit JH
    176. Small KS
    177. Smith NL
    178. Smith AV
    179. Stančáková A
    180. Stirrups K
    181. Stumvoll M
    182. Sun YV
    183. Swift AJ
    184. Tönjes A
    185. Tuomilehto J
    186. Trompet S
    187. Uitterlinden AG
    188. Uusitupa M
    189. Vikström M
    190. Vitart V
    191. Vohl MC
    192. Voight BF
    193. Vollenweider P
    194. Waeber G
    195. Waterworth DM
    196. Watkins H
    197. Wheeler E
    198. Widen E
    199. Wild SH
    200. Willems SM
    201. Willemsen G
    202. Wilson JF
    203. Witteman JC
    204. Wright AF
    205. Yaghootkar H
    206. Zelenika D
    207. Zemunik T
    208. Zgaga L
    209. Wareham NJ
    210. McCarthy MI
    211. Barroso I
    212. Watanabe RM
    213. Florez JC
    214. Dupuis J
    215. Meigs JB
    216. Langenberg C
    217. DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium
    218. Multiple Tissue Human Expression Resource (MUTHER) Consortium
    (2012) A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance
    Nature Genetics 44:659–669.
    https://doi.org/10.1038/ng.2274
    1. Scott RA
    2. Lagou V
    3. Welch RP
    4. Wheeler E
    5. Montasser ME
    6. Luan J
    7. Mägi R
    8. Strawbridge RJ
    9. Rehnberg E
    10. Gustafsson S
    11. Kanoni S
    12. Rasmussen-Torvik LJ
    13. Yengo L
    14. Lecoeur C
    15. Shungin D
    16. Sanna S
    17. Sidore C
    18. Johnson PC
    19. Jukema JW
    20. Johnson T
    21. Mahajan A
    22. Verweij N
    23. Thorleifsson G
    24. Hottenga JJ
    25. Shah S
    26. Smith AV
    27. Sennblad B
    28. Gieger C
    29. Salo P
    30. Perola M
    31. Timpson NJ
    32. Evans DM
    33. Pourcain BS
    34. Wu Y
    35. Andrews JS
    36. Hui J
    37. Bielak LF
    38. Zhao W
    39. Horikoshi M
    40. Navarro P
    41. Isaacs A
    42. O'Connell JR
    43. Stirrups K
    44. Vitart V
    45. Hayward C
    46. Esko T
    47. Mihailov E
    48. Fraser RM
    49. Fall T
    50. Voight BF
    51. Raychaudhuri S
    52. Chen H
    53. Lindgren CM
    54. Morris AP
    55. Rayner NW
    56. Robertson N
    57. Rybin D
    58. Liu CT
    59. Beckmann JS
    60. Willems SM
    61. Chines PS
    62. Jackson AU
    63. Kang HM
    64. Stringham HM
    65. Song K
    66. Tanaka T
    67. Peden JF
    68. Goel A
    69. Hicks AA
    70. An P
    71. Müller-Nurasyid M
    72. Franco-Cereceda A
    73. Folkersen L
    74. Marullo L
    75. Jansen H
    76. Oldehinkel AJ
    77. Bruinenberg M
    78. Pankow JS
    79. North KE
    80. Forouhi NG
    81. Loos RJ
    82. Edkins S
    83. Varga TV
    84. Hallmans G
    85. Oksa H
    86. Antonella M
    87. Nagaraja R
    88. Trompet S
    89. Ford I
    90. Bakker SJ
    91. Kong A
    92. Kumari M
    93. Gigante B
    94. Herder C
    95. Munroe PB
    96. Caulfield M
    97. Antti J
    98. Mangino M
    99. Small K
    100. Miljkovic I
    101. Liu Y
    102. Atalay M
    103. Kiess W
    104. James AL
    105. Rivadeneira F
    106. Uitterlinden AG
    107. Palmer CN
    108. Doney AS
    109. Willemsen G
    110. Smit JH
    111. Campbell S
    112. Polasek O
    113. Bonnycastle LL
    114. Hercberg S
    115. Dimitriou M
    116. Bolton JL
    117. Fowkes GR
    118. Kovacs P
    119. Lindström J
    120. Zemunik T
    121. Bandinelli S
    122. Wild SH
    123. Basart HV
    124. Rathmann W
    125. Grallert H
    126. Maerz W
    127. Kleber ME
    128. Boehm BO
    129. Peters A
    130. Pramstaller PP
    131. Province MA
    132. Borecki IB
    133. Hastie ND
    134. Rudan I
    135. Campbell H
    136. Watkins H
    137. Farrall M
    138. Stumvoll M
    139. Ferrucci L
    140. Waterworth DM
    141. Bergman RN
    142. Collins FS
    143. Tuomilehto J
    144. Watanabe RM
    145. de Geus EJ
    146. Penninx BW
    147. Hofman A
    148. Oostra BA
    149. Psaty BM
    150. Vollenweider P
    151. Wilson JF
    152. Wright AF
    153. Hovingh GK
    154. Metspalu A
    155. Uusitupa M
    156. Magnusson PK
    157. Kyvik KO
    158. Kaprio J
    159. Price JF
    160. Dedoussis GV
    161. Deloukas P
    162. Meneton P
    163. Lind L
    164. Boehnke M
    165. Shuldiner AR
    166. van Duijn CM
    167. Morris AD
    168. Toenjes A
    169. Peyser PA
    170. Beilby JP
    171. Körner A
    172. Kuusisto J
    173. Laakso M
    174. Bornstein SR
    175. Schwarz PE
    176. Lakka TA
    177. Rauramaa R
    178. Adair LS
    179. Smith GD
    180. Spector TD
    181. Illig T
    182. de Faire U
    183. Hamsten A
    184. Gudnason V
    185. Kivimaki M
    186. Hingorani A
    187. Keinanen-Kiukaanniemi SM
    188. Saaristo TE
    189. Boomsma DI
    190. Stefansson K
    191. van der Harst P
    192. Dupuis J
    193. Pedersen NL
    194. Sattar N
    195. Harris TB
    196. Cucca F
    197. Ripatti S
    198. Salomaa V
    199. Mohlke KL
    200. Balkau B
    201. Froguel P
    202. Pouta A
    203. Jarvelin MR
    204. Wareham NJ
    205. Bouatia-Naji N
    206. McCarthy MI
    207. Franks PW
    208. Meigs JB
    209. Teslovich TM
    210. Florez JC
    211. Langenberg C
    212. Ingelsson E
    213. Prokopenko I
    214. Barroso I
    215. DIAbetes Genetics Replication and Meta-analysis (DIAGRAM) Consortium
    (2012) Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways
    Nature Genetics 44:991–1005.
    https://doi.org/10.1038/ng.2385
    1. van den Borst B
    2. Gosker HR
    3. Zeegers MP
    4. Schols A
    (2010) Pulmonary function in diabetes: a metaanalysis
    Chest 138:393–406.
    1. Wray NR
    2. Ripke S
    3. Mattheisen M
    4. Trzaskowski M
    5. Byrne EM
    6. Abdellaoui A
    7. Adams MJ
    8. Agerbo E
    9. Air TM
    10. Andlauer TMF
    11. Bacanu SA
    12. Bækvad-Hansen M
    13. Beekman AFT
    14. Bigdeli TB
    15. Binder EB
    16. Blackwood DRH
    17. Bryois J
    18. Buttenschøn HN
    19. Bybjerg-Grauholm J
    20. Cai N
    21. Castelao E
    22. Christensen JH
    23. Clarke TK
    24. Coleman JIR
    25. Colodro-Conde L
    26. Couvy-Duchesne B
    27. Craddock N
    28. Crawford GE
    29. Crowley CA
    30. Dashti HS
    31. Davies G
    32. Deary IJ
    33. Degenhardt F
    34. Derks EM
    35. Direk N
    36. Dolan CV
    37. Dunn EC
    38. Eley TC
    39. Eriksson N
    40. Escott-Price V
    41. Kiadeh FHF
    42. Finucane HK
    43. Forstner AJ
    44. Frank J
    45. Gaspar HA
    46. Gill M
    47. Giusti-Rodríguez P
    48. Goes FS
    49. Gordon SD
    50. Grove J
    51. Hall LS
    52. Hannon E
    53. Hansen CS
    54. Hansen TF
    55. Herms S
    56. Hickie IB
    57. Hoffmann P
    58. Homuth G
    59. Horn C
    60. Hottenga JJ
    61. Hougaard DM
    62. Hu M
    63. Hyde CL
    64. Ising M
    65. Jansen R
    66. Jin F
    67. Jorgenson E
    68. Knowles JA
    69. Kohane IS
    70. Kraft J
    71. Kretzschmar WW
    72. Krogh J
    73. Kutalik Z
    74. Lane JM
    75. Li Y
    76. Li Y
    77. Lind PA
    78. Liu X
    79. Lu L
    80. MacIntyre DJ
    81. MacKinnon DF
    82. Maier RM
    83. Maier W
    84. Marchini J
    85. Mbarek H
    86. McGrath P
    87. McGuffin P
    88. Medland SE
    89. Mehta D
    90. Middeldorp CM
    91. Mihailov E
    92. Milaneschi Y
    93. Milani L
    94. Mill J
    95. Mondimore FM
    96. Montgomery GW
    97. Mostafavi S
    98. Mullins N
    99. Nauck M
    100. Ng B
    101. Nivard MG
    102. Nyholt DR
    103. O'Reilly PF
    104. Oskarsson H
    105. Owen MJ
    106. Painter JN
    107. Pedersen CB
    108. Pedersen MG
    109. Peterson RE
    110. Pettersson E
    111. Peyrot WJ
    112. Pistis G
    113. Posthuma D
    114. Purcell SM
    115. Quiroz JA
    116. Qvist P
    117. Rice JP
    118. Riley BP
    119. Rivera M
    120. Saeed Mirza S
    121. Saxena R
    122. Schoevers R
    123. Schulte EC
    124. Shen L
    125. Shi J
    126. Shyn SI
    127. Sigurdsson E
    128. Sinnamon GBC
    129. Smit JH
    130. Smith DJ
    131. Stefansson H
    132. Steinberg S
    133. Stockmeier CA
    134. Streit F
    135. Strohmaier J
    136. Tansey KE
    137. Teismann H
    138. Teumer A
    139. Thompson W
    140. Thomson PA
    141. Thorgeirsson TE
    142. Tian C
    143. Traylor M
    144. Treutlein J
    145. Trubetskoy V
    146. Uitterlinden AG
    147. Umbricht D
    148. Van der Auwera S
    149. van Hemert AM
    150. Viktorin A
    151. Visscher PM
    152. Wang Y
    153. Webb BT
    154. Weinsheimer SM
    155. Wellmann J
    156. Willemsen G
    157. Witt SH
    158. Wu Y
    159. Xi HS
    160. Yang J
    161. Zhang F
    162. Arolt V
    163. Baune BT
    164. Berger K
    165. Boomsma DI
    166. Cichon S
    167. Dannlowski U
    168. de Geus ECJ
    169. DePaulo JR
    170. Domenici E
    171. Domschke K
    172. Esko T
    173. Grabe HJ
    174. Hamilton SP
    175. Hayward C
    176. Heath AC
    177. Hinds DA
    178. Kendler KS
    179. Kloiber S
    180. Lewis G
    181. Li QS
    182. Lucae S
    183. Madden PFA
    184. Magnusson PK
    185. Martin NG
    186. McIntosh AM
    187. Metspalu A
    188. Mors O
    189. Mortensen PB
    190. Müller-Myhsok B
    191. Nordentoft M
    192. Nöthen MM
    193. O'Donovan MC
    194. Paciga SA
    195. Pedersen NL
    196. Penninx B
    197. Perlis RH
    198. Porteous DJ
    199. Potash JB
    200. Preisig M
    201. Rietschel M
    202. Schaefer C
    203. Schulze TG
    204. Smoller JW
    205. Stefansson K
    206. Tiemeier H
    207. Uher R
    208. Völzke H
    209. Weissman MM
    210. Werge T
    211. Winslow AR
    212. Lewis CM
    213. Levinson DF
    214. Breen G
    215. Børglum AD
    216. Sullivan PF
    217. eQTLGen
    218. 23andMe
    219. Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium
    (2018) Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression
    Nature Genetics 50:668–681.
    https://doi.org/10.1038/s41588-018-0090-3

Decision letter

  1. Chris P Ponting
    Reviewing Editor; University of Edinburgh, United Kingdom
  2. David E James
    Senior Editor; The University of Sydney, Australia
  3. Bogdan Pasaniuc
    Reviewer; UCLA, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This paper is of interest to researchers seeking to use genetics in order to reposition drugs that improve lung function. The work highlights biochemical traits that could be targeted to modulate lung function. The analyses have been performed to a high level, with some of the most interesting and novel results being of modest statistical significance.

Decision letter after peer review:

[Editors’ note: the authors submitted for reconsideration following the decision after peer review. What follows is the decision letter after the first round of review.]

Thank you for submitting your work entitled "Genetic association and causal inference converge on hyperglycaemia as a modifiable risk factor for respiratory disease" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Bogdan Pasaniuc (Reviewer #3).

Our decision has been reached after consultation among all the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

Reviewers considered that the manuscript's focus on rapidly repositioning drugs to improve lung function is timely. They, however, remained unconvinced by some of the main claims and their novelty, and expressed concerns regarding the robustness of results that are based on border-line statistics. Replication and/or validation of the main results might have helped to convince the reviewers. Further consideration of other explanatory variables would also have provided greater robustness to the claims of causality.

Reviewer #1:

This paper seeks to use genetics to reposition drugs to improve lung function. A new pipeline is used to connect together several recent genetics methods: filtering traits on genetic correlations; refining with causality tests (LCV+MR); and then testing lung function and gene expression with PES and TWAS.

Overall, I am quite positive on the goals and core ideas in this paper. The focus on quickly repositioning drugs to improve lung function is timely. However, the most interesting and novel results are statistically borderline. Methodologically, I do not see anything new in the paper.

1) There is not much evidence that the PES have enriched signal for lung function (Table 3). The primary PES analyses do not adjust for the overall PRS and hence do not establish Pharmagenic Enrichment; a PRS built on a random subset of SNPs would be expected to have a nonzero effect. The authors recognize this and perform secondary analyses conditional on the PRS to test for enrichment; however, this analysis has null results (Bonferroni adjustment across 8 tests gives a minimum p > 0.2).

(1a) I don't see any explanation of the permutation test used here, but the details with respect to multiple PES thresholds, and whether covariates and PRS are permuted as well as the PES, are essential and can significantly change results.

2) The PYGB finding is very nice, but was previously found in a similar analysis (in the paper producing the summary stats used here, see Table 1 in Shrine et al., 2019, PMC6397078). This paper also identified the TGF-β superfamily signalling pathway. The observation that this gene can be putatively targeted by Sivelestat is novel (as far as I know) and potentially very exciting; however, this is not discussed much, no validation is given for the gene-drug interaction, and no explanation is given to relate neutrophil elastase to glucose.

3) I believe the COVID-19 analysis assesses only glycemic pathways (Table 4), hence it is hard to evaluate whether the “prey proteins” are more enriched in glycemic pathways than in any other biologically meaningful pathways (further, in the Discussion, it is said that these genes are very pleiotropic). In the future, I think this analysis could be strengthened by testing the PES (or ordinary PRS) against measurements of these proteins in healthy samples, which would demonstrate the link from druggable (or general) glucose biology to the COVID-19-relevant proteins. However, nontrivial effort would be required to integrate such pQTL summary statistics, though I believe such datasets are freely available.

(4a) The LCV paper recommends considering only tests with |GCP| > 0.6; this rules out the LCV test for FEV1-glucose, FEV1/FVC-HDL, or FVC-leptin. If there is a reason to deviate from the recommended practice, it should be explained.
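
The uniform application of this threshold can be sketched as follows. This is only an illustration: the trait-pair names and GCP values are hypothetical placeholders, not estimates from the manuscript, and the helper `passes_gcp` is an assumed name rather than part of any LCV software.

```python
# Minimal sketch: apply the |GCP| > 0.6 rule identically to every trait pair.
# All names and values below are hypothetical and for illustration only.
GCP_THRESHOLD = 0.6


def passes_gcp(gcp: float, threshold: float = GCP_THRESHOLD) -> bool:
    """Return True only if the absolute posterior mean GCP strictly exceeds the threshold."""
    return abs(gcp) > threshold


# Hypothetical posterior mean GCP estimates keyed by trait pair.
lcv_results = {
    "trait_A -> outcome_1": 0.72,
    "trait_B -> outcome_1": 0.55,
    "trait_C -> outcome_2": -0.81,
}

# Keep only the pairs that clear the threshold; no pair gets special treatment.
retained = [pair for pair, gcp in lcv_results.items() if passes_gcp(gcp)]
print(retained)
```

Using a strict inequality and the absolute value means that a pair with an estimate just below 0.6 is excluded regardless of how strong its other evidence appears.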

(4b) Likewise, the MR analyses have only a very weak statistical signal (p = 0.02, 0.03): (1) this doesn't survive correction for testing two phenotypes (not to mention the implicit tests prioritized by rhog that were discarded based on LCV); (2) the LCV paper proves these tests are susceptible to inflation by genetic correlation; (3) I do not agree that horizontal pleiotropy has been ruled out: a priori it seems almost certain that many heritable traits (BMI, smoking, diet, exercise, …) will causally affect both glucose and lung function, to some extent, and moreover you do show that AMT has near-significant TWAS effects on both smoking and glucose.

Reviewer #2:

In this study, Reay et al. used publicly available GWAS data with regards to lung function and biochemical traits to identify molecular mechanisms capable of improving lung function. There are several comments and questions:

1) Reay et al. identified multiple biochemical traits that could be targeted in order to modulate lung function, including fasting glucose and fasting insulin levels as well as other glycaemic-related pathways and traits. Of these traits they also identify four gene-sets overrepresented with proteins that interact with viral SARS-CoV-2 proteins. However, as mentioned by Reay et al., previous studies have already found glycaemic control in the form of diabetes to have an effect on both lung function [Klein et al. Diabet Med. 2010] as well as COVID-19 risk and severity [Yang et al., Int J Infect Dis. 2020 94:91-95.]. The results presented here are not groundbreaking on their own.

2) In addition, the authors have employed several statistical methods using existing datasets and performed a comprehensive analysis. However, all of the approaches are from literature, which limits the novelty of this study.

3) The benefit of using the framework proposed by Reay et al. is that it identifies potential new uses for existing drugs through the biochemical traits they modulate. This, however, means that the potential discoveries regarding drug repurposing are limited to only those compounds with known biochemical effects. Another limitation is the exclusive use of genetic data, without integrating more layers of information that might identify causal traits for any given disease. Various other approaches have been extensively reported before (e.g. Pushpakom et al. Nat Rev Drug Discov. 2019), which raises the question of how this methodology can be adapted in order to maximize the possible findings.

4) The authors claimed that "The correlation between the expression of genes within each pathway encompassed by the PES and the PES profiles themselves could provide further support for their biological impact". This is true when the expression data of the genes come from the relevant tissues. Here, the authors focus on lung function but performed "association between lung function PES and gene expression using RNA sequencing (RNAseq) on transformed lymphoblastoid cell lines (LCL)". My question is how is LCL relevant for lung function?

5) Another question is how these findings can be validated either in vivo or in vitro. Figure 5 shows a schematic representation of how treatment could be implemented, but it is unclear if any validation experiments have been performed.

6) Throughout this study, the authors used three measurements of spirometry phenotypes for lung function. The results and interpretation in this study should therefore be limited to "lung function". However, the authors generalized their observations from "lung function" to "respiratory disease and respiratory infection". This can be misleading (too far-reaching). For example, lung function is often measured by dynamic spirometry, which mostly reflects large airway function. However, a respiratory disease like COPD is an inflammatory airway disease which affects the small airways in particular (DS Postma, NEJM, 2015). Furthermore, "respiratory infection" by bacterial and viral pathogens such as tuberculosis, influenza and coronaviruses may lead to completely different pathogenesis. It is hard to believe that hyperglycaemia will have a causal effect on these respiratory diseases (infections).

7) For all candidate genes identified by TWAS analysis, they could be further prioritized by checking if they are differentially expressed in the lung between samples with and without impaired lung functions.

8) The authors stated "Probabilistic finemapping of these transcriptome-wide significant regions using a multi-tissue reference panel was then performed to prioritize whether these genes are likely causal at that locus". What is the reasoning behind the prioritization using “multi-tissue reference”? It is known that the majority of cis-eQTLs are shared across tissues, but how could this help to prioritize relevant genes?

Reviewer #3:

The manuscript by Reay et al. presents a set of comprehensive analyses of GWAS data to postulate the causal role of hyperglycemia in lung function. The authors perform a series of causal inference analyses on the GWAS data of several blood traits to identify genetically correlated traits that can be explained by a causal role; the authors then seek to identify drug repurposing targets through two complementary analyses, a polygenic risk score restricted to regions within druggable targets and a transcriptome-wide scan linking genetically predicted expression in blood and lung tissue to lung function. Overall, the manuscript leverages recently introduced sophisticated statistical methods and does a thorough job in stress testing the findings. The putative causal role of fasting glucose, together with the putative target genes, is an important addition to the field. My main comments relate to the robustness of the causal claims.

1) The MR analyses assume the blood traits (i.e. fasting glucose) are mediating lung function. Whereas several biologically plausible avenues are given in the discussion for this assumption, it can certainly be the case that lung function is mediating fasting glucose (e.g., lung function causing overall body impairment which in turn causes changes in blood measurements). I strongly encourage the authors to perform analyses under this reverse causality assumption. In particular, the bivariate MR method of Pickrell NG 2016 would be relevant here.
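
The idea of testing both causal directions can be sketched with a simple fixed-effect inverse-variance weighted (IVW) estimator. This is a simplified stand-in, not the bivariate method of Pickrell et al., and all SNP effect sizes below are hypothetical numbers chosen for illustration. Each direction is assumed to use its own set of instruments selected for the exposure in question.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and standard error from per-SNP summary statistics."""
    # Weighted regression of outcome effects on exposure effects through the origin,
    # with weights 1 / se_outcome**2.
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical forward direction: glucose instruments -> lung function.
forward_est, forward_se = ivw_estimate(
    beta_exposure=[0.10, 0.20, 0.15],    # SNP effects on glucose (illustrative)
    beta_outcome=[-0.02, -0.04, -0.03],  # same SNPs' effects on lung function
    se_outcome=[0.01, 0.01, 0.01],
)

# Hypothetical reverse direction: lung function instruments -> glucose.
reverse_est, reverse_se = ivw_estimate(
    beta_exposure=[0.05, 0.08],  # SNP effects on lung function (illustrative)
    beta_outcome=[0.0, 0.0],     # same SNPs' effects on glucose
    se_outcome=[0.01, 0.01],
)

print(f"forward: {forward_est:.3f} (SE {forward_se:.3f})")
print(f"reverse: {reverse_est:.3f} (SE {reverse_se:.3f})")
```

A clear estimate in one direction with a null estimate in the other would be more consistent with a unidirectional effect, although IVW alone does not address pleiotropy.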

2) As the authors describe in the Discussion section, wrong assumptions in the MR framework can invalidate the findings. The authors do a great job in assaying the impact of pleiotropy on the MR estimates using recently developed methods (LCV, MR-PRESSO etc); however the causal role of smoking is left ambiguous in the causal inference. Clearly smoking has a causal role on lung function, and GWAS of smoking reveals genetic correlates of smoking status (amount). Is there any impact of smoking on blood traits? Is smoking a collider in the causal diagram genetics -> fasting glucose -> lung function? The authors have access to GWAS of smoking and could leverage the MR toolkit to investigate causal effects of smoking on glucose.

3) The identification of drug repurposing tools using the PES score is inconclusive without some replication/validation. The PES explains a small proportion of variation in the trait, making the interpretation of PES correlations subtle at best; e.g., it is hard to find a biological role for some of the gene-sets that show significance in Table 2. More importantly, it is unclear what the null expectation of the PES-gene expression correlation analysis is; that is, if the PES is computed using random pathways (i.e. not specific to druggable pathways) and the analyses are re-run, what are the results? Or, conversely, if the authors perform the same analyses for a randomly chosen complex trait (e.g., height/BMI), what pathways show up in Tables 2/3?
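
One way to picture the null expectation requested here is an empirical null built from randomly drawn gene-sets of the same size as the druggable set. The sketch below is purely illustrative: the per-gene statistics are simulated, and `score_gene_set` is a hypothetical placeholder for whatever association statistic (for example, a PES-expression correlation) would actually be computed.

```python
import random


def score_gene_set(gene_stats, gene_set):
    """Hypothetical set-level statistic: mean of per-gene statistics in the set."""
    return sum(gene_stats[g] for g in gene_set) / len(gene_set)


def empirical_p(observed, null_stats):
    """Two-sided empirical P value with the usual +1 correction."""
    exceed = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return (exceed + 1) / (len(null_stats) + 1)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Simulated per-gene statistics (assumption: standard normal under the null).
genes = [f"gene_{i}" for i in range(500)]
gene_stats = {g: rng.gauss(0.0, 1.0) for g in genes}

# Hypothetical druggable gene-set of 20 genes.
druggable_set = genes[:20]
observed = score_gene_set(gene_stats, druggable_set)

# Null distribution from 1000 random gene-sets matched on size.
null_stats = [
    score_gene_set(gene_stats, rng.sample(genes, len(druggable_set)))
    for _ in range(1000)
]

p_value = empirical_p(observed, null_stats)
print(f"observed = {observed:.3f}, empirical P = {p_value:.3f}")
```

In a real analysis the random sets would likely also need matching on properties such as gene length or SNP count, which this sketch does not attempt.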

[Editors’ note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Genetic association and causal inference converge on hyperglycaemia as a modifiable risk factor for respiratory disease" for consideration by eLife. Two of the three reviewers provided further comments on your manuscript and there followed extensive discussion among them and eLife Editors. We note that the third reviewer previously shared concerns of robustness of results based on border-line statistics and requested that the causality analysis needed to be improved. The evaluation was overseen by a Reviewing Editor and David James as the Senior Editor. The Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation. Nevertheless, the reviewers and editors emphasise that as currently written the work is not yet ready for publication.

Summary:

The authors used publicly available GWAS data with regards to lung function and biochemical traits seeking to use genetics to reposition drugs to improve lung function. The authors sought to identify drug repurposing targets through two complementary analyses: a polygenic risk score restricted to regions within druggable targets and a transcriptome-wide scan linking genetically predicted expression in blood and lung tissue to lung function. Reviewers valued the sophisticated approaches applied in this analysis although questioned the authors' statistical interpretations noting that some results were borderline significant.

Revisions:

The joint view of the reviewers and eLife Editors is that several of the reviewers' comments have been addressed adequately. Nevertheless, other reviewers' comments were not resolved by your revision, most importantly Points 1 and 4 of Reviewer 1. As a group we would be more supportive of the revision if the causality claims were to be toned down with appropriate caveats included throughout, and if the statistical results were to be more appropriately presented. The current version, without such changes, is not judged to be ready for publication.

We are content that you now acknowledge the LCV publication's recommendation that only tests with |GCP| > 0.6 are considered. However, your revision does not follow this recommendation unswervingly: e.g. "We acknowledge that the posterior mean GCP estimate for the FEV1 does not quite the threshold of > 0.6, and thus, the causal relationship was more rigorous with FVC". Moreover, your revision unnecessarily clouds this important issue: "|GCP| > 0.6 previously postulated to be evidence of a rigorous relationship". We do not support uneven application of an established threshold.

It is now acknowledged that some tests were not significant after correcting for multiple tests. Nevertheless, an unwarranted emphasis was sometimes placed on non-significant results, e.g. "However, this still suggested that there was a relationship between the Class B/2 secretin family receptor FVC PES and FVC beyond what is attributable to a genome-wide PGS" and "several gene-sets trended towards surviving correction". Similar problems identified among the MR tests and the adaptive choices for PRS p-value thresholds still need to be addressed. The MR and PES results have relatively weak statistical support and yet this is not reflected by an emphasis placed on them in the Abstract. We are of the view that marginally significant (or null) results can still provide a significant contribution to the field as long as their statistical support is reported appropriately. The manuscript will require changes to the application and interpretation of statistical tests throughout.

https://doi.org/10.7554/eLife.63115.sa1

Author response

[Editors’ note: The authors appealed the original decision. What follows is the authors’ response to the first round of review.]

Reviewers considered that the manuscript's focus on rapidly repositioning drugs to improve lung function is timely. They, however, remained unconvinced by some of the main claims and their novelty, and expressed concerns regarding the robustness of results that are based on border-line statistics. Replication and/or validation of the main results might have helped to convince the reviewers. Further consideration of other explanatory variables would also have provided greater robustness to the claims of causality.

Reviewer #1:

This paper seeks to use genetics to reposition drugs to improve lung function. A new pipeline is used to connect together several recent genetics methods: filtering traits on genetic correlations; refining with causality tests (LCV+MR); and then testing lung function and gene expression with PES and TWAS.

Overall, I am quite positive on the goals and core ideas in this paper. The focus on quickly repositioning drugs to improve lung function is timely. However, the most interesting and novel results are statistically borderline. Methodologically, I do not see anything new in the paper.

1) There is not much evidence that the PES have enriched signal for lung function (Table 3). The primary PES analyses do not adjust for the overall PRS and hence do not establish Pharmagenic Enrichment; a PRS built on a random subset of SNPs would be expected to have a nonzero effect. The authors recognize this and perform secondary analyses conditional on the PRS to test for enrichment; however, this analysis has null results (Bonferroni adjustment across 8 tests gives a minimum p > 0.2).

The reviewer is correct to point out that secondary analyses conditional on the overall genome-wide score for models testing PES are an important component. However, we would posit that this does sacrifice some power, particularly as some of the PES display a significant correlation with genome-wide lung function PGS. We agree that the association observed between the Class B/2 secretin PES and FVC was only nominal given the number of tests performed, and we state this in the text. In accordance with the reviewer’s comments, we have edited that section of the manuscript to state the multiple testing burden associated with the results more clearly, such that it is more interpretable to the reader.

(1a) I don't see any explanation of the permutation test used here, but the details wrt multiple PES thresholds and whether covariates and PRS are permuted as well as the PES are essential and can significantly change results.

We agree with the reviewer that our description of the permutation terminology was somewhat lacking. In addition, upon reviewing the manuscript we decided that it would enhance the interpretability of our results to provide raw P values and correct for the number of tests using the Bonferroni method. The manuscript has now been adjusted accordingly. Importantly, this did not materially change which PES were significantly associated.
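As a minimal sketch of the Bonferroni adjustment referred to here, each raw P value is multiplied by the number of tests and capped at 1. The P values below are hypothetical and are not taken from the manuscript.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted P values: p * number_of_tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw P values for eight scores tested in one cohort.
raw_p = [0.03, 0.2, 0.5, 0.9, 0.01, 0.6, 0.7, 0.8]
adjusted_p = bonferroni_adjust(raw_p)
```

With eight tests, a nominally significant raw P value of 0.03 becomes 0.24 after adjustment, which is why nominal associations can fail to survive correction.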

2) The PYGB finding is very nice, but was previously found in a similar analysis (in the paper producing the summary stats used here, see Table 1 in Shrine et al., 2019, PMC6397078). This paper also identified the TGF-β superfamily signalling pathway. The observation that this gene can be putatively targeted by Sivelestat is novel (as far as I know) and potentially very exciting, however, this is not discussed much, and no validation is given for the gene-drug interaction, and no explanation is given to relate neutrophil elastase to glucose.

We agree with the reviewer that PYGB has previously been linked to lung function via the GWAS performed in the Shrine et al. manuscript. However, we would argue that the finding in this study is novel given we are able to integrate gene expression data via TWAS and subsequent Bayesian finemapping to assign a direction of effect to this association, that is, downregulation was associated with increased FEV1. The finemapping procedure is particularly valuable in this situation given that we can be more confident in the robustness of the involvement of this gene in pulmonary biology. The putative interaction between PYGB and Sivelestat was derived using public databases as curated by DGIdb v.3.0.2, and we agree with the reviewer that the potential uncertainty of this interaction needs to be stated to the reader. We have edited the TWAS section of the manuscript as such to address this. Moreover, we have added that PYGB and the gene that encodes neutrophil elastase have a high confidence interaction as per the STRING database.

Furthermore, we acknowledge that the role of TGF-β signalling has been previously characterised in the literature in terms of its involvement in lung function. We would argue that using this pathway to construct a polygenic score that could inform drug repurposing via the pharmagenic enrichment score approach is the novel component of our study, as opposed to discovering genetic support for its role in the lung.

3) I believe the covid analysis assesses only glycemic pathways (Table 4), hence it is hard to evaluate whether the “prey proteins” are more enriched in glycemic pathways than in any other biologically meaningful pathways (further, in the Discussion, it is said that these genes are very pleiotropic). In the future, I think this analysis could be strengthened by testing the PES (or ordinary PRS) against measurements of these proteins in healthy samples, which would demonstrate the link from druggable (or general) glucose biology to the covid-relevant proteins. However, nontrivial effort would be required to integrate such pQTL summary statistics, though I believe such datasets are freely available.

The reviewer makes an important comment that the enrichment of “prey proteins” in the glycaemic pathways needs appropriate contextualisation. It should be noted that the glycaemic gene-sets survived multiple testing correction (q < 0.05) after testing for an overrepresentation of these genes in every gene-set featured by the GENE2FUNC tool in FUMA (thousands of pathways), and thus, whilst these pathways may not necessarily be the most meaningful gene-sets that display an overrepresentation of SARS-CoV2 “prey proteins”, their overrepresentation is statistically robust and relevant given our overarching results related to the importance of glycaemic biology to the lung. It is an excellent suggestion by the reviewer to test the association between PGS and/or PES with protein expression of SARS-CoV2 “prey proteins”. We believe that this could be an important future direction arising from this study and have included this in the discussion.
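The notion of overrepresentation used in this response can be sketched with a one-sided hypergeometric test. This is an assumption made for illustration: it is a common choice for gene-set overrepresentation, but the sketch does not reproduce the FUMA tool itself, and the function name and counts are hypothetical.

```python
from math import comb


def overrepresentation_p(n_background, n_in_set, n_query, n_overlap):
    """P(overlap >= n_overlap) when n_query genes are drawn at random
    from n_background genes, of which n_in_set belong to the gene-set."""
    upper = min(n_in_set, n_query)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 5 in the gene-set,
# 5 query genes, all 5 of which fall in the gene-set.
p_value = overrepresentation_p(10, 5, 5, 5)
```

When thousands of gene-sets are tested in this way, the resulting P values need multiple-testing correction, which is the q < 0.05 criterion mentioned above.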

(4a) The LCV paper recommends considering only tests with |GCP|>.6, this rules out the LCV test for FEV1-glucose, FEV1/FVC-HDL, or FVC-leptin. If there is a reason to deviate from the recommended practice, it should be explained.

We agree with the reviewer that the |GCP| > 0.6 threshold needs to be more clearly stated and utilised in the contextualisation of the results. We have edited that section of the manuscript accordingly.
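Applying the recommended threshold uniformly amounts to a simple filter on the posterior mean GCP. The sketch below is illustrative only: the trait-pair labels and GCP values are hypothetical and are not estimates from the manuscript.

```python
def passes_gcp_threshold(gcp_posterior_mean, threshold=0.6):
    """True when the absolute posterior mean GCP exceeds the threshold."""
    return abs(gcp_posterior_mean) > threshold


# Hypothetical posterior mean GCP estimates for three trait pairs.
estimates = {"trait pair A": 0.75, "trait pair B": 0.45, "trait pair C": -0.62}
retained = [pair for pair, gcp in estimates.items() if passes_gcp_threshold(gcp)]
```

The absolute value is used because the sign of the GCP indicates the direction of partial genetic causality rather than its strength.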

(4b) Likewise, the MR analyses have only a very weak statistical signal (p=.02, .03): (1) this doesn't survive correction for testing two phenotypes (not to mention the implicit tests prioritized by rhog that were discarded based on LCV); (2) the LCV paper proves these tests are susceptible to inflation by genetic correlation; (3) I do not agree that horizontal pleiotropy has been ruled out: a priori it seems almost certain that many heritable traits (BMI, smoking, diet, exercise, …) will causally affect both glucose and lung function, to some extent, and moreover you do show that AMT has near-significant TWAS effects on both smoking and glucose.

We respectfully disagree with the characterisation of the MR data here as only providing weak statistical support and respond to each point below:

1) It is true that the inverse-variance weighted estimator with random effects yields only nominally significant results; however, this overlooks the robustness, and similarity, of the point estimates from both the MR-Egger and Weighted Median models, which have completely different assumptions related to IV validity, as discussed in the manuscript. Furthermore, we would posit that the MR was utilised here as a validation of the relationship observed in the LCV model, and thus, is not our primary evidence for an effect of fasting glucose on lung function. The utility of the MR here is that it does provide a point estimate of the putative effect of the exposure on the outcome, which is not afforded in the LCV framework.

2) We agree with the reviewer that MR estimates may be inflated by genetic correlation, however, our GCP estimates from the LCV model supported the causal relationship, and thus, there is still evidence for a causal effect even if there is some inflation of the MR estimate due to genetic correlation.

3) We also agree that horizontal pleiotropy cannot be ruled out, and indeed, this is not possible statistically or biologically. In spite of that, we perform a number of analyses to try and determine whether there is a confounding effect of horizontal pleiotropy on the MR estimate, as outlined in the manuscript. We believe it is particularly important that we biologically annotated potential outlier instrumental variables as having direct relevance to glycaemic biology, for example, an instrumental variable SNP in the glucokinase gene had a particularly large effect (Supplementary file 1G). We also devote a paragraph to the potential confounding effect of smoking. We did not find any evidence of genetic causality between smoking and fasting glucose via the LCV model, whilst none of the fasting glucose IVs were associated with smoking.
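The inverse-variance weighted estimator with random effects mentioned in point 1 can be sketched from summary statistics as follows. This is a minimal illustration assuming the standard multiplicative random-effects form, in which the fixed-effect standard error is inflated when Cochran's Q indicates heterogeneity. The function name and the SNP effects are hypothetical, and MR-Egger and the weighted median are not shown.

```python
import math


def ivw_random_effects(beta_exposure, beta_outcome, se_outcome):
    """IVW causal estimate and standard error (multiplicative random effects)."""
    # Per-SNP Wald ratios and their inverse-variance weights.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight

    # Cochran's Q measures heterogeneity between the per-SNP ratio estimates.
    q_stat = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    dof = len(ratios) - 1
    # Inflate the variance only when there is over-dispersion (Q / dof > 1).
    scale = max(1.0, q_stat / dof) if dof > 0 else 1.0
    std_error = math.sqrt(scale / total_weight)
    return estimate, std_error


# Hypothetical SNP effects on the exposure and the outcome.
estimate, std_error = ivw_random_effects(
    beta_exposure=[0.1, 0.2],
    beta_outcome=[-0.02, -0.04],
    se_outcome=[0.01, 0.01],
)
```

Because the point estimate is a weighted average of per-SNP ratios, it can be compared directly with estimates from models that make different assumptions about instrument validity, which is the comparison drawn in point 1.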

Reviewer #2:

In this study, Reay et al. used publicly available GWAS data with regards to lung function and biochemical traits to identify molecular mechanisms capable of improving lung function. There are several comments and questions:

1) Reay et al. identified multiple biochemical traits that could be targeted in order to modulate lung function including fasting glucose and fasting insulin levels as well as other glycaemic related pathways and traits. Of these traits they also identify four gene-sets overrepresented with proteins that interact with viral SARS-CoV2 proteins. However as mentioned by Reay et al., previous studies have already found glycaemic control in the form of diabetes to have an effect on both lung function [Klein et al. Diabet Med. 2010] as well as Covid-19 risk and severity [Yang et al., Int J Infect Dis. 2020 94:91-95.]. The results presented here are not groundbreaking on their own.

We respectfully disagree with the reviewer on this point. The studies described in our manuscript that have previously supported the relationship between glycaemic biology and/or diabetes and lung function have been observational in nature. The key difference in our work is that we leverage genetics to provide novel evidence of a potential causal relationship between blood glucose and lung function. We achieved this by implementing two approaches: a latent causal variable model and Mendelian Randomisation. These methods and their relationship to causal inference, rather than observational association alone, have been described extensively previously – for instance: PMID: 30374074, PMID: 30002074, PMID: 12689998, PMID: 22607825, PMID: 32249995, and PMID: 29686387. In our manuscript, we also provide an explanation of these methods, as well as their caveats and limitations. We also state that a well-powered, replicated, randomised-control trial would be needed to confirm the causal nature of this relationship. However, our data provides greater justification for an RCT of antihyperglycaemic compounds given the evidence for causality provided by this study. In light of the reviewer’s comments, we have edited the discussion to emphasise the nature of findings relative to previous literature.

2) In addition, the authors have employed several statistical methods using existing datasets and performed a comprehensive analysis. However, all of the approaches are from literature, which limits the novelty of this study.

We would assert in response to the reviewer’s concern that the aim of this study was to synthesise publicly available genomic information to rapidly prioritise drug repurposing candidates. As a result, whilst the techniques applied in this study have been developed previously, we believe that the synthesis of causal inference, polygenic scoring, and transcriptomic imputation for drug repurposing in a single integrated study enhances the novelty of the study. It should also be noted that several of the drug repurposing candidates proposed in our study have never previously been identified in the literature, whilst the candidates proposed previously receive greater support from these data.

3) The benefit of using the framework proposed by Reay et al. is that it identifies potential new uses for existing drugs through the biochemical traits they modulate. This however means that the potential discoveries regarding drug repurposing are limited to only those compounds with known biochemical effects. Another limitation is the use of genetics data exclusively and not integrating more layers of information that might identify causal traits for any given disease. Various other approaches have extensively been reported before (e.g. Pushpakom et al. Nat Rev Drug Discov. 2019) which creates the question as to how this methodology can be edited in order to maximize the possible findings.

We agree with the reviewer that a limitation of our work is that we focus primarily on approved compounds and those with characterised molecular effects. Our reasoning for this was to identify drug repurposing candidates that could be utilised most readily; however, this could be expanded in future work. We also acknowledge that other data besides genetics could be integrated for drug repurposing, although this would be the focus of future study. We have added some text to the discussion which acknowledges the reviewer’s comment that other data could be included beyond genotype information.

4) The authors claimed that "The correlation between the expression of genes within each pathway encompassed by the PES and the PES profiles themselves could provide further support for their biological impact". This is true when the expression data of the genes come from the relevant tissues. Here, the authors focus on lung function but performed "association between lung function PES and gene expression using RNA sequencing (RNAseq) on transformed lymphoblastoid cell lines (LCL)". My question is how is LCL relevant for lung function?

We agree with the reviewer that lung tissue would be the ideal tissue to investigate. However, the LCL dataset is a large and easily accessible collection of samples with matched genome and RNA sequencing, and thus, we utilised it in this study. We have added a caveat in the respective section of the manuscript that these relationships should be explored using lung tissue in future.

5) Another question is as to how these findings can be validated either in vivo or in vitro. Figure 5 shows a schematic representation on how treatment could be implemented but it is unclear if any validation experiments have been performed.

We thank the reviewer for this comment. We believe that the ideal validation strategy for this approach would be to test its utility in a clinical trial setting, as was outlined in the manuscript. In vitro analysis of polygenic scores remains challenging, although approaches like patient-derived cell lines would be a useful future avenue.

6) Throughout this study, the authors used three measurements of spirometry phenotypes for lung function. Then, the results and interpretation in this study should be limited to "lung function". However, the authors generalized their observations from "lung function" to "respiratory disease and respiratory infection". This can be misleading (too far-reaching). For example, lung function is often measured by dynamic spirometry which mostly reflects large airway function. However, respiratory disease like COPD is an inflammatory airway disease which affects the small airways in particular (DS Postma, NEJM, 2015). Furthermore, "respiratory infection" by bacterial and viral infection such as tuberculosis, influenza and coronaviruses may lead to completely different pathogenesis. It is hard to believe that hyperglycaemia will have a causal effect on these respiratory diseases (infections).

We appreciate the reviewer’s concern that it may not seem logical that such a basic metabolic parameter can have such a broad-reaching genetic and epidemiological impact on lung function; however, as we did not hypothesize this, the “logic” is not relevant. We discovered this influence using an unbiased (hypothesis-free) data-driven approach. As with many things in science, there is not always an immediate explanation of a discovery that accords with what is known about a particular phenomenon. While it is possible that not all respiratory disease has an impact on lung function, it is likely that lung function will have an impact on lung disease outcome. Lung function is a powerful quantitative trait and a biomarker of lung disease and lung-related morbidity/mortality. In other words, regardless of the cause of acute lung dysfunction, the underlying reserves in lung function will often determine survival and long-term consequences. If these risk factors can be modulated in affected individuals, it is likely to have a positive impact on the course of disease.

Given the reviewer’s comments we have modified the text to better explain the clinical significance of the relationship between lung function and lung disease. Furthermore, we have decided to edit the title of this manuscript as follows, which we believe avoids overgeneralising our data: “Genetic association and causal inference converge on hyperglycaemia as a modifiable factor to improve lung function.”

7) For all candidate genes identified by TWAS analysis, they could be further prioritized by checking if they are differentially expressed in the lung between samples with and without impaired lung functions.

This is a sound suggestion by the reviewer, and we have added this to the discussion as a future direction.

8) The authors stated "Probabilistic finemapping of these transcriptome-wide significant regions using a multi-tissue reference panel was then performed to prioritize whether these genes are likely causal at that locus". What is the reasoning behind the prioritization using “multi-tissue reference”? It is known that the majority of cis-eQTLs are shared across tissues, but how this could help to prioritize relevant genes?

Our reasoning for including a multi-tissue panel in our finemapping analyses was to capture the maximum number of genes per locus that have significant cis-heritable models. In other words, our TWAS discovery analyses focused on lung and whole blood as likely the most relevant and/or well-powered tissues. The finemapping model then sought to prioritise whether our candidate TWAS gene was the most likely causal gene amongst proximally located genes. As a result, we used a multi-tissue panel to include the maximum number of genes in this model when computing the credible set, including genes that did not have a significantly cis-heritable model in lung or whole blood. The reasoning behind using a multi-tissue panel has been further described elsewhere, such as PMID: 30926970.

Reviewer #3:

The manuscript by Reay et al. presents a set of comprehensive analyses of GWAS data to postulate the causal role of hyperglycemia in lung function. The authors perform a series of causal inference analyses on the GWAS data of several blood traits to identify genetically correlated traits that can be explained by a causal role; the authors then seek to identify drug repurposing targets through two complementary analyses, a polygenic risk score restricted to regions within druggable targets and a transcriptome-wide scan linking genetically predicted expression in blood and lung tissue to lung function. Overall, the manuscript leverages recently introduced sophisticated statistical methods and does a thorough job in stress testing the findings. The putative causal role of fasting glucose joint with putative target genes is an important addition to the field. My main comments relate to the robustness of the causal claims.

1) The MR analyses assume the blood traits (i.e. fasting glucose) are mediating lung function. Whereas several biological plausible avenues are given in the discussion for this assumption, it can certainly be the case that lung function is mediating fasting glucose (e.g., lung function causing overall body impairment which in turn causes changes in blood measurements). I strongly encourage the authors to perform analyses under this reverse causality assumption. In particular, the bivariate MR method of Pickrell NG 2016 would be relevant here.

We agree with the reviewer that reverse causality is an important consideration when performing causal inference. We believe that our existing data does provide support for the direction of causal effect to primarily operate from fasting glucose to lung function as the sign of the posterior mean GCP is positive in the LCV model. This is likely indicative of the fact that variants exerting an effect on fasting glucose tend to have proportional effects on lung function, but not vice versa, as described in the original O’Connor and Price LCV manuscript. We have performed some additional analyses to further support that the direction of assumed effect from exposure to outcome is correct by utilising the MR-Steiger directionality test (PMID: 29149188). The concept underlying this approach is that the estimates of phenotypic variance explained by the IV SNPs for the exposure (fasting glucose) and the outcome (FEV1 or FVC) are compared to establish whether the assumed direction of causal effect, from exposure to outcome, is correct. We found that this was the case in both instances for FEV1 and FVC, with no evidence of reverse causality. This has now been added to the manuscript.
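The comparison of variance explained that underlies the directionality test can be sketched as below. This is a simplified illustration, not the published implementation: it assumes that per-SNP variance explained can be approximated from the squared t-statistic and the sample size, and that the two correlations are compared with a Fisher z-test for independent samples. All function names and input values are hypothetical.

```python
import math


def variance_explained(betas, std_errors, n_samples):
    """Approximate total R^2 across SNPs from summary statistics,
    using r^2 = t^2 / (t^2 + n - 2) for each SNP."""
    total = 0.0
    for beta, se in zip(betas, std_errors):
        t_squared = (beta / se) ** 2
        total += t_squared / (t_squared + n_samples - 2)
    return total


def steiger_direction(r2_exposure, n_exposure, r2_outcome, n_outcome):
    """Return (direction_supported, z, two_sided_p) comparing the two R^2 values."""
    z_exposure = math.atanh(math.sqrt(r2_exposure))
    z_outcome = math.atanh(math.sqrt(r2_outcome))
    z = (z_exposure - z_outcome) / math.sqrt(
        1.0 / (n_exposure - 3) + 1.0 / (n_outcome - 3)
    )
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return r2_exposure > r2_outcome, z, p_value


# Hypothetical values: instruments explain far more exposure variance.
supported, z_stat, p_value = steiger_direction(0.05, 10000, 0.001, 10000)
```

The assumed direction is supported when the instruments explain more variance in the exposure than in the outcome and the difference is statistically distinguishable from zero.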

2) As the authors describe in the Discussion section, wrong assumptions in the MR framework can invalidate the findings. The authors do a great job in assaying the impact of pleiotropy on the MR estimates using recently developed methods (LCV, MR-PRESSO etc); however the causal role of smoking is left ambiguous in the causal inference. Clearly smoking has a causal role on lung function, and GWAS of smoking reveals genetic correlates of smoking status (amount). Is there any impact of smoking on blood traits? Is smoking a collider in the causal diagram genetics -> fasting glucose -> lung function? The authors have access to GWAS of smoking and could leverage the MR toolkit to investigate causal effects of smoking on glucose.

We agree with the reviewer that the role of smoking is an important consideration in these results. We believe that our material as described provides support that our results are not an artefact of confounding due to smoking effects, although as ever with genetically informed causal inference, this cannot be definitively ruled out. Firstly, we demonstrated that glucose and smoking were genetically correlated using a GWAS of smoking heaviness. However, using the LCV approach we found no strong evidence of partial genetic causality between smoking and fasting glucose: whilst the GCP estimate was relatively high (GCP = -0.47), the large standard error rendered this point estimate difficult to interpret and statistically non-significant. We believe the most important evidence is that none of our fasting glucose SNPs utilised as IVs were associated with either smoking phenotype at genome-wide or suggestive significance, meaning the MR signal was unlikely to be biased by smoking.

We appreciate the suggestion from the reviewer in regard to whether our estimate may be impacted by collider bias as the lung function GWAS was covaried for smoking status. We performed some additional analyses using a smaller GWAS of FEV1 and FVC from Ben Neale’s group, which used the UK Biobank cohort only and was not adjusted for smoking (N = 272,338). We found that the posterior mean GCP estimates and IVW estimates for MR were in the same direction, with no evidence for the role of smoking as a collider variable. This new material has now been added to the manuscript.

3) The identification of drug repurposing tools using the PES score is inconclusive without some replication/validation. The PES is explaining a small proportion of variation in the trait making the interpretation of PES correlations subtle at best; e.g., it is hard to find a biological role for some of the gene-sets that show significance in Table 2. More importantly, it is unclear what is the null expectation of the PES-gene expression correlation analysis; that is, if PES is computed using random pathways (i.e. not specific to druggable pathways) and re-runs the analyses, what are the results? Or, reversely, if the authors perform the same analyses for a randomly chosen complex trait (e.g., height/bmi), what pathways show up in Tables 2/3?

We agree with the reviewer that in future the PES results from our study require replication and validation and have added a statement to this effect in the Discussion. We would also argue that due to the hypothesis-free nature of pathway selection (other than having a known drug target), this approach may reveal biological aspects of the phenotype in question that have not previously been considered. We believe that the analyses in this study, whereby we conservatively control for genome-wide PGS, suitably address the question in regard to the null expectation of the PES ~ lung function correlation. In other words, the Class B/2 secretin PES, for instance, was associated with FVC beyond what was attributable to genome-wide PGS, suggesting a role for variants in this pathway that is not influenced purely by random subsection of the polygenic signal for this trait amongst these genes. Indeed, the trait relevance of these pathways is exemplified by their selection, in that these gene-sets were significantly enriched with trait-associated variation relative to all other genes in the original GWAS. There are likely to be non-druggable pathways that also exert a non-zero effect on lung function; however, given each of the PES were either non-significantly or only modestly correlated with each other, we would expect these signals to be distinct. We have added some text to the discussion that outlines that these concerns warrant further investigation and thank the reviewer for the comment. Furthermore, the pathways identified in this study and utilised for PES are unlikely to be only relevant to lung function given the pleiotropic nature of common variants. Therefore, the association of the same pathways with other traits like BMI or height, in our opinion, does not negate their relevance in this study, in the same fashion as a genome-wide significant SNP being shared between two phenotypes does not preclude its importance for either.

[Editors’ note: what follows is the authors’ response to the second round of review.]

Revisions:

The joint view of the reviewers and eLife Editors is that several of the reviewers' comments have been addressed adequately. Nevertheless, other reviewers' comments were not resolved by your revision, most importantly Points 1 and 4 of Reviewer 1. As a group we would be more supportive of the revision if the causality claims were to be toned down with appropriate caveats included throughout, and if the statistical results were to be more appropriately presented. The current version, without such changes, is not judged to be ready for publication.

We are content that you now acknowledge the LCV publication's recommendation that only tests with |GCP| > 0.6 are considered. However, your revision does not follow this recommendation unswervingly: e.g. "We acknowledge that the posterior mean GCP estimate for the FEV1 does not quite the threshold of > 0.6, and thus, the causal relationship was more rigorous with FVC". Moreover, your revision unnecessarily clouds this important issue: "|GCP| > 0.6 previously postulated to be evidence of a rigorous relationship". We do not support uneven application of an established threshold.

It is now acknowledged that some tests were not significant after correcting for multiple tests. Nevertheless, an unwarranted emphasis was sometimes placed on non-significant results, e.g. "However, this still suggested that there was a relationship between the Class B/2 secretin family receptor FVC PES and FVC beyond what is attributable to a genome-wide PGS" and "several gene-sets trended towards surviving correction". Similar problems identified among the MR tests and the adaptive choices for PRS p-value thresholds still need to be addressed. The MR and PES results have relatively weak statistical support and yet this is not reflected by an emphasis placed on them in the Abstract. We are of the view that marginally significant (or null) results can still provide a significant contribution to the field as long as their statistical support is reported appropriately. The manuscript will require changes to the application and interpretation of statistical tests throughout.

We agree with the reviewers that the results of this manuscript need to be carefully presented. As a result, we have made several changes in the manuscript pursuant to this. In particular, we acknowledge the concerns of the reviewers related to the presentation of the LCV, Mendelian randomisation, and PES results. We have addressed each of these sections individually below and detail the edits that have been made to the manuscript.

Pharmagenic enrichment score (PES) results

We have made a number of edits related to the PES materials such that the borderline nature of their statistical association in the Hunter cohort is emphasised to the reader:

Abstract: The language regarding the association between PES and pulmonary phenotypes has been tempered given it does not survive multiple testing correction. Specifically, we have edited the Abstract from “Moreover, we developed polygenic scores for lung function specifically within pathways with known drug targets that were significantly associated with both pulmonary phenotypes and gene expression in independent cohorts to prioritise individuals who may benefit from particular drug repurposing opportunities” to “Moreover, we developed polygenic scores for lung function specifically within pathways with known drug targets and investigated their relationship with both pulmonary phenotypes and gene expression in independent cohorts to prioritise individuals who may benefit from particular drug repurposing opportunities”. We believe that this edited sentence is appropriate in its conservatism as it does not explicitly state the significance of these results, even though we found correlations with mRNA expression that surpassed multiple-testing correction.

In the figure legend for Figure 3, we edited the wording such that it states, “The phenotypic association between a polygenic score (PGS) of FVC and an FVC PES which was nominally significant (P < 0.05) but did not survive multiple testing correction after adjustment for genome wide PGS”.

We have removed the mention of gene-sets that trended towards multiple testing correction.

We have removed the sentence “However, this still suggested that there was a relationship between the Class B/2 secretin family receptor FVC PES and FVC beyond what is attributable to a genome-wide PGS”. We believe the description of this result as “nominally significant” but failing to survive correction in the previous line accurately portrays the strength of this result.

We have added some text to further contextualise the strength of these results given that it does not survive multiple testing correction.

The text in the discussion regarding the PES has also been edited to more explicitly state that the Class B/2 secretin family receptor FVC PES association does not survive multiple testing correction – specifically, we changed “The Class B/2 secretin family receptors score for FVC was particularly noteworthy given that it remained significant after an adjustment for genome wide PGS” to “The Class B/2 secretin family receptors score for FVC was noteworthy given that it remained nominally significant after an adjustment for genome wide PGS. However, this did not survive multiple-testing correction, and thus, further replication is needed to confirm this signal”

We also wish to further clarify the selection of P value thresholds to construct the PES and the PGS, this issue has also been addressed in a previous publication describing the PES method (PMID: 31964963). Firstly, we use P value thresholding in the identification of the candidate PES gene-sets using the GWAS summary statistics, that is, gene-based test statistics are constructed only using P values below said threshold. Thereafter, competitive gene-set association is conducted for each druggable pathway at the different thresholds, with the null hypothesis being that the druggable pathway is no more associated with the trait (enriched with association) than all other genes for which gene-based P values could be calculated by virtue of having a SNP below the threshold. The concept underlying this is that distinct pathways may be enriched with common variants at differing levels of the polygenic signal, for example, a model including all SNPs (P < 1) will identify gene-sets enriched with variation relative to all other genes, whilst a less polygenic model, like a threshold of P < 0.05, will capture gene-sets enriched with association relative to genes with at least one SNP mapped to it with a univariate association P < 0.05. The selection of the actual thresholds themselves is somewhat arbitrary, however, we justify our selection (P < 1, P < 0.5, P < 0.05, and P < 0.005), as this is indicative of a model with all SNPs (P < 1), nominally associated SNPs (P < 0.05), as well as an order of magnitude higher (P < 0.5) or lower (P < 0.005) than nominally associated. We believe this balances capturing different elements of the polygenic signal with encompassing enough SNPs to construct scores using variants only within a biological pathway, which may not be realistic at lower P thresholds. 
As described in the text – all the thresholds are considered when applying multiple testing correction via the FDR method, that is, we subject all gene-sets at all thresholds to FDR correction simultaneously, which amounts to around 4000 tests subjected to FDR correction at the same time over all the thresholds. The threshold which is most associated after FDR correction is then used to construct the score for individual-level genotype data. We only ever constructed the PES at one P threshold when applying this to individual genotype data, which was the threshold at which the strongest signal was observed in the discovery GWAS, and therefore, we did not need to correct for testing multiple P thresholds per PES when we applied this to the Hunter cohort and GEUVADIS datasets, unlike in the GWAS summary statistics analyses where all thresholds were corrected for. We have provided an additional summary of this process to aid in reader understanding, as well as added some text to clarify that only one threshold was tested in the Hunter cohort and GEUVADIS datasets for each PES.
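The pooled correction described above can be illustrated with a minimal sketch. It assumes the Benjamini-Hochberg procedure as the FDR method and uses hypothetical gene-set names and P values; it is not the authors' pipeline, only an illustration of correcting all gene-sets at all thresholds together and then keeping one threshold per gene-set.

```python
import numpy as np

def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values (q values) in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

# Hypothetical gene-set P values: rows are P thresholds, columns are gene-sets.
thresholds = [1.0, 0.5, 0.05, 0.005]
gene_sets = ["set_A", "set_B", "set_C"]
p_matrix = np.array([
    [0.20, 0.0004, 0.60],
    [0.15, 0.0010, 0.55],
    [0.03, 0.0200, 0.70],
    [0.01, 0.3000, 0.90],
])

# Pool every gene-set at every threshold into ONE correction.
q_matrix = benjamini_hochberg(p_matrix.ravel()).reshape(p_matrix.shape)

# For each gene-set, keep the single threshold with the smallest q value;
# only that threshold would be carried forward to individual-level scoring.
best_rows = q_matrix.argmin(axis=0)
selected = {
    name: (thresholds[row], float(q_matrix[row, col]))
    for col, (name, row) in enumerate(zip(gene_sets, best_rows))
}
```

Because each gene-set is carried forward at one threshold only, the downstream analyses test one score per gene-set rather than four.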

Furthermore, the primary use of genome wide PGS in this study was to act as a covariate in the sensitivity analyses of the PES ~ lung function models in the Hunter cohort, and we constructed the PGS at the same thresholds as the PES such that they could be directly compared. However, we also wished to benchmark the variance explained by the PES compared to a genome wide PGS, and, as a result, we constructed and tested genome wide PGS at two additional thresholds that were less polygenic (P < 1 × 10⁻⁵ and P < 5 × 10⁻⁸) to maximise our chances of finding the most parsimonious score with the highest variance explained. The inclusion of more stringent thresholds closer to genome-wide significance when testing PGS is usual practice in the literature. As it turned out, the most parsimonious models were at P value thresholds of 0.005 and 0.05 for FEV1 and FVC, respectively, meaning the addition of these thresholds had no impact on the results. This does not have any implications for multiple testing correction in the PES analyses as we only used the PGS P threshold that matched the PES in all instances. The P < 1 × 10⁻⁵ and P < 5 × 10⁻⁸ thresholds would not have been appropriate for the identification of PES pathways as the number of independent SNPs in a single pathway that satisfy these thresholds would likely be insufficient, particularly for small gene-sets. We believe that the above provides an appropriate justification for the P value thresholds in this study. In summary, we corrected for all four thresholds in the gene-set discovery process simultaneously using FDR; however, we only constructed a score for each PES using a single threshold, and thus, there was no extra multiple testing burden in the Hunter cohort and GEUVADIS analyses, besides the number of PES tested.
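To make the point about stringent thresholds concrete, the following is a simplified sketch of a P-value-thresholded score. It assumes a plain weighted sum of allele dosages, and the function name, dosages, effect sizes, and pathway membership are all hypothetical. It omits steps such as LD clumping and is not the authors' implementation.

```python
import numpy as np

def thresholded_score(dosages, betas, pvals, p_threshold, in_pathway=None):
    """Sum beta * dosage over SNPs with P < p_threshold.

    If in_pathway is given, only SNPs flagged True are eligible, which
    mimics a pathway-restricted score; otherwise all SNPs are eligible.
    Returns (score per individual, number of SNPs used).
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvals, dtype=float) < p_threshold
    if in_pathway is not None:
        keep = keep & np.asarray(in_pathway, dtype=bool)
    return dosages[:, keep] @ betas[keep], int(keep.sum())

# Hypothetical data: 2 individuals, 4 SNPs.
dosages = [[2, 1, 0, 1],
           [0, 2, 1, 2]]
betas = [0.10, -0.20, 0.05, 0.30]
pvals = [0.001, 0.04, 0.30, 0.80]
in_pathway = [True, False, False, True]

genome_wide, n_gw = thresholded_score(dosages, betas, pvals, 0.05)
pathway, n_pw = thresholded_score(dosages, betas, pvals, 0.05, in_pathway)
strict, n_strict = thresholded_score(dosages, betas, pvals, 5e-8, in_pathway)
```

In this toy example the pathway-restricted score uses fewer SNPs than the genome-wide score at the same threshold, and no SNPs at all at P < 5 × 10⁻⁸, which mirrors why very stringent thresholds are impractical for small gene-sets.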

Latent causal variable (LCV) results

We have edited the text in the latent causal variable section of the manuscript to discuss the |GCP| > 0.6 threshold more stringently. We still believe that we have demonstrated strong evidence of partial genetic causality between fasting glucose and lung function given that the GCP estimate for fasting glucose on FVC exceeded 0.6 using a fasting glucose GWAS with, and without, BMI covariation. As a result, we believe the evidence for this causal relationship is appropriately reflected in the Abstract where we state “with further evidence of a causal relationship between increased fasting glucose and diminished lung function”. Specific edits to this section are outlined below:

We have edited the statement regarding the GCP threshold to more definitively state the utility of this threshold, as described in the original O’Connor and Price manuscript – “We used the recommended threshold for partial genetic causality of |GCP| > 0.6, as this has been demonstrated in simulations to appropriately guard against false positives”.

We have also edited the text related to the LCV results such that the |GCP| > 0.6 threshold is applied in a simplified manner. We also now more explicitly state that the GCP threshold for partial genetic causality is not reached between fasting glucose and FEV1. However, given the evidence of partial genetic causality of fasting glucose on FVC, and the FEV1 GCP value nearing 0.6 (0.57), we still mention this in text with the caveat that it does not exceed 0.6. We would like to make the point that our application of the GCP threshold here is stringent, although somewhat arbitrary given the GCP estimate is not intended to be interpreted in the same way as something like a P value, where different thresholds have implicit implications. Nonetheless, we agree with the reviewers that stringency is important in this instance, and thus, we believe we have now more decisively applied the 0.6 threshold in our language in this section.

We have amended Figure 2B such that we denote GCP > 0.6 and GCP < -0.6 with a vertical dotted line as the respective thresholds on the forest plot to make it clearer to the reader that these are our thresholds of interest, rather than zero.

Mendelian randomisation (MR) results

We have also reviewed and amended our language regarding the reporting of the results in the Mendelian randomisation section of the manuscript. We wish to emphasise that, given we tested traits for causality that were also genetically correlated, the latent causal variable model is our central test for evidence of a causal relationship because genetic correlation has been shown to potentially bias Mendelian randomisation. As a result, we deploy Mendelian randomisation here as a validation of the relationship observed by LCV as it uses a different set of statistical parameters and assumptions, even though MR may be inflated by genetic correlation. We have edited the text to more explicitly make this point. Given the strength of the LCV model between fasting glucose and FVC, we would further posit that the emphasis placed in the Abstract on the relationship between fasting glucose and FVC is still appropriate.

The primary test we utilise for MR is the inverse-variance weighted (IVW) estimator, as this is generally considered the most powerful approach; however, this is at the cost of assuming all instrumental variables (IVs) are valid. We state in text that the IVW estimates were nominally significant (P < 0.05), which is true, and which we believe is an appropriate description. However, we have added some text to comment that the confidence intervals extend close to zero in both circumstances and that the P values are relatively high. The weighted median and MR-Egger methods were then utilised as sensitivity analyses of the IVW estimate as they make different assumptions about IV validity. We believe our description of these results is appropriate as we compare them to the IVW method and report their P values, which were nominally significant in the case of the weighted median and non-significant for MR-Egger. Moreover, we believe that the leave-one-out analyses appropriately demonstrate that the IVW estimate does have some (nominal) outliers; however, these can largely be attributed to what likely constitute true biological effects, as discussed in Supplementary file 1G-L.
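For readers unfamiliar with the estimator, the following sketch shows a fixed-effect IVW estimate computed from per-SNP summary statistics, together with a simple leave-one-out loop. It assumes first-order weights (outcome standard errors only) and uses hypothetical effect sizes. The authors' software may implement a different variant, such as multiplicative random effects.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Each SNP contributes the ratio beta_outcome / beta_exposure, weighted by
    beta_exposure**2 / se_outcome**2 (first-order weights).
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    weights = bx**2 / se**2
    estimate = np.sum(weights * (by / bx)) / np.sum(weights)
    std_err = np.sqrt(1.0 / np.sum(weights))
    return float(estimate), float(std_err)

def leave_one_out(beta_exposure, beta_outcome, se_outcome):
    """Recompute the IVW estimate with each instrument removed in turn."""
    bx, by, se = map(np.asarray, (beta_exposure, beta_outcome, se_outcome))
    keep_all = np.arange(bx.size)
    return [ivw_estimate(bx[keep_all != i], by[keep_all != i], se[keep_all != i])[0]
            for i in keep_all]

# Hypothetical instruments in which every SNP implies the same causal effect.
bx = [0.10, 0.20, 0.40]
by = [-0.05, -0.10, -0.20]
se = [0.01, 0.01, 0.01]

estimate, std_err = ivw_estimate(bx, by, se)
loo = leave_one_out(bx, by, se)
```

When all instruments agree, as in this toy example, the leave-one-out estimates equal the full estimate; a large shift after dropping one SNP would flag that SNP as an influential outlier.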

https://doi.org/10.7554/eLife.63115.sa2

Article and author information

Author details

  1. William R Reay

    1. School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, Australia
    2. Hunter Medical Research Institute, Newcastle, Australia
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    has filed a patent related to the use of the pharmagenic enrichment score methodology in complex disorders. This competing interest only applies to that section of the manuscript. WIPO Patent Application WO/2020/237314.
    ORCID: 0000-0001-7689-2453
  2. Sahar I El Shair

    School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, Australia
    Contribution
    Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
  3. Michael P Geaghan

    1. School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, Australia
    2. Hunter Medical Research Institute, Newcastle, Australia
    Contribution
    Software, Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
  4. Carlos Riveros

    1. Hunter Medical Research Institute, Newcastle, Australia
    2. School of Medicine and Public Health, The University of Newcastle, Callaghan, Australia
    Contribution
    Data curation, Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Elizabeth G Holliday

    1. Hunter Medical Research Institute, Newcastle, Australia
    2. School of Medicine and Public Health, The University of Newcastle, Callaghan, Australia
    Contribution
    Resources, Data curation, Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Mark A McEvoy

    1. Hunter Medical Research Institute, Newcastle, Australia
    2. School of Medicine and Public Health, The University of Newcastle, Callaghan, Australia
    Present address
    La Trobe Rural Health School, College of Science, Health and Engineering, La Trobe University, Bendigo, Australia
    Contribution
    Resources, Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Stephen Hancock

    1. Hunter Medical Research Institute, Newcastle, Australia
    2. School of Medicine and Public Health, The University of Newcastle, Callaghan, Australia
    Contribution
    Resources, Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
  8. Roseanne Peel

    1. Hunter Medical Research Institute, Newcastle, Australia
    2. School of Medicine and Public Health, The University of Newcastle, Callaghan, Australia
    Contribution
    Resources, Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
  9. Rodney J Scott

    1. School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, Australia
    2. Hunter Medical Research Institute, Newcastle, Australia
    Contribution
    Resources, Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
  10. John R Attia

    1. Hunter Medical Research Institute, Newcastle, Australia
    2. School of Medicine and Public Health, The University of Newcastle, Callaghan, Australia
    Contribution
    Resources, Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
  11. Murray J Cairns

    1. School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan, Australia
    2. Hunter Medical Research Institute, Newcastle, Australia
    Contribution
    Conceptualization, Resources, Data curation, Supervision, Funding acquisition, Visualization, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    murray.cairns@newcastle.edu.au
    Competing interests
    has filed a patent related to the use of the pharmagenic enrichment score methodology in complex disorders. This competing interest only applies to that section of the manuscript. WIPO Patent Application WO/2020/237314.
    ORCID: 0000-0003-2490-2538

Funding

National Health and Medical Research Council (1147644)

  • Murray J Cairns

National Health and Medical Research Council (1121474)

  • Murray J Cairns

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This study was supported by an NHMRC project grant (1147644). MJC is supported by an NHMRC Senior Research Fellowship (1121474).

Hunter Cohort Study: The research on which this paper is based was conducted as part of the Hunter Community Study, The University of Newcastle. We are grateful to the University of Newcastle for funding and to the men and women of the Hunter region who provided the information recorded.

Ethics

Human subjects: The use of the Hunter Community Cohort data was approved by the University of Newcastle Human Research Ethics Committee (HREC, reference: H-820-0504a). All other information related to ethical approval for the individual GWAS we utilised in this study is detailed in their respective publications as referenced throughout the text.

Senior Editor

  1. David E James, The University of Sydney, Australia

Reviewing Editor

  1. Chris P Ponting, University of Edinburgh, United Kingdom

Reviewer

  1. Bogdan Pasaniuc, UCLA, United States

Publication history

  1. Received: September 15, 2020
  2. Accepted: March 11, 2021
  3. Accepted Manuscript published: March 15, 2021 (version 1)
  4. Version of Record published: April 21, 2021 (version 2)

Copyright

© 2021, Reay et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
