
The MR-Base platform supports systematic causal inference across the human phenome

Tools and Resources
Cite as: eLife 2018;7:e34408 doi: 10.7554/eLife.34408

Abstract

Results from genome-wide association studies (GWAS) can be used to infer causal relationships between phenotypes, using a strategy known as 2-sample Mendelian randomization (2SMR) and bypassing the need for individual-level data. However, 2SMR methods are evolving rapidly and GWAS results are often insufficiently curated, undermining efficient implementation of the approach. We therefore developed MR-Base (http://www.mrbase.org): a platform that integrates a curated database of complete GWAS results (no restrictions according to statistical significance) with an application programming interface, web app and R packages that automate 2SMR. The software includes several sensitivity analyses for assessing the impact of horizontal pleiotropy and other violations of assumptions. The database currently comprises 11 billion single nucleotide polymorphism-trait associations from 1673 GWAS and is updated on a regular basis. Integrating data with software ensures more rigorous application of hypothesis-driven analyses and allows millions of potential causal relationships to be efficiently evaluated in phenome-wide association studies.

https://doi.org/10.7554/eLife.34408.001

eLife digest

Our health is affected by many exposures and risk factors, including aspects of our lifestyles, our environments, and our biology. It can, however, be hard to work out the causes of health outcomes because ill-health can influence risk factors and risk factors tend to influence each other. To work out whether particular interventions influence health outcomes, scientists would ideally conduct a so-called randomized controlled trial, where some randomly-chosen participants are given an intervention that modifies the risk factor and others are not. But this type of experiment can be expensive or impractical to conduct.

Alternatively, scientists can use genetics to mimic a randomized controlled trial. This technique, known as Mendelian randomization, is possible for two reasons. First, it is essentially random whether a person has one version of a gene or another. Second, our genes influence different risk factors. For example, people with one version of a gene might be more likely to drink alcohol than people with another version. Researchers can compare people with different versions of the gene to infer what effect alcohol drinking has on their health.

Every day, new studies investigate the role of genetic variants in human health, which scientists can draw on for research using Mendelian randomization. But until now, complete results from these studies have not been organized in one place. At the same time, statistical methods for Mendelian randomization are continually being developed and improved. To take advantage of these advances, Hemani, Zheng, Elsworth et al. produced a computer programme and online platform called “MR-Base”, combining up-to-date genetic data with the latest statistical methods.

MR-Base automates the process of Mendelian randomization, making research much faster: analyses that previously could have taken months can now be done in minutes. It also makes studies more reliable, reducing the risk of human error and ensuring scientists use the latest methods. MR-Base contains over 11 billion associations between people’s genes and health-related outcomes. This will allow researchers to investigate many potential causes of poor health. As new statistical methods and new findings from genetic studies are added to MR-Base, its value to researchers will grow.

https://doi.org/10.7554/eLife.34408.002

Introduction

Inferring causal relationships between phenotypes is a major challenge and has important implications for understanding the aetiology of disease processes. The potential for phenome-wide causal inference has increased markedly over the past 10 years due to two major advances. The first is the continuing success of large-scale genome-wide association studies (GWAS) in identifying robust genetic associations (Visscher et al., 2017). The second is the development of statistical methods for causal inference that exploit the principles of Mendelian randomization (MR) using GWAS summary data (Davey Smith and Ebrahim, 2003; Davey Smith and Hemani, 2014; Zhu et al., 2016; Pierce and Burgess, 2013). Genetic data for MR can, however, be difficult to access, while MR methods are evolving rapidly and can be difficult to implement for non-specialists. To address the need for more systematic curation and application of complete GWAS summary data and MR methods, we have developed MR-Base (http://www.mrbase.org): a platform that integrates a database of thousands of GWAS summary datasets with a web interface and R packages for automated causal inference through MR. Following an extended introduction on the uses and sources of GWAS summary data and on the principles and assumptions behind MR, we describe how to implement MR analyses using MR-Base and how to interpret the results, and we provide a thorough overview of potential limitations. In an applied example, we demonstrate the functionality of MR-Base through an MR study of low density lipoprotein (LDL) cholesterol and coronary heart disease (CHD). We also demonstrate how the integration achieved by MR-Base supports a wide range of applications, including phenome-wide association studies (PheWAS) to identify potential sources of horizontal pleiotropy, and hypothesis-free MR to gain insight into the impacts of interventions.
These applications demonstrate how integrating data and analytical tools enables novel insights that would previously have been technically and practically challenging to achieve.

GWAS summary data

GWAS summary data, the non-disclosive results from testing the association of hundreds of thousands to millions of genetic variants with a phenotype, have been routinely collected and curated for several years (Welter et al., 2014; Li et al., 2016; Beck et al., 2014) and are a valuable resource for dissecting the causal architecture of complex traits (Pasaniuc and Price, 2017). Accessible GWAS summary data are, however, often restricted to ‘top hits’, that is, statistically significant results, or tend to be hosted informally in different locations under a wide variety of formats. For other studies, summary data may only be available ‘on request’ from authors. Complete summary data are currently publicly accessible for thousands of phenotypes, but to ensure reliability and efficiency for systematic downstream applications they must be harvested, checked for errors, harmonised and curated into standardised formats. GWAS summary data are useful for a wide variety of applications, including MR, PheWAS (Millard et al., 2015; Denny et al., 2010), summary-based transcriptome-wide (Gusev et al., 2016) and methylome-wide (Richardson et al., 2017; Hannon et al., 2017a) association studies and linkage disequilibrium (LD) score regression (Bulik-Sullivan et al., 2015; Zheng et al., 2017b).

Mendelian randomization

MR (Davey Smith and Ebrahim, 2003; Davey Smith and Hemani, 2014) uses genetic variation to mimic the design of randomised controlled trials (RCT) (although for interpretive caveats see Holmes et al., 2017). Let us suppose we have a single nucleotide polymorphism (SNP) that is known to influence some phenotype (the exposure). Due to Mendel’s laws of inheritance and the fixed nature of germline genotypes, the alleles an individual receives at this SNP are expected to be random with respect to potential confounders and causally upstream of the exposure. In this ‘natural experiment’, the SNP is considered to be an instrumental variable (IV), and observing an individual’s genotype at this SNP is akin to randomly assigning an individual to a treatment or control group in a RCT (Figure 1a). To infer the causal influence of the exposure, one calculates the ratio of the SNP effect on the outcome to the SNP effect on the exposure. If there are many independent IVs available for a particular exposure, as is often the case, causal inference can be strengthened (Johnson, 2012). Here, we consider each SNP to mimic an independent RCT, and we can adapt tools developed for meta-analysis (Bowden et al., 2017a) to combine the results obtained from each of the SNPs, giving an overall causal estimate that is better powered (Bowden et al., 2017a).

Figure 1. Principles and assumptions behind Mendelian randomization.

(A) Diagram illustrating the analogy between Mendelian randomization (MR) and a randomised controlled trial. (B) A directed acyclic graph representing the MR framework. Instrumental variable (IV) assumption 1: the instruments must be associated with the exposure; IV assumption 2: the instruments must influence the outcome only through the exposure; IV assumption 3: the instruments must not associate with measured or unmeasured confounders. (C-F) Scatter plots showing the relationship between the instrumental single nucleotide polymorphism (SNP) effects on the exposure and their corresponding effects on the outcome. The slope of the regression is the estimate of the causal effect of the exposure on the outcome. (C) If there is no violation of the IV2 assumption (no horizontal pleiotropy), or the horizontal pleiotropy is balanced, an unbiased causal estimate can be obtained by inverse-variance weighted (IVW) linear regression, where the contribution of each instrumental SNP to the overall effect is weighted by the inverse of the variance of the SNP-outcome effect. Fixed and random effects IVW approaches are available (the slopes from both approaches are identical but the variance of the slope is inflated in the random effects model in the presence of heterogeneity between SNPs). (D) If there is a tendency for the horizontal pleiotropic effect to be in a particular direction, then constraining the slope to go through zero will incur bias (grey line). Egger regression relaxes this constraint by allowing the regression line to have a non-zero intercept, returning an unbiased effect estimate if the instrument-exposure and pleiotropic effects are uncorrelated, also known as the InSIDE (Instrument Strength Independent of Direct Effect) assumption (Bowden et al., 2015). Pleiotropic effect here refers to the effect of the instrument on the outcome that is not mediated by the exposure.
(E) If the majority of the instruments are valid (black points), with some invalid instruments (red points), the median-based approach will provide an unbiased estimate in the presence of unbalanced horizontal pleiotropy (black line), whereas IVW linear regression will provide a biased estimate (grey line). In addition, the median-based estimator does not require the InSIDE assumption of the Egger approach. (F) If a group of SNPs influences the outcome through a particular pathway other than the exposure (i.e. the SNPs are horizontally pleiotropic), then that group of SNPs will return consistently biased estimates. The mode-based estimator clusters SNPs based on their estimates (grey lines), and the estimate from the cluster with the largest weight (black line) is selected as the final causal estimate. The causal estimate from the mode-based estimator is unbiased if the SNPs contributing to the largest cluster are valid instruments.

https://doi.org/10.7554/eLife.34408.003

Crucially, MR can be performed using results from GWAS, in a strategy known as 2-sample MR (2SMR) (Pierce and Burgess, 2013). Here, the SNP-exposure effects and the SNP-outcome effects are obtained from separate studies. With these summary data alone, it is possible to estimate the causal influence of the exposure on the outcome. This has the tremendous advantage that causal inference can be made between two traits even if they are not measured in the same set of samples, enabling us to harness the statistical power of pre-existing large GWAS analyses. Due to the flexibility afforded by the 2SMR strategy, MR can be applied to thousands of potential exposure-outcome associations, where ‘exposure’ can be very broadly defined, from gene expression and proteins to more complex traits, such as body mass index and smoking.

While MR avoids certain problems of conventional observational studies (Davey Smith and Ebrahim, 2001), it introduces its own set of problems. MR is predicated on exploiting ‘vertical’ pleiotropy, where a SNP influences two traits because one trait causes the other (Davey Smith and Hemani, 2014). It is crucial to be aware of the assumptions and limitations that arise due to this model (Haycock et al., 2016). The main assumptions (Figure 1b) are: the instrument associates with the exposure (IV assumption 1); the instrument does not influence the outcome through some pathway other than the exposure (IV assumption 2); and the instrument does not associate with confounders (IV assumption 3). The IV1 assumption is easily satisfied in MR by restricting the instruments to genetic variants that are discovered using genome-wide levels of statistical significance and replicated in independent studies. The other two assumptions are impossible to prove, and, when violated, can lead to bias in MR analyses. Violations of the IV2 assumption can be introduced by ‘horizontal’ pleiotropy, where the SNP influences the outcome through some pathway other than the exposure. Such effects can manifest in various patterns (Figure 1c–f). When multiple independent instruments are available, it is possible to perform sensitivity analyses that attempt to distinguish between horizontal and vertical pleiotropy and return causal estimates adjusted for the former (Bowden et al., 2016a; Bowden et al., 2015; Hartwig et al., 2017b). To improve reliability of causal inference, MR results should be presented alongside sensitivity analyses that make allowance for various potential patterns of horizontal pleiotropy. Further details on the design and interpretation of Mendelian randomization studies can be found in several existing reviews (Davey Smith and Hemani, 2014; Haycock et al., 2016; Swerdlow et al., 2016; Holmes et al., 2017; Zheng et al., 2017a).
A glossary of terms can be found in Supplementary file 1F.

Model

In this section we describe how to use MR-Base to conduct MR analyses (Figure 2). The data required to perform the analysis can be described as a ‘summary set’ (Hemani et al., 2017a), where the genetic effects for a set of instruments are available for both the exposure and the outcome. To create a summary set we select appropriate instruments, obtain the effect estimates for those instruments for the exposure and the outcome, and harmonise the effects so that they reflect the same allele. We can then perform MR analyses using the summary set. These steps are supported by the database of GWAS results and R packages (‘TwoSampleMR’ and ‘MRInstruments’) curated by MR-Base and the following R packages curated by other researchers: 'MendelianRandomization' (Yavorska and Burgess, 2017), 'RadialMR' (Bowden et al., 2017b), 'MR-PRESSO' (Verbanck et al., 2018) and 'mr.raps' (Zhao et al., 2018). The statistical methods and R packages accessible through MR-Base are updated on a regular basis.

Figure 2. The practical steps for performing 2-sample Mendelian randomization (2SMR), as described in the Model section of the paper.

The database of genome-wide association study results and R packages (‘TwoSampleMR’ and ‘MRInstruments’) curated by MR-Base support the data extraction, harmonisation and analysis steps required for 2SMR. Additional R packages for MR from other researchers are also accessible, including MendelianRandomization (Yavorska and Burgess, 2017), RadialMR (Bowden et al., 2017b), MR-PRESSO (Verbanck et al., 2018) and mr.raps (Zhao et al., 2018). The available methods are updated on a regular basis.

https://doi.org/10.7554/eLife.34408.004

Obtaining instruments

Instruments are characterised as SNPs that reliably associate with the exposure, meaning they should be obtained from well-conducted GWAS, typically involving their detection in a discovery sample at a GWAS threshold of statistical significance (e.g. p < 5 × 10⁻⁸) followed by replication in an independent sample. The minimum data requirements for each SNP are effect sizes (βx), standard errors (σx) and effect alleles. Also useful are sample size, non-effect allele and effect allele frequency.
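As a minimal sketch of the per-SNP data described above, the record below is written in Python (MR-Base itself works through R packages) with hypothetical field names that are not the TwoSampleMR column names. The filter applies only the discovery threshold; replication in an independent sample would have to be checked separately.

```python
from dataclasses import dataclass
from typing import List, Optional

GWAS_THRESHOLD = 5e-8  # conventional genome-wide significance level


@dataclass
class Instrument:
    """Per-SNP exposure data (hypothetical field names, for illustration only)."""
    snp: str                            # SNP identifier, e.g. an rsID
    beta_x: float                       # effect size on the exposure
    se_x: float                         # standard error of beta_x
    effect_allele: str
    pval: float                         # p-value of the SNP-exposure association
    other_allele: Optional[str] = None  # the remaining fields are useful but optional
    eaf: Optional[float] = None         # effect allele frequency
    n: Optional[int] = None             # sample size


def select_instruments(snps: List[Instrument],
                       threshold: float = GWAS_THRESHOLD) -> List[Instrument]:
    """Keep SNPs whose exposure association passes the significance threshold."""
    return [s for s in snps if s.pval < threshold]


candidates = [
    Instrument("rs1", 0.12, 0.01, "A", 1e-20),
    Instrument("rs2", 0.03, 0.02, "G", 0.13),
]
print([s.snp for s in select_instruments(candidates)])  # ['rs1']
```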

Sources

There are several data sources that can be used in MR-Base (Figure 3) to define exposure and outcome traits (the number of traits is updated on a regular basis):

Figure 3. The data available through MR-Base and the possible exposure-outcome analyses that can be performed.

Exposure traits can be very broadly defined and may include molecular traits like gene expression, DNA-methylation, metabolites and proteins, as well as more complex traits, including cholesterol, body mass index, smoking and education. Further details on the traits with complete summary data can be found in Supplementary file 1A. The numbers reflect MR-Base in December 2017 and are updated on a regular basis.

https://doi.org/10.7554/eLife.34408.005
  1. The MR-Base database comprises complete GWAS summary data for hundreds of traits (Figure 3 and Supplementary file 1A). By ‘complete’ we mean all SNPs reported in a GWAS analysis, with no exclusions on the basis of a p-value threshold for association with the target trait of interest. It is possible for the user to extract the top-hits from this data source using their own criteria (e.g. strength of p-value). Alternatively, potential instruments can be obtained from the MRInstruments package, which includes independent SNP-trait associations from the database with p-value < 5e-8.

  2. Quantitative trait loci (QTL) studies performed on DNA methylation (Gaunt et al., 2016), gene expression (GTEx Consortium, 2015), protein (Deming et al., 2016) and metabolite (Shin et al., 2014; Kettunen et al., 2016) levels generate hundreds to thousands of independent associations for thousands of traits. The MRInstruments R package contains hundreds of thousands of 'omic QTLs for ease of use within MR-Base.

  3. The NHGRI-EBI GWAS catalog (Welter et al., 2014) comprises 21,324 SNPs associated with 1628 complex traits and diseases. This list of potential instruments has been harmonised, formatted for ease of use within MR-Base and included in the MRInstruments R package.

  4. User provided data can also be used for analysis.

Independence

It is important to ensure that instruments selected for an exposure are independent, unless measures are taken in the MR analysis to account for any correlation structures that arise through linkage disequilibrium. An efficient way to ensure that instruments are independent is to use clumping against a reference dataset of similar ancestry to the samples in which the GWAS was conducted. A clumping procedure has been implemented in MR-Base to automate the generation of independent instruments.
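The greedy procedure below illustrates the idea of clumping in Python, assuming pairwise r² values from an ancestry-matched reference panel are already available in a dictionary (a hypothetical input format). Dedicated clumping tools such as PLINK also restrict comparisons to a physical window around each index SNP, which this sketch omits, and the r² threshold shown is illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple


def clump(snps: List[Tuple[str, float]], r2: Dict[FrozenSet[str], float],
          r2_threshold: float = 0.001) -> List[str]:
    """Greedy LD clumping.

    snps: (SNP id, p-value) pairs for candidate instruments.
    r2:   pairwise LD from a reference panel, keyed by frozenset({snp_a, snp_b});
          missing pairs are treated as unlinked (r2 = 0).
    Returns the index SNPs retained as approximately independent instruments.
    """
    kept: List[str] = []
    removed = set()
    for snp, _p in sorted(snps, key=lambda item: item[1]):  # strongest association first
        if snp in removed:
            continue
        kept.append(snp)
        for other, _q in snps:  # drop remaining SNPs in LD with this index SNP
            if other != snp and r2.get(frozenset((snp, other)), 0.0) > r2_threshold:
                removed.add(other)
    return kept


ld = {frozenset(("rs1", "rs2")): 0.8, frozenset(("rs2", "rs3")): 0.5}
print(clump([("rs1", 1e-12), ("rs2", 1e-30), ("rs3", 1e-9), ("rs4", 1e-10)], ld))
# ['rs2', 'rs4']
```

Here rs2 has the smallest p-value and becomes the first index SNP, which removes its LD partners rs1 and rs3; rs4 is unlinked and is kept.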

Obtaining SNP effects on the outcome

In order to generate the summary set, the effects of each of the instruments on the outcome need to be obtained. This typically requires access to the entire set of GWAS results because it is unlikely that the instrumental SNPs for the exposure will be amongst the top hits of the outcome GWAS. As with the exposure data, the outcome data must contain at a minimum the SNP effects (βy), their standard errors (σy) and effect alleles.

LD proxies

If a particular SNP is not present in the outcome dataset then it is possible to use SNPs that are LD ‘proxies’ instead. Here, it is important to ensure that for any LD proxy used, the surrogate effect allele is the one in phase with the effect allele of the original target SNP. LD proxy lookups are automatically provided by MR-Base.
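A Python sketch of proxy selection, assuming a hypothetical lookup table that lists, for each target SNP, candidate proxies with their r² and the proxy allele that is in phase with each allele of the target. The 0.8 r² cut-off is an illustrative choice, not necessarily the one MR-Base applies.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# target SNP -> [(proxy SNP, r2 with target, {target allele: in-phase proxy allele})]
ProxyTable = Dict[str, List[Tuple[str, float, Dict[str, str]]]]


def resolve_snp(target: str, target_alleles: Iterable[str], outcome_snps: Set[str],
                proxies: ProxyTable, min_r2: float = 0.8
                ) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (SNP to look up in the outcome data, allele map), or None if unusable.

    The allele map translates each target allele into the allele to read off the
    outcome data, so a proxy's effect is taken for the in-phase allele.
    """
    if target in outcome_snps:
        return target, {a: a for a in target_alleles}
    usable = [p for p in proxies.get(target, [])
              if p[0] in outcome_snps and p[1] >= min_r2]
    if not usable:
        return None
    proxy, _r2, phase = max(usable, key=lambda p: p[1])  # strongest LD proxy
    return proxy, phase


table = {"rs10": [("rs11", 0.95, {"A": "C", "G": "T"}),
                  ("rs12", 0.85, {"A": "T", "G": "C"})]}
snp, allele_map = resolve_snp("rs10", ("A", "G"), {"rs11", "rs12"}, table)
print(snp, allele_map["A"])  # rs11 C
```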

Sources

There are two main sources that can be used (Figure 3):

  1. The MR-Base database comprises complete GWAS summary data for hundreds of traits (Supplementary file 1A). Fast lookups for specific SNPs against specific traits can be performed. If a requested SNP is absent, then MR-Base automatically searches for LD proxies, estimated using data from the 1000 Genomes Project (1000 Genomes Project Consortium et al., 2015), and returns the corresponding data for the best proxy (Figure 2).

  2. User provided complete GWAS summary data can be used with the R package.

Harmonising exposure and outcome SNP effects

To generate a summary set, for each SNP we need its effect and standard error on the exposure and the outcome corresponding to the same effect alleles (Hartwig et al., 2016). This is impossible to generate if the effect alleles for the SNP effects in the exposure and outcome datasets are unknown. MR-Base uses knowledge of the effect alleles, and where necessary the effect allele frequencies, to automatically harmonise the exposure and outcome datasets. The following scenarios are considered:

Wrong effect alleles

A SNP with (for example) effect/non-effect alleles G/T for the exposure and T/G for the outcome is harmonised by flipping the sign of the SNP-outcome effect.

Strand issues

SNPs that are reported as (for example) G/T for the exposure summary dataset and C/A for the outcome dataset indicate a strand issue, where, for example, one study has reported the effect on the forward strand and the other on the reverse strand. In this case, the outcome alleles are flipped to match those of the exposure alleles, and effect alleles are then aligned.

Palindromic SNPs

SNPs with A/T or G/C alleles are known as palindromic SNPs, because their alleles are represented by the same pair of letters on the forward and reverse strands, which can introduce ambiguity into the identity of the effect allele in the exposure and outcome GWASs. If reference strands are unknown, effect allele frequency can be used to resolve the ambiguity. For example, consider a SNP with alleles A and T, with a frequency of 0.11 for allele A in the exposure study and 0.91 in the outcome study. In addition, both studies have coded allele A as the effect allele and both are of European origin. The fact that allele A is the minor allele in the exposure study (frequency<0.5) and the major allele (>0.5) in the outcome study implies that the two studies have used different reference strands. To ensure that the effect sizes for the SNP reflect the same allele it is therefore necessary to switch the direction of the effect in either the exposure or outcome study (the default in MR-Base is to flip the direction of effect in the outcome study). Effect allele frequency may not, however, be a reliable indicator of reference strand when it is close to 0.5. This process has been described in more detail previously (Hartwig et al., 2016).

Incompatible alleles

If a SNP has (for example) A/G alleles for the exposure and A/C alleles for the outcome, there is no combination of flipping that can reconcile these differences, and either there are build differences or there is an error in the data. In this instance the SNP is excluded from the analysis.
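The four scenarios above can be combined into one decision function. This Python sketch handles one SNP at a time, accepts only single-letter A/C/G/T alleles, and returns the SNP-outcome effect expressed for the exposure effect allele, or None when the SNP should be excluded. Treating palindromic SNPs as ambiguous when the effect allele frequency lies within 0.08 of 0.5 is an illustrative cut-off, not necessarily the one MR-Base uses.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def harmonise(exp_ea: str, exp_oa: str, out_ea: str, out_oa: str, beta_out: float,
              eaf_exp: Optional[float] = None, eaf_out: Optional[float] = None,
              ambiguity: float = 0.08) -> Optional[float]:
    """Align a SNP-outcome effect to the exposure effect allele.

    ea/oa are effect/other alleles; eaf_* are effect allele frequencies.
    Returns the aligned outcome effect, or None if the SNP must be excluded.
    """
    exp_ea, exp_oa, out_ea, out_oa = (a.upper() for a in (exp_ea, exp_oa, out_ea, out_oa))

    if COMPLEMENT.get(exp_ea) == exp_oa:  # palindromic SNP (A/T or G/C)
        if {out_ea, out_oa} != {exp_ea, exp_oa} or eaf_exp is None or eaf_out is None:
            return None
        if abs(eaf_exp - 0.5) < ambiguity or abs(eaf_out - 0.5) < ambiguity:
            return None  # frequency too close to 0.5 to infer the strand
        same_letter = out_ea == exp_ea  # first align on allele letters
        beta = beta_out if same_letter else -beta_out
        freq = eaf_out if same_letter else 1.0 - eaf_out
        # minor in one study but major in the other: different reference strands
        if (eaf_exp < 0.5) != (freq < 0.5):
            beta = -beta
        return beta

    if (out_ea, out_oa) == (exp_ea, exp_oa):
        return beta_out   # already aligned
    if (out_ea, out_oa) == (exp_oa, exp_ea):
        return -beta_out  # wrong effect allele: flip the sign
    comp_ea, comp_oa = COMPLEMENT.get(out_ea), COMPLEMENT.get(out_oa)
    if (comp_ea, comp_oa) == (exp_ea, exp_oa):
        return beta_out   # strand issue only
    if (comp_ea, comp_oa) == (exp_oa, exp_ea):
        return -beta_out  # strand issue and wrong effect allele
    return None           # incompatible alleles: exclude


print(harmonise("G", "T", "T", "G", 0.2))                              # -0.2
print(harmonise("G", "T", "C", "A", 0.2))                              # 0.2
print(harmonise("A", "T", "A", "T", 0.2, eaf_exp=0.11, eaf_out=0.91))  # -0.2
print(harmonise("A", "G", "A", "C", 0.2))                              # None
```

The third call reproduces the palindromic example in the text: allele A is minor in the exposure study and major in the outcome study, so the outcome effect is flipped.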

Performing MR analysis

The generated summary set can now be analysed using a range of methods (summarised in Supplementary file 1B; new methods are added on a regular basis). The most basic way to combine these data is to use a Wald ratio, where the estimated causal effect is

$$\beta_{MR} = \frac{\beta_y}{\beta_x}$$

and the standard error of the estimate is

$$\sigma_{MR} = \frac{\sigma_y}{|\beta_x|}$$
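In code, the Wald ratio and its standard error are a one-line calculation. This Python sketch uses made-up values; the standard error is the first-order approximation given above, which ignores the sampling uncertainty in βx.

```python
from typing import Tuple


def wald_ratio(beta_x: float, beta_y: float, se_y: float) -> Tuple[float, float]:
    """Single-SNP causal estimate beta_y / beta_x and its first-order standard error."""
    if beta_x == 0:
        raise ValueError("the SNP-exposure effect must be non-zero")
    return beta_y / beta_x, se_y / abs(beta_x)


estimate, se = wald_ratio(beta_x=0.2, beta_y=0.05, se_y=0.01)
print(f"{estimate:.3f} ({se:.3f})")  # 0.250 (0.050)
```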

If there are multiple independent instruments for the exposure (as is typically the case for complex traits with well-powered GWAS), then our analysis can potentially improve in two major ways: first, the variance explained in the exposure, and therefore statistical power, will increase; second, we can evaluate the sensitivity of the estimate to bias arising from violations of the IV2 assumption by assuming different patterns of horizontal pleiotropy. Sensitivity analyses are performed automatically by MR-Base.

Inverse variance weighted MR

The simplest way to obtain an MR estimate using multiple SNPs is to perform an inverse variance weighted (IVW) meta-analysis of each Wald ratio (Johnson, 2012), effectively treating each SNP as a valid natural experiment. Fixed effects IVW assumes that each SNP provides the same estimate or, in other words, that none of the SNPs exhibit horizontal pleiotropy (or other violations of assumptions). Random effects IVW relaxes this assumption, allowing each SNP to have a different mean effect, e.g. due to horizontal pleiotropy (Bowden et al., 2017a). This will return an unbiased estimate if the horizontal pleiotropy is balanced, i.e. the deviation from the mean estimate is independent of all other effects. Another way to conceptualise this result is as a weighted regression of the SNP-outcome effects on the SNP-exposure effects, with the regression constrained to pass through the origin and with weights derived from the inverse of the variance of the outcome effects. MR-Base implements a random effects IVW model by default, unless there is underdispersion in the causal estimates between SNPs, in which case a fixed effects model is used. The point estimates from the random and fixed effects IVW models are the same, but the variance for the random effects model is inflated to take into account heterogeneity between SNPs.
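The regression form of IVW can be sketched in a few lines of Python. This sketch assumes a multiplicative random-effects model, in which the fixed-effects standard error is scaled by the residual standard error of the weighted regression whenever that exceeds 1, and is left unchanged under underdispersion. It mirrors the behaviour described above but is an illustration, not MR-Base's code.

```python
import math
from typing import List, Tuple


def ivw(bx: List[float], by: List[float], se_y: List[float]) -> Tuple[float, float, float]:
    """IVW estimate as a weighted regression of by on bx through the origin.

    Returns (estimate, fixed-effects SE, multiplicative random-effects SE).
    Weights are 1 / se_y**2; at least two SNPs are needed for the random-effects SE.
    """
    w = [1.0 / s ** 2 for s in se_y]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    sxy = sum(wi * x * y for wi, x, y in zip(w, bx, by))
    beta = sxy / sxx
    se_fixed = math.sqrt(1.0 / sxx)
    # residual standard error of the weighted regression (one fitted parameter)
    rss = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    phi = math.sqrt(rss / (len(bx) - 1))
    # inflate for heterogeneity; under-dispersion (phi < 1) falls back to fixed effects
    se_random = se_fixed * max(1.0, phi)
    return beta, se_fixed, se_random


beta, se_f, se_r = ivw([0.1, 0.2, 0.3], [0.06, 0.09, 0.16], [0.01, 0.01, 0.01])
print(round(beta, 3), se_r > se_f)
```

With these invented, somewhat heterogeneous SNP effects, the point estimate is unchanged between models while the random-effects standard error is larger.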

Maximum likelihood

An alternative strategy to the IVW approach is to estimate the causal effect by direct maximisation of the likelihood given the SNP-exposure and SNP-outcome effects and assuming a linear relationship between the exposure and outcome (Pierce and Burgess, 2013). Similar to the fixed effects IVW approach, the method assumes that the effect of the exposure on the outcome due to each SNP is the same, i.e. that there is no heterogeneity or horizontal pleiotropy. An unbiased estimate will be returned in the absence of horizontal pleiotropy or when horizontal pleiotropy is balanced (although in the latter case the estimate will be overly precise). An advantage of the method is that it may provide more reliable results in the presence of measurement error in the SNP-exposure effects.
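One way to write this likelihood for two independent samples treats βx and βy as normally distributed around a true SNP-exposure effect and the causal effect times that true effect, with the reported standard errors as known variances. Maximising over the true SNP-exposure effects leaves an objective in the causal effect alone, which the Python sketch below minimises with a coarse grid search followed by golden-section refinement. The search range of ±10, the independence of the two samples and the choice of optimiser are assumptions of this sketch, and it returns only the point estimate.

```python
import math
from typing import List


def neg_profile_loglik(beta: float, bx: List[float], se_x: List[float],
                       by: List[float], se_y: List[float]) -> float:
    """Negative profile log-likelihood (up to a constant) for the causal effect beta.

    Model: bx_j ~ N(t_j, se_x_j^2) and by_j ~ N(beta * t_j, se_y_j^2) with independent
    errors; maximising over each true SNP-exposure effect t_j leaves
    0.5 * sum_j (by_j - beta * bx_j)^2 / (se_y_j^2 + beta^2 * se_x_j^2).
    """
    return 0.5 * sum((y - beta * x) ** 2 / (sy ** 2 + beta ** 2 * sx ** 2)
                     for x, sx, y, sy in zip(bx, se_x, by, se_y))


def ml_estimate(bx: List[float], se_x: List[float], by: List[float], se_y: List[float],
                lo: float = -10.0, hi: float = 10.0, grid: int = 2001,
                tol: float = 1e-10) -> float:
    """Maximum likelihood causal estimate via grid search plus golden-section search."""
    def f(b: float) -> float:
        return neg_profile_loglik(b, bx, se_x, by, se_y)

    step = (hi - lo) / (grid - 1)
    start = min((lo + i * step for i in range(grid)), key=f)  # coarse global search
    a, c = start - step, start + step                         # bracket around it
    g = (math.sqrt(5) - 1) / 2                                 # golden ratio conjugate
    while c - a > tol:
        m1, m2 = c - g * (c - a), a + g * (c - a)
        if f(m1) < f(m2):
            c = m2
        else:
            a = m1
    return (a + c) / 2


print(round(ml_estimate([0.1, 0.2, 0.3], [0.01] * 3, [0.05, 0.1, 0.15], [0.01] * 3), 4))
```

When the SNP-exposure standard errors are negligible, the objective reduces to the IVW residual sum of squares, so the two estimates coincide.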

MR Egger analysis

Relaxing the IV2 assumption of ‘no horizontal pleiotropy’, MR-Egger (Bowden et al., 2015; Bowden et al., 2016b) adapts the IVW analysis by allowing a non-zero intercept, which permits the net horizontal pleiotropic effect across all SNPs to be unbalanced, or directional. The method returns an unbiased causal effect estimate even if the IV2 assumption is violated for all SNPs, but it assumes that the horizontal pleiotropic effects are not correlated with the SNP-exposure effects (this is known as the InSIDE assumption). Horizontal pleiotropy refers to the effects of the SNPs on the outcome that are not mediated by the exposure.
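The MR-Egger point estimates can be sketched in Python as a weighted least-squares fit with a free intercept. Following the usual MR-Egger convention, SNPs are first oriented so that every SNP-exposure effect is positive, because the intercept depends on which allele is coded as the effect allele. Standard errors and tests are omitted, and the data are invented.

```python
from typing import List, Tuple


def mr_egger(bx: List[float], by: List[float], se_y: List[float]) -> Tuple[float, float]:
    """MR-Egger point estimates, returned as (causal slope, pleiotropy intercept)."""
    # orient SNPs so that every SNP-exposure effect is positive
    x = [abs(b) for b in bx]
    y = [y_j if x_j >= 0 else -y_j for x_j, y_j in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se_y]  # inverse variance weights
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx                  # causal effect estimate
    intercept = y_bar - slope * x_bar  # average directional pleiotropic effect
    return slope, intercept


slope, intercept = mr_egger([0.1, -0.2, 0.3], [0.15, -0.2, 0.25], [0.01, 0.02, 0.01])
print(round(slope, 3), round(intercept, 3))  # 0.5 0.1
```

In this toy example, every SNP carries the same pleiotropic shift of 0.1. IVW, which forces the line through the origin, would absorb that shift into the slope, whereas MR-Egger separates it into the intercept.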

Median-based estimator

An alternative approach is to take the median effect of all available SNPs (Bowden et al., 2016a; Kang et al., 2014). This has the advantage that the causal effect estimate remains unbiased provided at least half the SNPs are valid instruments (i.e. exhibiting no horizontal pleiotropy, no association with confounders and a robust association with the exposure). The weighted median estimate allows stronger SNPs to contribute more towards the estimate, and can be obtained by weighting the contribution of each SNP by the inverse variance of its association with the outcome.
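The weighted median can be computed by ordering the per-SNP Wald ratios and interpolating the point at which half of the total weight is reached. This Python sketch uses an interpolated definition in the style of Bowden et al. (2016a) and assumes weights equal to the inverse first-order variance of each Wald ratio, (βx/σy)²; other weighting choices are possible.

```python
from typing import List


def weighted_median(estimates: List[float], weights: List[float]) -> float:
    """Interpolated weighted median of the estimates."""
    order = sorted(range(len(estimates)), key=lambda i: estimates[i])
    b = [estimates[i] for i in order]
    total = sum(weights)
    w = [weights[i] / total for i in order]
    # s[j]: cumulative weight up to estimate j, counting only half of its own weight
    s, cum = [], 0.0
    for wj in w:
        s.append(cum + wj / 2)
        cum += wj
    below = max((j for j in range(len(s)) if s[j] < 0.5), default=-1)
    if below == -1:
        return b[0]
    if below == len(b) - 1:
        return b[-1]
    # linear interpolation between the two estimates that straddle 0.5
    return b[below] + (b[below + 1] - b[below]) * (0.5 - s[below]) / (s[below + 1] - s[below])


def weighted_median_mr(bx: List[float], by: List[float], se_y: List[float]) -> float:
    """Weighted median of per-SNP Wald ratios (weights: (bx / se_y) ** 2)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    return weighted_median(ratios, weights)


print(weighted_median([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1]))  # 2.5
```

An outlying SNP with a large Wald ratio shifts the IVW estimate but leaves the weighted median unchanged as long as it holds less than half of the total weight.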

Mode-based methods

The mode-based estimator clusters the SNPs into groups based on similarity of causal effects, and returns the causal effect estimate based on the cluster that has the largest number of SNPs (Hartwig et al., 2017b). The mode-based method returns an unbiased causal effect estimate if the SNPs within the largest cluster are valid instruments. Clustering is performed using a kernel density function that requires selecting a bandwidth parameter. The weighted mode introduces an extra element similar to IVW and the weighted median, weighting each SNP’s contribution to the clustering by the inverse variance of its outcome effect.
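A Python sketch of the weighted mode: a weighted Gaussian kernel density of the Wald ratios is evaluated on a grid, and the location of its highest point is returned. The bandwidth follows a Silverman-style rule of thumb scaled by a tuning parameter phi, and the weights are the inverse first-order variances of the Wald ratios. The bandwidth rule, grid resolution and weights are assumptions of this sketch rather than MR-Base's exact settings.

```python
import math
import statistics
from typing import List


def kernel_mode(estimates: List[float], weights: List[float],
                phi: float = 1.0, grid: int = 2000) -> float:
    """Location of the peak of a weighted Gaussian kernel density of the estimates."""
    n = len(estimates)
    med = statistics.median(estimates)
    mad = 1.4826 * statistics.median([abs(b - med) for b in estimates])  # scaled MAD
    spread = min(statistics.stdev(estimates), mad)
    h = phi * max(1e-8, 0.9 * spread * n ** -0.2)  # rule-of-thumb bandwidth
    total = sum(weights)
    w = [wi / total for wi in weights]
    lo, hi = min(estimates) - 3 * h, max(estimates) + 3 * h
    best_x, best_density = lo, -1.0
    for i in range(grid + 1):
        x = lo + (hi - lo) * i / grid
        # unnormalised density: only the location of the maximum matters
        density = sum(wj * math.exp(-0.5 * ((x - b) / h) ** 2) for wj, b in zip(w, estimates))
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def weighted_mode_mr(bx: List[float], by: List[float], se_y: List[float],
                     phi: float = 1.0) -> float:
    """Weighted mode of per-SNP Wald ratios (weights: (bx / se_y) ** 2)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    return kernel_mode(ratios, weights, phi)


print(round(kernel_mode([0.48, 0.5, 0.52, 0.49, 0.51, 1.5, 1.6], [1] * 7), 2))
```

With five estimates clustered near 0.5 and two near 1.5, the largest cluster determines the result. Larger values of phi smooth the density and can merge nearby clusters.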

Diagnostics and sensitivity analyses

It is recommended that the methods described above are applied to all MR analyses and presented in publications to demonstrate sensitivity to different patterns of assumption violations. MR-Base also automatically performs the following further sensitivity analyses and diagnostics:

Heterogeneity tests

Heterogeneity in causal effects amongst instruments is an indicator of potential violations of IV assumptions (Bowden et al., 2017a). Heterogeneity can be calculated for the IVW and Egger estimates, and this can be used to navigate between models of horizontal pleiotropy (Bowden et al., 2017a).
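For the IVW model, heterogeneity is commonly summarised with Cochran's Q, the weighted sum of squared deviations of the SNP-outcome effects from the IVW fit, compared against a chi-squared distribution with the number of SNPs minus 1 degrees of freedom. This Python sketch returns Q and its degrees of freedom only. A p-value could then be obtained with, for example, SciPy's `scipy.stats.chi2.sf(q, df)`.

```python
from typing import List, Tuple


def cochran_q_ivw(bx: List[float], by: List[float], se_y: List[float]) -> Tuple[float, int]:
    """Cochran's Q around the IVW estimate, with its degrees of freedom."""
    w = [1.0 / s ** 2 for s in se_y]
    beta = (sum(wi * x * y for wi, x, y in zip(w, bx, by))
            / sum(wi * x * x for wi, x in zip(w, bx)))
    # equivalent to sum_j (bx_j / se_y_j)^2 * (by_j / bx_j - beta)^2
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    return q, len(bx) - 1


q, df = cochran_q_ivw([0.1, 0.2, 0.3], [0.06, 0.09, 0.16], [0.01, 0.01, 0.01])
print(round(q, 2), df)  # 2.71 2
```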

Leave-one-out analysis

To evaluate if the MR estimate is driven or biased by a single SNP that might have a particularly large horizontal pleiotropic effect, we can re-estimate the effect by sequentially dropping one SNP at a time. Identifying SNPs that, when dropped, lead to a dramatic change in the estimate can be informative about the sensitivity of the estimate to outliers.
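Leave-one-out analysis is a loop over SNPs. This Python sketch re-fits an IVW point estimate with each SNP dropped in turn, using invented data in which the fourth SNP is an outlier.

```python
from typing import Dict, List


def ivw_estimate(bx: List[float], by: List[float], se_y: List[float]) -> float:
    """IVW point estimate (weighted regression through the origin)."""
    w = [1.0 / s ** 2 for s in se_y]
    return (sum(wi * x * y for wi, x, y in zip(w, bx, by))
            / sum(wi * x * x for wi, x in zip(w, bx)))


def leave_one_out(snps: List[str], bx: List[float], by: List[float],
                  se_y: List[float]) -> Dict[str, float]:
    """IVW estimate recomputed with each SNP dropped in turn: {dropped SNP: estimate}."""
    out = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        out[snp] = ivw_estimate([bx[j] for j in keep], [by[j] for j in keep],
                                [se_y[j] for j in keep])
    return out


res = leave_one_out(["rs1", "rs2", "rs3", "rs4"],
                    [0.1, 0.2, 0.3, 0.1], [0.05, 0.1, 0.15, 0.5], [0.01] * 4)
for snp, est in res.items():
    print(snp, round(est, 3))
```

Dropping rs4 brings the estimate back to 0.5, the value implied by the other three SNPs, which flags rs4 as the SNP driving the overall result.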

Funnel plots

A tool used in meta-analysis is the funnel plot in which the estimate for a particular SNP is plotted against its precision (Sterne et al., 2011). Asymmetry in the funnel plot may be indicative of violations of the IV2 assumption through horizontal pleiotropy.

Other MR analysis methods

In addition to the above, MR-Base also supports access to the following statistical methods implemented in other R packages: an extension of the IVW method that allows for correlated SNPs (Yavorska and Burgess, 2017), a method for the detection and correction of outliers in IVW linear regression (MR-PRESSO, Verbanck et al., 2018), methods for fitting and visualising radial IVW and radial MR-Egger models (Bowden et al., 2017b) and a method for correcting for horizontal pleiotropy using MR-RAPS (2SMR using robust adjusted profile scores; Zhao et al., 2018).

Results

The MR-Base database resource

We created a repository for complete GWAS summary data, where complete refers to all SNPs reported in a GWAS analysis with no exclusions according to p-values for the association with the trait of interest (i.e. datasets were not restricted to statistically significant SNPs). We included summary data from any array-based analysis, including targeted and untargeted arrays, with or without additional imputation for ungenotyped SNPs. The targeted arrays included immunochip and metabochip, as well as replication and fine-mapping studies with ≥10,000 variants. As of December 2017, the repository was populated by curated and harmonised datasets from 1673 GWAS analyses, corresponding to approximately 11 billion SNP-trait associations in 4 million samples (median sample size per study: 21,315). Excluding replication and fine-mapping studies, the median number of SNPs per study was 6.1 million (minimum = 79,129, maximum = 22,434,434); 95% of studies reported ≥393,465 SNPs. The current database also includes nine studies with ≤64,494 SNPs that generally correspond to replication and fine-mapping studies. The analysed traits included 605 traits generated using the UK Biobank resource (Millard et al., 2017; Bycroft et al., 2017; Churchhouse and Neale, 2017; GIANT consortium et al., 2016; Jones et al., 2016; Pilling et al., 2016), 575 metabolomic traits from two studies (Shin et al., 2014; Kettunen et al., 2016), 151 immunological traits from one study (Roederer et al., 2015), and 342 other complex traits and diseases acquired from 123 GWAS publications (Supplementary file 1A). The latter publications corresponded to 79 studies, including 39 consortia and three cohorts. Supplementary file 1A provides a detailed overview of the available studies with complete summary data in MR-Base at the time of writing, but the number is updated on a regular basis.

In addition to the ‘complete summary data’, we also collected published GWAS associations that comprise only the significant hits of a GWAS after applying stringent p-value thresholds (e.g. p<5 × 10−8, a conventional threshold for declaring statistical significance in GWAS). These ‘top hits’, which can be used to define genetic instruments for exposures in MR analyses (see Materials and methods), include 29,792 SNPs obtained from clumping analysis of 1002 traits in the MR-Base database; 21,324 SNPs associated with 1628 complex traits and diseases in the NHGRI-EBI GWAS catalog (Welter et al., 2014); 187,318 SNPs associated with DNA methylation levels in whole blood at 33,256 genomic CpG sites, across five time points (Gaunt et al., 2016); 187,263 SNPs associated with gene expression levels at 27,094 gene identifiers, across 44 different tissues (GTEx Consortium, 2015); 1088 SNPs associated with metabolite levels in whole blood for 121 different metabolites (Kettunen et al., 2016); and 56 SNPs associated with protein levels in 47 different analytes (Deming et al., 2016).

The repositories of GWAS results described above can be interrogated and exploited for 2SMR using the R packages curated by MR-Base and by other researchers. The R packages currently curated by MR-Base include ‘TwoSampleMR’ (https://github.com/MRCIEU/TwoSampleMR) and ‘MRInstruments’ (https://github.com/MRCIEU/MRInstruments). Users can check the MR-Base website for updates to the curated packages. Accessible R packages curated by other researchers are described in Supplementary file 1B.

Using MR-Base to estimate the causal relationship between LDL cholesterol and coronary heart disease

In an applied example, we conducted an MR study of the causal effect of LDL cholesterol on CHD risk, using summary data from the GLGC (Willer et al., 2013; Do et al., 2013) and CARDIoGRAMplusC4D (Nikpay et al., 2015) consortia for the exposure and outcome, respectively. There were 91 studies (214,370 subjects) in the GLGC, 48 studies (195,813 subjects) in CARDIoGRAMplusC4D and 17 studies common to both consortia (59,970 subjects). We estimated that 31% of subjects in CARDIoGRAMplusC4D are also part of the GLGC and 28% of GLGC participants are also part of CARDIoGRAMplusC4D. The selected instruments (Supplementary file 1C) reportedly explained 2.4% of the variance in LDL cholesterol levels (Willer et al., 2013), equivalent to an F statistic of 85 in the GLGC. This indicates that the instrument is strong and therefore unlikely to be susceptible to weak instrument bias or bias from sample overlap (Burgess et al., 2011).
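As a rough check, the quoted F statistic can be reproduced from the variance explained with the usual approximation F = (R²/(1 − R²)) × (N − k − 1)/k. The sketch below assumes N = 214,370 (the GLGC sample size) and k = 62 instruments (the number of LDL SNPs used elsewhere in this paper); both inputs are our assumptions about how the figure was derived, not a statement of the authors' calculation. It also recomputes the two overlap percentages from the subject counts given above.

```python
def f_statistic(r2: float, n: int, k: int) -> float:
    """Approximate F statistic for k instruments jointly explaining r2 of exposure variance in n samples."""
    return (r2 / (1.0 - r2)) * (n - k - 1) / k


def overlap_share(n_shared: int, n_total: int) -> float:
    """Fraction of one consortium's participants who also appear in the other consortium."""
    return n_shared / n_total


# r2 and subject counts are from the text; n and k for the F statistic are assumptions (see lead-in).
f = f_statistic(r2=0.024, n=214_370, k=62)
share_of_cardiogram = overlap_share(59_970, 195_813)  # CARDIoGRAMplusC4D participants also in GLGC
share_of_glgc = overlap_share(59_970, 214_370)        # GLGC participants also in CARDIoGRAMplusC4D

print(f"F = {f:.0f}")                                      # F = 85
print(f"{share_of_cardiogram:.0%} / {share_of_glgc:.0%}")  # 31% / 28%
```

Because F grows roughly linearly with N for a fixed R², large consortia can yield strong instruments even when the SNPs explain only a few percent of the exposure variance.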

The random effects IVW estimate indicated that the odds ratio (OR) (95% confidence interval [CI]) for CHD was 1.45 (1.30–1.62) per standard deviation increase in LDL cholesterol (Figure 4). There was, however, strong evidence for heterogeneity amongst SNPs (Cochran’s Q value = 122.5, p=4.72e-07) and funnel plot asymmetry (Figure 4a and d), suggesting that at least some of the SNPs exhibit horizontal pleiotropy (a violation of the IV2 assumption, as shown in Figure 1b). There was evidence for a negative intercept (−0.013 [s.e.=0.005], p=0.020) and a stronger odds ratio (1.85 [1.48–2.32]) in MR-Egger regression (Figure 4b), indicating some degree of directional horizontal pleiotropy. The weighted median (1.56 [1.43–1.70]) and weighted mode (1.68 [1.56–1.80]) estimators gave results similar to the IVW estimate (Figure 4). In a leave-one-out analysis, we sequentially excluded one instrument (SNP) at a time to assess the sensitivity of the results to individual variants, finding that no single instrument was strongly driving the overall effect of LDL cholesterol on CHD (Figure 4c).
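The estimators named above can be sketched with the standard summary-data formulas. This is a minimal illustration, not the TwoSampleMR implementation: it assumes first-order weights (SNP-exposure standard errors ignored), a multiplicative random-effects model that only inflates the SE when Cochran's Q exceeds its degrees of freedom, and it omits p-values. The names `bx`, `by` and `se_y` (SNP-exposure effects, SNP-outcome effects and their SEs) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    beta: float
    se: float


def ivw(bx, by, se_y):
    """IVW estimate with fixed- and multiplicative random-effects SEs, plus Cochran's Q and its df."""
    ratios = [y / x for x, y in zip(bx, by)]                   # per-SNP Wald ratio estimates
    w = [(x / s) ** 2 for x, s in zip(bx, se_y)]               # 1 / var(ratio), first-order approximation
    beta = sum(wi * r for wi, r in zip(w, ratios)) / sum(w)
    se_fixed = math.sqrt(1.0 / sum(w))
    q = sum(wi * (r - beta) ** 2 for wi, r in zip(w, ratios))  # Cochran's Q with K - 1 df
    df = len(ratios) - 1
    phi = max(1.0, math.sqrt(q / df)) if df > 0 else 1.0       # never shrink the fixed-effect SE
    return Estimate(beta, se_fixed), Estimate(beta, se_fixed * phi), q, df


def mr_egger(bx, by, se_y):
    """MR-Egger: weighted regression of by on bx with an intercept, after orienting bx to be positive."""
    if len(bx) < 3:
        raise ValueError("MR-Egger needs at least three SNPs")
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]         # flip the outcome sign with the exposure sign
    w = [1.0 / s ** 2 for s in se_y]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(wi * (y - intercept - slope * x) ** 2 for wi, x, y in zip(w, xs, ys))
    sigma = max(1.0, math.sqrt(rss / (len(xs) - 2)))            # residual scale, floored at 1
    se_slope = sigma * math.sqrt(1.0 / sxx)
    se_intercept = sigma * math.sqrt(1.0 / sw + x_bar ** 2 / sxx)
    return Estimate(intercept, se_intercept), Estimate(slope, se_slope)


def leave_one_out(bx, by, se_y):
    """Random-effects IVW estimate recomputed with each SNP dropped in turn."""
    results = []
    for i in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != i]
        _, random_effects, _, _ = ivw([bx[j] for j in keep], [by[j] for j in keep], [se_y[j] for j in keep])
        results.append(random_effects)
    return results


# Toy data (not MR-Base results): outcome effects are exactly 0.02 + 0.5 * exposure effects.
bx = [0.10, 0.20, 0.30, 0.15, 0.25]
by = [0.02 + 0.5 * x for x in bx]
se_y = [0.02, 0.03, 0.025, 0.02, 0.03]

fixed, random_effects, q, df = ivw(bx, by, se_y)
intercept, slope = mr_egger(bx, by, se_y)
loo = leave_one_out(bx, by, se_y)
print(round(intercept.beta, 4), round(slope.beta, 4))  # 0.02 0.5
```

In this toy example the non-zero intercept pulls the IVW estimate above the true slope of 0.5 and produces a positive Q, while MR-Egger recovers the slope; this is an exaggerated version of how directional pleiotropy separates the two estimators.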

Mendelian randomization study of the effect of low density lipoprotein cholesterol levels on coronary heart disease.

(a) A forest plot, where each black point represents the log odds ratio (OR) for coronary heart disease (CHD) per standard deviation (SD) increase in low density lipoprotein (LDL) cholesterol, produced using each of the ‘LDL single nucleotide polymorphisms (SNPs)’ as separate instruments, and red points showing the combined causal estimate using all SNPs together in a single instrument, using each of four different methods (weighted median, weighted mode, inverse-variance weighted [IVW] random effects and MR-Egger). Horizontal lines denote 95% confidence intervals. (b) A plot relating the effect sizes of the SNP-LDL association (x-axis, SD units) and the SNP-CHD associations (y-axis, log OR) with standard error bars. The slopes of the lines correspond to causal estimates using each of the four different methods. Outlier SNPs are labeled. (c) Leave-one-out sensitivity analysis. Each black point represents the IVW MR method applied to estimate the causal effect of LDL on CHD excluding that particular variant from the analysis. The red point depicts the IVW estimate using all SNPs. There are no instances where the exclusion of one particular SNP leads to dramatic changes in the overall result. (d) Funnel plot showing the relationship between the causal effect of LDL on CHD estimated using each individual SNP as a separate instrument against the inverse of the standard error of the causal estimate. Vertical lines show the causal estimates using all SNPs combined into a single instrument for each of four different methods. There is some asymmetry in the plot (an excess of strong protective effects associated with higher LDL cholesterol), which is potentially indicative of violations of instrumental variable (IV) assumptions, e.g. violation of the IV2 assumption through horizontal pleiotropy. Outlier SNPs are labeled.

https://doi.org/10.7554/eLife.34408.006
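The weighted median estimator shown in Figure 4 can be sketched as follows, using the interpolation rule we understand from Bowden and colleagues' description: order the per-SNP ratio estimates, standardise their inverse-variance weights, and interpolate where the cumulative weight (taken at the midpoint of each SNP's own weight) crosses 0.5. This is a minimal sketch with first-order weights; the bootstrap standard error and the weighted mode estimator are not included, and the function name is hypothetical.

```python
def weighted_median(bx, by, se_y):
    """Weighted median of per-SNP ratio estimates with first-order inverse-variance weights."""
    if len(bx) < 2:
        raise ValueError("need at least two SNPs")
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    b = [ratios[i] for i in order]
    total = sum(weights)
    w = [weights[i] / total for i in order]
    # Cumulative standardised weight, evaluated at the midpoint of each SNP's own weight.
    cum, running = [], 0.0
    for wi in w:
        running += wi
        cum.append(running - wi / 2)
    # Last position below 0.5; capped so that position p + 1 always exists.
    p = min(max(i for i, c in enumerate(cum) if c < 0.5), len(b) - 2)
    return b[p] + (b[p + 1] - b[p]) * (0.5 - cum[p]) / (cum[p + 1] - cum[p])


# With equal weights this reduces to the ordinary median of the ratio estimates.
print(weighted_median([1, 1, 1, 1], [4, 1, 3, 2], [1, 1, 1, 1]))  # 2.5
```

Because the estimate only depends on where half of the weight lies, it remains consistent when up to half of the weight comes from invalid instruments, which is why it is a useful complement to IVW and MR-Egger.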

Using PheWAS to interpret outliers and identify potential sources of horizontal pleiotropy

Inspection of the forest, funnel and scatter plots highlighted three outlier SNPs (rs11065987, rs1250229 and rs4530754) as potential sources of the heterogeneity and of the negative intercept in MR-Egger regression (Figure 4). For these three SNPs, the LDL-raising variant was associated with lower CHD risk, contrary to the results for the majority of the other LDL-raising variants. In such situations, users are strongly advised to check for effect allele coding errors (Hartwig et al., 2016). We confirmed that the SNPs were not palindromic (such SNPs are particularly prone to coding errors in 2SMR) and that the LDL and CHD risk variants were compatible with those reported in the GWAS catalog and the original study reports. The unusually strong cardio-protective effects of the three LDL-raising variants are also compatible with horizontal pleiotropy, whereby the effects of the SNPs on CHD are independent of their effects on LDL cholesterol.
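The allele checks described above can be sketched as a small harmonisation step that expresses the outcome effect per copy of the exposure effect allele. This is a simplified illustration with hypothetical function names and single-letter alleles only: it refuses to align palindromic (A/T or C/G) SNPs outright, whereas a fuller pipeline may attempt to resolve their strand using allele frequencies.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_palindromic(allele1: str, allele2: str) -> bool:
    """A/T and C/G SNPs read the same on both strands, so strand flips cannot be seen from alleles alone."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def align_outcome_beta(exp_effect, exp_other, out_effect, out_other, out_beta):
    """Express the outcome beta per copy of the exposure effect allele; None if alignment is unsafe."""
    e = (exp_effect.upper(), exp_other.upper())
    o = (out_effect.upper(), out_other.upper())
    if not all(a in COMPLEMENT for a in e + o) or is_palindromic(*e):
        return None
    flipped = (COMPLEMENT[o[0]], COMPLEMENT[o[1]])  # the same alleles read off the other strand
    for candidate in (o, flipped):
        if candidate == e:
            return out_beta                          # same effect allele
        if candidate == (e[1], e[0]):
            return -out_beta                         # effect and other alleles swapped
    return None                                      # alleles do not correspond at all


print(align_outcome_beta("A", "G", "G", "A", 0.2))  # -0.2
print(align_outcome_beta("A", "G", "T", "C", 0.2))  # 0.2 (reported on the opposite strand)
print(align_outcome_beta("A", "T", "A", "T", 0.2))  # None (palindromic)
```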

To identify potential sources of horizontal pleiotropy, we performed a PheWAS of rs11065987, rs1250229 and rs4530754, using a threshold of p<2.04e-05 (0.05/2453 ‘trait lookups’) to select traits for further evaluation. Only rs11065987, located upstream of the BRAP gene, was associated with non-lipid non-vascular-disease traits, including two markers of adiposity (body mass index and hip circumference); three blood pressure traits (diastolic blood pressure, systolic blood pressure and mean arterial pressure); five hematological markers (hematocrit, haemoglobin concentration, packed cell volume, red blood cell count and platelet count); five autoimmune diseases (inflammatory bowel disease, primary biliary cirrhosis, Crohn's disease, rheumatoid arthritis and celiac disease); four metabolites (urate, kynurenine, erythronate and C-glycosyltryptophan); serum cystatin C (a marker of kidney function); and tetralogy of Fallot (Supplementary file 1D).
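The PheWAS threshold above is a Bonferroni correction, 0.05 divided by the number of trait lookups. A minimal sketch of that filter follows; the trait names and p-values in the usage example are invented for illustration and only the count of 2453 lookups comes from the text.

```python
def phewas_hits(lookups, alpha=0.05):
    """Bonferroni-filter (trait, p-value) pairs from a PheWAS of a single SNP."""
    threshold = alpha / len(lookups)
    hits = [trait for trait, p in lookups if p < threshold]
    return threshold, hits


# 2453 lookups as in the text; all trait names and p-values here are hypothetical.
lookups = [(f"trait_{i}", 0.5) for i in range(2450)]
lookups += [("trait_a", 1e-6), ("trait_b", 3e-5), ("trait_c", 1e-10)]
threshold, hits = phewas_hits(lookups)
print(f"{threshold:.2e}", hits)  # 2.04e-05 ['trait_a', 'trait_c']
```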

In further MR analyses of these traits, we found that higher hematocrit, higher blood pressure, higher BMI and higher hip circumference were putatively associated with higher CHD risk (p<0.05) (Supplementary file 1D). However, of these traits, only the indirect effect of rs11065987 via hematocrit (−0.012, SE = 0.005) was in the same direction as the direct effect of rs11065987 on CHD (−0.060, SE = 0.011), whereas the indirect effects mediated by hip circumference (0.004, SE = 0.002), BMI (0.007, SE = 0.001) and LDL cholesterol (0.011, SE = 0.001) were in the opposite direction. Because effect alleles were not reported in the relevant GWAS, we were unable to assess the indirect effect of rs11065987 mediated by blood pressure. These results suggest that at least one cardio-protective mechanism of the LDL-raising variant of rs11065987 operates through a pleiotropic effect on hematocrit. However, this result should be interpreted with caution; it requires replication in independent studies and further examination for potential violations of MR assumptions in sensitivity analyses.
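The text does not spell out how the indirect effects were computed. One common approach, shown here purely as an assumption, multiplies the SNP-mediator association by the MR estimate of the mediator's effect on the outcome (a product of coefficients) and approximates its SE with the first-order delta method, ignoring any covariance between the two estimates. The function names and the numbers in the usage line are hypothetical.

```python
import math


def indirect_effect(b_snp_mediator, se_snp_mediator, b_mediator_outcome, se_mediator_outcome):
    """Product-of-coefficients indirect effect with a first-order (delta-method) standard error."""
    estimate = b_snp_mediator * b_mediator_outcome
    se = math.sqrt(
        (b_mediator_outcome * se_snp_mediator) ** 2
        + (b_snp_mediator * se_mediator_outcome) ** 2
    )
    return estimate, se


def same_direction(indirect, total):
    """True when an indirect pathway pushes the outcome the same way as the SNP's total effect."""
    return indirect * total > 0


# Invented inputs: the SNP lowers the mediator by 0.10 SD; each SD of mediator raises log odds by 0.5.
est, se = indirect_effect(-0.10, 0.01, 0.5, 0.1)
print(round(est, 3), round(se, 4), same_direction(est, total=-0.06))  # -0.05 0.0112 True
```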

Performing hypothesis-free searches for causal relationships

To gain insight into potential opportunities for repurposing, or potential adverse effects, of LDL cholesterol lowering (an established intervention strategy for CHD prevention), we conducted a hypothesis-free MR-PheWAS analysis (Millard et al., 2015; Haycock et al., 2017). Using 62 SNPs as instruments (Supplementary file 1C), we obtained fixed effects IVW estimates of the effect of lower LDL cholesterol on 40 non-vascular diseases and 108 non-lipid complex traits in MR-Base (Figure 5). Using an unadjusted p-value threshold of 0.05 to denote suggestive evidence for association, we identified 16 non-vascular traits associated with LDL cholesterol. The associations surpassing a 5% false discovery rate threshold were with lower mortality measures, higher adiposity measures and higher risk of type two diabetes. We emphasise that this analysis is shown here for purposes of demonstrating the utility of MR-Base for efficiently screening many traits for hypothesis generation, and any claims of causality must be followed up with rigorous examination of potential violations of MR assumptions in sensitivity analyses, replication in independent studies and triangulation with evidence from other study designs (Lawlor et al., 2016; Munafò and Davey Smith, 2018).
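A 5% false discovery rate threshold is commonly applied with the Benjamini-Hochberg step-up procedure; the text does not name the procedure used, so treat this as an assumed, minimal sketch. The p-values in the usage example are invented and show how a screen can flag several traits at unadjusted p<0.05 while fewer survive FDR control.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Indices of tests declared significant by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    n_significant = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            n_significant = rank  # step-up: keep the largest rank that passes
    return sorted(order[:n_significant])


p = [0.001, 0.2, 0.03, 0.04, 0.5, 0.012]  # hypothetical MR-PheWAS p-values
print(benjamini_hochberg(p))               # [0, 5]
```

Here four of the six traits fall below an unadjusted 0.05, but only two pass at a 5% FDR.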

Effect of lower low density lipoprotein cholesterol on 150 traits in MR-Base.

The x-axis shows the standard deviation (SD) change or log odds ratio (OR) for each of 150 traits per SD decrease in low density lipoprotein (LDL) cholesterol. The y-axis shows the p-value for the association on a -log10 scale. The effects on the x-axis correspond to the slope from fixed effects inverse variance weighted (IVW) linear regression of single nucleotide polymorphism (SNP)-outcome effects regressed on the SNP-LDL effects. Those results that have a p-value<0.05 are labelled. Larger points denote false discovery rate (FDR) < 0.05. LDL cholesterol was instrumented by 62 SNPs.

https://doi.org/10.7554/eLife.34408.007

Discussion

As the availability of published GWAS summary data continues to grow, MR offers an attractive approach for exploring the aetiology of disease (Pickrell et al., 2016). We have developed a platform that integrates a database of GWAS summary data together with a public repository of statistical methods for enabling systematic causal inference across the phenome. This benefits modelling of phenotypic relationships in three ways. First, it greatly simplifies and expedites the practice of performing MR and PheWAS analyses. Second, automating the application of state-of-the-art methodology establishes basic standards for reporting MR results and improves the reliability and reproducibility of causal inference. Third, it maximises the breadth of possible causal relationships that can be interrogated by drawing together genetic information on as many traits as possible. This presents new analytical opportunities that may not have been feasible previously.

In an applied example, we used MR-Base to recapitulate the known (Holmes et al., 2017) causal effect of higher LDL cholesterol on CHD risk, but also found strong evidence for violations of assumptions. The latter was at least partly explained by three LDL-raising variants that were associated with lower CHD risk. One of the variants, rs11065987, was subject to substantial pleiotropy, suggesting that its association with CHD may operate independently of its effect on LDL cholesterol, with hematocrit identified as a potential alternative pathway. Despite the evidence for violations of IV assumptions, results remained compatible with a causal effect of higher LDL cholesterol on CHD in sensitivity analyses and were similar to findings from clinical trials (Silverman et al., 2016; White et al., 2016) and observational studies (Di Angelantonio et al., 2009). We also showed putative evidence that lower LDL cholesterol is causally related to lower mortality, but higher adiposity and higher risk of type two diabetes. The latter result is compatible with trials that have shown that lipid-lowering drugs similarly increase risk of type two diabetes (Swerdlow et al., 2015; Schmidt et al., 2017; Sattar et al., 2010). These results were shown to demonstrate the utility of MR-Base for hypothesis-free MR-PheWAS analyses. We refrain from making strong claims of causality here, but the latter results highlight the scope of MR-Base for identifying opportunities for drug repurposing or potential adverse effects of pharmaceutical and public health interventions.

Much has been written about the strengths and limitations of MR (Davey Smith and Hemani, 2014; Haycock et al., 2016; Swerdlow et al., 2016; Holmes et al., 2017; VanderWeele et al., 2014; Lawlor et al., 2008; Lawlor et al., 2016; Zheng et al., 2017a), and we summarise how these relate to MR-Base in Supplementary file 1E, with particular focus on the most important strengths and limitations below.

Strengths of MR-Base

Efficiency of performing 2SMR

Automation of 2SMR greatly facilitates the practical implementation of the analysis, from acquiring, checking and harmonising GWAS summary data, and identifying LD proxies where necessary, to performing a battery of inferential and sensitivity analyses. In addition to the tests that we have implemented, we have also made it straightforward to apply other R packages that implement MR methodology to the MR-Base database. We advocate that causal inference is presented in a triangulation framework, consolidating evidence from different study designs (Lawlor et al., 2016; Munafò and Davey Smith, 2018). With automation of 2SMR, researchers are afforded more bandwidth to integrate evidence from other, complementary, study designs.

Reliability of MR estimates

MR-Base seeks to improve reliability in two ways. First, automating data handling with a rigorous quality control (QC) pipeline can eliminate human error. It has been shown that discrepant results can easily arise from identical data because of errors of implementation (Hartwig et al., 2016; Hartwig et al., 2017; Inoshita et al., 2018). Although MR-Base uses a number of strategies to reduce the potential for effect allele coding errors (see Materials and methods), we strongly encourage users to check that effect signs (i.e. positive or negative betas) accurately reflect the correct effect allele in both the exposure and outcome studies. Second, bias can arise in MR analyses if the MR assumptions are not met, and some of these assumptions are unprovable. We enable users to easily perform state-of-the-art methods and sensitivity analyses to evaluate their results, and the repository of methods available to users will continually expand as new ones become available.

Reproducibility of 2SMR results

The integration of data and analytical software under a single platform means that the results presented by researchers can easily be reproduced by other analysts. Diagnostic reports are automatically generated, and if using the web application we generate R code that can be used to reproduce any results that are derived through the graphical user interface.

A central data repository of complete GWAS summary data

Through our PheWAS of MR outliers we have shown the utility of having GWAS summary data together in one location. We have developed a highly scalable database infrastructure which can be used to store new and existing studies, and that can be easily and rapidly queried through a web app or an application programming interface (API). Researchers may wish to only make their GWAS summary data publicly available after a certain time period; to this end, we have developed an authentication system that allows fine-grained permission controls to restrict access for particular datasets to specific users, where necessary.

Avoiding reverse causal instruments

MR-Base provides the Steiger test, a method for identifying the correct direction of effect (Hemani et al., 2017b). The use of p-value thresholds for instrument selection can increase the potential for ‘reverse causal’ instruments, which occurs when the effect of genetic variants on a hypothesized exposure actually occurs via the hypothesized outcome, and can lead to incorrect inferences about directions of causality (Hemani et al., 2017a). One way to mitigate the impact of such invalid instruments is to check that the selected genetic variants are more strongly associated with the hypothesized exposure than with the outcome. It is important to note, however, that this test is subject to a number of assumptions (Hemani et al., 2017b).
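The directionality check above compares how much variance a SNP explains in the exposure versus the outcome. A simplified sketch for continuous traits follows: r² is recovered from each t statistic, and the two correlations are compared with Fisher's z test for independent samples. This is an illustration under stated assumptions rather than MR-Base's implementation; binary traits would first need their variance explained converted to the liability scale, and the inputs in the usage line are invented.

```python
import math
from statistics import NormalDist


def variance_explained(beta, se, n):
    """r^2 of a SNP for a continuous trait, recovered from its t statistic."""
    t2 = (beta / se) ** 2
    return t2 / (t2 + n - 2)


def steiger_direction(beta_exp, se_exp, n_exp, beta_out, se_out, n_out):
    """Return (exposure explains more variance?, z, two-sided p) using Fisher's z for independent samples."""
    r_exp = math.sqrt(variance_explained(beta_exp, se_exp, n_exp))
    r_out = math.sqrt(variance_explained(beta_out, se_out, n_out))
    z = (math.atanh(r_exp) - math.atanh(r_out)) / math.sqrt(1 / (n_exp - 3) + 1 / (n_out - 3))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r_exp > r_out, z, p


# Invented example: a strong SNP-exposure association and a weak SNP-outcome association.
correct, z, p = steiger_direction(0.08, 0.004, 100_000, 0.01, 0.005, 80_000)
print(correct, round(z, 1))
```

A `False` first element would suggest that the SNP may act on the hypothesized exposure via the outcome, which is the reverse causal scenario described above.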

Limitations that are inherent to 2SMR

Sample overlap

GWAS typically involve meta-analyses of large numbers of different cohorts, where each cohort is likely to contribute to GWAS on many different traits. As a consequence, there is likely to be considerable sample overlap amongst different GWAS within the database (e.g. we estimated that 28% of GLGC participants were also part of the CARDIoGRAMplusC4D consortium), which could bias effect estimates from 2SMR towards the confounded observational association, an example of weak instrument bias (Pierce and Burgess, 2013). Bias from sample overlap is made worse if some or all of the overlapping studies are also discovery studies for the SNP-exposure or SNP-outcome associations, due to the phenomenon of winner’s curse (Haycock et al., 2016; Bowden and Dudbridge, 2009). We therefore strongly advise users of MR-Base to estimate the degree of participant overlap amongst the exposure, outcome and discovery GWAS in their analyses. Bias from sample overlap can be minimized by using strong instruments (e.g. F statistic much greater than 10 for the instrument-exposure association) (Pierce and Burgess, 2013). In our applied example, although we estimated some overlap between the GLGC and CARDIoGRAMplusC4D consortia, the robust strength of our instrument for LDL cholesterol (an F statistic of 85) likely minimized any potential impact of weak instrument bias on our results. We also encourage users to conduct sensitivity analyses using replication studies to define their instruments. See Pierce and Burgess (2013) for further details on the relationship between instrument strength and bias from sample overlap.

Samples derived from the same population

Although the exposure and outcome studies used in 2SMR should not involve overlapping participants, the participants from both studies should be from the same population - practically defined as being of similar age and sex distributions and geographic and ancestral origins (Angrist and Krueger, 1995) - with similar patterns of LD in the genomic regions used to define the exposure. When exposure and outcome studies are derived from different populations, this could bias the magnitude of the association between the exposure and outcome. This could arise, for example, as a result of differences in LD between the SNPs used as instruments for the exposure and the underlying causal variants for the exposure, resulting in differences in instrument-exposure effect sizes (note that knowledge of causal variants is not required for MR). In addition, instrument-exposure effect sizes may vary with age or may differ between males and females. MR-Base provides meta-data on population characteristics, such as geographic origin, to help guide the user in selecting the most appropriate populations for their analysis. However, subtle population differences between studies of broadly similar backgrounds may persist. On the other hand, differences in population backgrounds should not generally increase the likelihood of incorrectly inferring an association when none exists (Burgess et al., 2016). Thus, results may still be informative for directions of causality, but not the magnitude of the effect, when the exposure and outcome studies are derived from different populations.

Limitations compounded by MR-Base

Multiple testing

Although automation allows more complex study designs, such as many-to-many MR studies and PheWAS, this also increases the scope for multiple testing, false positives and data dredging (whereby users do not present the results from all their analyses but cherry pick a few on the basis of arbitrary p-value thresholds). Users of MR-Base should adhere to well defined analysis plans and should be fully transparent about all traits investigated, including all exploratory analyses. Traditional MR studies, which are hypothesis-driven and done in conjunction with findings from observational studies, should be less prone to these biases.

Interpretation

Automation shifts the problem of handling complex data to a problem of interpreting complex results. In a hypothesis-free study, involving many exposure-outcome combinations and multiple methods, a critical evaluation of each causal estimate may not be possible and could increase the potential for cherry picking of results. In such a situation, it may be more appropriate to consider a machine learning framework to select the most appropriate method and instrument selection strategy for each exposure-outcome analysis (Hemani et al., 2017a).

Biased effects from automated instrument selection

Ideally, one would only use replicated GWAS effect sizes for instrumental variables because this reduces the impact of winner’s curse (Zollner and Pritchard, 2007; Bowden and Dudbridge, 2009). In the presence of winner’s curse, the causal effect estimates are liable to be biased towards the null, assuming the exposure and outcome studies are independent (in the presence of substantial overlap between exposure and outcome studies, winner’s curse can compound weak instrument bias). The MR-Base database of complete summary data is currently biased towards discovery GWAS, with few results from replication studies. This means that the clumping procedure used to obtain independent significant hits will likely return instruments with inflated effect sizes. Similar problems exist for the ‘omic QTL datasets, which seldom have independent replication data available. We advise users to assess the sensitivity of their results to winner’s curse, e.g. by using independent studies to redefine their instruments. MR-Base currently includes GWAS results for 605 traits from UK Biobank that could be used for this purpose, and this number will increase over time.
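To make the winner’s curse point concrete, the clumping step can be sketched as a greedy procedure in the spirit of PLINK's `--clump`: walk the significant SNPs from smallest p-value upwards and keep a SNP only if it is not in LD with one already kept. The kept index SNPs are by construction the most extreme discovery results, which is where effect sizes tend to be most inflated. The record layout, the `ld_r2` callable and the default thresholds are illustrative assumptions, not MR-Base's settings.

```python
def clump(snps, ld_r2, p_threshold=5e-8, max_r2=0.001, window_bp=10_000_000):
    """Greedy LD clumping: keep the most significant SNPs not in LD with an already kept SNP.

    snps: iterable of dicts with keys 'rsid', 'chrom', 'pos', 'p' (hypothetical layout).
    ld_r2: callable (rsid_a, rsid_b) -> r^2, e.g. backed by a reference panel (assumed).
    """
    candidates = sorted((s for s in snps if s["p"] < p_threshold), key=lambda s: s["p"])
    kept = []
    for snp in candidates:
        in_ld = any(
            k["chrom"] == snp["chrom"]
            and abs(k["pos"] - snp["pos"]) <= window_bp
            and ld_r2(k["rsid"], snp["rsid"]) > max_r2
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


# Toy input: rs2 is in LD with rs1, rs3 is far away, rs4 is not genome-wide significant.
snps = [
    {"rsid": "rs1", "chrom": 1, "pos": 1_000_000, "p": 1e-20},
    {"rsid": "rs2", "chrom": 1, "pos": 1_050_000, "p": 1e-12},
    {"rsid": "rs3", "chrom": 1, "pos": 30_000_000, "p": 1e-9},
    {"rsid": "rs4", "chrom": 2, "pos": 5_000_000, "p": 1e-3},
]
r2_table = {frozenset({"rs1", "rs2"}): 0.8}


def ld_r2(a, b):
    return r2_table.get(frozenset({a, b}), 0.0)


print([s["rsid"] for s in clump(snps, ld_r2)])  # ['rs1', 'rs3']
```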

General advice for interpretation of results

When the biases and limitations indicated above are avoidable, users should consider modifying their analyses. For example, users could use an automated approach for instrument selection in their primary analysis but use manually curated instruments in sensitivity analyses of prioritised results. Sometimes, however, biases may be unavoidable, in which case users should acknowledge their possible impact and relax their conclusions. For example, inferences about directions of causality, shared genetic architecture (Burgess et al., 2014) or null effects (VanderWeele et al., 2014) are usually much more reliable than inferences about magnitudes of causal effects, which are very sensitive to violations of assumptions.

Related resources

A number of resources are available for MR analysis or extracting and using GWAS summary data. The MendelianRandomization R package (Yavorska and Burgess, 2017) is a standalone tool comprising several 2SMR methods and, in addition to the methods that we have implemented, we make it easy to import data from MR-Base into that R package. PhenoScanner (Staley et al., 2016) expands upon established GWAS catalogs (Welter et al., 2014; Li et al., 2016) by storing a large number of complete summary level datasets and providing a web interface for specific SNP lookups. SMR (Zhu et al., 2016) has been developed to automate colocalisation analysis between eQTLs and complex traits.

Summary

There has been a massive growth in the phenotypic coverage and statistical power of GWAS over the past decade (Visscher et al., 2017; Welter et al., 2014). Many approaches to studying complex traits and diseases can now be interrogated using GWAS summary level data. By harvesting and harmonising these data into a database and directly integrating this with analytical software for 2SMR, MR-Base greatly improves the efficiency and reliability of hypothesis-driven approaches. The database is a generic repository of GWAS summary data accessible via an API and future work will see extensions to support other methods for the investigation of complex trait genetic architecture, such as fine-mapping (Benner et al., 2016), colocalisation (Zhu et al., 2016; Newcombe et al., 2016; Giambartolomei et al., 2014) and polygenic risk prediction (Dudbridge, 2013; Euesden et al., 2015). The curation of data and methods achieved by MR-Base opens up new opportunities for hypothesis-free and phenome-wide approaches.

Resources

The following resources are all part of the MR-Base platform:

Code to reproduce the analysis in this paper: https://github.com/explodecomputer/mr-base-methods-paper.

Materials and methods

Overview

MR-Base comprises two main components: a database of GWAS summary association statistics and LD proxy information, and R packages and web applications for causal inference methods. The GWAS summary data are further structured into complete summary data (i.e. all SNP-phenotype associations) and ‘top hits’, which comprises subsets of GWAS summary data (typically the statistically significant results). The R packages include the TwoSampleMR (https://github.com/MRCIEU/TwoSampleMR) package, which supports data extraction, data harmonisation and MR analysis methods, and the MRInstruments (https://github.com/MRCIEU/MRInstruments) package, which is a repository for instruments. MR-Base also supports access to R packages curated by other researchers, including MendelianRandomization (Yavorska and Burgess, 2017) (https://cran.r-project.org/web/packages/MendelianRandomization/), RadialMR (Bowden et al., 2017b) (https://github.com/wspiller/radialmr), MR-PRESSO (Verbanck et al., 2018) (https://github.com/rondolab/MR-PRESSO) and mr.raps (Zhao et al., 2018) (https://github.com/qingyuanzhao/mr.raps). The available methods are updated on a regular basis. The database is accessible through an API (http://api.mrbase.org) and is therefore extendable to other causal inference methods not currently covered by the aforementioned packages. A web app (http://app.mrbase.org) was developed as a user-friendly wrapper to the R package using the R/shiny framework. The SNP lookup tool is available through the PheWAS web app (http://phewas.mrbase.org/). Scripts to perform the analyses presented in this paper are available at https://github.com/explodecomputer/mr-base-methods-paper (Hemani, 2018; copy archived at https://github.com/elifesciences-publications/mr-base-methods-paper).

MR-Base database

Obtaining summary data from genome-wide association studies

We downloaded publicly available datasets from study-specific websites and dbGaP and invited studies curated by the NHGRI-EBI GWAS catalog to share results (if these were not already publicly available). To be eligible for inclusion in MR-Base, studies should provide the following information for each SNP: the beta coefficient and standard error from a regression model (typically an additive model) and the modelled effect and non-effect alleles. This is the minimum information required for implementation of 2SMR. The following information is also sought but is not essential: effect allele frequency, sample size, p-values for SNP-phenotype associations, p-values for Hardy–Weinberg equilibrium, p-values for Cochran's Q test for between-study heterogeneity (if a GWAS meta-analysis) and metrics of imputation quality, such as info or r2 scores (for imputed SNPs). MR-Base also includes information on the following study-level characteristics: sample size, number of cases and controls (if a case-control study), standard deviation of the sample mean for continuously distributed traits, geographic origin and whether the GWAS was conducted in males or females (or both). In future extensions to MR-Base, we plan to collate more detailed information on phenotype distributions (e.g. sample means for continuously distributed phenotypes), population characteristics (mean and standard deviation of age and number of males and females) and statistical models (e.g. covariates included in regression models and genomic control inflation factors).

Creating a database of harmonized GWAS summary data

GWAS summary data posted online tends not to follow standardised formats, therefore harmonisation of these disparate data sources is a manual process. We adopted a systematic approach to harmonize these data and developed an Elasticsearch database (https://www.elastic.co/products/elasticsearch, v5.6.2) to store, structure and query the harmonised data. Insofar as it was possible we recorded all QC and harmonisation processes for each of the 1673 datasets to aid with reproducibility. The following steps were taken for each dataset:

Step 1. Pre-cleaning data

Prior to file harmonisation we carried out the following pre-cleaning steps: dropped duplicate datasets, removed non–SNP-level meta-information, removed unexpected characters such as non-utf8 characters, split 95% confidence intervals contained within a single column into two columns. In addition, when results from a single GWAS were split across separate chromosome files, we combined these into a single file. When a single file contained results for multiple traits, we split the file into separate files for each trait.

Step 2. Collecting and arranging meta-data

In order to collect the key information and harmonize these data into a uniform format, the column headers of each of the pre-cleaned GWAS files were collected using bash shell scripts and stored in Google sheets. The column headers were then reordered and columns with required information (such as file path, file name, file suffix, SNP rs ID, effect allele, other allele, effect size, standard error, p-value and sample size) were recorded.

Step 3. Converting summary data into a uniform format

After collecting and rearranging the column information, we automatically converted the files into uniform formats, converting odds ratios to log odds ratios, estimating standard errors from 95% confidence intervals or, when the latter were missing, from effect estimates and p-values, and inferring sample size from overall sample size if SNP-level sample size was missing. The output files are tab-delimited and contain 8 columns of information: SNP rs number, effect allele, other allele, effect allele frequency, beta (effect estimate for continuous traits or log odds ratio for binary traits), standard error, p-value and sample size.
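The conversions listed above follow standard normal-approximation identities: a 95% CI spans 2 × 1.96 standard errors (on the log scale for odds ratios), and a two-sided p-value implies |z| = |β|/SE. The sketch below is a minimal illustration with hypothetical function names; it is not the MR-Base conversion code, and it does not handle edge cases such as p-values reported as exactly 0.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower, upper, ratio_scale=False):
    """SE from a 95% CI; ratio-scale intervals (e.g. odds ratios) are logged first."""
    if ratio_scale:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * Z_975)


def se_from_p(beta, p):
    """SE implied by an effect estimate and its two-sided p-value (normal approximation)."""
    if not 0 < p < 1 or beta == 0:
        raise ValueError("need 0 < p < 1 and a non-zero beta")
    z = -NormalDist().inv_cdf(p / 2)  # |z| for a two-sided test; p / 2 avoids 1 - p rounding to 1
    return abs(beta) / z


def to_row(snp, effect_allele, other_allele, eaf, beta, se, p, n):
    """One line of the 8-column, tab-delimited output format described above."""
    return "\t".join(str(v) for v in (snp, effect_allele, other_allele, eaf, beta, se, p, n))


# Hypothetical input: OR 1.2 (95% CI 1.1 to 1.31), converted to a log odds ratio and its SE.
beta = math.log(1.2)
se = se_from_ci(1.1, 1.31, ratio_scale=True)
print(to_row("rs123", "A", "G", 0.3, round(beta, 4), round(se, 4), 1e-4, 20_000))
```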

Step 4. Controlling quality of data

To control the quality of the data, we dropped duplicate SNP records, removed records with a missing SNP ID or p-value, checked for mislabelled columns and checked the data type of each cell (i.e. ASCII format for the SNP ID column; a string (A, G, C or T) for the effect allele and other allele columns; a numerical value between 0 and 1 for the effect allele frequency and p-value columns; a numerical value for the beta column; and a numerical value larger than 0 for the standard error and sample size columns). The identity of the effect allele column was confirmed by checking the meta-data or readme files accompanying the GWAS results, by correspondence with the original study or by comparison with the NHGRI-EBI GWAS catalog.
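The record-level checks above can be sketched as a simple filter over parsed rows. This is an illustration only: the dictionary keys are a hypothetical layout, only the first record seen for each SNP ID is considered (a simplifying assumption about how duplicates are resolved), and optional columns (effect allele frequency, sample size) are validated only when present.

```python
import math

VALID_BASES = set("ACGT")


def _number(value):
    """Parse a finite float, or return None."""
    try:
        x = float(value)
    except (TypeError, ValueError):
        return None
    return x if math.isfinite(x) else None


def problems(rec):
    """Names of fields in one record that fail the type and range checks."""
    bad = []
    if not str(rec.get("snp", "")).isascii():
        bad.append("snp")
    for col in ("effect_allele", "other_allele"):
        allele = str(rec.get(col, "")).upper()
        if not allele or not set(allele) <= VALID_BASES:
            bad.append(col)
    if _number(rec.get("beta")) is None:
        bad.append("beta")
    for col, ok in (("se", lambda x: x > 0), ("pval", lambda x: 0 <= x <= 1)):
        x = _number(rec.get(col))
        if x is None or not ok(x):
            bad.append(col)
    for col, ok in (("eaf", lambda x: 0 <= x <= 1), ("n", lambda x: x > 0)):
        if rec.get(col) not in (None, ""):  # optional columns are checked only if present
            x = _number(rec[col])
            if x is None or not ok(x):
                bad.append(col)
    return bad


def quality_control(records):
    """Drop records without SNP ID or p-value, repeated SNP IDs and records failing the checks."""
    seen, clean = set(), []
    for rec in records:
        snp = rec.get("snp")
        if not snp or rec.get("pval") in (None, "") or snp in seen:
            continue
        seen.add(snp)
        if not problems(rec):
            clean.append(rec)
    return clean
```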

Step 5. Building the relationship of data and creating an elasticsearch index

The cleaned and harmonized data were then indexed using Elasticsearch (https://www.elastic.co/products/elasticsearch, v5.6.2) to create a search engine. As shown in Supplementary file 1G, the database is structured into sets of indices (datasets) for association data, with two MySQL tables for study and LD proxy data:

  • The study table (MySQL) contains study-level information: file name; trait name; study name (or first author of the relevant GWAS publication if the study name is missing); geographic origin of the study; whether the study contains males, females or both; number of cases; number of controls; sample size; PubMed identifier (PMID); publication year; units of the SNP-trait effect (e.g. mg/dL, log odds); standard deviation of the sample mean for the trait (if continuously distributed); manually curated trait category (e.g. disease or risk factor); and manually curated trait subcategory (e.g. cardiovascular disease, cancer, anthropometric risk factor).

  • The association datasets (Elasticsearch) contain the SNP association information including study ID, SNP ID, effect allele, other allele, beta, standard error, p-value and sample size. Effect allele refers to the modelled or coded allele in linear or logistic regression (typically using an additive genetic model); other allele refers to the non-effect allele; the beta, standard error and p-value refer to the change in trait per copy of the effect allele. Separate indices exist for major ‘batches’ of data, but are queried in combination by the API, requiring no additional operations on the part of the user.

  • The ‘SNP proxies’ table (MySQL) provides a list of proxy SNPs to use when a requested SNP is missing from the requested study, increasing the chance that the SNP (or a proxy for it) is available in both the exposure and outcome data.

Linkage disequilibrium proxy data

One of the main functions of the MR-Base database is to provide association data for requested SNPs from studies of interest to the user (Figure 2). Often, however, a requested SNP may not be present in the requested GWAS (e.g. because different imputation panels were used or because imputed SNPs were not available). To enable information to be obtained even when SNPs are missing, we provide an LD proxy function based on 1000 Genomes data from 503 European samples. For each common variant (minor allele frequency [MAF] > 0.01) we used PLINK 1.90 beta 3 to identify a list of LD proxies. We recorded the r2 value for each LD proxy and the phase of the alleles of the target and proxy SNPs. We restricted LD proxies to lie within 250 kb or 1000 SNPs of the target SNP, with a minimum r2 of 0.6.
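
How a proxy can stand in for a missing SNP is sketched below in base R. The layout of the proxy table (one row per target-proxy pair, with the in-phase allele pairs stored as target_a1/proxy_a1 and target_a2/proxy_a2) and the function names are our own assumptions. The sketch picks the available proxy with the highest r2 at or above 0.6 and uses the recorded phase to translate the proxy's effect allele back to the corresponding allele of the target SNP. The 250 kb / 1000 SNP window is assumed to have been applied when the table was built.

```r
# Hypothetical proxy table for target SNP rs10 (alleles A/G)
proxies <- data.frame(
  target = "rs10",
  proxy = c("rs11", "rs12", "rs13"),
  r2 = c(0.95, 0.80, 0.55),
  target_a1 = "A", proxy_a1 = c("C", "T", "G"),  # alleles on the same haplotype
  target_a2 = "G", proxy_a2 = c("T", "C", "A"),
  stringsAsFactors = FALSE
)

# Best proxy for `target` that is present in the requested study
find_proxy <- function(target, available_snps, proxies, min_r2 = 0.6) {
  cand <- proxies[proxies$target == target & proxies$r2 >= min_r2 &
                    proxies$proxy %in% available_snps, ]
  if (nrow(cand) == 0) return(NULL)
  cand[which.max(cand$r2), ]
}

# Map the proxy's effect allele to the in-phase allele of the target SNP
proxy_to_target_allele <- function(row, proxy_effect_allele) {
  if (proxy_effect_allele == row$proxy_a1) {
    row$target_a1
  } else if (proxy_effect_allele == row$proxy_a2) {
    row$target_a2
  } else {
    NA_character_
  }
}

# rs10 and rs11 are missing from the study, so rs12 is used instead
best <- find_proxy("rs10", c("rs12", "rs13"), proxies)
proxy_to_target_allele(best, "T")  # returns "A"
```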

MR instrument catalogs

We have assembled a collection of strong SNP-phenotype associations from various sources that can be used as potential instruments in Mendelian randomization studies. Instruments are currently restricted to biallelic SNPs but in principle could be extended in future versions to accommodate multi-allelic SNPs or copy number variants (CNVs). The potential instruments generally correspond to the ‘top hits’ from a GWAS, rather than the entire collection of GWAS summary statistics. As such, the traits included here can only be evaluated as potential exposures in a hypothesized exposure-outcome analysis (complete summary data are required when evaluating traits as potential outcomes). All curated instruments are available through the MRInstruments R package (https://github.com/MRCIEU/MRInstruments).

NHGRI-EBI GWAS catalog

This is a comprehensive catalog of reported associations from published GWAS (Welter et al., 2014). To make the data suitable for Mendelian randomization, we converted odds ratios to log odds ratios and inferred standard errors from reported 95% confidence intervals or (if the latter were unavailable) from the reported p-value using the Z distribution. The GWAS catalog scales odds ratios to be greater than 1 and includes information on unit changes (e.g. mg/dl increase) for beta coefficients. In MR-Base we therefore assume that an effect size is an odds ratio if it is greater than 1 and lacks information on unit changes. We extracted information on the units of the SNP-trait effect and identified effect and non-effect alleles by comparing the risk allele reported in the GWAS catalog with allele information downloaded from ENSEMBL, using the R/biomaRt package (Durinck et al., 2009). R/biomaRt was also used to identify base pair positions (GRCh38) and associated candidate genes. We inferred effect allele frequency from the risk allele frequency reported in the GWAS catalog. We excluded SNP-trait associations from the GWAS catalog if they were missing a p-value, a beta (estimate of the SNP-trait effect) or a standard error for the beta. The MR-Base standardized version of the GWAS catalog (2017-03-20 release at the time of writing) comprises 21,324 potential instruments for 1628 traits. There are, however, several caveats to using the GWAS catalog as a source of instruments. First, reported units of analysis are often unclear (e.g. results are often reported as reflecting a ‘unit increase’). Second, the GWAS catalog prioritises results from the largest reported analysis in the original study report (typically the discovery study or a meta-analysis of discovery and replication studies). This makes instruments from the GWAS catalog susceptible to winner’s curse, which can compound the effects of weak instrument bias.
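
The GWAS catalog rules above (treat an effect as an odds ratio if it exceeds 1 and has no units, derive the standard error from the confidence interval or else from the p-value, then drop incomplete records) can be sketched in base R as follows. The column names effect, units, pval, ci_lower and ci_upper are hypothetical, and the example rows are made up. The sketch inherits the limitation noted above: an unlabelled beta greater than 1 would be misread as an odds ratio.

```r
standardise_catalog <- function(d) {
  z95 <- qnorm(0.975)
  no_units <- is.na(d$units) | d$units == ""
  is_or <- !is.na(d$effect) & d$effect > 1 & no_units   # assumed odds ratio

  d$beta <- d$effect
  d$beta[is_or] <- log(d$effect[is_or])                  # OR -> log OR

  # SE from the 95% CI when both bounds are reported
  d$se <- NA_real_
  has_ci <- !is.na(d$ci_lower) & !is.na(d$ci_upper)
  i <- has_ci & is_or
  d$se[i] <- (log(d$ci_upper[i]) - log(d$ci_lower[i])) / (2 * z95)
  j <- has_ci & !is_or
  d$se[j] <- (d$ci_upper[j] - d$ci_lower[j]) / (2 * z95)

  # Otherwise SE from the p-value via the Z distribution
  k <- is.na(d$se) & !is.na(d$pval) & !is.na(d$beta)
  d$se[k] <- abs(d$beta[k]) / qnorm(d$pval[k] / 2, lower.tail = FALSE)

  # Exclude associations still missing a p-value, beta or SE
  d[!is.na(d$pval) & !is.na(d$beta) & !is.na(d$se), ]
}

catalog <- data.frame(
  effect = c(1.25, 0.30, 1.10, NA),
  units = c(NA, "mg/dl increase", "", NA),
  pval = c(1e-10, 5e-9, NA, 1e-8),
  ci_lower = c(NA, 0.20, 1.05, NA),
  ci_upper = c(NA, 0.40, 1.15, NA),
  stringsAsFactors = FALSE
)
standardise_catalog(catalog)  # rows 3 (no p-value) and 4 (no effect) dropped
```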

Accessible resource for integrated epigenomics studies (ARIES) mQTL catalog

We obtained a large set of SNPs associated with DNA methylation levels (i.e. mQTLs) using the ARIES dataset (Gaunt et al., 2016). mQTLs were identified in 1000 mothers at two time points and 1000 children at three time points. Top hits were obtained from http://mqtldb.org with p<1e-7. There are 33,256 unique CpG sites across the five time points with at least one independent mQTL. These mQTLs can be used as instruments for DNA methylation at CpG sites in Mendelian randomization analyses. The mQTLs can also be used to perform methylome-wide association studies (MWAS), to evaluate the association between DNA methylation at each CpG site and a phenotype of interest (implementable in MR-Base through the Wald ratio method when only a single mQTL for a CpG site is available).
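
For a CpG site (or, in the GTEx setting below, a gene) with a single instrument, the Wald ratio divides the SNP-outcome effect by the SNP-exposure effect. The base R sketch below uses the common first-order standard error, se_out / |b_exp|, which ignores uncertainty in the SNP-exposure estimate. The CpG labels and effect sizes are made up for illustration.

```r
# Wald ratio for one instrument per exposure (vectorised over exposures)
wald_ratio <- function(b_exp, b_out, se_out) {
  b <- b_out / b_exp                  # causal effect per unit of exposure
  se <- se_out / abs(b_exp)           # first-order standard error
  pval <- 2 * pnorm(abs(b / se), lower.tail = FALSE)
  data.frame(b = b, se = se, pval = pval)
}

# Hypothetical MWAS: one mQTL per CpG site, with its effect on methylation
# (b_exp) and on the phenotype (b_out, se_out)
mwas <- data.frame(cpg = c("cg_a", "cg_b"),
                   b_exp = c(0.5, -0.3), b_out = c(0.02, 0.01),
                   se_out = c(0.005, 0.004))
cbind(cpg = mwas$cpg, with(mwas, wald_ratio(b_exp, b_out, se_out)))
```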

GTEx eQTL catalog

We used the GTEx resource (GTEx Consortium, 2015) of published independent cis-acting expression QTLs (cis-eQTLs) to create a catalog of SNPs influencing up to 27,094 unique gene identifiers across 44 tissues. These eQTLs can be used as instruments for gene expression in Mendelian randomization analyses. The eQTLs can also be used to perform transcriptome-wide association studies (TWAS), to evaluate the association between expression of each gene and a phenotype of interest (implementable in MR-Base using the Wald ratio method when only a single eQTL for a gene is available).

Metabolomic QTL catalog

SNPs influencing 121 metabolites measured using nuclear magnetic resonance (NMR) analysis in whole blood were obtained (Kettunen et al., 2016), totalling 1088 independent QTLs across all metabolites.

Proteomic QTL catalog

SNPs influencing the levels of 47 protein analytes in whole blood were obtained (Deming et al., 2016), totalling 57 independent proteomic QTLs.

Defining instruments using the MR-Base catalog of complete summary data

The MR-Base repository of complete GWAS summary data, which contains all SNPs from a GWAS regardless of p-value, can also be used to define instruments. This involves extracting independent sets of SNP-phenotype associations that surpass user-specified p-value and clumping thresholds. However, as the MR-Base repository of complete summary data is based mostly on discovery studies, this strategy may be susceptible to false positive instruments (where some of the selected genetic variants are not truly associated with the target exposure) and to winner’s curse. See the Discussion and Supplementary file 1E for the potential implications of these limitations for the results.
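
Selecting instruments from complete summary data amounts to a p-value filter followed by LD clumping. The greedy procedure below is a simplified base R sketch, not the clumping used by MR-Base. It assumes all SNPs lie on one chromosome, that positions are in base pairs, and that the caller supplies a symmetric r2 matrix (for example, from a reference panel) whose row and column names are SNP IDs. Defaults mirror the thresholds used in the applied example (p < 5e-8, r2 cutoff 0.001, 10,000 kb window); the SNPs and LD values are made up.

```r
clump <- function(snp, pos, pval, r2, p_thresh = 5e-8, r2_thresh = 0.001,
                  window_kb = 10000) {
  keep <- pval < p_thresh                      # p-value filter
  snp <- snp[keep]; pos <- pos[keep]; pval <- pval[keep]
  ord <- order(pval)                           # most significant first
  snp <- snp[ord]; pos <- pos[ord]
  remaining <- rep(TRUE, length(snp))
  index <- character(0)
  for (i in seq_along(snp)) {
    if (!remaining[i]) next
    index <- c(index, snp[i])                  # keep as an index SNP
    near <- abs(pos - pos[i]) <= window_kb * 1000
    linked <- r2[snp[i], snp] > r2_thresh
    remaining[near & linked] <- FALSE          # drop SNPs in LD with it
  }
  index
}

snps <- c("rs1", "rs2", "rs3", "rs4")
r2 <- matrix(c(1, 0.8, 0, 0,
               0.8, 1, 0, 0,
               0, 0, 1, 0,
               0, 0, 0, 1), nrow = 4, dimnames = list(snps, snps))
clump(snps, pos = c(1.00e6, 1.05e6, 5.00e6, 2.00e6),
      pval = c(1e-9, 1e-12, 1e-10, 0.01), r2 = r2)  # returns "rs2" "rs3"
```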

Interface to the data

The Elasticsearch database is behind a firewall and cannot be queried directly, in order to prevent misuse and to keep non-public data secure. An API (http://api.mrbase.org) is used to interface with the database, controlling access based on user permissions and using Google OAuth2.0 for user authentication.

A user-friendly interface to the API is provided through the TwoSampleMR R package. A complete guide to using the R package is available at https://mrcieu.github.io/TwoSampleMR/, and a list of the analytical functions currently implemented is shown in Supplementary file 1B.

Applied example

We used MR-Base to recapitulate the known (Holmes et al., 2017) causal effect of higher LDL cholesterol on CHD risk. To obtain the list of instruments for LDL cholesterol we searched for the GLGC entry (Willer et al., 2013) in the GWAS catalog dataset in the MRInstruments library. This returned 62 SNPs. Due to unclear effect size units in the GWAS catalog, we extracted these 62 SNPs from the MR-Base database to obtain effect sizes in standard deviation units. We then searched for these 62 SNPs in the CARDIoGRAMplusC4D GWAS dataset (Nikpay et al., 2015) in the MR-Base database. One of the SNPs was not available and an LD proxy was identified. We harmonised the dataset, setting the algorithm to assume all SNPs were coded with alleles on the forward strand. Disabling this option would have excluded 7 palindromic SNPs with allele frequencies close to 0.5. 

The code to reproduce this analysis is below.

library(TwoSampleMR)
library(MRInstruments)
data(gwas_catalog)
# Get published SNPs for LDL cholesterol
ldl_snps <- subset(gwas_catalog, grepl("LDL choles", Phenotype) & Author == "Willer CJ")$SNP
# Extract these SNPs from the GLGC dataset (MR-Base study ID 300)
# and format them as exposure data
exposure <- convert_outcome_to_exposure(extract_outcome_data(ldl_snps, 300))
# Get outcome data from CARDIoGRAMplusC4D 2015 (MR-Base study ID 7)
outcome <- extract_outcome_data(exposure$SNP, 7)
# Harmonise exposure and outcome datasets
# Assume alleles are on the forward strand (action = 1)
dat <- harmonise_data(exposure, outcome, action = 1)
# Perform MR and assess heterogeneity
mr(dat)
mr_heterogeneity(dat)
# Label outliers and create plots
dat$labels <- dat$SNP
dat$labels[!dat$SNP %in% c("rs11065987", "rs1250229", "rs4530754")] <- NA
mr_plots(dat)
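
The role of the forward-strand assumption (action = 1) can be seen in a simplified base R sketch of allele harmonisation. This is not TwoSampleMR's harmonise_data. It only aligns outcome effects to the exposure effect allele (flipping the sign when the alleles are swapped) and sets non-matching SNPs to NA instead of attempting strand flips. It also flags palindromic (A/T or C/G) SNPs whose minor allele frequency exceeds an illustrative cut-off of 0.42 (our assumption), i.e. those whose strand could not be inferred from allele frequencies. Under action = 1 such SNPs are kept; a frequency-based approach would exclude them instead.

```r
# Palindromic SNPs have complementary alleles (A/T or C/G)
is_palindromic <- function(a1, a2) paste0(a1, a2) %in% c("AT", "TA", "CG", "GC")

harmonise_simple <- function(ea_exp, oa_exp, eaf_exp, ea_out, oa_out, b_out,
                             maf_threshold = 0.42) {
  same <- ea_exp == ea_out & oa_exp == oa_out
  swapped <- ea_exp == oa_out & oa_exp == ea_out
  # Forward strand assumed: align to the exposure effect allele, else NA
  b_aligned <- ifelse(same, b_out, ifelse(swapped, -b_out, NA))
  pal <- is_palindromic(ea_exp, oa_exp)
  maf <- pmin(eaf_exp, 1 - eaf_exp)
  data.frame(b_out = b_aligned, palindromic = pal,
             ambiguous = pal & maf > maf_threshold)
}

harmonise_simple(ea_exp = c("A", "G", "A"), oa_exp = c("G", "A", "T"),
                 eaf_exp = c(0.30, 0.20, 0.45),
                 ea_out = c("A", "A", "A"), oa_out = c("G", "G", "T"),
                 b_out = c(0.10, 0.20, 0.30))
```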

A PheWAS of outliers in the LDL-CHD MR results was conducted to identify potential sources of horizontal pleiotropy. First, we searched the MR-Base database of complete summary data and the GWAS catalog for associations with the outlier SNPs, using a threshold of p<2.04e-05 (0.05/2453 ‘trait lookups’) to select traits for further evaluation. When identical traits were duplicated across different GWAS analyses, we retained the GWAS with the largest sample size. Of the outlier SNPs analysed, only rs11065987 was associated with traits other than lipids and vascular diseases, and it was therefore the only SNP retained for further analyses. We then conducted MR analyses to estimate the effect of the selected traits on CHD, scaled to reflect the magnitude of the observed effect of rs11065987 on each trait. Instruments were based on SNPs associated with the traits at p<5e-8, with clumping to ensure independence between SNPs (clumping r2 cutoff = 0.001 and clumping window = 10,000 kb). The GWAS catalog was used to define instruments when the selected trait was unavailable in the MR-Base database of complete summary data. All instruments excluded rs11065987. The effect of each trait on CHD was based on the slope from IVW linear regression, except where only a single SNP was available to instrument the trait, in which case the Wald ratio method was used. The variance of the slope from IVW linear regression was estimated using a random effects model, except where there was underdispersion in the causal estimates between SNPs, in which case a fixed effects model was used. We then compared the rs11065987-CHD effect (the direct effect) with the trait-CHD effect (the indirect effect of rs11065987). Where the effects were in opposite directions, we concluded that the association between rs11065987 and CHD was less likely to be mediated by the selected trait.
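
The PheWAS threshold and the IVW variance rule described above can be sketched in base R. The IVW slope is taken from a weighted regression of SNP-outcome on SNP-exposure effects through the origin, with weights 1/se_out^2. The standard error uses a multiplicative random-effects model and falls back to the fixed-effect standard error when the residual standard error is below 1 (underdispersion). This is a sketch of the approach as described, not the published analysis code; the direction check, all function names and the input values are hypothetical.

```r
# Bonferroni-corrected threshold for 2453 trait lookups
p_threshold <- 0.05 / 2453   # about 2.04e-05

# IVW estimate from two or more SNPs (a single SNP needs the Wald ratio)
ivw <- function(b_exp, b_out, se_out) {
  fit <- lm(b_out ~ 0 + b_exp, weights = 1 / se_out^2)
  s <- summary(fit)
  b <- unname(coef(fit)[1])
  # Regression SE = sigma * fixed-effect SE; dividing by min(1, sigma) keeps
  # the random-effects SE unless sigma < 1, where the fixed-effect SE is used
  se <- s$coefficients[1, 2] / min(1, s$sigma)
  pval <- 2 * pnorm(abs(b / se), lower.tail = FALSE)
  data.frame(b = b, se = se, pval = pval, nsnp = length(b_exp))
}

# Does the indirect effect (SNP on trait x trait on CHD) point in the same
# direction as the direct SNP-CHD effect?
same_direction <- function(b_snp_trait, b_trait_chd, b_snp_chd) {
  sign(b_snp_trait * b_trait_chd) == sign(b_snp_chd)
}

est <- ivw(b_exp = c(0.10, 0.20, 0.15, 0.30),
           b_out = c(0.06, 0.09, 0.08, 0.14),
           se_out = c(0.02, 0.03, 0.02, 0.04))
same_direction(b_snp_trait = 0.05, b_trait_chd = est$b, b_snp_chd = -0.02)
```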

Complete code for all analyses can be found here: https://github.com/explodecomputer/mr-base-methods-paper (Hemani, 2018; copy archived at https://github.com/elifesciences-publications/mr-base-methods-paper).

References

  1. 1
  2. 2
  3. 3
    Split-sample instrumental variables estimates of the return to schooling
    1. JD Angrist
    2. AB Krueger
    (1995)
    Journal of Business & Economic Statistics 13:225–235.
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
    Genetic studies of plasma analytes identify novel potential biomarkers for several complex traits
    1. Y Deming
    2. J Xia
    3. Y Cai
    4. J Lord
    5. JL Del-Aguila
    6. MV Fernandez
    7. D Carrell
    8. K Black
    9. J Budde
    10. S Ma
    11. B Saef
    12. B Howells
    13. S Bertelsen
    14. M Bailey
    15. PG Ridge
    16. F Hefti
    17. H Fillit
    18. EA Zimmerman
    19. D Celmins
    20. AD Brown
    21. M Carrillo
    22. A Fleisher
    23. S Reeder
    24. N Trncic
    25. A Burke
    26. P Tariot
    27. EM Reiman
    28. K Chen
    29. MN Sabbagh
    30. CM Beiden
    31. SA Jacobson
    32. SA Sirrel
    33. RS Doody
    34. J Villanueva-Meyer
    35. M Chowdhury
    36. S Rountree
    37. M Dang
    38. N Kowall
    39. R Killiany
    40. AE Budson
    41. A Norbash
    42. PL Johnson
    43. RC Green
    44. G Marshall
    45. KA Johnson
    46. RA Sperling
    47. P Snyder
    48. S Salloway
    49. P Malloy
    50. S Correia
    51. C Bernick
    52. D Munic
    53. Y Stern
    54. LS Honig
    55. KL Bell
    56. N Relkin
    57. G Chaing
    58. L Ravdin
    59. S Paul
    60. LA Flashman
    61. M Seltzer
    62. ML Hynes
    63. RB Santulli
    64. V Bates
    65. H Capote
    66. M Rainka
    67. K Friedl
    68. P Murali Doraiswamy
    69. JR Petrella
    70. S Borges-Neto
    71. O James
    72. T Wong
    73. E Coleman
    74. A Schwartz
    75. JS Cellar
    76. AL Levey
    77. JJ Lah
    78. K Behan
    79. R Scott Turner
    80. K Johnson
    81. B Reynolds
    82. GD Pearlson
    83. K Blank
    84. K Anderson
    85. TO Obisesan
    86. S Wolday
    87. J Allard
    88. A Lerner
    89. P Ogrocki
    90. C Tatsuoka
    91. P Fatica
    92. MR Farlow
    93. AJ Saykin
    94. TM Foroud
    95. L Shen
    96. K Faber
    97. S Kim
    98. K Nho
    99. A Marie Hake
    100. BR Matthews
    101. JR Brosch
    102. S Herring
    103. C Hunt
    104. M Albert
    105. C Onyike
    106. D D’Agostino
    107. S Kielb
    108. NR Graff-Radford
    109. F Parfitt
    110. T Kendall
    111. H Johnson
    112. R Petersen
    113. CR Jack
    114. M Bernstein
    115. B Borowski
    116. J Gunter
    117. M Senjem
    118. P Vemuri
    119. D Jones
    120. K Kantarci
    121. C Ward
    122. SS Mason
    123. CS Albers
    124. D Knopman
    125. K Johnson
    126. H Chertkow
    127. C Hosein
    128. J Mintzer
    129. K Spicer
    130. D Bachman
    131. H Grossman
    132. E Mitsis
    133. N Pomara
    134. R Hernando
    135. A Sarrael
    136. W Potter
    137. N Buckholtz
    138. J Hsiao
    139. S Kittur
    140. JE Galvin
    141. B Cerbone
    142. CA Michel
    143. DM Pogorelec
    144. H Rusinek
    145. MJ de Leon
    146. L Glodzik
    147. S De Santi
    148. N Johnson
    149. Chuang-Kuo
    150. D Kerwin
    151. B Bonakdarpour
    152. S Weintraub
    153. J Grafman
    154. K Lipowski
    155. M-M Mesulam
    156. DW Scharre
    157. M Kataki
    158. A Adeli
    159. J Kaye
    160. J Quinn
    161. L Silbert
    162. B Lind
    163. R Carter
    164. S Dolen
    165. M Borrie
    166. T-Y Lee
    167. R Bartha
    168. W Martinez
    169. T Villena
    170. C Sadowsky
    171. Z Khachaturian
    172. BR Ott
    173. H Querfurth
    174. G Tremont
    175. R Frank
    176. D Fleischman
    177. K Arfanakis
    178. RC Shah
    179. L deToledo-Morrell
    180. G Sorensen
    181. E Finger
    182. S Pasternack
    183. I Rachinsky
    184. D Drost
    185. J Rogers
    186. A Kertesz
    187. AJ Furst
    188. S Chad
    189. J Yesavage
    190. JL Taylor
    191. B Lane
    192. A Rosen
    193. J Tinklenberg
    194. S Black
    195. B Stefanovic
    196. C Caldwell
    197. G-Y Robin Hsiung
    198. B Mudge
    199. M Assaly
    200. N Fox
    201. SK Schultz
    202. LL Boles Ponto
    203. H Shim
    204. K Ekstam Smith
    205. JM Burns
    206. RH Swerdlow
    207. WM Brooks
    208. D Marson
    209. R Griffith
    210. D Clark
    211. D Geldmacher
    212. J Brockington
    213. E Roberson
    214. M Natelson Love
    215. C DeCarli
    216. O Carmichael
    217. J Olichney
    218. P Maillard
    219. E Fletcher
    220. D Nguyen
    221. A Preda
    222. S Potkin
    223. RA Mulnard
    224. G Thai
    225. C McAdams-Ortiz
    226. S Landau
    227. W Jagust
    228. L Apostolova
    229. K Tingus
    230. E Woo
    231. DHS Silverman
    232. PH Lu
    233. G Bartzokis
    234. P Thompson
    235. M Donohue
    236. RG Thomas
    237. S Walter
    238. D Gessert
    239. J Brewer
    240. H Vanderswag
    241. T Sather
    242. G Jiminez
    243. AB Balasubramanian
    244. J Mason
    245. I Sim
    246. P Aisen
    247. M Davis
    248. R Morrison
    249. D Harvey
    250. L Thal
    251. L Beckett
    252. T Neylan
    253. S Finley
    254. MW Weiner
    255. J Hayes
    256. HJ Rosen
    257. BL Miller
    258. D Perry
    259. D Massoglia
    260. O Brawman-Mentzer
    261. N Schuff
    262. CD Smith
    263. P Hardy
    264. P Sinha
    265. E Oates
    266. G Conrad
    267. RA Koeppe
    268. JL Lord
    269. JL Heidebrink
    270. SE Arnold
    271. JH Karlawish
    272. D Wolk
    273. CM Clark
    274. JQ Trojanowki
    275. LM Shaw
    276. V Lee
    277. M Korecka
    278. M Figurski
    279. AW Toga
    280. K Crawford
    281. S Neu
    282. LS Schneider
    283. S Pawluczyk
    284. M Beccera
    285. L Teodoro
    286. BM Spann
    287. K Womack
    288. D Mathews
    289. M Quiceno
    290. N Foster
    291. T Montine
    292. JJ Fruehling
    293. S Harding
    294. S Johnson
    295. S Asthana
    296. CM Carlsson
    297. EC Petrie
    298. E Peskind
    299. G Li
    300. AP Porsteinsson
    301. BS Goldstein
    302. K Martin
    303. KM Makino
    304. MS Ismail
    305. C Brand
    306. A Smith
    307. B Ashok Raj
    308. K Fargher
    309. L Kuller
    310. C Mathis
    311. M Ann Oakley
    312. OL Lopez
    313. DM Simpson
    314. KM Sink
    315. L Gordineer
    316. JD Williamson
    317. P Garg
    318. F Watkins
    319. NJ Cairns
    320. M Raichle
    321. JC Morris
    322. E Householder
    323. L Taylor-Reinwald
    324. D Holtzman
    325. B Ances
    326. M Carroll
    327. ML Creech
    328. E Franklin
    329. MA Mintun
    330. S Schneider
    331. A Oliver
    332. R Duara
    333. D Varon
    334. MT Greig
    335. P Roberts
    336. P Varma
    337. MG MacAvoy
    338. RE Carson
    339. CH van Dyck
    340. P Davies
    341. D Holtzman
    342. JC Morris
    343. K Bales
    344. EH Pickering
    345. J-M Lee
    346. L Heitsch
    347. J Kauwe
    348. A Goate
    349. L Piccio
    350. C Cruchaga
    (2016)
    Scientific Reports 6:18092.
    https://doi.org/10.1038/srep18092
  22. 22
  23. 23
  24. 24
    Common variants associated with plasma triglycerides and risk for coronary artery disease
    1. R Do
    2. CJ Willer
    3. EM Schmidt
    4. S Sengupta
    5. C Gao
    6. GM Peloso
    7. S Gustafsson
    8. S Kanoni
    9. A Ganna
    10. J Chen
    11. ML Buchkovich
    12. S Mora
    13. JS Beckmann
    14. JL Bragg-Gresham
    15. HY Chang
    16. A Demirkan
    17. HM Den Hertog
    18. LA Donnelly
    19. GB Ehret
    20. T Esko
    21. MF Feitosa
    22. T Ferreira
    23. K Fischer
    24. P Fontanillas
    25. RM Fraser
    26. DF Freitag
    27. D Gurdasani
    28. K Heikkilä
    29. E Hyppönen
    30. A Isaacs
    31. AU Jackson
    32. A Johansson
    33. T Johnson
    34. M Kaakinen
    35. J Kettunen
    36. ME Kleber
    37. X Li
    38. J Luan
    39. LP Lyytikäinen
    40. PK Magnusson
    41. M Mangino
    42. E Mihailov
    43. ME Montasser
    44. M Müller-Nurasyid
    45. IM Nolte
    46. JR O'Connell
    47. CD Palmer
    48. M Perola
    49. AK Petersen
    50. S Sanna
    51. R Saxena
    52. SK Service
    53. S Shah
    54. D Shungin
    55. C Sidore
    56. C Song
    57. RJ Strawbridge
    58. I Surakka
    59. T Tanaka
    60. TM Teslovich
    61. G Thorleifsson
    62. EG Van den Herik
    63. BF Voight
    64. KA Volcik
    65. LL Waite
    66. A Wong
    67. Y Wu
    68. W Zhang
    69. D Absher
    70. G Asiki
    71. I Barroso
    72. LF Been
    73. JL Bolton
    74. LL Bonnycastle
    75. P Brambilla
    76. MS Burnett
    77. G Cesana
    78. M Dimitriou
    79. AS Doney
    80. A Döring
    81. P Elliott
    82. SE Epstein
    83. GI Eyjolfsson
    84. B Gigante
    85. MO Goodarzi
    86. H Grallert
    87. ML Gravito
    88. CJ Groves
    89. G Hallmans
    90. AL Hartikainen
    91. C Hayward
    92. D Hernandez
    93. AA Hicks
    94. H Holm
    95. YJ Hung
    96. T Illig
    97. MR Jones
    98. P Kaleebu
    99. JJ Kastelein
    100. KT Khaw
    101. E Kim
    102. N Klopp
    103. P Komulainen
    104. M Kumari
    105. C Langenberg
    106. T Lehtimäki
    107. SY Lin
    108. J Lindström
    109. RJ Loos
    110. F Mach
    111. WL McArdle
    112. C Meisinger
    113. BD Mitchell
    114. G Müller
    115. R Nagaraja
    116. N Narisu
    117. TV Nieminen
    118. RN Nsubuga
    119. I Olafsson
    120. KK Ong
    121. A Palotie
    122. T Papamarkou
    123. C Pomilla
    124. A Pouta
    125. DJ Rader
    126. MP Reilly
    127. PM Ridker
    128. F Rivadeneira
    129. I Rudan
    130. A Ruokonen
    131. N Samani
    132. H Scharnagl
    133. J Seeley
    134. K Silander
    135. A Stančáková
    136. K Stirrups
    137. AJ Swift
    138. L Tiret
    139. AG Uitterlinden
    140. LJ van Pelt
    141. S Vedantam
    142. N Wainwright
    143. C Wijmenga
    144. SH Wild
    145. G Willemsen
    146. T Wilsgaard
    147. JF Wilson
    148. EH Young
    149. JH Zhao
    150. LS Adair
    151. D Arveiler
    152. TL Assimes
    153. S Bandinelli
    154. F Bennett
    155. M Bochud
    156. BO Boehm
    157. DI Boomsma
    158. IB Borecki
    159. SR Bornstein
    160. P Bovet
    161. M Burnier
    162. H Campbell
    163. A Chakravarti
    164. JC Chambers
    165. YD Chen
    166. FS Collins
    167. RS Cooper
    168. J Danesh
    169. G Dedoussis
    170. U de Faire
    171. AB Feranil
    172. J Ferrières
    173. L Ferrucci
    174. NB Freimer
    175. C Gieger
    176. LC Groop
    177. V Gudnason
    178. U Gyllensten
    179. A Hamsten
    180. TB Harris
    181. A Hingorani
    182. JN Hirschhorn
    183. A Hofman
    184. GK Hovingh
    185. CA Hsiung
    186. SE Humphries
    187. SC Hunt
    188. K Hveem
    189. C Iribarren
    190. MR Järvelin
    191. A Jula
    192. M Kähönen
    193. J Kaprio
    194. A Kesäniemi
    195. M Kivimaki
    196. JS Kooner
    197. PJ Koudstaal
    198. RM Krauss
    199. D Kuh
    200. J Kuusisto
    201. KO Kyvik
    202. M Laakso
    203. TA Lakka
    204. L Lind
    205. CM Lindgren
    206. NG Martin
    207. W März
    208. MI McCarthy
    209. CA McKenzie
    210. P Meneton
    211. A Metspalu
    212. L Moilanen
    213. AD Morris
    214. PB Munroe
    215. I Njølstad
    216. NL Pedersen
    217. C Power
    218. PP Pramstaller
    219. JF Price
    220. BM Psaty
    221. T Quertermous
    222. R Rauramaa
    223. D Saleheen
    224. V Salomaa
    225. DK Sanghera
    226. J Saramies
    227. PE Schwarz
    228. WH Sheu
    229. AR Shuldiner
    230. A Siegbahn
    231. TD Spector
    232. K Stefansson
    233. DP Strachan
    234. BO Tayo
    235. E Tremoli
    236. J Tuomilehto
    237. M Uusitupa
    238. CM van Duijn
    239. P Vollenweider
    240. L Wallentin
    241. NJ Wareham
    242. JB Whitfield
    243. BH Wolffenbuttel
    244. D Altshuler
    245. JM Ordovas
    246. E Boerwinkle
    247. CN Palmer
    248. U Thorsteinsdottir
    249. DI Chasman
    250. JI Rotter
    251. PW Franks
    252. S Ripatti
    253. LA Cupples
    254. MS Sandhu
    255. SS Rich
    256. M Boehnke
    257. P Deloukas
    258. KL Mohlke
    259. E Ingelsson
    260. GR Abecasis
    261. MJ Daly
    262. BM Neale
    263. S Kathiresan
    (2013)
    Nature Genetics 45:1345–1352.
    https://doi.org/10.1038/ng.2795
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
    Inflammatory Biomarkers and Risk of Schizophrenia: A 2-Sample Mendelian Randomization Study
    1. FP Hartwig
    2. MC Borges
    3. BL Horta
    4. J Bowden
    5. G Davey Smith
    (2017)
    JAMA Psychiatry 74. PMID: 29094161.
    https://doi.org/10.1001/jamapsychiatry.2017.3191
  34. 34
  35. 35
  36. 36
    Association Between Telomere Length and Risk of Cancer and Non-Neoplastic Diseases: A Mendelian Randomization Study
    1. PC Haycock
    2. S Burgess
    3. A Nounu
    4. J Zheng
    5. GN Okoli
    6. J Bowden
    7. KH Wade
    8. NJ Timpson
    9. DM Evans
    10. P Willeit
    11. A Aviv
    12. TR Gaunt
    13. G Hemani
    14. M Mangino
    15. HP Ellis
    16. KM Kurian
    17. KA Pooley
    18. RA Eeles
    19. JE Lee
    20. S Fang
    21. WV Chen
    22. MH Law
    23. LM Bowdler
    24. MM Iles
    25. Q Yang
    26. BB Worrall
    27. HS Markus
    28. RJ Hung
    29. CI Amos
    30. AB Spurdle
    31. DJ Thompson
    32. TA O'Mara
    33. B Wolpin
    34. L Amundadottir
    35. R Stolzenberg-Solomon
    36. A Trichopoulou
    37. NC Onland-Moret
    38. E Lund
    39. EJ Duell
    40. F Canzian
    41. G Severi
    42. K Overvad
    43. MJ Gunter
    44. R Tumino
    45. U Svenson
    46. A van Rij
    47. AF Baas
    48. MJ Bown
    49. NJ Samani
    50. FNG van t'Hof
    51. G Tromp
    52. GT Jones
    53. H Kuivaniemi
    54. JR Elmore
    55. M Johansson
    56. J Mckay
    57. G Scelo
    58. R Carreras-Torres
    59. V Gaborieau
    60. P Brennan
    61. PM Bracci
    62. RE Neale
    63. SH Olson
    64. S Gallinger
    65. D Li
    66. GM Petersen
    67. HA Risch
    68. AP Klein
    69. J Han
    70. CC Abnet
    71. ND Freedman
    72. PR Taylor
    73. JM Maris
    74. KK Aben
    75. LA Kiemeney
    76. SH Vermeulen
    77. JK Wiencke
    78. KM Walsh
    79. M Wrensch
    80. T Rice
    81. C Turnbull
    82. K Litchfield
    83. L Paternoster
    84. M Standl
    85. GR Abecasis
    86. JP SanGiovanni
    87. Y Li
    88. V Mijatovic
    89. Y Sapkota
    90. SK Low
    91. KT Zondervan
    92. GW Montgomery
    93. DR Nyholt
    94. DA van Heel
    95. K Hunt
    96. DE Arking
    97. FN Ashar
    98. N Sotoodehnia
    99. D Woo
    100. J Rosand
    101. ME Comeau
    102. WM Brown
    103. EK Silverman
    104. JE Hokanson
    105. MH Cho
    106. J Hui
    107. MA Ferreira
    108. PJ Thompson
    109. AC Morrison
    110. JF Felix
    111. NL Smith
    112. AM Christiano
    113. L Petukhova
    114. RC Betz
    115. X Fan
    116. X Zhang
    117. C Zhu
    118. CD Langefeld
    119. SD Thompson
    120. F Wang
    121. X Lin
    122. DA Schwartz
    123. T Fingerlin
    124. JI Rotter
    125. MF Cotch
    126. RA Jensen
    127. M Munz
    128. H Dommisch
    129. AS Schaefer
    130. F Han
    131. HM Ollila
    132. RP Hillary
    133. O Albagha
    134. SH Ralston
    135. C Zeng
    136. W Zheng
    137. XO Shu
    138. A Reis
    139. S Uebe
    140. U Hüffmeier
    141. Y Kawamura
    142. T Otowa
    143. T Sasaki
    144. ML Hibberd
    145. S Davila
    146. G Xie
    147. K Siminovitch
    148. JX Bei
    149. YX Zeng
    150. A Försti
    151. B Chen
    152. S Landi
    153. A Franke
    154. A Fischer
    155. D Ellinghaus
    156. C Flores
    157. I Noth
    158. SF Ma
    159. JN Foo
    160. J Liu
    161. JW Kim
    162. DG Cox
    163. O Delattre
    164. O Mirabeau
    165. CF Skibola
    166. CS Tang
    167. M Garcia-Barcelo
    168. KP Chang
    169. WH Su
    170. YS Chang
    171. NG Martin
    172. S Gordon
    173. TD Wade
    174. C Lee
    175. M Kubo
    176. PC Cha
    177. Y Nakamura
    178. D Levy
    179. M Kimura
    180. SJ Hwang
    181. S Hunt
    182. T Spector
    183. N Soranzo
    184. AW Manichaikul
    185. RG Barr
    186. B Kahali
    187. E Speliotes
    188. LM Yerges-Armstrong
    189. CY Cheng
    190. JB Jonas
    191. TY Wong
    192. I Fogh
    193. K Lin
    194. JF Powell
    195. K Rice
    196. CL Relton
    197. RM Martin
    198. G Davey Smith
    199. Telomeres Mendelian Randomization Collaboration
    (2017)
    JAMA Oncology 3:636–651.
    https://doi.org/10.1001/jamaoncol.2016.5945
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
    Efficient calculation for Multi-SNP genetic risk scores
    1. T Johnson
    (2012)
    American Society of Human Genetics Annual Meeting.
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
    Software Application Profile: PHESANT: a tool for performing automated phenome scans in UK Biobank
    1. LAC Millard
    2. NM Davies
    3. TR Gaunt
    4. G Davey Smith
    5. K Tilling
    (2017)
    International Journal of Epidemiology. PMID: 29040602.
    https://doi.org/10.1093/ije/dyx204
  52. 52
  53. 53
  54. 54
    A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease
    1. M Nikpay
    2. A Goel
    3. HH Won
    4. LM Hall
    5. C Willenborg
    6. S Kanoni
    7. D Saleheen
    8. T Kyriakou
    9. CP Nelson
    10. JC Hopewell
    11. TR Webb
    12. L Zeng
    13. A Dehghan
    14. M Alver
    15. SM Armasu
    16. K Auro
    17. A Bjonnes
    18. DI Chasman
    19. S Chen
    20. I Ford
    21. N Franceschini
    22. C Gieger
    23. C Grace
    24. S Gustafsson
    25. J Huang
    26. SJ Hwang
    27. YK Kim
    28. ME Kleber
    29. KW Lau
    30. X Lu
    31. Y Lu
    32. LP Lyytikäinen
    33. E Mihailov
    34. AC Morrison
    35. N Pervjakova
    36. L Qu
    37. LM Rose
    38. E Salfati
    39. R Saxena
    40. M Scholz
    41. AV Smith
    42. E Tikkanen
    43. A Uitterlinden
    44. X Yang
    45. W Zhang
    46. W Zhao
    47. M de Andrade
    48. PS de Vries
    49. NR van Zuydam
    50. SS Anand
    51. L Bertram
    52. F Beutner
    53. G Dedoussis
    54. P Frossard
    55. D Gauguier
    56. AH Goodall
    57. O Gottesman
    58. M Haber
    59. BG Han
    60. J Huang
    61. S Jalilzadeh
    62. T Kessler
    63. IR König
    64. L Lannfelt
    65. W Lieb
    66. L Lind
    67. CM Lindgren
    68. ML Lokki
    69. PK Magnusson
    70. NH Mallick
    71. N Mehra
    72. T Meitinger
    73. FU Memon
    74. AP Morris
    75. MS Nieminen
    76. NL Pedersen
    77. A Peters
    78. LS Rallidis
    79. A Rasheed
    80. M Samuel
    81. SH Shah
    82. J Sinisalo
    83. KE Stirrups
    84. S Trompet
    85. L Wang
    86. KS Zaman
    87. D Ardissino
    88. E Boerwinkle
    89. IB Borecki
    90. EP Bottinger
    91. JE Buring
    92. JC Chambers
    93. R Collins
    94. LA Cupples
    95. J Danesh
    96. I Demuth
    97. R Elosua
    98. SE Epstein
    99. T Esko
    100. MF Feitosa
    101. OH Franco
    102. MG Franzosi
    103. CB Granger
    104. D Gu
    105. V Gudnason
    106. AS Hall
    107. A Hamsten
    108. TB Harris
    109. SL Hazen
    110. C Hengstenberg
    111. A Hofman
    112. E Ingelsson
    113. C Iribarren
    114. JW Jukema
    115. PJ Karhunen
    116. BJ Kim
    117. JS Kooner
    118. IJ Kullo
    119. T Lehtimäki
    120. RJF Loos
    121. O Melander
    122. A Metspalu
    123. W März
    124. CN Palmer
    125. M Perola
    126. T Quertermous
    127. DJ Rader
    128. PM Ridker
    129. S Ripatti
    130. R Roberts
    131. V Salomaa
    132. DK Sanghera
    133. SM Schwartz
    134. U Seedorf
    135. AF Stewart
    136. DJ Stott
    137. J Thiery
    138. PA Zalloua
    139. CJ O'Donnell
    140. MP Reilly
    141. TL Assimes
    142. JR Thompson
    143. J Erdmann
    144. R Clarke
    145. H Watkins
    146. S Kathiresan
    147. R McPherson
    148. P Deloukas
    149. H Schunkert
    150. NJ Samani
    151. M Farrall
    (2015)
    Nature Genetics 47:1121–1130.
    https://doi.org/10.1038/ng.3396
  62. 62
    PCSK9 genetic variants and risk of type 2 diabetes: a mendelian randomisation study
    1. AF Schmidt
    2. DI Swerdlow
    3. MV Holmes
    4. RS Patel
    5. Z Fairhurst-Hunter
    6. DM Lyall
    7. FP Hartwig
    8. BL Horta
    9. E Hyppönen
    10. C Power
    11. M Moldovan
    12. E van Iperen
    13. GK Hovingh
    14. I Demuth
    15. K Norman
    16. E Steinhagen-Thiessen
    17. J Demuth
    18. L Bertram
    19. T Liu
    20. S Coassin
    21. J Willeit
    22. S Kiechl
    23. K Willeit
    24. D Mason
    25. J Wright
    26. R Morris
    27. G Wanamethee
    28. P Whincup
    29. Y Ben-Shlomo
    30. S McLachlan
    31. JF Price
    32. M Kivimaki
    33. C Welch
    34. A Sanchez-Galvez
    35. P Marques-Vidal
    36. A Nicolaides
    37. AG Panayiotou
    38. NC Onland-Moret
    39. YT van der Schouw
    40. G Matullo
    41. G Fiorito
    42. S Guarrera
    43. C Sacerdote
    44. NJ Wareham
    45. C Langenberg
    46. R Scott
    47. J Luan
    48. M Bobak
    49. S Malyutina
    50. A Pająk
    51. R Kubinova
    52. A Tamosiunas
    53. H Pikhart
    54. LL Husemoen
    55. N Grarup
    56. O Pedersen
    57. T Hansen
    58. A Linneberg
    59. KS Simonsen
    60. J Cooper
    61. SE Humphries
    62. M Brilliant
    63. T Kitchner
    64. H Hakonarson
    65. DS Carrell
    66. CA McCarty
    67. HL Kirchner
    68. EB Larson
    69. DR Crosslin
    70. M de Andrade
    71. DM Roden
    72. JC Denny
    73. C Carty
    74. S Hancock
    75. J Attia
    76. E Holliday
    77. M O'Donnell
    78. S Yusuf
    79. M Chong
    80. G Pare
    81. P van der Harst
    82. MA Said
    83. RN Eppinga
    84. N Verweij
    85. H Snieder
    86. T Christen
    87. DO Mook-Kanamori
    88. S Gustafsson
    89. L Lind
    90. E Ingelsson
    91. R Pazoki
    92. O Franco
    93. A Hofman
    94. A Uitterlinden
    95. A Dehghan
    96. A Teumer
    97. S Baumeister
    98. M Dörr
    99. MM Lerch
    100. U Völker
    101. H Völzke
    102. J Ward
    103. JP Pell
    104. DJ Smith
    105. T Meade
    106. AH Maitland-van der Zee
    107. EV Baranova
    108. R Young
    109. I Ford
    110. A Campbell
    111. S Padmanabhan
    112. ML Bots
    113. DE Grobbee
    114. P Froguel
    115. D Thuillier
    116. B Balkau
    117. A Bonnefond
    118. B Cariou
    119. M Smart
    120. Y Bao
    121. M Kumari
    122. A Mahajan
    123. PM Ridker
    124. DI Chasman
    125. AP Reiner
    126. LA Lange
    127. MD Ritchie
    128. FW Asselbergs
    129. JP Casas
    130. BJ Keating
    131. D Preiss
    132. AD Hingorani
    133. N Sattar
    134. LifeLines Cohort study group
    135. UCLEB consortium
    (2017)
    The Lancet Diabetes & Endocrinology 5:97–105.
    https://doi.org/10.1016/S2213-8587(16)30396-5
  68. 68
    HMG-coenzyme A reductase inhibition, type 2 diabetes, and bodyweight: evidence from genetic analysis and randomised trials
    1. DI Swerdlow
    2. D Preiss
    3. KB Kuchenbaecker
    4. MV Holmes
    5. JE Engmann
    6. T Shah
    7. R Sofat
    8. S Stender
    9. PC Johnson
    10. RA Scott
    11. M Leusink
    12. N Verweij
    13. SJ Sharp
    14. Y Guo
    15. C Giambartolomei
    16. C Chung
    17. A Peasey
    18. A Amuzu
    19. K Li
    20. J Palmen
    21. P Howard
    22. JA Cooper
    23. F Drenos
    24. YR Li
    25. G Lowe
    26. J Gallacher
    27. MC Stewart
    28. I Tzoulaki
    29. SG Buxbaum
    30. DL van der A
    31. NG Forouhi
    32. NC Onland-Moret
    33. YT van der Schouw
    34. RB Schnabel
    35. JA Hubacek
    36. R Kubinova
    37. M Baceviciene
    38. A Tamosiunas
    39. A Pajak
    40. R Topor-Madry
    41. U Stepaniak
    42. S Malyutina
    43. D Baldassarre
    44. B Sennblad
    45. E Tremoli
    46. U de Faire
    47. F Veglia
    48. I Ford
    49. JW Jukema
    50. RG Westendorp
    51. GJ de Borst
    52. PA de Jong
    53. A Algra
    54. W Spiering
    55. AH Maitland-van der Zee
    56. OH Klungel
    57. A de Boer
    58. PA Doevendans
    59. CB Eaton
    60. JG Robinson
    61. D Duggan
    62. J Kjekshus
    63. JR Downs
    64. AM Gotto
    65. AC Keech
    66. R Marchioli
    67. G Tognoni
    68. PS Sever
    69. NR Poulter
    70. DD Waters
    71. TR Pedersen
    72. P Amarenco
    73. H Nakamura
    74. JJ McMurray
    75. JD Lewsey
    76. DI Chasman
    77. PM Ridker
    78. AP Maggioni
    79. L Tavazzi
    80. KK Ray
    81. SR Seshasai
    82. JE Manson
    83. JF Price
    84. PH Whincup
    85. RW Morris
    86. DA Lawlor
    87. GD Smith
    88. Y Ben-Shlomo
    89. PJ Schreiner
    90. M Fornage
    91. DS Siscovick
    92. M Cushman
    93. M Kumari
    94. NJ Wareham
    95. WM Verschuren
    96. S Redline
    97. SR Patel
    98. JC Whittaker
    99. A Hamsten
    100. JA Delaney
    101. C Dale
    102. TR Gaunt
    103. A Wong
    104. D Kuh
    105. R Hardy
    106. S Kathiresan
    107. BA Castillo
    108. P van der Harst
    109. EJ Brunner
    110. A Tybjaerg-Hansen
    111. MG Marmot
    112. RM Krauss
    113. M Tsai
    114. J Coresh
    115. RC Hoogeveen
    116. BM Psaty
    117. LA Lange
    118. H Hakonarson
    119. F Dudbridge
    120. SE Humphries
    121. PJ Talmud
    122. M Kivimäki
    123. NJ Timpson
    124. C Langenberg
    125. FW Asselbergs
    126. M Voevoda
    127. M Bobak
    128. H Pikhart
    129. JG Wilson
    130. AP Reiner
    131. BJ Keating
    132. AD Hingorani
    133. N Sattar
    134. DIAGRAM Consortium
    135. MAGIC Consortium
    136. InterAct Consortium
    (2015)
    Lancet 385:351–361.
    https://doi.org/10.1016/S0140-6736(14)61183-1
  75. 75
    Discovery and refinement of loci associated with lipid levels
    1. CJ Willer
    2. EM Schmidt
    3. S Sengupta
    4. GM Peloso
    5. S Gustafsson
    6. S Kanoni
    7. A Ganna
    8. J Chen
    9. ML Buchkovich
    10. S Mora
    11. JS Beckmann
    12. JL Bragg-Gresham
    13. HY Chang
    14. A Demirkan
    15. HM Den Hertog
    16. R Do
    17. LA Donnelly
    18. GB Ehret
    19. T Esko
    20. MF Feitosa
    21. T Ferreira
    22. K Fischer
    23. P Fontanillas
    24. RM Fraser
    25. DF Freitag
    26. D Gurdasani
    27. K Heikkilä
    28. E Hyppönen
    29. A Isaacs
    30. AU Jackson
    31. Å Johansson
    32. T Johnson
    33. M Kaakinen
    34. J Kettunen
    35. ME Kleber
    36. X Li
    37. J Luan
    38. LP Lyytikäinen
    39. PKE Magnusson
    40. M Mangino
    41. E Mihailov
    42. ME Montasser
    43. M Müller-Nurasyid
    44. IM Nolte
    45. JR O'Connell
    46. CD Palmer
    47. M Perola
    48. AK Petersen
    49. S Sanna
    50. R Saxena
    51. SK Service
    52. S Shah
    53. D Shungin
    54. C Sidore
    55. C Song
    56. RJ Strawbridge
    57. I Surakka
    58. T Tanaka
    59. TM Teslovich
    60. G Thorleifsson
    61. EG Van den Herik
    62. BF Voight
    63. KA Volcik
    64. LL Waite
    65. A Wong
    66. Y Wu
    67. W Zhang
    68. D Absher
    69. G Asiki
    70. I Barroso
    71. LF Been
    72. JL Bolton
    73. LL Bonnycastle
    74. P Brambilla
    75. MS Burnett
    76. G Cesana
    77. M Dimitriou
    78. ASF Doney
    79. A Döring
    80. P Elliott
    81. SE Epstein
    82. G Ingi Eyjolfsson
    83. B Gigante
    84. MO Goodarzi
    85. H Grallert
    86. ML Gravito
    87. CJ Groves
    88. G Hallmans
    89. AL Hartikainen
    90. C Hayward
    91. D Hernandez
    92. AA Hicks
    93. H Holm
    94. YJ Hung
    95. T Illig
    96. MR Jones
    97. P Kaleebu
    98. JJP Kastelein
    99. KT Khaw
    100. E Kim
    101. N Klopp
    102. P Komulainen
    103. M Kumari
    104. C Langenberg
    105. T Lehtimäki
    106. SY Lin
    107. J Lindström
    108. RJF Loos
    109. F Mach
    110. WL McArdle
    111. C Meisinger
    112. BD Mitchell
    113. G Müller
    114. R Nagaraja
    115. N Narisu
    116. TVM Nieminen
    117. RN Nsubuga
    118. I Olafsson
    119. KK Ong
    120. A Palotie
    121. T Papamarkou
    122. C Pomilla
    123. A Pouta
    124. DJ Rader
    125. MP Reilly
    126. PM Ridker
    127. F Rivadeneira
    128. I Rudan
    129. A Ruokonen
    130. N Samani
    131. H Scharnagl
    132. J Seeley
    133. K Silander
    134. A Stančáková
    135. K Stirrups
    136. AJ Swift
    137. L Tiret
    138. AG Uitterlinden
    139. LJ van Pelt
    140. S Vedantam
    141. N Wainwright
    142. C Wijmenga
    143. SH Wild
    144. G Willemsen
    145. T Wilsgaard
    146. JF Wilson
    147. EH Young
    148. JH Zhao
    149. LS Adair
    150. D Arveiler
    151. TL Assimes
    152. S Bandinelli
    153. F Bennett
    154. M Bochud
    155. BO Boehm
    156. DI Boomsma
    157. IB Borecki
    158. SR Bornstein
    159. P Bovet
    160. M Burnier
    161. H Campbell
    162. A Chakravarti
    163. JC Chambers
    164. YI Chen
    165. FS Collins
    166. RS Cooper
    167. J Danesh
    168. G Dedoussis
    169. U de Faire
    170. AB Feranil
    171. J Ferrières
    172. L Ferrucci
    173. NB Freimer
    174. C Gieger
    175. LC Groop
    176. V Gudnason
    177. U Gyllensten
    178. A Hamsten
    179. TB Harris
    180. A Hingorani
    181. JN Hirschhorn
    182. A Hofman
    183. GK Hovingh
    184. CA Hsiung
    185. SE Humphries
    186. SC Hunt
    187. K Hveem
    188. C Iribarren
    189. MR Järvelin
    190. A Jula
    191. M Kähönen
    192. J Kaprio
    193. A Kesäniemi
    194. M Kivimaki
    195. JS Kooner
    196. PJ Koudstaal
    197. RM Krauss
    198. D Kuh
    199. J Kuusisto
    200. KO Kyvik
    201. M Laakso
    202. TA Lakka
    203. L Lind
    204. CM Lindgren
    205. NG Martin
    206. W März
    207. MI McCarthy
    208. CA McKenzie
    209. P Meneton
    210. A Metspalu
    211. L Moilanen
    212. AD Morris
    213. PB Munroe
    214. I Njølstad
    215. NL Pedersen
    216. C Power
    217. PP Pramstaller
    218. JF Price
    219. BM Psaty
    220. T Quertermous
    221. R Rauramaa
    222. D Saleheen
    223. V Salomaa
    224. DK Sanghera
    225. J Saramies
    226. PEH Schwarz
    227. WH Sheu
    228. AR Shuldiner
    229. A Siegbahn
    230. TD Spector
    231. K Stefansson
    232. DP Strachan
    233. BO Tayo
    234. E Tremoli
    235. J Tuomilehto
    236. M Uusitupa
    237. CM van Duijn
    238. P Vollenweider
    239. L Wallentin
    240. NJ Wareham
    241. JB Whitfield
    242. BHR Wolffenbuttel
    243. JM Ordovas
    244. E Boerwinkle
    245. CNA Palmer
    246. U Thorsteinsdottir
    247. DI Chasman
    248. JI Rotter
    249. PW Franks
    250. S Ripatti
    251. LA Cupples
    252. MS Sandhu
    253. SS Rich
    254. M Boehnke
    255. P Deloukas
    256. S Kathiresan
    257. KL Mohlke
    258. E Ingelsson
    259. GR Abecasis
    260. Global Lipids Genetics Consortium
    (2013)
    Nature Genetics 45:1274–1283.
    https://doi.org/10.1038/ng.2797

Decision letter

  1. Ruth Loos
    Reviewing Editor; The Icahn School of Medicine at Mount Sinai, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]

Thank you for submitting your work entitled "MR-Base: a platform for systematic causal inference across the phenome using billions of genetic associations" for consideration by eLife. Your article has been reviewed by two peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by a Senior Editor. The reviewers have opted to remain anonymous.

Our decision has been reached after consultation between the reviewers. Based on these discussions and the individual reviews below, we regret to inform you that your work will not be considered further for publication in eLife.

Both reviewers were enthusiastic about MR-Base and its value to the community. However, there were serious concerns about aspects of the manuscript (see the reviews below) that precluded further consideration of this version of the manuscript, and which went beyond the scope of what we feel is appropriate for revision. In this instance, however, should you decide to revise the manuscript along the lines recommended, we would be willing to reconsider the paper as a new submission.

Reviewer #1:

The purpose of this paper is twofold. The first part of the paper describes MR-Base, a resource of published GWAS results and a platform for performing Mendelian randomization (MR) analyses with the available data. In the second part of the paper, the authors perform an actual MR analysis using their MR-Base platform. They assess the efficacy of lipid-lowering drugs in the prevention of cardiovascular disease by using genetic variants that mimic the effect of those drugs (HMGCR for statins; NPC1L1 for ezetimibe; PCSK9 for evolocumab; APOA for Lp(a); and APOC3 for triglyceride-lowering drugs). In addition, the safety of the lipid-lowering drugs was assessed by testing the association with type 2 diabetes and other relevant (potentially adverse) outcomes in a hypothesis-free manner. The MR analyses confirm previous (larger) MR studies and are consistent with available randomized controlled trials. In addition, they found evidence for potential adverse outcomes that may not have been reported before.

The authors should be commended on creating a massive resource and a useful platform that allows many to perform MR analyses without necessarily having immediate access to the required data. The example that was used to "pilot" the MR-Base platform is of interest, but mainly confirms what has been reported before.

My main concern is that neither the first nor the second part of the paper has been sufficiently developed. For example, for the first part to be truly helpful to readers, the resource, platform and underlying methodology should have been described in more detail in the main text, and more guidance should have been given on which test should or could be used under which circumstances and what their (dis)advantages are. In addition, the limitations of MR-Base compared to typical MR analyses and compared to RCTs should have been made more explicit. Currently, this information is hidden in supplementary data.

The second part, while interesting, is mainly confirmatory and serves more as an example or application of the MR-Base resource than as a test of a new hypothesis. The main new observation is that MR-Base allows a hypothesis-free screen for potential (adverse or other) outcomes.

Taken together, it may not be clear to a reader what the main aim of the paper is, as neither part has great depth or innovation.

Reviewer #2:

MR-Base is an important resource and it is right that the academic community is made aware of it and that researchers have a reference that they can cite when they use it, so I am broadly supportive of this paper.

However, what the authors have produced is a non-critical description of MR-Base. There is a section entitled 'summary of limitations and some solutions' but it is only 7 lines long. The authors need to correct this imbalance.

In the supplement the authors consider 33 different examples of the use of MR-Base. If this paper is intended to inform the research community about MR-Base then one or two examples would be sufficient, provided that they include a statement to the effect that they are intended as illustrations and not as definitive research findings. It appears that the authors are looking for an easy way of laying claim to some of the obvious applications before MR-Base is made public. These applications are out of place in this paper and they are dangerous. There are so many examples they cannot be considered in the detail that one would normally find in an MR paper and just as importantly, they cannot be properly reviewed.

My suggestion is that the paper is re-written dropping most of the examples and in a style that acknowledges the limitations of the database. In that form I would support publication.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for submitting your article "MR-Base: a database of GWAS summary data integrated with analytical tools enables causal inference across the phenome" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom, Ruth Loos, is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Mark McCarthy as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Tea Skaaby (Reviewer #2); Frank Dudbridge (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. We hope you will be able to submit the revised version soon, so that we can proceed with final assessment of the paper. As usual, please provide a letter that indicates how you have responded to the comments raised.

Summary:

All three reviewers were enthusiastic and agreed that this paper (and the MR-Base tool) is a very valuable contribution to the field. We ask you to address the comments and suggestions from the reviewers.

Reviewer #1:

The revised paper is very well written and provides a nice balance between the MR-Base description, real-life examples, and strengths and limitations. The example estimating the causal relationship between LDL-cholesterol and coronary heart disease is informative, as it also illustrates how to detect potential biases and how to resolve them. MR-Base has been an extremely valuable contribution to the field, and this paper provides the details users may need to perform their own analyses.

As for the limitations, it would be great if the authors could be more explicit about how some of these limitations might affect the MR results. For example, sample overlap could lead to "weak instrument bias", but what does that mean exactly?

MR-Base provides meta-data to assess whether two samples differ, but how can researchers know whether differences indeed affect the MR results?

The "sample overlap" and "two-sample assumption" seem somewhat contradictory; that is, you want the two samples to be similar, but overlap causes bias?

Reviewer #3:

Congratulations on creating a resource that should be very useful to many researchers in coming years. A huge amount of work has gone into this, which will save many people a lot of time in future while reducing the chance of human error in these analyses.

I have not tested the software for this review, but am aware that a user base already exists, and the resource is therefore ready for publication.

The article itself is well written and strikes a nice balance between describing methods, implementation and applications.

https://doi.org/10.7554/eLife.34408.011

Author response

[Editors’ note: the author responses to the first round of peer review follow.]

Reviewer #1:

The purpose of this paper is twofold. The first part of the paper describes MR-Base, a resource of published GWAS results and a platform for performing Mendelian randomization (MR) analyses with the available data. In the second part of the paper, the authors perform an actual MR analysis using their MR-Base platform. They assess the efficacy of lipid-lowering drugs in the prevention of cardiovascular disease by using genetic variants that mimic the effect of those drugs (HMGCR for statins; NPC1L1 for ezetimibe; PCSK9 for evolocumab; APOA for Lp(a); and APOC3 for triglyceride-lowering drugs). In addition, the safety of the lipid-lowering drugs was assessed by testing the association with type 2 diabetes and other relevant (potentially adverse) outcomes in a hypothesis-free manner. The MR analyses confirm previous (larger) MR studies and are consistent with available randomized controlled trials. In addition, they found evidence for potential adverse outcomes that may not have been reported before.

The authors should be commended on creating a massive resource and a useful platform that allows many to perform MR analyses without necessarily having immediate access to the required data. The example that was used to "pilot" the MR-Base platform is of interest, but mainly confirms what has been reported before.

Thank you for the kind remark, and the very valuable suggestions on making this a more focused and accessible paper.

My main concern is that neither the first nor the second part of the paper has been sufficiently developed. For example, for the first part to be truly helpful to readers, the resource, platform and underlying methodology should have been described in more detail in the main text, and more guidance should have been given on which test should or could be used under which circumstances and what their (dis)advantages are.

We have now completely re-written the manuscript. The first part explains the methods behind 2SMR and the assumptions they seek to address. We then describe in detail the steps that should be taken to use MR-Base. Third, we provide some examples of hypothesis-driven and hypothesis-free analyses. Finally, we discuss the strengths and limitations of the resource.

In addition, the limitations of MR-Base compared to typical MR analyses and compared to RCTs should have been made more explicit. Currently, this information is hidden in supplementary data.

We certainly had no intention of hiding the limitations section. We have now re-written the Discussion section, focussing on the limitations that MR-Base addresses, the limitations that MR-Base does not address, and new or existing weaknesses in causal inference that might be exacerbated by MR-Base.

The second part, while interesting, is mainly confirmatory and serves more as an example or application of the MR-Base resource than as a test of a new hypothesis. The main new observation is that MR-Base allows a hypothesis-free screen for potential (adverse or other) outcomes.

Taken together, it may not be clear to a reader what the main aim of the paper is, as neither part has great depth or innovation.

We still include an updated hypothesis-free analysis because we believe that this is necessary to illustrate the utility of the resource. With regard to innovation, we believe that the construction of a platform that integrates data with analysis on this scale is highly innovative, and we have tried to describe this in a lot of detail. We are reluctant to introduce too many new ideas in this single paper because, as you have pointed out, we were already struggling to be sufficiently clear. We have, however, included a brief section that describes how hypothesis-free analyses can help to understand the heterogeneity that is often observed in MR analysis; this is an entirely new way to exploit the database.
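The heterogeneity mentioned above is commonly quantified with Cochran's Q, computed across the SNP-specific Wald ratio estimates around the fixed-effect inverse-variance weighted (IVW) estimate. The Python sketch below is only an illustration of that standard statistic, not the MR-Base implementation. It assumes independent SNPs and uses first-order standard errors that ignore uncertainty in the SNP-exposure effects. The function names and input numbers are hypothetical. Under homogeneity, Q is approximately chi-squared with (number of SNPs - 1) degrees of freedom, so a Q well above its degrees of freedom suggests heterogeneity, for example from horizontal pleiotropy.

```python
def wald_ratios(beta_exp, beta_out, se_out):
    """Per-SNP Wald ratio estimates with first-order standard errors."""
    if not (len(beta_exp) == len(beta_out) == len(se_out)):
        raise ValueError("input lists must have the same length")
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    # First-order approximation: ignores uncertainty in the SNP-exposure effect
    ses = [so / abs(be) for be, so in zip(beta_exp, se_out)]
    return ratios, ses


def ivw_with_cochran_q(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate plus Cochran's Q and its degrees of freedom."""
    ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
    weights = [1.0 / s ** 2 for s in ses]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    # Weighted squared deviations of each SNP's estimate from the IVW estimate
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    return ivw, q, df


# Hypothetical summary statistics for two SNPs (not real MR-Base data)
ivw, q, df = ivw_with_cochran_q([0.1, 0.1], [0.01, 0.03], [0.01, 0.01])
print(f"IVW = {ivw:.3f}, Q = {q:.2f} on {df} df")  # prints: IVW = 0.200, Q = 2.00 on 1 df
```

In practice a p-value for Q would be taken from the chi-squared distribution (for example via a statistics library). That step is omitted here to keep the sketch dependency-free.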

Reviewer #2:

MR-Base is an important resource and it is right that the academic community is made aware of it and that researchers have a reference that they can cite when they use it, so I am broadly supportive of this paper.

We are grateful for the positive remarks about MR-Base and the valuable suggestions on improving the paper. The paper has been completely re-written to focus more on describing the resource, in terms of the problems that it solves, how to use it, showcased examples, and there is now a lengthy section on limitations.

However, what the authors have produced is a non-critical description of MR-Base. There is a section entitled 'summary of limitations and some solutions' but it is only 7 lines long. The authors need to correct this imbalance.

Thank you for this suggestion. We completely agree that it’s of paramount importance for users to be aware of the limitations of MR, and to critically evaluate any causal inference that is generated through MR or by MR-Base. As such we had previously included an extremely lengthy description of limitations in MR and MR-Base, but we deemed it too unwieldy to go anywhere other than in the Supplementary materials. We have now completely re-written the Discussion section to focus on three main topics: the limitations in MR that are addressed by MR-Base, limitations in MR that are not addressed by MR-Base, and limitations in MR that are exacerbated by MR-Base. Throughout, we also draw the reader’s attention to the limitations inherent in MR and MR-Base.

In the supplement the authors consider 33 different examples of the use of MR-Base. If this paper is intended to inform the research community about MR-Base then one or two examples would be sufficient, provided that they include a statement to the effect that they are intended as illustrations and not as definitive research findings. It appears that the authors are looking for an easy way of laying claim to some of the obvious applications before MR-Base is made public. These applications are out of place in this paper and they are dangerous. There are so many examples they cannot be considered in the detail that one would normally find in an MR paper and just as importantly, they cannot be properly reviewed.

My suggestion is that the paper is re-written dropping most of the examples and in a style that acknowledges the limitations of the database. In that form I would support publication.

Thank you for the suggestion. Please be assured that we had no intention to lay claim to a large number of results through this paper; we were simply trying to demonstrate the utility of the resource. The paper has now been re-written to focus much more on describing the resource, and throughout we emphasise that the results presented are there to showcase the resource, and that in general any results obtained through MR analysis should be triangulated with other experimental designs where possible. We do believe it is important to showcase the utility of MR-Base for both hypothesis-driven and hypothesis-free causal inference, and so have retained these analyses. We have omitted the follow-up graphs for those suggestive findings in order to maintain focus on describing the resource, and we now flag in the paper that any putative associations need to be followed up with dedicated analysis in separate studies.

[Editors' note: the author responses to the re-review follow.]

Reviewer #1:

The revised paper is very well written and provides a nice balance between the MR-Base description, real-life examples, and strengths and limitations. The example estimating the causal relationship between LDL-cholesterol and coronary heart disease is informative, as it also illustrates how to detect potential biases and how to resolve them. MR-Base has been an extremely valuable contribution to the field, and this paper provides the details users may need to perform their own analyses.

As for the limitations, it would be great if the authors could be more explicit about how some of these limitations might affect the MR results. For example, sample overlap could lead to "weak instrument bias", but what does that mean exactly?

We have provided further clarifications and examples of impacts in the text (see Discussion section). In the example given by the reviewer, sample overlap could bias associations towards the confounded observational association, a phenomenon known as weak instrument bias. Bias from sample overlap can, however, be minimized by using strong instruments (e.g. an F statistic much greater than 10 for the instrument-exposure association). If the overlapping samples also include the discovery study, this can compound the problem. This can be avoided by using replication samples to define instrument-exposure effects.
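To make the instrument-strength check above concrete, the Python sketch below screens SNPs with the widely used single-SNP approximation F ≈ (beta / se)², where beta and se are the instrument-exposure effect and its standard error. This approximation, the function names and the example numbers are assumptions made for illustration and are not taken from MR-Base. Because the text asks for an F statistic *much* greater than 10, the cut-off is exposed as a `threshold` parameter so that a stricter value can be chosen.

```python
def approx_f_statistics(beta_exp, se_exp):
    """Approximate per-SNP F statistics for the instrument-exposure association.

    Uses the common single-SNP approximation F ~ (beta / se) ** 2.
    """
    if len(beta_exp) != len(se_exp):
        raise ValueError("beta_exp and se_exp must have the same length")
    return [(b / s) ** 2 for b, s in zip(beta_exp, se_exp)]


def flag_weak_instruments(beta_exp, se_exp, threshold=10.0):
    """Return True for each SNP whose approximate F statistic is below threshold."""
    return [f < threshold for f in approx_f_statistics(beta_exp, se_exp)]


# Hypothetical SNP-exposure effects and standard errors (not real data)
betas = [0.10, 0.05, 0.02]
ses = [0.01, 0.01, 0.01]
print([round(f, 1) for f in approx_f_statistics(betas, ses)])  # prints: [100.0, 25.0, 4.0]
print(flag_weak_instruments(betas, ses))  # prints: [False, False, True]
```

A strong F statistic reduces but does not remove bias from sample overlap, so this screen complements the use of non-overlapping or replication samples rather than replacing it.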

MR-Base provides meta-data to assess whether two samples differ, but how can researchers know whether differences indeed affect the MR results?

Clarifications added to Discussion section. If the populations are different for the exposure and outcome studies (e.g. European vs East Asian study), it is likely that this will lead to at least some mis-estimation of the magnitude of the association between exposure and outcome, although inferences about directions of causality should remain unbiased. If using samples from different populations is unavoidable, users should acknowledge the impact of this limitation and restrict their conclusions to directions of causality.

The "sample overlap" and "two-sample assumption" seem somewhat contradictory; that is, you want the two samples to be similar, but overlap causes bias?

Clarifications added to Discussion section. The exposure and outcome studies should not involve overlapping participants (i.e. the participating individuals should not be members of both studies), but the participants from both studies should come from the same population, i.e. they should be of similar age and sex distribution and come from the same geographic area. In addition, they should have similar patterns of LD in the genomic regions used to define the instruments (i.e. come from similar genetic ancestry).

https://doi.org/10.7554/eLife.34408.012

Article and author information

Author details

  1. Gibran Hemani

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Jie Zheng and Benjamin Elsworth
    For correspondence
    g.hemani@bristol.ac.uk
    Competing interests
    No competing interests declared
    ORCID 0000-0003-0920-1055
  2. Jie Zheng

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation, Methodology, Writing—original draft
    Contributed equally with
    Gibran Hemani and Benjamin Elsworth
    For correspondence
    jie.zheng@bristol.ac.uk
    Competing interests
    No competing interests declared
    ORCID 0000-0002-6623-6839
  3. Benjamin Elsworth

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation, Software, Methodology
    Contributed equally with
    Gibran Hemani and Jie Zheng
    Competing interests
    No competing interests declared
  4. Kaitlin H Wade

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation
    Competing interests
    No competing interests declared
    ORCID 0000-0003-3362-6280
  5. Valeriia Haberland

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation
    Competing interests
    No competing interests declared
  6. Denis Baird

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation
    Competing interests
    No competing interests declared
    ORCID 0000-0003-4600-6013
  7. Charles Laurin

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Software
    Competing interests
    No competing interests declared
  8. Stephen Burgess

    Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Methodology
    Competing interests
    No competing interests declared
    ORCID 0000-0001-5365-8760
  9. Jack Bowden

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Methodology
    Competing interests
    No competing interests declared
  10. Ryan Langdon

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation
    Competing interests
    No competing interests declared
  11. Vanessa Y Tan

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation
    Competing interests
    No competing interests declared
    ORCID 0000-0001-7938-127X
  12. James Yarmolinsky

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Data curation
    Competing interests
    No competing interests declared
  13. Hashem A Shihab

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Software
    Competing interests
    No competing interests declared
  14. Nicholas J Timpson

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Supervision
    Competing interests
    No competing interests declared
  15. David M Evans

    1. Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    2. University of Queensland Diamantina Institute, Translational Research Institute, Brisbane, Australia
    Contribution
    Supervision
    Competing interests
    No competing interests declared
  16. Caroline Relton

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Conceptualization, Supervision
    Competing interests
    No competing interests declared
    ORCID 0000-0003-2052-4840
  17. Richard M Martin

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Conceptualization, Supervision
    Competing interests
    No competing interests declared
    ORCID 0000-0002-7992-7719
  18. George Davey Smith

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Conceptualization, Supervision, Methodology
    Competing interests
    No competing interests declared
    ORCID 0000-0002-1407-8314
  19. Tom R Gaunt

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Conceptualization, Software, Supervision, Methodology, Writing—original draft, Project administration, Writing—review and editing
    Contributed equally with
    Philip C Haycock
    For correspondence
    tom.gaunt@bristol.ac.uk
    Competing interests
    No competing interests declared
    ORCID 0000-0003-0924-3247
  20. Philip C Haycock

    Medical Research Council (MRC) Integrative Epidemiology Unit, Population Health Sciences, Bristol Medical School, University of Bristol, Bristol, United Kingdom
    Contribution
    Conceptualization, Supervision, Data curation, Formal analysis, Methodology, Writing—original draft, Project administration, Writing—review and editing
    Contributed equally with
    Tom R Gaunt
    For correspondence
    philip.haycock@bristol.ac.uk
    Competing interests
    No competing interests declared
    ORCID 0000-0001-5001-3350

Funding

Wellcome (208806/Z/17/Z)

  • Gibran Hemani

Cancer Research UK (C18281/A19169)

  • Benjamin Elsworth
  • Kaitlin H Wade
  • Vanessa Y Tan
  • Nicholas J Timpson
  • Caroline Relton
  • Richard M Martin
  • Tom R Gaunt
  • Philip C Haycock

GlaxoSmithKline

  • Valeriia Haberland

Biogen

  • Denis Baird

Medical Research Council (Methodology Research Fellowship, MR/N501906/1)

  • Jack Bowden

National Institute for Health Research (NIHR Bristol BRC)

  • Nicholas J Timpson

Wellcome

  • Nicholas J Timpson

Australian Research Council

  • David M Evans

National Health and Medical Research Council (APP1125200)

  • David M Evans

National Health and Medical Research Council (APP1137714)

  • David M Evans

Cancer Research UK (Population Research Postdoctoral Fellowship, C52724/A20138)

  • Philip C Haycock

Roy Castle Lung Cancer Foundation (2013/18/Relton)

  • Philip C Haycock

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We gratefully acknowledge our collaborators who shared summary data: Nicole Soranzo on behalf of the HaemGen consortium; David A van Heel on behalf of the celiac disease GWAS; Yukinori Okada on behalf of the C-reactive protein GWAS; GliomaScan; Clara S. Tang, Merce Garcia-Barcelo and Paul KH Tam on behalf of the Hirschsprung's disease GWAS; Kaya Kvarme Jacobsen on behalf of the migraine in bipolar disorder GWAS; Gregory T Jones and Matthew J Bown on behalf of the International Aneurysm Consortium; Omar Albagha and Stuart H. Ralston on behalf of the Paget’s disease GWAS; Andre Franke, Annegret Fischer and David Ellinghaus on behalf of the sarcoidosis GWAS; Asta Försti, Hauke Thomsen and Stefano Landi on behalf of the thyroid cancer GWAS; Heather Cordell on behalf of the UK, Italian and Canadian-US primary biliary cirrhosis GWAS; Ani W Manichaikul and R Graham Barr on behalf of the percent emphysema GWAS; Jeffrey E Lee on behalf of the melanoma GWAS of the MDACC study. 

We also gratefully acknowledge all studies and databases that have made GWAS summary data available (the investigators of these studies and databases did not participate in the analysis, writing or interpretation of this report): ADIPOGen (Adiponectin genetics consortium), AMDGene (Age-related Macular Degeneration Gene Consortium), BioBank Japan Project, C4D (Coronary Artery Disease Genetics Consortium), CARDIoGRAM (Coronary ARtery DIsease Genome wide Replication and Meta-analysis), CKDGen (Chronic Kidney Disease Genetics consortium), CORNET (The CORtisol NETwork), dbGaP (database of Genotypes and Phenotypes), DCCT/EDIC (Diabetes Control and Complications Trial/Epidemiology of Diabetes Intervention and Complications study cohort), DIAGRAM (DIAbetes Genetics Replication And Meta-analysis), EAGLE (EArly Genetics and Lifecourse Epidemiology Consortium), EAGLE Eczema (EArly Genetics and Lifecourse Epidemiology Eczema Consortium), EGG (Early Growth Genetics Consortium), ENIGMA (Enhancing Neuro Imaging Genetics through Meta Analysis), GABRIEL (A Multidisciplinary Study to Identify the Genetic and Environmental Causes of Asthma in the European Community), GCAN (Genetic Consortium for Anorexia Nervosa), GEFOS (GEnetic Factors for OSteoporosis Consortium), GIANT (Genetic Investigation of ANthropometric Traits), GIS (Genetics of Iron Status consortium), GLGC (Global Lipids Genetics Consortium), GliomaScan (cohort-based genome-wide association study of glioma), GPC (Genetics of Personality Consortium), GUGC (Global Urate and Gout consortium), HaemGen (haematological and platelet traits genetics consortium), HRgene (Heart Rate consortium), IAC (the International Aneurysm Consortium), ICBP (International Consortium for Blood Pressure), IGAP (International Genomics of Alzheimer's Project), IIBDGC (International Inflammatory Bowel Disease Genetics Consortium), ILCCO (International Lung Cancer Consortium), ImmunoBase (resource focused on the genetics and genomics of immunologically related human diseases), IMSGC (International Multiple Sclerosis Genetic Consortium), ISGC (International Stroke Genetics Consortium), MAGIC (Meta-Analyses of Glucose and Insulin-related traits Consortium), MDACC (MD Anderson Cancer Center), MESA (Multi-Ethnic Study of Atherosclerosis), NHGRI-EBI GWAS catalog (National Human Genome Research Institute and European Bioinformatics Institute Catalog of published genome-wide association studies), PanScan (Pancreatic Cancer Cohort Consortium), PGC (Psychiatric Genomics Consortium), Project MinE consortium, ReproGen (Reproductive ageing Genetics consortium), SSGAC (Social Science Genetic Association Consortium), TAG (Tobacco and Genetics Consortium) and TRICL (Transdisciplinary Research in Cancer of the Lung consortium). We gratefully acknowledge the assistance of Dr Johannes Kettunen and Dr Benjamin Neale.

Supported by Cancer Research UK grant C18281/A19169 (the Integrative Cancer Epidemiology Programme) and the Roy Castle Lung Cancer Foundation (2013/18/Relton). The Medical Research Council Integrative Epidemiology Unit is supported by grants MC_UU_12013/1, MC_UU_12013/2 and MC_UU_12013/8. PCH is supported by a Cancer Research UK Population Research Postdoctoral Fellowship (C52724/A20138). Jack Bowden is supported by an MRC Methodology Research Fellowship (grant MR/N501906/1). DME is supported by NHMRC grants APP1125200 and APP1137714. GH is supported by Wellcome (208806/Z/17/Z).

Reviewing Editor

  1. Ruth Loos, Reviewing Editor, The Icahn School of Medicine at Mount Sinai, United States

Publication history

  1. Received: January 3, 2018
  2. Accepted: March 28, 2018
  3. Version of Record published: May 30, 2018 (version 1)

Copyright

© 2018, Hemani et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


