Statistical examination of shared loci in neuropsychiatric diseases using genome-wide association study summary statistics

Thomas P Spargo; Lachlan Gilchrist; Guy P Hunt; Richard JB Dobson; Petroula Proitsi; Ammar Al-Chalabi; Oliver Pain; Alfredo Iacoangeli

doi:10.7554/eLife.88768.2

eLife assessment

This paper presents a valuable pipeline based on state-of-the-art analytical software that was used to study genetic pleiotropy between neuropsychiatric disorders. The presented evidence supporting the claims is convincing and now includes an appropriate comparison to previously published methods as well as a detailed exploration of the findings. The created pipeline can thus be used by researchers from diverse fields to study different combinations of diseases and traits.

https://doi.org/10.7554/eLife.88768.2.sa3

Significance of findings

valuable: Findings that have theoretical or practical implications for a subfield

Strength of evidence

convincing: Appropriate and validated methodology in line with current state-of-the-art

Abstract

Continued methodological advances have enabled numerous statistical approaches for the analysis of summary statistics from genome-wide association studies. Genetic correlation analysis within specific regions enables a new strategy for identifying pleiotropy. Genomic regions with significant ‘local’ genetic correlations can be investigated further using state-of-the-art methodologies for statistical fine-mapping and variant colocalisation. We explored the utility of a genome-wide local genetic correlation analysis approach for identifying genetic overlaps between the candidate neuropsychiatric disorders, Alzheimer’s disease, amyotrophic lateral sclerosis, frontotemporal dementia, Parkinson’s disease, and schizophrenia. The correlation analysis identified several associations between traits, the majority of which were loci in the human leukocyte antigen (HLA) region. Colocalisation analysis suggested that disease-implicated variants in these loci often differ between traits and, in one locus, indicated a shared causal variant between amyotrophic lateral sclerosis and Alzheimer’s disease. Our study identified candidate loci that might play a role in multiple neuropsychiatric diseases and suggested the role of distinct mechanisms across diseases despite shared loci. The fine-mapping and colocalisation analysis protocol designed for this study has been implemented in a flexible analysis pipeline that produces HTML reports and is available at: https://github.com/ThomasPSpargo/COLOC-reporter.

1. Introduction

The genetic spectrum of neuropsychiatric disease is diverse and various overlaps exist between traits. For instance, genetic pleiotropy between amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) is increasingly recognised, and ALS is genetically correlated with Alzheimer’s disease (AD), Parkinson’s disease (PD), and schizophrenia^1–3. Improving understanding of the genetic architecture underlying these complex diseases could facilitate future treatment discovery.

Advances in genomic research techniques have accelerated discovery of genetic variation associated with complex traits. Genome-wide association studies (GWAS), in particular, have enabled population-scale investigations of the genetic basis of human diseases and anthropometric measures⁴. Summary-level results from GWAS are being shared alongside publications with increasing frequency over time⁵, and a breadth of approaches now exist for downstream analysis based on summary statistics which can enable their interpretation and provide further biological insight.

Genetic correlation analysis allows estimation of genetic overlap between traits^6–10. A ‘global’ genetic correlation approach gives a genome-wide average estimate of this overlap. However, genetic relationships between traits can be obscured when correlations in opposing directions cancel out genome-wide⁸. Recent methods allow for a more nuanced analysis of ‘local’ genetic correlations partitioned across the genome^8,9. This stratified approach to genome-wide analysis could prove effective for identifying pleiotropic regions and designing subsequent analyses aiming to identify genetic variation shared between traits.

A number of methods aim to disentangle causality within associated regions. This is important because the focus on single nucleotide polymorphisms (SNPs), which are markers of genetic variation, in GWAS produces results that can be difficult to interpret, and causal variants are typically unclear. Moreover, because of linkage disequilibrium (LD), GWAS associations often comprise large sets of highly correlated SNPs spanning large genomic regions. Statistical fine-mapping is a common approach for dissecting complex LD structures and finding variants with implications for a given trait among the tens or hundreds that might be associated in the region¹¹.

Interpretation of regions associated with multiple traits can also be challenging, since it is often unclear whether these overlaps are driven by the same causal variant. Statistical colocalisation analysis can disentangle association signals across traits to suggest whether the overlaps result from shared or distinct causal genetic factors^12–14. Traditionally this analysis was restricted by the assumption of at most one causal variant for each trait in the region. However, recent extensions to the method now permit analysis based on univariate fine-mapping results for the traits compared and, therefore, analysis of regions with multiple causal variants.

Accordingly, we conducted genome-wide local genetic correlation analysis across 5 neuropsychiatric traits with recognised phenotypic and genetic overlap^2,3,15–17: AD, ALS, FTD, PD, and schizophrenia. Although previous studies have performed global genetic correlation analyses between various combinations of these traits^1,2,18,19, this is the first to compare them at a genome-wide scale using a local genetic correlation approach. Loci highly correlated between trait pairs were further investigated with univariate fine-mapping and bivariate colocalisation techniques to examine variants driving these associations.

2. Methods

2.1. Sampled GWAS summary statistics

We leveraged publicly-accessible summary statistics from European ancestry GWAS meta-analyses of risk for AD²⁰, ALS¹, FTD²¹, PD²², and schizophrenia²³. European ancestry data were selected to avoid LD mismatch between the GWAS sample and reference data from an external European population.

2.2. Procedure

Figure 1 summarises the analysis protocol for this study; further details are provided below.

Overview of the analysis procedure for this study
SuSiE (sum of single effects) is a univariate fine-mapping approach implemented within the R package susieR. ‘coloc’ is an R package for bivariate colocalisation analysis between pairs of traits. h² = heritability, r_g = bivariate genetic correlation. The analysis steps shaded in blue have been implemented within a readily applied analysis pipeline available on GitHub: https://github.com/ThomasPSpargo/COLOC-reporter.

2.2.1. Processing of GWAS summary statistics

A standard data cleaning protocol was applied to each set of summary statistics²⁴. We retained only single nucleotide polymorphisms (SNPs), excluding any non-SNP or strand-ambiguous variants. SNPs were filtered to those present within the 1000 Genomes phase 3 (1KG) European ancestry population reference dataset²⁵ (N = 503). They were matched to the 1KG reference panel by GRCh37 chromosomal position using bigsnpr (version 1.11.6)²⁶, harmonising allele order with the reference and assigning SNP IDs.

If not reported, and where possible, effective sample size (N_eff) was calculated from per-SNP case and control sample sizes. When this could not be determined per-SNP, all variants were assigned a single N_eff, calculated as a sum of N_eff values for each cohort within the GWAS meta-analysis²⁷.
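The effective sample size calculation can be illustrated with a minimal sketch. It assumes the commonly used case-control formula N_eff = 4 / (1/N_cases + 1/N_controls), which the text above does not state explicitly, and the cohort sizes are hypothetical values chosen only for illustration.

```python
def effective_n(n_cases, n_controls):
    """Effective sample size for a single case-control cohort.

    Assumed formula (not stated in the text): 4 / (1/cases + 1/controls).
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("case and control counts must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def meta_analysis_effective_n(cohorts):
    """Sum cohort-level effective sample sizes across a GWAS meta-analysis.

    `cohorts` is an iterable of (n_cases, n_controls) pairs.
    """
    return sum(effective_n(cases, controls) for cases, controls in cohorts)


# Hypothetical cohorts (cases, controls); not taken from any GWAS in this study
example_cohorts = [(1000, 1000), (500, 1500)]
total_n_eff = meta_analysis_effective_n(example_cohorts)
```

Summing per-cohort values, rather than applying the formula once to the pooled case and control totals, avoids overstating N_eff when the case to control ratio differs between cohorts.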

Further processing was performed where possible, excluding SNPs with imputation INFO <0.9, p-values ≤0 or >1, and N_eff >3 standard deviations from the median N_eff. We filtered to include only variants with minor allele frequency (MAF) ≥0.005 in both the reference and GWAS samples and excluded SNPs with an absolute MAF difference of >0.2 between the two.
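The filters listed above could be sketched as follows. This is an illustrative outline and not the code used in the study: the record layout (the keys `info`, `p`, `maf_gwas`, `maf_ref`, `n_eff`), the order of the filters, and the choice to compute the N_eff standard deviation after the other exclusions are all assumptions.

```python
import statistics


def passes_variant_filters(snp):
    """Apply the per-SNP thresholds to one record (a dict with assumed keys)."""
    return (
        snp["info"] >= 0.9                                  # exclude INFO < 0.9
        and 0.0 < snp["p"] <= 1.0                           # exclude p <= 0 or p > 1
        and snp["maf_gwas"] >= 0.005                        # MAF filter in the GWAS sample
        and snp["maf_ref"] >= 0.005                         # MAF filter in the reference
        and abs(snp["maf_gwas"] - snp["maf_ref"]) <= 0.2    # exclude MAF difference > 0.2
    )


def filter_summary_statistics(snps):
    """Return SNPs passing the per-SNP filters and the N_eff outlier filter."""
    kept = [snp for snp in snps if passes_variant_filters(snp)]
    if len(kept) > 1:
        n_values = [snp["n_eff"] for snp in kept]
        median_n = statistics.median(n_values)
        sd_n = statistics.stdev(n_values)
        # Exclude SNPs whose N_eff lies more than 3 SD from the median
        kept = [snp for snp in kept if abs(snp["n_eff"] - median_n) <= 3 * sd_n]
    return kept


def make_snp(snp_id, **overrides):
    """Build a hypothetical SNP record that passes every filter by default."""
    record = {"id": snp_id, "info": 0.99, "p": 0.5,
              "maf_gwas": 0.20, "maf_ref": 0.21, "n_eff": 1000.0}
    record.update(overrides)
    return record


# Hypothetical records: 20 clean SNPs plus one failing each filter
example = [make_snp(f"rs{i}") for i in range(20)]
example += [
    make_snp("low_info", info=0.5),
    make_snp("bad_p", p=0.0),
    make_snp("rare", maf_gwas=0.001),
    make_snp("maf_mismatch", maf_gwas=0.45),
    make_snp("n_outlier", n_eff=100000.0),
]
retained_ids = [snp["id"] for snp in filter_summary_statistics(example)]
```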

2.2.2. Genome-wide analyses

2.2.2.1. Global heritability and genetic correlations

LDSC (version 1.0.1)^6,7 was applied to estimate genome-wide univariate heritability (h²) for each trait on the liability scale. The software was also applied to derive ‘global’ (i.e., genome-wide) genetic correlation estimates between trait pairs and estimate sample overlap from the bivariate intercept. The latter of these outputs was taken forward as an input for the local genetic correlation analysis using LAVA (see 2.2.2.2). Since global genetic correlation analysis across the traits studied here is not novel and associations reported in past studies are congruent across different tools¹⁸, the compatibility between LDSC and LAVA motivated our use of LDSC for this analysis.

These analyses were performed using the HapMap3²⁸ SNPs and the LD score files provided with the software, calculated in the 1KG European population. No further MAF filter was applied (therefore variants with MAF ≥0.005 were included) and the other settings were left to their defaults.
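For readers unfamiliar with liability-scale heritability, the sketch below shows the standard conversion from the observed scale for a case-control trait, given the population prevalence K and the proportion of cases in the sample P. It is a generic illustration of the transformation and makes no claim about the prevalence values or settings used in this study; the inputs in the example are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_observed, population_prevalence, sample_prevalence):
    """Convert observed-scale SNP heritability to the liability scale.

    Uses the standard liability threshold transformation:
        h2_liab = h2_obs * K^2 (1 - K)^2 / (P (1 - P) z^2)
    where z is the standard normal density at the liability threshold.
    """
    k = population_prevalence
    p = sample_prevalence
    if not (0.0 < k < 1.0 and 0.0 < p < 1.0):
        raise ValueError("prevalences must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - k)   # liability threshold for being a case
    z = std_normal.pdf(threshold)             # normal density at the threshold
    return h2_observed * (k ** 2) * ((1.0 - k) ** 2) / (p * (1.0 - p) * z ** 2)


# Hypothetical inputs, for illustration only
h2_liability = observed_to_liability_h2(0.10, population_prevalence=0.01,
                                        sample_prevalence=0.5)
```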

2.2.2.2. Local genetic correlation analysis

LAVA (version 0.1.0)⁸ was applied to obtain local genetic correlation estimates across 2,495 approximately independent blocks delineating the genome, based on patterns in LD. We used the blocks provided alongside the LAVA software which were derived from the 1KG European cohort. Bivariate intercepts from LDSC were provided to LAVA to estimate sample overlap between trait pairs.

LAVA was the most appropriate local genetic correlation approach for this study for several reasons⁸. First, unlike SUPERGNOVA⁹ and rho-HESS¹⁰, LAVA makes specific accommodations for analysis of binary traits. Second, other tools focus on bivariate correlation between traits whilst LAVA offers this alongside multivariate tests such as multiple regression and partial correlation, enabling rigorous testing of pleiotropic effects. Lastly, LAVA is shown to provide results which are less biased than those from other tools.

In accordance with prior studies, genetic correlation analysis was performed following an initial filtering step. Univariate heritability was estimated for each genomic block across SNPs in-common between a pair of traits, and only loci with local h² p-values below a threshold of 2.004×10^-5 (0.05/2,495) in both traits continued to the bivariate analysis. This step ensures that univariate heritability is sufficient in both traits for a robust correlation estimate.
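The filtering step can be expressed compactly. In this sketch the mapping from locus identifiers to local h² p-values is a hypothetical data layout; only the threshold (0.05 divided by 2,495 blocks) comes from the text.

```python
N_BLOCKS = 2495
UNIVARIATE_ALPHA = 0.05 / N_BLOCKS   # approximately 2.004e-5


def loci_for_bivariate_analysis(h2_pvalues_trait1, h2_pvalues_trait2,
                                alpha=UNIVARIATE_ALPHA):
    """Return loci whose local h2 p-value is below alpha in both traits.

    Each argument maps a locus identifier to that trait's local h2 p-value,
    estimated across SNPs in common between the two traits.
    """
    shared_loci = set(h2_pvalues_trait1) & set(h2_pvalues_trait2)
    return sorted(
        locus for locus in shared_loci
        if h2_pvalues_trait1[locus] < alpha and h2_pvalues_trait2[locus] < alpha
    )


# Hypothetical p-values for three loci
trait1 = {"locus_1": 1e-8, "locus_2": 1e-3, "locus_3": 1e-6}
trait2 = {"locus_1": 5e-7, "locus_2": 1e-9, "locus_3": 0.2}
selected = loci_for_bivariate_analysis(trait1, trait2)
```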

2.2.3. Targeted genetic analyses

2.2.3.1. Fine-mapping and colocalisation analysis

Statistical fine-mapping and colocalisation techniques were applied to further analyse associations between trait pairs in regions where the false discovery rate (FDR) adjusted p-value of local genetic correlation analysis was below 0.05 (after adjusting for all bivariate comparisons performed). Additional analysis was conducted at loci where significant correlations occurred between two trait pairs but not between the final pairwise comparison across the three implicated traits.
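The text does not name the FDR procedure. The sketch below assumes the Benjamini-Hochberg adjustment, which is a common choice, and shows how adjusted p-values could be computed across all bivariate comparisons before selecting those below 0.05. The p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical p-values from four local genetic correlation tests
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
significant = [i for i, p in enumerate(adjusted_p) if p < 0.05]
```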

Fine-mapping was performed with susieR (v0.12.27)^11,29, which implements the ‘sum of single effects’ (SuSiE) model to represent statistical evidence of causal genetic variation within ‘credible sets’ and per-SNP posterior inclusion probabilities (PIPs). A 95% credible set indicates 95% certainty that at least one SNP included within the set has a causal association with the phenotype and higher PIPs indicate a greater posterior probability of being a causal variant within a credible set. Multiple credible sets are identified when the data suggest more than one independent causal signal.
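As a rough illustration of how PIPs and credible sets relate, the sketch below works from a matrix of per-effect posterior probabilities (one row per single effect, one column per SNP), as in the SuSiE model. It is a simplification: susieR additionally filters credible sets on LD 'purity', which is omitted here, and the numbers are hypothetical.

```python
def posterior_inclusion_probabilities(alpha):
    """PIP per SNP: the probability of inclusion in at least one single effect.

    alpha[l][j] is the posterior probability that SNP j is the causal variant
    for single effect l; PIP_j = 1 - prod_l (1 - alpha[l][j]).
    """
    n_snps = len(alpha[0])
    pips = []
    for j in range(n_snps):
        prob_excluded = 1.0
        for effect in alpha:
            prob_excluded *= 1.0 - effect[j]
        pips.append(1.0 - prob_excluded)
    return pips


def credible_set(effect_alpha, coverage=0.95):
    """Smallest set of SNP indices whose probabilities sum to >= coverage."""
    order = sorted(range(len(effect_alpha)),
                   key=lambda j: effect_alpha[j], reverse=True)
    selected, cumulative = [], 0.0
    for j in order:
        selected.append(j)
        cumulative += effect_alpha[j]
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posterior probabilities for two single effects over four SNPs
alpha = [
    [0.70, 0.26, 0.03, 0.01],
    [0.05, 0.05, 0.10, 0.80],
]
pips = posterior_inclusion_probabilities(alpha)
first_credible_set = credible_set(alpha[0])
```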

Colocalisation analysis was implemented with coloc (v5.1.0.1)^12,13,30, which calculates posterior probabilities that a causal variant exists for neither, one, or both of two compared traits, testing also whether evidence for a causal variant in both traits suggests a shared variant (i.e., hypothesis 4 (H4); colocalisation) or independent signals (hypothesis 3 (H3)).
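The way the five hypotheses are weighed against each other can be sketched from per-SNP log approximate Bayes factors (ABFs). The code below is a Python re-expression of the single-variant calculation as we understand it, written for illustration rather than taken from the coloc source; the function names and example log ABFs are hypothetical, and the default priors match those reported later in this section.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    if peak == float("-inf"):
        return peak
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_diff_exp(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def colocalisation_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=5e-6):
    """Posterior probabilities of H0-H4 from per-SNP log ABFs for two traits."""
    if len(labf1) != len(labf2):
        raise ValueError("both traits must be measured on the same SNPs")
    sum1 = log_sum_exp(labf1)
    sum2 = log_sum_exp(labf2)
    sum12 = log_sum_exp([a + b for a, b in zip(labf1, labf2)])
    log_support = {
        "H0": 0.0,
        "H1": math.log(p1) + sum1,
        "H2": math.log(p2) + sum2,
        # All pairs of SNPs minus the pairs where both traits share one SNP
        "H3": math.log(p1) + math.log(p2) + log_diff_exp(sum1 + sum2, sum12),
        "H4": math.log(p12) + sum12,
    }
    total = log_sum_exp(list(log_support.values()))
    return {h: math.exp(v - total) for h, v in log_support.items()}


# Hypothetical log ABFs: the same SNP drives both traits ...
shared = colocalisation_posteriors([10.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0])
# ... versus different SNPs driving each trait
distinct = colocalisation_posteriors([10.0, 0.0, 0.0, 0.0], [0.0, 10.0, 0.0, 0.0])
```

In the first example nearly all posterior mass falls on H4, whereas in the second H3 receives more support than H4.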

Colocalisation analyses can be performed across all variants sampled in a region, under an assumption of at most one variant implicated per trait. It can also be performed using variants attributed to pairs of credible sets from SuSiE, relaxing the single variant assumption¹². When evidence of a shared variant is found, the individual SNPs with the highest posterior probability of being that variant can be assessed. With a 95% confidence threshold, these are termed 95% credible SNPs.

Analysis pipeline

We conducted colocalisation and fine-mapping analysis within an open-access pipeline developed for this study using R (v4.2.2)³¹: https://github.com/ThomasPSpargo/COLOC-reporter.

Briefly, in this workflow (see Figure 1), GWAS summary statistics are harmonised across analysed traits for a specified genomic region, including only variants in common between them and available within a reference population. An LD correlation matrix across sampled variants is derived from a reference population using PLINK (v1.90)^32,33.
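The harmonisation step amounts to restricting both traits to the variants in a region that are also present in the reference. The following sketch is not the pipeline's code; the record layout (`chr`, `pos`) and all identifiers are hypothetical, and allele alignment is assumed to have been handled during the earlier processing.

```python
def harmonise_region(trait1_stats, trait2_stats, reference_snp_ids,
                     chromosome, start_bp, end_bp):
    """Restrict two sets of summary statistics to shared SNPs in a region.

    Each *_stats argument maps SNP ID -> record with 'chr' and 'pos' keys.
    Returns (snp_ids, trait1_records, trait2_records) ordered by position.
    """
    def in_region(record):
        return record["chr"] == chromosome and start_bp <= record["pos"] <= end_bp

    ids1 = {snp for snp, record in trait1_stats.items() if in_region(record)}
    ids2 = {snp for snp, record in trait2_stats.items() if in_region(record)}
    shared = ids1 & ids2 & set(reference_snp_ids)
    ordered = sorted(shared, key=lambda snp: trait1_stats[snp]["pos"])
    return (ordered,
            [trait1_stats[snp] for snp in ordered],
            [trait2_stats[snp] for snp in ordered])


# Hypothetical inputs
trait1 = {"rsA": {"chr": 6, "pos": 150, "z": 2.1},
          "rsB": {"chr": 6, "pos": 120, "z": -0.4},
          "rsC": {"chr": 6, "pos": 900, "z": 1.0}}
trait2 = {"rsA": {"chr": 6, "pos": 150, "z": 3.3},
          "rsB": {"chr": 6, "pos": 120, "z": 0.2},
          "rsD": {"chr": 6, "pos": 130, "z": 0.9}}
reference = ["rsA", "rsB", "rsC", "rsD"]
snp_ids, records1, records2 = harmonise_region(trait1, trait2, reference, 6, 100, 200)
```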

Quality control is performed per-dataset prior to univariate fine-mapping analysis. Diagnostic tools provided with susieR are applied to test for consistency between the LD matrix and Z-scores from the GWAS and identify variants with a potential ‘allele flip’ (reversed effect estimate encoding) that can impact fine-mapping.

Fine-mapping is performed for each dataset with the coloc package runsusie function, which wraps around susie_rss from susieR and is configured to facilitate subsequent colocalisation analysis. Sample size (N_eff for binary traits) is specified as the median for SNPs analysed.

Colocalisation analysis can be performed with the coloc functions coloc.abf and coloc.susie when fine-mapping yields at least one credible set for both traits and otherwise using coloc.abf only. Genes located near credible sets from fine-mapping and credible SNPs from colocalisation analyses are identified via Ensembl and biomaRt (v2.54.0)^34–36.
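The choice between the two colocalisation functions follows a simple rule, sketched here. The helper name is hypothetical and the function returns only the names of the coloc functions that would be applied.

```python
def select_coloc_methods(n_credible_sets_trait1, n_credible_sets_trait2):
    """Decide which coloc functions apply given fine-mapping results.

    coloc.susie needs at least one credible set in both traits;
    coloc.abf is run in every case.
    """
    if n_credible_sets_trait1 >= 1 and n_credible_sets_trait2 >= 1:
        return ["coloc.abf", "coloc.susie"]
    return ["coloc.abf"]


both_traits = select_coloc_methods(2, 1)
one_trait_only = select_coloc_methods(1, 0)
```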

Analysis parameters can be adjusted by the user in accordance with their needs. Various utilities are included to help interpretation of fine-mapping and colocalisation results, including identification of genes nearby to putatively causal signals, HTML reports to summarise completed analyses, and figures to visualise the results and compare the examined traits.

Current implementation

In this study, LD correlation matrices were derived from the 1KG European cohort. SNPs flagged for potential allele flip issues in either of the compared traits were removed from the analysis. Fine-mapping was performed with the susie_rss refine=TRUE option to avoid local maxima during convergence of the algorithm, leaving the other settings to the runsusie defaults. Colocalisation analysis was performed using the default priors for coloc.susie (P₁=1×10^-4, P₂=1×10^-4, P₁₂=5×10^-6).

Colocalisation and fine-mapping analyses were performed initially using the genomic blocks defined by LAVA, since these aim to define relatively independent LD partitions across the genome⁸. If a 95% credible set could not be identified in one or both traits, we inspected local Manhattan plots for the region to determine whether potentially relevant signals occurred around the region boundaries. The analysis was repeated with a ±10kb window around the LAVA-defined genomic region if SNPs at the edge of the block had p <1×10^-4 for both traits and the Manhattan plots were suggestive of a ‘peak’ not represented within the original boundaries.

3. Results

3.1. Genome-wide analyses

Descriptive information and heritability estimates for the sampled traits and GWAS are presented in Table 1. ALS had nominally significant global genetic correlations with schizophrenia (p = 0.045), PD (p = 0.013), and AD (p = 0.006); no other bivariate genome-wide correlations were statistically significant (see Figure S1).

Genome-wide association studies (GWAS) sampled
Each GWAS is a GWAS meta-analysis of disease risk across people of European ancestry. *Proxy cases from the UK Biobank cohort. ^†Estimated from cumulative risk after age 45 after correcting for competing risk of mortality and assuming a lifespan of ∼85 years. h² = heritability

A total of 605 local genetic correlation analyses were performed across all trait pairs in genomic regions where both traits passed the univariate heritability filtering step after restricting to SNPs sampled in both GWAS (see Table 2; Figure 2; Table S1). The number of loci passing to bivariate analysis varied greatly across trait pairs and was congruent with the genome-wide heritability estimates (and their uncertainty) for each trait, reflecting differences in phenotypic variance explained by measured genetic variants and statistical power for each GWAS (see Table 1).

Comparison of genome-wide SNP significance against local genetic correlation significance thresholds in all trait pairs and loci analysed
All loci analysed showed sufficient local univariate heritability across compared traits to allow bivariate correlation analysis. Subsequent fine-mapping and colocalisation analyses were performed in this study for regions with at least a false discovery rate (FDR) adjusted significance for the local genetic correlation. SNP = single nucleotide polymorphism.

Local genetic correlation analyses between trait pairs
The lower panel displays a heatmap of genetic correlations (r_g) across genomic regions where any bivariate analyses were performed; white colouring indicates that the region was not analysed for a given trait pair owing to insufficient univariate heritability in one or both traits. The upper panel shows a Manhattan plot of p-values from each correlation analysis, denoting trait pairs by colour and comparisons passing defined significance thresholds by shape (square for a strict Bonferroni threshold and triangle for a false discovery rate (FDR) adjusted threshold); the hatched line indicates the threshold p-value above which P_fdr <0.05. The panels are both ordered by relative genomic position, with bars above and below indicating each chromosome. AD = Alzheimer’s disease, ALS = amyotrophic lateral sclerosis, FTD = frontotemporal dementia, PD = Parkinson’s disease, SZ = schizophrenia. Table S1 provides a complete summary of local genetic correlation analyses performed.

Twenty-six bivariate comparisons were significant following FDR adjustment (p_fdr <0.05), two of which also passed the stringent Bonferroni threshold (p <8.26×10^-5; 0.05/605). While some regions included genome-wide significant SNPs (p <5×10^-8) for one or both traits, others occurred in regions where GWAS associations were weaker (see Table 2). Five of these associations occurred at loci within the human leukocyte antigen (HLA) region (GRCh37: Chr6:28.48-33.45Mb; 6p22.1-21.3⁴³), and all five traits were implicated in at least one of these.

3.2. Targeted genetic analyses

Univariate fine-mapping and bivariate colocalisation analyses were subsequently performed to test for variants jointly implicated between trait pairs in regions with local genetic correlation P_fdr <0.05. The ALS and schizophrenia trait pair was additionally examined at Chr6:32.22-32.45Mb because significant genetic correlations were found between ALS and FTD and between schizophrenia and FTD at this locus. The correlation between ALS and schizophrenia at this locus had not been analysed owing to insufficient univariate heritability for ALS after restricting to SNPs in common with the schizophrenia GWAS.

Fine-mapping identified at least one 95% credible set for each of the compared traits for 7 of the 27 comparisons performed (see Table 3), and for one trait only in a further 5 (see Table S2; Table S3). This analysis suggested two credible sets for schizophrenia in the Chr12:56.99-58.75Mb locus, for AD in Chr6:32.45-32.54Mb, and (only when harmonised to SNPs in common with the ALS GWAS) for FTD in Chr6:32.22-32.45Mb (see Table S3).

Colocalisation analysis conducted across 95% credible sets identified during univariate fine-mapping of trait pairs
N SNPs refers to the number of SNPs present for both traits and the 1000 Genomes reference panel in the region within colocalisation and fine-mapping analysis. *Indicates comparisons with genetic correlation analysis p <8.26×10^-5 (0.05/605). ^Δ Denotes locus extended by ±10kb for fine-mapping and colocalisation analysis. ^†Variant identified in colocalisation as having the highest posterior probability of being shared variant assuming hypothesis 4 is true (see Figure 3). ^§Differences in fine-mapping solutions across trait pairs in the Chr6:32.21-32.45Mb locus reflect differences in the SNPs retained after restricting to those in common between the compared GWAS. ^ø H0 = no causal variant for either trait, H1 = variant causal for trait 1, H2 = variant causal for trait 2, H3 = distinct causal variants for each trait, H4 = a shared causal variant between traits. PIP = posterior inclusion probability. AD = Alzheimer’s disease, ALS = amyotrophic lateral sclerosis, FTD = frontotemporal dementia, PD = Parkinson’s disease, SZ = schizophrenia.

Although both positive and negative local genetic correlations passed the FDR-adjusted significance threshold, we observed only positive local genetic correlations in loci where fine-mapping credible sets were identified for both traits in the pair. This reflects that the absolute correlation coefficients and variant associations from the analysed GWAS were generally stronger in the positively correlated loci (see Figure S2).

Colocalisation analyses performed across fine-mapping credible sets and across all SNPs in a region generally gave support to the equivalent hypothesis (Table 3; Table S2). Moreover, comparisons suggesting a signal was present in one trait only were largely concordant with the identification of fine-mapping credible sets in only that trait (Table S2). Figure S3 compares per-SNP p-values across trait pairs for comparisons with evidence of a relevant signal in both traits. Figure S4 shows patterns of LD across SNPs assigned to credible sets for these analyses.

Strong evidence was found for a shared variant between ALS and AD within the HLA region (posterior probability of shared variant = 0.90; see Figure 3). The 95% credible SNPs for this association were distributed around the MTCO3P1 pseudogene, and rs9275477, the lead genome-wide significant SNP from the ALS GWAS in this region, had the highest posterior probability of being implicated in both traits. Figure S5 presents sensitivity analysis showing that the result is robust to a range of values for the shared variant hypothesis prior probability.

Evidence for colocalisation between amyotrophic lateral sclerosis (ALS) and Alzheimer’s disease (AD) in the Chr6:32.63-32.68Mb region
**Panel A:** SNP-wise p-value distribution between ALS and AD across Chr6:32.63-32.68Mb, in which colocalisation analysis found 0.90 posterior probability of the shared variant hypothesis (see Table 3). **Panel B:** (upper) Per-SNP posterior probabilities for being a shared variant between ALS and AD, (lower) positions of HGNC gene symbols nearby to the 95% credible SNPs. Posterior probabilities for being a shared variant sum to 1 across all SNPs analysed and are predicated on the assumption that a shared variant exists; 95% credible SNPs are those spanned by the top 0.95 of posterior probabilities. The x-axis for Panel B is truncated by the base pair range of the credible SNPs and genomic positions are based on GRCh37.

The other comparisons that found fine-mapping credible sets in both traits suggested that overlaps from the correlation analysis were driven by distinct causal variants (see Table 3; Table S2).

Univariate fine-mapping of PD and schizophrenia at Chr17:43.46-44.87Mb found large credible sets spanning many genes, including MAPT^44–47 and CRHR1^48,49 which have been previously implicated in the traits we have analysed. These expansive credible sets reflect the strong LD in the region and indicate a signal that is difficult to localise (see Figure S4(F); Table S3). The colocalisation analysis suggested independent variants for each trait despite many SNPs overlapping across their respective credible sets (see Figure S4). Sensitivity analysis showed robust support for the two independent variants hypothesis across shared-variant hypothesis priors (Figure S5). However, the colocalisation analysis will increasingly favour the two independent variants hypothesis as the number of analysed variants increases⁵⁰. Hence, the wide-spanning LD of this region may have obstructed identification of variants and mechanisms shared between the traits.

4. Discussion

We examined genetic overlaps between the neuropsychiatric conditions Alzheimer’s disease, amyotrophic lateral sclerosis, frontotemporal dementia, Parkinson’s disease, and schizophrenia. Through genetic correlation analysis, we replicated genome-wide correlations previously described between the studied traits^1,2,18,19. Leveraging a more recent local genetic correlation approach, we identified specific genomic loci jointly implicated between pairs of traits which were further investigated using statistical fine-mapping and colocalisation techniques.

Significant local genetic correlations were most frequent across genomic blocks within the HLA region, implicating each of the studied traits in at least one comparison. Several associated regions contained genes with known relevance for the traits studied, such as KIF5A, MAPT, and CRHR1. Colocalisation analysis found strong evidence for a shared genetic variant between ALS and AD in the Chr6:32.63-32.68Mb locus within HLA, while the other colocalisation analyses suggested causal signals distinct across traits, for one trait only, or for neither trait.

The tendency for association between traits around the HLA region is reasonable, since this is a known hotspot for pleiotropy^8,51. HLA is particularly known for its role in immune response and it is implicated in various types of disease^52,53. Mounting evidence has linked HLA and associated genetic variation to the traits we have analysed, and mechanisms underlying these associations are beginning to be understood^52–61. For instance, AD is associated with variants around the HLA-DQA1 and HLA-DRB1 genes and several SNPs in the non-coding region between them have been shown to modulate their expression⁶². Notably, one of the SNPs with a demonstrated regulatory role, rs9271247, had the highest probability of being causal for AD across the 95% credible set identified in the fine-mapping of the region.

Variants showing evidence for colocalisation between AD and ALS were distributed around the MTCO3P1 pseudogene in the HLA class II non-coding region between HLA-DQB1 and HLA-DQB2. MTCO3P1 has been previously identified as one of the most pleiotropic genes in the GWAS catalog^63,64. Previous studies have suggested the relevance of this region in both traits. HLA-DQB1 and HLA-DQB2 are both upregulated in the spinal cord of people with ALS, alongside other genes implicated in various immunological processes for antigen processing and inflammatory response⁶⁵. HLA class II complexes, and their subcomponents, have been identified as upregulated in multiple brain regions of people with AD, using both gene and protein expression techniques^61,66.

Our analysis of this region gave stronger support for colocalisation between the ALS and AD GWAS than a previous study¹. The previous study analysed a 200kb window of over 2,000 SNPs around the lead genome-wide significant SNP from the ALS GWAS, rs9275477, and found ∼0.50 posterior probability for each of the shared and two independent variant(s) hypotheses. The current analysis used 475 SNPs occurring within a semi-independent LD block of ∼50kb in this locus. Since the number of variant configurations supporting the two independent variants hypothesis (H3) grows approximately quadratically with the number of variants in the region whilst that supporting the shared variant hypothesis (H4) grows linearly, it is expected that our analysis would give stronger support for the latter⁵⁰. Given that the previous study defined regions for analysis based on an arbitrary window of ±100kb around each lead genome-wide significant SNP from the ALS GWAS and we defined each analysis region based on patterns of LD in European ancestry populations, it is reasonable to favour the current finding.
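The effect of region size can be made concrete by comparing the total prior weight assigned to H3 and H4 under per-SNP priors. With n SNPs there are n(n-1) ordered pairs of distinct variants for H3 but only n candidate shared variants for H4. The sketch below assumes the default coloc priors reported in the Methods; the priors used in the previous study are not stated here, so applying the same values to a 2,000-SNP region is an assumption made purely for comparison.

```python
def prior_weights(n_snps, p1=1e-4, p2=1e-4, p12=5e-6):
    """Total prior weight for H3 and H4 across a region of n_snps variants."""
    h3 = p1 * p2 * n_snps * (n_snps - 1)   # two distinct causal variants
    h4 = p12 * n_snps                      # one shared causal variant
    return h3, h4


def h3_to_h4_prior_ratio(n_snps, **priors):
    """Ratio of prior weight on H3 relative to H4."""
    h3, h4 = prior_weights(n_snps, **priors)
    return h3 / h4


ratio_ld_block = h3_to_h4_prior_ratio(475)       # region analysed in this study
ratio_wide_window = h3_to_h4_prior_ratio(2000)   # approximate size of the 200kb window
```

Under these assumed priors, the two hypotheses start on roughly equal footing in the 475-SNP block, whereas H3 is favoured about four to one before any data are considered in a 2,000-SNP window.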

More broadly, our analyses suggest that regions with a strong genetic correlation between the five traits studied often result from adjacent but trait-specific signals, likely reflecting overlaps between LD blocks⁵¹. Correlations also occurred in regions with weaker overall GWAS associations (see Table 2), where fine-mapping and colocalisation analyses did not suggest causal associations in one or both traits. Such patterns likely reflect a shared polygenic trend across the region, rather than associations attributable to discrete variants. Accordingly, other approaches may be better suited for identifying regions containing genetic variation jointly causal across diseases, including the traditional approach of testing regions around overlapping genome-wide significant variants.

This study has used gold-standard statistical tools to examine genetic relationships between traits. The local genetic correlation analysis approach enabled targeted investigation of genomic regions which appear to overlap between traits. The application of colocalisation analysis alongside a prior univariate fine-mapping step allowed for associations to be tested without conflating independent but nearby signals under the single-variant assumption of colocalisation analysis across all variants sampled in a region.

The study is not without limitation. We necessarily used the 1KG European reference population to estimate LD between SNPs. Fine-mapping is ideally performed with an LD matrix from the GWAS sample and is sensitive to misspecification when inconsistencies in LD occur between the reference and GWAS cohorts. Use of a reference population is not uncommon, and diagnostic tools available within the susieR package allow testing for inconsistencies between the reference and GWAS samples¹¹. We accordingly implemented these tools centrally into our workflow and determined that the LD matrices from the 1KG reference were suitable for the data (estimates of Z-score and LD consistency are available in Table S3). Nevertheless, repeating this study in under-represented populations would be an important future step to validate our findings.

We employed statistical methods to identify and analyse genomic regions containing variants which might be jointly implicated across traits. These approaches provide useful associations between traits identified from large-scale genomic datasets. However, they alone are not sufficient for translation into clinical practice. Future studies should aim to extend any associations found by integrating functional and multi-omics datasets to gain mechanistic insights into observed trends and facilitate treatment discovery^62,67.

The fine-mapping and colocalisation analysis pipeline we have used is available as an open-access resource on GitHub to facilitate the application of these methods in future studies: https://github.com/ThomasPSpargo/COLOC-reporter. Specified genomic regions can be readily analysed by providing GWAS summary statistics for binary or quantitative traits of interest and a population-appropriate reference dataset for estimation of LD. The pipeline returns resources including detailed reports that overview the analyses performed.

Data Availability

All data are publicly available.

Supplementary materials

Supplementary tables and figures can be found at https://github.com/KHP-Informatics/COLOC-reporter_supplementary_materials.

Funding

This project was part funded by the MND Association and the Wellcome Trust. This is an EU Joint Programme-Neurodegenerative Disease Research (JPND) project. The project is supported through the following funding organizations under the aegis of JPND (http://www.neurodegenerationresearch.eu/) [United Kingdom, Medical Research Council (MR/L501529/1 and MR/R024804/1) and Economic and Social Research Council (ES/L008238/1)]. AAC is an NIHR Senior Investigator. AAC received salary support from the National Institute for Health Research (NIHR) Dementia Biomedical Research Unit at South London and Maudsley NHS Foundation Trust and King’s College London. The work leading up to this publication was funded by the European Community’s Health Seventh Framework Program (FP7/2007–2013; grant agreement number 259867) and Horizon 2020 Program (H2020-PHC-2014-two-stage; grant agreement number 633413). This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (grant agreement no. 772376–EScORIAL). This study represents independent research part funded by the NIHR Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, King’s College London, or the Department of Health and Social Care. Funding was also provided by the King’s College London DRIVE-Health Centre for Doctoral Training and the Perron Institute for Neurological and Translational Science. AI is funded by South London and Maudsley NHS Foundation Trust, MND Scotland, Motor Neurone Disease Association, National Institute for Health and Care Research, Spastic Paraplegia Foundation, Rosetrees Trust, Darby Rimmer MND Foundation, the Medical Research Council (UKRI) and Alzheimer’s Research UK. OP is supported by a Sir Henry Wellcome Postdoctoral Fellowship [222811/Z/21/Z].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This research was funded in whole or in part by the Wellcome Trust [222811/Z/21/Z]. For the purpose of open access, the author has applied a CC-BY public copyright licence to any author accepted manuscript version arising from this submission.

Acknowledgements

The authors acknowledge the use of the CREATE research computing facility at King’s College London⁶⁸. We also acknowledge Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (United Kingdom), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome Trust.

References

1. van Rheenen W, van der Spek RAA, Bakker MK, et al. (2021) Common and rare variant association analyses in amyotrophic lateral sclerosis identify 15 risk loci with distinct genetic architectures and neuron-specific biology. Nat Genet 53:1636–48. https://doi.org/10.1038/s41588-021-00973-1
2. Li C, Yang T, Ou R, Shang H (2021) Overlapping Genetic Architecture Between Schizophrenia and Neurodegenerative Disorders. Front Cell Dev Biol 9. https://doi.org/10.3389/fcell.2021.797072
3. Ranganathan R, Haque S, Coley K, Shepheard S, Cooper-Knock J, Kirby J (2020) Multifaceted Genes in Amyotrophic Lateral Sclerosis-Frontotemporal Dementia. Front Neurosci 14. https://doi.org/10.3389/fnins.2020.00684
4. Abdellaoui A, Yengo L, Verweij KJH, Visscher PM (2023) 15 years of GWAS discovery: Realizing the promise. Am J Hum Genet 110:179–94. https://doi.org/10.1016/j.ajhg.2022.12.011
5. Reales G, Wallace C (2023) Sharing GWAS summary statistics results in more citations. Commun Biol 6. https://doi.org/10.1038/s42003-023-04497-8
6. Bulik-Sullivan BK, Loh P-R, Finucane HK, et al. (2015) LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 47:291–5. https://doi.org/10.1038/ng.3211
7. Bulik-Sullivan BK, Finucane HK, Anttila V, et al. (2015) An atlas of genetic correlations across human diseases and traits. Nat Genet 47:1236–41. https://doi.org/10.1038/ng.3406
8. Werme J, van der Sluis S, Posthuma D, de Leeuw CA (2022) An integrated framework for local genetic correlation analysis. Nat Genet 54:274–82. https://doi.org/10.1038/s41588-022-01017-y
9. Zhang Y, Lu Q, Ye Y, et al. (2021) SUPERGNOVA: local genetic correlation analysis reveals heterogeneous etiologic sharing of complex traits. Genome Biol 22. https://doi.org/10.1186/s13059-021-02478-w
10. Shi H, Mancuso N, Spendlove S, Pasaniuc B (2017) Local Genetic Correlation Gives Insights into the Shared Genetic Architecture of Complex Traits. Am J Hum Genet 101:737–51. https://doi.org/10.1016/j.ajhg.2017.09.022
11. Zou Y, Carbonetto P, Wang G, Stephens M (2022) Fine-mapping from summary data with the “Sum of Single Effects” model. PLoS Genet 18. https://doi.org/10.1371/journal.pgen.1010299
12. Wallace C (2021) A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet 17. https://doi.org/10.1371/journal.pgen.1009440
13. Giambartolomei C, Zhenli Liu J, Zhang W, et al. (2018) A Bayesian framework for multiple trait colocalization from summary association statistics. Bioinformatics 34:2538–45. https://doi.org/10.1093/bioinformatics/bty147
14. Foley CN, Staley JR, Breen PG, et al. (2021) A fast and efficient colocalization algorithm for identifying shared genetic risk factors across multiple traits. Nat Commun 12. https://doi.org/10.1038/s41467-020-20885-8
15. Weintraub D, Mamikonyan E (2019) The Neuropsychiatry of Parkinson Disease: A Perfect Storm. Am J Geriatr Psychiatry 27:998–1018. https://doi.org/10.1016/j.jagp.2019.03.002
16. Ferrari R, Wang Y, Vandrovcova J, et al. (2017) Genetic architecture of sporadic frontotemporal dementia and overlap with Alzheimer’s and Parkinson’s diseases. J Neurol Neurosurg Psychiatry 88:152–64. https://doi.org/10.1136/jnnp-2016-314411
17. Beck J, Poulter M, Hensman D, et al. (2013) Large C9orf72 hexanucleotide repeat expansions are seen in multiple neurodegenerative syndromes and are more frequent than expected in the UK population. Am J Hum Genet 92:345–53. https://doi.org/10.1016/j.ajhg.2013.01.011
18. Wainberg M, Andrews SJ, Tripathy SJ (2023) Shared genetic risk loci between Alzheimer’s disease and related dementias, Parkinson’s disease, and amyotrophic lateral sclerosis. Alzheimers Res Ther 15. https://doi.org/10.1186/s13195-023-01244-3
19. McLaughlin RL, Schijven D, van Rheenen W, et al. (2017) Genetic correlation between amyotrophic lateral sclerosis and schizophrenia. Nat Commun 8. https://doi.org/10.1038/ncomms14774
20. Kunkle BW, Grenier-Boley B, Sims R, et al. (2019) Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet 51:414–30. https://doi.org/10.1038/s41588-019-0358-2
21. Ferrari R, Hernandez DG, Nalls MA, et al. (2014) Frontotemporal dementia and its subtypes: a genome-wide association study. Lancet Neurol 13:686–99. https://doi.org/10.1016/S1474-4422(14)70065-1
22. Nalls MA, Blauwendraat C, Vallerga CL, et al. (2019) Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol 18:1091–102. https://doi.org/10.1016/S1474-4422(19)30320-5
23. Trubetskoy V, Pardiñas AF, Qi T, et al. (2022) Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604(7906). https://doi.org/10.1038/s41586-022-04434-5
24. Pain O, Glanville KP, Hagenaars SP, et al. (2021) Evaluation of polygenic prediction methodology within a reference-standardized framework. PLoS Genet 17. https://doi.org/10.1371/journal.pgen.1009021
25. Auton A, Abecasis GR, Altshuler DM, et al. (2015) A global reference for human genetic variation. Nature 526(7571). https://doi.org/10.1038/nature15393
26. Privé F, Aschard H, Ziyatdinov A, Blum MGB (2018) Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr. Bioinformatics 34:2781–7. https://doi.org/10.1093/bioinformatics/bty185
27. Grotzinger AD, de la Fuente J, Privé F, Nivard MG, Tucker-Drob EM (2023) Pervasive Downward Bias in Estimates of Liability-Scale Heritability in Genome-wide Association Study Meta-analysis: A Simple Solution. Biol Psychiatry. https://doi.org/10.1016/j.biopsych.2022.05.029
28. Altshuler DM, Gibbs RA, Peltonen L, et al. (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467(7311). https://doi.org/10.1038/nature09298
29. Wang G, Sarkar A, Carbonetto P, Stephens M (2020) A simple new approach to variable selection in regression, with application to genetic fine mapping. Journal of the Royal Statistical Society Series B: Statistical Methodology 82:1273–300. https://doi.org/10.1111/rssb.12388
30. Giambartolomei C, Vukcevic D, Schadt EE, et al. (2014) Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLoS Genet 10. https://doi.org/10.1371/journal.pgen.1004383
31. R Core Team (2021) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
32. Purcell S, Neale B, Todd-Brown K, et al. (2007) PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. Am J Hum Genet 81:559–75. https://doi.org/10.1086/519795
33. Purcell S. PLINK. 1.9.0. Available from: http://pngu.mgh.harvard.edu/purcell/plink/
34. Durinck S, Moreau Y, Kasprzyk A, et al. (2005) BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Bioinformatics 21:3439–40. https://doi.org/10.1093/bioinformatics/bti525
35. Durinck S, Spellman PT, Birney E, Huber W (2009) Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 4:1184–91. https://doi.org/10.1038/nprot.2009.97
36. Cunningham F, Allen JE, Allen J, et al. (2022) Ensembl 2022. Nucleic Acids Res 50:D988–D95. https://doi.org/10.1093/nar/gkab1049
37. Chêne G, Beiser A, Au R, et al. (2015) Gender and incidence of dementia in the Framingham Heart Study from mid-adult life. Alzheimers Dement 11:310–20. https://doi.org/10.1016/j.jalz.2013.10.005
38. Alonso A, Logroscino G, Jick SS, Hernán MA (2009) Incidence and lifetime risk of motor neuron disease in the United Kingdom: a population-based study. European Journal of Neurology 16:745–51. https://doi.org/10.1111/j.1468-1331.2009.02586.x
39. Johnston CA, Stanton BR, Turner MR, et al. (2006) Amyotrophic lateral sclerosis in an urban setting: A population based study of inner city London. J Neurol 253:1642–3. https://doi.org/10.1007/s00415-006-0195-y
40. Coyle-Gilchrist ITS, Dick KM, Patterson K, et al. (2016) Prevalence, characteristics, and survival of frontotemporal lobar degeneration syndromes. Neurology 86. https://doi.org/10.1212/WNL.0000000000002638
41. Parkinson’s UK (2017) The Incidence and Prevalence of Parkinson’s in the UK: Results from the Clinical Practice Research Datalink Reference Report. Available from: https://www.parkinsons.org.uk/professionals/resources/incidence-and-prevalence-parkinsons-uk-report
42. Saha S, Chant D, Welham J, McGrath J (2005) A Systematic Review of the Prevalence of Schizophrenia. PLoS Med 2. https://doi.org/10.1371/journal.pmed.0020141
43. Genome Reference Consortium. Human Genome Region MHC [cited 2023 7th March]. Available from: https://www.ncbi.nlm.nih.gov/grc/human/regions/MHC?asm=GRCh37
44. Allen M, Kachadoorian M, Quicksall Z, et al. (2014) Association of MAPT haplotypes with Alzheimer’s disease risk and MAPT brain gene expression levels. Alzheimers Res Ther 6. https://doi.org/10.1186/alzrt268
45. Snowden JS, Adams J, Harris J, et al. (2015) Distinct clinical and pathological phenotypes in frontotemporal dementia associated with MAPT, PGRN and C9orf72 mutations. Amyotroph Lateral Scler Frontotemporal Degener 16:497–505. https://doi.org/10.3109/21678421.2015.1074700
46. Origone P, Geroldi A, Lamp M, et al. (2018) Role of MAPT in Pure Motor Neuron Disease: Report of a Recurrent Mutation in Italian Patients. Neurodegener Dis 18:310–4. https://doi.org/10.1159/000497820
47. Nakayama S, Shimonaka S, Elahi M, et al. (2019) Tau aggregation and seeding analyses of two novel MAPT variants found in patients with motor neuron disease and progressive parkinsonism. Neurobiol Aging. https://doi.org/10.1016/j.neurobiolaging.2019.02.016
48. Cheng WW, Zhu Q, Zhang HY (2020) Identifying Risk Genes and Interpreting Pathogenesis for Parkinson’s Disease by a Multiomics Analysis. Genes (Basel) 11. https://doi.org/10.3390/genes11091100
49. Bigdeli TB, Fanous AH, Li Y, et al. (2020) Genome-Wide Association Studies of Schizophrenia and Bipolar Disorder in a Diverse Cohort of US Veterans. Schizophr Bull 47:517–29. https://doi.org/10.1093/schbul/sbaa133
50. Wallace C (2020) Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses. PLoS Genet 16. https://doi.org/10.1371/journal.pgen.1008720
51. Watanabe K, Stringer S, Frei O, et al. (2019) A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet 51:1339–48. https://doi.org/10.1038/s41588-019-0481-0
52. Dendrou CA, Petersen J, Rossjohn J, Fugger L (2018) HLA variation and disease. Nature Reviews Immunology 18:325–39. https://doi.org/10.1038/nri.2017.143
53. Trowsdale J, Knight JC (2013) Major Histocompatibility Complex Genomics and Human Disease. Annu Rev Genom Hum 14:301–23. https://doi.org/10.1146/annurev-genom-091212-153455
54. Wang Z-X, Wan Q, Xing A (2020) HLA in Alzheimer’s Disease: Genetic Association and Possible Pathogenic Roles. Neuromolecular Med 22:464–73. https://doi.org/10.1007/s12017-020-08612-4
55. Song S, Miranda CJ, Braun L, et al. (2016) Major histocompatibility complex class I molecules protect motor neurons from astrocyte-induced toxicity in amyotrophic lateral sclerosis. Nat Med 22:397–403. https://doi.org/10.1038/nm.4052
56. Yu E, Ambati A, Andersen MS, et al. (2021) Fine mapping of the HLA locus in Parkinson’s disease in Europeans. NPJ Parkinson’s Dis 7. https://doi.org/10.1038/s41531-021-00231-5
57. Broce I, Karch CM, Wen N, et al. (2018) Immune-related genetic enrichment in frontotemporal dementia: An analysis of genome-wide association studies. PLoS Med 15. https://doi.org/10.1371/journal.pmed.1002487
58. Ferrari R, Forabosco P, Vandrovcova J, et al. (2016) Frontotemporal dementia: insights into the biological underpinnings of disease through gene co-expression network analysis. Mol Neurodegener 11. https://doi.org/10.1186/s13024-016-0085-4
59. Al-Diwani AAJ, Pollak TA, Irani SR, Lennox BR (2017) Psychosis: an autoimmune disease? Immunology 152:388–401. https://doi.org/10.1111/imm.12795
60. Mokhtari R, Lachman HM (2016) The Major Histocompatibility Complex (MHC) in Schizophrenia: A Review. J Clin Cell Immunol 7. https://doi.org/10.4172/2155-9899.1000479
61. Aliseychik MP, Andreeva TV, Rogaev EI (2018) Immunogenetic Factors of Neurodegenerative Diseases: The Role of HLA Class II. Biochemistry (Moscow) 83:1104–16. https://doi.org/10.1134/S0006297918090122
62. Zhang X, Zou M, Wu Y, et al. (2022) Regulation of the Late Onset Alzheimer’s Disease Associated HLA-DQA1/DRB1 Expression. Am J Alzheimers Dis Other Demen 37. https://doi.org/10.1177/15333175221085066
63. Chesmore K, Bartlett J, Williams SM (2018) The ubiquity of pleiotropy in human disease. Hum Genet 137:39–44. https://doi.org/10.1007/s00439-017-1854-z
64. Sollis E, Mosaku A, Abid A, et al. (2022) The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res 51:D977–D85. https://doi.org/10.1093/nar/gkac1010
65. Andrés-Benito P, Moreno J, Aso E, Povedano M, Ferrer I (2017) Amyotrophic lateral sclerosis, gene deregulation in the anterior horn of the spinal cord and frontal cortex area 8: implications in frontotemporal lobar degeneration. Aging 9:823–51. https://doi.org/10.18632/aging.101195
66. Hopperton KE, Mohammad D, Trépanier MO, Giuliano V, Bazinet RP (2018) Markers of microglia in post-mortem brain samples from patients with Alzheimer’s disease: a systematic review. Mol Psychiatry 23:177–98. https://doi.org/10.1038/mp.2017.246
67. Pain O, Jones A, Khleifat AA, et al. (2023) Harnessing Transcriptomic Signals for Amyotrophic Lateral Sclerosis to Identify Novel Drugs and Enhance Risk Prediction. medRxiv. https://doi.org/10.1101/2023.01.18.23284589
68. King’s College London (2022) King’s Computational Research, Engineering and Technology Environment (CREATE). Available from https://doi.org/10.18742/rnvf-m076

Article and author information

Author information

Thomas P Spargo
Maurice Wohl Clinical Neuroscience Institute, King’s College London, Department of Basic and Clinical Neuroscience, London, UK, Department of Biostatistics and Health Informatics, King’s College London, London, UK, NIHR Maudsley Biomedical Research Centre (BRC) at South London and Maudsley NHS Foundation Trust and King’s College London, London, UK
ORCID iD: 0000-0003-4297-6418
- Correspondence should be addressed to alfredo.iacoangeli@kcl.ac.uk and thomas.spargo@kcl.ac.uk
Lachlan Gilchrist
Maurice Wohl Clinical Neuroscience Institute, King’s College London, Department of Basic and Clinical Neuroscience, London, UK, Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King’s College London, London, UK, Perron Institute for Neurological and Translational Science, Nedlands, WA 6009, Australia
Guy P Hunt
Department of Biostatistics and Health Informatics, King’s College London, London, UK, Perron Institute for Neurological and Translational Science, Nedlands, WA 6009, Australia, Centre for Molecular Medicine and Innovative Therapeutics, Murdoch University, Murdoch, WA 6150, Australia
Richard JB Dobson
Department of Biostatistics and Health Informatics, King’s College London, London, UK, NIHR Maudsley Biomedical Research Centre (BRC) at South London and Maudsley NHS Foundation Trust and King’s College London, London, UK, Institute of Health Informatics, University College London, London, United Kingdom, NIHR Biomedical Research Centre at University College London Hospitals NHS Foundation Trust, London, UK
Petroula Proitsi
Maurice Wohl Clinical Neuroscience Institute, King’s College London, Department of Basic and Clinical Neuroscience, London, UK
ORCID iD: 0000-0002-2553-6974
Ammar Al-Chalabi
Maurice Wohl Clinical Neuroscience Institute, King’s College London, Department of Basic and Clinical Neuroscience, London, UK, King’s College Hospital, Bessemer Road, London, SE5 9RS, UK
ORCID iD: 0000-0002-4924-7712
Oliver Pain
Maurice Wohl Clinical Neuroscience Institute, King’s College London, Department of Basic and Clinical Neuroscience, London, UK
ORCID iD: 0000-0001-5680-3281
- co-senior authors
Alfredo Iacoangeli
Maurice Wohl Clinical Neuroscience Institute, King’s College London, Department of Basic and Clinical Neuroscience, London, UK, Department of Biostatistics and Health Informatics, King’s College London, London, UK, NIHR Maudsley Biomedical Research Centre (BRC) at South London and Maudsley NHS Foundation Trust and King’s College London, London, UK
- Correspondence should be addressed to alfredo.iacoangeli@kcl.ac.uk and thomas.spargo@kcl.ac.uk
- co-senior authors

Version history

Preprint posted: March 30, 2023
Sent for peer review: April 28, 2023
Reviewed Preprint version 1: September 12, 2023
Reviewed Preprint version 2: July 29, 2024

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.

Reviewing Editor
Joris Deelen
Max Planck Institute for Biology of Ageing, Cologne, Germany
Senior Editor
Timothy Behrens
University of Oxford, Oxford, United Kingdom

Reviewer #1 (Public Review):

The authors investigate pleiotropy in the genetic loci previously associated to a range of neuropsychiatric disorders: Alzheimer's disease, amyotrophic lateral sclerosis (ALS), frontotemporal dementia, Parkinson's disease, and schizophrenia. The local statistical fine-mapping and variant colocalisation approaches they use have the potential to uncover not only shared loci but also shared causal variants between these disorders. There is existing literature describing the pleiotropy between ALS and these other disorders but here the authors apply state-of-the-art, local genetic correlation approaches to further refine any relationships.

Complex disease and GWAS is not my area of expertise but the authors managed to present their methods and results in a clear, easy-to-follow manner. Their results statistically support several correlations between the disorders and, for ALS and AD, a shared variant in the vicinity of the lead SNP from the original ALS GWAS. Such findings could have important implications for our understanding of the mechanisms of such disorders and eventually the possibility of managing and treating them.

The authors have built a useful pipeline that plugs together all the gold-standard, existing software to perform this analysis and made it openly available which is commendable. However, there is little discussion of what software is available to perform global and local correlation analysis and, if there are multiple tools available, why they consider the ones they selected to be the gold-standard.

There is some mention of previous findings of genetic pleiotropy between ALS and these other disorders in the introduction, and discussion of their improved ALS-AD evidence relative to previous work. However, detailed comparisons of their other correlations to what was described before for the same pairs of disorders (if any) is missing. Adding this would strengthen the impact of this paper.

Finally, being new to this approach I found the abstract a little confusing. Initially, the shared causal variant between ALS and AD is mentioned but immediately in the following sentence they describe how their study "suggested that disease-implicated variants in these loci often differ between traits". After reading the whole paper I understood that the ALS-AD shared variant was the exception but it may be best to restructure this part of the abstract. Additionally, in the abstract the authors state that different variants "suggests the role of distinct mechanisms across diseases despite shared loci". Is it not possible that different variants in the same regulatory region or protein-coding parts of a gene could be having the same effect and mechanism? Or does the methodology to establish that different variants are involved automatically mean that the variants are too distant for this to be possible?

These concerns were addressed in the revised version of this manuscript.

https://doi.org/10.7554/eLife.88768.2.sa2

Reviewer #2 (Public Review):

Summary:

Spargo and colleagues present an analysis of the shared genetic architectures of schizophrenia and several late-onset neurological disorders. In contrast to many polygenic traits for which global genetic correlation estimates are substantial, global genetic correlation estimates for neurological conditions are relatively small, likely for several reasons. One is that assortative mating, which will spuriously inflate genetic correlation estimates, is likely to be less salient for late-onset conditions. Another, which the authors explore in the current manuscript, is that some loci affecting two or more conditions (i.e., pleiotropic loci) may have effects in opposite directions, or shared loci are sparse, such that the global genetic correlation signal washes out.

The authors apply a local genetic correlation approach that assesses the presence and direction of pleiotropy in much smaller spatial windows across the genome. Then, within regions evidencing local genetic correlations for a given trait pair, they apply fine-mapping and colocalization methods to attempt to differentiate between two scenarios: that the two traits share the same causal variant in the region or that distinct loci within the region influence the traits. Interestingly, the authors only discover one instance of the former: an SNP in the HLA region appearing to confer risk for both AD and ALS. This is in contrast to six regions with distinct causal loci, and twenty regions with no clear shared loci.

Finally, the authors have published their analysis pipeline such that other researchers might easily apply the same techniques to other collections of traits.

Strengths:
- All such analysis pipelines involve many decision points where there is often no clear correct option. Nonetheless, the authors clearly present their reasoning behind each such decision.
- The authors have published their analytic pipeline such that future researchers might easily replicate and extend their findings.

Weaknesses:
- The majority of regions display no clear candidate causal variants for the traits, whether shared or distinct. Further, despite the potential of local genetic correlation analysis to identify regions with effects in opposing directions, all of the regions in which causal variants were identified for both traits evidenced positive correlations. The reasons for this aren't clear and the authors would do well to explore this in greater detail.
- The authors very briefly discuss how their findings differ from previous analyses because of their strict inclusion for "high-quality" variants. This might be the case, but the authors do not attempt to demonstrate this via simulation or otherwise, making it difficult to evaluate their explanation.

These concerns were addressed in the revised version of this manuscript.

https://doi.org/10.7554/eLife.88768.2.sa1

Author response:

The following is the authors’ response to the original reviews.

Public Reviews:

Reviewer #1 (Public Review):

The authors investigate pleiotropy in the genetic loci previously associated to a range of neuropsychiatric disorders: Alzheimer's disease, amyotrophic lateral sclerosis (ALS), frontotemporal dementia, Parkinson's disease, and schizophrenia. The local statistical fine-mapping and variant colocalisation approaches they use have the potential to uncover not only shared loci but also shared causal variants between these disorders. There is existing literature describing the pleiotropy between ALS and these other disorders but here the authors apply state of the art, local genetic correlation approaches to further refine any relationships.

Complex disease and GWAS is not my area of expertise but the authors managed to present their methods and results in a clear, easy to follow manner. Their results statistically support several correlations between the disorders and, for ALS and AD, a shared variant in the vicinity of the lead SNP from the original ALS GWAS. Such findings could have important implications for our understanding of the mechanisms of such disorders and eventually the possibility of managing and treating them.

The authors have built a useful pipeline that plugs together all the gold-standard, existing software to perform this analysis and made it openly available which is commendable. However, there is little discussion of what software is available to perform global and local correlation analysis and, if there are multiple tools available, why they consider the ones they selected to be the gold-standard.

There is some mention of previous findings of genetic pleiotropy between ALS and these other disorders in the introduction, and discussion of their improved ALS-AD evidence relative to previous work. However, detailed comparisons of their other correlations to what was described before for the same pairs of disorders (if any) is missing. Adding this would strengthen the impact of this paper.

Finally, being new to this approach I found the abstract a little confusing. Initially, the shared causal variant between ALS and AD is mentioned but immediately in the following sentence they describe how their study "suggested that disease-implicated variants in these loci often differ between traits". After reading the whole paper I understood that the ALS-AD shared variant was the exception but it may be best to restructure this part of the abstract. Additionally, in the abstract the authors state that different variants "suggests the role of distinct mechanisms across diseases despite shared loci". Is it not possible that different variants in the same regulatory region or protein-coding parts of a gene could be having the same effect and mechanism? Or does the methodology to establish that different variants are involved automatically mean that the variants are too distant for this to be possible?

We thank reviewer one for their considered review of this manuscript and for highlighting points that would benefit from further exploration. Itemised responses are provided below.

(1) The reviewer noted that we did not adequately explain our choice of software for global and local genetic correlation analysis, and why we consider the techniques chosen as gold standard. We agree that the paper would benefit from clarification around this aspect of the study.

Briefly, we selected LAVA for the local genetic correlation analysis because it offers several advantages over competing software and was developed by a reputable team previously known for developing MAGMA, which is well-established in the statistical genetics field. In the manuscript (page 8), we added the following clarification: “LAVA was the most appropriate local genetic correlation approach for this study for several reasons. First, unlike SUPERGNOVA and rho-HESS, LAVA makes specific accommodations for analysis of binary traits. Second, other tools focus on bivariate correlation between traits whilst LAVA offers this alongside multivariate tests such as multiple regression and partial correlation, enabling rigorous testing of pleiotropic effects. Lastly, LAVA is shown to provide results which are less biased than those from other tools.”

LDSC was selected for the global genetic correlation analysis because the software is well-established and likely the most widely adopted global genetic correlation tool. Reflecting its prevalence, the software is also compatible with LAVA, which adjusts for sample overlap based on the bivariate intercept estimate returned by LDSC. Since global genetic correlations were not the primary focus of this study, having been tested across several previous investigations (see response 2), we did not prioritise comparison of correlation estimates from LDSC against other available software. In the manuscript (pages 7-8) we now include the following statement: “[LDSC] was also applied to derive ‘global’ (i.e., genome-wide) genetic correlation estimates between trait pairs and estimate sample overlap from the bivariate intercept. The latter of these outputs was taken forward as an input for the local genetic correlation analysis using LAVA (see 2.2.2.2). Since global genetic correlation analysis across the traits studied here is not novel and associations reported in past studies are congruent across different tools, the compatibility between LDSC and LAVA motivated our use of LDSC for this analysis”.
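As an illustration of how LDSC intercepts can feed a sample-overlap input, the sketch below standardises a matrix of intercepts into a correlation matrix. This mirrors, to our understanding, the preparation step described in the LAVA documentation, but it should be read as an assumption rather than a description of the exact procedure used in the study. The trait labels and intercept values are made up and are not estimates from this work.

```python
import math


def overlap_correlation(intercepts):
    """Standardise a symmetric matrix of LDSC intercepts to a correlation matrix.

    Diagonal entries are univariate intercepts; off-diagonal entries are
    bivariate (cross-trait) intercepts.
    """
    n = len(intercepts)
    return [
        [
            intercepts[i][j] / math.sqrt(intercepts[i][i] * intercepts[j][j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical intercepts for three traits (illustration only).
traits = ["trait_1", "trait_2", "trait_3"]
intercepts = [
    [1.02, 0.05, 0.01],
    [0.05, 1.04, 0.00],
    [0.01, 0.00, 1.06],
]
overlap = overlap_correlation(intercepts)
for name, row in zip(traits, overlap):
    print(name, [round(value, 5) for value in row])
```

Off-diagonal values near zero indicate little estimated sample overlap between the corresponding pair of GWAS.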

(2) The second comment was that the paper would be strengthened by contextualising our study with detail around what is previously known about associations between the studied traits. Accordingly, we have added clarifying text at the end of the introduction, stating: “although previous studies have performed global genetic correlation analyses between various combinations of these traits {references}, this is the first to compare them at a genome-wide scale using a local genetic correlation approach”. In the discussion, we link back to these studies, stating that “Through genetic correlation analysis, we replicated genome-wide correlations previously described between the studied traits {references}”.

(3) The reviewer highlighted that the abstract as originally written may mislead or confuse the reader and we agree that clarity could be improved with some restructuring. This has now been revised and should read more logically.

(4) They also enquired about our reasons for suggesting that the implication of distinct variants for each trait from a colocalisation analysis suggests a distinct causal mechanism. We thank them for this question as it encouraged us to reconsider how best to present the results of this analysis. To answer their question:

It is certainly true that nearby but distinct variants can confer the same effect. In a scenario where multiple distinct variants result in the same effect and thus increase susceptibility towards two or more related phenotypes, you would expect to find evidence of association to each relevant variant in GWAS across these related traits (even if the magnitude of the associations differs). Where biological mechanisms are shared, post-GWAS fine-mapping analysis would be expected to yield credible sets overlapping across the traits, and likewise, colocalisation analysis should converge on a set of credible SNPs that are candidates for the shared effect. Where multiple distinct variants confer the same effect, you would expect to see separate fine-mapping credible sets for these distinct variants that colocalise pairwise between the jointly-affected traits. Generally, therefore, evidence supporting the two distinct variants hypothesis would suggest the role of two distinct mechanisms, except when certain credible sets identified through fine-mapping converge on a colocalised effect.

There is a further caveat which we also explored in response to Reviewer two: if a region includes long-spanning LD (and hence a larger number of variants are considered in the analysis), then the colocalisation analysis is more likely to favour the two distinct variants hypothesis since the probability of the variants implicated in both traits being shared decreases. It is likely that support for the two independent variants hypothesis is correct in most of the comparisons from this study that favour this conclusion. This is because, generally, the fine-mapping credible sets do not overlap across trait pairs (Figure S4) and consequently the colocalisation analysis does not find any support for the shared variant hypothesis. An exception is the analysis of PD and schizophrenia at the MAPT locus on chromosome 17. We have accordingly added the following clarification to the manuscript (page 18): “However, the colocalisation analysis will increasingly favour the two independent variants hypothesis as the number of analysed variants increases. Hence, the wide-spanning LD of this region may have obstructed identification of variants and mechanisms shared between the traits.”

Reviewer #2 (Public Review):

Summary:

Spargo and colleagues present an analysis of the shared genetic architectures of schizophrenia and several late-onset neurological disorders. In contrast to many polygenic traits for which global genetic correlation estimates are substantial, global genetic correlation estimates for neurological conditions are relatively small, likely for several reasons. One is that assortative mating, which will spuriously inflate genetic correlation estimates, is likely to be less salient for late-onset conditions. Another, which the authors explore in the current manuscript, is that some loci affecting two or more conditions (i.e., pleiotropic loci) may have effects in opposite directions, or shared loci are sparse, such that the global genetic correlation signal washes out.

The authors apply a local genetic correlation approach that assesses the presence and direction of pleiotropy in much smaller spatial windows across the genome. Then, within regions evidencing local genetic correlations for a given trait pair, they apply fine-mapping and colocalization methods to attempt to differentiate between two scenarios: that the two traits share the same causal variant in the region or that distinct loci within the region influence the traits. Interestingly, the authors only discover one instance of the former: an SNP in the HLA region appearing to confer risk for both AD and ALS. This is in contrast to six regions with distinct causal loci, and twenty regions with no clear shared loci.

Finally, the authors have published their analysis pipeline such that other researchers might easily apply the same techniques to other collections of traits.

Strengths:

- All such analysis pipelines involve many decision points where there is often no clear correct option. Nonetheless, the authors clearly present their reasoning behind each such decision.
- The authors have published their analytic pipeline such that future researchers might easily replicate and extend their findings.

Weaknesses:

- The majority of regions display no clear candidate causal variants for the traits, whether shared or distinct. Further, despite the potential of local genetic correlation analysis to identify regions with effects in opposing directions, all of the regions in which causal variants were identified for both traits evidenced positive correlations. The reasons for this aren't clear and the authors would do well to explore this in greater detail.

- The authors very briefly discuss how their findings differ from previous analyses because of their strict inclusion for "high-quality" variants. This might be the case, but the authors do not attempt to demonstrate this via simulation or otherwise, making it difficult to evaluate their explanation.

We thank Reviewer two for their appraisal of this manuscript and kind comments regarding its strengths. We will now aim to address the identified weaknesses.

(1) The reviewer comments that we did not adequately investigate why loci with causal variants identified in both traits all had positive local genetic correlations. We agree that it would be helpful to better understand the underlying reasons. To address this issue, we have added a new supplementary figure to compare the positive and negative local genetic correlation results (see Figure S2). In the main text we add the following clarification: “Although both positive and negative local genetic correlations passed the FDR-adjusted significance threshold, we observed only positive local genetic correlations in loci where fine-mapping credible sets were identified for both traits in the pair. This reflects that the correlation coefficients and variant associations from the analysed GWAS studies were generally stronger in the positively correlated loci (see Figure S2).”

(2) The reviewer rightly suggests that the manuscript would benefit from an improved explanation of the somewhat inconsistent results for the colocalisation analysis of ALS and AD at the locus around the rs9275477 SNP from this work and a previous study. We have now further investigated this and believe that the discrepancy results partly from an inherent empirical characteristic of the colocalisation analysis. We have explained this in the manuscript (page 22) as follows: “The previous study analysed a 200Kb window of over 2,000 SNPs around the lead genome-wide significant SNP from the ALS GWAS, rs9275477, and found ~0.50 posterior probability for each of the shared and two independent variant(s) hypotheses. The current analysis used 475 SNPs occurring within a semi-independent LD block of ~50kb in this locus. Since the posterior probability of the two independent variants hypothesis (H3) increases exponentially with the number of variants in the region whilst the shared variant hypothesis (H4) scales linearly, it is expected that our analysis would give stronger support for the latter. Given that the previous study defined regions for analysis based on an arbitrary window of ±100kb around each lead genome-wide significant SNP from the ALS GWAS and we defined each analysis region based on patterns of LD in European ancestry populations, it is reasonable to favour the current finding.”
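The dependence on region size described above can be illustrated with a small sketch of the prior probabilities alone. It assumes the commonly used per-SNP colocalisation priors (p1 = p2 = 1e-4 for association with one trait only, p12 = 1e-5 for association with both) and a single causal variant per trait. Under those assumptions the prior mass on H3 grows with the number of ordered SNP pairs, while the mass on H4 grows with the number of SNPs. The function names are hypothetical, and the posterior probabilities in a real analysis also depend on the per-SNP Bayes factors, which are not modelled here.

```python
def coloc_prior_mass(n_snps, p1=1e-4, p2=1e-4, p12=1e-5):
    """Total prior mass on each hypothesis for a region of n_snps variants.

    H1/H2: one causal variant for trait 1 or trait 2 only.
    H3: two distinct causal variants, one per trait.
    H4: a single causal variant shared by both traits.
    Priors are per-SNP values assumed for illustration.
    """
    if n_snps < 1:
        raise ValueError("n_snps must be at least 1")
    return {
        "H1": n_snps * p1,
        "H2": n_snps * p2,
        "H3": n_snps * (n_snps - 1) * p1 * p2,  # ordered pairs of distinct SNPs
        "H4": n_snps * p12,                     # one shared SNP
    }


def prior_odds_h3_vs_h4(n_snps, **priors):
    """Prior odds of the distinct-variants hypothesis against the shared one."""
    mass = coloc_prior_mass(n_snps, **priors)
    return mass["H3"] / mass["H4"]


# Region sizes taken from the two analyses discussed in the text.
for n in (475, 2000):
    print(n, round(prior_odds_h3_vs_h4(n), 3))
```

With these assumed priors, the prior odds of H3 against H4 are roughly 0.47 for a 475-SNP region and roughly 2.0 for a 2,000-SNP region, so the wider window starts from a position that favours the two independent variants hypothesis before any association evidence is considered.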

https://doi.org/10.7554/eLife.88768.2.sa0

