1. Introduction

The genetic spectrum of neuropsychiatric disease is diverse and various overlaps exist between traits. For instance, genetic pleiotropy between amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD) is increasingly recognised, and ALS is genetically correlated with Alzheimer’s disease (AD), Parkinson’s disease (PD), and schizophrenia1-3. Improving understanding of the genetic architecture underlying these complex diseases could facilitate future treatment discovery.

Advances in genomic research techniques have accelerated discovery of genetic variation associated with complex traits. Genome-wide association studies (GWAS), in particular, have enabled population-scale investigations of the genetic basis of human diseases and anthropometric measures4. Summary-level results from GWAS are increasingly shared alongside publications5, and a breadth of approaches now exists for downstream analysis of summary statistics, which can aid their interpretation and provide further biological insight.

Genetic correlation analysis allows estimation of genetic overlap between traits6-9. A ‘global’ genetic correlation approach gives a genome-wide average estimate of this overlap. However, genetic relationships between traits can be obscured when correlations in opposing directions cancel out genome-wide8. Recent methods allow for a more nuanced analysis of ‘local’ genetic correlations partitioned across the genome8,9. This stratified approach to genome-wide analysis could prove effective for identifying pleiotropic regions and designing subsequent analyses aiming to identify genetic variation shared between traits.

A number of methods aim to disentangle causality within associated regions. This is important because GWAS focus on single nucleotide polymorphisms (SNPs), which are markers of genetic variation, and so produce results that can be difficult to interpret, with causal variants typically unclear. Moreover, because of linkage disequilibrium (LD), GWAS associations often comprise large sets of highly correlated SNPs spanning large genomic regions. Statistical fine-mapping is a common approach for dissecting complex LD structures and finding variants with implications for a given trait among the tens or hundreds that might be associated in the region10.

Interpretation of regions associated with multiple traits can also be challenging, since it is often unclear whether these overlaps are driven by the same causal variant. Statistical colocalisation analysis can disentangle association signals across traits to suggest whether the overlaps result from shared or distinct causal genetic factors11-13. Traditionally this analysis was restricted by the assumption of at most one causal variant for each trait in the region. However, recent extensions to the method now permit analysis based on univariate fine-mapping results for the traits compared and, therefore, analysis of regions with multiple causal variants.

Accordingly, we conducted genome-wide local genetic correlation analysis across five neuropsychiatric traits with recognised phenotypic and genetic overlap2,3,14-16: AD, ALS, FTD, PD, and schizophrenia. Loci highly correlated between trait pairs were further investigated with univariate fine-mapping and bivariate colocalisation techniques to examine variants driving these associations.

2. Methods

2.1. Sampled GWAS summary statistics

We used publicly accessible summary statistics from European ancestry GWAS meta-analyses of risk for AD17, ALS1, FTD18, PD19, and schizophrenia20. European ancestry data were selected to avoid LD mismatch between the GWAS sample and reference data from an external European population.

2.2. Procedure

Figure 1 summarises the analysis protocol for this study; further details are provided below.

Overview of the analysis procedure for this study

SuSiE (sum of single effects) is a univariate fine-mapping approach implemented within the R package susieR. ‘coloc’ is an R package for bivariate colocalisation analysis between pairs of traits. h2 = Heritability, rg = bivariate genetic correlation. The analysis steps shaded in blue have been implemented within a readily applied analysis pipeline available on GitHub: https://github.com/ThomasPSpargo/COLOC-reporter.

2.2.1. Processing of GWAS summary statistics

A standard data cleaning protocol was applied to each set of summary statistics21. We retained only single nucleotide polymorphisms (SNPs), excluding any non-SNP or strand-ambiguous variants. SNPs were filtered to those present within the 1000 Genomes phase 3 (1KG) European ancestry population reference dataset22 (N = 503). They were matched to the 1KG reference panel by GRCh37 chromosomal position using bigsnpr (version 1.11.6)23, harmonising allele order with the reference and assigning SNP IDs.
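The SNP-only and strand-ambiguity filter described above can be sketched as follows. This is a minimal illustration, not the authors' implementation (which used bigsnpr in R); the function names and example variant IDs are hypothetical.

```python
# Complementary bases on the opposite DNA strand.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_snp(a1, a2):
    """True when both alleles are single, distinct nucleotides."""
    return a1 in COMPLEMENT and a2 in COMPLEMENT and a1 != a2


def is_strand_ambiguous(a1, a2):
    """True for A/T and C/G pairs, where the two alleles are complements
    and the strand cannot be inferred from the alleles alone."""
    return COMPLEMENT.get(a1) == a2


def keep_variant(a1, a2):
    """Retain only SNPs that are not strand-ambiguous."""
    return is_snp(a1, a2) and not is_strand_ambiguous(a1, a2)


# Hypothetical variants: (id, allele 1, allele 2).
variants = [
    ("rs_a", "A", "G"),   # kept
    ("rs_b", "A", "T"),   # strand-ambiguous
    ("rs_c", "C", "G"),   # strand-ambiguous
    ("rs_d", "AT", "A"),  # indel, not a SNP
]
kept = [vid for vid, a1, a2 in variants if keep_variant(a1, a2)]
```

Removing A/T and C/G SNPs means that any remaining allele mismatch with the reference panel can be attributed to allele order or strand without ambiguity.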

If not reported, and where possible, effective sample size (Neff) was calculated from per-SNP case and control sample sizes. When this could not be determined per-SNP, all variants were assigned a single Neff, calculated as a sum of Neff values for each cohort within the GWAS meta-analysis24.
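As an illustration of this step, the sketch below uses the commonly applied definition of effective sample size for a case-control design, Neff = 4 / (1/Ncases + 1/Ncontrols). The text does not state the formula used, so this definition and the function names are assumptions; the cohort sizes in the test are made up.

```python
def effective_n(n_cases, n_controls):
    """Effective sample size of a case-control sample, assuming the common
    definition Neff = 4 / (1/Ncases + 1/Ncontrols)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def meta_effective_n(cohorts):
    """Single Neff for a meta-analysis: the sum of per-cohort Neff values.

    `cohorts` is an iterable of (n_cases, n_controls) pairs.
    """
    return sum(effective_n(cases, controls) for cases, controls in cohorts)
```

Under this definition a balanced cohort has Neff equal to its total size, while an unbalanced cohort has a smaller Neff than its total N, which is why the per-cohort sum is preferred over applying the formula to pooled case and control counts.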

Further processing was performed where possible, excluding SNPs with imputation INFO <0.9, p-values ≤0 or >1, and Neff >3 standard deviations from the median Neff. We filtered to include only variants with minor allele frequency (MAF) ≥0.005 in both the reference and GWAS samples and excluded SNPs with an absolute MAF difference of >0.2 between the two.
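The filters in this paragraph can be expressed compactly. The sketch below is illustrative only: the record layout (dictionaries with `info`, `p`, `neff`, `maf_gwas`, `maf_ref` keys) is hypothetical, and the spread of Neff is taken as the sample standard deviation, which is an assumption about how "3 standard deviations from the median" was computed.

```python
import statistics


def passes_qc(snp, neff_median, neff_sd):
    """Apply the per-SNP filters; a missing INFO score is not penalised,
    mirroring the 'where possible' wording."""
    if snp.get("info") is not None and snp["info"] < 0.9:
        return False
    if not (0.0 < snp["p"] <= 1.0):  # excludes p <= 0 and p > 1
        return False
    if abs(snp["neff"] - neff_median) > 3 * neff_sd:
        return False
    if min(snp["maf_gwas"], snp["maf_ref"]) < 0.005:
        return False
    if abs(snp["maf_gwas"] - snp["maf_ref"]) > 0.2:
        return False
    return True


def qc_filter(snps):
    """Filter a list of SNP records using dataset-wide Neff statistics."""
    neffs = [s["neff"] for s in snps]
    neff_median = statistics.median(neffs)
    neff_sd = statistics.stdev(neffs)
    return [s for s in snps if passes_qc(s, neff_median, neff_sd)]


# Hypothetical records: 20 clean SNPs plus one failing each filter.
base = {"info": 0.98, "p": 0.2, "neff": 10000, "maf_gwas": 0.21, "maf_ref": 0.20}
snps = [dict(base, id=f"ok{i}") for i in range(20)]
snps += [
    dict(base, id="low_info", info=0.5),
    dict(base, id="bad_p", p=0.0),
    dict(base, id="rare", maf_gwas=0.001),
    dict(base, id="maf_mismatch", maf_gwas=0.45),
    dict(base, id="neff_outlier", neff=2000),
]
kept_ids = [s["id"] for s in qc_filter(snps)]
```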

2.2.2. Genome-wide analyses

2.2.2.1. Global heritability and genetic correlations

LDSC (version 1.0.1)6,7 was applied to estimate genome-wide univariate heritability (h2) for each trait on the liability scale. The software was also applied to derive ‘global’ (i.e., genome-wide) genetic correlation estimates between trait pairs and estimate sample overlap from the bivariate intercept.
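For readers unfamiliar with the liability scale, the sketch below shows the standard transformation from observed-scale to liability-scale heritability for a case-control trait, given a sample prevalence P and population prevalence K. It is a generic illustration rather than a description of the exact settings used here; the function name and the prevalence values in the test are hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, sample_prev, pop_prev):
    """Convert observed-scale h2 to the liability scale.

    factor = K^2 (1 - K)^2 / (P (1 - P) z^2), where z is the height of the
    standard normal density at the liability threshold for prevalence K.
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    z = std_normal.pdf(threshold)
    factor = (pop_prev ** 2 * (1.0 - pop_prev) ** 2) / (
        sample_prev * (1.0 - sample_prev) * z ** 2
    )
    return h2_obs * factor
```

When summary statistics are expressed in terms of Neff, the sample prevalence is often set to 0.5; whether that applies here is not stated in the text.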

These analyses were performed using the HapMap325 SNPs and the LD score files provided with the software, calculated in the 1KG European population. No further MAF filter was applied (therefore variants with MAF ≥0.005 were included) and the other settings were left to their defaults.

2.2.2.2. Local genetic correlation analysis

LAVA (version 0.1.0)8 was applied to obtain local genetic correlation estimates across 2495 approximately independent blocks delineating the genome, based on patterns in LD. We used the blocks provided alongside the LAVA software which were derived from the 1KG European cohort. Bivariate intercepts from LDSC were provided to LAVA to estimate sample overlap between trait pairs.

In accordance with prior studies, genetic correlation analysis was performed following an initial filtering step. Univariate heritability was estimated for each genomic block across SNPs in-common between a pair of traits, and only loci with local h2 p-values below a threshold of 2.004×10−5 (0.05/2495) in both traits continued to the bivariate analysis. This step ensures that univariate heritability is sufficient in both traits for a robust correlation estimate.
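The filtering rule amounts to a Bonferroni-style threshold applied to both traits at each locus. A minimal sketch follows, with hypothetical locus identifiers and p-values in the test; it is not the LAVA implementation.

```python
N_BLOCKS = 2495                      # genomic blocks analysed
ALPHA = 0.05
H2_P_THRESHOLD = ALPHA / N_BLOCKS    # approximately 2.004e-5


def loci_for_bivariate(h2_p_trait1, h2_p_trait2, threshold=H2_P_THRESHOLD):
    """Return loci whose local h2 p-value is below the threshold in both traits.

    Each argument maps a locus identifier to the univariate h2 p-value
    estimated across SNPs in common between the two traits.
    """
    shared = h2_p_trait1.keys() & h2_p_trait2.keys()
    return sorted(
        locus
        for locus in shared
        if h2_p_trait1[locus] < threshold and h2_p_trait2[locus] < threshold
    )
```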

2.2.3. Targeted genetic analyses

2.2.3.1. Fine-mapping and colocalisation analysis

Statistical fine-mapping and colocalisation techniques were applied to further analyse associations between trait pairs in regions where the false discovery rate (FDR) adjusted p-value of local genetic correlation analysis was below 0.05 (after adjusting for all bivariate comparisons performed). Additional analysis was conducted at loci where significant correlations occurred between two trait pairs but not between the final pairwise comparison across the three implicated traits.
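The text does not name the FDR procedure used. Assuming the widely used Benjamini-Hochberg method, adjusted p-values across all bivariate comparisons could be computed as in the sketch below (function name hypothetical).

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Loci would then be taken forward when the adjusted value is below 0.05, for example `[i for i, q in enumerate(benjamini_hochberg(pvals)) if q < 0.05]`.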

Fine-mapping was performed with susieR (v0.12.27)10,26, which implements the ‘sum of single effects’ (SuSiE) model to represent statistical evidence of causal genetic variation within ‘credible sets’ and per-SNP posterior inclusion probabilities (PIPs). A 95% credible set indicates 95% certainty that at least one SNP included within the set has a causal association with the phenotype and higher PIPs indicate a greater posterior probability of being a causal variant within a credible set. Multiple credible sets are identified when the data suggest more than one independent causal signal.
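To make the relationship between single effects, PIPs, and credible sets concrete, the sketch below assumes the fitted model supplies, for each single effect, a vector of posterior probabilities over SNPs that sums to 1. It illustrates the idea only and is not the susieR code, which additionally filters credible sets on LD-based purity; the function names and example values are hypothetical.

```python
def posterior_inclusion_probabilities(alpha):
    """PIP per SNP: the probability that at least one single effect is at that SNP.

    alpha[l][j] is the posterior probability that single effect l is at SNP j.
    """
    n_snps = len(alpha[0])
    pips = []
    for j in range(n_snps):
        prob_excluded = 1.0
        for effect in alpha:
            prob_excluded *= 1.0 - effect[j]
        pips.append(1.0 - prob_excluded)
    return pips


def credible_set(probs, coverage=0.95):
    """Smallest set of SNP indices whose probabilities sum to >= coverage."""
    order = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
    chosen, total = [], 0.0
    for j in order:
        chosen.append(j)
        total += probs[j]
        if total >= coverage:
            break
    return chosen


# Two hypothetical single effects across four SNPs.
alpha = [
    [0.90, 0.06, 0.03, 0.01],
    [0.00, 0.00, 0.50, 0.50],
]
pips = posterior_inclusion_probabilities(alpha)
first_set = credible_set(alpha[0])
```

The same `credible_set` logic applies to the per-SNP shared-variant posteriors from colocalisation when deriving 95% credible SNPs.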

Colocalisation analysis was implemented with coloc (v5.1.0.1)11,12,27, which calculates posterior probabilities that a causal variant exists for neither, one, or both of two compared traits, testing also whether evidence for a causal variant in both traits suggests a shared variant (i.e., hypothesis 4 (H4); colocalisation) or independent signals (hypothesis 3 (H3)). Colocalisation analysis can be performed across all variants sampled in a region, under an assumption of at most one variant implicated per trait. It can also be performed using variants attributed to pairs of credible sets from SuSiE, relaxing the single variant assumption11. When evidence of a shared variant is found, the individual SNPs with the highest posterior probability of being that variant can be assessed. With a 95% confidence threshold, these are termed 95% credible SNPs.
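The five hypotheses can be illustrated with a minimal sketch of the single-causal-variant calculation: per-SNP log approximate Bayes factors are combined with prior probabilities to give posteriors for H0 to H4. This is a simplified re-expression in Python, not the coloc package code. The prior standard deviation of 0.2 on the log odds ratio and the example priors are assumed defaults, and the calculation requires at least two SNPs.

```python
import math


def log_sum(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)) for a > b."""
    return a + math.log1p(-math.exp(b - a))


def wakefield_labf(z, var_beta, prior_sd=0.2):
    """Log approximate Bayes factor for one SNP from its Z-score and the
    variance of its effect estimate (prior_sd is an assumed default)."""
    r = prior_sd ** 2 / (prior_sd ** 2 + var_beta)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=5e-6):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits measured
    at the same SNPs, assuming at most one causal variant per trait."""
    labf_both = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum(labf1), log_sum(labf2), log_sum(labf_both)
    log_h = [
        0.0,                                                    # H0: no signal
        math.log(p1) + s1,                                      # H1: trait 1 only
        math.log(p2) + s2,                                      # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),   # H3: distinct
        math.log(p12) + s12,                                    # H4: shared
    ]
    denom = log_sum(log_h)
    return [math.exp(x - denom) for x in log_h]


# Hypothetical Z-scores at three SNPs; both traits peak at the second SNP.
z1, z2 = [1.0, 6.0, 0.5], [0.8, 5.5, 1.0]
labf1 = [wakefield_labf(z, 1e-4) for z in z1]
labf2 = [wakefield_labf(z, 1e-4) for z in z2]
pp = coloc_posteriors(labf1, labf2)
```

In this toy example both traits have their strongest signal at the same SNP, so most of the posterior mass falls on H4; moving the peak of one trait to a different SNP shifts support towards H3.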

Analysis pipeline

We conducted colocalisation and fine-mapping analysis within an open-access pipeline developed for this study using R (v4.2.2)28: https://github.com/ThomasPSpargo/COLOC-reporter.

Briefly, in this workflow (see Figure 1), GWAS summary statistics are harmonised across analysed traits for a specified genomic region, including only variants in common between them and available within a reference population. An LD correlation matrix across sampled variants is derived from a reference population using PLINK (v1.90)29,30.
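The harmonisation step can be pictured as an intersection of variants followed by alignment of effect directions to a common reference allele. The sketch below is a simplified illustration with hypothetical field names (`a1` as effect allele, `a2`, `beta`) and does not reproduce the pipeline's R code; strand flips are not handled.

```python
def align_beta(record, ref):
    """Return the effect estimate relative to the reference effect allele,
    or None when the alleles cannot be matched to the reference."""
    if (record["a1"], record["a2"]) == (ref["a1"], ref["a2"]):
        return record["beta"]
    if (record["a1"], record["a2"]) == (ref["a2"], ref["a1"]):
        return -record["beta"]  # alleles swapped: reverse the effect direction
    return None


def harmonise(trait_a, trait_b, reference):
    """Keep SNPs present in both traits and the reference, with aligned betas.

    Each argument maps a SNP identifier to a record; the result maps each
    retained SNP to a (beta_a, beta_b) tuple.
    """
    harmonised = {}
    for snp in sorted(trait_a.keys() & trait_b.keys() & reference.keys()):
        beta_a = align_beta(trait_a[snp], reference[snp])
        beta_b = align_beta(trait_b[snp], reference[snp])
        if beta_a is not None and beta_b is not None:
            harmonised[snp] = (beta_a, beta_b)
    return harmonised


# Hypothetical inputs: only rs1 is present in all three datasets.
reference = {
    "rs1": {"a1": "A", "a2": "G"},
    "rs2": {"a1": "C", "a2": "T"},
    "rs3": {"a1": "G", "a2": "T"},
}
trait_a = {
    "rs1": {"a1": "A", "a2": "G", "beta": 0.10},
    "rs2": {"a1": "C", "a2": "T", "beta": -0.05},
}
trait_b = {
    "rs1": {"a1": "G", "a2": "A", "beta": 0.20},
    "rs3": {"a1": "G", "a2": "T", "beta": 0.30},
}
result = harmonise(trait_a, trait_b, reference)
```

Aligning both traits to the same reference allele also keeps Z-scores consistent with the sign convention of the LD correlation matrix, which matters for the fine-mapping diagnostics described next.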

Quality control is performed per-dataset prior to univariate fine-mapping analysis. Diagnostic tools provided with susieR are applied to test for consistency between the LD matrix and Z-scores from the GWAS and identify variants with a potential ‘allele flip’ (reversed effect estimate encoding) that can impact fine-mapping.

Fine-mapping is performed for each dataset with the coloc package runsusie function, which wraps around susie_rss from susieR and is configured to facilitate subsequent colocalisation analysis. Sample size (Neff for binary traits) is specified as the median for SNPs analysed. Colocalisation analysis can be performed with the coloc functions coloc.abf and coloc.susie when fine-mapping yields at least one credible set for both traits and otherwise using coloc.abf only. Genes located near credible sets from fine-mapping and credible SNPs from colocalisation analyses are identified via Ensembl and biomaRt (v2.54.0)31-33.

Analysis parameters can be adjusted by the user in accordance with their needs. Various utilities are included to help interpretation of fine-mapping and colocalisation results, including identification of genes nearby to putatively causal signals, HTML reports to summarise completed analyses, and figures to visualise the results and compare the examined traits.

Current implementation

In this study, LD correlation matrices were derived from the 1KG European cohort. SNPs flagged for potential allele flip issues in either of the compared traits were removed from the analysis. Fine-mapping was performed with the susie_rss refine=TRUE option to avoid local maxima during convergence of the algorithm, leaving the other settings to the runsusie defaults. Colocalisation analysis was performed using the default priors for coloc.susie (P1=1×10−4, P2=1×10−4, P12=5×10−6).

Colocalisation and fine-mapping analyses were performed initially using the genomic blocks defined by LAVA, since these aim to define relatively independent LD partitions across the genome8. If a 95% credible set could not be identified in one or both traits, we inspected local Manhattan plots for the region to determine whether potentially relevant signals occurred around the region boundaries. The analysis was repeated with a ±10kb window around the LAVA-defined genomic region if SNPs at the edge of the block had p <1×10−4 for both traits and the Manhattan plots were suggestive of a ‘peak’ not represented within the original boundaries.

3. Results

3.1. Genome-wide analyses

Descriptive information and heritability estimates for the sampled traits and GWAS are presented in Table 1. ALS had nominally significant global genetic correlations with schizophrenia (p = 0.045), PD (p = 0.013), and AD (p = 0.006); no other bivariate genome-wide correlations were statistically significant (see Figure S1).

Genome-wide association studies (GWAS) sampled

Each GWAS is a meta-analysis of disease risk across people of European ancestry. *Proxy cases from the UK Biobank cohort. †Estimated from cumulative risk after age 45 after correcting for competing risk of mortality and assuming a lifespan of ∼85 years. h2 = heritability.

A total of 605 local genetic correlation analyses were performed across all trait pairs in genomic regions where both traits passed the univariate heritability filtering step after restricting to SNPs sampled in both GWAS (see Table 2; Figure 2; Table S1). The number of loci passing to bivariate analysis varied greatly across trait pairs and was congruent with the genome-wide heritability estimates (and their uncertainty) for each trait, reflecting differences in phenotypic variance explained by measured genetic variants and statistical power for each GWAS (see Table 1).

Comparison of genome-wide SNP significance against local genetic correlation significance thresholds in all trait pairs and loci analysed

All loci analysed showed sufficient local univariate heritability across compared traits to allow bivariate correlation analysis. Subsequent fine-mapping and colocalisation analyses were performed in this study for regions with at least a false discovery rate (FDR) adjusted significance for the local genetic correlation. SNP = single nucleotide polymorphism.

Local genetic correlation analyses between trait pairs

The lower panel displays a heatmap of genetic correlations (rg) across genomic regions where any bivariate analyses were performed; white colouring indicates that the region was not analysed for a given trait pair owing to insufficient univariate heritability in one or both traits. The upper panel shows a Manhattan plot of p-values from each correlation analysis, denoting trait pairs by colour and comparisons passing defined significance thresholds by shape (square for a strict Bonferroni threshold and triangle for a false discovery rate (FDR) adjusted threshold); the hatched line indicates the threshold p-value above which Pfdr <0.05. The panels are both ordered by relative genomic position, with bars above and below indicating each chromosome. AD = Alzheimer’s disease, ALS = amyotrophic lateral sclerosis, FTD = frontotemporal dementia, PD = Parkinson’s disease, SZ = schizophrenia. Table S1 provides a complete summary of local genetic correlation analyses performed.

Twenty-six bivariate comparisons were significant following FDR adjustment (Pfdr <0.05), two of which also passed the stringent Bonferroni threshold (p <8.26×10−5; 0.05/605). While some regions included genome-wide significant SNPs (p <5×10−8) for one or both traits, others occurred in regions where GWAS associations were weaker (see Table 2). Five of these associations occurred at loci within the human leukocyte antigen (HLA) region (GRCh37: Chr6:28.48-33.45Mb; 6p22.1-21.340), and all five traits were implicated in at least one of these.

3.2. Targeted genetic analyses

Univariate fine-mapping and bivariate colocalisation analyses were subsequently performed to test for variants jointly implicated between trait pairs in regions with local genetic correlation Pfdr <0.05. The ALS and schizophrenia trait pair was additionally examined at Chr6:32.22-32.45Mb because significant genetic correlations were found between ALS and FTD and between schizophrenia and FTD at this locus. The correlation between ALS and schizophrenia at this locus had not been analysed owing to insufficient univariate heritability for ALS after restricting to SNPs in common with the schizophrenia GWAS.

Fine-mapping identified at least one 95% credible set for each of the compared traits for 7 of the 27 comparisons performed (see Table 3), and for one trait only in a further 5 (see Table S2; Table S3). This analysis suggested two credible sets for schizophrenia in the Chr12:56.99-58.75Mb locus, for AD in Chr6:32.45-32.54Mb, and (only when harmonised to SNPs in common with the ALS GWAS) for FTD in Chr6:32.22-32.45Mb (see Table S3).

Colocalisation analysis conducted across 95% credible sets identified during univariate fine-mapping of trait pairs

N SNPs refers to the number of SNPs present for both traits and the 1000 Genomes reference panel in the region within colocalisation and fine-mapping analysis. * Indicates comparisons with genetic correlation analysis p <8.26×10−5 (0.05/605). Δ Denotes locus extended by ±10kb for fine-mapping and colocalisation analysis. Variant identified in colocalisation as having the highest posterior probability of being the shared variant, assuming hypothesis 4 is true (see Figure 3). § Differences in fine-mapping solutions across trait pairs in the Chr6:32.22-32.45Mb locus reflect differences in the SNPs retained after restricting to those in common between the compared GWAS. H0 = no causal variant for either trait, H1 = variant causal for trait 1, H2 = variant causal for trait 2, H3 = distinct causal variants for each trait, H4 = a shared causal variant between traits. PIP = posterior inclusion probability. AD = Alzheimer’s disease, ALS = amyotrophic lateral sclerosis, FTD = frontotemporal dementia, PD = Parkinson’s disease, SZ = schizophrenia.

Colocalisation analyses performed across fine-mapping credible sets and across all SNPs in a region generally gave support to the equivalent hypothesis (Table 3; Table S2). Moreover, comparisons suggesting a signal was present in one trait only were largely concordant with the identification of fine-mapping credible sets in only that trait (Table S2). Figure S2 compares per-SNP p-values across trait pairs for comparisons with evidence of a relevant signal in both traits. Figure S3 shows patterns of LD across SNPs assigned to credible sets for these analyses.

Strong evidence was found for a shared variant between ALS and AD within the HLA region (posterior probability of shared variant = 0.90; see Figure 3). The 95% credible SNPs for this association were distributed around the MTCO3P1 pseudogene, and rs9275477, the lead genome-wide significant SNP from the ALS GWAS in this region, had the highest posterior probability of being implicated in both traits. Figure S4 presents sensitivity analysis showing that the result is robust to a range of values for the shared variant hypothesis prior probability.

Evidence for colocalisation between amyotrophic lateral sclerosis (ALS) and Alzheimer’s disease (AD) in the Chr6:32.63-32.68Mb region

Panel A: SNP-wise p-value distribution between ALS and AD across Chr6:32.63-32.68Mb, in which colocalisation analysis found 0.90 posterior probability of the shared variant hypothesis (see Table 3). Panel B: (upper) Per-SNP posterior probabilities for being a shared variant between ALS and AD, (lower) positions of HGNC gene symbols nearby to the 95% credible SNPs. Posterior probabilities for being a shared variant sum to 1 across all SNPs analysed and are predicated on the assumption that a shared variant exists; 95% credible SNPs are those spanned by the top 0.95 of posterior probabilities. The x-axis for Panel B is truncated by the base pair range of the credible SNPs and genomic positions are based on GRCh37.

The other comparisons that found fine-mapping credible sets in both traits suggested that overlaps from the correlation analysis were driven by distinct causal variants (see Table 3; Table S2).

Univariate fine-mapping of PD and schizophrenia at Chr17:43.46-44.87Mb found large credible sets spanning many genes, including MAPT41-44 and CRHR145,46 which have been previously implicated in the traits we have analysed. These expansive credible sets reflect the strong LD in the region and indicate a signal that is difficult to localise (see Figure S3(F); Table S3). The colocalisation analysis suggested independent variants for each trait despite many SNPs overlapping across their respective credible sets (see Figure S3). Sensitivity analysis showed robust support for the two independent variants hypothesis across shared-variant hypothesis priors (Figure S4).

4. Discussion

We examined genetic overlaps between the neuropsychiatric conditions Alzheimer’s disease, amyotrophic lateral sclerosis, frontotemporal dementia, Parkinson’s disease, and schizophrenia. Associated genomic regions between pairs of traits were identified with local genetic correlation analysis and further analysed with statistical fine-mapping and colocalisation techniques.

Significant correlations were most frequent across genomic blocks within the HLA region, implicating each of the studied traits in at least one comparison. Several associated regions contained genes with known relevance for the traits studied, such as KIF5A, MAPT, and CRHR1. Colocalisation analysis found strong evidence for a shared genetic variant between ALS and AD in the Chr6:32.63-32.68Mb locus within HLA, while the other colocalisation analyses suggested causal signals distinct across traits, for one trait only, or for neither trait.

The tendency for association between traits around the HLA region is reasonable, since this is a known hotspot for pleiotropy8,47. HLA is particularly known for its role in immune response and it is implicated in various types of disease48,49. Mounting evidence has linked HLA and associated genetic variation to the traits we have analysed, and mechanisms underlying these associations are beginning to be understood48-57. For instance, AD is associated with variants around the HLA-DQA1 and HLA-DRB1 genes and several SNPs in the non-coding region between them have been shown to modulate their expression58. Notably, one of the SNPs with a demonstrated regulatory role, rs9271247, had the highest probability of being causal for AD across the 95% credible set identified in the fine-mapping of the region.

Variants showing evidence for colocalisation between AD and ALS were distributed around the MTCO3P1 pseudogene in the HLA class II non-coding region between HLA-DQB1 and HLA-DQB2. MTCO3P1 has been previously identified as one of the most pleiotropic genes in the GWAS catalog59,60. Previous studies have suggested the relevance of this region in both traits. HLA-DQB1 and HLA-DQB2 are both upregulated in the spinal cord of people with ALS, alongside other genes implicated in various immunological processes for antigen processing and inflammatory response61. HLA class II complexes, and their subcomponents, have been identified as upregulated in multiple brain regions of people with AD, using both gene and protein expression techniques57,62.

Our analysis of this region gave stronger support for colocalisation between the ALS and AD GWAS than a previous study. The previous study defined a 100kb window around the lead genome-wide significant SNP from the ALS GWAS, rs9275477, and found a posterior probability of ∼0.50 for each of the shared-variant and two-independent-variants hypotheses1. The difference between these studies reflects differences in the processing of GWAS data; in this study all summary statistics underwent quality control to ensure only high-quality variants were retained.

More broadly, our analyses suggest that regions with a strong genetic correlation between the five traits studied often result from adjacent but trait-specific signals, likely reflecting overlaps between LD blocks47. Correlations also occurred in regions with weaker overall GWAS associations (see Table 2), where fine-mapping and colocalisation analyses did not suggest causal associations in one or both traits. Such patterns likely reflect a shared polygenic trend across the region, rather than associations attributable to discrete variants. Accordingly, other approaches may be better suited for identifying regions containing genetic variation jointly causal across diseases, including the traditional approach of testing regions around overlapping genome-wide significant variants.

This study has used gold-standard statistical tools to examine genetic relationships between traits. The local genetic correlation analysis approach enabled targeted investigation of genomic regions which appear to overlap between traits. The application of colocalisation analysis alongside a prior univariate fine-mapping step allowed for associations to be tested without conflating independent but nearby signals under the single-variant assumption of colocalisation analysis across all variants sampled in a region.

The study is not without limitation. We necessarily used the 1KG European reference population to estimate LD between SNPs. Fine-mapping is ideally performed with an LD matrix from the GWAS sample and is sensitive to misspecification when inconsistencies in LD occur between the reference and GWAS cohorts. Use of a reference population is not uncommon, and diagnostic tools available within the susieR package allow testing for inconsistencies between the reference and GWAS samples10. We accordingly implemented these tools centrally into our workflow and determined that the LD matrices from the 1KG reference were suitable for the data (estimates of Z-score and LD consistency are available in Table S3). Nevertheless, repeating this study in under-represented populations would be an important future step to validate our findings.

We employed statistical methods to identify and analyse genomic regions containing variants which might be jointly implicated across traits. These approaches provide useful associations between traits identified from large-scale genomic datasets. However, they alone are not sufficient for translation into clinical practice. Future studies should aim to extend any associations found by integrating functional and multi-omics datasets to gain mechanistic insights into observed trends and facilitate treatment discovery58,63.

The fine-mapping and colocalisation analysis pipeline we have used is available as an open-access resource on GitHub to facilitate the application of these methods in future studies: https://github.com/ThomasPSpargo/COLOC-reporter. Specified genomic regions can be readily analysed by providing GWAS summary statistics for binary or quantitative traits of interest and a population-appropriate reference dataset for estimation of LD. The pipeline returns resources including detailed reports that overview the analyses performed.

Data Availability

All data are publicly available.

Funding

This project was part funded by the MND Association and the Wellcome Trust. This is an EU Joint Programme-Neurodegenerative Disease Research (JPND) project. The project is supported through the following funding organizations under the aegis of JPND– http://www.neurodegenerationresearch.eu/ [United Kingdom, Medical Research Council (MR/L501529/1 and MR/R024804/1) and Economic and Social Research Council (ES/L008238/1)]. AAC is an NIHR Senior Investigator. AAC received salary support from the National Institute for Health Research (NIHR) Dementia Biomedical Research Unit at South London and Maudsley NHS Foundation Trust and King’s College London. The work leading up to this publication was funded by the European Community’s Health Seventh Framework Program (FP7/2007–2013; grant agreement number 259867) and Horizon 2020 Program (H2020-PHC-2014-two-stage; grant agreement number 633413). This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme (grant agreement no. 772376–EScORIAL). This study represents independent research part funded by the NIHR Maudsley Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King’s College London. The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR, King’s College London, or the Department of Health and Social Care. Funding was also provided by the King’s College London DRIVE-Health Centre for Doctoral Training and the Perron Institute for Neurological and Translational Science. AI is funded by South London and Maudsley NHS Foundation Trust, MND Scotland, Motor Neurone Disease Association, National Institute for Health and Care Research, Spastic Paraplegia Foundation, Rosetrees Trust, Darby Rimmer MND Foundation, the Medical Research Council (UKRI) and Alzheimer’s Research UK. OP is supported by a Sir Henry Wellcome Postdoctoral Fellowship [222811/Z/21/Z].
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

This research was funded in whole or in part by the Wellcome Trust [222811/Z/21/Z]. For the purpose of open access, the author has applied a CC-BY public copyright licence to any author accepted manuscript version arising from this submission.

Acknowledgements

The authors acknowledge the use of the CREATE research computing facility at King’s College London64. We also acknowledge Health Data Research UK, which is funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (United Kingdom), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation and Wellcome Trust.