Figures and data

An overview of the gene-based approach for detecting genetic overlap between complex traits.
Starting from a pair of GWASs (GWAS_1 and GWAS_2) representing two complex traits, this approach first transforms the GWAS data into gene-phenotype association profiles (vectors U1, U2) by assigning a p value to each gene using Sherlock-II, which integrates GWAS and eQTL data. The gene-phenotype p values are then translated into unsigned z scores (vectors V1, V2), and a gene-based genetic overlap score Sg(V1, V2) is calculated as a permutation-normalized distance between V1 and V2. To assess the statistical significance of Sg, an ensemble of random GWASs with the same power as GWAS_2 is generated (GWAS_2R), and the score Sg(V1, V2R) between GWAS_1 and each GWAS_2R is calculated in parallel (see Methods). The distribution of Sg(V1, V2R) is then used to normalize the score Sg(V1, V2), leading to a final genetic overlap z score ZS(1,2), from which a p value can be calculated.
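The scoring steps in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation described in Methods: the Sherlock-II step is not reproduced (gene-level p values are taken as given), the distance is assumed to be Euclidean, the permutation is assumed to shuffle gene labels of V2, and the random profiles V2R are supplied by the caller rather than generated from power-matched random GWASs. The function names unsigned_z, overlap_score, and overlap_z are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()


def unsigned_z(p_values):
    """Convert two-sided gene-level p values (U) to unsigned z scores (V)."""
    # Clamp p away from 0 so the inverse CDF stays finite.
    return [-STD_NORMAL.inv_cdf(max(p, 1e-300) / 2.0) for p in p_values]


def overlap_score(v1, v2, n_perm=200, rng=None):
    """Permutation-normalized distance between two profiles (assumed form of Sg).

    Assumption: Euclidean distance, with the null obtained by shuffling
    the gene labels of v2. Negative values mean "closer than by chance".
    """
    rng = rng or random.Random(0)
    observed = math.dist(v1, v2)
    shuffled = list(v2)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(math.dist(v1, shuffled))
    return (observed - mean(null)) / stdev(null)


def overlap_z(v1, v2, random_profiles, n_perm=200, seed=0):
    """Normalize Sg(V1, V2) by the distribution of Sg(V1, V2R).

    random_profiles stands in for the ensemble of V2R vectors; how they
    are generated (random GWASs matched in power) is outside this sketch.
    """
    rng = random.Random(seed)
    s_obs = overlap_score(v1, v2, n_perm, rng)
    s_null = [overlap_score(v1, v2r, n_perm, rng) for v2r in random_profiles]
    zs = (s_obs - mean(s_null)) / stdev(s_null)
    # Lower-tail p value, assuming ZS is approximately standard normal.
    return zs, STD_NORMAL.cdf(zs)
```

With this sign convention a strongly negative ZS indicates that the two profiles are more similar than expected by chance, which matches the direction of the ZS < -4 cutoff used in the captions that follow.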

Genetic overlap between pairs of traits detected by the gene-based approach.
Shown are top hits to 12 selected traits (query GWAS, first column). Each row lists the top 6 hit GWASs that have ZS < -4 with the query GWAS (p value < 3.17e-5), ranked by their p values (uncorrected for multiple testing) as indicated by the color scale. Pairs detected in a previous analysis using LDSC with the same p value cutoff (p = 3.17e-5, uncorrected for multiple testing) are colored in cyan, and the examples of unexpected relationships discussed in the text are colored in red. Different numbers associated with the same trait name correspond to different GWAS data sets (PubMed ID) for the trait: Alzheimer 1 (PMID 17998437), Alzheimer 2 (PMID 21460841), Alzheimer 3 (PMID 24162737), Alzheimer 4 (PMID 25188341), Breast Cancer 1 (PMID 17903305), Breast Cancer 2 (PMID 29059683), Chronic Kidney Disease 1 (PMID 20383146), Diabetes 1 (PMID 18372903), Diabetes 2 (PMID 22885922), Neuroticism 2 (PMID 27089181), Osteoporosis 1 (PMID 22805710), Schizophrenia 1 (PMID 25056061). See Dataset S3 for the full mapping from trait names to GWAS data.
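The correspondence between the ZS cutoff and the p value cutoff quoted above can be checked directly. The sketch below assumes that ZS is referred to a standard normal distribution and that the p value is the lower tail; the helper name lower_tail_p is hypothetical.

```python
from statistics import NormalDist


def lower_tail_p(zs):
    """Lower-tail p value of a z score under a standard normal."""
    return NormalDist().cdf(zs)


cutoff = lower_tail_p(-4.0)
print(f"{cutoff:.3g}")  # prints 3.17e-05
```

Under these assumptions, ZS < -4 and p < 3.17e-5 describe the same threshold.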

Partial Pearson Correlation Analysis (PPCA) revealed GO terms and KEGG pathways supporting the genetic overlap between traits.
Shown are GO terms with PPCA z score > 5 and KEGG pathways with PPCA z score > 4 (up to the top 5 of each) for three trait pairs: Crohn’s Disease vs. Rheumatoid Arthritis, AD vs. Breast Cancer, and Longevity vs. Fasting Glucose. The length of each bar and the number inside it indicate the PPCA z score.

Hierarchical clustering of phenotypes based on their pairwise genetic overlap score.
Highlighted are branches discussed in the text: 1) GWASs of the same phenotype (green); 2) phenotypes that are known to share genetic etiology (blue); and 3) phenotypes with no obvious relationship but whose co-occurrence has been observed in previous epidemiological studies (red). The colored subtrees were picked by cutting the tree at a distance of 0.2 (average ZS < -4).
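The tree-cutting step can be illustrated with a small pure-Python sketch. It assumes average linkage (the caption's reference to an average ZS suggests this, but the linkage is not stated here), and it takes a precomputed pairwise distance matrix as input, since the mapping from ZS to distance is not given in this caption. The function name, the trait labels, and the distance values are made up for illustration only.

```python
def average_linkage_clusters(names, dist, cut):
    """Agglomerative clustering with average linkage, stopped at `cut`.

    Average linkage merges at non-decreasing heights, so stopping at the
    first merge above `cut` is equivalent to cutting the full tree there.
    """
    clusters = [[i] for i in range(len(names))]

    def avg(a, b):
        # Mean pairwise distance between members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, x, y = min(
            (avg(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if d > cut:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [[names[i] for i in c] for c in clusters]


# Toy, made-up distances between four placeholder traits.
names = ["trait_A", "trait_B", "trait_C", "trait_D"]
dist = [
    [0.00, 0.10, 0.80, 0.80],
    [0.10, 0.00, 0.80, 0.80],
    [0.80, 0.80, 0.00, 0.15],
    [0.80, 0.80, 0.15, 0.00],
]
subtrees = average_linkage_clusters(names, dist, cut=0.2)
```

In this toy input, the two close pairs end up as separate subtrees because the distance between them exceeds the 0.2 cut.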