Human genetic analyses of organelles highlight the nucleus in age-related trait heritability

  1. Rahul Gupta  Is a corresponding author
  2. Konrad J Karczewski
  3. Daniel Howrigan
  4. Benjamin M Neale  Is a corresponding author
  5. Vamsi K Mootha  Is a corresponding author
  1. Howard Hughes Medical Institute and Department of Molecular Biology, Massachusetts General Hospital, United States
  2. Broad Institute of MIT and Harvard, United States
  3. Analytic and Translational Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, United States
5 figures and 6 additional files

Figures

Selection of genetically diverse age-related diseases and traits using epidemiological data.

(A) Period prevalence of age-associated diseases systematically selected for this study (Materials and methods). Epidemiological data obtained from Kuan et al., 2019. (B) Genetic (lower half) and phenotypic (upper half) correlation between the selected age-related traits. All correlations were assessed between UK Biobank phenotypes with the exception of eGFR, Alzheimer’s Disease, and Parkinson’s Disease, for which the respective meta-analyses were used (Materials and methods). Grey ‘o’ in phenotypic correlations indicate phenotypes not tested within UKB for which individual-level data was not available. All data displayed in this panel are available in Figure 1—source data 1. * represents correlations that are significantly different from 0 at a Bonferroni-corrected threshold for p = 0.05 across all tested traits.

Figure 1—source data 1

Genetic and phenotypic correlation point estimates and standard errors.

https://cdn.elifesciences.org/articles/68610/elife-68610-fig1-data1-v3.xlsx
Figure 2 with 9 supplements
Assessment of the association of nucDNA and mtDNA loci contributing to the mitochondrial proteome with age-related traits.

(A) Scheme outlining the aspects of mitochondrial function assessed in this study. nucDNA loci contributing to the mitochondrial proteome are shown in teal, while mtDNA loci are shown in pink. (B) S-LDSC enrichment p-values on top of the baseline model in UKB. Inset labels represent gene-set size; dotted line represents BH FDR 0.1 threshold. (C) Visualization of mtDNA variants and associations with age-related diseases. The outer-most track represents the genetic architecture of the circular mtDNA. The heatmap track represents the log-scaled number of individuals with an alternate genotype at each site. The inner track represents mitochondrial genome-wide association p-values, with radial angle corresponding to position on the mtDNA and magnitude representing –log10 p-value. Dotted line represents Bonferroni cutoff for all tested trait-variant pairs. (D) Replication of S-LDSC enrichment results in meta-analyses. Dotted line represents BH FDR 0.1 threshold. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. The trait color legend to the right of panel (C) applies to panels (B) and (C), representing UKB traits. S-LDSC enrichment p-values plotted in (B) and (D) are available in Source data 1; mtDNA-GWAS summary statistics are available in Source data 2.

Figure 2—figure supplement 1
Enrichment tests in mitochondria-localizing genes in the GWAS Catalog.

(A) The proportion of annotated lead SNP-associated genes for manually curated phenotypes from the GWAS Catalog that correspond to age-related traits of interest (colors) that overlap the set of mitochondria-localizing genes. Black line represents the proportion of all genes in the genome that are mitochondria-localizing, with bars above the black line representing nominal enrichment and below representing nominal depletion. * represents traits showing significant depletion or enrichment via two-sided Fisher’s exact test at BH FDR q-value < 0.1. (B) Empirical null distribution of the test statistic of the number of traits with nominal enrichment computed via 1000 randomly sampled subsets of the set of all protein-coding genes. Dotted line represents the true observed value for mitochondria-localizing genes. (C) Rank plot of the proportion of trait-associated genes overlapping with mitochondria-localizing genes (the same values as represented on the x-axis of panel (A) across all GWAS catalog phenotypes with at least 30 trait-associated genes). Dark points indicate traits with a significant depletion or enrichment via two-sided Fisher’s exact test at BH FDR q-value < 0.1. Dotted line represents the proportion of all genes in the genome that are mitochondria-localizing, with traits above the black line representing nominal enrichment and below representing nominal depletion. No traits show significant enrichment.

Figure 2—figure supplement 2
Analysis of mitochondria-localizing gene enrichments using alternative methods and cohorts.

(A) Coefficient point estimates and corresponding 95% CI for mitochondria-localizing gene-sets in UKB with p-values shown in Figure 2B. (B) Coefficient point estimates and 95% CI for mitochondria-localizing gene-sets tested for replication in meta-analyses with p-values shown in Figure 2D. MAGMA enrichments for mitochondria-localizing genes with 5 kb up and 1.5 kb down gene window in (C) UKB and (D) meta-analyses. (E) MAGMA enrichments for mitochondria-localizing genes in UKB with a 100 kb symmetric gene window. Dotted line represents BH FDR 0.1 threshold. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. Inset numbers represent gene-set sizes. Enrichment p-values and effect sizes are available in Source data 1.

Figure 2—figure supplement 3
S-LDSC enrichments for tissues across age-related traits.

Enrichment P-values in (A) UK Biobank and (B) meta-analyses using S-LDSC on top of the baseline model. Gene-sets of the top 10% tissue-specific genes for each tested tissue were obtained using GTEx v7 as described previously (Finucane et al., 2018), with tissues for analysis selected manually based on the set of tested traits. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. Dashed line represents –log10 p-value cutoff for Benjamini-Hochberg FDR < 10%. Enrichment p-values are available in Source data 1.

Figure 2—figure supplement 4
S-LDSC enrichment coefficients from tissue analysis.

(A) Coefficient point estimates and corresponding 95% CI for tissue-expressed gene-sets in UKB with p-values shown in Figure 2—figure supplement 3. (B). Coefficient point estimates and 95% CI for tissue-expressed gene-sets tested in meta-analyses with p-values shown in Figure 2—figure supplement 3. Inset numbers represent gene-set sizes. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. All analyses were performed atop the baseline model. Enrichment coefficients and standard errors are available in Source data 1.

Figure 2—figure supplement 5
MAGMA enrichments for tissues in UKB.

Enrichment P-values for tissues in UKB with a (A) 5 kb up and 1.5 kb down window and (B) 100 kb symmetric window. These gene-sets are analogous to those shown tested with S-LDSC in Figure 2—figure supplement 3 and Figure 2—figure supplement 4. Inset numbers represent gene-set sizes. Dashed line represents –log10 P-value cutoff for Benjamini-Hochberg FDR < 10%. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. Enrichment p-values and effect sizes are available in Source data 1.

Figure 2—figure supplement 6
MAGMA enrichments for tissues in meta-analyses.

Enrichment p-values for tissues in meta-analyses with a (A) 5 kb up and 1.5 kb down window and (B) 100 kb symmetric window. These gene-sets are analogous to those shown tested with S-LDSC in Figure 2—figure supplement 3 and Figure 2—figure supplement 4 and with MAGMA in Figure 2—figure supplement 5. Dashed line represents –log10 p-value cutoff for Benjamini-Hochberg FDR < 10%. No p-values crossed the FDR threshold for the 100 kb window, and as such the BH p-value cutoff cannot be displayed. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. Enrichment p-values and effect sizes are available in Source data 1.

Figure 2—figure supplement 7
Enrichment power of S-LDSC and MAGMA across effect sizes and gene-set sizes.

Using biologically plausible trait-tissue associations originally computed at gene-set sizes of ~2500 genes (Atrial Fibrillation – Atrial appendage; Diverticular Disease – Sigmoid colon; Glucose, T2D – Pancreas; HDL, LDL, Cholesterol, TG – Liver, HDL – Visceral adipose; Myocardial Infarction – Coronary artery), power was computed as a probability of rejection at P ≤ 0.05 across 1000 randomly sampled subsets (of size 1520, 1105, 800, and 350 genes) of the original tissue expression gene-set. (A) Relationship between S-LDSC coefficient magnitude and power for 1105 genes. (B) Relationship between MAGMA effect size magnitude and power for 1105 genes. (C) Relationship between S-LDSC coefficient magnitude and power for 350 genes. (D) Relationship between MAGMA effect size magnitude and power for 350 genes. Shape represents window size, color represents disease, and small points represent trait-tissue pairs that were not detected as enriched at the full gene-set. (E) Power as a function of gene-set size for MAGMA and S-LDSC for a very high effect size trait-tissue pair: LDL levels and liver. (F) Power as a function of gene-set size for MAGMA and S-LDSC for a low effect size trait-tissue pair: Atrial fibrillation and atrial appendage. UKB vs. meta-analysis power comparisons for (G) 1523 genes, (H) 1105 genes, (I) 800 genes, and (J) 350 genes.

Figure 2—figure supplement 8
Assessment of cis-eQTLs among mitochondria-localizing genes.

(A) Percentage of genes localizing to each organelle that have an observed cis-eQTL in at least one GTEx tissue. (B) Coefficient effect size estimates for the regression of the number of distinct tissues with detected cis-eQTL for a given gene onto an indicator for organelle membership (Materials and methods). (C) Replication of (B) with alternate selection of tissues within each tissue class for robustness (Appendix 1). Error bars represent 95% CI.

Figure 2—figure supplement 9
Landscape of mtDNA variants and their associations with age-related disease in UKB.

(A) Within-mtDNA pair-wise correlation coefficients across all 213 tested variants with expected case count of alternative genotype individuals > 20. Axis labels represent mtDNA coordinates, color represents r2. (B) Quantile-quantile plot of p-values for all tested variant-trait pairs (21 traits x ~213 variants = ~4473 tests) using linear regression for all traits while controlling for baseline characteristics (first 20 PCs of nuclear genotype matrix, age, sex, age*sex, age2*sex) as well as array type. (C) Quantile-quantile plot of all tested variant-trait pairs for all traits using linear regression, controlling only for baseline characteristics. (D) Quantile-quantile plot of all tested variant-trait pairs using linear regression for continuous traits and logistic regression for binary traits, controlling only for baseline characteristics. Red line represents expected null p-values following the uniform distribution and shaded ribbon represents 95% CI. (E) Visualization of mtDNA variants and associations with serum creatinine and aspartate aminotransferase levels. The outer-most track represents the genetic architecture of the circular mtDNA. The heatmap track represents the number of individuals with alternate genotype on log scale. The inner track represents mitochondrial genome-wide association p-values, with radial angle corresponding to position on the mtDNA and magnitude representing –log10 p-value. Dotted line represents Bonferroni cutoff for all tested trait-variant pairs, using the same threshold as used in Figure 2C. mtDNA-GWAS summary statistics are available in Source data 2.

Figure 3 with 8 supplements
Heritability enrichment of organellar proteomes across age-related disease in UK Biobank.

(A) Quantile-quantile plot of heritability enrichment p-values atop the baseline model for gene-sets representing organellar proteomes, with black line representing expected null p-values following the uniform distribution and shaded ribbon representing 95% CI. (B) Scheme of spatially distinct disjoint subsets of the nuclear proteome as a strategy to characterize observed enrichment of the nuclear proteome. Numbers represent gene-set size. (C) S-LDSC enrichment p-values for spatial subsets of the nuclear proteome computed atop the baseline model. (D) S-LDSC enrichment p-values for TFs and all other nucleus-localizing proteins. Inset numbers represent gene-set sizes, black lines represent cutoff at BH FDR < 10%. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. Enrichment p-values and coefficients are available in Source data 1.

Figure 3—figure supplement 1
Proportions of heritability explained by annotations indicating organellar localization as obtained via S-LDSC.

All categories are identical to those analyzed in Figure 3A, however, are obtained without controlling for the contributions to heritability from other categories in the baseline model. Each dot represents one of the 21 UKB age-related traits tested, with bar size determined by mean proportion of heritability explained. Black lines indicate the proportion of tested SNPs contained in each annotation.

Figure 3—figure supplement 2
Organelle analysis results in UKB using MAGMA.

Quantile-quantile plot of one-sided MAGMA enrichment p-values across UKB age-related traits using the same organelle gene-sets as in Figure 3A, with a (A) 5 kb up, 1.5 kb down gene window and (B) 100 kb symmetric gene window. The central black line represents expected null p-values following the uniform distribution and the shaded ribbon represents 95% CI. (C) Enrichment of spatially distinct disjoint subsets of the nuclear proteome as tested with S-LDSC in Figure 3C, using MAGMA with a 5 kb up, 1.5 kb down gene window and (D) with a 100 kb symmetric gene window. (E) Enrichment of the transcription factors and the rest of the nuclear proteome as in Figure 3D using MAGMA with a 5 kb up, 1.5 kb down gene window and (F) with a 100 kb symmetric gene window. Inset numbers represent gene-set sizes, black lines represent cutoff at BH FDR < 10%. Enrichment p-values and effect sizes are available in Source data 1.

Figure 3—figure supplement 3
S-LDSC coefficients from sub-nuclear compartment analysis.

(A) S-LDSC coefficient point estimates and 95% CI for spatially determined sub-categories of the nucleus as in Figure 3C. (B) S-LDSC coefficient point estimates and 95% CI for partitions of the transcription factors and the rest of the nuclear proteome as in Figure 3D. Inset numbers represent gene-set size. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. Enrichment coefficients and standard errors are available in Source data 1.

Figure 3—figure supplement 4
Replication of transcription factor enrichment in meta-analyses.

(A) Coefficient p-values atop the baseline model using S-LDSC. (B) Coefficient point-estimates corresponding to panel (A). Error bars represent 95% CI. (C) Enrichment p-values in meta-analyses using MAGMA with a 5 kb up, 1.5 kb down window. The gene-sets used for this analysis are identical to those tested in UKB in Figure 3D. * represents traits for which sufficiently well-powered cohorts from both UKB and meta-analyses were available. Black lines represent cutoff at BH FDR < 10%. Inset numbers represent gene-set sizes. Enrichment p-values and effect sizes are available in Source data 1.

Figure 3—figure supplement 5
MAGMA enrichments of functional subsets of transcription factors in UKB.

(A) Distribution of Tau statistic across genes, as computed in GTEx V7 (Materials and methods). Black line indicates location of cutoff between tissue-specific genes (high Tau) and broadly expressed genes (low Tau). (B) GTEx v7 medians per tissue across tissues used to construct broadly expressed and tissue-specific transcription factor gene-sets. Rows were ordered by unbiased hierarchical clustering via complete linkage across all genes; columns were ordered using hierarchical clustering within broadly expressed and tissue-specific genes. Expression values are TPMs normalized across tissues. (C) Enrichment analysis of TFs subdivided by breadth of expression as defined in (A). (D) Enrichment analysis of TFs subdivided by the presence of a zinc finger domain. (E) Enrichment analysis of zinc finger TFs, subdivided by presence of KRAB domain. (F) Phylostrata comprising each tested gene age bin used in (G, H, I). Each gene was assigned a phylostratum based on its oldest ortholog by Litman and Stein, 2019. * are phylostrata grouped into Ancient non-KRAB TFs; ** are phylostrata grouped into Recent non-KRAB TFs. (G) Enrichment results dividing TFs into terciles based on evolutionary gene age. (H) TF DNA-binding domains represented in each gene age bin tested in panel (G). (I) Enrichment results subdividing TFs by age after removing KRAB domain zinc finger TFs. Inset numbers represent gene-set size, black lines for enrichment plots represent cutoff at BH FDR < 10%. All analyses were conducted with 5 kb up, 1.5 kb down windows. Enrichment p-values and effect sizes are available in Source data 1.

Figure 3—figure supplement 6
Replication of enrichment analysis in functional subsets of transcription factors using S-LDSC.

(A) Enrichment analysis of TFs, subdivided by breadth of tissue expression in GTEx v7 (Figure 3—figure supplement 5A, Figure 3—figure supplement 5B). (B) Enrichment analysis of TFs subdivided by the presence of a zinc finger domain. (C) Enrichment analysis of zinc finger TFs, subdivided by presence of KRAB domain. (D) Heritability enrichment results dividing TFs into terciles based on evolutionary gene age (see Figure 3—figure supplement 5F). (E) Enrichment analysis subdividing TFs into old and new after removing KRAB domain zinc finger TFs. Inset numbers represent gene-set size, black lines represent cutoff at BH FDR < 10%. Enrichment p-values are available in Source data 1.

Figure 3—figure supplement 7
S-LDSC enrichment coefficient point estimates in functional subsets of transcription factors corresponding to p-values in Figure 3—figure supplement 6.

(A) Point estimates for heritability enrichment when subdividing TFs based on breadth of tissue expression in GTEx v7 (Figure 3—figure supplement 5A, Figure 3—figure supplement 5B). (B) Point estimates for heritability enrichment when subdividing TFs based on the presence of a zinc finger domain. (C) Point estimates for heritability enrichment when subdividing TFs based on the presence of KRAB domain. (D) Point estimates for heritability enrichment when subdividing TFs into terciles based on evolutionary gene age (see Figure 3—figure supplement 5F). (E) Point estimates for heritability enrichment when subdividing TFs into old and new gene age bins after removing KRAB domain zinc finger TFs. Error bars represent 95% CI. Enrichment coefficients and standard errors are available in Source data 1.

Figure 3—figure supplement 8
Replication of enrichment analysis in functional subsets of transcription factors using MAGMA with a 100 kb symmetric gene window.

(A) Enrichment analysis of TFs, subdivided by breadth of tissue expression in GTEx v7 (Figure 3—figure supplement 5A, Figure 3—figure supplement 5B). (B) Enrichment analysis of TFs subdivided by the presence of a zinc finger domain. (C) Enrichment analysis of zinc finger TFs, subdivided by presence of KRAB domain. (D) Enrichment results dividing TFs into terciles based on evolutionary gene age (see Figure 3—figure supplement 5F). (E) Enrichment analysis subdividing TFs into old and new after removing KRAB domain zinc finger TFs. Inset numbers represent gene-set size, black lines represent cutoff at BH FDR < 10%. Enrichment p-values and effect sizes are available in Source data 1.

Figure 4 with 1 supplement
Enrichment of organellar proteomes within parental lifespan and healthspan as proxies for aging.

Upper panels represent organelle proteomes; lower panels represent spatial subsets of the nuclear proteome. Numbers atop each bar represent gene-set sizes. Dashed lines represent cutoff at BH FDR < 10%, dotted lines represent nominal p = 0.05. p-Values and coefficients available in Source data 3.

Figure 4—figure supplement 1
Enrichment of organellar proteomes within parental lifespan and healthspan as proxies for aging using S-LDSC.

Upper panels represent organelle proteomes; lower panels represent spatial subsets of the nuclear proteome. Numbers atop each bar represent gene-set sizes. Dashed lines represent cutoff at BH FDR < 10%, dotted lines represent nominal p = 0.05. Enrichment p-values and coefficients are available in Source data 3.

Figure 5 with 3 supplements
Differences in constraint distribution across organelles.

(A) Constraint as measured by LOEUF from gnomAD v2.1.1 for genes comprising organellar proteomes, book-ended by distributions for known haploinsufficient genes as well as olfactory receptors. Lower values indicate genes exacting a greater organismal fitness cost from a heterozygous LoF variant (greater constraint). (B) Proportion of each gene-set found in the lowest LOEUF decile. Higher values indicate gene-sets containing more highly constrained genes. (C) Constraint distributions for subsets of the nuclear-encoded mitochondrial proteome (red) and subsets of the nucleus (teal). Black points represent the mean with 95% CI. Inset numbers represent gene-set size.

Figure 5—figure supplement 1
Constraint distributions within subdivisions of the nuclear proteome.

(A) Constraint distributions for spatially defined subsets of the nuclear proteome. Gene-sets used are identical to those tested in Figure 3C. (B) Constraint distributions for transcription factor DNA-binding domains. Gene-sets used are identical to those tested in Figure 3—figure supplement 5. Black points represent the mean with 95% CI. Inset numbers represent gene-set size.

Figure 5—figure supplement 2
Enrichment results across age-related disease in UKB after correcting for constraint using MAGMA with a 5 kb up, 1.5 kb down window.

Quantile-quantile plots showing enrichment across age-associated disease in UKB for (A) mitochondria-localizing and (B) nucleus-localizing genes before and after correction for constraint. The central black line represents null p-values following the uniform distribution and the shaded ribbon represents 95% CI. (C) Enrichment p-values for genes with the lowest and highest LOEUF before and after inclusion of constraint as a covariate in the MAGMA model. (D) Enrichment p-values for mitochondria-localizing genes. (E) Enrichment p-values for spatial subsets of the nuclear proteome and (F) for subsets of the Chromosome and TF category. (G) Enrichment p-values for TFs subdivided by DNA-binding domain. Inset numbers represent gene-set size, black lines are cutoff at BH FDR < 10%.

Figure 5—figure supplement 3
Enrichment results across age-related disease in UKB after correcting for gene constraint using MAGMA with a 100 kb symmetric gene window.

(A) Quantile-quantile plot highlighting enrichment results for mitochondria-localizing genes across age-associated diseases in UKB before and after correcting for constraint. (B) Quantile-quantile plot highlighting enrichment results for genes comprising the nuclear proteome across age-associated diseases in UKB before and after correcting for constraint. The central black line represents expected null p-values following the uniform distribution and the shaded ribbon represents 95% CI.

Additional files

Source data 1

Effect size point estimates and p-values for all tested gene-sets and all tested age-related traits.

https://cdn.elifesciences.org/articles/68610/elife-68610-data1-v3.xlsx
Source data 2

mtDNA GWAS summary statistics.

https://cdn.elifesciences.org/articles/68610/elife-68610-data2-v3.xlsx
Source data 3

Effect size point estimates and p-values for all tested gene-sets for aging phenotypes.

https://cdn.elifesciences.org/articles/68610/elife-68610-data3-v3.xlsx
Supplementary file 1

Tested traits and sample sizes in UK Biobank and external meta-analyses.

https://cdn.elifesciences.org/articles/68610/elife-68610-supp1-v3.pdf
Supplementary file 2

(a) QC calls assigned to mtDNA variants through manual review of cluster plots. (b) Transcription factor assignment to broadly-expressed vs tissue-specific categories using two sub-region groupings.

https://cdn.elifesciences.org/articles/68610/elife-68610-supp2-v3.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/68610/elife-68610-transrepform-v3.pdf

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Rahul Gupta
  2. Konrad J Karczewski
  3. Daniel Howrigan
  4. Benjamin M Neale
  5. Vamsi K Mootha
(2021)
Human genetic analyses of organelles highlight the nucleus in age-related trait heritability
eLife 10:e68610.
https://doi.org/10.7554/eLife.68610