Research Article

GWAS and ExWAS of blood mitochondrial DNA copy number identifies 71 loci and highlights a potential causal role in dementia

Department of Biochemistry and Biomedical Sciences, McMaster University, Canada
Department of Pathology and Molecular Medicine, McMaster University, Canada
Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton Health Sciences, Canada
Thrombosis and Atherosclerosis Research Institute, Canada
Centre for Data Science and Digital Health, Hamilton Health Sciences, Canada
Department of Health Research Methods, Evidence, and Impact, McMaster University, Canada
Department of Medicine, McMaster University, Michael G. DeGroote School of Medicine, Canada
National University of Ireland, Galway, Ireland
Department of Medicine, University of Cape Town & Groote Schuur Hospital, South Africa
Hatter Institute for Cardiovascular Diseases Research in Africa (HICRA) & Cape Heart Institute (CHI), Department of Medicine, University of Cape Town, South Africa

Jan 13, 2022

Open access
Copyright information

Abstract
Editor's evaluation
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Background:

Mitochondrial DNA copy number (mtDNA-CN) is an accessible blood-based measurement believed to capture underlying mitochondrial (MT) function. The specific biological processes underpinning its regulation, and whether those processes are causative for disease, is an area of active investigation.

Methods:

We developed a novel method for array-based mtDNA-CN estimation suitable for biobank-scale studies, called ‘automatic mitochondrial copy (AutoMitoC).’ We applied AutoMitoC to 395,781 UKBiobank study participants and performed genome- and exome-wide association studies, identifying novel common and rare genetic determinants. Finally, we performed two-sample Mendelian randomization to assess whether genetically low mtDNA-CN influenced select MT phenotypes.

Results:

Overall, genetic analyses identified 71 loci for mtDNA-CN, which implicated several genes involved in rare mtDNA depletion disorders, deoxynucleoside triphosphate (dNTP) metabolism, and the MT central dogma. Rare variant analysis identified SAMHD1 mutation carriers as having higher mtDNA-CN (beta = 0.23 SDs; 95% CI, 0.18–0.29; p=2.6 × 10^-19), a potential therapeutic target for patients with mtDNA depletion disorders, but at increased risk of breast cancer (OR = 1.91; 95% CI, 1.52–2.40; p=2.7 × 10^-8). Finally, Mendelian randomization analyses suggest a causal effect of low mtDNA-CN on dementia risk (OR = 1.94 per 1 SD decrease in mtDNA-CN; 95% CI, 1.55–2.32; p=7.5 × 10^-4).

Conclusions:

Altogether, our genetic findings indicate that mtDNA-CN is a complex biomarker reflecting specific MT processes related to mtDNA regulation, and that these processes are causally related to human diseases.

Funding:

No funds supported this specific investigation. Awards and positions supporting authors include: Canadian Institutes of Health Research (CIHR) Frederick Banting and Charles Best Canada Graduate Scholarships Doctoral Award (MC, PM); CIHR Post-Doctoral Fellowship Award (RM); Wellcome Trust Grant number: 099313/B/12/A; Crasnow Travel Scholarship; Bongani Mayosi UCT-PHRI Scholarship 2019/2020 (TM); Wellcome Trust Health Research Board Irish Clinical Academic Training (ICAT) Programme Grant Number: 203930/B/16/Z (CJ); European Research Council COSIP Grant Number: 640580 (MO); E.J. Moran Campbell Internal Career Research Award (MP); CISCO Professorship in Integrated Health Systems and Canada Research Chair in Genetic and Molecular Epidemiology (GP)

Editor's evaluation

This is an original human GWAS study that treats mitochondrial copy number variation as a trait, and investigates its genetic basis, as well as its association with (and possible causal role in) various human diseases, such as cancer and dementia. The study identifies 71 significant loci, show that these are significantly over-represented in a priori candidates, and argue convincingly that this could help us understand how mitochondrial copy number is regulated at a cellular level.

https://doi.org/10.7554/eLife.70382.sa0

eLife digest

Our cells are powered by small internal compartments known as mitochondria, which host several copies of their own ‘mitochondrial’ genome. Defects in these semi-autonomous structures are associated with a range of severe, and sometimes fatal conditions: easily checking the health of mitochondria through cheap, quick and non-invasive methods can therefore help to improve human health.

Measuring the concentration of mitochondrial DNA molecules in our blood cells can help to estimate the number of mitochondrial genome copies per cell, which in turn act as a proxy for the health of the compartment. In fact, having lower or higher concentration of mitochondrial DNA molecules is associated with diseases such as cancer, stroke, or cardiac conditions. However, current approaches to assess this biomarker are time and resource-intensive; they also do not work well across people with different ancestries, who have slightly different versions of mitochondrial genomes.

In response, Chong et al. developed a new method for estimating mitochondrial DNA concentration in blood samples. Called AutoMitoC, the automated pipeline is fast, easy to use, and can be used across ethnicities. Applying this method to nearly 400,000 individuals highlighted 71 genetic regions for which slight sequence differences were associated with changes in mitochondrial DNA concentration. Further investigation revealed that these regions contained genes that help to build, maintain, and organize mitochondrial DNA. In addition, the analyses yield preliminary evidence showing that lower concentration of mitochondrial DNA may be linked to a higher risk of dementia.

Overall, the work by Chong et al. demonstrates that AutoMitoC can be used to investigate how mitochondria are linked to health and disease in populations across the world, potentially paving the way for new therapeutic approaches.

Introduction

Mitochondria are semiautonomous organelles present in nearly every human cell that execute fundamental cellular processes including oxidative phosphorylation, calcium storage, and apoptotic signaling. Mitochondrial (MT) dysfunction has been implicated as the underlying cause for many human disorders based on mechanistic in vitro and in vivo studies (Burbulla et al., 2017; Desdín-Micó et al., 2020; Sliter et al., 2018). Complementary evidence comes from recent epidemiological studies that measure mitochondrial DNA copy number (mtDNA-CN), an MT-derived marker that can be conveniently measured from peripheral blood. Since mitochondria contain their own unique set of genomes that are distinct from the nuclear genome, the ratio of mtDNA to nuclear DNA molecules (mtDNA-CN) in a sample serves as an accessible marker of MT DNA abundance per cell (Longchamps et al., 2020). Indeed, observational studies suggest that individuals with lower mtDNA-CN are at higher risk of age-related complex diseases, such as coronary artery disease, sudden cardiac death, cardiomegaly, stroke, portal hypertension, and chronic kidney disease (Tin et al., 2016; Ashar et al., 2017; Zhang et al., 2017; Hägg et al., 2021). Conversely, higher mtDNA-CN levels have been associated with increased cancer incidence (Kim et al., 2015; Hu et al., 2016).

While previous studies demonstrate that mtDNA-CN is associated with various diseases, evidence suggests that it may also play a direct and causative role in human health and disease. For example, in cases of mtDNA depletion syndrome, wherein rare defects in nuclear genes responsible for replicating and/or maintaining mtDNA lead to deficient mtDNA-CN (Gorman et al., 2016), patients manifest with severe dysfunction of energy-dependent tissues (heart, brain, liver, and cardiac and skeletal muscles). So far, 19 genes have been reported to cause mtDNA depletion (Oyston, 1998). In addition to these rare monogenic syndromes, the importance of common genetic variation in regulating mtDNA-CN is an active area of research with approximately 50 common loci identified so far (Cai et al., 2015; Guyatt et al., 2019; Longchamps, 2019; Hägg et al., 2021). In contrast to marked drops in mtDNA-CN by 60–80% as seen in those with rare mtDNA depletion syndromes, the relevance of subtler perturbations in mtDNA-CN in disease risk remains to be determined (Basel, 2020).

Granted, the connection between blood mtDNA-CN and aspects of MT biology remains unclear with many studies showing only moderate correlation between mtDNA-CN and markers of MT function or biogenesis (Frahm et al., 2005; Wachsmuth et al., 2016). Further complicating the interpretation of epidemiological associations between blood mtDNA-CN and disease risk is the fact that (i) blood mtDNA-CN is strongly confounded by blood cell composition, particularly, platelets that are devoid of nuclei and that (ii) blood mtDNA-CN does not correlate well with mtDNA-CN measured in other tissues in which MT dysfunction may be more relevant (Picard, 2021). Accordingly, understanding the genetic determinants of blood mtDNA-CN may provide a better understanding of the etiological processes reflected by this poorly understood MT biomarker in the blood.

To interrogate mtDNA-CN as a potential determinant of human diseases and to better understand its biological relevance to mitochondria, we performed extensive genetic investigations in up to 395,781 participants from the UKBiobank study (Sudlow et al., 2015). We first developed and validated a novel method for biobank-scale mtDNA-CN investigations that leverages single nucleotide polymorphism (SNP) array intensities, called ‘automatic mitochondrial copy (AutoMitoC).’ Leveraging AutoMitoC-based mtDNA-CN estimates, we performed large-scale genome-wide association study (GWAS) and exome-wide association study (ExWAS) to identify common and rare genetic variants contributing to population-level variation in mtDNA-CN. Various analyses were then conducted to build on previous publications regarding the specific genes and pathways underlying mtDNA-CN regulation (Cai et al., 2015; Guyatt et al., 2019; Longchamps, 2019; Hägg et al., 2021). Finally, we applied Mendelian randomization analyses to assess potential causal relationships between mtDNA-CN and disease susceptibility.

Results

AutoMitoC: a streamlined method for array-based mtDNA-CN estimation

We built on an existing framework for processing normalized SNP probe intensities (log2ratio [L2R] values) from genetic arrays into mtDNA-CN estimates known as the ‘MitoPipeline’ (Lane, 2014) (Materials and Methods, Figure 1). The MitoPipeline yields mtDNA-CN estimates that correlate with direct qPCR measurements and has been successfully implemented in several epidemiological investigations (Ashar et al., 2017; Zhang et al., 2017; Figure 1—figure supplements 1–2). We developed a novel method, ‘AutoMitoC,’ which incorporates three amendments to facilitate large-scale investigations of mtDNA-CN (Figure 1). First, AutoMitoC replaces autosomal signal normalization of common variants with globally rare variants which negates the need for linkage disequilibrium pruning (Materials and Methods, Figure 1—figure supplements 3–4). As a result, this simplifies derivation of mtDNA-CN estimates in ethnically diverse cohorts by allowing for use of a single, universal variant set for normalization. Second, to detect potentially cross-hybridizing probes, we empirically assess the association of corrected probe signal intensities with off-target genome intensities (Materials and Methods, Supplementary file 1 – Tab 1–2), rather than relying on sequence homology of probe sequences, which is not always available. Last, the primary estimate of MT signal is ascertained using principal component analysis ([PCA]; as opposed to using the median signal intensity of MT probes as per the MitoPipeline) which improves concordance of array-based mtDNA-CN estimates with those derived from alternative methods (Materials and Methods). A detailed description of the development of the AutoMitoC pipeline is provided in the Materials and Methods.

Figure 1 with 6 supplements see all

Download asset Open asset

Schematic summary of the automatic mitochondrial copy (AutoMitoC) pipeline.

The AutoMitoC pipeline is comprised of four major steps: (i) preprocessing, (ii) background correction, (iii) detection of probe cross-hybridization, and (iv) final derivation of mitochondrial DNA copy number (mtDNA-CN) estimates. First, preprocessing is simplified by restricting analysis of autosomal variants to those that have low minor allele frequency ( <0.01) and low genotype missingness ( <0.05). For probes passing quality control, MT and autosomal log2ratio (L2R) values undergo an initial correction for guanine cytosine (GC) waves using the method by Diskin et al., 2008. Samples exhibiting high genomic waviness post GC-correction (L2R SD >0.35) are removed. Second, background correction consists of performing principal component analysis of the autosomal probe L2R values and finding the top k principal components (PCs) that correspond to the ‘elbow’ of the scree plot. In our case, ~70% variance in autosomal L2R values was explained by the top k PCs in both UKbiobank and INTERSTROKE datasets. GC-corrected MT and L2R values are then further adjusted for the top autosomal PCs (representing technical background noise) by taking the residuals of the association between the L2R values versus the k autosomal PCs. Third, we derive a ‘clean’ set of autosomal and MT probes without signs of off-target probe cross-hybridization by empirically testing the GC-corrected and background-corrected L2R values for association with either the sample medians of off-target genome L2R values or self-reported gender (to capture off-target hybridization to sex chromosomes). Fourth, using the ‘clean’ probeset, we repeat the autosomal background correction, extract the top MT PC as a crude measure of mtDNA-CN, change the sign of the MT PC according to association of the MT PC with known predictors of mtDNA-CN that are commonly reported (sex or age), and last, standardize the MT PC values as the final AutoMitoC estimate.

To benchmark performance of AutoMitoC, array-based mtDNA-CN estimates were compared to complementary measures of mtDNA-CN in two independent studies. First, array-based mtDNA-CN estimates were derived in a subset of 34,436 UKBiobank participants with available whole exome sequencing (WES) data. Reference mtDNA-CN estimates were derived from the proportion of WES reads aligned to the MT genome relative to the autosome (Longchamps, 2019). AutoMitoC estimates were significantly correlated with WES estimates (r = 0.45; p<2.23 × 10^–308). Since WES data involves enrichment for nuclear coding genes and therefore could result in biased reference estimates for mtDNA-CN, we also performed an independent validation in an ethnically diverse study of 5791 participants where mtDNA-CN was measured using qPCR, the current gold standard assay (Fazzini et al., 2018). Indeed, we observed stronger correlation between AutoMitoC and qPCR-based estimates (r = 0.64; p<2.23 × 10^–308; Figure 1—figure supplement 5). Furthermore, AutoMitoC demonstrated robust performance (r > 0.53) across all ethnic strata in the secondary validation cohort including Europeans (N = 2431), Latin Americans (N = 1704), Africans (N = 542), South East Asians (N = 471), South Asians (N = 186), and others (N = 360; ; Figure 1—figure supplement 5). Bland Altman plots also illustrate the extent of agreement between methods (Figure 1—figure supplement 6). For every ethnicity, 95% limits of agreement intervals were smaller than expected by chance. Last, while all analyses hitherto followed the MitoPipeline condition of requiring >40,000 autosomal variants for normalization, we observed comparable performance using even 1000 random very rare probes (MAF <0.001; r = 0.60; p<5 × 10^–300) for signal normalization which reduced the runtime from several hours to less than 10 min for these 5791 samples running on 10 CPUs.

In the larger UKBiobank dataset of 395,781 with suitable array-based estimates, the distribution of mtDNA-CN was approximately normal (Figure 2A). To further verify that AutoMitoC-based estimates were indeed capturing mtDNA-CN, we performed association testing between mtDNA-CN values and known predictors including age, sex, ethnicity, and blood cell composition. Consistent with previous reports, every decade increase in age was associated with lower levels (beta = −0.08 SDs; 95% CI, –0.07 to –0.08; p<2.23 × 10^–308), and females had higher levels than males (beta = 0.12 SDs; 95% CI, 0.11–0.12; p=1.67 × 10^–299). MtDNA-CN levels also differed between ethnicities with South Asians (beta = −0.18; 95% CI, –0.16 to –0.21; p=2.93 × 10^–47) and Africans (beta = −0.18; 95% CI, –0.16 to –0.21; p=8.39 × 10^–48) having significantly lower levels than Europeans. Collectively, age, sex, and ethnicity accounted for 0.83% of the phenotypic variance in mtDNA-CN levels. All types of cell counts were significantly associated with mtDNA-CN including the standard set of covariates commonly adjusted for in mtDNA-CN investigations, namely, white blood cells (beta = −0.25 SDs in mtDNA-CN levels per 1 SD increase in cell counts; 95% CI, –0.25 to –0.24; p<2.23^–308) and platelets (beta = 0.07; 95% CI, 0.06–0.07; p<2.23 × 10^–308) (Figure 2B). White blood cell and platelet counts each accounted for 6.3% and 0.5% of the variance in mtDNA-CN levels, respectively (Figure 2B; Figure 2—source data 1). Notably, neutrophil count (beta = −0.31; 95% CI, –0.31 to –0.30; p<2.23^–308) was the strongest predictor of mtDNA-CN levels and accounted for more variance (9.9%) than white blood cell and platelet counts combined (8.2%). Collectively, total white blood cell, platelet, and neutrophil counts explained 12.3% variance in mtDNA-CN levels.

Figure 2

Download asset Open asset

Distribution of automatic mitochondrial copy (AutoMitoC) estimates and the influence of blood cell counts.

(A) Histogram illustrating AutoMitoC mitochondrial DNA copy number (mtDNA-CN) estimates in 395,781 UKBiobank participants expressed per SD change in mtDNA-CN. Associations between blood cell counts with mtDNA-CN levels as conveyed by forest plots illustrating effect estimates (left) and a bar plot showing the proportion of variance in mtDNA-CN explained (right). Models were adjusted for age, age², sex array type, 20 genetic principal components, and ethnicity. Both blood cell counts and mtDNA-CN levels were standardized (mean = 0; SD = 1).

Figure 2—source data 1 Table showing the association of blood cell counts and AutoMitoC mtDNA-CN estimates in the UKBiobank. Effect estimates (betas, 95% CI, p-values) are expressed in terms of change in mtDNA-CN per 1 SD increase in blood counts. The variance explained (r²) by blood cell counts is also provided.: https://cdn.elifesciences.org/articles/70382/elife-70382-fig2-data1-v2.xlsx
Download elife-70382-fig2-data1-v2.xlsx

GWAS identifies 70 common loci for mtDNA-CN

A GWAS was performed testing the association of 11,453,766 common genetic variants (MAF >0.005) with mtDNA-CN in 383,476 UKBiobank participants of European ancestry (Figure 3). In total, 9602 variants were associated with mtDNA-CN at genome-wide significance (Figure 3A; Figure 3—figure supplement 1), encompassing 82 independent signals in 72 loci (Supplementary file 2 – Tab 1; Figure 3—source data 1). The genomic inflation factor was 1.16 and the LD score intercept was 1.036, indicating that most inflation in test statistics was attributable to polygenicity. Sensitivity analyses revealed that nuclear mitochondrial DNA (NUMT) interference may have played a role in two independent signals (two loci), which were subsequently discarded, leading to a total of 80 independent signals in 70 loci. These 80 independent genetic signals explained 1.48% variance in mtDNA-CN levels.

Figure 3 with 3 supplements see all

Download asset Open asset

Analyses of common genetic loci associated with mitochondrial DNA copy number (mtDNA-CN).

(A) Manhattan plot illustrating common genetic variant associations with mtDNA-CN. (B) Size distribution of 95% credible sets defined for 80 independent genetic signals. (C) GENE-MANIA-mania protein network interaction exploration showing all DEPICT and gene-mania prioritized genes (top) with functions color coded and a zoom-in highlighting interactors of the key mitochondrial (MT) regulator gene, *PPRC1* (bottom). (D) ‘MitoPathway’ counts corresponding to 27 prioritized MitoCarta3 genes encoding proteins with known MT localization.

Figure 3—source data 1 Locus zoom plots for 72 loci and 82 conditionally independent genetic signals. Variants with the highest fine-mapping posterior probability are labeled by their genomic coordinates (GRCh37). Pairwise correlation between the lead variant and proximal variants were color-coded based on the 1000 Genomes Europeans reference panel.: https://cdn.elifesciences.org/articles/70382/elife-70382-fig3-data1-v2.zip
Download elife-70382-fig3-data1-v2.zip

Fine-mapping via the FINEMAP algorithm (Benner et al., 2016) yielded 95% credible sets containing 2363 genome-wide significant variants. Of the 80 independent genetic associations, 17 (22%) mapped to a single candidate causal variant; 32 (39%) mapped to 5 or fewer variants, and 42 (51%) mapped to 10 or fewer variants (Figure 3B; Supplementary file 2 – Tab 2). Credible sets for 11 genetic signals overlapped with genes responsible for rare mtDNA depletion disorders including DGUOK (3), MGME1 (2), TFAM (2), TWNK (2), POLG2 (1), and TYMP (1) (Supplementary file 2 – Tab 1 & 2). Several associations mapped to coding variants with high posterior probability. DGUOK associations mapped to a synonymous variant (rs62641680; posterior probability = 1) and a nonsynonymous variant (rs74874677; p=1). TFAM associations mapped to a 5’ untranslated region (UTR) variant (rs12247015; p=1) falling within an ENCODE candidate cis-regulatory element with a promotor-like signature and an intronic variant (rs4397793; p=1) with a proximal enhancer-like signature. Last, POLG2 associations mapped to a nonsynonymous variant (rs17850455; p=1). Beyond the six aforementioned mtDNA depletion genes identified at genome-wide significance, suggestive associations were found for POLG (rs2307441; p=1.0 × 10^–7), OPA1 (rs9872432; p=5.2 × 10^–7), SLC25A10 (rs62077224; p=1.2 × 10^–7), and RRM2B (rs3907099; p=4.7 × 10^–6). Given these observations, we hypothesized that mtDNA depletion genes may be generally enriched for common variant associations. Indeed, 10 (53%) of 19 known mtDNA depletion genes (Oyston, 1998) harbored at least suggestive mtDNA-CN associations (p<5 × 10^–6).

We also compared our findings to genome-wide significant results from two recent GWAS of mtDNA-CN by Hägg et al., 2021 and Longchamps et al., 2022 to better contextualize our findings. Of the 66 independent signals identified by Hägg et al., 2021, we found varying levels of evidence to support 43 (65%) genetic associations; 41 (62%) variants were detected at Bonferroni significance; 2 (3%) variants were detected using a suggestive significance threshold (p<5 × 10^–6) (Supplementary file 2 – Tab 3). For the remaining 23 (35%) variants not detected at suggestive significance, we hypothesized that differences in covariate adjustment strategies. Specifically, Hagg et al. did not adjust for platelets which is an important confounder of blood mtDNA-CN levels which might explain the majority of these discrepancies. As a sensitivity analysis, we repeated association testing without adjusting for blood cell traits and were able to recover 19/23 (83%) of the remaining variants at suggestive significance threshold, which suggests that approximately 1 in 3 associations detected by Hägg et al., 2021 may not be robust to adjustment for platelets (Supplementary file 2 – Tab 3). While we could not perform the reverse lookup of our top GWAS hits within the Hägg et al., 2021 GWAS due to a lack of full genome-wide summary statistics, 48 (60%) of our 80 genetic associations are likely novel relative to Hägg et al., 2021 as these variants were not reported as genome-wide significant nor in linkage disequilibrium (r² >0.10) with their top hits (Supplementary file 2 – Tab 1). Among the 133 independent signals identified by Longchamps et al., 2022, 119 achieved Bonferroni significance, of which 116 were available in our GWAS (Supplementary file 2 – Tab 4). The remaining three variants (or their proxies; r² >0.7) did not meet the MAF threshold for inclusion (MAF >0.005). Of the 116 detectable variants, we found varying levels of evidence to support 112 (97%) genetic associations; 70 (60%) variants were detected at Bonferroni significance; 15 (13%) variants were detected using a suggestive significance threshold; 27 (24%) variants were detected at nominal significance with concordant directionality (Supplementary file 2 – Tab 4). While we could not perform the reverse lookup of our top GWAS hits within the Longchamps et al., 2022 GWAS, 11 (14%) of our 80 genetic associations are likely novel as these variants were not reported as genome-wide significant nor in linkage disequilibrium with their top hits (Supplementary file 2 – Tab 1). Notably, these 11 associations were also not reported in Hägg et al., 2021.

Additionally, transethnic meta-analysis inclusive of non-Europeans (N = 395,781) was performed but given the small increase in sample size, GWAS findings remained highly similar (Figure 3—figure supplement 2). However, European effect estimates were significantly and highly correlated with those derived from South Asian (r = 0.97; p=2.2 × 10^–15) and African (r = 0.88; p=9.1 × 10^–5) GWAS analyses (Figure 3—figure supplement 3).

mtDNA-CN loci influence MT gene expression and heteroplasmy

We postulated that differential expression of genes encoded by the MT genome may trigger changes in copy number, and thus a subset of identified loci may affect mtDNA-CN through MT genome transcription. Ali et al., 2019 recently conducted a GWAS to identify nuclear genetic variants associated with variation in mtDNA-encoded gene expression (i.e. mitochondrial expression quantitative trait loci [mt-eQTLs]) (Ali et al., 2019). Nonsynonymous variants in LONP1 (rs11085147) and TBRG4 (rs2304693), as well as an intronic variant in MRPS35 (rs1127787), were associated with changes in MT gene expression across various tissues (Supplementary file 2 – Tab 5). Nominally associated mtDNA-CN loci were also observed to influence MT gene expression including intronic variants in both PNPT1 (rs62165226; mtDNA-CN p=5.5 × 10^–5) and LRPPRC (rs10205130; mtDNA-CN p=1.1 × 10^–4). Although differences in MT gene expression may be a consequence rather than a cause of variable mtDNA-CN, the analysis performed by Ali et al., 2019 was corrected for factors associated with global changes in the MT transcriptome (Ali et al., 2019). Moreover, the direction of effect estimates between mtDNA-CN and mt-eQTLs varied depending on gene and tissue context. Altogether, such findings imply that some mtDNA-CN loci may regulate mtDNA-CN by influencing MT gene expression.

Heteroplasmy refers to the coexistence of multiple mtDNA alleles within an individual for a particular variant, which is a function of the multicopy nature of the MT genome. A recent GWAS by Nandakumar et al., 2021 for mean heteroplasmy levels in saliva specimen provided initial evidence supporting a shared genetic basis for heteroplasmy and copy number. To further explore the overlap in genetic determinants of these traits, we searched for the previously reported heteroplasmy loci within our mtDNA-CN GWAS. Of 19 matching variants between the GWAS, 4 heteroplasmy loci were also associated with mtDNA-CN at genome-wide significance including variants nearby or within TINCR/LONP1 (rs12461806; mtDNA-CN GWAS p=7.5 × 10^–88), TWNK/MPRL43 (rs58678340; p=1.3 × 10^–39), TFAM (rs1049432; p=1.5 × 10^–21), and PRKAB1 (rs11064881; p=2.6 × 10^–10) genes. Consistent with the initial finding from Nandakumar et al., 2021 that the heteroplasmy-increasing TFAM allele was also associated with higher mtDNA-CN, we also observed concordant directionality for the other three variants. No additional mtDNA heteroplasmy loci were identified to influence mtDNA-CN when using a more liberal suggestive significance threshold. Full genome-wide summary statistics for the heteroplasmy GWAS were not publicly available so we could not perform the reverse lookup to examine whether the 80 mtDNA-CN variants affected heteroplasmy.

Genes and pathways implicated in the regulation of mtDNA-CN

DEPICT analysis led to the prioritization of 91 out of 18,922 genes (false discovery rate (FDR) p<0.05; Supplementary file 2 – Tab 6). Among them, 87 of the genes intersected with the GeneMANIA database and were uploaded to the GeneMANIA platform to identify additional functionally related genes (Warde-Farley et al., 2010). GeneMANIA analysis discovered an additional 20 related genes (Supplementary file 2 – Tab 7). Among the 107 total genes prioritized by DEPICT or GeneMANIA (Figure 3C), MT functions were significantly enriched in gene ontology (GO) terms including mitochondrion organization (coverage: 12/225 genes; FDR p=7.4 × 10^–5), MT nucleoid (6/34; FDR p=2.2 × 10^–4), MT genome maintenance (4/10; FDR p=6.8 × 10^–4), and MT matrix (11/257; FDR p=6.8 × 10^–4). Visual inspection of the links between key genes involved in these functions highlights PPRC1, a member of the Peroxisome proliferator-activated receptor-gamma coactivator-1alpha (PGC-1A) family of MT biogenesis activators (Scarpulla, 2011), as a potential coordinator of mtDNA-related processes (Figure 3C).

MitoCarta3 is a comprehensive and curated inventory of 1136 human proteins (1120 nuclear) known to localize to the mitochondria based on experiments of isolated mitochondria from 14 nonblood tissues (Rath et al., 2021). We leveraged this recently updated database, that was absent from GeneMANIA, to conduct a complementary set of targeted analyses focused on MT annotations (Supplementary file 2 – Tab 7). First, we hypothesized that prioritized genes would be generally enriched for genes encoding the MT proteome. Overall, 27 (25%) of 107 genes had evidence of MT localization corresponding to a 4.2-fold enrichment (null expectation = 5.9%; p=1.0 × 10^–10). Next, given that PPRC1, an activator of MT biogenesis, was prioritized by DEPICT analyses and then linked to central mtDNA regulators in GeneMANIA, we postulated that prioritized genes may be enriched for downstream targets of PGC-1A. PGC-1A induction resulted in a higher mean fold change among prioritized genes (beta = 1.48; 95% CI, 0.60–2.37) as compared to any MT gene (beta = 1.19; 95% CI, –0.76–3.13; t-test p=0.04). Finally, we categorized the 27 MitoCarta3 genes into their respective pathways. Most (16; 57%) genes were members of the ‘Mitochondrial central dogma’ pathway, which represents a nearly threefold enrichment as compared to the frequency of this pathway in the whole MitoCarta3 database (null expectation = 20.7%; p=1.3 × 10^–5). Other implicated (albeit not significantly enriched) pathways included ‘Metabolism,’ ‘Mitochondrial dynamics and surveillance,’ ‘Oxidative phosphorylation,’ and ‘Protein import, sorting and homeostasis’ (Figure 3D). Four proteins were annotated as part of multiple pathways including TYMP/SCO2, GTPBP3, MIEF1, and OXA1L (Supplementary file 2 – Tab 7).

Exome-wide association testing uncovers rare coding SAMHD1 mutations as a determinant of mtDNA-CN levels and breast cancer risk

We performed an ExWAS in 147,740 UKBiobank participants with WES data to assess the contribution of rare coding variants. Among 18,890 genes tested, SAMHD1 was the only gene reaching exome-wide significance (Figure 4A; Figure 4—source data 1). The carrier prevalence of rare SAMHD1 mutations was 0.75%, and on average, mutation carriers had higher mtDNA-CN than noncarriers (beta = 0.23 SDs; 95% CI, 0.18–0.29; p=2.6 × 10^–19; Figure 4—figure supplement 1). Also, while none of the 19 known mtDNA depletion genes reached Bonferroni significance, a suggestive association was found for TFAM (beta = −0.33; 95% CI, –0.47 to –0.19; p=4.2 × 10^–6), and this association was independent of the common TFAM variants (rs12247015; rs4397793) previously identified in the GWAS (beta = −0.33; 95% CI, –0.47 to –0.19; p=8 × 10^–6). Rare variants in SAMHD1 and TFAM accounted for 0.06% of the variance in mtDNA-CN levels. Collectively, rare and common loci accounted for 1.55%.

Figure 4 with 1 supplement see all

Download asset Open asset

Rare variant gene burden association testing with mitochondrial DNA copy number and disease risk.

(A) QQ plot illustrating expected vs observed -log10 p-values for exome-wide burden of rare (MAF <0.001) and nonsynymous mutations. (B) Manhattan plot showing phenome-wide significant associations between *SAMHD1* carrier status and cancer-related phenotypes.

Figure 4—source data 1 Summary statistics for exome-wide association testing of rare protein-altering variants with mtDNA-CN.: https://cdn.elifesciences.org/articles/70382/elife-70382-fig4-data1-v2.xlsx
Download elife-70382-fig4-data1-v2.xlsx
Figure 4—source data 2 Summary statistics for phenome-wide association testing of SAMHD1 carrier status.: https://cdn.elifesciences.org/articles/70382/elife-70382-fig4-data2-v2.xlsx
Download elife-70382-fig4-data2-v2.xlsx

To evaluate whether rare SAMHD1 mutations also influenced disease risk, we conducted phenome-wide association testing of 771 diseases within the UKBiobank. At phenome-wide significance, SAMHD1 mutation carrier status was associated with approximately twofold increased risk of breast cancer (OR = 1.91; 95% CI, 1.52–2.40; p=2.7 × 10^–8), as well as greater risk of ‘cancer (suspected or other)’ (OR = 1.52; 95% CI, 1.28–1.80; p=1.1 × 10^–6; Figure 4B; Figure 4—source data 2). Exclusion of breast cancer cases was attenuated but did not nullify the association with ‘cancer (suspected or other)’ (OR = 1.36; 95% CI, 1.10–1.67; p=0.004) suggesting that SAMHD1 mutations may also increase risk of other cancers, as has been shown for colon cancer (Rentoft, 2019). To understand whether differences in mtDNA-CN levels between SAMHD1 mutation carriers was a consequence of cancer diagnosis, we repeated association testing with mtDNA-CN excluding cancer patients. In this analysis, the association with mtDNA-CN levels was not attenuated (beta = 0.26; 95% CI, 0.19–0.32; p=7.8 × 10^–15) suggesting that the effect of rare SAMHD1 variants on mtDNA-CN levels is not driven by its relationship with cancer status. A summary of MT genes and pathways implicated by common and rare loci is provided in Figure 5.

Figure 5

Download asset Open asset

Graphical summary of mitochondrial genes and pathways implicated by genetic analyses.

Color coding indicates through which set(s) of analyses genes were identified. The image was generated using BioRender (https://biorender.com/).

Mendelian randomization analysis implicates low mtDNA-CN as a causal mediator of dementia

Given that common variant loci overlapped with several mtDNA depletion genes, we postulated that polygenically low mtDNA-CN might cause a milder syndrome with phenotypically similar manifestations. To assess whether mtDNA-CN may represent a putative mediator of mtDNA depletion-related phenotypes, we conducted Mendelian randomization analyses between genetically determined mtDNA-CN and MT disease phenotypes using summary statistics derived from the FinnGen v4 GWAS dataset (Supplementary file 2 – Tab 8).

After accounting for multiple testing of 10 phenotypes, an association between mtDNA-CN and all-cause dementia was found (OR = 1.94 per 1 SD decrease in mtDNA-CN; 95% CI, 1.55–2.32; p=7.5 × 10^–4; Figure 6; Figure 6—source data 1). Sensitivity analyses indicated no evidence of global (MR-PRESSO p=0.51; Q-statistic p=0.51) or directional (Egger Intercept p=0.47) pleiotropy. The 27 selected variants accounted for 0.70% of the variance in mtDNA-CN and 0.13% of the risk for dementia, consistent with a causal effect of mtDNA-CN on dementia risk and not vice versa (Steiger p=1.9 × 10^–62). Findings were robust across several different MR methods including the weighted median (OR = 2.47; 95% CI, 1.93–3.00; p=0.001), MR-EGGER (OR = 2.41; 95% CI, 1.71–3.11; p=0.02), and GSMR-HEIDI (OR = 1.95; 95% CI, 1.34–2.84; p=0.001) methods. Results also remained statistically significant when using a broader set of genetic instruments including all genome-wide significant loci irrespective of whether genetic variants were located proximally to MitoCarta3 genes (GSMR-HEIDI OR = 1.31; 95% CI, 1.02–1.68; p=0.04). Furthermore, we replicated the dementia association using a second UKBiobank-independent GWAS dataset derived from the International Genomics of Alzheimer’s Disease Consortium (2013) including 17,008 Alzheimer’s disease patients (OR = 1.41; 95% CI, 1.0001–1.98; p=0.04993) (Lambert et al., 2013).

Figure 6

Download asset Open asset

Association of genetically lower mitochondrial DNA copy number with mitochondrial (MT) disease phenotypes.

Coefficient plots for Mendelian randomization analyses of MT disease traits. In the absence of heterogeneity (Egger intercept p>0.05; MR-PRESSO global heterogeneity p>0.05), the inverse variance weighted result was reported. In the presence of balanced pleiotropy (MR-PRESSO global heterogeneity p<0.05), the weighted median result was reported. No set of analyses had evidence for directional pleiotropy (Egger intercept p<0.05).

Figure 6—source data 1 Mendelian randomization analyses of mtDNA-CN versus mitochondrial disease phenotypes.: https://cdn.elifesciences.org/articles/70382/elife-70382-fig6-data1-v2.xlsx
Download elife-70382-fig6-data1-v2.xlsx

Discussion

We developed a novel method to estimate mtDNA-CN from genetic array data, ‘AutoMitoC,’ and applied it to the UKBiobank study. Extensive genetic investigations led to several key insights regarding mtDNA-CN. First, several novel common and rare genetic determinants of mtDNA-CN were identified, totaling 71 loci. Second, these loci were enriched for MT processes related to dNTP metabolism and the replication, packaging, and maintenance of mtDNA. Third, we observed a strong role for common variation within known mtDNA depletion genes in regulating mtDNA-CN in the general population. Fourth, we found that rare variants in SAMHD1 not only affect mtDNA-CN levels but also confer risk to cancer. Finally, we provided the first Mendelian randomization evidence implicating low mtDNA-CN as a causative risk factor for dementia.

While several investigations for mtDNA-CN have been published, the present study represents the most comprehensive genetic assessment published to date (Cai et al., 2015; Longchamps, 2019; Hägg et al., 2021). Notably, Hägg et al., 2021 recently conducted the largest GWAS for mtDNA-CN in 295,150 UKBiobank participants and identified 50 common loci (Hägg et al., 2021). However, the method developed by Hägg et al., 2021 calibrated SNP probe intensities based on association with WES read depths, which may limit the convenience of the method. In contrast, AutoMitoC only necessitates array probe intensities and does not require any secondary genetic measurements (WES or otherwise) for calibration. In addition, AutoMitoC exhibits superior concordance with WES-based estimates (Hagg r = 0.33; AutoMitoC r = 0.45), which were validated in an independent dataset with gold standard qPCR measurements (AutoMitoC r = 0.64). Further, Hägg et al., 2021 restricted genetic analyses to unrelated European individuals, whereas we incorporated ~100,000 additional individuals and demonstrated consistency in genetic effects between Europeans and non-Europeans (r > 0.88). The greater sample size in combination with more accurate mtDNA-CN estimates may explain the additional 48 (59%) independent common genetic associations identified at genome-wide significance. Furthermore, we took additional precaution to ensure that cataloged NUMTs were not influencing our genetic association results and also to empirically evaluate the robustness of each genetic association against cataloged and uncataloged NUMTs. Finally, in the present study we included complementary explorations of the role of rare variants through ExWAS and phenome-wide association study (PheWAS), as well as, Mendelian randomization analyses to assess disease contexts whereby mtDNA-CN may represent a causal mediator and a potential therapeutic target.

MtDNA-CN has proven to be a biomarker of cardiovascular disease in several large epidemiological studies, with studies often assuming that such relationships are attributable to pathological processes including MT dysfunction, oxidative stress, and inflammation (Tin et al., 2016; Wu et al., 2017; Fazzini et al., 2019; Koller et al., 2020). Consistent with previous GWAS findings, our genetic analyses confirm that mtDNA-CN indeed reflects specific MT functions, but perhaps not the ones commonly attributable to mtDNA-CN (Cai et al., 2015; Guyatt et al., 2019; Hägg et al., 2021). Primarily, differences in mtDNA-CN reflect MT processes related to dNTP metabolism and the replication, maintenance, and organization of mtDNA. Secondarily, genes involved in MT biogenesis, metabolism, oxidative phosphorylation, and protein homeostasis were also identified but do not represent the main constituents. The observed enrichment in common variant associations within mtDNA depletion genes further reinforces the notion that differences in mtDNA-CN first and foremost reflect perturbations in mtDNA-related processes.

No therapy for mtDNA depletion disorders currently exists with treatment mainly consisting of supportive care. Intriguingly, we found that rare variants within SAMHD1 were associated with increased levels of mtDNA-CN. SAMHD1 is a multifaceted enzyme with various functions including tumor suppression through DNA repair activity and maintenance of steady-state intracellular dNTP levels, which has been involved in HIV-1 replication (Baldauf et al., 2012; Kretschmer et al., 2015). Rare homozygous and compound heterozygous loss-of-function mutations in SAMHD1 result in an immune encephalopathy known as Aicardi Goutieres’ syndrome (White et al., 2017). Imbalanced intracellular dNTP pools and chronic DNA damage cause persistent elevations in interferon alpha thus mimicking a prolonged response to HIV-1 infection. While Aicardi Goutieres’ syndrome is a severe recessive genetic disorder, case reports of SAMHD1-related disease often describe heterozygous parents and siblings as being unaffected or with milder disease (familial chilblain lupus 2) (Haskell et al., 2018). In the UKBiobank, the vast majority (99.4%) of individuals possessing SAMHD1 mutations were heterozygote carriers, who had a twofold increased risk of breast cancer. Indeed, our finding that SAMHD1 mutations associate with both elevated mtDNA-CN levels and risk of breast cancer belies the prevailing notion that higher mtDNA-CN is always a protective signature of proper MT function and healthy cells. Such findings may have important clinical implications for genetic screening. First, heterozygous SAMHD1 mutations may be an overlooked risk factor for breast cancer considering that ~1 in 130 UKBiobank participants possessed a genetic mutation conferring twofold elevated risk. Notably, while SAMHD1 mutations have been described previously to be associated with various cancers (Kohnken et al., 2015; Rentoft, 2019), this gene is not routinely screened nor part of targeted gene panels outside the context of neurological disorders (https://www.genedx.com/test-catalog/available-tests/comprehensive-common-cancer-panel/). Second, unaffected parents and siblings of Aicardi Goutieres patients might also present with greater risk of cancer. Third, while SAMHD1 is a highly pleiotropic protein, therapeutic strategies to dampen (but not abolish) SAMHD1 activity might be considered to treat mtDNA depletion disorders caused by defects in nucleotide metabolism. Indeed, Franzolin et al., 2015 demonstrated that siRNA knockdown of SAMHD1 in human fibroblasts with DGUOK mtDNA depletion mutations partially recovered mtDNA-CN (Franzolin et al., 2015).

To our knowledge, we provide the first Mendelian randomization evidence that mtDNA-CN may be causally related to risk of dementia. Although dysfunctional mitochondria have long been implicated in the pathogenesis of Alzheimer’s disease, only recently has mtDNA-CN been tested as a biomarker. Silzer et al., 2019 conducted a matched case-control study of 46 participants and showed that individuals with cognitive impairment had significantly lower blood-based mtDNA-CN (Silzer et al., 2019). Andrews and Goate, 2020 studied the relationship between postmortem brain tissue mtDNA-CN and measures of cognitive impairment in 1025 samples (Andrews and Goate, 2020). Consistent with our findings, a 1 SD decrease in brain mtDNA-CN was associated with lower mini mental state exam (beta = –4.02; 95% CI, –5.49 to –2.55; p=1.07 × 10^–7) and higher clinical dementia rating (beta = 0.71; 95% CI, 0.51–0.91). Both studies implicate blood- and brain-based mtDNA-CN as a marker of dementia but were retrospective. In contrast, Yang et al., 2021 observed a significant association between mtDNA-CN and incident risk of neurodegenerative disease (Parkinson’s and Alzheimer’s disease)(Yang et al., 2021). Altogether, our results combined with previous findings suggest that mtDNA-CN represents both a marker and mediator of dementia. Considering that our overall findings suggest that mtDNA-CN reflects numerous MT subprocesses, future studies will be required to disentangle which ones, as reflected by diminished mtDNA-CN, truly mediate dementia pathogenesis.

Several limitations should be noted. First, mtDNA-CN approximated by array-based methods remains imperfectly accurate as compared to qPCR or whole genome sequencing measurements, though we found strong correlation between AutoMitoC and qPCR-based estimates in this study (r = 0.64; p<2.23 × 10^–308). AutoMitoC was benchmarked using multiple comparative mtDNA-CN estimation modalities, in independent cohorts, and in various ethnic groups, but further work can be done to assess the reproducibility and error of mtDNA-CN estimates, calibration to absolute counts, and the robustness to different genotyping platforms. Although whole genome sequencing will eventually supplant array-based mtDNA-CN GWAS, we hypothesize that the improvements made in the areas of speed, portability to ethnically diverse studies, and ease-of-implementation, should greatly increase accessibility of mtDNA-CN research as a plethora of genotyping array data is presently available to reanalyze. Second, we did not detect strong evidence of confounding from population structure (or substructure) in our genetic analyses, and we undertook several precautions to mitigate such biases including (i) stratified GWAS analyses by ethnicity, (ii) adjustment for a large number of intraethnic principal components (PCs), and (iii) restriction of downstream analyses to use of European GWAS summary statistics to harmonize with external databases; however, it is still plausible that population substructure may have influenced our findings. Third, Mendelian randomization analyses were underpowered to conduct a broad survey of diseases in which MT dysfunction may play a causal role, and even among the 10 phenotypes tested, statistical power varied by disease prevalence as evidenced by relatively large CI for the dementia association. Accordingly, future studies including larger case sample sizes are necessary to provide more precise effect estimates. Equally as important, we were unable to differentiate whether specific MT subpathways mediated risk of disease. As additional loci are uncovered, such analyses may be feasible. Fourth, variants and genes implicated in the regulation of mtDNA-CN may be specific to blood samples though findings suggest that many mtDNA-CN loci act through genes that are widely expressed in mitochondria across multiple tissues. Future studies are required to determine whether associations are ubiquitous across mitochondria-containing cells and to investigate the role of mtDNA-CN in other tissues. Lastly, whole blood mtDNA-CN reflects a heterogenous mixture of nucleated and unnucleated cells, and despite adjustment for major known confounding cell types, interindividual differences in cell subpopulations not captured by a standard blood cell count may represent an important source of confounding.

Conclusion

Although commonly viewed as a simple surrogate marker for the number of mitochondria present within a sample, genetic analyses suggest that mtDNA-CN is a highly complex biomarker under substantial nuclear genetic regulation. mtDNA-CN reflects a mixture of MT processes mostly pertaining to mtDNA regulation. Accordingly, the true relationship between mtDNA-CN measured in blood samples with human disease remains to be completely defined though we find evidence for mtDNA-CN as a putative causal risk factor for dementia. Future studies are necessary to decipher if mtDNA-CN is directly involved in the pathogenesis of dementia and other diseases or whether other specific MT processes are truly causative.

Share this article

Cite this article

Schematic summary of the automatic mitochondrial copy (AutoMitoC) pipeline.

Distribution of automatic mitochondrial copy (AutoMitoC) estimates and the influence of blood cell counts.

Figure 2—source data 1

Analyses of common genetic loci associated with mitochondrial DNA copy number (mtDNA-CN).

Figure 3—source data 1

Rare variant gene burden association testing with mitochondrial DNA copy number and disease risk.

Figure 4—source data 1

Figure 4—source data 2

Graphical summary of mitochondrial genes and pathways implicated by genetic analyses.

Association of genetically lower mitochondrial DNA copy number with mitochondrial (MT) disease phenotypes.

Figure 6—source data 1

Author details

Michael Chong

Contribution

Competing interests

Pedrum Mohammadi-Shemirani

Contribution

Competing interests

Nicolas Perrot

Contribution

Competing interests

Walter Nelson

Contribution

Competing interests

Robert Morton

Contribution

Competing interests

Sukrit Narula

Contribution

Competing interests

Ricky Lali

Contribution

Competing interests

Irfan Khan

Contribution

Competing interests

Mohammad Khan

Contribution

Competing interests

Conor Judge

Contribution

Competing interests

Tafadzwa Machipisa

Contribution

Competing interests

Nathan Cawte

Contribution

Competing interests

Martin O'Donnell

Contribution

Competing interests

Marie Pigeyre

Contribution

Competing interests

Loubna Akhabir

Contribution

Competing interests

Guillaume Paré

Contribution

For correspondence

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism