Research Article

Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer

Wellcome Trust Sanger Institute, United Kingdom
Cambridge University Hospitals NHS Foundation Trust, United Kingdom
University of Cambridge, United Kingdom
University of New South Wales, Australia
National Cancer Centre, Singapore
Duke-NUS Graduate Medical School, Singapore
Dana-Farber Cancer Institute, United States
Institute of Cancer Research, Sutton, United Kingdom
University of Oxford, United Kingdom
MD Anderson Cancer Center, United States
University of Nottingham, United Kingdom
National Institutes of Health, United States
University of North Carolina, United States
University of Toronto, Canada
University of Liverpool, United Kingdom
HCA Pathology Laboratories, United Kingdom
University of East Anglia, United Kingdom
University of Tampere and Tampere University Hospital, Finland
Johns Hopkins University, United States
Royal National Orthopaedic Hospital, United Kingdom
University College London, United Kingdom
The University of Texas, MD Anderson Cancer Center, Houston, United States
Newcastle University, United Kingdom

Oct 1, 2014

Open access

Abstract

Recent sequencing studies have extensively explored the somatic alterations present in the nuclear genomes of cancers. Although mitochondria control energy metabolism and apoptosis, the origins and impact of cancer-associated mutations in mtDNA are unclear. In this study, we analyzed somatic alterations in mtDNA from 1675 tumors. We identified 1907 somatic substitutions, which exhibited dramatic replicative strand bias, predominantly C > T and A > G on the mitochondrial heavy strand. This strand-asymmetric signature differs from those found in nuclear cancer genomes but matches the inferred germline process shaping primate mtDNA sequence content. The number of mtDNA mutations showed considerable heterogeneity across tumor types. Missense mutations were selectively neutral and often gradually drifted towards homoplasmy over time. In contrast, mutations resulting in protein truncation underwent negative selection and were almost exclusively heteroplasmic. Our findings indicate that the endogenous mutational mechanism has a far greater impact than any external mutagen in mitochondria and is fundamentally linked to mtDNA replication.

https://doi.org/10.7554/eLife.02935.001

eLife digest

The DNA in a cell's nucleus must be copied faithfully, and divided equally, when a cell divides to produce two new cells. Mistakes—or mutations—are sometimes made during the copying process, and mutations can also be introduced by exposing DNA to damaging agents known as mutagens, such as UV light or cigarette smoke. These mutations are then maintained in all of the descendants of the cell. Most of these mutations have no impact on the cell's characteristics (‘passenger mutations’). However, ‘driver mutations’ that allow cells to divide uncontrollably and spread to other body sites can lead to cancer.

Mitochondria are cellular compartments that are responsible for generating the energy a cell needs to survive and are also responsible for initiating programmed cell death. Mitochondria contain their own DNA—entirely separate from that in the nucleus of the cell—that encodes the proteins most essential for energy production. Mitochondrial DNA molecules are frequently exposed to damaging molecules called reactive oxygen species that are produced by the mitochondria. Therefore, these reactive oxygen species have been thought to be one of the most important causes of mitochondrial DNA mutations. In addition, because cancer cells produce energy differently to normal cells, mutations in the mitochondrial DNA that change the ability of the mitochondria to produce energy have been conventionally thought to help normal cells to become cancerous. However, conclusive evidence for a link between cancer and mitochondrial DNA mutations is lacking.

Ju et al. examined the mitochondrial DNA sequences taken from 1675 cancer biopsies from over thirty different types of cancer and compared these to normal tissue from the same patients. This revealed 1907 mutations in the mitochondrial DNA taken from the cancer cells. The pattern of the mutations suggests that the majority of the mutations are not introduced from reactive oxygen species, but from the errors the mitochondria themselves make in the process of duplicating their DNA when a cell divides. Unexpectedly, known mutagens, such as cigarette smoke or UV light, had a negligible effect on mitochondrial DNA mutations.

Contrary to conventional wisdom, Ju et al. found no evidence that the mitochondrial DNA mutations help cancer to develop or spread. Instead, like passenger mutations found in the DNA in the cell nucleus, most mitochondrial genome mutations have no discernible effect. However, Ju et al. revealed that DNA mutations that damage normal mitochondrial activity are less likely to be maintained in cancer cells. Presumably, mitochondria containing these mutations produce less energy, and so a cell containing too many of these mutations will find it harder to survive. This shows that having enough correctly functioning mitochondria is essential for even cancer cells to thrive.

https://doi.org/10.7554/eLife.02935.002

Introduction

All cancers result from somatic mutations in their genomes. Beyond the ∼3200 Mb of nuclear genomic DNA, every human cell contains hundreds to thousands of mitochondria, each carrying one or a few copies of the 16,569 bp circular mitochondrial genome (Smeitink et al., 2001; Legros et al., 2004; Koppenol et al., 2011). In addition to their role in cellular energy balance through oxidative phosphorylation, mitochondria are involved in many essential cellular functions including modulation of oxidation–reduction status, contribution to cytosolic biosynthetic precursors, and initiation of apoptosis. Mitochondria in eukaryotic cells evolved by endosymbiosis from a free-living α-proteobacterium (Gray et al., 1999). Over 2 billion years of co-evolution, many ancestral mitochondrial genes have transferred to the nucleus (Falkenberg et al., 2007; Calvo and Mootha, 2010; Wallace, 2012). What remains in the mitochondrial genome is distinctive for the striking asymmetry between the two complementary mtDNA strands in terms of nucleotide content and gene distribution (Andrews et al., 1999). The heavy (H) strand is guanine-rich (C/G = 0.4) and is the template from which most mitochondrial proteins (12 out of 13) are transcribed, whereas only one protein-coding gene, MT-ND6, is transcribed from the correspondingly cytosine-rich light (L) strand.

Mutations in the mitochondrial genome cause inherited disease (Chinnery, 1993), with a maternal inheritance pattern because only eggs contribute mitochondria to the zygote. The penetrance of inherited mitochondrial disease is determined stochastically by both the random assortment of mutated vs wild-type mitochondrial genomes during meiosis and random drift during the early cell divisions after fertilization. In cancer, the role of somatically acquired mtDNA mutations is controversial. Although cancer-specific mutations have been previously reported (Polyak et al., 1998; Brandon et al., 2006; Chatterjee et al., 2006; He et al., 2010; Larman et al., 2012), the limited sample size or poor sensitivity of capillary sequencing for heteroplasmic mutations has not allowed a comprehensive analysis of the mutational signatures of mitochondrial mutations nor their likely functional significance. It has long been proposed that mitochondria might contribute to cancer development given their fundamental importance to cellular biology (Wallace, 2012). Previous reports suggested that mitochondrial somatic mutations might be under positive selection and thus contribute to cancer development, but the small number of reported mutations renders this conclusion uncertain (Brandon et al., 2006; Chatterjee et al., 2006; Larman et al., 2012; Schon et al., 2012). Nonetheless, the hypothesis of functionally relevant mitochondrial mutations is an appealing one because cancer cells have greatly increased energy demands over normal cells and demonstrate a switch from oxidative phosphorylation in mitochondria to lactic acid fermentation in the cytosol (the Warburg effect) (Hanahan and Weinberg, 2011; Koppenol et al., 2011).

In each cell cycle, the replicating genome is at risk of de novo mutations, which can promote the development of cancer. These mutations may be generated by intrinsic cellular errors during DNA replication or repair or through exposure to mutagens, such as reactive oxygen species, tobacco smoke, and ultraviolet light (Pleasance et al., 2010a, 2010b). Recently, >20 mutational signatures operative in cancers have been identified in the nuclear genome (Alexandrov et al., 2013). Whether any of these mutational processes also affect the mitochondrial genome has not been studied. Furthermore, whether there are mtDNA-specific mutational processes in somatic cells remains unclear, although the many unique features of mtDNA replication and repair, coupled with the high concentration of reactive oxygen species generated by the electron transport chain, could be associated with distinctive mutation signatures.

In this study, we compare 1675 cancer and paired normal mtDNA sequences across 31 tumor types using massively parallel DNA sequencing technologies to obtain a systematic and unbiased catalog of somatic mitochondrial mutations. We find that mtDNA mutations are almost exclusively the product of a mutational process that is specific to mitochondria and probably linked to the unique mechanism of genome replication these organelles employ. We find no evidence for positive selection of mitochondrial mutations during oncogenesis, suggesting that they confer no clonal advantage on the nascent cancer cells.

Results

mtDNA sequencing and mutation calling

We extracted the mtDNA sequences from 704 whole-genome and 971 whole-exome sequencing data generated on primary cancers and compared them with mtDNA sequences from their matched normal samples. Given the abundance of mtDNA per cancer cell, a standard coverage of 30–40× in the nuclear genome provides significantly greater coverage of the mitochondrial genome (average read depth = 7901.0×), enabling accurate identification of somatic mutations including rare heteroplasmic variants. We also assessed whether whole-exome sequencing could be used to identify mtDNA mutations from off-target reads derived from the mitochondrial genome. We found an average read depth of 92.1× across the mitochondrial genome in exome studies. From 139 samples in which we had both exome and whole-genome sequencing data, the overall read depths correlated strongly (R² = 0.59, Figure 1—figure supplement 1) as did variant allele fractions for mtDNA somatic mutations (R² = 0.97, Figure 1—figure supplement 2). Validation experiments suggested the sensitivity of whole-exome sequencing for detection of mtDNA somatic mutations to be 71.4% compared to whole-genome sequencing (Figure 1—figure supplement 3 and ‘Materials and Methods’, ‘Off-target mtDNA reads in whole-exome sequencing’ and ‘DNA cross-contamination’).

To reduce potential false-positive calls of mtDNA somatic mutations, we only report variants called with an allele fraction of >3%. This minimizes the risk of miscalls due to mtDNA-derived pseudogenes in the nucleus (NuMTs) because mtDNA copy numbers are 100–1000 times higher than nuclear genomes in human somatic cells, and the sequence homology between mtDNA and NuMTs present in the human reference genome is generally <95% (in 96 out of 101 NuMTs with length greater than 300 bp). In addition, pairwise comparison between cancer and matched normal mtDNAs from the same individual further reduces NuMT contamination in the mutation calling.
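The allele-fraction filter and the copy-number argument above can be sketched in a few lines of Python. This is only an illustration: the record layout, the function names, the example read counts, and the simplifying assumption that a NuMT sits at nuclear copy number with all of its reads mis-mapped are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass

VAF_THRESHOLD = 0.03  # the >3% reporting threshold stated in the text


@dataclass
class Call:
    """Hypothetical record for one candidate variant in a tumor sample."""
    position: int      # mtDNA coordinate (1-based)
    alt_reads: int     # reads supporting the variant allele
    total_reads: int   # all reads covering the position


def vaf(call: Call) -> float:
    """Variant allele fraction: variant-supporting reads / total reads."""
    if call.total_reads == 0:
        return 0.0
    return call.alt_reads / call.total_reads


def passes_threshold(call: Call, threshold: float = VAF_THRESHOLD) -> bool:
    """Keep only calls whose VAF is strictly greater than the threshold."""
    return vaf(call) > threshold


def numt_read_fraction(mt_to_nuclear_ratio: float) -> float:
    """Upper bound on the share of reads a NuMT copy could contribute.

    Assumes, for illustration only, that the NuMT is present at nuclear
    copy number and that every NuMT-derived read is mis-mapped to mtDNA.
    """
    return 1.0 / (1.0 + mt_to_nuclear_ratio)


# Made-up read counts, for illustration only.
candidates = [
    Call(position=1000, alt_reads=40, total_reads=8000),   # VAF 0.5%
    Call(position=2000, alt_reads=400, total_reads=8000),  # VAF 5%
]
reported = [c for c in candidates if passes_threshold(c)]
```

Under these assumptions, even the lowest copy-number ratio quoted in the text (100) gives a NuMT read share of about 1%, which is below the 3% threshold.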

The catalog of mtDNA somatic mutations

In total, 1675 tumor–normal pairs across 31 tumor types were analyzed (Table 1 and Supplementary file 1). For 61 of these patients, we had sequencing data available from multiple sites of the primary cancer, several time points or matched primary cancers, and metastases (a total of 73 such cancer samples), allowing us to study the timing of mtDNA mutations in cancer evolution (Supplementary file 1). We identified 1907 somatic mtDNA substitutions (Figure 1 and Supplementary file 2). In contrast to inherited polymorphisms (n = 38,706, available at Supplementary file 2), which were almost always homoplasmic in both the cancer and counterpart normal, the variant allele fractions (VAFs) of these somatic substitutions were highly variable in the cancer, ranging from our detection threshold (3%) to homoplasmy (100%). Of these 1907 somatic substitutions, 1209 (63.4%) were not registered in the databases of mtDNA common polymorphism (Ingman and Gyllensten, 2006; Levin et al., 2013). In comparison, when we examined substitutions found in both the tumor and the normal samples from a patient, only 21 (0.05%) were not registered in the polymorphism databases, a significantly different fraction from the tumor-only variants (p < 10⁻¹⁰; Chi-squared test). We found 595 (31.2%) recurrent mutations that can be collapsed onto 246 mtDNA positions, which is a 6.9-fold higher level of recurrence than expected by chance (p < 10⁻¹⁰). This suggests that the generation or fixation of mtDNA mutations is not random, but influenced by factors such as the underlying mutational process or positive selection.

Table 1

Summary statistics of mtDNA sequence data

https://doi.org/10.7554/eLife.02935.003

	WGS	WXS	Average mt RD (WGS)	Average mt RD (WXS)	Total		WGS	WXS	Average mt RD (WGS)	Average mt RD (WXS)	Total
Breast	284	98	11594.3	52.7	382	Meningioma	0	12	-	42.5	12
Colorectal	1	75	34916.9	276.6	76	Ependymoma	1	9	10323.7	52.7	10
Lung	60	0	2798.1	-	60
Prostate	80	0	17810.6	-	80	MPD	12	138	1517.0	10.9	150
Hepatocellular	0	47	-	205.8	47	MDS	3	75	5648.7	44.5	78
Melanoma	13	13	513.9	353.5	26	ALL	64	6	886.6	35.9	70
Gastric	0	13	-	184.1	13	CLL	6	0	5002.2	-	6
Cholangiocarcinoma	0	8	-	143.9	8	AML	1	6	6783.6	27.4	7
Mesothelioma	0	6	-	106.3	6	Multiple myeloma	0	69	-	43.2	69
Bladder	54	0	646.2	-	54	AMKL	0	9	-	24.2	9
Renal	0	23	-	35.4	23	Lymphoma	0	4	-	99.5	4
Ovarian	0	38	-	58.9	38
Uterine	27	23	736.0	149.5	50	Osteosarcoma	38	90	9525.5	119.2	128
Cervical	0	52	-	85.2	52	Chondrosarcoma	0	47	-	99.1	47
Adenoid cystic ca.	1	60	714.7	75.6	61	Ewing sarcoma	0	27	-	69.5	27
Head & Neck	43	3	1369.1	18.8	46	Kaposi sarcoma	0	9	-	181.0	9
						Chordoma	16	11	1240.0	82.1	27
Total; 31 cancer types							704	971			1675

WGS, whole-genome sequencing; WXS, whole-exome sequencing; mt RD, mitochondrial read depth; MPD, myeloproliferative disease; MDS, myelodysplastic syndrome; ALL, acute lymphoblastic leukemia; CLL, chronic lymphocytic leukemia; AML, acute myeloid leukemia; AMKL, acute megakaryoblastic leukemia.

Figure 1 (with 5 supplements)

Mitochondrial somatic substitutions identified from 1675 tumor–normal pairs.

mtDNA genes and intergenic regions are shown. The strand of each gene is the mtDNA strand whose sequence is equivalent to the transcribed RNA. Substitution categories (silent, non-silent (missense and nonsense), non-coding (tRNA and rRNA), and intergenic) are indicated by the shape of each substitution. The six classes of substitutions are color-coded. Substitutions on the H and L strands (when six substitutional classes were considered) are shown outside and inside the mtDNA genes, respectively. Vertical axes for H and L strand substitutions represent the VAF of each variant.

https://doi.org/10.7554/eLife.02935.004

Of the 1675 cancer samples, 976 (58.3%) harbored at least one somatic substitution and 521 (31.1%) had multiple substitutions, ranging from 2 to 7 (Figure 2A). In those with multiple substitutions, 72 pairs of mutations were sufficiently close to phase (Nik-Zainal et al., 2012b) such that we could determine whether they were linked on the same mtDNA genome or were on different copies. We found that 45 (62.5%) pairs of mutations were linked on the same mtDNA genome (Supplementary file 3 and Figure 2—figure supplement 1). Furthermore, of these linked mutations, 33 showed a clear temporal order: that is, one mutation was demonstrably sub-clonal to the other. This is rather unexpected, since each somatic cell has 100–1000 copies of the mitochondrial genome, and we might anticipate that random mutations would, on average, affect different copies. That many pairs of mutations are phased on the same mtDNA genome and yet show a clear sub-clonal relationship suggests that they occur sufficiently separated in time to allow the mitochondrial genome carrying the earlier mutation to drift towards a substantial fraction of all genomes in that cell before the second mutation occurs, consistent with a previous report (De Alwis et al., 2009).

Figure 2 (with 1 supplement)

mtDNA somatic substitutions of human cancer.

(A) Number of somatic substitutions in a tumor sample. (B) Average number of somatic substitutions per sample across 31 tumor types. (C) Age of diagnosis and number of mtDNA somatic substitutions in breast cancers.

https://doi.org/10.7554/eLife.02935.010

The number of somatic mtDNA substitutions varied significantly according to tumor type (p = 4.4 × 10⁻⁵²) after correcting for confounding variables such as sequencing coverage: gastric, hepatocellular, prostate, and colorectal cancers had the highest number of mtDNA substitutions (Figure 2B). In contrast, hematologic cancers (acute lymphoblastic leukemia, myeloproliferative disease, and myelodysplastic syndrome) had fewer mutations. Several possible explanations could underpin these differences across tumor types. It could be that the mutation rates differ across cell lineages; it could be that selection pressures shape the number of mutations; or the number of mtDNA genome generations could differ across cell lineages. Of these explanations, we believe that the second is unlikely because, as we shall see, positive selection is not a major component of mitochondrial mutations. Interestingly, we find a positive correlation between the number of mtDNA somatic mutations and age at diagnosis in breast cancers (p = 0.0004; Figure 2C), in keeping with the idea that the number of mitochondrial generations is linked to mutation burden. The mutational burden of an established cancer represents the accumulated variation acquired in the lineage of cell divisions from fertilized egg to transformed cell and will include events acquired in normal development and homeostasis as well as those acquired during tumorigenesis (Stratton et al., 2009). Interestingly, mtDNA mutations have been found at high rates in normal colonic crypt cells (Taylor et al., 2003; Ericson et al., 2012). Given that we find high burdens of mutations in colonic tumors as well, the differences we see across tumor types may arise from pre- or post-transformation differences in mtDNA burden across tissues.

Extracting mtDNA mutational signatures

With respect to signatures of somatic substitutions, C > T and T > C transitions constituted 90.9% of all the 1907 substitutions (Figure 1) among the six classes of possible base substitutions. To characterize this aggregated signature of mtDNA cancer specific mutations in more detail, we looked for the presence of mtDNA strand bias between the complementary H and L strands of mtDNA. The two main substitution classes showed an extreme level of mtDNA strand bias. 84.1% of the C > T transitions were on the H strand. This level of strand bias occurred despite the fact that cytosine is 2.4-fold less common on the H than the L strand, so the C > T substitution rate is 12.6-fold higher on the H strand. By contrast, 76.8% of the T > C transitions were on the L strand despite its lower thymine content (1.3-fold less than the H strand). This implies that the T > C mutation rate on the L strand is 4.2-fold higher than on the H strand.
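The per-base rate comparison in this paragraph combines two quantities: how unevenly the observed mutations are split between the strands, and how unevenly the mutable base itself is distributed. A minimal Python sketch of that arithmetic follows; the function name is hypothetical and the inputs are the rounded values quoted in the text. With these rounded inputs the results come out close to, but not exactly at, the reported 12.6-fold and 4.2-fold, presumably because the published figures were computed from unrounded base counts.

```python
def rate_ratio(fraction_on_strand: float, base_depletion_fold: float) -> float:
    """Per-base mutation rate on the favoured strand relative to the other.

    fraction_on_strand: share of the observed mutations on the favoured strand.
    base_depletion_fold: how many times rarer the mutable base is on that strand.
    """
    observed_ratio = fraction_on_strand / (1.0 - fraction_on_strand)
    # Fewer mutable bases on the favoured strand means each base mutates
    # proportionally more often, so the observed ratio is scaled up.
    return observed_ratio * base_depletion_fold


c_to_t = rate_ratio(0.841, 2.4)  # C > T: H strand relative to L strand
t_to_c = rate_ratio(0.768, 1.3)  # T > C: L strand relative to H strand
```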

We then examined the sequence context in which these mutations occurred by examining the bases immediately 5′ and 3′ to the mutated bases. This generates 96 possible mutation classes (the 6 substitution classes multiplied by the 16 combinations of immediate 5′ and 3′ nucleotides). Both C > T and T > C mutations showed highly distinctive sequence contexts. C_H > T_H substitutions (i.e. C > T mutations on the H strand) were enriched for the NpCpG trinucleotide context (8- to 15-fold more frequent than expected by chance; Figure 3A). By contrast, T_L > C_L substitutions (i.e. T > C mutations on the L strand) showed 5- to 8-fold enrichment in NpTpC. This strand-asymmetric mutational signature is not similar to any of the 21 cancer-associated mutational signatures recently identified from the nuclear DNA of 30 different cancer types (Alexandrov et al., 2013).
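The 96 trinucleotide classes and the observed/expected enrichment used in this analysis can be enumerated as in the sketch below. The function names are hypothetical, and the six substitution classes are written from the pyrimidine of the base pair, which is a common convention assumed here; the strand-specific analysis described in the text would additionally track each class separately for the H and L strands.

```python
from itertools import product

BASES = "ACGT"
# Six substitution classes, written from the pyrimidine of the base pair.
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def mutation_classes():
    """Return every (5' base, substitution, 3' base) combination."""
    return [
        (five_prime, substitution, three_prime)
        for substitution in SUBSTITUTIONS
        for five_prime, three_prime in product(BASES, repeat=2)
    ]


def enrichment(observed: int, expected: float) -> float:
    """Observed/expected ratio for one class (number observed / number expected)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


classes = mutation_classes()
# The four NpCpG contexts of the C > T class highlighted in the text.
npcpg_c_to_t = [c for c in classes if c[1] == "C>T" and c[2] == "G"]
```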

Figure 3 (with 1 supplement)

Replicative strand bias for mtDNA somatic substitutions.

(A) Replicative strand-specific substitution rate (# of observed/# of expected) by 96 trinucleotide context. Substitutions in a specific mtDNA segment (from Ori-b to O_H) are not included, because they present a different substitutional signature. (B) Mutational signature across tumor types. Eighteen tumor types, which include at least 25 mtDNA mutations, were shown. (C) Inverted substitution signature in the Ori-b–O_H.

https://doi.org/10.7554/eLife.02935.012

Of the 18 tumor types that presented at least 25 mtDNA somatic substitutions in this study, the mutational signatures were broadly consistent across tumor types (Figure 3B), with the exception that multiple myeloma had a somewhat higher rate of T_H > C_H changes than other histologies (p = 8.1 × 10⁻⁶). Thus, in contrast to the mutational signatures found in nuclear genomes, where there is striking heterogeneity both across tumor types and across individuals within a tumor type (Alexandrov et al., 2013), the mutational profile in the mitochondrial genome of somatic cells is remarkably homogeneous.

Replication-coupled mutational process in mitochondria

The major known cause of mutational strand bias in nuclear DNA is transcription-coupled nucleotide excision repair, where DNA lesions on the transcribed (non-coding) strand are more frequently repaired (Alexandrov et al., 2013). However, we find that the strand bias always favors C_H > T_H and T_L > C_L whether the gene is transcribed from the H strand or from the L strand (Figure 3—figure supplement 1). This is not compatible with transcription-coupled repair, for which the direction of strand bias is fundamentally dictated by which strand is transcribed.

Instead, the mtDNA mutational strand bias reported here appears to be driven by differences in replication between the two strands. mtDNA replication harbors substantial strand asymmetry between the H and L strands: mtDNA replication initiates from an origin of replication (O_H) in the D-loop, with the nascent H and the L strand replicating as leading and lagging strand, respectively (Clayton, 1982; Falkenberg et al., 2007; Holt and Reyes, 2012). We observed that C > T substitutions were prevalent in the leading (heavy) strand, whereas T > C substitutions were found in the lagging (light) strand (Figure 1). Remarkably, this strand bias was reversed in the D-loop itself (Figures 1 and 3C), further suggesting that the mtDNA somatic mutations are replication-coupled: according to a recently proposed bidirectional model of mtDNA replication (Yasukawa et al., 2005, 2006; Holt and Reyes, 2012), mtDNA replication is also able to initiate from the so-called Ori-b site, typically located around genomic position 16,197 and proceeds on both strands away from the origin (Figure 1). Replication of the nascent H strand continues unimpeded like the traditional model, but the nascent L strand terminates at the so-called O_H site, typically around mtDNA position 191 bp. Under this model, then, the leading and lagging strand are reversed in the few hundred base-pairs of the D-loop, which is consistent with the reversed mutational signature in this region (Figures 1 and 3C).
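The positional logic of the bidirectional model can be expressed as a small lookup on the circular genome. The sketch below is illustrative only: the function names are hypothetical, the Ori-b and O_H coordinates are the approximate values quoted above, and treating both boundary positions as part of the reversed segment is an assumption.

```python
MT_LENGTH = 16569  # length of the human mitochondrial genome, from the text
ORI_B = 16197      # approximate Ori-b position, from the text
O_H = 191          # approximate O_H position, from the text


def in_reversed_segment(position: int) -> bool:
    """True if a 1-based position lies in the Ori-b to O_H segment.

    The segment wraps around the end of the circular genome, so it covers
    positions ORI_B..MT_LENGTH and 1..O_H.
    """
    if not 1 <= position <= MT_LENGTH:
        raise ValueError(f"position {position} is outside 1..{MT_LENGTH}")
    return position >= ORI_B or position <= O_H


def favoured_strand(position: int, substitution: str) -> str:
    """Strand on which a substitution class is expected to predominate."""
    outside = {"C>T": "H", "T>C": "L"}  # bias reported for most of the genome
    if substitution not in outside:
        raise ValueError("only C>T and T>C are modelled in this sketch")
    strand = outside[substitution]
    if in_reversed_segment(position):
        # Leading and lagging strands swap here, so the bias is inverted.
        return "L" if strand == "H" else "H"
    return strand


def reversed_segment_length() -> int:
    """Number of positions in the wrapped Ori-b to O_H segment."""
    return (MT_LENGTH - ORI_B + 1) + O_H
```

With these coordinates the reversed segment spans 564 positions, in line with the "few hundred base-pairs" mentioned in the text.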

Equivalent mutational signature during human mtDNA evolution

It is not entirely straightforward to infer the mutational signatures operating on the mitochondrial genome in the germline. De novo mutations are generally rare and often discovered because they cause disease; distinguishing the ancestral base and the derived base is challenging for single nucleotide polymorphisms; and comparative mtDNA genomics across species extends over considerable evolutionary time. In contrast, because ancestral and derived states are defined for tumor–normal pairs, a much clearer picture emerges of the somatic mtDNA mutation signature. We therefore assessed whether the signature that emerges for somatic mitochondrial mutations could extend to explain sequence composition of the human mtDNA genome.

It appears that the mutational mechanism which has generated the C_H > T_H and T_L > C_L signature in cancer mtDNA is equivalent to the one that has been operating during evolution of human germline mtDNA (Nikolaou and Almirantis, 2006). This manifests as the depletion of certain codons in the reference human mtDNA sequence through the action of the C_H > T_H and T_L > C_L mutational process over time (Figure 4A). For example, the GCG triplet codon (Alanine) appears to have been replaced by its synonymous GCA codon (due to C_H > T_H (G_L > A_L)), with the former being 15.8-fold less frequently observed in the 12 mtDNA protein-coding genes that are transcribed from the H strand (and encoded on the L strand). All 32 synonymous codon pairs present the same tendency. Consistent with this interpretation, the gene transcribed from the L strand (MT-ND6) demonstrates the opposite direction of skew. Further analyses of mtDNA codon usage from seven animal species suggest that the C_H > T_H and T_L > C_L mutational pressure may be characteristic of vertebrates, and primates in particular (Figure 4—figure supplement 1).
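The codon-depletion comparison described above reduces to counting codons in an in-frame coding sequence and comparing the two members of each third-position pair. The following Python sketch shows the idea on a made-up toy sequence; the function names and the sequence are hypothetical, and no real mtDNA data are used.

```python
from collections import Counter
from itertools import product


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons, ignoring any incomplete trailing codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def pair_ratio(counts: Counter, codon_a: str, codon_b: str):
    """count(codon_a) / count(codon_b), or None if codon_b never occurs."""
    if counts[codon_b] == 0:
        return None
    return counts[codon_a] / counts[codon_b]


def third_position_pairs():
    """The 32 pairs that differ only at the third base: NNT-NNC and NNA-NNG."""
    prefixes = ["".join(p) for p in product("ACGT", repeat=2)]
    return ([(p + "T", p + "C") for p in prefixes]
            + [(p + "A", p + "G") for p in prefixes])


# Toy sequence, invented for illustration: 8 GCA, 2 GCG, 5 CTA codons.
toy_cds = "GCA" * 8 + "GCG" * 2 + "CTA" * 5
counts = codon_counts(toy_cds)
gca_over_gcg = pair_ratio(counts, "GCA", "GCG")
```

Applied to the 12 genes transcribed from the H strand, a ratio of this kind for GCA over GCG is what the text reports as 15.8-fold.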

Figure 4 (with 2 supplements)

Mutational signature similar to processes shaping human mtDNA sequence over evolutionary time.

(A) Triplet codon depletion in human mtDNA by equivalent (C_H > T_H and T_L > C_L) mutational pressure. Relative frequency of each triplet codon within synonymous pairs (NNT–NNC or NNA–NNG) is shown by color. The arrows beside the box highlight the T > C (red) and G > A (blue) substitutional pressures on the L strand in germline mtDNA. (B) Correlation between observed triplet codon frequencies and those obtained by simulated evolution of a random mtDNA sequence under the mtDNA somatic mutational signature, with mitochondrial protein sequences held constant.

https://doi.org/10.7554/eLife.02935.014

To quantify whether the somatic mutational signature we have defined can fully explain the trinucleotide frequency of human mtDNA, we performed evolutionary simulations. First, we simulated the evolution of a random DNA sequence under the mutational signature described here. By mutational pressure alone, the random sequence starts losing certain hypermutable trinucleotides until eventually reaching a stationary sequence composition. The actual sequence composition of the human mitochondrial genome strongly resembles this stationary distribution (Pearson's r = 0.83; p < 0.0001; Figure 4—figure supplement 2). In a second simulation, a random sequence encoding the exact amino acid sequence of the reference mitochondrial genome was evolved by synonymous mutations under the observed mtDNA signature until reaching a stationary sequence composition (mutation–selection equilibrium). These simulations also eventually approximate the observed human mitochondrial genome (Pearson's r = 0.96, p < 0.0001; Figure 4B). These analyses strongly suggest that the mitochondrial mutation signature observed in cancer cells closely reflects the mutation signature active in the germline, which has continuously shaped the mitochondrial genome during human evolution.
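The idea of evolving a sequence under mutational pressure until its composition stops changing can be illustrated with a deliberately simplified model. The sketch below is not the simulation used in the study: it tracks only single-base composition rather than trinucleotides, it ignores protein-coding constraints, and every rate in it is invented. Only the ordering of the rates (T > C and G > A fastest on the L strand) follows the text.

```python
BASES = ["A", "C", "G", "T"]

# Hypothetical per-step substitution probabilities on the L strand.
# The numbers are invented; only their ordering follows the text.
RATES = {
    ("T", "C"): 0.010,
    ("G", "A"): 0.010,
    ("C", "T"): 0.002,
    ("A", "G"): 0.002,
}
TRANSVERSION = 0.0002  # assumed small probability for every other change


def transition_matrix():
    """Row-stochastic matrix: M[i][j] = P(base i becomes base j in one step)."""
    matrix = []
    for src in BASES:
        row = [0.0 if src == dst else RATES.get((src, dst), TRANSVERSION)
               for dst in BASES]
        row[BASES.index(src)] = 1.0 - sum(row)  # probability of no change
        matrix.append(row)
    return matrix


def evolve(composition, matrix, tolerance=1e-12, max_steps=1_000_000):
    """Apply the mutation step repeatedly until composition stops changing."""
    current = list(composition)
    for _ in range(max_steps):
        updated = [sum(current[i] * matrix[i][j] for i in range(4))
                   for j in range(4)]
        if max(abs(a - b) for a, b in zip(updated, current)) < tolerance:
            return updated
        current = updated
    return current


# Start from a uniform ("random sequence") composition.
stationary = dict(zip(BASES, evolve([0.25] * 4, transition_matrix())))
```

In this toy model the hypermutable bases (T and G on the L strand) end up depleted relative to their mutation products (C and A), which is the qualitative behaviour the paragraph describes.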

Negative selection on truncating mtDNA mutations and tRNA anticodons

Next, we assessed the functional impact of somatic mtDNA mutations. Of the 1907 substitutions, 1153 (60.5%) were in the 13 protein-coding genes. These include 63 nonsense, 4 stop-lost, 878 missense, and 208 silent substitutions (Supplementary file 2). In addition, out of 251 indels we observed, 110 occurred within protein-coding genes (Supplementary file 2). Of the missense substitutions, 245 (27.9%) were recurrent, affecting 107 distinct mtDNA sites. Although this very high level of mutation clustering could, at first sight, be interpreted as evidence for positive selection, we found that silent substitutions were also frequently recurrent (28 recurrent variants in 13 mtDNA sites), with no substantial difference in recurrence rates between silent and missense mutations (p = 0.19; Figure 5—figure supplement 1). We believe this recurrence to be the consequence of a high mtDNA mutation rate with restricted mutational signature (C_H > T_H and T_L > C_L). Independently recurring mutations in human germline mtDNA are well described across human evolution (Levin et al., 2013).

The ratio of somatic missense to silent substitutions (Rms:s) is apparently higher (4.2, 878/208) than that observed for cancer-associated somatic mutations in nuclear DNA (generally around 2:1 to 3:1 across tumor types) (Greenman et al., 2007; Nik-Zainal et al., 2012a). At face value, this again could be interpreted as evidence for positive selection. However, as described above, the somatic mtDNA mutational signature shows extreme strand asymmetry and the same mutational signature has been operative in the germline over evolutionary time. Thus, the dominant mutational signature has already acted on potentially synonymous sites in the mitochondrial genome (Figure 4A), meaning that any new somatic changes are much less likely to be silent. In keeping with this, a dN/dS ratio (See ‘Materials and Methods’) calculated taking into account both the mutational signature and the mtDNA codon usage revealed that missense mutations accumulate at a frequency very close to that expected under neutrality (dN/dS = 1.21; 95% confidence interval, 1.015–1.434; p = 0.031). This indicates that despite the apparent high ratio of missense to silent mutations, the vast majority of mtDNA mutations are passengers with no convincing evidence suggesting the existence of driver mitochondrial DNA mutations. Additional gene-by-gene analysis further revealed that no single gene had a higher than expected rate of missense or nonsense mutations (Supplementary file 4).
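The argument here can be made concrete with a short calculation. If dN/dS is read as the observed missense-to-silent ratio divided by the ratio expected under neutrality (an assumption made for this sketch; the study's actual calculation is described in its Materials and Methods and may differ in detail), then the reported values imply a neutral expectation that is itself well above the 2:1 to 3:1 seen in nuclear DNA. The function name below is hypothetical.

```python
MISSENSE = 878         # somatic missense substitutions, from the text
SILENT = 208           # somatic silent substitutions, from the text
REPORTED_DN_DS = 1.21  # dN/dS reported in the text


def implied_neutral_ratio(observed_ratio: float, dn_ds: float) -> float:
    """Missense:silent ratio expected under neutrality.

    Assumes dN/dS = observed ratio / neutral ratio, which is a
    simplification adopted for this illustration.
    """
    return observed_ratio / dn_ds


observed_ratio = MISSENSE / SILENT  # roughly 4.2, as stated in the text
neutral_ratio = implied_neutral_ratio(observed_ratio, REPORTED_DN_DS)
```

Under that assumption the neutral expectation comes out at roughly 3.5 missense substitutions per silent substitution, so most of the apparently high observed ratio is accounted for by the mutational signature and codon usage rather than by selection.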

For nonsense substitutions and frameshift indels, we observe a somewhat different picture. Taking into account the mutation signature and amino acid composition of the mitochondrial genome, the overall ratio of nonsense mutations to silent mutations is very close to that expected by chance (dNonsense/dS = 1.004; 95% confidence interval, 0.699–1.443; p = 0.98). However, while missense and silent substitutions exhibited equivalent variant allele fractions (average VAFs; 40.1% and 40.9%, respectively; p = 0.8), nonsense substitutions presented significantly lower VAFs (average 26.4%; p = 6 × 10⁻⁵), as did frameshift indels (average 25.0%; p = 2 × 10⁻³; Figure 5A). Taken together, these data suggest that nonsense mutations occur at the expected rate given the underlying mutational process. However, while silent and missense substitutions frequently achieve high allele fractions in tumor cells due to the effects of random drift, there are significantly greater constraints on mitochondrial genomes carrying protein-inactivating mutations. The inference here is that cancer cells carrying such deleterious mutations at or near homoplasmy are at a selective disadvantage and hence do not contribute to clonal expansions, underlining the importance of functional mitochondria to cancer cells. The extent of such disadvantage may vary according to tumor type: for example, colorectal cancers show less negative selection compared to breast cancers (p = 0.028; Figure 5—figure supplement 2).

Figure 5

Selection and mutational process for mtDNA somatic substitutions.

(A) Truncating mutations (nonsense substitutions and frame-shifting (FS) coding indels) show significantly lower VAFs. (B) Change in VAF of mtDNA somatic mutations between primary and metastatic (or late) cancer tissues. (C) Mutational signature for mtDNA across various tumor types. Neither the three highlighted mechanisms nor the nuclear DNA double-strand break repair mechanism (*BRCA*) matches the mtDNA mutational signature. * Only substitutions in protein-coding genes considered. (D) A proposed model of the mtDNA mutational process.

https://doi.org/10.7554/eLife.02935.017

We found 171 mtDNA substitutions in mitochondrial tRNA sequences, which is very similar to the number expected from the mutational signature (168.2, p = 0.82). Interestingly, none of the substitutions was located in the trinucleotide anticodon site of the tRNA (expected number = 7.6, p = 0.006). This suggests that mutations in tRNA anticodons confer a selective disadvantage similar to that of protein-truncating mutations, presumably because such mutations would lead to systematic erroneous aminoacylation of nascent proteins during translation of the relevant codon.

Next, we assessed whether any specific somatic mutations showed evidence of positive selection. Out of the 1907 somatic substitutions, 16 (0.8%) overlapped with known disease-associated mtDNA mutations, such as mutations frequently detected in MELAS (mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes) and LHON (Leber hereditary optic neuropathy) (Supplementary file 2). In addition, ten mutations within mitochondrial protein-coding, tRNA and rRNA genes showed a significantly higher recurrence rate than expected from the background mutational signature (Supplementary file 5). However, it remains unclear whether this high recurrence reflects positive selection, because any factors not included in our background model of the mutational process, such as local mutation hotspots, could also explain a mild excess of mutations at a given nucleotide.

mtDNA mutations across tumor evolution

We investigated whether somatic mtDNA mutations are more likely to become homoplasmic later in tumor evolution by assessing paired cancer samples, either primary and metastasis (breast, colorectal, and prostate) or primary and relapse (myeloma) (Figure 5B and Supplementary file 1). As mentioned earlier, 73 late (metastasis or relapse) cancer samples were sequenced in addition to the primary tissues. Among the mtDNA mutations identified in either of the paired cancer samples, a number of different patterns were observed. There were mutations at high VAF in the primary not found in the metastasis (n = 49); mutations in the metastasis not found in the primary (n = 49); and shared mutations (n = 71) at high or low VAF, sometimes with evidence for drift (VAF difference >0.2) between the two samples (n = 25). These data, particularly the mutations found in the metastasis only, suggest that mitochondrial mutations can occur throughout the time course of tumor evolution, and still drift to homoplasmy with appreciable frequency, as suggested previously (Coller et al., 2001). To assess the plausibility of this conclusion, we modeled the dynamics of mtDNA mutations based on a few simplifying assumptions (see ‘Materials and Methods’, Evolutionary dynamics of neutral mitochondrial mutations). We find that the expected number of neutral mitochondrial mutations drifting to homoplasmy increases linearly with mutation rate and number of cell divisions. Based on a mutation rate of 10⁻⁷/base-pair/generation (Coller et al., 2001; Hudson and Chinnery, 2006), this leads to an average of ∼1 homoplasmic mutation for every 1000 cell generations.
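The order of magnitude of this estimate can be checked with a back-of-the-envelope calculation. The sketch below is not the model described in ‘Materials and Methods’. It assumes the classical neutral-theory result that, at steady state, neutral mutations reach fixation at a rate equal to the per-genome mutation rate per generation, and it assumes a genome length of 16,569 bp (the human reference mtDNA). The function name `expected_homoplasmic` is hypothetical.

```python
# Length of the human reference mtDNA sequence (rCRS), in base pairs.
MTDNA_LENGTH_BP = 16_569


def expected_homoplasmic(mu_per_bp, generations, genome_length=MTDNA_LENGTH_BP):
    """Expected number of neutral mutations reaching homoplasmy.

    Assumption (not taken from the article's methods): neutral mutations fix
    at a rate equal to the per-genome mutation rate per generation, so the
    expectation is linear in both mutation rate and number of generations.
    """
    return mu_per_bp * genome_length * generations


# Mutation rate quoted in the text: 1e-7 per base pair per generation.
print(expected_homoplasmic(1e-7, 1000))
```

Under these assumptions the result is between 1 and 2 homoplasmic mutations per 1000 cell generations, the same order of magnitude as the figure quoted above, and it scales linearly with both inputs.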

Origins of mtDNA somatic mutations

We also explored whether the mutational forces that are so critical to shaping the nuclear genome during tumor evolution could affect the mitochondrial genome. In cancers associated with exogenous mutagens, such as tobacco-associated lung cancer and ultraviolet light-associated melanomas, we found no evidence of the mutational signatures characteristic of these carcinogens among the mtDNA mutations (Figure 5C, Figure 5—figure supplement 3). Moreover, BRCA1 and BRCA2 mutations showed no evident influence on mitochondrial genomes in breast cancer (Figure 5C), in contrast to their effects on nuclear genomes exhibiting an even distribution of mutations across all trinucleotide contexts (Nik-Zainal et al., 2012a; Alexandrov et al., 2013). Taken together, it appears that the primary mtDNA mutational process is endogenous to mitochondria and is very different from those operating in nuclear DNA. It is surprising that the endogenous mutational process has far greater impact than any external forces, as the physicochemical interactions of ultraviolet light or the chemicals in cigarette smoke with DNA should be similar in both genomes. The simulations described above suggest that the major explanation is that the endogenous mutation rate is several orders of magnitude greater than that expected for exogenous carcinogens, thus swamping any signal.

Discussion

In theory, there are two potential sources of the mtDNA variants we observed in cancer tissues: (1) somatically acquired, or de novo, mutations accumulated during the cancer clone's lineage of cell divisions from the fertilized egg or (2) low-level heteroplasmic mtDNA present in the oocyte (therefore maternally inherited) amplified in cancer but lost from normal tissue by random drift (He et al., 2010; Freyer et al., 2012; Payne et al., 2013). We believe the majority of the variants we find are genuinely acquired somatically. First, of the 45 pairs of somatic mutations phased together on the same copy of the mtDNA genome, at least 33 (73.3%) showed a clear sub-clonal relationship, meaning that their occurrence is separated in time and that they are apparently somatic. Secondly, 63.4% of our substitutions were not previously reported as germline polymorphisms. This is a much higher rate than reported for equivalent analyses on heteroplasmic variants in non-cancer samples (8/37; 21.6%) (Li et al., 2010), although methodological differences may contribute somewhat to this apparent difference (Goto et al., 2011; Avital et al., 2012). Thirdly, if the variants were due to inherited low-level heteroplasmy, we would not expect to see such variation across tissue types, since all tissue types derive from the fertilized egg. It is difficult to distinguish whether the variants we observe occur before or after the initiating driver mutations that herald tumorigenesis, but our analysis of paired samples does suggest that they can occur both early and late. Given the homogeneity of the mutational signature across tumor types and its inferred resemblance to the germline mtDNA mutational process, we would hypothesize that new mutations occur at a fairly constant and high rate per mitochondrial genome replication throughout all cell divisions.

On the basis of the mutational signature observed here, somatic substitutions are unlikely to be attributable to reactive oxygen species (ROS), as previous reports have suggested (Polyak et al., 1998; Larman et al., 2012). Guanine oxidation by ROS predominantly causes G:C > T:A transversions (Thilly, 2003; Delaney et al., 2012), which constitute only 4.0% of the mutations in our data (Figure 5C). Instead, we propose three replication-coupled mechanisms that can explain the strand-asymmetric C_H > T_H and T_L > C_L mutational signature and define a model of the mtDNA mutational process (Figure 5D). First, the parent H strand, displaced and single-stranded during mtDNA replication (Holt and Reyes, 2012), could be more prone to cytosine deamination (generating C_H > T_H) and/or adenine deamination (Lindahl, 1993; Saccone et al., 1999; Faith and Pollock, 2003) (generating T_L > C_L). Secondly, endogenous mtDNA polymerase (POLG) replication errors (Zheng et al., 2006) (which show the pattern of C > T and A > G substitutions) could be preferentially generated on the leading strand (Pavlov et al., 2002). Thirdly, there may be differences in the efficiency of repair between the leading and lagging strands (Pavlov et al., 2003). Further, the mutation pattern reported here is consistent with the hypothesized bidirectional initiation of mtDNA genome replication (Yasukawa et al., 2005, 2006; Holt and Reyes, 2012).
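The strand notation used here can be made concrete with a short sketch. It assumes that substitutions are called against the reference sequence, which by convention represents the L strand, so each call has to be complemented to express it on the H strand. The helper names (`to_heavy_strand`, `heavy_strand_spectrum`) are hypothetical, and the example calls in the usage note are illustrative rather than data from this study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_heavy_strand(ref, alt):
    """Re-express an L-strand substitution as the equivalent H-strand change.

    Assumes `ref` and `alt` are reported on the L strand (the reference
    orientation); the H-strand change is the base-wise complement.
    """
    return COMPLEMENT[ref], COMPLEMENT[alt]


def heavy_strand_spectrum(l_strand_calls):
    """Tally substitutions by type after conversion to the H strand."""
    return Counter(
        "{}>{}".format(*to_heavy_strand(ref, alt)) for ref, alt in l_strand_calls
    )
```

For instance, `to_heavy_strand("T", "C")` gives `("A", "G")`, so the T_L > C_L class in the text is the same event as A > G on the H strand, and an L-strand G > A call corresponds to C_H > T_H. Tallying calls this way separates the two strand-asymmetric classes from their reverse-complement counterparts, which a strand-agnostic six-class spectrum would merge.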

It appears that most of the mtDNA missense mutations we observe become fixed in tumor progenitor cells without distinct physiological advantage. All the statistical testing performed in this study (variant allele fraction comparison across different categories of somatic mutations, number of recurrent mutations, and dN/dS ratio) suggests that mtDNA somatic substitutions accumulate largely neutrally. This is not different from previous observations in nuclear genomes: of the thousands of somatic mutations found in a cancer genome, many fewer than a hundred are believed to confer a selective advantage to the cancer cell (Stratton et al., 2009). In contrast, protein-truncating mutations showed evidence of negative selection, at the level of constraints on the allele fraction achieved. The implication of this is that the inactivating mutations occur at an appreciable rate, but the fraction of mitochondrial genomes per cell carrying these variants cannot increase beyond a certain limit without impairing the selective fitness of that cell. Having a sizable number of mitochondria with a fully intact proteome remains critical to the fitness of a cancer cell.

Author details

Young Seok Ju

Ludmil B Alexandrov

Moritz Gerstung

Inigo Martincorena

Serena Nik-Zainal

Manasa Ramakrishna

Helen R Davies

Elli Papaemmanuil

Gunes Gundem

Adam Shlien

Niccolo Bolli

Sam Behjati

Patrick S Tarpey

Jyoti Nangalia

Charles E Massie

Adam P Butler

Jon W Teague

George S Vassiliou

Anthony R Green

Ming-Qing Du

Ashwin Unnikrishnan

John E Pimanda

Bin Tean Teh

Nikhil Munshi