DNA replication errors that persist as mismatch mutations make up the molecular fingerprint of mismatch repair (MMR)-deficient tumors and confer on them resistance to standard therapy. Using whole-genome and whole-exome sequencing, we confirm here an MMR-deficient mutation signature that is distinct from other tumor genomes but surprisingly similar to germ-line DNA, indicating that a substantial fraction of human genetic variation arises through mutations escaping MMR. Moreover, we identify a large set of recurrent indels that may serve to detect microsatellite instability (MSI). Indeed, using endometrial tumors with immunohistochemically proven MMR deficiency, we optimize a novel marker set capable of detecting MSI and show that it has greater sensitivity and specificity than standard MSI tests. Additionally, we show that recurrent indels are enriched for the ‘DNA double-strand break repair by homologous recombination’ pathway. Consequently, double-strand break (DSB) repair is reduced in MMR-deficient tumors, triggering a dose-dependent sensitivity of MMR-deficient tumor cultures to DSB inducers. https://doi.org/10.7554/eLife.02725.001
Before a cell divides, it must first copy all of its genetic material. Any mistakes made during this process are called mutations. Mutations can give rise to new traits but are mostly harmful to cells, and some can cause cancer; therefore, cells have evolved tools that efficiently spot these mistakes and repair them. One of the main tools is called mismatch repair (MMR).
Defects in the cell's mismatch repair tools can wreak havoc because they allow many mutations to accumulate. Zhao et al. looked at the genomes of tumors in which mismatch repair was not working properly to see what makes these ‘MMR-deficient tumors’ different from other tumors. This revealed that MMR-deficient tumors have patterns of mutations similar to those seen in egg and sperm cells. This was unexpected and suggests that mutations that escape mismatch repair are an important source of the genetic differences found between different humans, and between humans and their ancestors.
Identifying cancerous tumors that are MMR-deficient is vital, as these tumors tend not to respond to commonly used cancer treatments. However, current clinical methods to identify MMR-deficient tumors often fail or produce results that are difficult to interpret. MMR-deficient tumors commonly contain mutations called indels, in which short fragments of DNA are inserted into or deleted from longer DNA sequences. Zhao et al. found 59 indels that can be used to detect MMR-deficient tumors; each of these indels had been identified in several tumors taken from different tissues. This new approach identified MMR deficiency in several types of tumor, including colon and ovarian cancers, with greater sensitivity and accuracy than existing methods.
Zhao et al. also found that the indels in MMR-deficient tumors reduce the ability of these tumors to repair a type of DNA damage called double-strand breaks, in which both strands of the DNA double helix are broken and the DNA chain is severed. As this kind of damage is very harmful to a cell, treatments that create more double-strand breaks could form part of a more effective therapy for MMR-deficient tumors; further research is needed to investigate this possibility. https://doi.org/10.7554/eLife.02725.002
Germ-line loss-of-function mutations in MMR genes (MLH1, MSH2, or MSH6) are a well-established cause of Lynch syndrome, an autosomal dominantly inherited disorder of cancer susceptibility (Jiricny, 2006). Lynch syndrome accounts for 2–5% of endometrial (EM) and colorectal (CRC) tumors, and epigenetic silencing of MLH1 contributes another 15–28% of these tumors (Parsons et al., 2012; Peltomaki, 2014). Deficiency of the MMR machinery leads to DNA replication errors in the tumor tissue but not in the surrounding normal tissue. In particular, errors often accumulate as indel mutations in mono- and di-nucleotide repeats, a phenomenon referred to as microsatellite instability (MSI) (Pinol et al., 2005).
MMR-deficient tumors differ in prognosis and in therapeutic outcome after standard chemotherapy (Ng and Schrag, 2010). Untreated CRC patients with MMR-deficient tumors have a modestly better prognosis but do not seem to benefit from 5-fluorouracil-based adjuvant chemotherapy, the first-choice chemotherapy for CRC. In MMR-deficient tumors, mismatches induced by 5-fluorouracil are tolerated and therefore fail to trigger cell death (Fischer et al., 2007). MMR-deficient tumors are also resistant to cisplatin and carboplatin, which are frequently used chemotherapies in EM cancer (Hewish et al., 2010). Furthermore, MMR-deficient tumors can be resistant to targeted therapies because they acquire secondary mutations in genes that activate alternative or downstream signaling pathways (e.g., PIK3CA). Another possibility is that epigenetic silencing of MLH1 coincides with particular mutations, such as BRAF V600E (Donehower et al., 2013), an established negative predictor of response to targeted anti-EGFR therapies in advanced CRC (Richman et al., 2009).
Efforts to individualize the treatment of MMR-deficient tumors have focused on identifying synthetic lethal interactions with MMR deficiency. In particular, increased oxidative damage (by methotrexate exposure or PINK1 silencing [Martin et al., 2011]) and interference with the base excision repair (BER) pathway (by DNA polymerase γ or β inhibition [Martin et al., 2010]) can sensitize MMR-deficient tumors. So far, however, these findings have not translated into clinically effective treatment options. Alternatively, as highlighted above, secondary mutations occurring because of MMR deficiency may also critically determine therapeutic efficacy (Dorard et al., 2011). These secondary mutation spectra have, however, been poorly characterized, mainly because studies often focused on one or a few reporter loci, or exclusively on mutations at known hotspot sequences. More recently, the first whole-exome sequencing of MMR-deficient tumors was performed, highlighting their clearly distinct mutational landscape (TCGA, 2012), whereas at the whole-genome level, Kim et al. (2013) revealed an overrepresentation of MSI in euchromatic and intronic regions compared with heterochromatic and intergenic regions.
To generate a more comprehensive picture of the mutation spectra arising in MMR-deficient tumors, and in particular to interpret their clinical relevance for diagnostically assessing MSI and therapeutically targeting MMR-deficient tumors, we sequenced an additional set of MMR-deficient tumors. Specifically, whole-genome and whole-exome sequencing were applied to 5 and 28 tumor–normal pairs, respectively, of which 3 and 22 were MMR-deficient.
To select MMR-deficient tumors for whole-genome sequencing, standard diagnostic tests were used, including immunohistochemistry of MMR proteins (MLH1, MSH2, and MSH6), assessment of MSI using the extended Bethesda panel, and methylation profiling of the MLH1 promoter. Three chemo-naive EM tumors, each deficient for one of MLH1, MSH2, or MSH6 and thus together covering the full spectrum of MMR deficiency, as well as two MMR-proficient EM tumors, were selected (Table 1). To avoid potential technology biases in assessing mutation patterns in MMR-deficient tumor genomes, two different sequencing technologies were used: Complete Genomics (CG) and Illumina short-read sequencing. We obtained high-coverage sequencing data (30–120x) for tumor and matched normal samples (Table 1). Application of a standard annotation and filtering pipeline, as previously described (Reumers et al., 2011), revealed that each MMR-deficient tumor exhibited a clear hypermutator phenotype, containing on average 50 times more novel somatic mutations than MMR-proficient tumors (Figure 1A, Figure 1—source data 1, Figure 1—source data 2). Orthogonal technologies validated 98% of substitutions and 88% of indels in the three MMR-deficient tumors, but only 62% of substitutions and 11% of indels in the two MMR-proficient tumors (Figure 1—source data 3). This difference in validation rates probably arises because, in normal genomes as well as in MMR-proficient tumor genomes, the number of true-positive indels is low compared with the number of false-positive indels. In MMR-deficient tumors, by contrast, the hypermutator phenotype vastly increases the number of true-positive indels, rendering the false-positive fraction proportionally much smaller. Notably, all tumors were negative for POLE mutations (Kandoth et al., 2013; Palles et al., 2013).
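The validation-rate argument above is essentially about precision, TP / (TP + FP). If two genomes carry a similar absolute number of false-positive indel calls, the one with many more true indels validates at a much higher rate. The sketch below illustrates this with hypothetical counts that are not the study's data.

```python
def validation_rate(true_calls: int, false_calls: int) -> float:
    """Expected fraction of calls that validate (precision): TP / (TP + FP)."""
    total = true_calls + false_calls
    return true_calls / total if total else 0.0


# Hypothetical numbers (not from this study): both genomes carry the same
# number of false-positive indel calls, but very different true-positive counts.
FALSE_POSITIVE_INDELS = 100
proficient_rate = validation_rate(true_calls=10, false_calls=FALSE_POSITIVE_INDELS)
deficient_rate = validation_rate(true_calls=2000, false_calls=FALSE_POSITIVE_INDELS)

print(f"MMR-proficient-like genome: {proficient_rate:.1%} of indel calls validate")
print(f"MMR-deficient-like genome:  {deficient_rate:.1%} of indel calls validate")
```

The false-positive burden is held fixed on purpose, so that only the true-positive count drives the difference.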
Studies in model organisms and cell lines have shown that somatic mutations arising from MMR deficiency mostly involve indels affecting microsatellite sequences (di- to hexa-nucleotide repeats with a minimal length of six bases and at least two repeat units) and homopolymers (mononucleotide repeats with a minimal length of six bases) (Ellegren, 2004). Indeed, indels were more frequent than single-base-pair substitutions in all three MMR-deficient tumors (Figure 1A). Indels predominantly affected homopolymers (40-fold enrichment over that expected by chance) and, to a lesser extent, microsatellites (2.3-fold enrichment; Figure 1B, Figure 1—figure supplement 1). Substitutions were only slightly enriched in homopolymers and microsatellites (3- and 1.5-fold enrichment, respectively; Figure 1B). Mutations occurred as frequently in introns as in the rest of the genome but were clearly less frequent in exons (excluding 5′ and 3′ untranslated regions [UTRs]). This decrease was driven by indels, which were 91% less frequent in exons (Figure 1C,D). Correcting for the number, length, or base-pair composition of homopolymers in exons vs other regions weakened this effect but did not abolish it (Figure 1E, Figure 1—figure supplement 2). Since 92% of exonic indels resulted in frameshift mutations, which have a greater functional impact than substitutions (Montgomery et al., 2013), this suggests that exonic indels are subject to negative clonal selection during tumorigenesis.
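The repeat definitions above can be turned into a simple scanner. Homopolymers are single-base runs of at least six bases. Microsatellites are di- to hexa-nucleotide units repeated at least twice over at least six bases. The sketch below is a simplified, regex-based illustration of those definitions and is not the annotation pipeline of Reumers et al. (2011); the function name `find_repeats` is hypothetical. It ignores partial trailing units and skips units that are themselves repeats (e.g., `AA` or `ATAT`), so each repeat is reported with its shortest unit.

```python
import re


def find_repeats(seq: str, min_length: int = 6):
    """Return sorted (start, end, unit) tuples for homopolymers and
    di- to hexa-nucleotide microsatellites of at least min_length bases."""
    seq = seq.upper()
    hits = []
    # Homopolymers: one base repeated to a total length >= min_length.
    for m in re.finditer(r"([ACGT])\1{%d,}" % (min_length - 1), seq):
        hits.append((m.start(), m.end(), m.group(1)))
    # Microsatellites: a 2-6 base unit present in >= 2 consecutive copies.
    for k in range(2, 7):
        for m in re.finditer(r"([ACGT]{%d})\1+" % k, seq):
            unit = m.group(1)
            # Skip non-primitive units ('AA', 'ATAT', ...); a shorter unit covers them.
            if unit in (unit + unit)[1:-1]:
                continue
            if m.end() - m.start() >= min_length:
                hits.append((m.start(), m.end(), unit))
    return sorted(hits)


print(find_repeats("GGAAAAAAGCACACACAT"))
```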
There is extraordinary variation in the frequency and spectrum of somatic mutations across cancers, shedding light on the underlying mutational processes and disease etiology of these tumors (Wheeler and Whang, 2013). When assessing somatic substitutions in MMR-deficient tumors, we observed that 74% of all substitutions were transitions (i.e., purine-to-purine or pyrimidine-to-pyrimidine substitutions), similar to the pattern observed in the matched germ-line of these tumors (Figure 2A). This is surprising, since tumor genomes generally display patterns distinct from those found in the germ-line. Indeed, when we extended these analyses to other hypermutators, that is, UV-light-induced melanoma (Pleasance et al., 2010) and tobacco smoke-induced small cell lung cancer (SCLC) (Pleasance et al., 2010), as well as to BRCA1-deficient breast tumors (Nik-Zainal et al., 2012) and MMR-proficient EM tumors, patterns were clearly dissimilar from the matched germ-line (Figure 2A). On the other hand, de novo germ-line substitutions identified through whole-genome sequencing of parent–offspring trios (Campbell et al., 2012; Kong et al., 2012), common genetic variation as catalogued by the 1000 Genomes Project (1 KG) (1000 Genomes Project Consortium, 2012), and substitutions that occurred in the human lineage since the divergence of humans and chimpanzees all correlated strongly with the MMR-deficient tumor genome (Figure 2A). Given these remarkable parallels, we hypothesized that MMR-deficient genomes hypermutate in a way that mirrors the processes driving genetic variation at the population level, albeit somatically and on a shorter time scale.
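Transitions are purine-to-purine (A↔G) or pyrimidine-to-pyrimidine (C↔T) changes, and every other single-base change is a transversion. The sketch below is a minimal way to compute a transition fraction like the one reported above from a list of (reference, alternate) base pairs. The example list is made up.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True for A<->G or C<->T substitutions, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - set("ACGT"):
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def transition_fraction(substitutions) -> float:
    """Fraction of (ref, alt) substitutions that are transitions."""
    subs = list(substitutions)
    return sum(is_transition(ref, alt) for ref, alt in subs) / len(subs)


# Made-up substitution calls, for illustration only.
example = [("C", "T"), ("G", "A"), ("A", "G"), ("T", "C"),
           ("C", "A"), ("G", "T"), ("A", "T"), ("C", "T")]
print(f"transitions: {transition_fraction(example):.1%}")
```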
To further assess the similarities between MMR-deficient mutation patterns and germ-line genetic variability, we analyzed small-scale and large-scale context-dependent effects on substitution patterns. At the small-scale level, when we assessed the effect of flanking nucleotides on substitution frequencies, the patterns of all four sets of germ-line genetic variants were highly correlated with those of MMR-deficient tumors (average R² = 0.77) but less so with those of the four other cancer genomes (average R² = 0.45; Figure 2B,C), providing further support for our hypothesis. At the large-scale level, the number of intergenic substitutions per 1 Mb in germ-line genetic variability databases was similarly highly correlated with that in MMR-deficient genomes (average R² = 0.67), but not with that in other cancer genomes (average R² = 0.42; Figure 2D). This suggests that, on a large scale as well, substitutions are distributed comparably in MMR-deficient tumor genomes and in germ-line genomes. Moreover, nine genomic features have been linked with genetic variability at this large scale (Hodgkinson and Eyre-Walker, 2011). Each of these features correlated significantly with substitution frequencies in MMR-deficient tumors and germ-line genomes, and linear modeling revealed that six of them independently correlated with substitution rates in MMR-deficient tumors as well as with germ-line substitutions (Figure 2E). Overall, the types as well as the narrow and broad context-dependencies of substitutions thus appear to be largely shared between germ-line and MMR-deficient genomes, suggesting that a considerable fraction of human genetic diversity arises through mismatches escaping MMR.
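One common way to compare how flanking nucleotides shape substitution patterns is to count substitutions in 96 trinucleotide classes and compare the resulting profiles with a squared Pearson correlation. The 96 classes are six pyrimidine-centred substitution types times 16 flanking-base combinations. The sketch below assumes that scheme, because the text above does not specify exactly how contexts were binned. `context_key`, `profile`, and `r_squared` are hypothetical helpers, not the study's code, and the input mutations are made up.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 96 classes: reference base collapsed to C or T, 3 alternate bases, 16 flank pairs.
CHANNELS = [f"{left}[{ref}>{alt}]{right}"
            for ref in "CT" for alt in "ACGT" if alt != ref
            for left, right in product("ACGT", repeat=2)]


def context_key(left: str, ref: str, alt: str, right: str) -> str:
    """Name the trinucleotide class, reverse-complementing purine references."""
    if ref in "AG":
        left, ref, alt, right = (right.translate(COMPLEMENT), ref.translate(COMPLEMENT),
                                 alt.translate(COMPLEMENT), left.translate(COMPLEMENT))
    return f"{left}[{ref}>{alt}]{right}"


def profile(mutations) -> list:
    """Relative frequency of each class for (left, ref, alt, right) tuples."""
    counts = dict.fromkeys(CHANNELS, 0)
    for mutation in mutations:
        counts[context_key(*mutation)] += 1
    total = sum(counts.values())
    return [counts[channel] / total for channel in CHANNELS]


def r_squared(x, y) -> float:
    """Squared Pearson correlation between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Made-up mutations given as (5' base, reference, alternate, 3' base).
tumor = profile([("A", "C", "T", "G"), ("T", "C", "T", "G"), ("C", "G", "A", "T")])
germline = profile([("A", "C", "T", "G"), ("C", "G", "A", "T"), ("G", "T", "C", "A")])
print(f"R^2 = {r_squared(tumor, germline):.2f}")
```

Collapsing to a pyrimidine reference treats a G>A change and its reverse complement C>T as the same event, because sequencing cannot tell which strand originally carried the change.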
Since MMR-proficient tumors carried 50 times fewer substitutions and displayed more disparate substitution patterns than MMR-deficient tumors, the observed correlations can almost exclusively be attributed to the MMR-deficient phenotype of these tumors. As such, these correlations also provide novel insights into the functioning of the MMR system. First, replication timing correlated with transitions but not with transversions in all three MMR-deficient tumors (Figure 2F). This contrasts with the increase in late S phase transversions observed in all other genomes studied here (Figure 2F), as well as in lymphoblastoid cell lines (Koren et al., 2012). That this increase occurs in MMR-proficient but not MMR-deficient cells suggests reduced DNA repair fidelity in late S phase, leading to more transversions. Potential causes include decreased MMR activity in late S phase, or a longer window of time available for repairing early vs late transversions in MMR-proficient cells (Hombauer et al., 2011). In contrast, DNA repair fidelity in MMR-deficient cells is invariably low and therefore not affected by replication timing. Second, substitution frequency was positively associated with simple repeat content. Indeed, substitutions were 1.6-fold more frequent at bases immediately flanking simple repeats, with a threefold increase next to homopolymers and a 1.3-fold increase next to microsatellites (Figure 2G). The vast majority of these substitutions converted the base flanking the repeat into the base constituting the repeat (Figure 2G). They are thus probably the result of polymerase slippage events, following a mechanism akin to the previously described bacterial dislocation mutagenesis (Kunkel and Soni, 1988). Third, G:C>A:T transitions in CpG sites strongly depend on CpG content but are inversely correlated with the fraction of CpG islands (Figure 2E). Spontaneous, replication-independent deamination of methyl-C to T underlies such transitions.
Here, the much larger increase in CG>TG transitions in MMR-deficient compared with MMR-proficient tumors (3449 vs 145) demonstrates that replication-independent MMR, recently described at the molecular level (Shell et al., 2007; Pena-Diaz et al., 2012), is also involved in deamination repair in vivo (Chen et al., 2014). Finally, overall substitution frequencies correlated inversely with CpG island content. Indeed, irrespective of dinucleotide context, bases outside CpG islands were nearly twice as likely to be mutated as those inside CpG islands (Figure 2H). As CpG islands are generally unmethylated, DNA methylation thus appears to contribute to the mutagenic process. Possible explanations include the polymerase stalling that DNA methylation may induce (Song et al., 2012) and the repair of spontaneously deaminated methyl-Cs, which is error-prone and thus mutagenic in its own right (Chen et al., 2014).
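The observation above, that substitutions immediately flanking simple repeats mostly convert the flanking base into the repeat base, can be checked per variant. The sketch below is a minimal check for the homopolymer case only, assuming the six-base minimum run length used in this study; microsatellites are not handled. `run_length` and `extends_adjacent_homopolymer` are hypothetical helpers.

```python
def run_length(seq: str, start: int, step: int, base: str) -> int:
    """Count consecutive copies of `base` from index `start`, moving by `step`."""
    n = 0
    i = start
    while 0 <= i < len(seq) and seq[i] == base:
        n += 1
        i += step
    return n


def extends_adjacent_homopolymer(seq: str, pos: int, alt: str, min_run: int = 6) -> bool:
    """True if replacing seq[pos] by `alt` copies the base of a homopolymer
    (>= min_run bases) that ends just left of, or starts just right of, pos."""
    seq, alt = seq.upper(), alt.upper()
    if seq[pos] == alt:
        raise ValueError("reference and alternate base are identical")
    left_run = run_length(seq, pos - 1, -1, alt)
    right_run = run_length(seq, pos + 1, +1, alt)
    return max(left_run, right_run) >= min_run


reference = "CGAAAAAAT"  # A6 homopolymer at 0-based positions 2-7
print(extends_adjacent_homopolymer(reference, 1, "A"))  # G>A copies the run base
print(extends_adjacent_homopolymer(reference, 1, "T"))  # G>T does not
```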
We also evaluated somatic indel patterns in MMR-deficient tumors. As expected, since the majority of indels were located in homopolymers, indel frequency correlated strongly with simple repeat content (Figure 3A). Indels were also predominantly 1 or 2 bp in length (Figure 3B). Although only a minority of homopolymers consist of C or G bases (7%), an even smaller fraction of indels affected C:G homopolymers (1.9%; Figure 3C), suggesting that C:G homopolymers are less likely to accumulate indels. As observed in other MMR-deficient tumors and in MMR-deficient Caenorhabditis elegans (Denver et al., 2005; Kim et al., 2013), deletions were markedly more frequent than insertions (81% vs 19%), confirming that DNA polymerases are more prone to remove than to add a base during DNA synthesis.
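Indel type and length can be read directly from left-aligned, VCF-style REF/ALT alleles that share their first base. The sketch below assumes that representation, since the input format used in this study is not stated. It uses made-up calls to tally deletions vs insertions and indel sizes.

```python
from collections import Counter


def classify_indel(ref: str, alt: str):
    """Return ('deletion' | 'insertion', length) for VCF-style indel alleles."""
    if len(ref) == len(alt):
        raise ValueError(f"not an indel: {ref}>{alt}")
    kind = "deletion" if len(ref) > len(alt) else "insertion"
    return kind, abs(len(ref) - len(alt))


# Made-up calls; the first base is the shared anchor base, as in VCF.
calls = [("TA", "T"), ("CAA", "C"), ("G", "GA"), ("TA", "T"), ("AT", "A")]
classified = [classify_indel(ref, alt) for ref, alt in calls]
kinds = Counter(kind for kind, _ in classified)
sizes = Counter(size for _, size in classified)

deletion_share = kinds["deletion"] / len(classified)
print(f"deletions: {deletion_share:.0%}, sizes: {dict(sorted(sizes.items()))}")
```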
Next, we selected 13 additional MMR-deficient tumors and four MMR-proficient tumors from different tissues (i.e., endometrium, colon, and ovary). Six of these were low-passage primary tumor cultures, which we preferred over cell lines because established cell lines, owing to their hypermutator phenotype, are no longer representative of the original tumor (Figure 4—source data 2). Exome sequencing of tumor and matched germ-line DNA at an average coverage of 44x revealed that each MMR-deficient tumor contained ∼2015 somatic events vs 39 for MMR-proficient tumors (52-fold increase; Figure 4A, Figure 4—source data 1, Figure 4—source data 2). Validation rates for substitutions and indels were 87% and 86%, respectively. Clustering of all 13 MMR-deficient tumors by the genes affected by a somatic substitution or indel in coding regions revealed no obvious subgroups by cancer of origin or between primary tumors and cell cultures (Figure 4—figure supplement 1). Presumably because of negative clonal selection and differences in homopolymer content between exons and other genomic regions, exonic substitutions outnumbered indels (Figure 4A, Figure 4—figure supplement 2), similar to what we observed in the MMR-deficient whole genomes (Figure 1C,D). Only a minority of these indels affected microsatellites, confirming that homopolymers were most frequently affected by indels.
Remarkably, 1.6% of homopolymers were recurrently affected by an indel in the 16 MMR-deficient tumors that underwent whole-genome or exome sequencing (i.e., 2244 of 29,663 homopolymers were affected at least once, and 477 at least twice; Figure 4—figure supplement 3). Furthermore, 34 and 10 homopolymers were affected in ≥6 and ≥8 tumors, respectively (Figure 4—source data 3). In contrast, only 55 substitutions were recurrent, and only three of these were found in ≥3 tumors: two substitutions affecting KRAS codons 12 and 13, found in three and four tumors, respectively (Tie et al., 2011), and a substitution in ZNF648 that affected three tumors. When comparing the homopolymer content of coding regions vs UTRs, long homopolymers (>10 bp) were more frequent in UTRs than in coding regions (Figure 4B). Because these long homopolymers were also more frequently affected (Figure 4C), the overall indel rate was lower in coding regions than in UTRs (Figure 4D). As a consequence, recurrent indels also occurred more frequently in UTRs than in coding regions (31,438 vs 1337; Figure 4—source data 3). Remarkably, however, recurrent indels were observed more frequently than expected from the indel frequency in short, but not in long, homopolymers (Figure 4E, Figure 4—figure supplement 4). This suggests that features other than homopolymer length underlie indel recurrence rates. Positive clonal selection of indels affecting short homopolymers, which predominate in coding regions, is one possible explanation. Very similar results were obtained when the analysis was repeated on the 13 whole exomes only, indicating that exonic mutations identified from whole-genome sequences did not introduce any bias.
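Recurrence here is the number of tumors in which a given homopolymer carries an indel. The sketch below shows the counting, assuming each tumor is represented by the set of homopolymer identifiers in which it has indels. The identifiers, the three-tumor panel, and the total homopolymer count are all hypothetical.

```python
from collections import Counter


def recurrence(affected_by_tumor: dict) -> Counter:
    """Number of tumors with an indel in each homopolymer (each tumor counted once)."""
    counts = Counter()
    for affected in affected_by_tumor.values():
        counts.update(set(affected))
    return counts


def n_affected_at_least(counts: Counter, min_tumors: int) -> int:
    """How many homopolymers are affected in at least `min_tumors` tumors."""
    return sum(1 for n in counts.values() if n >= min_tumors)


# Hypothetical input: homopolymer IDs with an indel, per tumor.
tumors = {
    "tumor_1": {"hp1", "hp2", "hp3"},
    "tumor_2": {"hp1", "hp4"},
    "tumor_3": {"hp1", "hp2"},
}
counts = recurrence(tumors)
total_homopolymers = 10  # hypothetical number of homopolymers assessed
for k in (1, 2, 3):
    n = n_affected_at_least(counts, k)
    print(f">= {k} tumors: {n} homopolymers ({n / total_homopolymers:.0%})")
```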
The extended Bethesda panel, which consists of eight microsatellite and two homopolymer markers, is currently used to assess MSI diagnostically (Pinol et al., 2005). This panel was historically compiled from a limited set of markers known to be variable. Because of their length and variability, these markers are notoriously difficult to analyze and interpret, and as a consequence the Bethesda panel has reduced sensitivity for detecting MSI. In an effort to improve MSI testing, we randomly selected 59 recurrent indels affecting ≥6 of the 16 tumors; 50 markers were in 5′ or 3′ UTRs and 9 were in coding regions (Figure 5—source data 1). Furthermore, each of the markers was detected in both MMR-deficient EM and CRC. To facilitate high-throughput genotyping, the maximal length of affected homopolymers was restricted to 12 bp. First, we applied these 59 markers to a discovery set of 236 EM tumors for which MMR immunohistochemistry (IHC) data were available. This allowed us to identify three positive markers as the threshold with the best Matthews correlation coefficient for detecting IHC-defined MMR deficiency, and thus to define MSI (Figure 5A,B). At this threshold, our markers detected 40 of 41 tumors that were MMR-deficient on IHC (sensitivity ∼98%), while only 1 of 184 tumors with normal MMR staining on IHC was identified as MSI (specificity >99%). Notably, the latter patient had a family history of cancer within the Lynch spectrum, suggesting that the tumor indeed exhibited MSI. Second, after optimizing the marker threshold, we performed a head-to-head comparison against the Bethesda panel in 114 independent EM tumors as a validation set. For discordant cases, we assessed MMR deficiency by IHC to determine which of the two panels was correct. Every tumor classified as MSI by Bethesda (>2 markers positive) was also MSI with the 59-marker panel (Figure 5C). However, 12 tumors were positive with the 59-marker panel but negative with Bethesda.
IHC on the nine discordant tumors for which a paraffin block was available confirmed that each of them was MMR-deficient, for either MLH1 or MSH2, indicating that the 59-marker panel has higher sensitivity than Bethesda.
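Choosing the number of positive markers needed to call MSI amounts to scanning candidate thresholds and keeping the one with the highest Matthews correlation coefficient against IHC status. The sketch below does this with made-up marker counts. The study does not state how ties were broken, so keeping the lowest best-scoring threshold is an assumption here.

```python
import math


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def confusion(positive_markers, ihc_deficient, threshold: int):
    """Confusion counts when MSI is called for >= threshold positive markers."""
    tp = fp = tn = fn = 0
    for n_positive, deficient in zip(positive_markers, ihc_deficient):
        called_msi = n_positive >= threshold
        if called_msi and deficient:
            tp += 1
        elif called_msi:
            fp += 1
        elif deficient:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_threshold(positive_markers, ihc_deficient, n_markers: int = 59):
    """Threshold (1..n_markers) with the highest MCC; the lowest wins ties."""
    scored = [(mcc(*confusion(positive_markers, ihc_deficient, t)), -t)
              for t in range(1, n_markers + 1)]
    score, neg_t = max(scored)
    return -neg_t, score


# Made-up data: positive markers per tumor and IHC-based MMR deficiency.
markers = [0, 1, 0, 2, 5, 7, 3, 9, 1, 0]
deficient = [False, False, False, False, True, True, True, True, False, False]
threshold, score = best_threshold(markers, deficient)
tp, fp, tn, fn = confusion(markers, deficient, threshold)
print(f"threshold={threshold}, MCC={score:.2f}, "
      f"sensitivity={tp / (tp + fn):.0%}, specificity={tn / (tn + fp):.0%}")
```

MCC is used rather than accuracy because MMR-deficient tumors are a minority class. A threshold that calls nothing MSI would still look accurate but scores an MCC of zero.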
Likewise, we assessed MSI in 126 stage II or III CRC tumors. Each of the 28 MSI tumors on Bethesda was also positive with our 59-marker panel. In contrast, one tumor was MSI-positive in the 59-marker panel but not in the Bethesda panel (Figure 5D). This tumor contained a V600E BRAF mutation and was MLH1 hypermethylated, indicating that it was MMR-deficient and that our panel was also more sensitive for CRC (Deng et al., 2004). Finally, we also assessed whether our 59-marker panel can detect MSI in other cancer types. In a limited set of ovarian tumors and leukemias, we indeed correctly identified MSI in each of the samples tested (Figure 5—source data 2).
Since we observed clear signs of clonal indel selection in MMR-deficient tumors, we assessed whether specific pathways were enriched for indels. We focused on frameshift indels in exons and at exon/intron boundaries, as they represent loss-of-function mutations (Ham et al., 2006) and thus have a less ambiguous functional impact than indels in UTRs. On average, each MMR-deficient tumor contained 472 such indels, 59 of which were recurrent. Ingenuity Pathway Analysis (IPA) of all genes affected by a somatic indel, excluding the core MMR genes, ranked ‘Role of BRCA1 in DNA damage response’ as the top enriched pathway. IPA of genes affected by recurrent indels moreover ranked the ‘Double-strand break repair by homologous recombination’ pathway (DSBR by HR) first (Table 2). We also performed pathway analyses using the more advanced GenomeMuSiC, which takes background mutation rates into account and assigns weights depending on the number of tumors and genes affected in a given pathway. GenomeMuSiC analyses based on the independently assembled Reactome and BioCarta pathway databases ranked the ‘ATR/BRCA pathway’ and the ‘DNA repair’ pathway first, respectively, with the more specific ‘Homologous recombination repair’ pathway ranking third in the latter (Table 2). In an expert-curated DNA repair database (DNARepairDB), ‘Homologous recombination’ was the only DNA repair pathway significantly enriched for indels. Since the pathway databases differed in the genes they included, we finally compiled a literature-based set of genes with proven involvement in DSBR by HR. This allowed us to estimate more accurately that each MMR-deficient tumor contained on average 3.3 ± 0.4 indels in the ‘DSBR by HR’ pathway (Table 2, Table 2—source data 1). Notably, none of the top-ranking pathways in any of the databases contained significantly more homopolymers in their genes than expected.
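A basic form of pathway over-representation analysis asks how surprising it is to see k mutated genes in a pathway of K genes when n of N background genes are mutated. It answers this with the upper tail of the hypergeometric distribution, which is equivalent to a one-sided Fisher's exact test. The sketch below shows only this generic test. It does not reproduce IPA or GenomeMuSiC (the latter additionally models background mutation rates), and all numbers in it are hypothetical.

```python
from math import comb


def enrichment_p(k: int, pathway_size: int, n_mutated: int, n_background: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population=n_background,
    successes=pathway_size, draws=n_mutated)."""
    upper = min(pathway_size, n_mutated)
    tail = sum(comb(pathway_size, i) * comb(n_background - pathway_size, n_mutated - i)
               for i in range(k, upper + 1))
    return tail / comb(n_background, n_mutated)


# Hypothetical: 20,000 background genes, 500 mutated genes, and a 30-gene
# pathway containing 5 mutated genes (about 0.75 expected by chance).
p = enrichment_p(k=5, pathway_size=30, n_mutated=500, n_background=20000)
print(f"p = {p:.2e}")
```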
In an effort to replicate these findings, we analyzed mutation data from 27 CRC and 65 EM tumors with MSI sequenced by The Cancer Genome Atlas (TCGA, 2012; Kandoth et al., 2013). Although most of these tumors were sequenced at low coverage, we identified 2183 and 3138 mutated genes in the CRC and EM tumor data sets, respectively. IPA confirmed that ‘Role of BRCA1 in DNA damage response’ was again among the top enriched pathways in each data set. The corresponding p-values were 9.06E−3 and 2.97E−4, although only the latter survived multiple testing correction (p = 0.022; Table 2—source data 1). As raw data sets were not accessible, the more sensitive GenomeMuSiC could not be used.
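The text does not state which multiple-testing correction was applied. As one common choice, the sketch below shows a Benjamini–Hochberg (false discovery rate) adjustment. This is an illustrative assumption, and the p-values are made up rather than taken from the TCGA analysis.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
print([round(q, 4) for q in benjamini_hochberg(raw)])
```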
Homozygous mutations affecting genes in the DSBR by HR pathway cause DSB repair defects reminiscent of BRCA1 or BRCA2 loss, a phenotypic feature dubbed BRCAness (McCabe et al., 2006). Having established that MMR-deficient tumors are enriched in heterozygous frameshift mutations in the DSBR by HR pathway, we investigated the functional impact of these events. First, we confirmed that indels affecting the DSBR by HR pathway were located in the major tumor subclone (Table 2, Table 2—source data 1). Then, we analyzed HR in seven MMR-deficient and four MMR-proficient patient-derived primary tumor cultures. We exposed these cultures to the PARP inhibitor olaparib, which induces DSBs during DNA replication by inhibiting single-strand break repair, and to mitomycin C, which induces DSBs through DNA cross-links and replication fork collapse (Bunting et al., 2012). We then quantified the proportion of cells with γH2AX- and RAD51-positive foci as measures of induced DSBs and ongoing HR, respectively. Exposure to olaparib or mitomycin C increased γH2AX foci in all tumor cultures, regardless of MMR status. In contrast, although RAD51 foci formation was evident in both MMR-deficient and MMR-proficient cultures, the increase was far less pronounced in MMR-deficient cultures (Figure 6A,B) for both olaparib (p = 0.021) and mitomycin C (p = 0.006) exposure. The reduction in RAD51 foci could not be ascribed to differences in RAD51 protein expression or cell cycle distribution between MMR-deficient and -proficient cells, as these were similar in both sets of cultures under both treated and untreated conditions (Figure 6—figure supplements 1–3).
Since RAD51 foci are completely absent upon PARP inhibition in cells with homozygous loss of BRCA1 but are not affected in heterozygous mutation carriers (Farmer et al., 2005), these ex vivo data suggest that the accumulation of indels in MMR-deficient tumors gradually impairs the DSBR by HR pathway to a level intermediate between that of cells heterozygous- and homozygous-deficient for BRCA1.
As MMR-deficient tumors are compromised in their DSBR by HR activity, we wondered whether these tumors, like BRCA1-deficient tumors (Farmer et al., 2005), are more sensitive to agents that induce DSBs. First, since PARP inhibitors are already used in clinical practice, all seven MMR-deficient and four MMR-proficient cultures were exposed to increasing doses of olaparib. MMR-deficient cultures exhibited a dose-dependent decrease in proliferation upon exposure to olaparib, whereas MMR-proficient cultures were affected only at higher concentrations. Likewise, cytotoxicity assays revealed a dose-dependent sensitivity of MMR-deficient cells to olaparib that was more pronounced than in MMR-proficient cells (50% growth inhibition [GI50] was reached at 26 µM vs 129 µM, respectively; p = 0.0064; Figure 7A,B, Figure 7—figure supplement 1). Other DSB-inducing agents, such as mitomycin C or ionizing radiation, similarly proved more detrimental to MMR-deficient than to MMR-proficient cells (Figure 7B). In contrast, the cytotoxicity of other chemotherapeutic compounds, such as paclitaxel, was comparable between both groups.
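GI50 values such as those above are read off a dose–response curve as the concentration that gives 50% growth inhibition. The text does not describe how the curves were fitted. The sketch below uses simple linear interpolation on a log10 dose scale between the two measured doses that bracket 50%, with made-up measurements. A fitted four-parameter logistic model is a common, more robust alternative.

```python
import math


def gi50(doses, inhibition_percent):
    """Dose at the first upward crossing of 50% growth inhibition, interpolated
    on log10(dose). Returns None if no measured interval crosses 50%."""
    points = sorted(zip(doses, inhibition_percent))
    for (d_low, g_low), (d_high, g_high) in zip(points, points[1:]):
        if g_low < 50 <= g_high:
            fraction = (50 - g_low) / (g_high - g_low)
            log_dose = math.log10(d_low) + fraction * (math.log10(d_high) - math.log10(d_low))
            return 10 ** log_dose
    return None


# Made-up olaparib measurements (µM, % growth inhibition) for one culture.
doses_um = [1, 10, 100, 1000]
inhibition = [10, 30, 70, 95]
print(f"GI50 ≈ {gi50(doses_um, inhibition):.1f} µM")
```

Interpolating on a log scale reflects the usual log-spaced dosing. Linear-scale interpolation would bias the estimate toward the higher dose.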
Finally, to measure the level of HR deficiency in MMR-deficient tumors more accurately, we assessed the level of BRCA1, BRCA2, or ATR knock-down needed to achieve an olaparib sensitivity similar to that observed in MMR-deficient cells, that is, a GI50 of 26 µM. BRCA1, BRCA2, or ATR expression was dose-dependently reduced using siRNAs in the MMR- and HR-proficient cell line MCF7. Growth inhibition of 50% was reached in MCF7 cells when applying 5.9 nM ATR, 0.88 nM BRCA1, or 0.41 nM BRCA2 siRNA, corresponding to reductions in expression of 69.5 ± 1.1%, 76.1 ± 4.4%, and 80.0 ± 2.4%, respectively (Figure 7C). These data thus suggest that the loss of DSBR by HR activity in MMR-deficient tumors corresponds to a loss of about 75–80% of BRCA1 or BRCA2 expression.
Here, we surveyed whole genomes of MMR-deficient tumors to provide a comprehensive picture of the mutations associated with human MMR deficiency. With respect to somatic substitutions, we observed that the majority were transitions rather than transversions, and that adjacent nucleotides and various genomic features had an important context-dependent effect on which nucleotides were affected. Remarkably, the observed substitution pattern, in particular how it was shaped by small- and large-scale contexts, was very similar to that in the germ-line at different time scales. This held for germ-line substitutions as they currently arise (de novo), as they have accumulated in the human population, and as they served as a substrate for human–chimpanzee divergence (Hodgkinson and Eyre-Walker, 2011). Our observations thus suggest that, as in bacterial populations and other lower organisms (Saint-Ruf and Matic, 2006), incomplete mismatch repair in humans contributes significantly to genetic variability and probably also to natural selection through genetic adaptation. Additionally, our data provide fundamental insights into the function of the MMR machinery. For instance, we observed a higher number of substitutions in methylated CpG sequences. This implicates MMR in the repair of deaminated methylated cytosines and demonstrates that MMR disconnected from the replication fork is also critical for maintaining genomic integrity.
At the whole-genome level, ∼80% of somatic mutations were indels. Although indel detection using high-throughput sequencing is burdened with high false-positive rates, 88% of the indels identified here were confirmed using orthogonal technologies. Regarding the clinical relevance of indel mutation patterns for diagnosing MSI, we observed that indels specifically affected homopolymer stretches. This is relevant because the extended Bethesda panel consists of eight microsatellite and only two homopolymer markers, which may explain its limited sensitivity relative to IHC (∼75% for both EM and CRC tumors [Hampel et al., 2005, 2006, 2008]). Our 59-marker panel, consisting only of homopolymer markers, was clearly more sensitive than Bethesda, yielding a sensitivity of 87% relative to IHC. This was not due to genotyping more markers than Bethesda, as restricting our panel to 10 markers still resulted in a sensitivity of 85% (data not shown). Our panel was based on recurrent mutations present in both CRC and EM, and 50 of the 59 markers were located in UTRs, which are less likely to drive clonal selection and thus to represent tissue-specific events. It could therefore be used to detect MSI in cancers of various tissues. Finally, since all markers were located in homopolymers ≤12 bp in length, they are, in contrast to the 25- or 26-bp markers of Bethesda, compatible with various low- to high-throughput genotyping technologies, greatly facilitating their clinical adoption. For instance, we were able to multiplex all 59 markers in just five PCR amplification reactions compatible with Sequenom MassArray genotyping.
Pathway analyses of all genes affected by exonic indels further revealed that the DSBR by HR pathway was enriched for somatic indels. Mutations in genes involved in this pathway, such as MRE11A or RAD50, have previously been reported in MMR-deficient tumors, but these studies focused on specific mutations in individual genes rather than on pathways, and could therefore establish only that a fraction of MMR-deficient tumors carried mutations in these genes (Miquel et al., 2007). In contrast, our study showed that every MMR-deficient tumor carried on average 3.3 somatic indels in the DSBR by HR pathway. It is well established that cells deficient in BRCA1, BRCA2, Fanconi anemia, or other HR-related genes are hypersensitive to DSB inducers (Murai et al., 2012); for instance, synthetic lethality through PARP inhibition in BRCA1- or BRCA2-deficient tumors is already approved as therapy in breast and ovarian cancer (Metzger-Filho et al., 2012). By contrast, data demonstrating sensitivity of MMR-deficient cells to DSB inducers have not been conclusive (Takahashi et al., 2011; Vilar et al., 2011; Park et al., 2013). For instance, although some reports highlight the sensitivity of MSH3-deficient cell lines to DSB inducers, this appeared to occur through a non-canonical MMR pathway, as MLH1 was not involved in this process (Takahashi et al., 2011; Park et al., 2013). Furthermore, the only clinical study set up so far to explore the efficacy of PARP inhibitors as single-agent therapy in previously treated patients with metastatic CRC stratified by MSI status was delayed due to patient accrual issues.
Our hypothesis-free finding that DSBR by HR is the top pathway affected by heterozygous loss-of-function mutations in MMR-deficient tumors, both in our own data set and in TCGA, also suggests that mutations in DSBR by HR genes converge in an oligogenic model, in which the number of indels dose-dependently decreases DSBR by HR activity, thereby rendering tumors gradually more sensitive to DSB inducers. Because MMR-deficient tumors thus carry a double hit, affecting both the MMR and the HR pathway, our ex vivo culture experiments are difficult to compare with experiments relying on genotype-matched cells that carry a single hit in either the MMR or the HR pathway. In addition, because MMR and DSBR by HR pathway activities are not characterized in a clinical setting, it is difficult to relate our data to clinical studies assessing the outcome of therapeutics such as cisplatin or 5-fluorouracil, which may have opposing activities in MMR- and HR-deficient tumors.
Clinical studies are therefore needed to assess whether DSB inducers, such as PARP inhibitors, are also effective in MSI tumors. In particular, because on average 3.3 heterozygous loss-of-function mutations only partially inactivate the DSB repair by HR pathway (∼80% inactivation), it remains to be seen whether clinically relevant benefits are achievable in MSI tumors, as they are in BRCA1- or BRCA2-deficient tumors, in which the HR pathway is completely inactivated. Possibly, only MMR-deficient tumors carrying large numbers of indels (≥5) in the DSBR by HR pathway will show a significant response. Nevertheless, there is a great clinical need for novel treatment options in MSI tumors. Indeed, although stage II or III CRC tumors with MSI have a modestly improved prognosis, MSI tumors in the advanced setting are generally associated with more frequent peritoneal metastasis and worse overall survival, independent of the chemotherapy regimen (Smith et al., 2013; Yoon et al., 2013). Our observations therefore warrant new clinical studies assessing the therapeutic efficacy of DSB inducers in MMR-deficient tumors.
To assess MLH1, MSH2, and MSH6 deficiency, immunohistochemistry was performed using monoclonal antibodies against MLH1 (clone ES05; DAKO, Heverlee, Belgium), MSH2 (clone G219-1129; BD Pharmingen, Erembodegem, Belgium), and MSH6 (clone EP49; Epitomics, Burlingame, USA). Tumors showing absence of nuclear staining in tumor cells together with normal staining in the surrounding normal tissue were considered MMR-deficient. Methylation of the MLH1 promoter was determined using the SALSA MS-MLPA kit (MRC-Holland, Amsterdam, The Netherlands). PCR fragments covering the Deng C and Deng D regions were separated by capillary gel electrophoresis (ABI 3130; Applied Biosystems, Ghent, Belgium) and quantified using GeneMarker (v1.91) software (SoftGenetics). MSI status was determined with the extended Bethesda panel using capillary gel electrophoresis, as described previously (Dietmaier et al., 1997; Boland et al., 1998).
We selected 17 endometrial, three colorectal, and two ovarian tumor–normal pairs for whole-genome or whole-exome sequencing. All samples were chemotherapy-naive. Tumor DNA was derived from fresh-frozen primary tumors, and matched normal DNA for these 22 samples was extracted from peripheral white blood cells.
Five tumor–normal pairs were selected for whole-genome sequencing. Paired-end sequencing was performed either by the Complete Genomics service (CG, Mountain View, California, USA), as described in Drmanac et al. (2010), or on an Illumina HiSeq2000. For CG sequencing, reads were initially mapped to the reference genome (hg18) using Complete Genomics' CGAtools. Between 207 and 338 Gb of sequence data were obtained, resulting in a haploid coverage of 73× to 119×. Approximately 2.7 × 10⁹ bases were called in each genome, representing ∼95% of the total genome and ∼97% of the exome. Substitutions and indels were called by the CGAtools variant caller. On average, 3,132,715 substitutions and 357,153 indels were detected per genome. The CGAtools (v126.96.36.199) calldiff method was used to detect somatic mutations in the tumor–normal pairs. For Illumina sequencing, 2 × 100 bp paired-end reads were generated, yielding 25–30× coverage per sample. Raw reads were aligned to the reference genome (hg19) with the Burrows-Wheeler Aligner (BWA) (Li and Durbin, 2010). PCR duplicates were removed with Picard MarkDuplicates (v1.32). Base quality recalibration, local realignment around indels, and single-nucleotide variant calling were performed with the Genome Analysis Toolkit (GATK v1.0.4487) (McKenna et al., 2010). Small indels were detected with Dindel (v1.01) (Albers et al., 2011). Only substitutions and indels with a quality score >Q30 were considered. On average, 3,977,086 substitutions and 837,915 indels were detected per genome. Somatic mutations were identified with the intersectBed command of BEDTools (v2.12.0) (Quinlan and Hall, 2010). Raw data for all whole genomes are available under restricted access in the European Genome-Phenome Archive (EGA) under accession number EGAS00001000182.
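For the Illumina genomes, the somatic-calling logic amounts to a quality filter followed by removal of any variant also seen in the matched normal. The Python sketch below illustrates that logic under stated assumptions: variants are assumed to be already parsed into simple records, `Variant` and `somatic_calls` are hypothetical names, and, unlike BEDTools `intersectBed -v` (which reports tumor intervals with no overlap in the normal), the sketch matches on exact position and alleles for simplicity.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not a real VCF parser)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float


def somatic_calls(
    tumor: Iterable[Variant],
    normal: Iterable[Variant],
    min_qual: float = 30.0,
) -> List[Variant]:
    """Keep tumor variants with quality > min_qual that are absent from the normal.

    Simplification: variants are matched on (chrom, pos, ref, alt) rather than
    on interval overlap, as intersectBed -v would do.
    """
    normal_keys = {(v.chrom, v.pos, v.ref, v.alt) for v in normal}
    return [
        v
        for v in tumor
        if v.qual > min_qual and (v.chrom, v.pos, v.ref, v.alt) not in normal_keys
    ]


if __name__ == "__main__":
    tumor = [
        Variant("chr1", 100, "A", "G", 50.0),   # tumor-only, high quality: kept
        Variant("chr1", 200, "C", "T", 45.0),   # also in the normal: germ-line
        Variant("chr2", 300, "AT", "A", 20.0),  # below Q30: dropped
    ]
    normal = [Variant("chr1", 200, "C", "T", 60.0)]
    print(somatic_calls(tumor, normal))
```

Matching on exact alleles is stricter than interval overlap; an overlap-based version would also discard tumor indels that merely touch a germ-line indel in the normal.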
Sequence data were annotated using ANNOVAR (v2013Jun21) and the UCSC RefGene annotation track. Germ-line substitutions and indels were removed from the list of somatic mutations using the following publicly available datasets: (i) common SNPs in dbSNP (v132) with a minor allele frequency >1%, (ii) substitutions identified in the November 2010 release of the 1000 Genomes Project, (iii) the Axiom Genotype Data Set containing common SNPs from 1261 HapMap3 individuals in 11 populations, and (iv) variant data identified in 46 HapMap individuals (CG diversity panel). Somatic mutations were validated by Sequenom MassARRAY genotyping, as previously described (Reumers et al., 2011). Details of the validation experiments are shown in Figure 1—source data 3. For CG genomes, we additionally applied a quality-score method that enriches for true somatic mutations by defining, from the Sequenom validation data, a threshold that separates false-positive from true-positive variants; this increased the validation rate for substitutions from 93.5%, 71.4%, and 55.6% to 97.7%, 100%, and 73.3% for MMR−1, MMR+1, and MMR+2, respectively. Detailed data on all somatic mutations are given in Figure 1—source data 1 and Figure 1—source data 2. The copy number status of the sequenced tumors was determined with Illumina CytoSNP-12 chips and analyzed using the ASCAT algorithm (Van Loo et al., 2010); the copy number status of the five whole genomes is shown in Figure 1—figure supplement 3.
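The germ-line filter can be read as a simple exclusion rule: drop a candidate if it is a common dbSNP variant (minor allele frequency above 1%) or if it appears in any of the other germ-line panels. A minimal Python sketch, assuming the databases have already been loaded into in-memory lookups keyed by the same hypothetical `Candidate` tuple; `remove_germline` is an illustrative name, and ANNOVAR performs the equivalent annotation in practice.

```python
from typing import Dict, List, NamedTuple, Set


class Candidate(NamedTuple):
    """Hypothetical minimal key for a candidate somatic variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


def remove_germline(
    candidates: List[Candidate],
    dbsnp_maf: Dict[Candidate, float],
    other_germline_sets: List[Set[Candidate]],
    maf_cutoff: float = 0.01,
) -> List[Candidate]:
    """Drop candidates that look like germ-line polymorphisms."""
    kept = []
    for c in candidates:
        # (i) common dbSNP variants: minor allele frequency above the cutoff (1%)
        if dbsnp_maf.get(c, 0.0) > maf_cutoff:
            continue
        # (ii)-(iv) any hit in the other germ-line panels
        # (e.g. 1000 Genomes, Axiom HapMap3 data, CG diversity panel)
        if any(c in panel for panel in other_germline_sets):
            continue
        kept.append(c)
    return kept
```

Rare dbSNP entries (MAF ≤1%) are deliberately retained, because recurrent somatic mutations can also appear in dbSNP at low frequency.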
The genome was annotated into the following functional genomic regions: (coding) exonic regions (1.12%), intronic regions (34.01%), 3′ untranslated regions (3′UTR, 0.78%), 5′ untranslated regions (5′UTR, 0.14%), noncoding RNA (ncRNA, 2.81%), upstream genic regions (defined as 1 kb before the start of the gene, 0.58%), downstream genic regions (defined as 1 kb after the end of the gene, 0.58%), and intergenic regions (59.98%).
Overall mutation frequencies were defined as the number of somatic mutations per base (mpb) in a given genomic region. To assess negative selection in the exome, we checked whether (i) the mutation frequency was lower in the exome than in the whole genome, and (ii) the frequency of somatic mutations was more prominently decreased in the exome. Because homopolymers in the exome differ from those in the rest of the genome in number, base composition, and length, we corrected indel frequencies for these confounding factors. For each genomic location $t$ (exonic, 5′UTR, 3′UTR, intronic, intergenic, or genomic), each homopolymer composition (AT or CG), and each homopolymer length $l$ (6, 7, 8, … bp), we calculated the frequency of affected homopolymers, where $\mathrm{ATaff}_{t,l}$ is the number of affected AT homopolymers and $\mathrm{ATn}_{t,l}$ the total number of AT homopolymers:

$$\mathrm{ATFreq}_{t,l} = \frac{\mathrm{ATaff}_{t,l}}{\mathrm{ATn}_{t,l}}$$

Next, we calculated the relative increase of this frequency over the frequency observed at the genome-wide level:

$$\mathrm{ATrFreq}_{t,l} = \frac{\mathrm{ATFreq}_{t,l}}{\mathrm{ATFreq}_{\mathrm{genomic},l}}$$

$\mathrm{ATrFreq}_{t,l}$ was normalized for the number of homopolymers of length $l$ at each genomic location $t$ with the same composition,

$$\mathrm{ATwrFreq}_{t,l} = \mathrm{ATrFreq}_{t,l} \times \frac{\mathrm{ATn}_{t,l}}{\sum_{l} \mathrm{ATn}_{t,l}}$$

and further normalized for the number of AT (or CG) homopolymers at each genomic location and homopolymer length:

$$\mathrm{ATnwrFreq}_{t,l} = \mathrm{ATwrFreq}_{t,l} \times \frac{\mathrm{ATn}_{t,l}}{\mathrm{ATn}_{t,l} + \mathrm{CGn}_{t,l}}$$

The CG terms were computed analogously. All weighted frequencies were then summed for every genomic location and divided by the overall summed genomic frequency:

$$\mathrm{cFreq}_{t} = \sum_{l} \mathrm{ATnwrFreq}_{t,l} + \sum_{l} \mathrm{CGnwrFreq}_{t,l}, \qquad \mathrm{rFreq}_{t} = \frac{\mathrm{cFreq}_{t}}{\mathrm{cFreq}_{\mathrm{genomic}}}$$
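The length- and composition-weighted normalization of homopolymer indel frequencies described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes counts are held in nested dictionaries indexed as `counts[composition][location][length]` (a hypothetical layout), that the whole-genome location is keyed `"genomic"`, and it skips lengths for which a ratio would be undefined (no homopolymers, or no affected homopolymers genome-wide); how such cases were handled originally is not stated.

```python
from typing import Dict

# Hypothetical layout: counts[composition][location][length] -> number of homopolymers
Counts = Dict[str, Dict[str, Dict[int, int]]]


def c_freq(aff: Counts, n: Counts, t: str, genomic: str = "genomic") -> float:
    """cFreq_t: sum over compositions and lengths of nwrFreq_{t,l}."""
    total = 0.0
    for comp, other in (("AT", "CG"), ("CG", "AT")):
        n_sum = sum(n[comp][t].values())  # sum over l of n_{t,l} for this composition
        for l, n_tl in n[comp][t].items():
            n_gl = n[comp][genomic].get(l, 0)
            aff_gl = aff[comp][genomic].get(l, 0)
            if n_tl == 0 or n_gl == 0 or aff_gl == 0:
                continue  # assumption: skip lengths where a ratio is undefined
            freq = aff[comp][t].get(l, 0) / n_tl                          # Freq_{t,l}
            r_freq = freq / (aff_gl / n_gl)                               # rFreq_{t,l}
            wr_freq = r_freq * n_tl / n_sum                               # wrFreq_{t,l}
            nwr_freq = wr_freq * n_tl / (n_tl + n[other][t].get(l, 0))    # nwrFreq_{t,l}
            total += nwr_freq
    return total


def relative_frequency(aff: Counts, n: Counts, t: str, genomic: str = "genomic") -> float:
    """rFreq_t = cFreq_t / cFreq_genomic."""
    return c_freq(aff, n, t, genomic) / c_freq(aff, n, genomic, genomic)
```

By construction the genome-wide location scores exactly 1, and a location whose length-specific frequencies are all half the genome-wide ones (with the same homopolymer counts) scores 0.5, so values below 1 in the exome point to depletion after correcting for homopolymer number, composition, and length.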
The following datasets were used: (i) the 1000 Genomes Project, containing common variants with a minor allele frequency >10%, (ii) all germ-line variants identified in the 3 MMR-deficient tumors sequenced in this study, (iii) de novo mutations from 83 trios as published by Campbell et al. (2012) and Kong et al. (2012), and (iv) a human–chimpanzee divergence set of substitutions as previously described (Stamatoyannopoulos et al., 2009). Somatic mutations from other tumor whole genomes were taken from: (i) BRCA-deficient breast cancer tumors as published by Nik-Zainal et al. (2012), (ii) MMR-proficient endometrial tumors sequenced in this study, (iii) melanoma genomes as published by Pleasance et al. (2010), and (iv) small cell lung cancer (SCLC) as published by Pleasance et al. (2010).
The distance to the telomere was defined as the distance from the middle of the 1 Mb window to the beginning or the end of the chromosome, whichever was shorter. Replication timing data were taken from Chen et al. (2010). Simple repeats were quantified as the number of homopolymer and microsatellite bases. GC% was calculated as (G+C)/(A+T+G+C), CpG content as the number of bases in CG dinucleotides, CpG islands as the number of bases belonging to CpG islands, and gene content as the number of bases belonging to each genomic region. DNase I hypersensitivity sites (DNAseI size) and nuclear lamina binding sites were downloaded from UCSC, and the number of bases per site was counted for both.
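Three of these per-window covariates can be computed directly from a window's sequence and coordinates. The Python sketch below assumes the window's sequence is already available as a string and that `win_start` is a 0-based coordinate; `window_features` is a hypothetical helper, and in the analysis above each window spans 1 Mb.

```python
from typing import Dict


def window_features(seq: str, win_start: int, chrom_len: int) -> Dict[str, float]:
    """Covariates for one window whose sequence is `seq`, starting at 0-based `win_start`."""
    s = seq.upper()
    mid = win_start + len(s) / 2
    acgt = sum(s.count(b) for b in "ACGT")  # N and other ambiguity codes are ignored
    return {
        # distance from the window midpoint to the nearer chromosome end
        "dist_to_telomere": min(mid, chrom_len - mid),
        # GC% = (G+C)/(A+T+G+C)
        "gc_fraction": (s.count("G") + s.count("C")) / acgt if acgt else float("nan"),
        # bases in CG dinucleotides; "CG" cannot overlap itself, so str.count
        # finds every occurrence, and each occurrence contributes two bases
        "cpg_bases": 2 * s.count("CG"),
    }
```

For example, `window_features("ACGTCGAANN", 0, 100)` gives a telomere distance of 5.0, a GC fraction of 0.5 (Ns excluded from the denominator), and 4 CpG bases.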
We sequenced the exomes of 11 tumor–normal pairs and of 6 primary cell cultures (PC) with their matched normal DNA samples. Detailed clinical information is shown in Figure 4—source data 1. Exomes were captured using Illumina's TruSeq Exome Enrichment Kit. The TruSeq capture regions span 62 Mb and cover 94.4%, 83.9%, and 91.9% of the exonic, 5′UTR, and 3′UTR regions, respectively. All EM tumors were sequenced with 2 × 75 bp paired-end reads, whereas CRC tumors and PC samples were sequenced with 2 × 100 bp paired-end reads. Analysis, annotation, and validation were performed as for whole-genome sequencing. The average coverage was 44.5×, and 95.1% of bases in the captured regions were called, yielding on average 51,782 substitutions and 30,290 indels per sample. Raw data are available under restricted access in EGA under accession number EGAS00001000182. Details of validated somatic mutations are available in Figure 4—source data 1 and Figure 4—source data 2.
The 13 MMR-deficient whole exomes and the whole-exome data extracted from the 3 MMR-deficient whole genomes were screened for recurrent mutations. Validation of 24 randomly selected indels occurring in 6 or more samples yielded a validation rate of 100%. Given the high validation rate for somatic indels in general, and the even higher rate for recurrent indels, we considered all recurrent indels to be true positives. Subsequent analyses were limited to indels recurrently affecting homopolymers, that is, the 29,663 exonic homopolymers captured by Illumina TruSeq. Details of recurrent mutations in these homopolymers are available in Figure 4—source data 3. We also screened the 5430 and 60,942 homopolymers located in the captured 5′ and 3′ UTRs, respectively, for recurrent indels; details of these recurrent indels are also given in Figure 4—source data 3. Recurrent indels meeting the following criteria were considered for a targeted Sequenom panel for assessing MSI: (i) occurring in 6 or more samples, (ii) detected in both EM and CRC exomes, and (iii) a maximal length of the affected homopolymer <12 bp. After extensive optimization experiments, 59 markers were chosen. Detailed information on each indel is given in Figure 5—source data 1.
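The three selection criteria for candidate MSI markers translate into a simple filter. The Python sketch below is illustrative only: the `RecurrentIndel` record and `candidate_markers` function are hypothetical, and the subsequent optimization experiments that narrowed the candidates to 59 markers are not modeled. The homopolymer-length bound is a parameter because this paragraph states "<12 bp" while the Discussion states "≤12 bp".

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class RecurrentIndel:
    """Hypothetical summary of one recurrent homopolymer indel."""
    marker_id: str
    em_samples: FrozenSet[str]   # EM exomes carrying the indel
    crc_samples: FrozenSet[str]  # CRC exomes carrying the indel
    homopolymer_length: int      # length (bp) of the affected homopolymer


def candidate_markers(
    indels: Iterable[RecurrentIndel],
    min_samples: int = 6,
    max_length_exclusive: int = 12,
) -> List[str]:
    """Return IDs of indels passing criteria (i)-(iii)."""
    selected = []
    for ind in indels:
        n_samples = len(ind.em_samples | ind.crc_samples)
        if (
            n_samples >= min_samples                           # (i) recurrence
            and ind.em_samples and ind.crc_samples             # (ii) seen in EM and CRC
            and ind.homopolymer_length < max_length_exclusive  # (iii) short homopolymer
        ):
            selected.append(ind.marker_id)
    return selected
```

Requiring both tissues (criterion ii) is what makes the resulting panel less tissue-specific, in line with the rationale given in the Discussion.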
The 236 EM tumors used to establish MSI thresholds were drawn from the Australian National Endometrial Cancer Study (ANECS). IHC analyses of these tumors were performed independently at the Molecular Cancer Epidemiology Laboratory in Brisbane, Australia, as described (Tan et al., 2013). Eleven of the 236 tumors were excluded from analysis with the 59-marker panel because of their low tumor percentage (≤10%). By varying the marker threshold, we calculated the number of true positives and false positives identified by our MSI panel relative to the IHC data, and constructed a ROC curve from these values. The Matthews correlation coefficient (MCC) was calculated for each threshold. Tumors were considered MSI when three or more markers were positive. We did not distinguish between MSI-low and microsatellite stable (MSS), as this distinction is currently not clinically relevant; all tumors with fewer than three positive markers were thus considered MSS/MSI-L. For the Bethesda panel, we defined three categories: microsatellite stable (MSS, 0 of 10 markers positive), low microsatellite instability (MSI-L, 1–2 of 10 markers), and high microsatellite instability (MSI-H, 3 or more of 10 markers). Two sample sets (114 EM tumors and 97 CRC tumors) were used for the comparison. Details of these sample sets are given in Figure 5—source data 1.
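The threshold search against IHC can be sketched as follows: for each candidate threshold k, call a tumor MSI when at least k markers are positive, tabulate the confusion matrix against IHC, and keep the k with the highest Matthews correlation coefficient. This Python sketch assumes per-tumor positive-marker counts and IHC calls are already available as lists; the function names are hypothetical, and the ROC curve itself is not drawn. The Bethesda categories defined above are included for comparison.

```python
import math
from typing import Dict, Sequence, Tuple


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sweep_thresholds(
    positive_markers: Sequence[int],  # number of positive panel markers per tumor
    ihc_deficient: Sequence[bool],    # IHC reference: True = MMR-deficient
    max_threshold: int,
) -> Tuple[int, Dict[int, Tuple[int, int, float]]]:
    """Call MSI when >= k markers are positive; return best k and (TP, FP, MCC) per k."""
    table = {}
    for k in range(1, max_threshold + 1):
        calls = [count >= k for count in positive_markers]
        tp = sum(c and d for c, d in zip(calls, ihc_deficient))
        fp = sum(c and not d for c, d in zip(calls, ihc_deficient))
        fn = sum(not c and d for c, d in zip(calls, ihc_deficient))
        tn = len(calls) - tp - fp - fn
        table[k] = (tp, fp, mcc(tp, fp, tn, fn))
    best = max(table, key=lambda k: table[k][2])
    return best, table


def bethesda_category(n_positive: int) -> str:
    """Bethesda categories used here (of 10 markers): 0 MSS, 1-2 MSI-L, >=3 MSI-H."""
    if n_positive == 0:
        return "MSS"
    return "MSI-L" if n_positive <= 2 else "MSI-H"
```

MCC is used rather than sensitivity alone because it balances all four cells of the confusion matrix, which matters when MMR-deficient tumors are a minority of the cohort.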
The 59-marker panel was also applied to ovarian tumors and leukemia. Four samples with proven MSI status were selected, including one ovarian tumor (OV) and three leukemia cell lines (DND41, CCRF-CEM, and SUPT1). The MSI-H OV tumor and two MSS OV tumors, together with their matched normal samples, as well as the three MSI-H leukemia cell lines and an MSS leukemia cell line (RPMI-8402), were exome-sequenced. Detailed information for all samples can be found in Figure 5—source data 2. Raw data are available in EGA under accession number EGAS00001000182.
Two pathway tools (IPA and GenomeMuSiC) and three pathway databases (IPA, BioCarta, and Reactome [Haw et al., 2011]) were used. We first selected all genes with somatic exonic indels and then extended mutation calling to indels occurring within 25 bp upstream or downstream of each exon. Mutation calling and filtering for this latter set were performed as described above. In total, 1989 additional indels at exon/intron boundaries were detected. These were combined with the previously described exonic indels, which, after removal of indels in MMR genes, yielded 7546 indels in 4116 genes. For validation, we selected 27 CRC and 65 EM tumors with MSI sequenced by The Cancer Genome Atlas (TCGA, 2012; Kandoth et al., 2013). In these data sets, we selected genes recurrently affected not only by frameshift indels but also by non-synonymous substitutions, yielding 2183 and 3138 genes from the CRC and EM tumor data sets, respectively. Detailed results of the pathway analyses are given in Table 2—source data 1.
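To make the notion of pathway enrichment concrete, the sketch below computes a generic over-representation p-value: the probability of observing at least k mutated genes in a pathway of size K when n of N background genes are mutated, i.e. the upper tail of a hypergeometric distribution (equivalently, a one-sided Fisher's exact test). This is an illustration under assumptions, not a reproduction of IPA or GenomeMuSiC, which use their own statistics, gene universes, and multiple-testing corrections; `hypergeom_enrichment_p` is a hypothetical name.

```python
from math import comb


def hypergeom_enrichment_p(hits_in_pathway: int, pathway_size: int,
                           mutated_genes: int, background_genes: int) -> float:
    """P(X >= hits_in_pathway) for X ~ Hypergeometric(background, pathway_size, mutated).

    Equals the one-sided (right-tailed) Fisher's exact test on the 2x2 table of
    mutated / not mutated versus in pathway / not in pathway.
    """
    upper = min(pathway_size, mutated_genes)
    total = comb(background_genes, mutated_genes)
    tail = sum(
        comb(pathway_size, i) * comb(background_genes - pathway_size, mutated_genes - i)
        for i in range(hits_in_pathway, upper + 1)
    )
    return tail / total  # exact integer arithmetic until this final division
```

With the 4116 mutated genes reported above and an assumed background of about 20,000 genes, a call would look like `hypergeom_enrichment_p(k, K, 4116, 20000)` for a pathway of K genes containing k mutated genes. Python's arbitrary-precision integers keep the binomial coefficients exact at this scale.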
Eleven primary endometrial and ovarian tumor cell cultures were established from tumors of patients undergoing surgery at the Division of Gynecologic Oncology, UZ Leuven (Belgium). Tissue was washed with PBS supplemented with penicillin/streptomycin and fungizone and digested with collagenase type IV (1 mg/ml; Roche, Vilvoorde, Belgium) and DNase I (0.1 mg/ml; Roche) in RPMI+ medium. Single-cell suspensions were prepared by filtration through a 70-µm filter. Red blood cells were lysed using ammonium chloride (Stem Cell Technologies, Grenoble, France). Single cells were plated in a 25-cm² culture flask. After 1–3 weeks, when cells reached 60–70% confluency, fibroblasts were removed using mouse anti-human CD90 (clone AS02; Dianova, Hamburg, Germany) bound to Mouse Pan IgG Dynabeads (Life Technologies, Erembodegem, Belgium). Cell cultures were subsequently passaged at 70–90% confluency and banked at −80°C. Primary tumor cell cultures were grown for up to 25 passages in RPMI Medium 1640 supplemented with 20% fetal bovine serum (FBS), 2 mM L-glutamine, 100 U/ml penicillin, 100 μg/ml streptomycin, 1 μg/ml fungizone, and 10 μg/ml gentamicin (Life Technologies).
Cells were seeded in 8-well Lab-Tek Permanox Chamber Slides (Nunc, Zellik, Belgium), treated for 24 hr, fixed in 4% paraformaldehyde for 15 min at room temperature, and then in ice-cold methanol for 5 min. Primary antibodies recognizing γH2AX (JBW301, Millipore, Overijse, Belgium) and RAD51 (PC130, Merck, Darmstadt, Germany) were used, followed by secondary antibodies conjugated to Alexa Fluor 647 and 488 (Life Technologies). Images were acquired using an A1R Eclipse Ti inverted confocal microscope (Nikon, Brussels, Belgium) and processed using Fiji software, with compound- and vehicle-treated cells processed identically. Nuclei with >5 foci were scored as positive, and at least 200 nuclei per condition were counted by two independent individuals blinded to the genotypes.
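The foci scoring rule (a nucleus is positive when it has more than 5 foci, with at least 200 nuclei per condition) can be expressed as a small function. This Python sketch assumes foci have already been counted per nucleus, for example in Fiji; `percent_foci_positive` is a hypothetical name, and the agreement between the two independent scorers is not modeled.

```python
from typing import Sequence


def percent_foci_positive(foci_per_nucleus: Sequence[int],
                          min_foci_exclusive: int = 5,
                          min_nuclei: int = 200) -> float:
    """Percentage of nuclei with more than `min_foci_exclusive` foci."""
    if len(foci_per_nucleus) < min_nuclei:
        raise ValueError(
            f"need at least {min_nuclei} nuclei, got {len(foci_per_nucleus)}"
        )
    positive = sum(1 for f in foci_per_nucleus if f > min_foci_exclusive)
    return 100.0 * positive / len(foci_per_nucleus)
```

Note that the cut-off is strict: a nucleus with exactly 5 foci is scored negative.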
Cells were treated for 24 hr with 26 µM olaparib, 0.3 µM mitomycin C, 0.03 µM camptothecin, or carrier, and incubated for 90 min with BrdU (10 µM) before harvesting. Cells were resuspended in ice-cold PBS, and ice-cold ethanol was slowly added to a final concentration of 70%. Cells were fixed for 5 min at room temperature, treated with 2 M HCl for 30 min, and stained with FITC-conjugated anti-BrdU antibody (BD). Cells were washed, resuspended in PI/RNase staining buffer (BD), and analyzed on a BD Biosciences FACSVerse flow cytometer. Cell cycle distributions were modeled using FlowJo software, and the fractions of cells in S-phase, G2/M, and G1 were determined as described by Watson et al. (1987).
Cells were seeded at 5,000 cells/well in 96-well plates. After 24 hr, cells were treated in quadruplicate, incubated for 48 hr at 37°C, and analyzed using the In Vitro Toxicology Assay Kit, Sulforhodamine B-based (Sigma, Diegem, Belgium), according to the manufacturer's instructions. Growth inhibition was calculated as described (Vichai and Kirtikara, 2006).
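As an illustration of how growth inhibition is derived from sulforhodamine B optical densities, the Python sketch below averages the quadruplicate wells, subtracts a blank, and expresses treated growth relative to the vehicle control. This is an assumed, simplified form: the calculation actually used follows Vichai and Kirtikara (2006) and may include further corrections (for example, for cell density at the time of treatment), which are not shown here. `percent_growth_inhibition` is a hypothetical name.

```python
from statistics import mean
from typing import Sequence


def percent_growth_inhibition(treated_od: Sequence[float],
                              control_od: Sequence[float],
                              blank_od: Sequence[float]) -> float:
    """Growth inhibition (%) from SRB optical densities, using blank subtraction only."""
    blank = mean(blank_od)
    treated = mean(treated_od) - blank  # mean of the quadruplicate treated wells
    control = mean(control_od) - blank  # vehicle (carrier) wells
    return 100.0 * (1.0 - treated / control)
```

With blank-corrected treated absorbance at half the control value, the function returns approximately 50%; identical treated and control readings return 0%.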
siRNA ON-TARGETplus SMART pools (Thermo) were diluted in Opti-MEM I Reduced Serum Medium and complexed with Lipofectamine RNAiMAX (Life Technologies) to reverse-transfect MCF7 cells. For cytotoxicity screening, transfections were performed in 96-well format, and the medium was changed 14 hr after transfection. Cells were then treated with olaparib (26 μM) and processed for cytotoxicity screening after 48 hr. In parallel, siRNA transfections were performed in 12-well plates to quantify knockdown.
Total RNA was extracted using the RNeasy Mini kit (Qiagen, Venlo, The Netherlands) and reverse-transcribed using the SuperScript III reverse transcription system (Life Technologies). Quantitative RT-PCR (qRT-PCR) was performed with ACTB as an internal control, using TaqMan gene expression assay probes and 5 μl TaqMan Fast Universal PCR master mix (Life Technologies). Reactions were amplified in a Roche LightCycler 480 with the following program: 50°C (2 min), 95°C (30 s), and 40 cycles of 95°C (3 s) and 60°C (30 s).
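The text does not state how knockdown was computed from the qRT-PCR data; a common approach with a single reference gene is the comparative 2^−ΔΔCt method. The Python sketch below assumes that method, ACTB as the reference gene, the non-targeting siRNA as the calibrator, and 100% amplification efficiency (a doubling per cycle); all function names are hypothetical.

```python
def relative_expression(ct_target_kd: float, ct_ref_kd: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float,
                        efficiency: float = 2.0) -> float:
    """Target expression in the knockdown relative to control, by the 2^-ddCt method."""
    d_ct_kd = ct_target_kd - ct_ref_kd        # target normalized to ACTB, specific siRNA
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl  # target normalized to ACTB, non-targeting siRNA
    return efficiency ** -(d_ct_kd - d_ct_ctrl)


def percent_knockdown(ct_target_kd: float, ct_ref_kd: float,
                      ct_target_ctrl: float, ct_ref_ctrl: float,
                      efficiency: float = 2.0) -> float:
    """Percentage reduction in target mRNA relative to the non-targeting control."""
    rel = relative_expression(ct_target_kd, ct_ref_kd,
                              ct_target_ctrl, ct_ref_ctrl, efficiency)
    return 100.0 * (1.0 - rel)
```

For example, a target Ct that rises by 2 cycles relative to ACTB after knockdown (ΔΔCt = 2) corresponds to 25% residual expression, or 75% knockdown.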
Mouse anti-phospho-Histone H2A.X (Ser139) monoclonal antibody (clone JBW301) was from Millipore Corporation, Billerica, MA, USA. Rabbit anti-Rad51 (PC130) polyclonal antibody was from Calbiochem/Merck, Darmstadt, Germany. Rabbit anti-ACTB (#4967) polyclonal antibody was from Cell Signaling Technology, Danvers, MA, USA. FITC-conjugated anti-BrdU antibody (347583) was from Becton–Dickinson, San Jose, CA, USA. Alexa Fluor 647 goat anti-mouse IgG (A-21235) and Alexa Fluor 488 goat anti-rabbit IgG (A-11034) were from Life Technologies, Carlsbad, CA, USA. Olaparib (AZD-2281, batch JSAR104) was purchased from JS Research Chemicals Trading, Schleswig Holstein, Germany. Cis-platinum (II) diammine dichloride (P4394), paclitaxel (T7402), mitomycin C (M4287), (S)-(+)-camptothecin (C9911), and carmustine (C0400) were purchased from Sigma-Aldrich, St. Louis, MO, USA, and prepared and stored according to the manufacturer's recommendations. siRNA ON-TARGETplus SMART pools were purchased from Thermo Scientific Dharmacon, Chicago, IL, USA: Non-targeting (D-001810-10-05); ATM (L-003201-00-0005); ATR (L-003202-00-0005); BRCA1 (L-003461-00-0005); and BRCA2 (L-003462-00-0005). TaqMan gene expression assays (Life Technologies, Carlsbad, CA, USA) used in this study were as follows: ATM: Hs01112355_g1; ATR: Hs00992123_m1; BRCA1: Hs01556193_m1; BRCA2: Hs00609073_m1; ACTB: Hs99999903_m1. Normal goat serum (005-000-121) was from Jackson ImmunoResearch Labs, West Grove, PA, USA.
A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Research 58:5248–5257.
Estimating the human mutation rate using autozygosity in a founder population. Nature Genetics 44:1277–1281. https://doi.org/10.1038/ng.2418
Diagnostic microsatellite instability: definition and correlation with mismatch repair protein expression. Cancer Research 57:4749–4756.
Microsatellites: simple sequences with complex evolution. Nature Reviews Genetics 5:435–445. https://doi.org/10.1038/nrg1348
Feasibility of screening for Lynch syndrome among patients with colorectal cancer. Journal of Clinical Oncology 26:5783–5788. https://doi.org/10.1200/JCO.2008.17.5950
Screening for the Lynch syndrome (Hereditary nonpolyposis colorectal cancer). The New England Journal of Medicine 352:1851–1860. https://doi.org/10.1056/NEJMoa043146
Mismatch repair deficient colorectal cancer in the era of personalized treatment. Nature Reviews Clinical Oncology 7:197–208. https://doi.org/10.1038/nrclinonc.2010.18
Variation in the mutation rate across mammalian genomes. Nature Reviews Genetics 12:756–766. https://doi.org/10.1038/nrg3098
Differential relationship of DNA replication timing to different forms of human mutation and variation. American Journal of Human Genetics 91:1033–1040. https://doi.org/10.1016/j.ajhg.2012.10.018
Mutagenesis by transient misalignment. The Journal of Biological Chemistry 263:14784–14789.
Dissecting the heterogeneity of triple-negative breast cancer. Journal of Clinical Oncology 30:1879–1887. https://doi.org/10.1200/JCO.2011.38.2010
Trapping of PARP1 and PARP2 by Clinical PARP Inhibitors. Cancer Research 72:5588–5599. https://doi.org/10.1158/0008-5472.CAN-12-2753
Microsatellite instability and adjuvant fluorouracil chemotherapy: a mismatch? Journal of Clinical Oncology 28:3207–3210. https://doi.org/10.1200/JCO.2010.28.9314
Correlation of tumour BRAF mutations and MLH1 methylation with germline mismatch repair (MMR) gene mutation status: a literature review assessing utility of tumour features for MMR variant classification. Journal of Medical Genetics 49:151–157. https://doi.org/10.1136/jmedgenet-2011-100714
Epigenetic mechanisms in the pathogenesis of Lynch syndrome. Clinical Genetics 85:403–412. https://doi.org/10.1111/cge.12349
Accuracy of revised Bethesda guidelines, microsatellite instability, and immunohistochemistry for the identification of patients with hereditary nonpolyposis colorectal cancer. The Journal of the American Medical Association 293:1986–1994.
Human mutation rate associated with DNA replication timing. Nature Genetics 41:393–395. https://doi.org/10.1038/ng.363
MSH3 mediates sensitization of colorectal cancer cells to cisplatin, oxaliplatin, and a poly(ADP-ribose) polymerase inhibitor. The Journal of Biological Chemistry 286:12157–12165. https://doi.org/10.1074/jbc.M110.198804
Improving identification of lynch syndrome patients: a comparison of research data with clinical records. International Journal of Cancer 132:2876–2883. https://doi.org/10.1002/ijc.27978
KRAS mutation is associated with lung metastasis in patients with curatively resected colorectal cancer. Clinical Cancer Research 17:1122–1130. https://doi.org/10.1158/1078-0432.CCR-10-1720
Sulforhodamine B colorimetric assay for cytotoxicity screening. Nature Protocols 1:1112–1116. https://doi.org/10.1038/nprot.2006.179
From human genome to cancer genome: the first decade. Genome Research 23:1054–1062. https://doi.org/10.1101/gr.157602.113
John Stamatoyannopoulos, Reviewing Editor; University of Washington, United States
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for sending your work entitled “Mismatch repair deficiency endows tumors with a unique mutation signature and sensitivity to DNA double-strand breaks” for consideration at eLife. Your article has been favorably evaluated by Stylianos Antonarakis (Senior editor), a Reviewing editor, and 2 reviewers, one of whom, Thilo Dörk, has agreed to reveal his identity.
The Senior editor has assembled the following comments to help you prepare a revised submission.
1) There is a major issue of mapping and calling the indels within short repeats that are typical of MSI+ cancers. This can lead to unmapped reads, mismapping, and type 1 and type 2 errors. This problem especially affects the gapped reads in the Complete Genomics platform. Strategies to mitigate these issues are not mentioned. I have considerable doubts about the inclusion of the whole-genome data in this manuscript (after all, N = 3 is a very small number anyway), and I suggest that the issue should be addressed in exome data (even if some of those data are from the WGS cancers).
2) The sample set is heterogeneous in terms of cancer of origin and derivation from primary cancers and cultures that are likely to have been subjected to considerable in vitro selection pressure and/or founder effects.
3) The similarity between germline and somatic mutation spectra might in part be caused by many somatic mutations occurring in normal progenitors prior to loss of MMR. Is there a way of investigating this in comparison with non-MMR tumors (e.g. examining effects of age)?
4) The 59-marker exomic MSI panel is useful, especially for Lynch syndrome, and appears to perform well.
5) The pathway analysis presumably incorporated all detected variants. Whilst strongly suggestive, does filtering for variants with strong evidence of functionality alter these conclusions? Moreover, the burden of mutations in these pathways might relate to redundant function (noting that MSI+ cancers are usually near-diploid) rather than positive selection. This may or may not matter for therapeutic purposes, but can it be checked in some way?
https://doi.org/10.7554/eLife.02725.040
1) There is a major issue of mapping and calling the indels within short repeats that are typical of MSI+ cancers. This can lead to unmapped reads, mismapping, and type 1 and type 2 errors. This problem especially affects the gapped reads in the Complete Genomics platform. Strategies to mitigate these issues are not mentioned. I have considerable doubts about the inclusion of the whole-genome data in this manuscript (after all, N = 3 is a very small number anyway), and I suggest that the issue should be addressed in exome data (even if some of those data are from the WGS cancers).
We thank the reviewer for this comment. Indeed, each sequencing platform is characterized by its own false-positive and false-negative variant detection rates. For example, as discussed by Zook et al. (Jiricny, 2006), 10.5% of indels in Complete Genomics (CG) datasets were false positives, whereas 27.7% of indels were missed (false negatives). For Illumina genomes, 6.9% of indels were false positives and 0.5% were false negatives.
First, to address the issue of false positives (type 1 error), we increased the number of randomly selected indels for validation from the CG-sequenced MMR-deficient tumor (MMR-1) and the Illumina-sequenced MMR-deficient tumor (MMR-2). We chose an orthogonal validation technology, Sequenom MassARRAY, to validate a total of 391 indels. The overall validation rate for indels was 90.3% in CG genomes and 85.9% in Illumina-sequenced genomes (see table 1 below and Figure 1—source data 3). These validation rates are thus very similar between both platforms. They are also much higher than those we observed for indels in MMR-proficient tumors and those the reviewer correctly refers to in the literature (Parsons et al., 2012; Peltomaki, 2014). The low validation rates for indels in the MMR-proficient whole-genome tumors most probably reflect the fact that in germ-line genomes, as well as in MMR-proficient tumor genomes, the number of true-positive indels is low relative to the number of false-positive indels detected. In MMR-deficient tumors, however, the hypermutator phenotype vastly increases the number of true-positive indels, so that false positives make up a proportionally much smaller fraction of all calls. In the revised manuscript, we have therefore highlighted the possibility of false-positive and false-negative findings in the whole genomes and indicated that these may affect the observed indel rates. We also discuss that validation rates in MMR-deficient tumors are much higher than in MMR-proficient tumors and explain why. As highlighted by the reviewer, sequence reads containing both indels and substitutions (i.e., reads with a relatively high percentage of mismatches) are more prone to mismapping than reads containing only substitutions or only indels.
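The argument that a hypermutator phenotype raises the validation rate follows from the definition of the validation rate as the fraction of calls that are true, TP/(TP + FP). The short Python illustration below uses purely hypothetical call counts, not data from this study: holding the number of false-positive indel calls fixed, a larger number of true somatic indels raises the expected validation rate.

```python
def expected_validation_rate(true_calls: int, false_calls: int) -> float:
    """Fraction of calls expected to validate: TP / (TP + FP)."""
    return true_calls / (true_calls + false_calls)


# Hypothetical counts: the same 100 false-positive indel calls per genome,
# but 20-fold more true somatic indels in the hypermutated tumor.
mmr_proficient = expected_validation_rate(true_calls=100, false_calls=100)
mmr_deficient = expected_validation_rate(true_calls=2000, false_calls=100)
print(f"{mmr_proficient:.1%} vs {mmr_deficient:.1%}")  # prints: 50.0% vs 95.2%
```

The same caller error profile can therefore yield very different validation rates depending only on how many true mutations a genome carries.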
We acknowledge this issue, and have in response deleted the paragraph describing that somatic indels and substitutions are often located close to each other in the 3 MMR-deficient tumors that were whole-genome sequenced. Indeed, such observations are prone to contamination by false-positives.
Secondly, following the reviewer's suggestion, we removed the 3 whole genomes from the analyses aimed at identifying recurrent mutations and constructing a novel MSI panel. In particular, we repeated all analyses with only the 13 samples subjected to Illumina exome sequencing. In coding regions, 2073 of a total of 29,663 homopolymers (7.0%) were affected at least once, whereas 414 (1.4%) were affected at least twice, and 47 homopolymers were affected in ≥5 samples. In 3′UTRs and 5′UTRs, 2296 and 105 homopolymers, respectively, were affected in ≥5 samples. When selecting recurrent indels to design a panel of recurrent markers capable of assessing MSI, 54 of the 59 originally selected markers were still selected, as they affected ≥5 of 13 tumors (compared to 59 recurrent indels affecting ≥6 of 16 tumors). Of these 54 markers, 45 were in UTRs and 9 in coding regions. Applying the 54-marker panel to the same discovery set of 236 EM tumors as described in the original manuscript, we again found that a threshold of 3 positive markers gave the best Matthews correlation coefficient (Author response image 1A and 1B).
When comparing our 54 markers against the Bethesda panel, we again found that the 54-marker panel had a higher sensitivity than the Bethesda panel. Specifically, we applied the 54-marker panel to a set of 114 endometrial tumors and a set of 126 colorectal tumors as described in the original manuscript. Of the EM tumors, 73 (64%) were classified as MSS/MSI-L and 41 (36%) as MSI-H. Of these 41 MSI-H tumors, Bethesda classified 29 as MSI-H (>2 markers positive), 7 as MSI-L and 5 as MSS. Conversely, Bethesda did not identify any MSI-H tumor that was not also identified by our novel MSI panel (Author response image 1C). IHC on 9 of the 12 discordant samples confirmed that each was deficient for either MLH1 or MSH2, and thus MMR-deficient, so all 9 were true positives; no tumor slides were available for the remaining 3 samples. Of the CRC tumors, 97 were called MSS by the 54-marker panel and were concordantly called MSS or MSI-L by the Bethesda panel. The remaining 29 samples were called MSI by the 54-marker panel (Author response image 1D); 28 of these were also called MSI-H by the Bethesda panel, whereas one was called MSS. This tumor carried a BRAF mutation and showed MLH1 hypermethylation, confirming MMR deficiency and its correct classification by the 54-marker panel.
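The endometrial comparison can be written as a 2×2 cross-tabulation using only the counts reported above, pooling MSS and MSI-L as "not MSI-H" for both tests. This worked arithmetic is an illustration, not an additional analysis from the study.

```python
from fractions import Fraction

# Endometrial (EM) validation set, counts as reported above.
# Keys: (54-marker panel call, Bethesda call); MSS and MSI-L pooled as "not MSI-H".
em_table = {
    ("MSI-H", "MSI-H"): 29,
    ("MSI-H", "not MSI-H"): 7 + 5,  # 7 MSI-L + 5 MSS by Bethesda
    ("not MSI-H", "MSI-H"): 0,
    ("not MSI-H", "not MSI-H"): 73,
}

total = sum(em_table.values())
agree = em_table[("MSI-H", "MSI-H")] + em_table[("not MSI-H", "not MSI-H")]
panel_positive = em_table[("MSI-H", "MSI-H")] + em_table[("MSI-H", "not MSI-H")]

overall_agreement = Fraction(agree, total)
bethesda_among_panel_positive = Fraction(em_table[("MSI-H", "MSI-H")], panel_positive)

print(total, f"{float(overall_agreement):.3f}", f"{float(bethesda_among_panel_positive):.3f}")
```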
Finally, we also repeated our pathway analyses on the genes affected by indels in the 13 exomes, rather than in the whole set of 16 exomes, 3 of which were generated by whole-genome sequencing. IPA® analysis of the 3856 genes affected by a somatic indel revealed that “Role of BRCA1 in DNA damage response” was the top enriched pathway (P=4.2E-04). IPA® analysis of the 1302 genes affected by recurrent indels revealed that “DNA double-strand break (DSB) repair by Homologous Recombination (HR)” was the top enriched pathway (P=4.7E-03). Pathway analyses of 6736 indels in MMR-deficient tumors using GenomeMuSiC ranked the “ATR/BRCA pathway”, “Homologous recombination repair” and “DNA repair” pathways highest in the BioCarta, DNARepairDB and Reactome databases (P=4.9E-13, P=1.8E-03 and P=5.7E-08, respectively). Overall, these results are nearly identical to those obtained on the 16 exomes, as presented in the original manuscript.
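Pathway enrichment P-values of this kind are commonly based on an over-representation test. The sketch below computes a right-tailed hypergeometric P-value (equivalent to a one-sided Fisher's exact test) with the standard library. It is a generic illustration and is not claimed to reproduce the exact statistics, gene backgrounds or multiple-testing handling of IPA® or GenomeMuSiC; all numbers in the example are hypothetical.

```python
from math import comb


def overrepresentation_p(k: int, pathway_size: int, n_hits: int, background: int) -> float:
    """Right-tailed hypergeometric P: probability of observing >= k pathway genes
    among n_hits genes drawn without replacement from `background` genes,
    of which `pathway_size` belong to the pathway (one-sided Fisher's exact test)."""
    denom = comb(background, n_hits)
    tail = sum(
        comb(pathway_size, i) * comb(background - pathway_size, n_hits - i)
        for i in range(k, min(pathway_size, n_hits) + 1)
    )
    return tail / denom  # exact integer arithmetic, converted to float at the end


# Hypothetical numbers: 1302 hit genes, a 20-gene pathway, a 20,000-gene
# background, and 6 hits in the pathway (about 1.3 expected by chance).
print(overrepresentation_p(6, 20, 1302, 20_000))
```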
In conclusion, because neither our data nor our conclusions changed materially depending on whether we analyzed 13 or 16 genomes, we chose to present the data of the 16 exomes as the main analysis. However, in the revised manuscript we have added a sentence stating that data and conclusions did not change when the analysis was limited to the 13 exomes generated by Illumina exome sequencing only. Furthermore, since the response to this comment will be published alongside the manuscript, critical readers will be able to verify in full detail that the data did not change after removing the 3 whole genomes from the analysis.
2) The sample set is heterogeneous in terms of cancer of origin and derivation from primary cancers and cultures that are likely to have been subjected to considerable in vitro selection pressure and/or founder effects.
We agree with the reviewer that it is important to consider the heterogeneity of the tumors in terms of cancer of origin and derivation procedure. We therefore performed a clustering analysis of all sequenced MMR-deficient tumors based on the genes affected by a somatic substitution or indel in the coding region. As can be appreciated from Author response image 2 below, no obvious subgrouping by cancer of origin was observed. This figure has also been added to the revised manuscript as Figure 4–figure supplement 1.
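Clustering tumors by their mutated genes usually starts from a pairwise distance matrix. The distance metric and linkage used for Author response image 2 are not stated here, so the sketch below assumes Jaccard distance on sets of mutated genes and uses a crude check of subgrouping: if tumors clustered by cancer of origin, mean within-origin distance would be clearly lower than mean between-origin distance. Tumor names and gene sets are hypothetical.

```python
from itertools import combinations
from statistics import mean


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B| for two sets of mutated genes (0.0 if both are empty)."""
    union = a | b
    return 0.0 if not union else 1 - len(a & b) / len(union)


def within_vs_between(mutated_genes: dict, origin: dict) -> tuple:
    """Mean pairwise Jaccard distance for tumor pairs sharing vs not sharing a cancer of origin."""
    within, between = [], []
    for s, t in combinations(sorted(mutated_genes), 2):
        d = jaccard_distance(mutated_genes[s], mutated_genes[t])
        (within if origin[s] == origin[t] else between).append(d)
    return mean(within), mean(between)


# Hypothetical toy tumors (gene sets invented for illustration).
genes = {
    "EM1": {"ARID1A", "PTEN", "RNF43"},
    "EM2": {"PTEN", "RNF43", "MSH3"},
    "CRC1": {"RNF43", "MSH3", "ACVR2A"},
    "CRC2": {"ARID1A", "ACVR2A", "MSH3"},
}
origin = {"EM1": "EM", "EM2": "EM", "CRC1": "CRC", "CRC2": "CRC"}
print(within_vs_between(genes, origin))
```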
The figure also shows no distinct difference between primary tumors and primary cell cultures. As mentioned in the revised manuscript, we specifically chose to use low-passage primary tumor cultures rather than tumor cell lines, because the latter have been subjected to much more selective pressure than low-passage primary cultures. In addition to the above cluster analysis, we also performed pathway analysis on the 10 primary tumor tissues only. IPA® analysis of all genes affected by a somatic indel revealed that “Role of BRCA1 in DNA damage response” and “DNA double-strand break (DSB) repair by Homologous Recombination (HR)” were the top enriched pathways (P=6.5E-03 and P=1.1E-02, respectively). IPA® analysis of genes affected by recurrent indels revealed that “Role of BRCA1 in DNA damage response” was also enriched (P=2.0E-03). Pathway analyses of all indels in MMR-deficient tumors using GenomeMuSiC ranked the “ATR/BRCA pathway”, “Homologous recombination repair” and “DNA repair” pathways highest in the BioCarta, DNARepairDB and Reactome databases (P=1.0E-09, P=4.0E-03 and P=3.4E-06, respectively). These results, derived from the primary tumors only, suggest that MMR-deficient tumors are indeed enriched for indels affecting the DSB repair pathway, and are thus highly concordant with the results shown in the manuscript.
3) The similarity between germline and somatic mutation spectra might, in part, be caused by many somatic mutations occurring in normal progenitors prior to loss of MMR. Is there a way of investigating this in comparison with non-MMR tumors (e.g. examining effects of age)?
We thank the reviewer for this insightful hypothesis. However, we found no correlation between age at diagnosis and the number of detectable mutations (P=0.86). Moreover, although the age at diagnosis of patients with MMR-proficient and MMR-deficient tumors was very comparable (67 and 62 years, respectively), MMR-deficient tumors carried >55 times more mutations than MMR-proficient tumors. Somatic mutations at the level seen in MMR-proficient tumors would thus comprise at most 2% of all mutations in MMR-deficient tumors, a fraction that is unlikely to contribute significantly to the similarity between MMR-deficient and germline mutation patterns.
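The 2% bound can be made explicit under one assumption not stated above: that the number of mutations acquired before loss of MMR, $N_{\text{pre}}$, is no larger than the typical burden of an MMR-proficient tumor, $N_{\text{p}}$, which the comparable ages at diagnosis make plausible. With $N_{\text{d}}$ the burden of an MMR-deficient tumor and $N_{\text{d}} > 55\,N_{\text{p}}$, the fraction $f$ of pre-existing mutations satisfies

$$
f \;=\; \frac{N_{\text{pre}}}{N_{\text{d}}} \;\le\; \frac{N_{\text{p}}}{N_{\text{d}}} \;<\; \frac{N_{\text{p}}}{55\,N_{\text{p}}} \;=\; \frac{1}{55} \;\approx\; 0.018,
$$

that is, under 2% of all mutations.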
Finally, as shown in Figure 2A of the original manuscript, MMR-proficient and germline mutation patterns showed no extensive similarity. Consequently, even if somatic mutations (as reflected in MMR-proficient mutations) were to contribute significantly to MMR-deficient mutation patterns, they would not display extensive similarity to germline variation patterns and therefore could not account for the patterns observed in MMR-deficient tumors. We have noted this in the revised manuscript.
4) The 59-marker exomic MSI panel is useful, especially for Lynch syndrome, and appears to perform well.
We are happy to read that the reviewers appreciate our work.
5) The pathway analysis presumably incorporated all detected variants. Whilst strongly suggestive, does filtering for variants with strong evidence of functionality alter these conclusions? Moreover, the burden of mutations in these pathways might relate to redundant function (noting that MSI+ cancers are usually near-diploid) rather than positive selection. This may or may not matter for therapeutic purposes, but can it be checked in some way?
We apologize for not explaining our pathway analyses more clearly in the original manuscript. We described two types of pathway analyses: the first involved somatic frameshift indels in exons, the second involved somatic indels both in exons and at exon/intron boundaries. The presented pathway analyses were thus already restricted to variants with strong evidence of functionality. Indeed, in the first analysis each selected somatic indel in an exon represented an out-of-frame mutation, thus conferring a heterozygous loss-of-function mutation on the affected gene in the tumor.
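An indel in coding sequence is out of frame when it changes length by a number of bases that is not a multiple of 3. A minimal sketch of that rule, assuming VCF-style REF/ALT alleles for variants already known to lie in coding exons (splice-site, start/stop and transcript-specific effects, which real annotation tools handle, are ignored here):

```python
def is_frameshift(ref: str, alt: str) -> bool:
    """True if a coding indel changes length by a non-multiple of 3.

    Assumes VCF-style alleles (shared anchor base) for a variant already known
    to lie within a coding exon; substitutions (equal length) return False.
    """
    delta = len(alt) - len(ref)
    return delta != 0 and delta % 3 != 0


# Hypothetical coding indels as (REF, ALT) pairs.
calls = [("A", "AT"), ("ATTT", "A"), ("CA", "C"), ("G", "GCCA")]
print([is_frameshift(r, a) for r, a in calls])  # [True, False, True, False]
```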
To further enrich for mutations with a functional effect in the tumor, we additionally restricted our pathway analyses to genes expressed in endometrial or colorectal normal tissue. RNA-sequencing data generated on normal endometrium and colon tissue were downloaded from TCGA (Pinol et al., 2005; Ng et al., 2010). For the EM and CRC datasets, we calculated the mean normalized read count for each gene in 12 normal endometrial samples and 40 normal colorectal samples, respectively. Transcripts with more than 10 reads per kilobase per million reads (RPKM) were considered expressed. This resulted in 12,851 and 12,293 genes expressed in endometrial and colorectal tissues, respectively. We then limited the pathway analyses to indels affecting genes expressed either in normal endometrium or in normal colon tissue. IPA® analysis of 2,126 expressed genes affected by a somatic indel ranked “Role of BRCA1 in DNA damage response” as the top enriched pathway. IPA® analysis of 851 expressed genes affected by recurrent indels ranked the “Double-strand break repair by homologous recombination” pathway top. GenomeMuSiC ranked the “ATR/BRCA pathway”, “DNA repair” and “Homologous recombination” pathways as the top pathways for BioCarta, Reactome and DNARepairDB, respectively. Restricting the analyses to frameshift indels in genes expressed in normal endometrial or colorectal tissue thus yielded similar results.
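The expression filter described above (mean RPKM across normal samples above 10) can be sketched as follows. This assumes raw read counts, gene lengths and library sizes as input; if the downloaded TCGA values are already RPKM-normalized, only the averaging and the threshold apply. Sample and gene names are hypothetical.

```python
from statistics import mean


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expressed_genes(counts_by_sample, gene_lengths, library_sizes, threshold=10.0):
    """Genes whose mean RPKM across samples exceeds `threshold`.

    counts_by_sample: {sample: {gene: raw read count}}
    gene_lengths:     {gene: length in bp}
    library_sizes:    {sample: total mapped reads}
    """
    expressed = set()
    for gene, length in gene_lengths.items():
        values = [
            rpkm(counts.get(gene, 0), length, library_sizes[sample])
            for sample, counts in counts_by_sample.items()
        ]
        if mean(values) > threshold:
            expressed.add(gene)
    return expressed


# Hypothetical toy data: two normal samples, two genes.
counts = {"N1": {"G1": 800, "G2": 50}, "N2": {"G1": 300, "G2": 40}}
lengths = {"G1": 2000, "G2": 1000}
libsizes = {"N1": 20_000_000, "N2": 10_000_000}
print(expressed_genes(counts, lengths, libsizes))  # {'G1'}
```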
In order not to burden the reader with too many pathway analyses, we have chosen not to present these data in the revised manuscript. Furthermore, since the response to this comment will be published online, critical readers will be able to appreciate in detail that the outcome was not altered by removing genes that are affected by indels but not expressed in the normal corresponding tissue.
1. Zook, J. M. et al. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol 32, 246–251, doi:10.1038/nbt.2835 (2014).
2. Jia, P. et al. Consensus rules in variant detection from next-generation sequencing data. PLoS One 7, e38470, doi:10.1371/journal.pone.0038470 (2012).
3. O'Rawe, J. et al. Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 5, 28, doi:10.1186/gm432 (2013).
4. TCGA. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330–337, doi:10.1038/nature11252 (2012).
5. Cancer Genome Atlas Research Network et al. Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73, doi:10.1038/nature12113 (2013).
https://doi.org/10.7554/eLife.02725.041
- Diether Lambrechts
- Amanda Spurdle
- Hui Zhao
- Bernard Thienpont
- Lieve Coenegrachts
- Betül Tuba Yesilyurt
- Matthieu Moisse
- Diether Lambrechts
- Amanda Spurdle
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
We greatly appreciate the assistance of Mark Veugelers and Stéphane Plaisance of the VIB Technology Watch team. We acknowledge the contributions of Gilian Peuteman and Thomas Van Brussel to the Sequenom validation experiments. We thank Penelope Webb, Daniel Buchanan, Kaltin Ferguson, Mike Walsh, Joanne Young, as well as ANECS collaborators, ANECS staff and participating institutions (http://www.anecs.org.au/html) for their roles in ANECS study setup and/or characterization of ANECS endometrial tumors. We are grateful to the Verelst Fund and Reliable Cancer Therapies. The research was funded by grants from the Fund for Scientific Research Flanders (FWO-F), the ‘Stichting tegen Kanker’ and the KULeuven (KUL PFV/10/016 SymBioSys). ANECS patient recruitment, data collection, biospecimen collection and IHC analysis were supported by funding from the National Health and Medical Research Council (NHMRC) of Australia (Grant ID#339435), The Cancer Council Queensland (ID#4196615), Cancer Council Tasmania (IDs#403031, #457636) and Cancer Australia (ID1010859). HZ, BT, LC and JR hold FWO postdoctoral fellowships; BTY and MM hold FWO PhD fellowships. AS is supported by an NHMRC Senior Research Fellowship.
Human subjects: Informed consent and consent to publish were obtained from all patients. Ethical approval was obtained from the ethics committee of University Hospital Gasthuisberg, Leuven (identifier ML2266).
- John Stamatoyannopoulos, University of Washington, United States
© 2014, Zhao et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.