Breast cancer genomes have revealed a novel form of mutation showers (kataegis) in which multiple same-strand substitutions at C:G pairs spaced one to several hundred nucleotides apart are clustered over kilobase-sized regions, often associated with sites of DNA rearrangement. We show that kataegis can result from AID/APOBEC-catalysed cytidine deamination in the vicinity of DNA breaks, likely through action on single-stranded DNA exposed during resection. Cancer-like kataegis can be recapitulated by expression of AID/APOBEC family deaminases in yeast where it largely depends on uracil excision, which generates an abasic site for strand breakage. Localized kataegis can also be nucleated by an I-SceI-induced break. Genome-wide patterns of APOBEC3-catalysed deamination in yeast reveal APOBEC3B and 3A as the deaminases whose mutational signatures are most similar to those of breast cancer kataegic mutations. Together with expression and functional assays, the results implicate APOBEC3B/A in breast cancer hypermutation and give insight into the mechanism of kataegis.
https://doi.org/10.7554/eLife.00534.001
The genomes of cancer cells contain mutations that are not present in normal cells. Some of these prevent cells from repairing their DNA, while others give rise to tumours by causing cells to multiply uncontrollably. Moreover, some of the mutations in breast cancer cells occur in clusters—a phenomenon known as kataegis (from the Greek for ‘thunderstorm’).
Kataegic mutations occur almost exclusively at a cytosine preceded by a thymine. This suggests that a family of proteins called AID/APOBEC enzymes—which remove amine groups from cytosines—may be involved in generating these mutations. In this study, Taylor et al. confirm this possibility by showing that expressing individual members of the AID/APOBEC family of enzymes in yeast cells increases the mutation frequency and induces kataegis.
The kataegis triggered by the AID/APOBEC enzymes could be localised through the introduction of double-stranded breaks into the DNA: Taylor et al. suggest that this might happen because repairing the breaks exposes single-stranded DNA, which the AID/APOBEC enzymes then act upon. By comparing the mutations induced in the yeast cells with those observed in breast cancer cells, Taylor et al. identified APOBEC3B as the enzyme most likely to be responsible for kataegis in breast cancer (with APOBEC3A also a strong candidate in some cancers). Moreover, they showed that APOBEC3B was highly expressed in breast cancer cell lines, and that APOBEC3B and APOBEC3A can also cause DNA damage in human cells.
Taken together, the findings provide key insights into the mechanism by which kataegis arises, and identify two proteins likely to contribute to the mutations seen in breast cancer. Further work is now required to determine whether these enzymes also give rise to mutations in other forms of cancer.
https://doi.org/10.7554/eLife.00534.002
Whole genome sequencing of 21 breast cancers recently revealed the presence in more than half the cancers of a novel form of localised hypermutation (termed kataegis) in which clusters of multiple same-strand mutations at C:G pairs are focused on multikilobase-long genomic regions with adjacent mutations within each cluster separated by one to several hundred base pairs (Nik-Zainal et al., 2012). Although the mechanism underlying kataegis is unknown, the fact that the mutations occurred nearly exclusively at C residues preceded by a 5′-T suggested that AID/APOBEC cytidine-DNA deaminases might be involved in the process, since these enzymes are sensitive to the 5′-flanking sequence context (Conticello et al., 2007; Nik-Zainal et al., 2012).
Members of the AID/APOBEC family of enzymes (reviewed in Conticello et al., 2007) deaminate cytosine in the context of a single-stranded polynucleotide substrate, and function in adaptive and innate immunity. AID acts on C residues in the DNA of the genomic immunoglobulin loci in activated lymphocytes to trigger antibody gene diversification whereas APOBEC3 family members, of which there are seven in humans, act on C residues in the DNA of viral replication intermediates (usually in the cytoplasm) as part of a host restriction pathway. Off-target deamination by AID results in nucleotide substitutions and genomic rearrangements in B lymphocyte tumours, some of which are implicated in oncogenesis (reviewed by Gazumyan et al., 2012). Although AID is the only member of the AID/APOBEC family known to act physiologically on endogenous nuclear DNA, it is possible that other members of the AID/APOBEC family might occasionally get access to the nucleus and cause cancer-associated genomic damage or mutation (Harris et al., 2002; Beale et al., 2004; Vartanian et al., 2008; Stenglein et al., 2010; Landry et al., 2011; Nik-Zainal et al., 2012; Nowarski et al., 2012).
Here we have asked whether we could recapitulate cancer-like kataegis by expression of different AID/APOBEC enzymes in budding yeast and, if so, use the tractability of yeast to gain insight into the kataegic process. The results not only provide insight into the mechanism of kataegis but also give a strong pointer to the identity of the deaminases responsible for the kataegis observed in breast cancers.
Several members of the AID/APOBEC family were expressed in yeast and all were found to give a significant increase in the mutation frequency as judged by the yield of colonies resistant to canavanine (CanR) (Figure 1—figure supplement 1). Genome sequencing, however, revealed that most CanR colonies had accumulated fewer than 10 point mutations (>98% at C:G pairs) during the period of AID/APOBEC induction and clonal expansion (Figure 1A). A hyperactive mutant of AID (AID*; Wang et al., 2009) gave a significantly higher mutation load (median of 25 mutations per genome). We therefore initially focused on the mutations in AID*-transformants (1078 mutations in total, of which all except 14 were at C:G pairs) to see if there were signs of kataegis.
The distances between neighbouring mutations in the AID* yeast transformants are displayed as rainfall plots (Figure 1B). While the median overall intermutational distance (IMD) is 13 kb, it is apparent that, rather than the mutations being scattered randomly over the genome, the distribution of IMDs is bimodal (Figure 1C). Dividing the mutations into two groups using k-means cluster analysis (Hartigan and Wong, 1979) reveals that one group exhibits a median IMD of 156 kb with a distribution of distances that is as expected for a set of individual mutations randomly scattered over the yeast genome as judged by Monte Carlo simulation (Figure 1C). We designate these as singlet mutations: they account for 52% of the total mutations. The remaining 48% of the mutations are much more closely spaced than would be expected on a random basis. We designate these as proximal mutations, which are separated from each other by a median IMD of only 727 bp with >99% of them being within 8.5 kb of their closest neighbour.
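The IMD analysis can be illustrated with a minimal sketch. It assumes that mutation positions are held per chromosome, that distances are clustered on a log10 scale (as on a rainfall plot), and it uses a plain Lloyd-style two-group k-means rather than the Hartigan and Wong algorithm cited above. The function names and toy coordinates are hypothetical and are not taken from the study.

```python
import math

def intermutation_distances(positions_by_chrom):
    """Distances (bp) between neighbouring mutations, chromosome by chromosome."""
    imds = []
    for positions in positions_by_chrom.values():
        ordered = sorted(positions)
        imds.extend(b - a for a, b in zip(ordered, ordered[1:]))
    return imds

def two_means_labels(values, iterations=100):
    """Lloyd-style 2-means on 1-D values; True marks the low-valued group."""
    lo_c, hi_c = min(values), max(values)  # start the centres at the extremes
    labels = [True] * len(values)
    for _ in range(iterations):
        labels = [abs(v - lo_c) <= abs(v - hi_c) for v in values]
        low = [v for v, is_low in zip(values, labels) if is_low]
        high = [v for v, is_low in zip(values, labels) if not is_low]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high) if high else hi_c
        if (new_lo, new_hi) == (lo_c, hi_c):
            break  # assignments are stable
        lo_c, hi_c = new_lo, new_hi
    return labels

# Toy coordinates (hypothetical): three close mutations and two distant ones.
toy = {"chrI": [1000, 1400, 2100, 2900, 160000, 330000]}
imds = intermutation_distances(toy)
labels = two_means_labels([math.log10(d) for d in imds])
proximal = [d for d, is_low in zip(imds, labels) if is_low]
singlet = [d for d, is_low in zip(imds, labels) if not is_low]
print("proximal IMDs:", proximal)
print("singlet IMDs:", singlet)
```

Clustering on the log scale is a design choice of this sketch: it stops the largest distances from dominating the split, which matters when the two groups differ by orders of magnitude.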
The overwhelming majority of the proximal mutations in the AID* transformants do not occur as isolated mutational pairs but, rather, are found in clusters. Thus, if we define proximal mutations as a pair of mutations that are located <8.5 kb apart (a distance that excludes 99% of the singlet mutations) and define a cluster as a stretch of DNA containing ≥5 proximal mutations, we find that 75% of the AID*-induced proximal mutations are actually parts of clusters. These clusters typically extend over 6–15 kb (with the full range detected being 1.8–30 kb) and contain anything up to 26 mutations (Figure 1E). This clustering is far in excess of anything that would be expected on a random basis. The level of mutation clustering observed with AID* is such that more than one-third of the transformants analysed (16/40) contain at least one mutation cluster. In affected clones, a quarter to two-thirds of all the mutations in the cell are concentrated in a small number of clusters that account for <0.2% of the entire genome. Similar clusters were also observed in yeast cells transformed with APOBEC3A and APOBEC3B as well as with the hyperactive APOBEC3G mutant APOBEC3G* (Figure 1D,F). Like the cancer kataegis, the clustered mutations in the various yeast transformants showed a strong tendency towards strand polarity; mutations within a cluster occur predominantly at either a C residue or a G residue with over 88% of mutations being strand coordinated (Figure 1D,F).
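Strand polarity within a cluster can be quantified in a few lines. This sketch assumes that "strand coordinated" means the fraction of a cluster's C:G mutations that fall on its majority strand (reference base C versus reference base G); the exact definition used for the 88% figure is not stated in the text, so the function below is an illustration only.

```python
from collections import Counter

def strand_coordination(ref_bases):
    """Fraction of C:G mutations in a cluster that share the majority strand.

    ref_bases: reference base at each mutated position ('C' or 'G').
    Returns None when the cluster holds no C:G mutations.
    """
    counts = Counter(b.upper() for b in ref_bases if b.upper() in ("C", "G"))
    total = counts["C"] + counts["G"]
    if total == 0:
        return None
    return max(counts["C"], counts["G"]) / total

# Hypothetical cluster: seven mutations at C and one at G.
print(strand_coordination("CCCCCCCG"))
```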
Exploring the mutational spectra, we find that the majority (76%) of the mutations in the yeast AID* transformants are C→T transitions, although transversions do occur and these are preferentially associated with the kataegic stretches (Figure 2A and Table 1). Transversions account for 54% of the kataegic mutations in the AID* transformants but for only 13% of the unclustered substitutions (Table 1). The same bias towards transversion mutations in the kataegic stretches is also observed in the APOBEC3A, 3B and 3G* transformants (Figure 2A and Table 1).
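The transition/transversion split at C:G pairs can be sketched as follows. The code assumes each substitution is supplied as a (reference, alternate) base pair on the reference strand, and folds mutations at G onto the complementary C so that every change is expressed as C→T, C→G or C→A. The function names are illustrative and do not come from the study's pipeline.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def classify_cg_substitution(ref, alt):
    """Return (change on the C strand, 'transition' or 'transversion')."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in ("C", "G") or alt not in COMPLEMENT or alt == ref:
        raise ValueError(f"not a substitution at a C:G pair: {ref}>{alt}")
    if ref == "G":
        # Express the change on the strand that carries the C.
        ref, alt = "C", COMPLEMENT[alt]
    kind = "transition" if alt == "T" else "transversion"
    return f"C>{alt}", kind

def transversion_fraction(substitutions):
    """Share of transversions among (ref, alt) substitutions at C:G pairs."""
    kinds = [classify_cg_substitution(r, a)[1] for r, a in substitutions]
    return kinds.count("transversion") / len(kinds)

# Hypothetical substitutions: two transitions and two transversions.
example = [("C", "T"), ("G", "A"), ("G", "C"), ("C", "A")]
print([classify_cg_substitution(r, a) for r, a in example])
print(transversion_fraction(example))
```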
Whereas C→T transitions will likely arise through direct replication over uracils generated by cytidine deamination, transversions are presumably due to replication over abasic sites created through uracil excision by uracil-DNA glycosylase (UNG). The transversions exhibit a strong (4- to 10-fold) bias for C→G rather than C→A substitutions (Table 1) suggesting that the replication over the abasic site could be catalysed by REV1 since this translesion polymerase (by virtue of its deoxycytidyl nucleotide transferase activity) inserts C opposite abasic sites (Nelson et al., 1996). Indeed, deficiency in either REV1 or UNG led to a dramatic fall in the proportion of transversion mutations (Table 1). Deficiency in UNG also resulted in a fourfold increase in the average total mutation load in AID* transformants (Supplementary file 1B). This presumably reflects diminished repair of the AID/APOBEC-generated uracils. There was an overall decrease in average total mutation load in AID* transformants of REV1 deficient yeast that might reflect the possible non-catalytic roles of REV1 during DNA damage repair (Sale et al., 2012).
Since UNG is required for the transversion mutations that are enriched in kataegic stretches, we asked whether UNG itself is required for kataegis. We found that the increased mutation load in AID* ung1Δ transformants was accompanied by a dramatic shift away from mutational clustering (Figure 2B). Despite the fourfold ‘increase’ in mutation load, the percentage of mutations that are <8.5 kb from their neighbour (proximal mutations) actually ‘falls’ from accounting for 48% of the AID* mutations in wild type cells to 18% in ung1Δ transformants. Similarly, using the same criterion to distinguish clustered mutations in both datasets (≥5 linked mutations separated from their neighbour by <8.5 kb), 274 of the 1064 mutations observed in AID* wild type transformants are found within clusters compared to 28 of the 2088 mutations in the AID* ung1Δ transformants (Supplementary file 1B). Thus, the median overall IMD actually ‘increases’ from 13 kb in AID* wild type transformants to 41 kb in the ung1Δ cells despite the increase in mutation load.
These shifts do not simply reflect a fall in the proportion of clustered mutations due to the increased total mutation load. There is also an absolute fall in the amount of kataegis as judged by either the average number of clustered mutations per yeast transformant (6.9 in the wild type background vs 1.5 in the ung1Δ transformants) or by the frequency of kataegic events (26 kataegic stretches in 40 AID* transformants in the wild type background vs 4 kataegic stretches in 19 AID* transformants in ung1Δ background) (Supplementary file 1B). Thus, it is evident that kataegis is substantially reduced in the UNG-deficient background, but not completely lost: a few residual clusters (which exhibit evident strand polarity or bipolarity) are still detected (Figure 2C).
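The comparison between the two backgrounds can be checked directly from the counts quoted above. The short calculation below uses only those counts; the dictionary layout and the function name are choices made for this sketch.

```python
# Counts quoted in the text for AID* transformants.
wild_type = {"clustered": 274, "total": 1064, "transformants": 40, "stretches": 26}
ung1_delta = {"clustered": 28, "total": 2088, "transformants": 19, "stretches": 4}

def summarise(counts):
    """Relative and absolute measures of kataegis for one genetic background."""
    return {
        "fraction_clustered": counts["clustered"] / counts["total"],
        "clustered_per_transformant": counts["clustered"] / counts["transformants"],
        "stretches_per_transformant": counts["stretches"] / counts["transformants"],
    }

wt = summarise(wild_type)
ung = summarise(ung1_delta)
for label, summary in (("wild type", wt), ("ung1-delta", ung)):
    print(label, {key: round(value, 3) for key, value in summary.items()})
```

On these figures, roughly a quarter of the wild type mutations lie in clusters against just over 1% in the ung1Δ background, and the per-transformant averages agree with the values of about 6.9 and 1.5 given above.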
The sensitivity of kataegis to UNG-deficiency indicates that kataegis is, at least in part, triggered through the generation of abasic sites. Cleavage at abasic sites by apyrimidinic endonucleases will lead to occasional double-stranded DNA breaks: kataegis could result from AID/APOBEC deamination of single-stranded DNA exposed during the resection phase of break repair. To determine whether the DNA break repair process predisposes to kataegis, we introduced a target site for the restriction endonuclease I-SceI immediately downstream of the polyadenylation site of the CAN1 locus, and asked whether co-expression of I-SceI together with the APOBEC3G* deaminase increased the likelihood of kataegis in the vicinity. We chose to use APOBEC3G* for this experiment since it gave a good mutation load but a lower proportion of kataegic mutations than AID* (Supplementary file 1B): any enhancement of kataegis would therefore be more readily detectable. Consistent with previous findings (Poltoratsky et al., 2010), induction of I-SceI expression resulted in an increased frequency of deaminase-dependent selectable mutation at the linked CAN1 locus (Figure 2—figure supplement 1). More importantly, in the presence of APOBEC3G*, induction of a double-strand break increases the probability that mutations in its vicinity are kataegic (Figure 2D).
The mutation clusters in the breast cancers were analysed in the same way as the yeast clusters. Most of the cancers identifiable by rainfall plots as harbouring major regions of kataegis also contain clusters comprising smaller numbers of same-strand nucleotide substitutions at 5′-T-C dinucleotides (Figure 3—figure supplements 2–4). There is some diversity amongst the breast cancers with respect to the frequency/nature of the kataegic stretches. The main outlier is tumour PD4107a which carries a dense array of highly mutated (and transition-restricted) kataegic clusters coincident with extensive genomic rearrangement in a 14 Mb region of chromosome 6 (Nik-Zainal et al., 2012). Overall, the kataegic clusters in the breast cancers are distributed over a similar range of lengths to those detected in the yeast transformants (Figure 3A) but the yeast clusters do typically contain a twofold to fivefold lower density of mutations (a mean inter-mutational distance of 1220 bp within the AID* yeast kataegic stretches compared to 209 bp in PD4107a, 335 bp in PD4103a and 763 bp in PD4199a).
The vast majority of the breast cancer kataegic mutations occur at C residues preceded by a T (Figure 3—figure supplements 2–4). In tumours PD4103a, PD4107a and PD4199a, over 91% of the kataegic C mutations are preceded by a T (Figure 3C and Figure 3—figure supplements 2–4). However, any sensitivity to the identity of the base at position −2 is exceedingly mild (the average across the kataegic stretches in these three tumours is A:C:G:T = 32:20:19:29 compared to the human genome average of 30:20:20:30) (Figure 3C).
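The flanking-context tallies discussed here can be sketched as below. The code assumes a reference sequence held as a string with 0-based mutation positions, and it reads mutations at G on the opposite strand so that positions −1 and −2 always refer to the bases 5′ of the deaminated C. The helper names and the toy sequence are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def upstream_context(reference, position):
    """Return (base at -2, base at -1) on the strand carrying the mutated C."""
    base = reference[position].upper()
    if base == "C":
        if position < 2:
            return None  # too close to the start of the sequence
        return reference[position - 2].upper(), reference[position - 1].upper()
    if base == "G":
        if position + 2 >= len(reference):
            return None  # too close to the end of the sequence
        # 5' of the C on the reverse strand is 3' of the G on the forward strand.
        minus2 = COMPLEMENT.get(reference[position + 2].upper(), "N")
        minus1 = COMPLEMENT.get(reference[position + 1].upper(), "N")
        return minus2, minus1
    return None  # not a C:G pair

def tally_contexts(reference, positions):
    """Count the bases found at -1 and -2 across a set of mutated positions."""
    minus1, minus2 = Counter(), Counter()
    for position in positions:
        context = upstream_context(reference, position)
        if context is not None:
            minus2[context[0]] += 1
            minus1[context[1]] += 1
    return minus1, minus2

# Toy reference and positions (hypothetical).
toy_reference = "ATCGGATCAGA"
minus1, minus2 = tally_contexts(toy_reference, [2, 7, 4, 3, 9])
print("-1:", dict(minus1))
print("-2:", dict(minus2))
```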
Previous experiments in which AID/APOBEC deaminases have been used to mutate specific bacterial or retroviral gene targets have revealed that individual deaminases show characteristic flanking nucleotide preferences. However, none of the deaminases analysed to date (AID, APOBEC1, APOBEC3C, 3DE, 3F and 3G) has been shown to exhibit a preference that accords with the breast cancer kataegic mutations. Their flanking sequence preferences (reviewed in Conticello et al., 2007) are either radically different (e.g., AID prefers A/G at −1; APOBEC3G prefers C at −1) or else they do not show the high (>90%) preference for T at −1 coupled to a relative indifference to the base at −2.
The mutation spectra obtained in yeast allow the consensus motifs for individual deaminases to be refined owing to the large number of potential target sequences interrogated when mutational specificity is analysed on a genome-wide basis (Figure 3B). With AID, APOBEC3C and APOBEC3G the yeast data essentially confirm the previously identified sensitivity to nucleotides located at positions −1 and −2 (AID: 5′-WRC, APOBEC3C: 5′-TYC, APOBEC3G: 5′-CCC) whilst allowing more precise quantitation of the degree of preference. With regard to APOBEC3A and APOBEC3B, the results reveal that (consistent with earlier studies on APOBEC3B; Bishop et al., 2004), both enzymes strongly prefer a T at position −1 (91%). However, the yeast studies reveal that unlike other deaminases, both APOBEC3A and APOBEC3B show mild discrimination with regard to the bases located at position −2 (APOBEC3A, A:C:G:T = 25:26:7:42; APOBEC3B, A:C:G:T = 35:14:20:31) (Figure 3B).
Comparing the contexts of the mutations obtained with the individual deaminases in yeast to those of the kataegic mutations in the cancers reveals that APOBEC3B has a signature that fits extremely well with the kataegis in PD4107a and PD4103a whereas APOBEC3A fits better with PD4199a (p values in all three cases <0.005) (Figure 3D). Interestingly, a marked bias towards a 5′-T is also seen amongst the individual singlet C mutations in several of the breast tumours (e.g., PD4199a, PD4005a and PD4120a; Figure 3C and Figure 3—figure supplements 2–4).
Although APOBEC3A has been shown to be capable of causing genomic damage in mammalian cells (Vartanian et al., 2008; Stenglein et al., 2010; Landry et al., 2011), the same has not been shown for APOBEC3B. We find that induction of APOBEC3B expression in stably transfected human KBM7 cells (like that of APOBEC3A) results in loss of viability as well as in genomic DNA damage as judged by the induction of γH2AX (a marker of the DNA damage response) and of 53BP1 foci (which identify DNA breaks) (Figure 4A,B).
Although kataegis could easily have resulted from a transient spike in deaminase expression during tumour development, it was interesting to ascertain whether APOBEC3A or APOBEC3B expression could be detected or induced in breast cancer-derived cells. RNA analysis revealed that although several APOBEC3s can be expressed in individual breast cancer cell-lines, the highest and broadest pattern of expression was evident with APOBEC3B (Figure 4C). Consistent with studies in other cell-types (Madsen et al., 1999; Koning et al., 2009; Stenglein et al., 2010), the expression of APOBEC3A and APOBEC3B in some of the breast cancer cell-lines could be enhanced by treatment with phorbol ester or interferon alpha.
Expression of AID/APOBEC cytidine deaminases in yeast generates mutations across the genome, a proportion of which are found in clusters. Since we completed this work, two other groups have also demonstrated that cytidine deaminases can generate such clustered mutations (Chan et al., 2012; Lada et al., 2012). Here we extend these findings by demonstrating the similarity of yeast and breast cancer kataegis, using yeast genetics to gain insight into the mechanism of kataegis, and providing evidence identifying the individual APOBECs likely responsible for the kataegis in the breast cancers.
The stimulation of local kataegis in yeast by the induction of an I-SceI break indicates that the process occurs during DNA break repair, most likely by AID/APOBEC-catalysed deamination of cytidines exposed on single-stranded DNA during the resection phase of homology-mediated repair. The lengths of the kataegic stretches (mostly in the range 6–15 kb) are of the same order as the extent of resection observed during yeast DNA break repair (reviewed in Paull, 2010) although the occurrence and detection of kataegis are likely to be biased towards longer stretches. The distances separating adjacent mutations within the yeast kataegic stretches (averaging about 1.2 kb in the AID* dataset) might in part reflect the fact that the deaminase both jumps and slides on ssDNA, acting with possibly low efficiency at each encountered cytidine as proposed by Goodman and colleagues (Chelico et al., 2006).
In the absence of an induced double-strand break, kataegis in AID/APOBEC-expressing yeast was greatly dependent on UNG, likely reflecting that kataegis under these circumstances was dependent on breaks generated through the processing of abasic sites. That some residual kataegis is still observed in the absence of UNG might well reflect that breaks will occasionally occur spontaneously through other means.
The finding that a double-strand break can be the nucleating lesion for kataegis in this yeast experimental system is consistent with the close association of kataegis and rearrangements in breast cancer (Nik-Zainal et al., 2012). Whereas the yeast data demonstrate that double-strand breaks can nucleate kataegis, it is probable that APOBEC-catalysed kataegic deamination in exposed stretches of single-stranded DNA in the cancer cells might itself lead to DNA breaks.
It has long been known that recombinational repair of double-strand breaks in yeast is associated with an increased frequency of local mutations with implication of error-prone polymerases (Strathern et al., 1995). In our experiments, the signatures of the mutations associated with the I-SceI break (see Figure 2D legend) implicate APOBEC3 activity rather than error-prone polymerases as the source of mutations during the double-strand break repair. More recently Gordenin and colleagues have shown that extensive clusters of mutations can be induced in yeast by alkylating agents acting on single-stranded DNA (Roberts et al., 2012). Thus, the AID/APOBEC-mediated kataegic hypermutation, driven by these endogenous mutagens, can be viewed as a specialised, albeit dramatic, example of localised hypermutation caused by exposure of single-stranded DNA during homologous recombination, along the lines proposed by Roberts (Roberts et al., 2012).
It is striking that transversions in yeast are preferentially associated with kataegic stretches whereas the unclustered mutations in the same cells are largely restricted to transitions. The reason for this is a matter for speculation but we suspect the singlet uracils largely encounter UNG as part of the base-excision repair process (which would be non-mutagenic); the C→T transition mutations would be the result of direct replication over the non-excised uracil. In contrast, the action of UNG on uracil in a stretch of exposed single-stranded DNA may yield an abasic site that is replicated over by a translesion polymerase rather than repaired.
The yeast experiments indicate that kataegis can be triggered by DNA breaks, whether generated through the joint action of the deaminase and UNG or by other processes. The same likely holds true for the breast cancer kataegis. However, there is no reason why kataegis should be restricted to such initiating triggers. One can well imagine that other processes that cause significant exposure of single-stranded DNA (e.g., DNA spooling caused by replication fork stalling [Lopes et al., 2006]; R-loop structures generated during transcription of suitable target sequences [Aguilera and Gómez-González, 2008]) could predispose to kataegis. Such mechanisms, or spontaneously-arising DNA breaks, could underlie the presence of kataegis in UNG-deficient cells (this work and Lada et al., 2012). A more extensive study of the genetic dependence of kataegis and of the localisation of the kataegic stretches in yeast may give insight into such possibilities.
Comparison of the yeast and breast cancer data reveals that the kataegic stretches in both sets extend over a similar range of lengths but with the cancer kataegis displaying a twofold to fivefold higher average mutation density. This could reflect differences in deaminase activity in the two organisms. It also appears that those cancers which harbour kataegic stretches comprising larger numbers of mutations additionally contain multiple clusters with smaller numbers of T-C mutations (Figure 3—figure supplements 2–4). The marked bias towards a 5′-T seen amongst some cancer singlet C mutations suggests that kataegis might be signalling a much wider implication of APOBEC-mediated deamination in genome-wide mutagenesis in some tumours.
The mutation data obtained in yeast reveal APOBEC3B and APOBEC3A as the only deaminases characterised whose target specificity matches the breast cancer kataegic mutations, arguing very strongly for an involvement of these deaminases in cancer kataegis. The implication of APOBEC3A fits with data from others revealing that enforced expression of APOBEC3A (as well as APOBEC3C and 3H) can lead to mutation of human papilloma viral DNA (Vartanian et al., 2008) as well as of transfected plasmid DNA (Stenglein et al., 2010). Enforced expression of APOBEC3A has also been shown to lead to genomic damage in the nucleus (Landry et al., 2011). The target-specificity data implicating APOBEC3B in the breast cancer mutation are supported not only by our demonstration that its enforced expression can yield DNA damage but also by the fact that it is well expressed in breast cancer cell lines. Furthermore, after submission of this manuscript, Burns et al. have demonstrated that APOBEC3B expression also correlates with a T-C mutator phenotype in many primary breast cancer tumours (Burns et al., 2013).
Thus, APOBEC3B and/or APOBEC3A are the deaminases likely responsible for the breast cancer hypermutation although it remains possible that other APOBEC3s might contribute to genome mutation in other tumours. With regard specifically to kataegis, given that double strand breaks are a common feature of tumour development, it will obviously be interesting to discover whether whole genome sequencing of other tumour types also reveals evidence of kataegic hypermutation and whether, in light of the fact that the AID/APOBEC family has undergone considerable expansion in primates, such kataegic hypermutation might also have contributed more generally to recent genome evolution.
Yeast strain BY4741 (MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0) and the ung1Δ::kanMX4 derivative were from Euroscarf (Frankfurt, Germany). The rev1Δ::LEU2 derivative was generated by homologous recombination to remove the open reading frame of REV1 using a LEU2 cassette generated by PCR with a 157-bp 5′ homology arm and a 200-bp 3′ homology arm. The CAN1::KanMX-ISceIRS strain was generated by inserting a 1.4-kb module containing the I-SceI-recognition site and the KanMX selection cassette (Wach et al., 1994) immediately after the CAN1 poly-A site by homologous recombination. Correct integration of the targeting constructs was confirmed by PCR.
Yeast transformants expressing galactose-inducible human AID/APOBEC proteins were generated by transformation with the appropriate pRS426-derived expression vectors (Christianson et al., 1992) in which C-terminally-FLAG-tagged AID/APOBEC cDNAs flanked by a GAL1 promoter and tADH polyA site had been inserted between the HindIII and XhoI sites. The cDNAs encoded the full-length human wild type polypeptides except that AID* and A3G* correspond to upmutants AID-7.3 and A3G-T283I in Wang et al., 2009, with a FLAG-tagged A3G* comprising just the second deaminase domain used in the I-SceI experiments. For these experiments, the I-SceI-ORF with an N-terminal HA tag and 3xNLS (Johnson et al., 1999) was cloned between the EcoRI and XhoI sites in pSH62 (Gueldener et al., 2002).
For canavanine resistance assays, single yeast colonies (at least 12 independent colonies for each experiment) that had been grown overnight in glucose medium to repress expression from the GAL1 promoter were diluted 1:100 into galactose-containing medium and grown for 2 days at 30°C before serial dilutions were plated onto canavanine-selection or viability plates. Colonies were counted after 3 days of growth. For I-SceI-break induction, individual colonies were grown overnight in glucose medium before dilution 1:10 into raffinose-containing medium. After 4 hr of growth, galactose was added to 2% and cells were cultured for a further 2 days before serial dilution and plating as above. APOBEC3G* was used in the I-SceI experiments as it gave a good mutation load but a lower proportion of kataegic mutations than AID* (Supplementary file 1B). Induction of protein expression both with and without the raffinose step gave similar mutation rates.
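The text does not spell out how plate counts were converted into a mutation frequency. A conventional calculation, given here only as an assumption, divides the titre of canavanine-resistant cells by the titre of viable cells and summarises independent cultures by their median. The plated volume, the dilution values and the colony counts below are hypothetical.

```python
from statistics import median

def cells_per_ml(colonies, dilution, plated_volume_ml):
    """Titre of the culture; dilution is a fraction, e.g. 1e-5 for a 10^-5 dilution."""
    return colonies / (dilution * plated_volume_ml)

def canr_frequency(canr_colonies, canr_dilution,
                   viable_colonies, viable_dilution,
                   plated_volume_ml=0.1):
    """Resistant cells per viable cell for one culture (same volume on both plates)."""
    resistant = cells_per_ml(canr_colonies, canr_dilution, plated_volume_ml)
    viable = cells_per_ml(viable_colonies, viable_dilution, plated_volume_ml)
    return resistant / viable

# Hypothetical counts for three independent cultures:
# (CanR colonies, CanR dilution, viable colonies, viable dilution)
cultures = [(50, 1.0, 200, 1e-5), (10, 1.0, 100, 1e-5), (30, 1.0, 300, 1e-5)]
frequencies = [canr_frequency(*culture) for culture in cultures]
print("per-culture frequencies:", frequencies)
print("median frequency:", median(frequencies))
```

Using the median rather than the mean is a common choice for such assays because an early mutation in a single culture can inflate its count far above the others.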
For genome sequence determination, individual CanR colonies selected as above were subcloned by streaking out on selective plates, grown for 3 days in canavanine selection media (10 ml) and DNA prepared using Gentra Puregene Yeast/Bact. Kit (Qiagen Ltd, Manchester, UK) following the manufacturer's instructions.
Short insert 500-bp library construction, flowcell preparation and cluster generation were in accordance with the Illumina no-PCR library protocol (Kozarewa et al., 2009). 100-bp paired-end sequencing was performed on Illumina HiSeq 2000 analysers as described in the Illumina Genome Analyzer operating manual. Short insert 2 × 100 bp paired-end reads were aligned to the reference yeast genome (SacCer_Apr2011/sacCer3) using BWA (Li and Durbin, 2009). An average of approximately 25-fold sequence coverage was achieved for each yeast genome.
A bespoke substitution-calling algorithm, CaVEMan (manuscript in preparation), was used for calling somatic substitutions, where these were identified as alleles present in an AID/APOBEC-transformant genome but absent from the parental BY4741 genome. All high-confidence mutations included in this analysis had a variant allele fraction above 0.5, and more often than not were present in all reads reporting that genomic position. Post-processing filters were developed to improve the specificity of substitution calling. These filters removed false positive variants that were generated by genomic features resulting in mapping errors or systematic sequencing artefacts. All substitutions were visually assessed using a genome browser in order to ensure a high specificity of mutation-calling.
K-means cluster analysis (Hartigan and Wong, 1979) was used to divide intermutational distances (IMDs) into two groups, which we designated distal and proximal. An IMD which excluded 99% of the distal group was then used as a threshold for cluster calling. For all the yeast analysis, the IMD threshold was set using the combined dataset of mutations from the wild type transformants. For the breast cancer analysis, IMDs combined from all tumours were used for threshold setting (except PD4120 because of its much higher mutation load). A cluster was called when a minimum of 5 adjacent mutations were identified each with IMDs below the threshold. This 5-mutation threshold was chosen since randomly scattered mutations would be expected to produce such clusters with a probability of <0.001.
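The cluster-calling rule can be sketched as a single pass over the sorted mutation positions of one chromosome. This is an illustration under one reading of the rule, namely that a cluster is a run of at least five mutations in which every consecutive pair lies closer than the IMD threshold; the default of 8500 bp reflects the 8.5 kb figure used for the yeast data, and the function name and toy positions are hypothetical.

```python
def call_clusters(positions, imd_threshold=8500, min_mutations=5):
    """Group mutations on one chromosome into clusters.

    A run is extended while each mutation lies less than imd_threshold bp
    from the previous one; runs with at least min_mutations are kept.
    """
    clusters, run = [], []
    for position in sorted(positions):
        if run and position - run[-1] >= imd_threshold:
            # Gap too large: close the current run and start a new one.
            if len(run) >= min_mutations:
                clusters.append(run)
            run = []
        run.append(position)
    if len(run) >= min_mutations:
        clusters.append(run)
    return clusters

# Toy positions (hypothetical): one run of five, then scattered mutations.
toy_positions = [100, 900, 2500, 4000, 9000, 200000, 201000, 450000]
for cluster in call_clusters(toy_positions):
    span = cluster[-1] - cluster[0]
    print(f"{len(cluster)} mutations spanning {span} bp: {cluster}")
```

Positions from different chromosomes should be passed separately, since a distance between the last mutation on one chromosome and the first on the next has no meaning.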
Sequence contexts were compared in pairwise fashion with the Tomtom motif comparison tool using Sandelin-Wasserman similarity (MEME Suite; http://tools.genouest.org/tools/meme/cgi-bin/tomtom.cgi) and are displayed as p-values.
APOBEC expressing vectors were generated by cloning the appropriate C-terminally FLAG-tagged cDNAs into a self-inactivating retroviral plasmid. The self-inactivating retroviral plasmid was generated by cloning a pTRE-(pTRE-TIGHT; Clontech, Saint-Germain-en-Laye, France)-IRES-GFP (pMX-IG) cassette into the BglII- and 3′LTR XbaI site of pMSCVpuro. The tetracycline transactivator (TET-ON; Clontech) was cloned into a modified pMSCVpuro (Clontech) which contained an IRES-mCherry cassette at the BglII–ClaI site, to generate pTET-ON-ImC. A derivative of the KBM7 human myelocytic leukemia line that stably expressed TET-ON protein was established by retroviral infection with virus particles produced from 293 cells that had been co-transformed with pTET-ON-ImC and packaging vectors using GeneJuice (Merck KGaA, Darmstadt, Germany) according to the manufacturer's instructions. This KBM7[pTET-ON-ImC] cell-line was then superinfected with pMSCV/APOBEC retrovirus to yield derivatives expressing the AID/APOBEC proteins under doxycycline-inducible control. Expression of the FLAG-tagged AID/APOBEC proteins in the KBM7 transfectants was monitored by Western blot analysis of whole cell lysates after 24 hr of doxycycline induction using HRP-conjugated anti-FLAG antibody M2 (A8592; Sigma, Gillingham, UK), probing with anti-lamin antibody (ab16048; Abcam, Cambridge, UK) as a loading control.
Stable derivatives of KBM7 cells harbouring regulatable APOBEC proteins were induced for 72 hr with doxycycline (inducer) and viability was measured by flow cytometry by DAPI exclusion. γH2AX and 53BP1 induction and localisation were analysed by flow cytometry and confocal immunofluorescence after 24 hr of induction with doxycycline; caspase inhibitor (20 μM Z-VAD-FMK; Promega, Southampton, UK) was included in the cultures for γH2AX expression analysis to maintain cell viability. For γH2AX staining, ethanol-fixed cells were stained sequentially for 1 hr with anti-γH2AX (05-636; Millipore, Watford, UK) and Alexa568-conjugated anti-mouse IgG (A-11004; Invitrogen Life Technologies Ltd, Paisley, UK) prior to resuspension in PBS containing 5 μg/ml DAPI and flow cytometry. For 53BP1 staining, cells were allowed to adhere to poly-L-lysine-coated cover slips and stained using anti-53BP1 (NB100-304; Novus Biologicals, Cambridge, UK) and Alexa 568-conjugated anti-rabbit IgG (A-11011; Invitrogen) prior to mounting with DAPI. For each sample, 20–30 fields were imaged with a Bio-Rad Radiance 2100 confocal microscope using a 63x oil immersion objective. Images were processed using ImageJ (default settings), and cells were scored as exhibiting either diffuse or punctate staining with punctate cells further scored for the number of foci.
Breast cancer cell lines were kindly provided by Dr Kerstin Meyer (Cancer Research Institute, Cambridge, United Kingdom) and RNA was extracted using RNeasy Plus Mini Kit (Qiagen). cDNA was prepared using GoScript Reverse Transcription System (Promega) prior to APOBEC expression quantification by qPCR with the QuantiFast SYBR Green PCR Kit on an ABI ViiA-7 system (Applied Biosystems, Paisley, UK). The primers (which were selected for specificity and equivalent amplification on APOBEC ORF templates) are given in Supplementary file 1C.
APOBEC3G DNA deaminase acts processively 3′ → 5′ on single-stranded DNA. Nat Struct Mol Biol 13:392–399. https://doi.org/10.1038/nsmb1086
Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics) 28:100–108.
Y-family DNA polymerases and their role in tolerance of cellular DNA damage. Nat Rev Mol Cell Biol 13:141–152. https://doi.org/10.1038/nrm3289
APOBEC3 proteins mediate the clearance of foreign DNA from human cells. Nat Struct Mol Biol 17:222–229. https://doi.org/10.1038/nsmb.1744
DNA synthesis errors associated with double-strand-break repair. Genetics 140:965–972.
John Stamatoyannopoulos, Reviewing Editor; University of Washington, United States
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for choosing to send your work entitled “AID/APOBECs induce DNA break-associated mutation showers: implication of APOBEC3B/A in breast cancer kataegis” for consideration at eLife. Your article has been favorably evaluated by a Senior editor and 2 reviewers, one of whom is a member of our Board of Reviewing Editors. The Reviewing editor has assembled the following comments to help you prepare a revised submission.
The manuscript by Taylor et al. presents a concise and timely description of the role of APOBEC3A and APOBEC3B in breast cancer kataegis. Building on the observations by Nik-Zainal et al., the authors use AID/APOBEC yeast overexpression constructs and whole genome sequencing to elucidate the mutational mechanisms induced by a set of AID/APOBEC family members.
The power of this study derives from the authors' ability to utilize yeast knockout strains to study the role of specific repair proteins within the genesis of kataegic mutations, and to obtain whole-genome sequencing data from multiple mutant yeast clones. Overall the study is very well done and we do not see any major flaws. The experiments are well laid out and the manuscript is well written.
One arising issue is the recently published study by Burns et al in Nature (doi:10.1038/nature11881). Although the conclusions of these two studies overlap, it is clear that the manuscript by Taylor et al offers important additional mechanistic insight into the kataegic process through the key UNG deletion experiment. The findings from this experiment expose the fundamental role of base excision within kataegis and demonstrate the utility of the yeast system the authors have created.
As you will see below, the revisions requested, while important for a final submission and publication decision, are relatively minor. Given the recent emergence of a competing manuscript of which you are aware, we ask that you make these revisions without delay.
1. In the text the authors refer to the clustered mutations as showing a “strong tendency towards strand polarity (Figure 1E and F).” We are assuming that this reference is to the observation that one tends to see either all red ticks (clustered Gs) or black ticks (clustered Cs) together in a single cluster. However, it would be good to clarify this in either the text or figure legends.
2. Was the stability of the AID and APOBEC proteins in yeast measured? Is the amount of protein produced equivalent?
3. For the canavanine resistance assays, cells were grown in glucose medium, diluted, and then grown in galactose. For the I-SceI break assays, cells are transferred from glucose medium to raffinose medium for 4 hr, and then to galactose medium. Why the difference? Also, wouldn't the induction of both the human deaminases and I-SceI be better if, instead of using glucose medium at all, cells were grown in the non-inducing/non-repressing mediums of either raffinose or 3% glycerol 3% lactate, followed by the addition of galactose?
4. The absence of REV1 could have pleiotropic effects not related to its deoxycytidyl transferase activity. For example, you may want to note previous work sequencing CAN1 mutations in a rev1-null background (Mito et al., Genetics 179: 1795-1806 August 2008). In that work, a temperature sensitive mutant of polymerase delta (pol3-t) was shown to lead to an increase in canavanine resistance that was almost completely suppressed in the absence of REV1, without significant changes to the number of transitions or transversions. Also the absence of REV1 suppressed the increased frequencies of chromosome loss, interhomolog exchange, and direct repeat recombination measured in pol3-t.
5. In Figure 2D, the authors might also want to show the condition where I-SceI is present but APOBEC3G is not. This would add greatly to the paper because it would indicate how much of the mutation localization is due to APOBEC3G (after I-SceI).
6. The authors infer long zones of end resection, but this is only an indirect inference based on AID and APOBEC action. The single-strandedness could arise from increased breathing in zones of “opened up” chromatin due to the repair at the DSB. The authors should acknowledge this additional interpretation.
https://doi.org/10.7554/eLife.00534.016
1) In the text the authors refer to the clustered mutations as showing a “strong tendency towards strand polarity (Figure 1E and F).” We are assuming that this reference is to the observation that one tends to see either all red ticks (clustered Gs) or black ticks (clustered Cs) together in a single cluster. However, it would be good to clarify this in either the text or figure legends.
We agree that this is an essential point that should be heavily emphasised. We have amended the paragraph to read: “Like the cancer kataegis, the clustered mutations in the various yeast transformants showed a strong tendency towards strand polarity; mutations within a cluster occur predominantly at either a C residue or a G residue with over 88% of mutations being strand coordinated (Figure 1E and F).”
2) Was the stability of the AID and APOBEC proteins in yeast measured? Is the amount of protein produced equivalent?
We have measured protein levels and there is little correlation between protein levels and the extent of deamination. We have confirmed that all AID/APOBEC proteins are expressed in yeast. We find that AID and AID* are expressed at substantially lower levels in the yeast transformants than the APOBECs, and we have included an extra panel in Figure 1–figure supplement 1 to show this.
We suspect that the number of mutations depends more on the specific activity of each protein than on its abundance, which is in keeping with observations that we have previously made using AID upmutants in mammalian cell transfectants (Wang et al., Nat Struct Mol Biol 16, 769–776).
3) For the canavanine resistance assays, cells were grown in glucose medium, diluted, and then grown in galactose. For the I-SceI break assays, cells are transferred from glucose medium to raffinose medium for 4 hr, and then to galactose medium. Why the difference? Also, wouldn't the induction of both the human deaminases and I-SceI be better if, instead of using glucose medium at all, cells were grown in the non-inducing/non-repressing mediums of either raffinose or 3% glycerol 3% lactate, followed by the addition of galactose?
We were concerned that low levels of expression of AID/APOBEC proteins might result in selection for inactivating mutations on the AID/APOBEC plasmids. We therefore wanted to ensure limited and controlled exposure of the genomes to the exogenous mutators; hence the choice of glucose repressive media for culture during periods of clonal expansion. The difference in induction between experiments is historical. We have since tested both induction methods and found them to produce similar results, and we have added a comment in the Materials and methods section to reflect this: “Induction of protein expression both with and without the raffinose step gave similar mutation rates.”
However, we appreciate the reviewers' comments and we can certainly use their suggestions for growth conditions in future experiments.
4) The absence of REV1 could have pleiotropic effects not related to its deoxycytidyl transferase activity. For example, you may want to note previous work sequencing CAN1 mutations in a rev1-null background (Mito et al., Genetics 179: 1795-1806 August 2008). In that work, a temperature sensitive mutant of polymerase delta (pol3-t) was shown to lead to an increase in canavanine resistance that was almost completely suppressed in the absence of REV1, without significant changes to the number of transitions or transversions. Also the absence of REV1 suppressed the increased frequencies of chromosome loss, interhomolog exchange, and direct repeat recombination measured in pol3-t.
The reason that we suspect that the effect of REV1 deficiency reflects its deoxycytidyl transferase activity is that the transversion mutations in REV1-sufficient cells are heavily biased towards C:G substitutions. While we fully accept that we cannot exclude indirect effects of REV1 deficiency, addressing the details and mechanistic function of REV1 in kataegis is beyond the scope of this paper and will require extensive additional work. We have modified the text to reflect the possibility of a non-catalytic role for REV1, adding in the Results section: “This presumably reflects diminished repair of the AID/APOBEC-generated uracils. There was an overall decrease in average total mutation load in AID* transformants of REV1 deficient yeast, that might reflect the possible non-catalytic roles of REV1 during DNA damage repair (Sale et al., 2012).”
5) In Figure 2D, the authors might also want to show the condition where I-SceI is present but APOBEC3G is not. This would add greatly to the paper because it would indicate how much of the mutation localization is due to APOBEC3G (after I-SceI).
We agree that such a control would help emphasise the point; unfortunately we have not sequenced these genomes. The signatures of the detected mutations do, however, make this point as the vast majority occur at the APOBEC3G consensus, indicating they are caused by the activity of APOBEC3G and not other repair processes as we highlight in the figure legend. To further emphasise this, we have added the following sentence to the Discussion: “In our experiments, the signatures of the mutations associated with the I-SceI break (see Figure 2D legend) implicate APOBEC3 activity rather than error-prone polymerases as the source of mutations during the double-strand break repair.”
6) The authors infer long zones of end resection, but this is only an indirect inference based on AID and Apobec action. The single-strandedness could arise from increased breathing in zones of “opened up” chromatin due to the repair at the DSB. The authors should acknowledge this additional interpretation.
We do not favour an explanation based on breathing, since breathing suggests a transient melting of the DNA, which would allow the unmutated DNA strand to act as a template upon strand reannealing, thereby resulting in non-mutagenic repair. Furthermore, the lengths of the observed kataegic tracts are not really compatible with measurements of DNA dynamics, which suggest that fluctuations in base pairing are a relatively local phenomenon (Jose et al., Proc Natl Acad Sci USA 106, 4231–4236). However, we fully admit that we cannot definitively exclude it; we acknowledge in the Discussion that a variety of mechanisms that promote single-stranded DNA could generate kataegis. In the interests of brevity, we included in the Discussion a selective but not exhaustive list of possible alternative mechanisms.
https://doi.org/10.7554/eLife.00534.017
- Benjamin JM Taylor
- Yee Ling Wu
- Cristina Rada
- Michael S Neuberger
- Serena Nik-Zainal
- Lucy A Stebbings
- Keiran Raine
- Peter J Campbell
- Michael R Stratton
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
This work was supported by the Medical Research Council (file reference number U105178806 and a MRC Centenary Award) and the Wellcome Trust (grant reference 098051). SN-Z is a Wellcome Trust Clinical Research Training Fellow, YLW was supported by an NIH Ruth L. Kirschstein National Research Service Award (grant number F32AI091311 from the National Institute Of Allergy And Infectious Diseases [The content of this manuscript is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute Of Allergy And Infectious Diseases or the National Institutes of Health]) and PJC is a Wellcome Trust Senior Clinical Research Training Fellow (grant reference 088340 MA). We would like to acknowledge Rebecca Berrens for help establishing yeast protocols and Sarah O'Meara, Stuart McLaren and Peter Ellis as well as the Core Sequencing Facility, the IT group and many other members of the Cancer Genome Project and the Core IT team of the Wellcome Trust Sanger Institute for assistance.
- Received: January 11, 2013
- Accepted: March 9, 2013
- Version of Record published: April 16, 2013 (version 1)
© 2013, Taylor et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.