A modified fluctuation assay reveals a natural mutator phenotype that drives mutation spectrum variation within Saccharomyces cerevisiae

  1. Pengyao Jiang
  2. Anja R Ollodart
  3. Vidha Sudhesh
  4. Alan J Herr
  5. Maitreya J Dunham
  6. Kelley Harris  Is a corresponding author
  1. Department of Genome Sciences, University of Washington, United States
  2. Molecular and Cellular Biology Program, University of Washington, United States
  3. Department of Laboratory Medicine and Pathology, University of Washington, United States
  4. Department of Computational Biology, Fred Hutchinson Cancer Research Center, United States


Although studies of Saccharomyces cerevisiae have provided many insights into mutagenesis and DNA repair, most of this work has focused on a few laboratory strains. Much less is known about the phenotypic effects of natural variation within S. cerevisiae’s DNA repair pathways. Here, we use natural polymorphisms to detect historical mutation spectrum differences among several wild and domesticated S. cerevisiae strains. To determine whether these differences are likely caused by genetic mutation rate modifiers, we use a modified fluctuation assay with a CAN1 reporter to measure de novo mutation rates and spectra in 16 of the analyzed strains. We measure a 10-fold range of mutation rates and identify two strains with distinctive mutation spectra. These strains, known as AEQ and AAR, come from the panel’s ‘Mosaic beer’ clade and share an enrichment for C > A mutations that is also observed in rare variation segregating throughout the genomes of several Mosaic beer and Mixed origin strains. Both AEQ and AAR are haploid derivatives of the diploid natural isolate CBS 1782, whose rare polymorphisms are enriched for C > A as well, suggesting that the underlying mutator allele is likely active in nature. We use a plasmid complementation test to show that AAR and AEQ share a mutator allele in the DNA repair gene OGG1, which excises 8-oxoguanine lesions that can cause C > A mutations if left unrepaired.


Mutations are a double-edged sword. At the molecular level, they usually arise as a spontaneous consequence of DNA replication errors or damage and are the ultimate cause of genetic diseases (Iossifov et al., 2014; Sebat et al., 2007; Antonarakis and Beckmann, 2006; Crow, 1997; Nei, 1983). All organisms have evolved complex mechanisms for keeping mutation rates low and safeguarding their genetic information as it is passed from generation to generation (Beckman and Loeb, 1993; Eisen and Hanawalt, 1999); in multicellular organisms, these mechanisms also safeguard somatic tissues from mutations that can cause cancer and age-related decline (Alexandrov et al., 2013; Loeb, 2016; Risques and Kennedy, 2018). A low mutation rate is essential for long-term population survival, and the larger and more complex a genome is, the lower the mutation rate must be to prevent deleterious mutations from arising faster than natural selection can eliminate them (Eigen, 1971; Drake, 1991; Sung et al., 2012a; Acosta et al., 2015). Over long time scales, however, mutations also serve as the raw material for evolution. Although beneficial mutations are rare occurrences, they are essential for the acquisition of novel phenotypes and adaptations (Gompel et al., 2005; McGregor et al., 2007).

A large body of theory has been written to describe how natural selection might act on the mutation rate to balance these beneficial and deleterious effects (Sturtevant, 1937; Kimura, 1967; Leigh,, 1970; Johnson, 1999; André and Godelle, 2006; Sung et al., 2012a). One prediction is that organisms living in more changeable environments might evolve higher mutation rates than organisms living in more stable environments, assuming that the environment determines whether a higher rate of beneficial mutations is likely to counterbalance a higher rate of deleterious mutations. This prediction has been borne out in laboratory evolution experiments, where mutator phenotypes sometimes emerge in populations that are forced to tolerate challenging conditions (Good et al., 2017; Tenaillon et al., 2016; Couce et al., 2017) and mutator strains are often observed to take over chemostat populations by producing beneficial mutations at a higher rate than competing non-mutator strains (Chao and Cox, 1983). Many mutator phenotypes in E. coli have been linked to defects in DNA repair enzymes (Horst et al., 1999). Mutator phenotypes also commonly occur in cancer (Loeb, 2001; Prindle et al., 2010), likely either because of relaxed selection against cellular dysfunction or because it is beneficial for cancer cells to adapt rapidly to their aggressive growth niche. However, it is less clear how much mutation rate variation exists within and between natural populations, and if such variation exists, whether it is maintained by natural selection. The drift barrier hypothesis predicts that mutator alleles will usually be deleterious because they produce more damaging mutations than beneficial ones, but that mutator alleles with relatively small effects may persist in populations because they are not deleterious enough to be efficiently eliminated (Lynch et al., 2016).

Although next-generation sequencing has rapidly increased our ability to measure the genetic variation that currently exists within populations, the extent of mutation rate variation is still more difficult and expensive to measure. One of the original methods for measuring mutation rates is the Luria-Delbrück fluctuation assay (Lang and Murray, 2008; Gou et al., 2019; Luria and Delbrück, 1943), in which a population of microorganisms is allowed to grow clonally for a controlled length of time, then challenged with a form of artificial selection that kills most cells except for those that have happened to acquire specific resistance mutations. The mutation rate can then be calculated from the number of colonies that manage to grow after this artificial selection is imposed.

Although fluctuation assays are an elegant and efficient way for measuring the mutation rates of specific reporter genes, the results are potentially sensitive to the reporter gene being used and where it is located within the genome (Lang and Murray, 2008; Lang and Murray, 2011); in addition, they are not applicable to multicellular organisms. These drawbacks have motivated the development of newer methods that take advantage of high-throughput sequencing, such as mutation accumulation (MA) assays in which a laboratory population is serially bottlenecked for many generations, eliminating most effects of natural selection and allowing mutations to be directly counted by sequencing at the end of the experiment. MA studies have been used to estimate mutation rates in a wide variety of organisms (Lynch et al., 2008; Sharp et al., 2018; Zhu et al., 2014; Farlow et al., 2015; Wang et al., 2019; Keightley et al., 2009), including a recent study of nine heterozygous S. cerevisiae strains (Dutta et al., 2021). However, higher throughput MA experiments require DNA repair genes to be knocked out to induce higher mutation rates that can be accurately measured with fewer generations of labor-intensive propagation, as in Liu and Zhang, 2021.

An alternative source of information about mutational processes in strains with low mutation rates is genetic variation among related individuals who share common ancestors. Polymorphic sites are easier and cheaper to discover than new mutations, because they are present at a higher density within the genome and often shared among several individuals. Mining polymorphisms for information about mutation rates can be difficult since their abundance is affected by genetic drift and natural selection (Scally and Durbin, 2012; Ségurel et al., 2014; Zhu et al., 2017), but despite these limitations, they have provided surprisingly strong evidence for the existence of historical changes to the mutation spectrum, meaning the tendency of mutations to occur most often in certain nucleotide contexts (Hwang and Green, 2004). In humans, for example, Europeans and South Asians have a significantly higher proportion of TCC> TTC mutations than other human groups (Harris, 2015; Harris and Pritchard, 2017), a pattern that is difficult to explain without a recent population-specific increase in the rate of this type of mutation. This pattern might have been caused by either a genetic mutator or an environmental mutagen, but is not explicable by the action of selection or drift or any other process that simply modulates the retention or loss of genetic variation.

Polymorphism data has revealed that each human population and great ape species appears to have a distinctive triplet mutation spectrum, which implies that genetic and/or environmental mutators likely emerge relatively often and act within localized populations to increase mutation rates in specific sequence contexts (Harris and Pritchard, 2017; Goldberg and Harris, 2021). However, identifying these hypothetical mutators is a challenging proposition, not least because some population-specific signatures such as the human TCC> TTC enrichment appear to be relics of mutators that are no longer active. A recent study of de novo mutations in diverse human families found some evidence of mutation rate variation between human populations (Kessler et al., 2020), but argued that most of this variation was driven by the environment rather than genetics. Given that humans from different populations tend to be born and raised in different environments, it is extremely challenging to determine the degree to which genetics and/or the environment are responsible for variation of the rates and spectrum of de novo mutations accumulating within human populations today.

More is known about the genetic architecture of mutagenesis in model organisms, including the single-celled organism Saccharomyces cerevisiae, where it is tractable to disentangle genetic mutator effects from environmental ones by accumulating mutations on different genetic backgrounds in controlled laboratory environments (Serero et al., 2014; Lang et al., 2013; Stirling et al., 2014; Huang et al., 2003; Herr et al., 2011). Many S. cerevisiae mutator alleles have been discovered using genetic screens, which involve creating libraries of artificial mutants in the lab and determining which ones have high mutation rates (Stirling et al., 2014). Mutation rates can be elevated by up to a thousand-fold in lines where DNA proofreading and repair capabilities are artificially knocked out (Herr et al., 2011; Serero et al., 2014; Lang et al., 2013), and quantitative trait loci with more modest effects have been found to underlie a five-fold range of mutation rate variation among a few natural S. cerevisiae strains (Gou et al., 2019). A more complex mutator phenotype has been observed as a result of epistasis between two incompatible alleles found as natural variation in the mismatch repair genes MLH1 and PMS1, although the natural isolates in which these alleles are found appear to have acquired compensatory variants that suppress this mutator phenotype (Bui et al., 2017; Argueso et al., 2003; Raghavan et al., 2018; Heck et al., 2006). An extreme case of long-term maintenance of a hypermutation lineage comes from phylogenetic analysis of bipolar budding yeasts Hanseniaspora which shows large amounts of gene loss, accelerated evolutionary rates and altered mutation spectra (Steenwyk et al., 2019).

Mild environmental stressors, such as high salt, ethanol, and heat, can also alter the mutation rate of S. cerevisiae (Liu and Zhang, 2019; Voordeckers et al., 2020) as well as Arabidopsis thaliana (Jiang et al., 2014; Belfield et al., 2021). The same environmental perturbations can cause detectable changes to both species’ mutation spectra. The mutation spectrum has also been observed to depend on whether S. cerevisiae is replicating in a haploid or diploid state (Sharp et al., 2018). In addition, environmental mutagens, more complex ploidy, and genetic mutation rate modifiers could all conceivably affect the mutation spectrum of natural variation as it accumulates. However, no study to our knowledge has looked at whether any mutational signatures measured in the laboratory are capable of explaining natural mutation spectrum variation observed in polymorphism data from a model species.

Recently, comprehensive sampling efforts have produced a collection of 1011 natural isolates of S. cerevisiae (Peter et al., 2018). This is a uniquely powerful system containing abundant natural variation that accumulated within diverse environments during the recent and ancient evolution of S. cerevisiae, and the panel is also amenable for experimental accumulation of mutations over laboratory growth. Many genetic polymorphisms differentiate these strains, and these are relics of mutations that accumulated over many generations on divergent genetic backgrounds adapted to diverse environmental conditions, ranging from forests to beverage fermentation pipelines. Both environmental mutagens and genetic mutators may have created differences among the mutation spectra of these 1011 strains, but only genetically determined mutation spectrum differences should have the potential to be reproduced in the spectra of mutations accumulated in a controlled lab environment.

We hypothesized that yeast strains with outlying spectra of natural polymorphisms are more likely to have distinct de novo mutation spectra than strains whose polymorphisms have indistinguishable mutation spectra. The same hypothesis underlies previous inferences of de novo mutation spectrum variation from polymorphism data (Harris, 2015; Harris and Pritchard, 2017; Dumont, 2019; Goldberg and Harris, 2021), and rare mutations found in mouse colonies (Dumont, 2019), but this hypothesis has been difficult to test in vertebrate species. To enable such direct testing in S. cerevisiae, we describe a new Luria-Delbrück-based assay that efficiently measures the spectra of de novo mutations in haploid strains using pooled amplicon sequencing. We then use this assay to identify strains with reproducibly measurable mutator phenotypes that explain the spectrum biases of these strains’ polymorphisms. Some proportion of natural mutation spectrum variation might not be reproducible in the lab if it is driven by environmental mutagens, bioinformatic artifacts, or extinct genetic mutators, but our assay has the potential to identify which gradients of mutation spectrum variance are driven by extant genotypic differences.


The mutation spectrum of natural variation in S. cerevisiae

To measure the mutation spectrum of genetic variation present in the 1011 S. cerevisiae natural isolates (Peter et al., 2018), we polarized single nucleotide polymorphisms using the outgroup S. paradoxus (Yue et al., 2017), then classified them into two transition types and four transversion types based on their ancestral and derived alleles. Closely related strains were excluded to avoid overrepresentation of certain groups (Materials and methods). We calculated the proportion of each mutation type among the derived alleles present in each individual strain, utilizing all derived variants present below 50% frequency (a total of 1,213,508 SNPs). In order to minimize bias from ancestral allele misidentification, we excluded strains with extensive, pre-documented introgression from S. paradoxus (Peter et al., 2018). A final set of 729,357 SNPs with ancestral alleles that passed the derived allele frequency filter were used for the following analysis (a subset of the 838,568 SNPs with callable ancestral alleles). Principal component analysis (PCA) on these individual mutation spectra reveals that strains from the same population tend to have more similar mutation spectra than more distantly related strains (Figure 1A, Supplementary file 1A). Some of this structure disappears when SNPs are subsampled to eliminate double-counting of variants that are shared among multiple strains (Figure 1—figure supplement 1), but several clades appear as consistent outliers in both analyses, including the African beer and European wine strains. The compact architecture of the yeast genome makes it infeasible to exclude coding regions and conserved regions, but we obtain similar PC structures when we use only synonymous protein-coding variants and when we use all polymorphisms passing quality filters (Figure 1—figure supplement 2).

Figure 1 with 5 supplements see all
Mutation spectra of natural isolates of S. Cerevisiae.

Principal component analysis of segregating mutation spectrum variation from a subset of the 1011 yeast strains. (A). Mutation spectrum PCA of all natural variants under 50% derived allele frequency. Each strain’s mutation spectrum histogram is projected as a single point, colored to indicate its population of origin (Peter et al., 2018). The inset summarizes the loadings of the first and second principal component vectors. (B). Mutation spectrum PCA of rare variants (derived allele count 2–4). Singleton variants are excluded to minimize the impact of sequencing error. Strains appearing more than 1.8 standard deviations from the origin along both PC1 and PC2 are labeled with their strain names.

Figure 1A shows that the Taiwanese and African beer populations are outliers along PC1. As seen from the principal component loadings, these two groups mainly differ from the rest in the relative proportions of the two transition types (A > G and C > T): Taiwanese strains are enriched for C > T mutations, whereas African beer strains are enriched for A > G mutations. In contrast, PC2 separates the majority of other populations, such as human-associated strains isolated from wine, dairy, and bioethanol production, along a gradient of varying transition/transversion ratio.

Although strains from the same population tend to cluster together, this trend is less pronounced in the 1,011 S. cerevisiae genomes than in previously reported mutation spectrum PCAs of humans, great apes, and mice (Harris and Pritchard, 2017; Dumont, 2019; Goldberg and Harris, 2021). That being said, one methodological difference from these previous studies is that we only partition the yeast mutation spectra into six basic types (A > C, A > G, etc.) rather than the 96 trinucleotide-based types used in analyses of vertebrate mutation spectra, a concession to the small size of the yeast genome. We found that the trinucleotide mutation spectra of yeast exhibit similar PCA structure (Figure 1—figure supplement 3), but that the sparsity of yeast triplet spectra appears to limit their utility.

Figure 1B shows a PCA of rare variant mutation spectra from the same collection of strains used in 1 A. We define rare variants as those with derived allele counts of 2, 3, and 4 (423,312 SNPs out of 729,357 fall within this allele count range) and exclude singletons to minimize the impact of sequencing error. These spectra are noisier than the spectra computed from variants up to 50% frequency, but are potentially more likely to reflect recently active mutational processes. While rare variant spectra exhibit less clustering by population than those in Figure 1A, a subset of strains from several groups appear as outliers. For example, a few Mixed origin and Mosaic beer strains are outliers along a C > A mutation gradient, and African beer and French dairy strains separate out along an A > G mutation gradient. For completeness, we also examined mutation spectra of singleton variants alone (Materials and methods), which are the youngest mutations among polymorphisms (Figure 1—figure supplement 4). It resembles the PCA of non-singleton rare variants, except that C > A mutation variation explains a larger fraction of variation and becomes the PC1 axis.

Several previous studies have found a puzzling discrepancy between the spectra of de novo mutations and polymorphisms in S. cerevisiae: polymorphisms have a transition-to-transversion (ts/tv) ratio around 3:1, compared to only 1:1 for de novo mutations (Agier and Fischer, 2012; Zhu et al., 2017). Our analysis of the 1011 strain collection replicates this finding (Figure 1—figure supplement 5). We also replicate the prior finding that singletons and higher frequency variants have nearly identical ts/tv ratios, but that singletons inferred to be young based on their presence on long shared haplotypes have a lower ts/tv ratio somewhat closer to that of new mutations (Figure 1—figure supplement 5). We note that a discrepancy between de novo and segregating mutation spectra has also been observed in A. thaliana, though in the opposite direction, as A. thaliana de novo mutations are surprisingly enriched for C > T transitions (Weng et al., 2019).

A scalable experimental pipeline for measuring de novo mutation rates and spectra

In order to test whether any of the mutation spectrum differences evident from natural variation in different S. cerevisiae strains are driven by extant genetic mechanisms that increase the rates of specific mutation types, we set out to measure several strains’ de novo mutation spectra and rates experimentally. To this end, we developed an experimental pipeline using the reporter gene CAN1. Traditional reporter gene fluctuation assays only estimate the overall rate of mutations, but we introduced an extra step that utilizes Illumina sequencing of pooled amplicons derived from CAN1 mutants to estimate each strain’s mutation spectrum as well.

The gene CAN1 encodes a transport protein that imports arginine and arginine analogs into yeast cells from the surrounding growth media. This means that strains with a functional CAN1 transporter are sensitive to poisoning by the arginine analog canavanine, while a single loss-of-function mutation can render such cells able to survive on canavanine media (Whelan et al., 1979). Poisoning a culture with canavanine is thus a very efficient method to select for cells with point mutations in CAN1. A limitation of this method is that it only works on genomes that contain exactly one functional copy of CAN1, since canavanine resistance is recessive. This means that it cannot be used to measure mutation rates in diploid or polyploid strains directly, which unfortunately include the Taiwanese strains and African beer strains that are PCA outliers in Figure 1A. However, 133 of the 1011 strains are haploid, leaving many strains of interest that are amenable to the assay, including several outliers in the rare variant PCA (Figure 1B).

A schematic overview of our experimental setup is shown in Figure 2. First, we estimated mutation rates using established fluctuation assay methodology (Lang and Murray, 2008; Gou et al., 2019), which involves plating multiple independent cultures from each strain being investigated. After plating, we picked a single colony from each plated culture and grew it to saturation in canavanine-containing media. We then selected mutants observed to grow in culture to similar saturation density and pooled them in equal proportions to give each mutant a roughly equal frequency in the pool. Individual pools of mutants from each strain were then subjected to PCR amplification of CAN1 followed by Illumina sequencing. Individual mutants were called from the sequencing pools using a customized pipeline (Materials and methods). Mutations collected from different pools of the same strain were combined to calculate the strain’s mutation spectrum. We aimed to collect roughly 300 mutants per strain, enough to detect mutation spectrum differences of the magnitude estimated from polymorphism data in several of the 1011 genomes populations.

Schematic overview of the experimental pipeline.

Overview of the experimental pipeline used to estimate the mutation rate and spectrum for each strain using the reporter gene CAN1. First, mutation rates were estimated using fluctuation assays. Independent mutants were then pooled and sequenced to estimate the mutation spectrum of each strain.

Pooling mutants across canavanine media cultures before sequencing allowed us to efficiently estimate mutation spectra at scale, yielding measurements of many individual mutations per library prep. However, pooling too many mutants during this step could have the potential to compromise the pipeline’s accuracy by putting the frequency of each mutation too close to the expected frequency of Illumina sequencing errors. To test for this failure mode, we Sanger-sequenced 38 independent mutants generated using the lab strain LCTL1 (SEY6211-MATɑ). After pooling these 38 mutants, we performed two replicate library preps and Illumina-sequenced both using our standard procedure. Sanger sequencing identified 37 mutants with single nucleotide mutations plus one containing two adjacent mutations (Supplementary file 1). We expect each of these mutations, which should be present at a frequency of about 1/38 in the pooled culture, to be easily distinguishable from Illumina sequencing errors that occur at a rate of less than 1% per base.

To identify bona fide mutations from each Illumina sequencing pool, we developed a pipeline designed to call mutations present at or above an expected frequency that is inversely proportional to the number of mutants being pooled. In order to minimize false positive mutation calls introduced by sequencing errors, we excluded low coverage regions located at the ends of the amplicons (Materials and Methods). We also identified multinucleotide mutations (MNMs) based on the co-occurrence of variation on the same reads (Averof et al., 2000; Schrider et al., 2011), separating these complex mutations from single base substitutions and small indels. When we tested this Illumina sequencing pipeline on the same mutant pool (~17,000x coverage) that we had previously Sanger sequenced, we detected 37 of the 38 mutations identified by Sanger sequencing, missing only one mutation that occurred at the end of the amplicon located outside our pipeline’s callable region. A second Illumina sequencing replicate measured only 36 of these mutations, missing one additional true mutation. Neither Illumina replicate produced any false positives, verifying that the pipeline is accurate enough to permit pooling of up to 40 CAN1 mutants before each library prep.

Mutation rate variation among haploid natural isolates

We used our pipeline to measure mutation rates and spectra in 16 haploid strains from a wide variety of environments (Supplementary file 1). We selected these strains such that to the extent possible, they include two euploid strains per clade without any copy number variation at the scale of whole chromosome arms. We also selected two lab strains, LCTL1 and LCTL2, to use as controls, since their mutation rates were previously measured: the mutation rate of LCTL1 was measured using a genome-wide mutation accumulation assay by Sharp et al., 2018, while the mutation rate of LCTL2 (GIL 104, a derivative of W303) was measured by Lang and Murray, 2008 using a CAN1 fluctuation assay. Two additional strains from the 1011 collection, AAA and ACS, were selected because their mutation rates had been previously measured in another study (Gou et al., 2019).

We observed a 10-fold range of mutation rate variation in CAN1 among the strains we surveyed: from 2.1⨉10–7 to 2.1⨉10–6 canavanine resistance mutations per gene per cell division (Figure 3). This range of variation is larger than the five-fold range of mutation rate variation found among six S. cerevisiae strains in a recent study (Gou et al., 2019). All estimates from different replicates of the same strain were generally consistent with each other, though three strains (ACS, LCTR2, and AAR) showed close to a twofold difference between our highest and lowest mutation rate measurements. This is within the margin of error observed in previously published fluctuation assays performed at large scale (Gou et al., 2019). Among the three strains (LCTL2, AAA, and ACS) with previously published mutation rate measurements, our results fall within a 1.5-fold range of those estimates, with no particular trend of upward or downward bias.

Figure 3 with 1 supplement see all
Haploid natural isolates exhibit a 10-fold range of mutation rate variation.

Mutation rate variation measured among haploid natural isolates using our CAN1 reporter gene Luria-Delbrück fluctuation assays. Strains are shown ordered by their mean mutation rates. Mutation rates for each strain were estimated using at least two replicates, each estimate represented here by a dot. A standard boxplot spans the interquartile confidence interval of possible mutation rates for each strain.

We noticed that AAR and AEQ, the two strains with the highest mutation rates, formed larger colonies during the fixed duration of the experimental growth period compared to many of the other strains tested. This suggests that AAR and AEQ may have high growth rates. This motivated us to test for correlation between the mutation rates we measured and strain-specific growth rates reported in the literature (Peter et al., 2018), but overall, we found no significant correlation between these attributes (R2 <0.001; p = 0.95) (Figure 3—figure supplement 1).

We observed that strains from the Mosaic beer, Sake, African palm wine, and Asian fermentation clades exhibited higher mutation rates than have been previously reported for any natural S. cerevisiae strains. The two strains with the highest mutation rates, roughly 10-fold higher than that of the control strain LCTL1, were AAR and AEQ, both from the Mosaic beer clade. While this is milder than some mutator phenotypes that have been artificially generated in the lab, to our knowledge, no comparably high mutation rate has been previously reported in a natural isolate of S. cerevisiae, with the exception of the spore derivatives of the incompatible cMLH1-kPMS1 diploid natural isolate (Raghavan et al., 2018).

CAN1 sequencing reveals de novo mutation spectrum differences

We identified a total of 5571 CAN1 mutations across all strains, including 4561 point mutations, 837 indels (Supplementary file 1), and 173 multinucleotide mutations (MNMs) (Supplementary file 1). Ninety percent of the observed indels are single base-pair indels (754 out of 837), and for simplicity we included only single base-pair indels along with point mutations when reporting each strain’s mutation spectrum.

Two of the 4561 point mutations occurred at strain-specific non-reference sites. The remaining 4559 mutations consisted of repeated observations of only 727 unique mutations at 476 positions in CAN1. Given that each mutation was observed an average of 6.2 times, our dataset likely contains a large fraction of the mutations that are able to knock out CAN1’s functionality. While 23% of the mutations are observed only once, there are several hotspots that are mutated many times in our dataset (Figure 4—figure supplement 1, Figure 4—figure supplement 2, Supplementary file 1). For example, genomic location 32,399 on chromosome V was hit more than 100 times across different genetic backgrounds. Eighty-five of these hits created the same G > T mutation, while 19 created another G > C mutation. Across all mutations counted with multiplicity, we observed 2676 missense mutations, 1866 nonsense mutations and only 17 synonymous mutations. These synonymous mutations made up less than 0.37% of the total point mutations observed. Since synonymous mutations are considered a priori unlikely to cause CAN1 to lose functionality, these variants are likely sequencing errors or hitchhikers that occurred in cells containing other inactivating mutations, or possibly rare examples of synonymous mutations disrupting translation (Arthur et al., 2015) in CAN1. This low synonymous mutation rate further demonstrates the accuracy of our pipeline.

We were able to identify MNMs, complex mutation events that create multiple nearby substitutions or indels at once (Stone et al., 2012; Harris and Nielsen, 2014; Averof et al., 2000; Schrider et al., 2011), by looking for the presence of multiple mutations on the same Illumina read (Materials and methods). We estimate that MNMs generate 3.1% of all point mutations, similar to the 2.6% previously reported in a single strain background (Lang and Murray, 2008). Most of our strains have similar ratios of MNMs to single base-pair mutations, but there exist a few outliers (Figure 4A). For example, strain AAB has disproportionately many MNMs (Figure 4A and Figure 4—figure supplement 3) while AAR and AEQ have strikingly few MNMs relative to SNPs.

Figure 4 with 7 supplements see all
De novo mutation rates and spectra in natural isolates.

(A). Single nucleotide mutation rates plotted against MNM rates across strains. These rates were calculated by multiplying the mean mutation rate estimated using CAN1 by the proportion of mutations in each strain measured to be either single-nucleotide mutations or MNMs. Here, single nucleotide mutations include both single base pair substitutions and indels. (B). Mutation spectra in AEQ and AAR show significant enrichment of C > A mutations compared to the control lab strain LCTL1. Only single base-pair indels were used to generate these counts. (C). A PCA of the same strains’ de novo mutation spectra compared to the mutation spectrum reported in Lang and Murray, 2008.

To our knowledge, the largest previous CAN1 fluctuation assay in S. cerevisiae observed point mutations at 102 distinct positions (Lang and Murray, 2008). We observed mutations at 100 of these sites as well as 376 additional sites not previously known to abrogate CAN1 function. Among the two mutations observed by Lang and Murray that are missing from our dataset, one is located near the end of the CAN1 amplicon in a region we exclude due to insufficient sequencing coverage in most strains. The other site is the location of a mutation changing the anticodon ‘CTA’ to ‘TTA’, which is synonymous and thus not likely to have affected CAN1 function.

One possible contributor to the observed differences in mutation rates and spectra are differences in target size: the number of distinct CAN1 mutations that are able to help the strain survive on canavanine media. However, we were able to rule this out as a significant contributor by comparing the ratio of observed missense to nonsense mutations among strains, which was previously used by Lang and Murray, 2008 to estimate mutational target size. We estimate that the mutational target size varies at most two-fold among strains, from 145 bp to 299 bp (Supplementary file 1). Lang and Murray, 2008 previously estimated a target size of 163 bp, falling within our range. This range is too narrow to explain the 10-fold range of mutation rate variation we observe among strains, but it is large enough that we cannot confidently translate our CAN1-based mutation rate estimates into genome-wide estimates of the mutation rate per base pair per generation.

We performed hypergeometric tests to determine whether the mutation spectra we measured from the two control lab strains LCTL1 and LCTL2 were distinct from those measured from haploid natural isolates and from the spectrum measured from the same LCTL2 strain by Lang and Murray, 2008 (Materials and methods). We found the spectra of point mutations we measured from the lab strains LCTL1 and LCTL2 to be statistically indistinguishable from the spectra Lang and Murray, 2008 obtained using Sanger sequencing of canavanine-resistant mutants (p = 0.82 for LCTL1 and p = 0.087 for LCTL2). With indels included, our LCTL1 spectrum appears significantly different from that of Lang and Murray, 2008 (p = 0.0012, Bonferroni corrected p-value: 0.042), but the LCTL2 spectra remain indistinguishable with indels included (p = 0.0054, Bonferroni corrected p-value: 0.189).

The lab strain LCTL1 appears to have a mutation spectrum that is representative of most natural isolates (Figure 4C, Figure 4—figure supplement 4, Figure 4—figure supplement 5). Using Bonferroni corrected p-values to determine significance, we found the strains AAA, ACS, AGM, AHH, AHC, AEF, ADF, ACM, AGR, AAI, and AAB to have mutation spectra that are statistically indistinguishable from that of LCTL1. We found that CEV and AFH are distinguished only by their high proportions of insertions. ADN showed significant but subtle divergence from LCTL1 in the spectrum of single nucleotide variants (Figure 4—figure supplement 4, Figure 4—figure supplement 5).

A natural mutator phenotype with a distinctive mutation spectrum

We identified two strains with mutation spectra that diverged strikingly from the reference LCTL1 spectrum: AEQ (p < 1e-4) and AAR (p < 1e-4), the two strains that also have 10-fold higher mutation rates than LCTL1. Both strains appear highly enriched for C > A mutations compared to LCTL1 (Figure 4B). The strain AAR’s mutation rate estimates appear somewhat bimodal (Figure 3), but C > A mutations are consistently enriched in replicate pools with both lower and higher estimated mutation rates. The main spectrum difference between the two mutation rate modes appears to be a small difference in the C > G mutation proportion (Figure 4—figure supplement 6).

In both AEQ and AAR, the proportion of C > A mutations was measured to be elevated nearly threefold above the proportion of C > A mutations in LCTL1 and similar strains (Figure 4B). Remarkably, this C > A enrichment appears sufficient to explain the placement of AEQ and AAR as rare variant mutation spectrum outliers that we previously saw in Figure 1B, which was computed from polymorphisms sampled genome-wide, not just within CAN1. Both strains have rare variant spectra that are displaced from the population norm along a principal component vector pointing in the direction of increased C > A enrichment. These strains’ high mutation rate, C > A-heavy de novo mutation spectrum, and concordant C > A-heavy rare variant spectrum all point to the conclusion that these Mosaic beer strains display a naturally occurring genetically encoded mutator phenotype.

To assess whether the C > A enrichment phenotype observed in AAR and AEQ is likely shared with any of the 1,011 strains for which we lack mutation spectrum measurements, we ranked all 1011 strains (excluding close relatives) according to the C > A enrichment of their rare variants. We found that SACE_YAG and BRM, the two strains closest to AAR and AEQ in the global neighbor-joining phylogeny, clustered with AAR and AEQ in having high C > A fractions within the top 5% in the dataset. The rest of the top-ranking 5% of the strains exhibit some phylogenetic clustering, but no others fall within the Mosaic beer clade (Figure 5A). Instead, they are somewhat dispersed across two large, diverse clades known as ‘Mosaic region 3’ and the ‘Mixed origin’ clade. We also used a bootstrapping method to find strains with enriched C > A fractions, using an empirical p-value threshold of 0.05. Many of the same strains are outliers in both tests, including the four Mosaic beer strains (Figure 5—figure supplement 1). The phylogenetic clustering of C > A rare variant enrichment suggests that multiple clades may be genetically predisposed toward accumulating relatively higher rates of this mutation type.

Figure 5 with 3 supplements see all
Enrichment of C > A mutations in rare natural variants.

(A) Phylogeny of the 1011 collection with strains in the top 5% of C > A fraction shown in red. (B) Phylogeny of AEQ, AAR, and closely related Mosaic beer strains (C) Top panel: The C > A ratio as a function of minor allele count in Mosaic beer strains that are closely related to AEQ and AAR, including AEQ’s and AAR’s the diploid common ancestor CBS 1782. C > A ratios in polymorphisms are calculated across allele count (AC) bins with cutoffs of 2, 4, 6, and 8. When computing allele counts, closely-related strains are excluded, and each strain is represented as a diploid in genotype. Bottom panel: total number of variants in each AC bin.

Focusing on the small clade containing AAR, AEQ, BRM, and SACE_YAG with all members showing C > A enrichment (Figure 5B), we found that all but BRM are haploid derivatives of the diploid Saccharomyces cerevisiae var diastaticus strain CBS 1782, which was isolated in 1952 from super-attenuated beer (Andrews and Gilliland, 1952). AEQ and AAR differ at roughly 14,000 variant sites (the median pairwise genetic distance in the 1011 strains is 64,000) and SACE_YAG differs from AEQ and AAR at about 11,000 sites each. These differences among haploid derivatives reflect the high level of heterozygosity in the parental diploid strain. The fourth strain with an elevated C > A mutation fraction, BRM, is derived from an independent source: it was isolated in 1988 from a cassava flour factory in Brazil (Laluce et al., 1988). Despite its distinct origin, BRM differs at only 14,000–17,000 sites from each of the aforementioned three strains.

In AAR, AEQ, BRM, and SACE_YAG, rarer variants are notably C > A-enriched, but higher frequency variants exhibit weaker C > A enrichment that declines with increasing allele frequency such that variants of allele count greater than eight have C > A fractions more typical of other strains. We verified that the same pattern holds for variants present in the whole genome of the ancestral diploid isolate CBS 1782, which we sequenced after obtaining a sample from the NCYC database (see Materials and Methods). This suggests that AAR and AEQ indeed inherited their mutator phenotype from their wild ancestors and that this phenotype is at least partially penetrant in the diploid state. In contrast, the four closest outgroups to AAR, AEQ, BRM, and SACE_YAG in the 1011 yeast genomes phylogeny exhibit a consistently lower C > A fraction that does not vary with allele frequency (Figure 5B and Figure 5—figure supplement 2). The concordant enrichment of C > A mutations in rare polymorphisms and de novo mutations from the same strains suggests that this C > A enrichment is genetically determined and is not specific to the CAN1 locus but has affected the entire genome during the recent history of this clade.

A scan for candidate mutator alleles

To explore genetic variation that could underlie the C > A enrichment phenotype observed in AEQ and AAR, we scanned for nonsense and missense mutations in a list of 158 candidate genes known to play roles in DNA replication and repair, including genes that were previously identified to harbor mutator alleles through genetic screens (Supplementary file 1; Boiteux and Jinks-Robertson, 2013; Stirling et al., 2014). No candidate premature stop codons were found to be both present in AAR and AEQ and rare (MAF <0.05) in the 1011 yeast genomes overall. However, we identified 40 sites with at least one rare non-synonymous allele (MAF <0.05) shared by AEQ and AAR and absent from the other haploid strains that we experimentally found to have normal mutation spectra (Supplementary file 1).

One of these missense variants was observed within OGG1, a glycosylase involved in the oxidative stress response that excises the guanine lesion 8-oxo-G. ogg1 null mutants are known to have high C > A mutation rates (Shockley et al., 2013) and mutation rates that are 10-fold elevated above normal (Ni et al., 1999). The OGG1 missense variant present in AAR and AEQ is homozygous in the parental diploid CBS 1782; the same variant is also shared by two other Mosaic beer strains, AQH and AAQ, which are not enriched for C > A rare variants. A close examination of rare variant mutation spectra revealed that AQH may have a C > A enrichment phenotype that is masked by a high rate of C > T mutations (Figure 5—figure supplement 3), but no evidence of C > A enrichment was found in AAQ.

Experimental evidence that OGG1AEQ/AAR is a mutator allele

We sought to experimentally test whether the OGG1 allele present in AEQ and AAR, hereafter called OGG1AEQ/AAR, contributes to the AEQ/AAR mutator phenotype. We accomplished this by expressing the OGG1AEQ/AAR allele in a lab strain with its endogenous ogg1 allele deleted. If OGG1AEQ/AAR contributes to the mutator phenotype of AEQ and AAR, then this gene variant should be less effective at rescuing the ogg1 deletion phenotype than the wild-type OGG1 allele.

To test this hypothesis, we constructed two CEN plasmids. One plasmid was constructed to express OGG1AEQ/AAR; the other was constructed to express OGG1wt, with the same sequence as the OGG1 allele present in the S288C reference. Through Sanger sequencing, we found that OGG1AEQ/AAR in fact contains two single nucleotide differences from the reference sequence (Figure 6A). One of these variants (an A > G substitution) lies 156 bp upstream of OGG1, presumably in the gene’s promoter region, which we previously missed when examining the coding sequence. The other variant is a nonsynonymous change from C to T at position 800, which changes alanine to leucine. While the nonsynonymous mutation in OGG1 is present in only seven strains from the Mosaic beer clade (including AEA which is a close relative of BRM) from the 1011 collection, the promoter mutation is present in 57 strains. Since the promoter mutation is more widespread and not linked to the nonsynonymous mutation, we hypothesize that it is less likely to cause the mutator phenotype. To avoid truncating any important regulatory sequences, we used PCR to amplify two fragments with different lengths of flanking sequence from each endogenous OGG1 locus: one shorter fragment called OGG1-A, (1.7 kb) and one longer fragment called OGG1-B (2.2 kb), creating a total of four plasmids. We then used these OGG1 plasmids, as well as an empty plasmid, to transform haploid S. cerevisiae with an ogg1 deletion derived from the heterozygous diploid deletion collection (SGA) (Boone, 2007).

Mutation rates of an ogg1 deletion strain complemented by OGG1AEQ/AAR versus OGG1wt.

(A) Locations of the genetic variants differentiating the OGG1wt and OGG1AEQ/AAR alleles. Solid and dashed lines denote the boundaries of OGG1_A and OGG1_B (not to scale). OGG1_A contains a 1.7 kb PCR fragment including OGG1, while OGG1_B contains a 2.2 kb PCR fragment. (B) Mutation rate estimates of ogg1 deletion strains transformed by plasmids containing different OGG1 alleles.

We used fluctuation assays to estimate the mutation rates of ogg1 deletion strains complemented with these different OGG1 plasmids. We found that the mutation rate of the ogg1 null yeast with an empty control plasmid was elevated fourfold compared to the mutation rate of the strain transformed with OGG1wt. This difference is smaller than the 10-fold mutation rate difference previously measured between an ogg1 null mutant and a strain with an intact endogenous OGG1 allele (Ni et al., 1999), possibly reflecting a difference in the efficiency of endogenous OGG1 expression versus expression of the same allele from a plasmid. We found that the strain transformed with OGG1AEQ/AAR had a 2.4-fold higher mutation rate than the strain transformed with OGG1wt, suggesting that the AEQ/AAR allele is less effective at complementing the DNA repair deficiency of the ogg1 deletion strain. This supports our hypothesis that the OGG1AEQ/AAR allele is at least in part responsible for the high mutation rate of the AEQ and AAR strains. We measured similar mutation rates in strains transformed by the shorter and longer PCR fragments containing each OGG1 allele, suggesting that the additional regulatory DNA present in the longer sequence did not substantially impact OGG1 expression.


In this study, we compare the mutation spectra of natural variants and de novo mutations within diverse strains of S. cerevisiae. We find both population genetic and experimental evidence pointing to a recent increase in the rate of C > A mutations within one clade as a result of a naturally occurring genetically encoded mutator phenotype.

Our highly accurate pipeline uses a classical fluctuation assay to quantify de novo mutation rate in the reporter gene CAN1, then utilizes sequencing of pooled mutants to measure mutation spectra. Sanger validations confirmed that this approach has an undetectably low false positive rate and a very low false negative rate. The reliability of our pipeline is further supported by concordance between our mutation spectrum measurements and previous measurements made from the same lab strains (Lang and Murray, 2008). Using this method, we identified two extreme mutation spectrum outliers—the Mosaic beer strains AEQ and AAR—as well as several strains with more subtle mutation spectrum phenotypes that are good candidates for future investigation.

Although the mutation spectrum variation from natural polymorphisms we report here is reminiscent of observed mutation spectrum variation among humans, great apes, and laboratory mouse strains, we note that S. cerevisiae mutation spectra separate less by population compared to vertebrate mutation spectra. While it is possible to infer a human genome’s continental group of origin using its mutation spectrum alone, the same is not generally true in S. cerevisiae. Several factors, which are not mutually exclusive, might underlie this difference. One is the presence of complex population structure and pervasive gene flow between S. cerevisiae clades (Schacherer et al., 2009; Peter et al., 2018; Liti et al., 2009). Another factor is the small size of the S. cerevisiae genome; each strain has two orders of magnitude fewer derived alleles than most vertebrate genomes have. This limits our ability to detect how flanking base pairs affect the mutation rate of each site in the genome and constrains us to study a lower dimensional mutation spectrum than the context-dependent ones studied in species with larger genomes.

While gene flow and genome size might be responsible for the relatively modest magnitude of mutation spectrum divergence between most strains of S. cerevisiae, it is also possible that DNA replication and repair are intrinsically more uniform in S. cerevisiae than in vertebrates, perhaps because of the greater efficiency of selection against weakly deleterious mutator alleles in a unicellular organism that exists at large effective population sizes and often reproduces asexually. Asexual reproduction should theoretically increase the efficiency of selection against mutator alleles because deleterious variants created by the mutator cannot recombine onto other genetic backgrounds; on the other hand, it can also limit the efficiency of selection against individual deleterious mutations by permanently tethering them to particular genetic backgrounds. Further measurements of mutation spectrum variation within other species will be needed to determine whether the stability observed here is indeed characteristic of unicellular eukaryotes. If mutation spectra tend to be stable within species that have low mutation rates and strong selection against mutation rate modifiers, we might expect to see even less mutation spectrum variation among populations of ciliates like Paramecium and Tetrahymena, whose mutation rates are substantially lower than that of S. cerevisiae (Sung et al., 2012b).

Many questions about mutation spectrum variation within S. cerevisiae and other species remain unresolved and present important avenues for future work. For example, it is unclear whether rare polymorphisms in the Mixed origin and Mosaic region three clades are enriched for C > A mutations due to the same genetic mechanisms active in AEQ and AAR. Other genes might underlie the mutation spectrum differences observed among other strains, though our analyses suggest that some mutation spectrum gradients that dominate the common variation PCA are unlikely to be explained by extant mutators. One such gradient is the A > G enrichment in the African beer yeast clade, which is less pronounced in our rare variant PCA (Figure 1B) compared to our PCA extracted from variation of all frequencies (Figure 1A). This is somewhat reminiscent of the frequency distribution of the TCC> TTC mutation ‘pulse’ that distinguishes Europeans and South Asians from other human populations, and may suggest that the African beer A > G enrichment was caused by an extinct mutator allele or a mutagen found in a past environment.

Natural selection might contribute to the mutation spectrum variation within S. cerevisiae if certain mutation types are more often beneficial than others and if such asymmetries vary between populations. One example of how natural selection can affect mutation spectrum variation is that transitions are more often synonymous than transversions are (Freeland and Hurst, 1998), leading to more frequent selection against transversions in genic regions. However, we note that most of the gradient structure observed in our analyses can be reproduced with synonymous mutations alone, meaning that selection is unlikely to explain much of the natural yeast mutation spectrum variation we observe.

The C > A mutations enriched in AEQ, AAR, and their relatives might be a signature of oxidative stress damage; such mutations are a known signature of failure to repair 8-oxoguanine lesions, which is consistent with a causal role for these strains’ missense substitution in the oxidative stress response gene OGG1 as verified through our OGG1 plasmid assay. A recent study knocked out OGG1 in human cells and found that this elevated the rate of a C > A-dominated mutational signature known in the Cosmic catalog as SBS18 (Zou, 2020). SBS18 was initially identified in tumors from individuals with pathogenic variation in Mutyh (Viel et al., 2017), a direct interacting partner of OGG1 in the 8-oxoguanine repair pathway that has also been implicated in a C > A-dominated germline mutational signature in mice (Sasani et al., 2021).

Although our OGG1 plasmid assay provides compelling evidence that the 8-oxoguanine repair pathway plays a role in the AAR/AEQ mutator phenotype, we note that C > A mutations do not comprise all of the excess mutations measured in AEQ and AAR. Compared to LCTL1, the ratio of C > A mutations to C > T mutations is elevated roughly threefold in AEQ and AAR, which is less than the 10-fold overall elevation of the mutation rate in these strains. These mutation data imply that all mutation types have higher rates in AEQ and AAR compared to other S. cerevisiae strains, not just C > A. Further work will be required to verify whether the C > A enrichment and broad-spectrum mutation rate increase are driven by the same biochemical mechanism.

Although our estimates of de novo mutation spectra are all based on the CAN1 locus, we note that CAN1 mutation spectra are generally similar to the genome-wide spectra obtained from MA experiments (Zhu et al., 2014; Sharp et al., 2018), except that CAN1 spectra have more mutations at C/G base pairs and relatively fewer mutations at A/T base pairs (Figure 4—figure supplement 7). The 2676 and 1866 missense and nonsense mutations in our dataset contain numerous instances of all six mutation types that comprise our summary mutation spectrum, suggesting that CAN1 contains many opportunities to ascertain the full spectrum observed genomewide. We noticed that the CAN1 spectra we measured from the LCTL2 strain showed a slightly higher proportion of C > A mutations than that measured from the same strain by Lang and Murray, 2008 also using a CAN1 reporter. This could be due to the fact that Lang et al. used ten times the concentration of canavanine that we used, which might mean that some mutants could have grown after plating on our canavanine media (Shor et al., 2013). The fact that we do not see elevated C > A mutations in LCTL1 suggests that this could be a LCTL2 strain-specific behavior.

In summary, the results presented in this paper provide some of the most direct evidence to date that eukaryotic mutation spectra are variable within species (Harris, 2015; Harris and Pritchard, 2017; Dumont, 2019; Goldberg and Harris, 2021). It has been proposed that the best explanation for such mutation spectrum heterogeneity is the frequent emergence of nearly neutral mutator alleles that turn over rapidly as a consequence of weak purifying selection on the mutation rate. Our de novo mutation spectrum measurements provide experimental verification of this claim, showing that at least one mutational signature whose activity varies among natural yeast strains is likely caused by an extant mutator allele.

Although our results show that the mutation spectrum bias shared by certain Mosaic beer yeast is genetically encoded, it is worth noting that this C > A gradient is not the principal axis of mutation spectrum variation in the 1011 yeast genomes that we computed from all variants (Figure 1A). It remains to be seen how many other mutator or antimutator alleles might exist within this strain collection and to what extent they can explain the mutation spectrum variation observed among strains from different environments. A broader question still is whether the forces that created S. cerevisiae’s mutation spectrum variation are similar to the forces that shaped the distinctive mutation spectra of different human populations and great ape species. If we can identify the genes that underlie natural yeast mutator phenotypes such as the one described in this study, it will likely be more straightforward to test these genes for mutator activity in humans and other species than to discover mutator alleles via any kind of agnostic genome scan.

Materials and methods

Variant filtering and mutation PCA analysis

Request a detailed protocol

We filtered the original variants from the 1011 S. cerevisiae collection (Peter et al., 2018) by including biallelic SNPs with less than 20% missing genotypes. We restricted to regions in the genomes where reads can be uniquely mapped (‘mregions_100_annot_2011.bed’ from Jubin et al., 2014) and excluded repeat-masked regions. Closely related strains (pairwise genetic distance less than 8000) are excluded from the 1011 dataset. Singletons were excluded when counting individual mutations in Figure 1 to minimize the impact of sequencing errors. Strains with extensive introgression from S. paradoxus (clades 2, 9, and 10 from Peter et al., 2018) were excluded in order to minimize bias from errors in the inference of ancestral and derived alleles. Ancestral states of mutations were inferred using five S. paradoxus sequences (Yue et al., 2017), aligned to the S. cerevisiae reference genome R64-1-1 using lastz v1.04.00 (Harris, 2007). Only sites that are fixed in four out of five strains were inferred to be the ancestral alleles, and other sites were ignored. When computing the mutation spectra of strains from variants for Figure 1A, each individual strain was assumed to be diploid, with homozygous derived alleles counted with twice the weight as heterozygous derived alleles. When counting rare variants, homozygous derived alleles were given the same weights as heterozygous derived alleles.

To further minimize confounding of the mutation spectrum by ancestral allele misidentification, only variants with derived allele frequency less than 0.5 were used. Variants that passed all filtering criteria were used to compute a normalized mutation spectrum histogram for each individual strain. When performing PCAs, no more than 30 strains from each population were randomly sampled to minimize bias from uneven sampling. The same strains were used to generate PCA plots in Figure 1—figure supplements 14 (Supplementary file 1), except that strains with fewer than eight singletons or rare variants were further excluded when generating Figure 1—figure supplement 4 and Figure 1B. Our definition of singletons varied as a function of ploidy (Figure 1—figure supplements 45): In haploids and homozygous diploids (as defined in Peter et al., 2018), a singleton will be fixed in the strain where it occurs (represented as homozygous), but in other types of strains, a singleton is required to be heterozygous. In all cases, a singleton is a variant present in only a single strain.

Fluctuation assays and sequencing

View detailed protocol

We performed fluctuation assays according to an established protocol (Lang, 2018) with the following modifications: 4μl of overnight inoculant was diluted in 40 ml SC-Arginine +2% Glucose media. 50μl of the diluted cultures were distributed in 96-well round-bottom plates (Costar 3788) for each strain. Plates were sealed with Breathe-Easy sealing membrane (Sigma Z380059). SC-Arginine-Serine+ Canavanine (60 mg/liter L-canavanine) Omni plates (Nunc OmniTray 242811) were used and dried for 2–4 days in a 30℃ incubator before using. Depending on the strains, 50μl of culture were diluted one- to fourfold when plating on the Omni plates, either to reduce the background or to avoid growth of too many mutant colonies. After plating, the plates were dried and then incubated at 30℃ for 48 hr. Independent mutants from separate cultures were inoculated into 200μl SC-Arginine-Serine + 60 mg/Liter Canavanine +2% Glucose media, and then grown to saturation over ~43 hr at 30℃ with shaking. Optical densities (ODs) were measured after incubation, and only mutants that reached similar saturation ODs were pooled (150μl each) to achieve equal proportions. Genomic DNA from each pool was extracted using the Hoffman Winston protocol (Hoffman and Winston, 1987). CAN1 was then PCR amplified using published primers (Lang and Murray, 2008) with 15 cycles. Two independent 25μl PCR reactions were then pooled and cleaned up with a Zymo Clean & Concentrator Kit (D4004). Sequencing libraries were prepared using the Nextera XT DNA Library Preparation Kit with customized indices. Sequencing runs with 75 or 150 bp paired-end reads were performed using an Illumina NextSeq 550 sequencer (BioProject: PRJNA691686).

Calculation of mutation rates

Request a detailed protocol

The rSalvador package (Zheng, 2017) was used to estimate the number of mutation events (m) in each fluctuation assay using maximum likelihood under the Lea-Coulson model (Lea and Coulson, 1949; Luria and Delbrück, 1943; Ma et al., 1992). The total number of cells (Nt) was measured by counting colonies seeded with dilutions of cells on YPD plates with dilutions ranging from 1:10,000 to 1:40,000. The rate of loss-of-function mutations per CAN1 gene per cell division was estimated to be m/Nt.

Mutation calling

Request a detailed protocol

Sequencing reads were first mapped using bowtie v2.2.3 (Langmead and Salzberg, 2012) to the Scer3 S288C reference CAN1 PCR fragment sequence using primers designed by Lang and Murray, 2008. Mutation coordinates were therefore called relative to the start of the CAN1 amplicon. Adapters were trimmed using the program trim_galore v0.6.6 (Krueger, 2021) and paired-end reads were merged using pear v0.9.11 (Zhang et al., 2014). The command `fastq_quality_filter -q 20 p 94` was used to remove low quality reads before running bowtie. A MAPQ cutoff of 40 was used for SNPs and a cutoff of 20 was used for indels. Pysamstats v1.1.2 (Alistair, 2021) was used to compute the frequencies of all possible alleles at each base pair. Sites with read depth less than 200 or with less than 40% coverage of the amplicon were excluded. After the first round of mapping, sites that were fixed in each strain were called and compared to the SNPs in the 1011 collection to confirm strain identity. We then performed a second round of read mapping using the same pipeline except that each strain’s reads were mapped to a strain-specific CAN1 reference sequence.

For each sequencing pool, we let N be the number of mutants that were pooled prior to sequencing. Non-reference alleles with frequencies between 0.65 ⨉ 1 /N and 0.95 were included as evidence of mutations, discarding alleles below this frequency range as likely to be sequencing errors and alleles above this frequency range as likely to be strain-specific SNPs. Adjacent indels were merged if their frequencies differed by less than 10%. MNMs were identified in each pool by first flagging pairs of mutations occurring at similar frequencies (plus or minus 9%) within 10 bp of one another and then verifying the coexistence of the two mutations on at least 70% of the paired-end reads where at least one of the two mutations appears. Complex MNMs containing three or more variants were identified by merging MNMs that share an SNP in common. To obtain single nucleotide mutation counts and indel counts, mutations that are part of MNMs were first excluded from each pool. The coordinates of each mutation were converted back from CAN1-specific coordinates to genomic positions. Point mutations were further annotated using VEP (McLaren et al., 2016) to further categorize into missense, nonsense, or synonymous mutation types.

Allele frequencies were used to estimate the multiplicity of each mutant as follows: First, the mean and standard deviation of all mutant allele frequencies were calculated from each pool. Each allele frequency more than two standard deviations above the mean was then translated into a mutation count by dividing it by the mean allele frequency and then rounding to the nearest integer. Mutations with frequencies less than two standard deviations above the mean are assumed to be mutations with count 1.

Statistically quantifying mutation spectrum differentiation

Request a detailed protocol

To compare the mutation spectra between strains, mutations were first classified as one of the six general classes of base-substitutions (A > C, A > G, A > T, C > A, C > G, C > T) or as single base-pair insertions or deletions. We then compared the mutation spectra of the two control strains LCTL1 and LCTL2 to all other haploid isolates as well as one spectrum published by Lang and Murray, 2008 (a total of 35 tests) using a pairwise hypergeometric test (Adams and Skopek, 1987), a custom python script (Tracy et al., 2020). In the first round of this test, the paired mutation counts were arranged in a 2 × 8 contingency table. To test the null hypothesis that the two mutation spectra are the same, the hypergeometric probability of the observed table was calculated and compared to the hypergeometric probabilities of 10,000 random tables with the same row and column totals. The number of random tables with a higher hypergeometric probability than the observed provides an estimate of the p-value. We used the conservative Bonferroni correction to compute the significance cutoff (0.05/35 = 0.001429). A second set of Bonferroni-corrected p-values was calculated after excluding indels to form a 2 × 6 contingency table. These p-values were used to determine how many of the significant mutation spectrum differences were driven by the indel category (Figure 4—figure supplement 4, Figure 4—figure supplement 5).

Whole-genome sequencing and variant calling of CBS 1782

Request a detailed protocol

A sample of CBS 1782 was obtained from the National Collection of Yeast Cultures (NCYC 361) January of 2021. Genomic library was prepared from the Illumina Nextera DNA Flex Library Prep kit (Illumina 20018704). Sequencing runs with 150 bp paired-end reads were performed using an Illumina NextSeq 550 sequencer to roughly 250 x coverage. Sequencing reads were mapped using BWA mem v 0.7.17 (Li and Durbin, 2009). Only reads that are uniquely mapped with MQ >20 were used. GATK4 HaplotypeCaller was used for variant calling (Poplin, 2018). Variant sites (±2 bp) that overlap with deletion were filtered out. The variants for CBS 1782 were then merged with the raw vcf file for the 1011 collection excluding closely-related strains, and underwent the same filtering as for the PCA analysis.

Constructing OGG1 plasmids

Request a detailed protocol

Haploid yeast with both ogg1 and leu2 deleted was generated from spores dissected from the heterozygous diploid strain #20,510 of the SGA collection (Boone, 2007). CEN plasmids were generated using pRS415 with LEU2 as the selection marker (Chee and Haase, 2012). OGG1-A and OGG1-B were amplified by PCR from strain AAR for OGG1AEQ/AAR or from LCTL1 for OGG1wt. Primers for OGG1-A: CGATAGTTTGGCGTGCGATA, CGCCTTGGTGACCGTTTT. Primers for OGG1-B: GGTTCTTCCCAATCATCCGA,AGGGCTTATTGACGACGACA. pRS415 were linearized by HindIII and BamHI, followed by Gibson assembly with the OGG1 alleles.

Data availability

Sequencing data have been uploaded to the SRA and approved (Accession numbers PRJNA691686).

The following data sets were generated
    1. Jiang P
    2. Ollodart A
    3. Sudhesh V
    4. Herr A
    5. Dunham M
    6. Harris K
    (2021) NCBI BioProject
    ID PRJNA691686. A modified fluctuation assay reveals a natural mutator phenotype that drives mutation spectrum variation within Saccharomyces cerevisiae.
The following previously published data sets were used


  1. Book
    1. Harris RS
    Improved Pairwise Alignment of Genomic DNA
    Pennsylvania State University.
  2. Book
    1. Lang G
    (2018) Measuring mutation rates using the Luria-delbrück fluctuation assay
    In: Muzi-Falconi M, Brown G, editors. Genome Instability. Springer. pp. 21–31.
    1. Loeb LA
    A Mutator Phenotype in Cancer
    Cancer Research 61:3230–3239.
    1. Nei M
    Genetic Polymorphism and the Role of Mutation in Evolution
    Evolution of Genes and Proteins 71:165–190.

Decision letter

  1. Magnus Nordborg
    Reviewing Editor; Austrian Academy of Sciences, Austria
  2. Molly Przeworski
    Senior Editor; Columbia University, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors' note: this paper was reviewed by Review Commons.]

Acceptance summary:

This article looks at natural variation in mutation rate in yeast, using differences in the pattern of polymorphism between strains to identify likely allelic differences in DNA repair pathways. It is a rare example of population level data and molecular experiments supporting each other, shedding light on a fundamental biological phenomena: mutation and repair.


Author response

We thank the eLife editors and reviewers for their many constructive suggestions, which have helped us improve the manuscript substantially. We address questions compiled from the editor as well as updates from our previous response to reviewers below. Moreover, we are pleased to report the results of an extra experiment that we believe considerably strengthens the paper by providing evidence for the mutator activity of the candidate gene OGG1.

This manuscripts uses differences in the pattern of polymorphism between various yeast strain to argue that there must have been differences in mutation rates between them, then uses experiments to demonstrate that these differences still exist and have a genetic basis. Understanding mutation rate variation could help us understand the process of mutation, arguably one of the most fundamental processes in genetics and evolution.

Everyone was generally happy with your response to the public reviews, with one exception: we strongly suggest you try a reversion-based mutation essay to validate the inferred spectrum. While we find your results convincing, we feel that such a control should at least be attempted, as it would take the paper to a new level in terms of the supporting evidence. We agree that it is possible that the control experiment will not work for your strains, but you would be able to find that out for sure in less than 1 week by simply plating a haploid yeast cultures on 5-FOA plates. If it does not work, so be it, but if it does, the paper will be much stronger.

We appreciate the suggestion regarding performing a reversion-based mutation essay as a validation to our measured mutation spectrum. We attempted to perform such an assay, but we unfortunately encountered many road-blocks along the way that prevented us from doing this successfully in the natural isolates AEQ and AAR. Before detailing these roadblocks, we also wish to point out that for the URA3 reversion reporter, the ura3-29 allele can be successfully reverted by C>T, C>A, and C>G mutations (Bulock, Xing, and Shcherbakova 2020), making it complicated to use this reporter to specifically detect whether a strain’s mutation spectrum is enriched for C>A mutations, but a version of TRP5 has been engineered to specifically detect C>A mutations (Williams et al. 2005).

We tested whether AEQ or AAR might already be auxotrophic for Trp or Ura, but found that they were prototrophic. Therefore, we attempted to use CRISPR to engineer a reversion allele into AEQ and AAR, but we noticed that it failed to replace the locus with the provided template and instead induced a small deletion at the guide RNA cut site. It seems that AEQ and AAR use non-homologous end joining (NHEJ) rather than homologous recombination (HR) to repair double-strand breaks introduced by CRISPR. This phenomenon could be interesting to study further, but also makes it more challenging to perform genomic engineering on these strains. For the above reasons, we are not able to perform the reversion assay as asked. That being said, we are hopeful that our new section verifying the mutagenicity of OGG1AAR/AEQ might in part make up for the lack of this confirmatory reversion assay, since it provides additional evidence that AAR and AEQ have a mutator phenotype by showing that the OGG1 allele present in these strains does not complement the ogg1 knockout strain as effectively as the laboratory strain allele of OGG1 is able to do (Line 501-542).

In addition, the Reviewing Editor humbly suggests adding a discussion about similar findings in Arabidopsis "strains", where attempts have also been made to link natural polymorphism patterns with measured mutation rates (including interesting environmental effects). See, e.g.:

Weng, Mao-Lun, Claude Becker, Julia Hildebrandt, Manuela Neumann, Matthew T.

Rutter, Ruth G. Shaw, Detlef Weigel, and Charles B. Fenster. 2019. "Fine-Grained Analysis of Spontaneous Mutation Spectrum and Frequency in Arabidopsis thaliana." Genetics 211 (2): 703-14.

Jiang, C., A. Mithani, E. J. Belfield, R. Mott, L. D. Hurst, and N. P. Harberd. 2014. "Environmentally Responsive Genome-Wide Accumulation of de novo Arabidopsis thaliana Mutations and Epimutations." Genome Research, October. https://doi.org/10.1101/gr.177659.114.

Jiang, Caifu, Aziz Mithani, Xiangchao Gan, Eric J. Belfield, John P. Klingler, Jian-Kang

Zhu, Jiannis Ragoussis, Richard Mott, and Nicholas P. Harberd. 2011. "Regenerant Arabidopsis Lineages Display a Distinct Genome-Wide Spectrum of Mutations Conferring Variant Phenotypes." Current Biology: CB 21 (16): 1385-90.

Belfield, Eric J., Carly Brown, Zhong Jie Ding, Lottie Chapman, Mengqian Luo, Eleanor Hinde, Sam W. van Es, et al. 2020. "Thermal Stress Accelerates Arabidopsis thaliana

Mutation Rate." Genome Research, December. https://doi.org/10.1101/gr.259853.119.

Thanks for these useful and informative references! We have added several of them to the paper’s introduction and Results sections:

Line 142: “Mild environmental stressors, such as high salt, ethanol, and heat, can also alter the mutation rate of S. cerevisiae laboratory strains (Liu and Zhang 2019; Voordeckers et al. 2020) as well as Arabidopsis thaliana (Jiang et al. 2014; Belfield et al. 2021)”

Line 240: “We note that a discrepancy between de novo and segregating mutation spectra has also been observed in A. thaliana, though in the opposite direction, as A. thaliana de novo mutations are surprisingly enriched for C>T transitions (Weng et al. 2019).”

Reviewer #1:


The authors describe a useful modified fluctuation assay that couples conventional

mutation rate analysis with mutational spectrum characterization of forward mutations at the S. cerevisiae CAN1 locus. They nicely showed that wild yeast isolates display a wide range of mutation rates with strains AAR and AEQ displaying rates ~10-fold higher than the control lab strain. These two strains also showed a bias for C>A mutations, and were the only strains analyzed that had a mutation spectrum statistically different from the lab control. Together, these data provide a compelling proof-of-principle of the applicability of the modified fluctuation analysis approach described in this manuscript. Overall, the manuscript is very well written, and the work reported in it does represent a valuable contribution to the field. However, two primary shortcomings were identified that can be addressed to strengthen the conclusions prior to publication. Both points described below pertain to the analysis of the possible C>A specific mutator phenotype in strains AAR and


Major comments:

1. The work presented in the manuscript does suggest that these two haploids are likely to display the C>A mutator phenotype. Yet, the authors fell short of providing a full and unambiguous demonstration that would elevate the significance of their discovery. They could have directly tested the predicted C>A specific mutator phenotype by conducting additional experiments, one of which is relatively simple. Specifically, they could have performed a simple reversion-based mutation assay to validate the reported C>A mutator phenotype displayed by AAR and AEQ. For example, into AAR, AEQ, and a wild type control, the authors could introduce an engineered auxotrophic marker allele (e.g., ura3 mutation) caused by an A to C substitution, which upon mutation back to A restores prototrophic growth in minimal media (ie. reversion from ura3-C to URA3-A). Such specific reversible allele should be relatively easy to integrate into the AAR and AEQ genomes, as well as in the control strain. Based on the authors' prediction, AAR and AEQ should display a very large increase (far higher than 10 fold) in the reversion rate when compared to a control haploid. To demonstrate the specificity of the mutation spectrum, the authors could test the reversion rates of a different engineered allele requiring a reversion mutation in the opposite direction (ie. reversion from ura3-A to URA3-C). If the AAR and AEQ mutator is specific C>A, one would predict that all three strains should have similar mutation rates for a reversion in the A>C direction. This additional genetic work would thoroughly validate the central discovery and would reinforce the usefulness of the method described in the manuscript.

Alternatively, a conventional mutation accumulation and whole genome re-sequencing experiment with parallel lines of AAR, AEQ and a control strain would also very effectively validate the C>A mutator prediction, and it would also answer the authors' discussion point about specificity to the CAN1 locus. However, it would be more costly and much more time consuming.

Please see our response to this issue above.

2. The second concern is in regard to the relatively extensive conclusions drawn about the possible evolutionary significance of the possible C>A mutator in AAR and AEQ. The authors should be more cautious and conservative in the proposed interpretation. As the authors note:

'Three of the four C>A-enriched mosaic beer strains, AAR, AEQ, and SACE_YAG, are all haploid derivatives of the [highly heterozygous] diploid Saccharomyces cerevisiae var diastaticus strain CBS1782, which was isolated in 1952 from super-attenuated beer.'

From this statement, and because the paper cited provided few details on the isolation of CBS1782, it is presumed that these haploid derivatives were most likely isolated as recombinant spores. Furthermore, it is unclear when this isolation occurred, and for how many generations strains AAR and AEQ have been propagated in a haploid state.

Herein lies a critical point: AAR and AEQ were recently derived from a diploid background with a "high level of heterozygosity". In a heterozygous diploid context, deleterious point mutations (and any resulting mutator phenotypes) would likely be masked by the presence of wild-type alleles. Now, as haploids, they express a novel genotype (i.e., combination of defective or incompatible parental alleles), which manifests as a mutator phenotype. In this respect, AAR and AEQ appear analogous to the spore derivatives of the incompatible cMLH1-kPMS1 isolate referred to in the manuscript as a notable exception. The analysis of strains harboring incompatible MLH1-PMS1 mutations by Raghavan et al. demonstrated that the heterozygous diploid parents were not themselves mutators, but that haploid spores which had inherited the pair of incompatible alleles displayed mutator phenotype. Collectively, while it can certainly be argued that the strains AAR and AEQ (like the MLH1/PMS1 incompatible strains) are mutators now, this fact alone does not support the conclusion that they have adapted to survive the expression of an extant mutator phenotype. This premise could be tested by analyzing the mutation rates/spectra of four new spores derived from a single tetrad of CBS 1782. Do the four sibling spores display similar or different mutational rates and spectra? If all four spores from a single tetrad exhibit the 10-fold increase in CAN1 mutation rate and the C>A transversion bias, then it can be inferred that the diploid parent is also a mutator in the same manner. Further direct analysis of mutation rates and spectrum in the parent diploid CBS 1782 would complete the work. This finding would be quite significant, and would provide strong evidence that wild strains can in fact tolerate the expression of a chronic mutator allele.

We have now included a new experiment where we sequenced the diploid ancestral strain CBS 1782, called variants and examined polymorphisms together with the 1011 strains (Figure: 5C).

We found that the diploid CBS 1782 also shows the enrichment of C>A mutations that is seen in AEQ and AAR, especially in the rarest category of variants. This result supports the hypothesis that the naturally arising mutator allele is present in the diploid CBS 1782.

To better investigate whether the mutator phenotype is caused at least in part by the OGG1 allele in AEQ and AAR, we put this allele on a plasmid and used it to transform an ogg1 knockout strain. The resulting yeast has a higher mutation rate than an ogg1 knockout transformed with a plasmid containing the reference allele of OGG1, suggesting that OGG1AEQ/AAR is less effective at DNA repair than OGG1wt. The OGG1AEQ/AAR allele contains two SNPs in AEQ and AAR, both of which are homozygous in CBS 1782, further supporting our hypothesis that the diploid harbors the mutator allele.

We had hoped to measure mutation rates from four haploid derivatives of CBS 1782 and examine their segregation patterns, but this would ideally require sporulation to produce four viable spores. We performed tetrad dissection on CBS 1782 after knocking out its HO locus, but unfortunately found one or two surviving spores out of the four spores produced. As an alternative strategy, we could make the diploid hemizygous for CAN1 or use a dominant drug resistance marker. We hope to attempt these modifications in a future study, as they require some care in the experimental design and strain engineering.

Minor comments:

A final, relatively minor point. That the new haploids AAR and AEQ show distinct mutation rates and spectra opens the door to an interesting line of inquiry, which may help to identify the causative mutator allele in a manner more efficient than searching for missense mutations. It is stated, and it is understandable, that the identification of the possible causal mutations is beyond the scope of the present manuscript. In this spirit, it would be much more appropriate to restrict such considerations to the Discussion section. Specifically, while the authors make a plausible case for OGG1 being a candidate gene responsible for the C>A mutator phenotype, no experimental demonstration was attempted.

As such, that text segment should be moved from the Results to the Discussion section.

We have conducted extra experiments to test whether the OGG1 allele present in AEQ and AAR is responsible for their increase in mutation rate by expressing a plasmid carrying different OGG1 alleles in an ogg1 knockout strain (Figure:6). Our results clearly show that the OGG1AEQ/AAR allele increases mutation rate (~2.4-fold increase compared to expression of the lab strain allele). The fact that OGG1 is in the oxidative stress pathway in which null alleles specifically increase C>A mutations adds additional evidence for our hypothesis.

Reviewer #1 (Significance (Required)):

As stated in the summary section above, the manuscript by Jiang et al. represents a substantial contribution to the fields of genome stability and genome evolution. The method described is likely to be useful beyond budding yeast. The work will be appreciated by a broad audience of geneticists. The additional work and text modifications proposed above would likely further elevate the impact of this work.

Reviewer #2:

Mutation is a fundamental force in organismal evolution, and therefore understanding the evolution of mutational mechanisms are important in evolutionary studies. In this manuscript, the authors used strains of S. cerevisiae as a model system to study the variations of rates and spectra in mutations with bioinformatic and experimental approaches. First, the authors analyzed the polymorphism data from 1011 strains by PCA analysis and show the variations in spectra. Second, the authors used fluctuation test combined with deep sequencing of the resistance gene to identify mutation rates and spectra in 18 strains, which show ~10-fold mutation rate variations and increased C-to-A mutations in two strains.

For the second part, the experimental procedures and statistical analysis are mostly solid. For the first part, as what authors said in the introduction, polymorphism is not equal to the mutation spectra. I think the authors did a good job by being cautious in the wording and having no over-inference after the analysis. It is thus inevitable that the conclusion of this part sounds mostly descriptive. The overall writing is very clear. I will recommend the publication in field-specific journals.

Minor comments:

P9 – It is very hard to not wonder how the 16 strains were picked in the fluctuation tests. Some comments on that will be appreciated. E.g., was that informed by the results of Figure 1?

We actually did not pick strains based on the results of Figure 1, one reason being that the CAN1 reporter method only works on haploid strains with a canavanine sensitivity phenotype. We also restricted our analysis to strains without known aneuploidies to maximize our ability to accurately measure the spectra of the strains’ polymorphisms. When possible, given these constraints, we included at least two randomly selected strains from each clade of the 1011 collection whenever possible. These constraints are now explained on page 8:

Line 308: “We used our pipeline to measure mutation rates and spectra in 16 haploid strains from a wide variety of environments (Supplementary File 1C). […] We also selected two lab strains, LCTL1 and LCTL2, to use as controls, since their mutation rates were previously measured”.

P17- In the paragraph "natural selection might contribute.…" , is there any example of "certain mutation types are more often beneficial than others"?

We have added a discussion of this point to the revision (Page: 15).

Line 605: “One example of how natural selection can affect mutation spectrum variation is that transitions are more often synonymous than transversions are (Freeland and Hurst 1998), leading to more frequent selection against transversions in genic regions.”

P20 – Extra '' in the sentence "Adjacent indels were merged if their frequencies differed by less than 10%."

We have fixed this in revision.

In the discussion, it might be good to add a paragraph to compare the rate and spectra reported here and the ones found by MA and then NGS approach(e.g., Zhu et al. 2014).

We have added a reference to the Zhu et al. (2014) spectrum in the discussion (Page 15, Line 633, Figure 4—figure supplement 7), extending our existing comparison of mutation spectra previously reported using CAN1 (Lang and Murray 2008) and the MA approach (Sharp et al. 2018). Our CAN1 method also obtains results that are consistent with the Lang et al. 2008 study on the same control strain, with a small (but not significant) discrepancy discussed in the discussion (Line 404, Line 638).

Reviewer #2 (Significance (Required)):

The significance of this manuscript will be relatively specific to evolutionary biologists and geneticists, especially those who use yeasts as a model system. For example, I expect the variation of mutation rates and spectra found in this manuscript will impact the following population-genetic analysis in this collection of 1011 strains and motivate more studies on the molecular machineries which affect mutation rates and spectra.

In addition, in terms of methodological novelty, adding a novel step of reporter-gene sequencing is a reasonable way to get some information on mutation spectra as it is less labor-intensive than NGS of MAs. Other statistical or experimental procedures in this manuscript mostly follow the approaches which have been developed in previous literature and thus show not much novelty.

Reviewer #3:

The authors show that certain yeast strains have altered mutation rates/bias. The study is well motivated, genetic variation in mutation rates are not easily uncovered, and capitalizes on yeast and a high-throughput mutation rate/bias method that validates findings of C>A bias from yeast polymorphism data. The results are solid and clearly presented and I have no major concerns.

We appreciate these encouraging comments.

Major comments:


Minor comments:

Should have comma: "In addition, environmental.…".

We have fixed this in revision.

Using S. paradoxus to classify derived vs ancestral alleles may not work as well as allele frequency. A 1/100 rare variant is 100x more likely derived than common variant. But with S. paradoxus divergence of say 5%, 5% polymorphic sites are misclassified or NA. Of course, since you used both, this is not a concern. But the number of variants included/excluded in each analysis should be reported. Also, I was a bit surprised that the rare variants are more noisy since most variants are rare.

We agree that the heuristic of classifying rare alleles as derived will do the right thing the majority of the time, but this could potentially create artifactual differences between the mutation spectra of different populations because the exact ratio of rare derived alleles to common derived alleles depends on the population’s demographic history and true site frequency spectrum. If two populations had the same mutation spectrum but very different proportions of variants that are polarized incorrectly, this could create the appearance of a mutation spectrum difference where none exists. In the revision, we report the total number of variants filtered because of the variation present in S. paradoxus (Line 187, 189, 191 for all natural variants after filtering, Line 220 for rare variants).

Although the reviewer is correct that most variants are rare, most of the derived variants present in an individual genome are common (since each common variant is present in many genomes). Since each mutation spectrum in the default PCA is computed by counting up the derived variants in an individual genome, these counts are considerably lower and noisier when only rare variants are used or when the common variants are subsampled to randomly assign each one to just a single genome.

In regards to variation in mutation rate based on canavanine resistance. There is a caveat that some strains may be more canavanine resistant – due to differences in transporter abundance or some other aspect of metabolism. Thus, the same mutation would survive and grow (barely) in one strain background, but not another. This caveat is very unlikely to have much of an impact but it would be worth discussing.

Thanks for pointing this out. We also considered the possibility that our mutation rate estimates could be confounded by slight differences in canavanine resistance between strains, and have added results in the revised version (Page 10):

Line 389: “One possible contributor to the observed differences in mutation rates and spectra are differences in target size: the number of distinct CAN1 mutations that are able to help the strain survive on canavanine media. […] among strains, but it is large enough that we cannot confidently translate our CAN1-based mutation rate estimates into genome-wide estimates of the mutation rate per base pair per generation.”

The explanation for synonymous mutations is hitchhikers or errors. However, they could also disrupt translation, here's one possibility PMC4552401.

Thanks for pointing this out. We now mention the possibility of synonymous mutation disrupting translation in Line 366.

Are there CAN allele differences between strains? If there are some, it might be worth mentioning why you do/don't think this influences the mutation rate. E.g. CGG is one step from stop but CGT is not.

We thank the reviewer for pointing to this analysis. We calculated the number of opportunities for missense and nonsense mutations in each strain, and estimated the mutational target size for each strain in the results (Page: 10, Line 389). We find that the overall number of opportunities for missense and nonsense mutations do not differ among strains substantially enough to explain the observed differences between the actual numbers of missense and nonsense mutations.

For the allele counts in Figure 5B. 2 indicates a variant is present in one strain so there are only 9 mutations present in AAR and not found in ANY other strain or just not found in the four listed? Likewise AAR has 36 for count 4, meaning that there are 36 variants present in AAR and one other strain, where other strains are just the 4 shown in the table, or other strains being any of the 1011?

The Figure 5B in the previous version is the Figure 5C in the updated manuscript, with the extra diploid CBS 1782 included as well. We have explained rare allele count more clearly in the revised manuscript (first described in Line 220, which is calculated the same way throughout).

"To our knowledge, this is one of the first" This is an odd way to put it and could be rephrased. As it stand you are either the first and not knowledgeable or knowledgeable and not the first.

We have fixed this in our revision.

"humans, great apes,.." Could you put the citations in the discussion too. I was a little surprised there was no mention of C>A bias as it relates to studies in bacteria and cancer, where there has been a lot of work on mutational spectra. A comment on this literature or whether the C>A biases are not found elsewhere would be nice.

Thanks for this excellent suggestion. We have added a discussion paragraph about some relevant C>A signatures previously identified in cancer as well as the mouse germline:

Line 55: “Many mutator phenotypes in E. coli have been linked to defects in DNA repair enzymes (Horst, Wu, and Marinus 1999; Loeb 2001; Prindle, Fox, and Loeb 2010)). Mutator phenotypes also commonly occur in cancer ((Horst, Wu, and Marinus 1999; Loeb 2001; Prindle, Fox, and Loeb 2010), likely either because of relaxed selection against cellular dysfunction or because it is beneficial for cancer cells to adapt rapidly to their aggressive growth niche.”

Line 612: “The C>A mutations enriched in AEQ, AAR, and their relatives might be a signature of oxidative stress damage; such mutations are a known signature of failure to the repair of 8-oxoguanine lesions, which is consistent with a causal role for these strains’ missense substitution in the oxidative stress response gene OGG1 as which were verified through our OGG1 plasmid assay. […] SBS18 was initially identified in tumors from individuals with pathogenic variation in Mutyh (Viel, et al. 2017), a direct interacting partner of OGG1 in the 8-oxoguanine repair pathway that has also been implicated in a C>A-dominated germline mutational signature in mice (Sasani, et al. 2021).”

Reviewer #3 (Significance (Required)):

I am an evolutionary geneticist with expertise in genomics and bioinformatics. In addition to reviewing papers I also regularly handle papers as an editor. The manuscript provides rare insight into population variation in mutation rates. While differences in mutational biases are well known between species and in some cases within a species, we typically don't know what causes this biases. Environmental factors are often thought to be involved; this work clearly shows that genetic (mutator strains) exist and impact polymorphism in yeast. The manuscript does a nice job in the introduction of explaining the background on mutation rate research and motivation for the work. It also clear explains the advantage of an experimental high-throughput mutation rate/spectra approach. Thus, I believe this new angle on a long-standing problem will be of interest to the community of evolutionary geneticists outside of yeast researchers.

We appreciate the kind comments, which exactly capture the type of impact we hope our study will have.


Belfield, Eric J., Carly Brown, Zhong Jie Ding, Lottie Chapman, Mengqian Luo, Eleanor Hinde, Sam W. van Es, et al. 2021. “Thermal Stress Accelerates Arabidopsis thaliana Mutation Rate.” Genome Research 31 (1): 40–50.

Bulock, Chelsea R., Xuanxuan Xing, and Polina V. Shcherbakova. 2020. “Mismatch Repair and DNA Polymerase δ Proofreading Prevent Catastrophic Accumulation of Leading Strand Errors in Cells Expressing a Cancer-Associated DNA Polymerase ϵ Variant.” Nucleic Acids Research 48 (16): 9124–34.

Freeland, S. J., and L. D. Hurst. 1998. “The Genetic Code Is One in a Million.” Journal of Molecular Evolution 47 (3): 238–48.

Horst, J. P., T. H. Wu, and M. G. Marinus. 1999. “Escherichia coli Mutator Genes.” Trends in Microbiology 7 (1): 29–36.

Jiang, Caifu, Aziz Mithani, Eric J. Belfield, Richard Mott, Laurence D. Hurst, and Nicholas P. Harberd. 2014. “Environmentally Responsive Genome-Wide Accumulation of de novo Arabidopsis thaliana Mutations and Epimutations.” Genome Research 24 (11): 1821–29.

Lang, Gregory I., and Andrew W. Murray. 2008. “Estimating the Per-Base-Pair Mutation Rate in the Yeast Saccharomyces cerevisiae.” Genetics 178 (1): 67–82.

Liu, Haoxuan, and Jianzhi Zhang. 2019. “Yeast Spontaneous Mutation Rate and Spectrum Vary with Environment.” Current Biology: CB 29 (10): 1584–91.e3.

Loeb, L. A. 2001. “A Mutator Phenotype in Cancer.” Cancer Research 61 (8): 3230–39.

Prindle, Marc J., Edward J. Fox, and Lawrence A. Loeb. 2010. “The Mutator Phenotype in Cancer: Molecular Mechanisms and Targeting Strategies.” Current Drug Targets 11 (10): 1296–1303.

Sharp, Nathaniel P., Linnea Sandell, Christopher G. James, and Sarah P. Otto. 2018. “The Genome-Wide Rate and Spectrum of Spontaneous Mutations Differ between Haploid and Diploid Yeast.” Proceedings of the National Academy of Sciences of the United States of America 115 (22): E5046–55.

Voordeckers, Karin, Camilla Colding, Lavinia Grasso, Benjamin Pardo, Lore Hoes, Jacek Kominek, Kim Gielens, et al. 2020. “Ethanol Exposure Increases Mutation Rate through Error-Prone Polymerases.” Nature Communications 11 (1): 3664.

Weng, Mao-Lun, Claude Becker, Julia Hildebrandt, Manuela Neumann, Matthew T. Rutter, Ruth G. Shaw, Detlef Weigel, and Charles B. Fenster. 2019. “Fine-Grained Analysis of Spontaneous Mutation Spectrum and Frequency in Arabidopsis thaliana.” Genetics 211 (2): 703–14.

Williams, Teresa-Marie, Rebecca M. Fabbri, Jason W. Reeves, and Gray F. Crouse. 2005. “A New Reversion Assay for Measuring All Possible Base Pair Substitutions in Saccharomyces cerevisiae.” Genetics 170 (3): 1423–26.

Zhu, Yuan O., Mark L. Siegal, David W. Hall, and Dmitri A. Petrov. 2014. “Precise Estimates of Mutation Rate and Spectrum in Yeast.” Proceedings of the National Academy of Sciences of the United States of America 111 (22): E2310–18.


Article and author information

Author details

  1. Pengyao Jiang

    Department of Genome Sciences, University of Washington, Seattle, United States
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Software, Visualization, Writing – original draft, Validation
    Competing interests
    No competing interests declared
  2. Anja R Ollodart

    1. Department of Genome Sciences, University of Washington, Seattle, United States
    2. Molecular and Cellular Biology Program, University of Washington, Seattle, United States
    Methodology, Resources
    Competing interests
    No competing interests declared
  3. Vidha Sudhesh

    Department of Genome Sciences, University of Washington, Seattle, United States
    Competing interests
    No competing interests declared
  4. Alan J Herr (deceased)

    Department of Laboratory Medicine and Pathology, University of Washington, Seattle, United States
    , Methodology, Supervision, Funding acquisition, Validation
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9498-0972
  5. Maitreya J Dunham

    Department of Genome Sciences, University of Washington, Seattle, United States
    Conceptualization, Writing – review and editing, Methodology, Supervision, Funding acquisition, Validation
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-9944-2666
  6. Kelley Harris

    1. Department of Genome Sciences, University of Washington, Seattle, United States
    2. Department of Computational Biology, Fred Hutchinson Cancer Research Center, Seattle, United States
    Conceptualization, Writing – review and editing, Investigation, Methodology, Project administration, Supervision, Software, Visualization, Writing – original draft, Validation
    For correspondence
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0302-2523


National Institute of General Medical Sciences (1R35GM133428-01)

  • Kelley Harris

National Institute of General Medical Sciences (P41GM103533)

  • Maitreya J Dunham

National Institute of General Medical Sciences (R01GM118854)

  • Alan J Herr

Burroughs Wellcome Fund (Career Award at the Scientific Interface)

  • Kelley Harris

Kinship Foundation (Searle Scholarship)

  • Kelley Harris

Pew Charitable Trusts (Pew Scholarship)

  • Kelley Harris

Alfred P. Sloan Foundation (Sloan Fellowship)

  • Kelley Harris

National Human Genome Research Institute (T32HG00035)

  • Anja R Ollodart

Howard Hughes Medical Institute (Faculty Scholar Award)

  • Maitreya J Dunham

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


We dedicate this paper to the memory of Dr. Alan J Herr, a sorely missed colleague, mentor, friend and inspiring role model. We thank all members of the Harris and Dunham labs for helpful comments and discussions. We thank Nathaniel Sharp and Greg Lang for sharing strains. We also thank Joseph Schacherer for sharing the 1,011 strain collection with the Dunham lab. PJ was supported by a Burroughs Wellcome Fund Career Award at the Scientific Interface awarded to KH. KH acknowledges additional support from a Searle Scholarship, a Sloan Research Fellowship, a Pew Biomedical Scholarship, and National Institute of General Medical Sciences Grant 1R35GM133428-01. AJH was supported by the National Institute for General Medical Sciences (NIH/NIGMS R01GM118854). ARO was supported by the National Human Genome Research Institute of the NIH under award T32 HG00035. The research of MJD was supported by NIH/NIGMS award P41 GM103533 and a Faculty Scholar grant from the Howard Hughes Medical Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the Burroughs Wellcome Fund, the Kinship Foundation, the Sloan Foundation, the Pew Charitable Trust, HHMI, the NIH, or NIGMS.

Senior Editor

  1. Molly Przeworski, Columbia University, United States

Reviewing Editor

  1. Magnus Nordborg, Austrian Academy of Sciences, Austria

Version history

  1. Preprint posted: January 11, 2021 (view preprint)
  2. Received: March 11, 2021
  3. Accepted: September 14, 2021
  4. Accepted Manuscript published: September 15, 2021 (version 1)
  5. Version of Record published: October 7, 2021 (version 2)
  6. Version of Record updated: December 16, 2021 (version 3)
  7. Version of Record updated: December 24, 2021 (version 4)


© 2021, Jiang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.


  • 2,703
    Page views
  • 271
  • 17

Article citation count generated by polling the highest count across the following sources: PubMed Central, Crossref, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Pengyao Jiang
  2. Anja R Ollodart
  3. Vidha Sudhesh
  4. Alan J Herr
  5. Maitreya J Dunham
  6. Kelley Harris
A modified fluctuation assay reveals a natural mutator phenotype that drives mutation spectrum variation within Saccharomyces cerevisiae
eLife 10:e68285.

Share this article


Further reading

    1. Evolutionary Biology
    2. Genetics and Genomics
    Thomas A Sasani, Aaron R Quinlan, Kelley Harris
    Research Article

    Maintaining germline genome integrity is essential and enormously complex. Although many proteins are involved in DNA replication, proofreading, and repair, mutator alleles have largely eluded detection in mammals. DNA replication and repair proteins often recognize sequence motifs or excise lesions at specific nucleotides. Thus, we might expect that the spectrum of de novo mutations – the frequencies of C>T, A>G, etc. – will differ between genomes that harbor either a mutator or wild-type allele. Previously, we used quantitative trait locus mapping to discover candidate mutator alleles in the DNA repair gene Mutyh that increased the C>A germline mutation rate in a family of inbred mice known as the BXDs (Sasani et al., 2022, Ashbrook et al., 2021). In this study we developed a new method to detect alleles associated with mutation spectrum variation and applied it to mutation data from the BXDs. We discovered an additional C>A mutator locus on chromosome 6 that overlaps Ogg1, a DNA glycosylase involved in the same base-excision repair network as Mutyh (David et al., 2007). Its effect depends on the presence of a mutator allele near Mutyh, and BXDs with mutator alleles at both loci have greater numbers of C>A mutations than those with mutator alleles at either locus alone. Our new methods for analyzing mutation spectra reveal evidence of epistasis between germline mutator alleles and may be applicable to mutation data from humans and other model organisms.

    1. Chromosomes and Gene Expression
    2. Evolutionary Biology
    Katherine Rickelton, Trisha M Zintel ... Courtney C Babbitt
    Research Article Updated

    Primate evolution has led to a remarkable diversity of behavioral specializations and pronounced brain size variation among species (Barton, 2012; DeCasien and Higham, 2019; Powell et al., 2017). Gene expression provides a promising opportunity for studying the molecular basis of brain evolution, but it has been explored in very few primate species to date (e.g. Khaitovich et al., 2005; Khrameeva et al., 2020; Ma et al., 2022; Somel et al., 2009). To understand the landscape of gene expression evolution across the primate lineage, we generated and analyzed RNA-seq data from four brain regions in an unprecedented eighteen species. Here, we show a remarkable level of variation in gene expression among hominid species, including humans and chimpanzees, despite their relatively recent divergence time from other primates. We found that individual genes display a wide range of expression dynamics across evolutionary time reflective of the diverse selection pressures acting on genes within primate brain tissue. Using our samples that represent a 190-fold difference in primate brain size, we identified genes with variation in expression most correlated with brain size. Our study extensively broadens the phylogenetic context of what is known about the molecular evolution of the brain across primates and identifies novel candidate genes for the study of genetic regulation of brain evolution.