Epistasis between mutator alleles contributes to germline mutation rate variability in laboratory mice

  1. Department of Human Genetics, University of Utah
  2. Department of Human Genetics, University of Utah; Department of Biomedical Informatics, University of Utah · Funded by NIH/NHGRI R01HG012252
  3. Department of Genome Sciences, University of Washington · Funded by NIH/NIGMS R35GM133428; Burroughs Wellcome Career Award at the Scientific Interface; Searle Scholarship; Pew Scholarship; Sloan Fellowship; Allen Discovery Center for Cell Lineage Tracing

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.


Editors

  • Reviewing Editor
    Ziyue Gao
    University of Pennsylvania, Philadelphia, United States of America
  • Senior Editor
    Detlef Weigel
    Max Planck Institute for Biology Tübingen, Tübingen, Germany

Reviewer #1 (Public Review):

The mutation rate and spectrum have been found to differ between populations as well as across individuals within the same population. Hypothesizing that some of the observed variation has a genetic basis, the authors of this paper have made important contributions in the past few years in identifying genetic variants that modify mutation rate or spectrum in natural populations. This paper makes one significant step further by developing a new method for mapping genetic variants associated with the mutation spectrum, which reveals new biological insights.

Using traditional quantitative trait locus (QTL) mapping in the BXD mouse recombinant inbred lines (RILs), the authors of this paper previously identified a genetic locus associated with C>A mutation rate. However, this approach has limited power, as it suffers from multiple testing burden as well as noise in the "observed mutation rate/spectrum phenotype" due to rarity and randomness of mutation events. To overcome these limitations, the authors developed a new method that they named "inter-haplotype distance" (IHD), which in short measures the difference in the aggregate mutation spectrum between two groups of individuals with distinct genotypes at a specific genomic locus. With this new approach, they recover the previously reported candidate mutator locus (near the Mutyh gene) and identify a new candidate variant that modifies the C>A mutation rate on only one genetic background. Using more rigorous statistical testing, the authors convincingly show synergistic epistatic effects between the mutator alleles at the two loci.
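To make the description above concrete, here is a minimal sketch of the core comparison at a single locus, assuming the distance is the cosine distance between the two aggregate spectra (the metric named later in this review). The function names, the three mutation-type columns, and the toy counts are all illustrative assumptions, not the authors' implementation or data.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity between two count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def aggregate_spectrum(spectra, genotypes, allele):
    """Sum per-type mutation counts over all lines carrying `allele`."""
    rows = [s for s, g in zip(spectra, genotypes) if g == allele]
    return [sum(col) for col in zip(*rows)]

def locus_distance(spectra, genotypes):
    """Distance between the aggregate spectra of the B and D groups."""
    agg_b = aggregate_spectrum(spectra, genotypes, "B")
    agg_d = aggregate_spectrum(spectra, genotypes, "D")
    return cosine_distance(agg_b, agg_d)

# Hypothetical toy data: one row per inbred line, one column per
# mutation type (say C>A, C>T, A>G). Counts are genome-wide.
toy_spectra = [[10, 5, 5], [12, 4, 4], [20, 5, 5], [22, 4, 4]]
toy_genotypes = ["B", "B", "D", "D"]  # genotype at the focal marker
toy_distance = locus_distance(toy_spectra, toy_genotypes)
```

In a genome scan, `locus_distance` would be evaluated once per marker with the same `spectra` and a different `genotypes` vector each time, which is why a single test per marker covers every mutation type at once.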

Overall, the analyses presented are well done and provide convincing evidence for the major findings, including the new candidate mutator locus and its epistatic interaction with the Mutyh locus. The new IHD method introduced is innovative and outperforms traditional QTL mapping under certain conditions, but some of its statistical properties and limitations are not fully described. The part that describes how the method works is a little hard to follow (partially due to the confusing name; see comments below), but the rest of the paper is very well written. Below are my comments and suggestions on how to improve, but I identify no major issues.

The name of the new method "inter-haplotype distance" is more confusing than helpful, as the haplotype information is not critical for implementing this method. First, the mutation spectrum is aggregated genome-wide regardless of the haplotypes where the mutations are found. Second, the only critical haplotype information is that at the focal site (i.e., the locus that is tested for association): individuals are aggregated together when they belong to the same "haplotype group" at the focal site. However, for the classification step, haplotype information is not really necessary: individuals can be grouped based on their genotypes at the given locus (e.g., AA vs AB). As the authors mentioned, this method can be potentially applied to other mutation datasets, where haplotype information may well be unavailable. I hope the authors can reconsider the name and remove the term "haplotype" (perhaps something like "inter-genotype distance"?) to avoid giving the wrong impression that haplotype information is critical for applying this method.

The biggest advantage of the IHD method over QTL mapping is alleviation of the multiple testing burden, as one comparison tests for any changes in the mutation spectrum, including simultaneous, small changes in the relative abundance of multiple mutation types. Based on this, the authors claim that IHD is more powerful to detect a mutator allele that affects multiple mutation types. Although logically plausible, it is unclear under what quantitative conditions IHD actually has greater power than QTL mapping. It would be helpful to support this claim by providing some simulation results.

The flip side of this advantage of IHD is that, when a significant association is detected, it is not immediately clear which mutation type is driving the signal. Related to this, it is unclear how the authors arrived at "...the C>A mutator phenotype associated with the locus on chromosome 6", given that they detected a significant IHD signal at rs46276051 (on Chr6) only when conditioning on D genotypes at rs27509845 (on Chr4), and found no significant signal for any 1-mer mutation type by traditional mapping. The authors need to explain how they deduced that C>A mutation is the major source of the signal. In addition, can mutation types other than C>A contribute to the IHD signal at rs46276051? More generally, I hope the authors can provide some guidelines on how to narrow a significant IHD signal to the specific candidate mutation type(s) affected, which will make the method more useful to other researchers.

To account for differential relatedness between the inbred lines, the authors regressed the cosine distance between the two aggregate mutation spectra on the genome-wide genetic similarity and took the residual as the adjusted test metric. What is the value of the slope from this regression? If significantly non-zero, this would support a polygenic architecture of the mutation spectrum phenotype, which could be interesting. If not, is this adjustment really necessary? In addition, is the intercept assumed to be zero for this regression, and does such an assumption matter? I would appreciate seeing a supplemental figure on this regression.
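The adjustment in question can be sketched as an ordinary least-squares fit across markers, with the residual at each marker used as the adjusted statistic. This sketch fits an intercept, which is an assumption; as noted above, the manuscript does not make clear whether the intercept was fixed at zero. The function name and the toy values are hypothetical.

```python
def ols_residuals(x, y):
    """Fit y = intercept + slope * x by least squares; return the fit
    and the per-point residuals (observed minus fitted)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return slope, intercept, residuals

# Hypothetical per-marker values: genome-wide genetic similarity between
# the two genotype groups, and the cosine distance between their spectra.
similarity = [0.40, 0.45, 0.50, 0.55, 0.60]
distance = [0.010, 0.009, 0.012, 0.007, 0.006]

slope, intercept, adjusted = ols_residuals(similarity, distance)
```

Reporting `slope` alongside a scatter plot of `distance` against `similarity` would address the question about whether the adjustment is needed; forcing the line through the origin would instead use `slope = sum(x*y) / sum(x*x)` and generally yields residuals that do not sum to zero.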

Reviewer #2 (Public Review):

In this paper Sasani, Quinlan and Harris present a new method for identifying genetic factors affecting germline mutation, which is particularly applicable to genome sequence data from mutation accumulation experiments using recombinant inbred lines. These are experiments where laboratory organisms are crossed and repeatedly inbred for many generations, to build up a substantial number of identifiable germline mutations. The authors apply their method to such data from mice, and identify two genetic factors at two separate genetic loci. Clear evidence of such factors has been difficult to obtain, so this is an important finding. They further show evidence of an epistatic interaction between these factors (meaning that they do not act independently in their effects on the germline mutation process). This is exciting because such interactions are difficult to detect and few if any other examples have been studied.

The authors present a careful comparison of their method to another similar approach, quantitative trait locus (QTL) analysis, and demonstrate that in situations such as the one analysed it has greater power to detect genetic factors with a certain magnitude of effect. They also test the statistical properties of their method using simulated data and permutation tests. Overall the analysis is rigorous and well motivated, and the methods explained clearly.
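For readers unfamiliar with the permutation tests mentioned here, the following is a generic sketch of how a label-shuffling test for a spectrum-distance statistic could look. It is not the authors' procedure: the function names, the toy data, and the unrestricted shuffling are assumptions, and in particular this version ignores any relatedness or breeding-epoch structure among lines.

```python
import math
import random

def group_distance(spectra, labels):
    """Cosine distance between aggregate spectra of the B and D groups."""
    groups = {}
    for row, label in zip(spectra, labels):
        acc = groups.setdefault(label, [0] * len(row))
        for i, count in enumerate(row):
            acc[i] += count
    a, b = groups["B"], groups["D"]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def permutation_p_value(spectra, labels, n_permutations=1000, seed=0):
    """Share of label shuffles whose distance is at least the observed
    one, with the usual +1 correction so the p-value is never zero."""
    rng = random.Random(seed)
    observed = group_distance(spectra, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # keeps group sizes, breaks the association
        if group_distance(spectra, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)

# Hypothetical toy data with a strong spectrum shift in the D group.
toy_spectra = [[10, 5, 5]] * 10 + [[20, 5, 5]] * 10
toy_labels = ["B"] * 10 + ["D"] * 10
p_value = permutation_p_value(toy_spectra, toy_labels)
```

For a genome-wide scan, one would typically record the maximum distance across all markers in each shuffle and compare the observed statistics against that null distribution of maxima, which controls for testing many markers.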

The main limitation of the approach is that it is difficult to see how it might be applied beyond the context of mutation accumulation experiments using recombinant inbred lines. This is because the signal it detects, and hence its power, is based on the number of extra accumulated mutations linked to (i.e. on the same chromosome as) the mutator allele. In germline mutation studies of wild populations the number of generations involved (and hence the total number of mutations) is typically small, or else the mutator allele becomes unlinked from the mutations it has caused (due to recombination), or is lost from the population altogether (due to chance or perhaps selection against its deleterious consequences).

Nevertheless, accumulation lines are a common and well established experimental approach to studying mutation processes in many organisms, so the new method could have wide application and impact on our understanding of this fundamental biological process.

The evidence presented for an epistatic interaction is convincing, and the authors suggest some plausible potential mechanisms for how this interaction might arise, involving the DNA repair machinery and based on previous studies of the proteins implicated. However, as with all such findings, given the higher degree of complexity of the proposed model, it needs to be treated with greater caution, perhaps until replicated in a separate dataset or demonstrated in follow-up experiments exploring the pathway itself.

Reviewer #3 (Public Review):

Sasani et al. develop and implement a new method for mutator allele discovery in the BXD mouse population. This new "IHD" method carries several notable strengths, including the ability to aggregate de novo mutations across individuals to reduce data sparsity and to combine mutation rate frequencies across multiple nucleotide contexts into a single estimate. These advantages may render the IHD method better suited to mutator discovery under certain scenarios, as compared to conventional QTL or association mapping. Overall, the theoretical premise of the IHD method is judged to be both strong and innovative, and careful simulation studies benchmark its power.

The authors then apply their method to the BXD mouse recombinant inbred mapping population. As proof-of-principle, they first successfully re-identify a known mutator locus in this population on chr4. Next, to assess possible genetic interactions involving this known mutator, Sasani et al. condition on the chr4 mutator genotype and reimplement the IHD scan. This strategy led them to identify a second locus on chr6 that interacts epistatically with the chr4 locus; mice with "D" alleles at both loci exhibit a significantly increased burden of C>A de novo mutations, even though mice with the D allele at the chr6 locus alone show no appreciable increase in the C>A mutation fraction. This exciting discovery not only adds to the catalog of known mutator alleles, but also reveals key aspects of mutator biology. Notably, this finding reinforces the hypothesis that segregating variants in genes associated with DNA repair influence germline mutation spectra. Further, Sasani et al.'s findings suggest that some mutators may lie dormant until recombined onto a permissive genetic background. This discovery could have intriguing implications for the evolution of mutators in natural populations.

Despite a high level of overall enthusiasm for this work, some weaknesses are identified in the IHD method, approach for nominating candidate genes within the newly identified chr6 locus, and the authors' conclusions.

Under simulated scenarios, the authors' new IHD method is not appreciably more powerful than conventional QTL mapping methods. While this does not diminish the rigor or novelty of the authors' findings, it does temper enthusiasm for the IHD method's potential to uncover new mutators in other populations or datasets. Further, adaptation of this methodology to other datasets, including human trios or multigenerational families, will require some modification, which could present a barrier to broader community uptake. Notably, BXD mice are (mostly) inbred, justifying the authors' consideration of just two genotype states at each locus, but this decision prevents out-of-the-box application to outbred populations and human genomic datasets. Lastly, some details of the IHD method are not clearly spelled out in the paper. In particular, it is unclear whether differences in BXD strain relatedness due to the breeding epoch structure are fully accounted for in permutations. The method's name - inter-haplotype distance - is also somewhat misleading, as it seems to imply that de novo mutations are aggregated at the scale of sub-chromosomal haplotype blocks, rather than across the whole genome.

Nominating candidates within the chr6 mutator locus requires an approach for defining a credible interval and excluding/including specific genes within that interval as candidates. Sasani et al. delimit their focal window to 5 Mb on either side of the SNP with the most extreme P-value in their IHD scan. This strategy suffers from several weaknesses. First, no justification is given for using a 10 Mb window, as opposed to, e.g., a 5 Mb window or a window size delimited by a specific threshold of P-value drop, rendering the approach rather ad hoc. Second, within their focal 10 Mb window, the authors prioritize genes with annotated functions in DNA repair that harbor protein coding variants between the B6 and D2 founder strains. While the logic for focusing on known DNA repair genes is sensible, this locus also houses an appreciable number of genes that are not functionally annotated, but could, conceivably, perform relevant biological roles. These genes should not be excluded outright, especially if they are expressed in the germline. Further, the vast majority of functional SNPs are non-coding (including the likely causal variant at the chr4 mutator previously identified in the BXD population). Thus, the authors' decision to focus most heavily on coding variants is not well-justified. Sasani et al. dedicate considerable speculation in the manuscript to the likely identity of the causal variant, ultimately favoring the conclusion that the causal variant is a predicted deleterious missense variant in Mbd4. However, using a 5 Mb window centered on the peak IHD scan SNP, rather than a 10 Mb window, Mbd4 would be excluded. Further, SNP functional prediction accuracy is modest [e.g., PMID 28511696], and exclusion of the missense variant in Ogg1 due to its benign prediction is potentially premature, especially given the wealth of functional data implicating Ogg1 in C>A mutations in house mice. Finally, the DNA repair gene closest to the peak IHD SNP is Rad18, which the authors largely exclude as a candidate.

Additionally, some claims in the paper are not well-supported by the authors' data. For example, in the Discussion, the authors assert that "multiple mutator alleles have spontaneously arisen during the evolutionary history of inbred laboratory mice" and that "... mutational pressure can cause mutation rates to rise in just a few generations of relaxed selection in captivity". However, these statements are undercut by data in this paper and the authors' prior publication demonstrating that a number of candidate variants are segregating in natural mouse populations. These variants almost certainly did not emerge de novo in laboratory colonies, but were inherited from their wild mouse ancestors. Further, the wild mouse population genomic dataset used by the authors falls far short of comprehensively sampling wild mouse diversity; variants in laboratory populations could derive from unsampled wild populations.

Finally, the implications of discovering a mutator whose expression is potentially conditional on the genotype at a second locus are not raised in the Discussion. While not a weakness per se, this omission is perceived to be a missed opportunity to emphasize what, to this reviewer, is one of the most exciting impacts of this work. The potential background dependence of mutator expression could partially shelter it from the action of selection, allowing the allele to persist in populations. This finding bears on theoretical models of mutation rate evolution and may have important implications for efforts to map additional mutator loci. It seems unfortunate not to elevate these points.
