Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.
All data generated and script to analyse them is provided on the dryad repository: http://dx.doi.org/10.5061/dryad.t76fk80
Data from: Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferencesAvailable at Dryad Digital Repository under a CC0 Public Domain Dedication.
- Laurent Excoffier
- Laurent Excoffier
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
- Krishna Veeramah, Stony Brook University, United States
© 2018, Pouyet et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
The Sanger Excellence Fellowship has been established to increase the representation of researchers with Black-heritage backgrounds at a leading research centre in the UK.
Maintaining germline genome integrity is essential and enormously complex. Although many proteins are involved in DNA replication, proofreading, and repair, mutator alleles have largely eluded detection in mammals. DNA replication and repair proteins often recognize sequence motifs or excise lesions at specific nucleotides. Thus, we might expect that the spectrum of de novo mutations – the frequencies of C>T, A>G, etc. – will differ between genomes that harbor either a mutator or wild-type allele. Previously, we used quantitative trait locus mapping to discover candidate mutator alleles in the DNA repair gene Mutyh that increased the C>A germline mutation rate in a family of inbred mice known as the BXDs (Sasani et al., 2022, Ashbrook et al., 2021). In this study we developed a new method to detect alleles associated with mutation spectrum variation and applied it to mutation data from the BXDs. We discovered an additional C>A mutator locus on chromosome 6 that overlaps Ogg1, a DNA glycosylase involved in the same base-excision repair network as Mutyh (David et al., 2007). Its effect depends on the presence of a mutator allele near Mutyh, and BXDs with mutator alleles at both loci have greater numbers of C>A mutations than those with mutator alleles at either locus alone. Our new methods for analyzing mutation spectra reveal evidence of epistasis between germline mutator alleles and may be applicable to mutation data from humans and other model organisms.