Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences

Abstract

Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.

Data availability

All data generated and script to analyse them is provided on the dryad repository: http://dx.doi.org/10.5061/dryad.t76fk80

The following data sets were generated
The following previously published data sets were used

Article and author information

Author details

  1. Fanny Pouyet

    Institute of Ecology and Evolution, University of Bern, Berne, Switzerland
    For correspondence
    fanny.pouyet@iee.unibe.ch
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5614-6998
  2. Simon Aeschbacher

    Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
    Competing interests
    The authors declare that no competing interests exist.
  3. Alexandre Thiéry

    Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
    Competing interests
    The authors declare that no competing interests exist.
  4. Laurent Excoffier

    Institute of Ecology and Evolution, University of Bern, Bern, Switzerland
    For correspondence
    laurent.excoffier@iee.unibe.ch
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-7507-6494

Funding

Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (310030B-166605)

  • Laurent Excoffier

University of Berkeley (Visiting Miller Professorship)

  • Laurent Excoffier

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Krishna Veeramah, Stony Brook University, United States

Version history

  1. Received: March 1, 2018
  2. Accepted: August 17, 2018
  3. Accepted Manuscript published: August 20, 2018 (version 1)
  4. Accepted Manuscript updated: August 23, 2018 (version 2)
  5. Version of Record published: October 9, 2018 (version 3)

Copyright

© 2018, Pouyet et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 8,415
    views
  • 1,017
    downloads
  • 116
    citations

Views, downloads and citations are aggregated across all versions of this paper published by eLife.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Fanny Pouyet
  2. Simon Aeschbacher
  3. Alexandre Thiéry
  4. Laurent Excoffier
(2018)
Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences
eLife 7:e36317.
https://doi.org/10.7554/eLife.36317

Share this article

https://doi.org/10.7554/eLife.36317

Further reading

    1. Evolutionary Biology
    2. Genetics and Genomics
    Yannick Schäfer, Katja Palitzsch ... Jaanus Suurväli
    Research Article Updated

    Copy number variation in large gene families is well characterized for plant resistance genes, but similar studies are rare in animals. The zebrafish (Danio rerio) has hundreds of NLR immune genes, making this species ideal for studying this phenomenon. By sequencing 93 zebrafish from multiple wild and laboratory populations, we identified a total of 1513 NLRs, many more than the previously known 400. Approximately half of those are present in all wild populations, but only 4% were found in 80% or more of the individual fish. Wild fish have up to two times as many NLRs per individual and up to four times as many NLRs per population than laboratory strains. In contrast to the massive variability of gene copies, nucleotide diversity in zebrafish NLR genes is very low: around half of the copies are monomorphic and the remaining ones have very few polymorphisms, likely a signature of purifying selection.

    1. Computational and Systems Biology
    2. Genetics and Genomics
    Ardalan Naseri, Degui Zhi, Shaojie Zhang
    Research Article

    Runs of homozygosity (ROH) segments, contiguous homozygous regions in a genome were traditionally linked to families and inbred populations. However, a growing literature suggests that ROHs are ubiquitous in outbred populations. Still, most existing genetic studies of ROH in populations are limited to aggregated ROH content across the genome, which does not offer the resolution for mapping causal loci. This limitation is mainly due to a lack of methods for the efficient identification of shared ROH diplotypes. Here, we present a new method, ROH-DICE, to find large ROH diplotype clusters, sufficiently long ROHs shared by a sufficient number of individuals, in large cohorts. ROH-DICE identified over 1 million ROH diplotypes that span over 100 SNPs and are shared by more than 100 UK Biobank participants. Moreover, we found significant associations of clustered ROH diplotypes across the genome with various self-reported diseases, with the strongest associations found between the extended HLA region and autoimmune disorders. We found an association between a diplotype covering the HFE gene and hemochromatosis, even though the well-known causal SNP was not directly genotyped or imputed. Using a genome-wide scan, we identified a putative association between carriers of an ROH diplotype in chromosome 4 and an increase in mortality among COVID-19 patients (P-value=1.82×10-11). In summary, our ROH-DICE method, by calling out large ROH diplotypes in a large outbred population, enables further population genetics into the demographic history of large populations. More importantly, our method enables a new genome-wide mapping approach for finding disease-causing loci with multi-marker recessive effects at a population scale.