Broad-scale variation in human genetic diversity levels is predicted by purifying selection on coding and non-coding elements

  1. David A Murphy (corresponding author)
  2. Eyal Elyashiv
  3. Guy Amster
  4. Guy Sella (corresponding author)

  1. Oklahoma Medical Research Foundation, United States
  2. Columbia University, United States

Abstract

Analyses of genetic variation in many taxa have established that neutral genetic diversity is shaped by natural selection at linked sites. Whether the mode of selection is primarily the fixation of strongly beneficial alleles (selective sweeps) or purifying selection on deleterious mutations (background selection) remains unknown, however. We address this question in humans by fitting a model of the joint effects of selective sweeps and background selection to autosomal polymorphism data from the 1000 Genomes Project. After controlling for variation in mutation rates along the genome, a model of background selection alone explains ~60% of the variance in diversity levels at the megabase scale. Adding the effects of selective sweeps driven by adaptive substitutions to the model does not improve the fit, and when both modes of selection are considered jointly, selective sweeps are estimated to have had little or no effect on linked neutral diversity. The regions under purifying selection are best predicted by phylogenetic conservation, with ~80% of the deleterious mutations affecting neutral diversity occurring in non-exonic regions. Thus, background selection is the dominant mode of linked selection in humans, with marked effects on diversity levels throughout autosomes.
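The "variance explained at the megabase scale" reported above can be illustrated with a small sketch. It averages per-site values in non-overlapping 1 Mb windows and then computes R² between an observed and a predicted diversity map. The function names, the input layout (parallel lists of positions and values), and the toy numbers are assumptions made for illustration only. This is not the authors' pipeline, which is documented in their repository.

```python
from collections import defaultdict


def window_means(positions, values, window_size=1_000_000):
    """Average per-site values in non-overlapping windows.

    Returns a dict mapping window index (position // window_size)
    to the mean of the values that fall in that window.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pos, val in zip(positions, values):
        idx = pos // window_size
        sums[idx] += val
        counts[idx] += 1
    return {idx: sums[idx] / counts[idx] for idx in sums}


def variance_explained(observed, predicted):
    """R^2 = 1 - SS_res / SS_tot over windows present in both maps."""
    keys = sorted(set(observed) & set(predicted))
    obs = [observed[k] for k in keys]
    pred = [predicted[k] for k in keys]
    mean_obs = sum(obs) / len(obs)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - ss_res / ss_tot


# Toy, made-up data: five sites spread over three 1 Mb windows.
positions = [100, 900_000, 1_100_000, 1_900_000, 2_500_000]
observed_values = [1.0, 3.0, 2.0, 4.0, 5.0]
observed_map = window_means(positions, observed_values)
```

A prediction identical to the observed map gives R² = 1, and a prediction that is flat at the mean of the observed windows gives R² = 0. In this framing, the abstract's figure of ~60% corresponds to R² ≈ 0.6 between predicted and observed diversity across 1 Mb windows, after diversity has been scaled to account for variation in mutation rate.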

Data availability

Shared data can be found at github.com/sellalab/HumanLinkedSelectionMaps. This repository includes fully documented code for: downloading and processing public datasets used, running inferences, analyzing results, and generating all figures from the manuscript. This repository also includes B-maps for all "best-fitting" models described in the manuscript. Customized CADD scores with bStatistic removed are available on Data Dryad at https://doi.org/10.5061/dryad.n8pk0p2x0.

Article and author information

Author details

  1. David A Murphy

    Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, United States
    For correspondence
    david-murphy@omrf.org
    Competing interests
    No competing interests declared.
    ORCID: 0000-0002-0715-3355
  2. Eyal Elyashiv

    Department of Biological Sciences, Columbia University, New York, United States
    Competing interests
    Eyal Elyashiv is affiliated with MyHeritage. The author has no financial interests to declare.
  3. Guy Amster

    Department of Biological Sciences, Columbia University, New York, United States
    Competing interests
    Guy Amster is affiliated with Flatiron Health Inc. The author has no financial interests to declare.
    ORCID: 0000-0002-9108-5200
  4. Guy Sella

    Department of Biological Sciences, Columbia University, New York, United States
    For correspondence
    gs2747@columbia.edu
    Competing interests
    No competing interests declared.
    ORCID: 0000-0002-5239-7930

Funding

NIH (GM115889)

  • Guy Sella

NIH (T32GM008798)

  • David A Murphy

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2022, Murphy et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

David A Murphy, Eyal Elyashiv, Guy Amster, Guy Sella (2022)
Broad-scale variation in human genetic diversity levels is predicted by purifying selection on coding and non-coding elements
eLife 11:e76065.
https://doi.org/10.7554/eLife.76065
