Broad-scale variation in human genetic diversity levels is predicted by purifying selection on coding and non-coding elements

Abstract

Analyses of genetic variation in many taxa have established that neutral genetic diversity is shaped by natural selection at linked sites. Whether the mode of selection is primarily the fixation of strongly beneficial alleles (selective sweeps) or purifying selection on deleterious mutations (background selection) remains unknown, however. We address this question in humans by fitting a model of the joint effects of selective sweeps and background selection to autosomal polymorphism data from the 1000 Genomes Project. After controlling for variation in mutation rates along the genome, a model of background selection alone explains ~60% of the variance in diversity levels at the megabase scale. Adding the effects of selective sweeps driven by adaptive substitutions to the model does not improve the fit, and when both modes of selection are considered jointly, selective sweeps are estimated to have had little or no effect on linked neutral diversity. The regions under purifying selection are best predicted by phylogenetic conservation, with ~80% of the deleterious mutations affecting neutral diversity occurring in non-exonic regions. Thus, background selection is the dominant mode of linked selection in humans, with marked effects on diversity levels throughout autosomes.

Data availability

Shared data can be found at github.com/sellalab/HumanLinkedSelectionMaps. This repository includes fully documented code for: downloading and processing public datasets used, running inferences, analyzing results, and generating all figures from the manuscript. This repository also includes B-maps for all "best-fitting" models described in the manuscript. Customized CADD scores with bStatistic removed are available on Data Dryad at https://doi.org/10.5061/dryad.n8pk0p2x0.

The following data sets were generated

Murphy DA, et al. (2022) Data from: Broad-scale variation in human genetic diversity levels is predicted by purifying selection on coding and non-coding elements
Dryad Digital Repository, doi:10.5061/dryad.n8pk0p2x0.
https://dx.doi.org/10.5061/dryad.n8pk0p2x0

The following previously published data sets were used

1000 Genomes Project Consortium (2015) Data from: A global reference for human genetic variation
doi:10.1038/nature15393.
https://www.internationalgenome.org/data/

Karolchik D (2004) Data from: The UCSC Table Browser data retrieval tool
doi:10.1093/nar/gkh103.
https://genome.ucsc.edu/

Hinch et al. (2011) Data from: The landscape of recombination in African Americans
doi:10.1038/nature10336.
https://www.well.ox.ac.uk/~anjali/AAmap/

Kircher et al. (2014) Data from: A general framework for estimating the relative pathogenicity of human genetic variants
doi:10.1038/ng.2892.
https://cadd.gs.washington.edu/download

Moore et al. (2020) Data from: Expanded encyclopaedias of DNA elements in the human and mouse genomes
doi:10.1038/s41586-020-2493-4.
https://screen.encodeproject.org/

Paten et al. (2008) Data from: Genome-wide nucleotide-level mammalian ancestor reconstruction
doi:10.1101/gr.076521.108.
https://ftp.ebi.ac.uk/pub/software/ensembl/jherrero/ancestral/

Barrett et al. (2013) Data from: NCBI GEO: archive for functional genomics data sets--update
GEO accession: GSM1127119.
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM1127119

Jonsson et al. (2017) Data from: Parental influence on human germline de novo mutations in 1,548 trios from Iceland
doi:10.1038/nature24018.
https://www.nature.com/articles/nature24018#Sec28

Steinrucken et al. (2018) Data from: Model-based detection and analysis of introgressed Neanderthal ancestry in modern humans
doi:10.1111/mec.14565.
http://dical-admix.sourceforge.net/

Article and author information

Author details

David A Murphy

Genes and Human Disease Research Program, Oklahoma Medical Research Foundation, Oklahoma City, United States

For correspondence
david-murphy@omrf.org

Competing interests
No competing interests declared.

ORCID iD: 0000-0002-0715-3355

Eyal Elyashiv

Department of Biological Sciences, Columbia University, New York, United States

Competing interests
Eyal Elyashiv is affiliated with MyHeritage. The author has no financial interests to declare.

Guy Amster

Department of Biological Sciences, Columbia University, New York, United States

Competing interests
Guy Amster is affiliated with Flatiron Health Inc. The author has no financial interests to declare.

ORCID iD: 0000-0002-9108-5200

Guy Sella

Department of Biological Sciences, Columbia University, New York, United States

For correspondence
gs2747@columbia.edu

Competing interests
No competing interests declared.

ORCID iD: 0000-0002-5239-7930

Funding

NIH (GM115889): Guy Sella

NIH (T32GM008798): David A Murphy

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.