Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences
Abstract
Disentangling the effect on genomic diversity of natural selection from that of demography is notoriously difficult, but necessary to properly reconstruct the history of species. Here, we use high-quality human genomic data to show that purifying selection at linked sites (i.e. background selection, BGS) and GC-biased gene conversion (gBGC) together affect as much as 95% of the variants of our genome. We find that the magnitude and relative importance of BGS and gBGC are largely determined by variation in recombination rate and base composition. Importantly, synonymous sites and non-transcribed regions are also affected, albeit to different degrees. Their use for demographic inference can lead to strong biases. However, by conditioning on genomic regions with recombination rates above 1.5 cM/Mb and mutation types (C↔G, A↔T), we identify a set of SNPs that is mostly unaffected by BGS or gBGC, and that avoids these biases in the reconstruction of human history.
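The SNP selection described at the end of the abstract can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the `Snp` record, its field names, and the assumption that a local recombination rate (in cM/Mb) has already been assigned to each site are all hypothetical. Only the 1.5 cM/Mb threshold and the two retained mutation types (C↔G, A↔T) come from the text. Both types leave GC content unchanged, which is presumably why they are expected to be largely insensitive to gBGC.

```python
from typing import Iterable, List, NamedTuple

# Threshold quoted in the abstract, in cM/Mb.
MIN_RECOMBINATION_RATE = 1.5

# Unordered allele pairs for the two retained mutation types (C<->G, A<->T).
RETAINED_PAIRS = {frozenset("CG"), frozenset("AT")}


class Snp(NamedTuple):
    """Hypothetical SNP record; field names are assumptions for this sketch."""
    chrom: str
    pos: int
    ref: str
    alt: str
    rec_rate: float  # local recombination rate in cM/Mb, assumed pre-computed


def is_retained(snp: Snp, min_rate: float = MIN_RECOMBINATION_RATE) -> bool:
    """Keep a SNP only if it lies in a region recombining above min_rate
    and its two alleles form a C<->G or A<->T pair."""
    alleles = frozenset((snp.ref.upper(), snp.alt.upper()))
    return snp.rec_rate > min_rate and alleles in RETAINED_PAIRS


def filter_snps(snps: Iterable[Snp],
                min_rate: float = MIN_RECOMBINATION_RATE) -> List[Snp]:
    """Return the subset of SNPs passing both conditions."""
    return [s for s in snps if is_retained(s, min_rate)]
```

For example, of a C/G site at 2.0 cM/Mb, an A/G site at 2.0 cM/Mb, and an A/T site at 1.0 cM/Mb, only the first would pass: the second changes GC content and the third lies in a low-recombination region.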
Data availability
All data generated and the scripts used to analyse them are provided in the Dryad repository: http://datadryad.org/review?doi=doi:10.5061/dryad.t76fk80
- Data from: Background selection and biased gene conversion affect more than 95% of the human genome and bias demographic inferences. Available at Dryad Digital Repository under a CC0 Public Domain Dedication.
Article and author information
Funding
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (310030B-166605)
- Laurent Excoffier
University of Berkeley (Visiting Miller Professorship)
- Laurent Excoffier
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2018, Pouyet et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 8,625 views
- 1,036 downloads
- 127 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.