Mutation saturation for fitness effects at human CpG sites

  1. Ipsita Agarwal (corresponding author)
  2. Molly Przeworski (corresponding author)

Affiliation: Columbia University, United States

Abstract

Whole-exome sequences have now been collected for millions of humans, with the related goals of identifying pathogenic mutations in patients and establishing reference repositories of data from unaffected individuals. As a result, we are approaching an important limit, in which datasets are large enough that, in the absence of natural selection, every highly mutable site will have experienced at least one mutation in the genealogical history of the sample. Here, we focus on CpG sites that are methylated in the germline and mutate to T at an elevated rate of ~10⁻⁷ per site per generation; considering synonymous mutations in a sample of 390,000 individuals, ~99% of such CpG sites harbor a C/T polymorphism. Methylated CpG sites therefore provide a natural mutation saturation experiment for fitness effects: as we show, at current sample sizes, the absence of a non-synonymous polymorphism indicates strong selection against that mutation. We use this idea to directly identify a subset of CpG transitions that are likely to be highly deleterious, including ~27% of possible loss-of-function mutations and up to 20% of possible missense mutations, depending on the type of functional site in which they occur. Unlike methylated CpGs, most mutation types, with rates on the order of 10⁻⁸ or 10⁻⁹, remain very far from saturation. We discuss what these findings imply for interpreting the potential clinical relevance of mutations from their presence or absence in reference databases and for inferences about the fitness effects of new mutations.
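The saturation argument above can be made concrete with a back-of-the-envelope Poisson model. The sketch below is an illustration under stated assumptions, not the authors' method: it assumes that mutations land on the sample genealogy as a Poisson process and treats the total branch length as a single fixed number shared by all sites (real genealogies vary from site to site). The function names and the calibrated tree length are hypothetical. The sketch also includes a simple two-class mixture, again an assumption rather than the paper's estimator, that converts an excess of sites without a polymorphism, relative to a near-neutral reference class such as synonymous CpG sites, into a fraction of sites under strong selection.

```python
import math


def p_at_least_one_mutation(mu: float, tree_length: float) -> float:
    """Probability that a neutral site carries >= 1 mutation in the sample genealogy.

    Assumption: mutations occur as a Poisson process with rate `mu` per site
    per generation along a genealogy whose total branch length is a fixed
    `tree_length` (in generations). Real genealogies vary across sites.
    """
    return 1.0 - math.exp(-mu * tree_length)


def tree_length_for_saturation(mu: float, saturation: float) -> float:
    """Fixed total branch length at which a fraction `saturation` of sites is hit."""
    return -math.log(1.0 - saturation) / mu


def strongly_selected_fraction(unseen_focal: float, unseen_neutral: float) -> float:
    """Hypothetical two-class mixture estimate.

    A fraction s of focal sites is assumed never to be seen polymorphic
    (strong selection); the remainder behave like the neutral reference
    class. The observed unseen fraction is m = s + (1 - s) * unseen_neutral,
    which is solved here for s.
    """
    return (unseen_focal - unseen_neutral) / (1.0 - unseen_neutral)


if __name__ == "__main__":
    # Hypothetical calibration: choose a tree length so that 99% of sites
    # mutating at 1e-7 per generation are hit, echoing the synonymous CpG figure.
    tree_len = tree_length_for_saturation(1e-7, 0.99)
    for mu in (1e-7, 1e-8, 1e-9):
        p = p_at_least_one_mutation(mu, tree_len)
        print(f"mu={mu:.0e}: P(>=1 mutation) = {p:.3f}")
```

Holding the calibrated tree length fixed and lowering the mutation rate by one or two orders of magnitude shows how steeply the expected fraction of hit sites falls. This is the intuition behind the statement that most mutation types remain far from saturation. The mixture estimate treats every strongly selected site as never polymorphic, so if such sites occasionally do appear polymorphic, it will underestimate their fraction.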

Data availability

All source data are freely available to researchers, with sources provided in the manuscript. Data and code to generate the figures are available at https://github.com/agarwal-i/cpg_saturation.

The following previously published data sets were used

Article and author information

Author details

  1. Ipsita Agarwal

    Department of Biological Sciences, Columbia University, New York, United States
    For correspondence
    ia2337@columbia.edu
    Competing interests
    No competing interests declared.
    ORCID: 0000-0001-8537-0008
  2. Molly Przeworski

    Department of Systems Biology, Columbia University, New York, United States
    For correspondence
    mp3284@columbia.edu
    Competing interests
    Molly Przeworski, Senior editor, eLife.
    ORCID: 0000-0002-5369-9009

Funding

National Institutes of Health (GM122975)

  • Molly Przeworski

National Institutes of Health (GM121372)

  • Molly Przeworski

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2021, Agarwal & Przeworski

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Agarwal I, Przeworski M (2021) Mutation saturation for fitness effects at human CpG sites. eLife 10:e71513. https://doi.org/10.7554/eLife.71513

