
Mutation saturation for fitness effects at human CpG sites

  1. Ipsita Agarwal (corresponding author)
  2. Molly Przeworski (corresponding author)
  1. Columbia University, United States
Research Article
Cite this article as: eLife 2021;10:e71513 doi: 10.7554/eLife.71513


Whole exome sequences have now been collected for millions of humans, with the related goals of identifying pathogenic mutations in patients and establishing reference repositories of data from unaffected individuals. As a result, we are approaching an important limit, in which datasets are large enough that, in the absence of natural selection, every highly mutable site will have experienced at least one mutation in the genealogical history of the sample. Here, we focus on CpG sites that are methylated in the germline and experience mutations to T at an elevated rate of ~10^-7 per site per generation; considering synonymous mutations in a sample of 390,000 individuals, ~99% of such CpG sites harbor a C/T polymorphism. Methylated CpG sites thus provide a natural mutation saturation experiment for fitness effects: as we show, at current sample sizes, not seeing a non-synonymous polymorphism is indicative of strong selection against that mutation. We use this idea to directly identify a subset of CpG transitions that are likely to be highly deleterious, including ~27% of possible loss-of-function mutations and up to 20% of possible missense mutations, depending on the type of functional site in which they occur. Unlike methylated CpGs, most mutation types, with rates on the order of 10^-8 or 10^-9 per site per generation, remain very far from saturation. We discuss what these findings imply for interpreting the potential clinical relevance of mutations from their presence or absence in reference databases and for inferences about the fitness effects of new mutations.
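The saturation argument can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' method. It assumes that mutations at a neutral site arise as a Poisson process along the sample genealogy, so that the probability of at least one mutation is 1 - exp(-mu * T), where T is the total branch length in generations. T is not taken from the article: it is backed out from the reported ~99% of synonymous methylated CpG sites being polymorphic at a rate of ~10^-7, and then reused for the lower rates. The function names are hypothetical, and the model ignores rate variation among sites and any selection.

```python
import math


def saturation_fraction(mu, tree_length):
    """Probability that a neutral site carries at least one mutation.

    Assumes a Poisson number of mutations with mean mu * tree_length,
    where tree_length is the total branch length of the sample
    genealogy in generations (an illustrative simplification).
    """
    return 1.0 - math.exp(-mu * tree_length)


def implied_tree_length(mu, fraction_polymorphic):
    """Total branch length implied by an observed polymorphic fraction."""
    return -math.log(1.0 - fraction_polymorphic) / mu


# Values quoted in the abstract: rate ~1e-7, ~99% of sites polymorphic.
MU_CPG = 1e-7
OBSERVED_FRACTION = 0.99

# Assumption: the same genealogy length applies to every mutation type.
tree_length = implied_tree_length(MU_CPG, OBSERVED_FRACTION)

for mu in (1e-7, 1e-8, 1e-9):
    frac = saturation_fraction(mu, tree_length)
    print(f"mu = {mu:.0e}: expected fraction of sites mutated = {frac:.3f}")
```

Under these assumptions, a rate of 10^-8 gives roughly 37% of sites mutated and a rate of 10^-9 roughly 5%, which is consistent with the statement that most mutation types remain far from saturation at this sample size.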

Data availability

All source data are freely available to researchers, with sources provided in the manuscript. Data and code to generate the figures are available at https://github.com/agarwal-i/cpg_saturation.

The following previously published data sets were used

Article and author information

Author details

  1. Ipsita Agarwal

    Department of Biological Sciences, Columbia University, New York, United States
    Corresponding author
    Competing interests: No competing interests declared.
    ORCID: 0000-0001-8537-0008
  2. Molly Przeworski

    Department of Systems Biology, Columbia University, New York, United States
    Corresponding author
    Competing interests: Molly Przeworski, Senior editor, eLife.
    ORCID: 0000-0002-5369-9009


Funding

National Institutes of Health (GM122975)

  • Molly Przeworski

National Institutes of Health (GM121372)

  • Molly Przeworski

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Jeffrey Ross-Ibarra, University of California, Davis, United States

Publication history

  1. Received: June 22, 2021
  2. Accepted: November 21, 2021
  3. Accepted Manuscript published: November 22, 2021 (version 1)


© 2021, Agarwal & Przeworski

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.



