Mutation saturation for fitness effects at human CpG sites

  1. Ipsita Agarwal (corresponding author)
  2. Molly Przeworski (corresponding author)
  1. Columbia University, United States

Abstract

Whole exome sequences have now been collected for millions of humans, with the related goals of identifying pathogenic mutations in patients and establishing reference repositories of data from unaffected individuals. As a result, we are approaching an important limit, in which datasets are large enough that, in the absence of natural selection, every highly mutable site will have experienced at least one mutation in the genealogical history of the sample. Here, we focus on CpG sites that are methylated in the germline and experience mutations to T at an elevated rate of ~10⁻⁷ per site per generation; considering synonymous mutations in a sample of 390,000 individuals, ~99% of such CpG sites harbor a C/T polymorphism. Methylated CpG sites provide a natural mutation saturation experiment for fitness effects: as we show, at current sample sizes, not seeing a non-synonymous polymorphism is indicative of strong selection against that mutation. We rely on this idea in order to directly identify a subset of CpG transitions that are likely to be highly deleterious, including ~27% of possible loss-of-function mutations, and up to 20% of possible missense mutations, depending on the type of functional site in which they occur. Unlike methylated CpGs, most mutation types, with rates on the order of 10⁻⁸ or 10⁻⁹, remain very far from saturation. We discuss what these findings imply for interpreting the potential clinical relevance of mutations from their presence or absence in reference databases and for inferences about the fitness effects of new mutations.
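
The saturation argument can be illustrated with a back-of-the-envelope sketch. It is not the authors' analysis. It assumes that mutations at a neutral site arise as a Poisson process along the genealogy of the sample, and that all sites share a single total branch length, so the probability of at least one mutation is 1 − exp(−μL). The total branch length L is not given in the abstract; here it is backed out from the reported ~99% of synonymous methylated CpG sites being polymorphic at μ ≈ 10⁻⁷, and the function names are hypothetical.

```python
import math


def prob_at_least_one_mutation(mu, tree_length):
    """Poisson probability of >= 1 mutation at a neutral site.

    mu          -- mutation rate per site per generation
    tree_length -- total branch length of the sample genealogy, in generations
                   (assumed identical across sites, which is a simplification)
    """
    return -math.expm1(-mu * tree_length)


def implied_tree_length(mu, fraction_polymorphic):
    """Total branch length that would yield the given polymorphic fraction."""
    return -math.log1p(-fraction_polymorphic) / mu


# Assumption: ~99% of neutral sites with mu ~ 1e-7 are polymorphic (from the abstract).
tree_length = implied_tree_length(1e-7, 0.99)

for mu in (1e-7, 1e-8, 1e-9):
    p = prob_at_least_one_mutation(mu, tree_length)
    print(f"mu = {mu:.0e}: expected fraction of neutral sites hit = {p:.3f}")
```

Under these assumptions, the same genealogy that leaves about 99% of neutral sites with μ ≈ 10⁻⁷ carrying a mutation would be expected to do so for only roughly 37% of sites at μ ≈ 10⁻⁸ and about 4.5% at μ ≈ 10⁻⁹, which is consistent with the statement that most mutation types remain far from saturation.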

Data availability

All source data are freely available to researchers, with sources provided in the manuscript. Data and code to generate the figures are available at https://github.com/agarwal-i/cpg_saturation.

Article and author information

Author details

  1. Ipsita Agarwal

    Department of Biological Sciences, Columbia University, New York, United States
    For correspondence: ia2337@columbia.edu
    Competing interests: No competing interests declared.
    ORCID: 0000-0001-8537-0008
  2. Molly Przeworski

    Department of Systems Biology, Columbia University, New York, United States
    For correspondence: mp3284@columbia.edu
    Competing interests: Molly Przeworski, Senior editor, eLife.
    ORCID: 0000-0002-5369-9009

Funding

National Institutes of Health (GM122975)

  • Molly Przeworski

National Institutes of Health (GM121372)

  • Molly Przeworski

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

© 2021, Agarwal & Przeworski

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Ipsita Agarwal, Molly Przeworski (2021) Mutation saturation for fitness effects at human CpG sites. eLife 10:e71513. https://doi.org/10.7554/eLife.71513
