Mutation saturation for fitness effects at human CpG sites
Abstract
Whole exome sequences have now been collected for millions of humans, with the related goals of identifying pathogenic mutations in patients and establishing reference repositories of data from unaffected individuals. As a result, we are approaching an important limit, in which datasets are large enough that, in the absence of natural selection, every highly mutable site will have experienced at least one mutation in the genealogical history of the sample. Here, we focus on CpG sites that are methylated in the germline and experience mutations to T at an elevated rate of ~10⁻⁷ per site per generation; considering synonymous mutations in a sample of 390,000 individuals, ~99% of such CpG sites harbor a C/T polymorphism. Methylated CpG sites provide a natural mutation saturation experiment for fitness effects: as we show, at current sample sizes, not seeing a non-synonymous polymorphism is indicative of strong selection against that mutation. We rely on this idea in order to directly identify a subset of CpG transitions that are likely to be highly deleterious, including ~27% of possible loss-of-function mutations, and up to 20% of possible missense mutations, depending on the type of functional site in which they occur. Unlike methylated CpGs, most mutation types, with rates on the order of 10⁻⁸ or 10⁻⁹, remain very far from saturation. We discuss what these findings imply for interpreting the potential clinical relevance of mutations from their presence or absence in reference databases and for inferences about the fitness effects of new mutations.
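The saturation argument can be illustrated with a back-of-the-envelope calculation. The sketch below is not the authors' analysis. It assumes that neutral mutations fall on the sample genealogy as a Poisson process, that every site shares the same genealogy length, and that the mutation rate is exactly the round figure quoted above. The function names and the `TREE_LENGTH` constant are hypothetical, introduced only for illustration. Under those assumptions, the ~99% of synonymous methylated CpG sites that are polymorphic implies a total genealogy length, which in turn gives the expected fraction of segregating sites for less mutable mutation types.

```python
import math


def implied_tree_length(fraction_polymorphic, mu):
    """Total genealogy length (in generations) implied by the fraction of
    neutral sites that are polymorphic, assuming Poisson mutation at rate mu
    per site per generation (a simplifying assumption, not the paper's model)."""
    return -math.log(1.0 - fraction_polymorphic) / mu


def expected_fraction_polymorphic(mu, tree_length):
    """Probability that a neutral site with mutation rate mu has experienced
    at least one mutation on a genealogy of the given total length."""
    return 1.0 - math.exp(-mu * tree_length)


# Calibrate on the figures quoted in the abstract: ~99% of synonymous
# methylated CpG sites are polymorphic at a rate of roughly 1e-7.
TREE_LENGTH = implied_tree_length(0.99, 1e-7)

if __name__ == "__main__":
    for mu in (1e-7, 1e-8, 1e-9):
        frac = expected_fraction_polymorphic(mu, TREE_LENGTH)
        print(f"mu = {mu:.0e}: expected fraction polymorphic = {frac:.3f}")
```

Under these simplifying assumptions, a tenfold lower mutation rate gives roughly 37% of neutral sites segregating, and a hundredfold lower rate roughly 4.5%, which is consistent with the statement that mutation types with rates near 10⁻⁸ or 10⁻⁹ remain far from saturation. Variation in mutation rate across sites and the details of human demography would change the exact numbers.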
Data availability
All source data are freely available to researchers, with sources provided in the manuscript. Data and code to generate the figures are available at https://github.com/agarwal-i/cpg_saturation.
Article and author information
Author details
Funding
National Institutes of Health (GM122975)
- Molly Przeworski
National Institutes of Health (GM121372)
- Molly Przeworski
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2021, Agarwal & Przeworski
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 2,382 views
- 264 downloads
- 26 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.