A community-maintained standard library of population genetic models
Abstract
The explosion in population genomic data demands ever more complex modes of analysis, and increasingly these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open-source project that provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.
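The design idea in the abstract, a shared catalog of models that can be run through interchangeable simulation backends, can be illustrated with a small standard-library sketch. This is not stdpopsim's API: the names `ToyModel`, `CATALOG`, `register`, `simulate`, the two toy engines, and the IDs `ToySp` and `Constant_1pop` are all hypothetical, and the only "model" is a single constant-size diploid population. Each engine estimates the mean coalescence time of two lineages, which is 2N generations for N diploid individuals under standard neutral assumptions.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ToyModel:
    """One catalog entry: a constant-size population (hypothetical schema)."""
    species_id: str
    model_id: str
    diploid_size: int  # N, the number of diploid individuals


# Catalog keyed by (species_id, model_id); the IDs used below are made up.
CATALOG: Dict[Tuple[str, str], ToyModel] = {}


def register(model: ToyModel) -> None:
    """Add a model to the catalog, refusing duplicate IDs."""
    key = (model.species_id, model.model_id)
    if key in CATALOG:
        raise ValueError(f"duplicate catalog entry: {key}")
    CATALOG[key] = model


def continuous_engine(model: ToyModel, rng: random.Random) -> float:
    """Draw a pairwise coalescence time from Exponential(rate = 1 / (2N))."""
    return rng.expovariate(1.0 / (2 * model.diploid_size))


def discrete_engine(model: ToyModel, rng: random.Random) -> float:
    """Same quantity, stepping one generation at a time (Wright-Fisher style)."""
    p = 1.0 / (2 * model.diploid_size)  # per-generation coalescence probability
    generations = 1
    while rng.random() >= p:
        generations += 1
    return float(generations)


# Interchangeable backends: every engine accepts the same model description.
ENGINES: Dict[str, Callable[[ToyModel, random.Random], float]] = {
    "continuous": continuous_engine,
    "discrete": discrete_engine,
}


def simulate(species_id: str, model_id: str, engine: str,
             replicates: int, seed: int) -> float:
    """Look up a catalog model and return the mean time over replicates."""
    model = CATALOG[(species_id, model_id)]
    rng = random.Random(seed)
    run = ENGINES[engine]
    return sum(run(model, rng) for _ in range(replicates)) / replicates


register(ToyModel("ToySp", "Constant_1pop", diploid_size=100))
for name in ENGINES:
    mean_t = simulate("ToySp", "Constant_1pop", name, replicates=2000, seed=1)
    print(f"{name}: mean pairwise coalescence time = {mean_t:.1f} generations")
```

Because the model is defined once and only the engine name changes, both backends should report a mean close to 2N = 200 generations. That shared definition is what makes the kind of like-for-like comparison the abstract describes possible.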
Data availability
All resources are available from https://github.com/popsim-consortium/stdpopsim
Article and author information
Author details
Funding
National Institute of General Medical Sciences (R35GM119856)
- Christopher C Kyriazis
- Kirk E Lohmueller
National Institute of General Medical Sciences (R01GM117241)
- Jeffrey R Adrion
- Andrew D Kern
National Institute of General Medical Sciences (R01GM127348)
- Travis J Struck
- Ryan N Gutenkunst
National Institute of General Medical Sciences (R00HG008696)
- Ariella L Gladstein
- Daniel R Schrider
National Institute of General Medical Sciences (R35GM127070)
- Noah Dukler
- Adam Siepel
National Human Genome Research Institute (R01HG010346)
- Noah Dukler
- Adam Siepel
Villum Fonden (00025300)
- Graham Gower
- Fernando Racimo
UC MEXUS-CONACYT
- Diego Ortega Del Vecchyo
Robertson Foundation
- Jerome Kelleher
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Graham Coop, University of California, Davis, United States
Publication history
- Received: January 7, 2020
- Accepted: June 15, 2020
- Accepted Manuscript published: June 23, 2020 (version 1)
- Accepted Manuscript updated: June 25, 2020 (version 2)
- Version of Record published: August 19, 2020 (version 3)
Copyright
© 2020, Adrion et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.