A community-maintained standard library of population genetic models

  1. Jeffrey R Adrion
  2. Christopher B Cole
  3. Noah Dukler
  4. Jared G Galloway
  5. Ariella L Gladstein
  6. Graham Gower
  7. Christopher C Kyriazis
  8. Aaron P Ragsdale
  9. Georgia Tsambos
  10. Franz Baumdicker
  11. Jedidiah Carlson
  12. Reed A Cartwright
  13. Arun Durvasula
  14. Ilan Gronau
  15. Bernard Y Kim
  16. Patrick McKenzie
  17. Philipp W Messer
  18. Ekaterina Noskova
  19. Diego Ortega-Del Vecchyo
  20. Fernando Racimo
  21. Travis J Struck
  22. Simon Gravel
  23. Ryan N Gutenkunst
  24. Kirk E Lohmueller
  25. Peter L Ralph
  26. Daniel R Schrider
  27. Adam Siepel
  28. Jerome Kelleher (corresponding author)
  29. Andrew D Kern (corresponding author)
  1. Department of Biology and Institute of Ecology and Evolution, University of Oregon, United States
  2. Weatherall Institute of Molecular Medicine, University of Oxford, United Kingdom
  3. Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, United States
  4. Department of Genetics, University of North Carolina at Chapel Hill, United States
  5. Lundbeck GeoGenetics Centre, Globe Institute, University of Copenhagen, Denmark
  6. Department of Ecology and Evolutionary Biology, University of California, Los Angeles, United States
  7. Department of Human Genetics, McGill University, Canada
  8. Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Australia
  9. Department of Mathematical Stochastics, University of Freiburg, Germany
  10. Department of Genome Sciences, University of Washington, United States
  11. The Biodesign Institute and The School of Life Sciences, Arizona State University, United States
  12. Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, United States
  13. The Efi Arazi School of Computer Science, Herzliya Interdisciplinary Center, Israel
  14. Department of Biology, Stanford University, United States
  15. Department of Ecology, Evolution, and Environmental Biology, Columbia University, United States
  16. Department of Computational Biology, Cornell University, United States
  17. Computer Technologies Laboratory, ITMO University, Russian Federation
  18. International Laboratory for Human Genome Research, National Autonomous University of Mexico, Mexico
  19. Department of Molecular and Cellular Biology, University of Arizona, United States
  20. Department of Mathematics, University of Oregon, United States
  21. Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, United Kingdom
15 figures, 1 table and 1 additional file

Figures

Figure 1
Structure of stdpopsim.

(A) The stdpopsim catalog is organized hierarchically, with all simulation information grouped under individual species (expanded here for H. sapiens only). Each species is associated with a representation of its physical genome and with one or more genetic maps and demographic models. Dotted lines indicate that only a subset of these categories is shown. At right, we show example code to specify and simulate models using (B) the Python API or (C) the command line interface.
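The hierarchy in panel A can be mirrored with a few plain Python data classes. This is a toy sketch for orientation only: the class names, the `species` lookup method, and the placeholder chromosome length are hypothetical and are not stdpopsim's own API, which panels B and C illustrate.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneticMap:
    id: str        # e.g. "HapMapII_GRCh37"
    citation: str


@dataclass
class DemographicModel:
    id: str        # e.g. "OutOfAfrica_3G09"
    citation: str


@dataclass
class Genome:
    assembly: str                        # reference assembly name
    chromosome_lengths: dict[str, int]   # chromosome id -> length in bp


@dataclass
class Species:
    id: str                              # short code, e.g. "HomSap"
    name: str                            # e.g. "Homo sapiens"
    genome: Genome
    genetic_maps: dict[str, GeneticMap] = field(default_factory=dict)
    demographic_models: dict[str, DemographicModel] = field(default_factory=dict)


class Catalog:
    """Top of the (hypothetical) hierarchy: species looked up by short id."""

    def __init__(self) -> None:
        self._species: dict[str, Species] = {}

    def add_species(self, species: Species) -> None:
        self._species[species.id] = species

    def species(self, species_id: str) -> Species:
        return self._species[species_id]


catalog = Catalog()
homsap = Species(
    id="HomSap",
    name="Homo sapiens",
    # Placeholder length for illustration, not the real chr22 length.
    genome=Genome(assembly="GRCh37", chromosome_lengths={"chr22": 1_000_000}),
)
homsap.genetic_maps["HapMapII_GRCh37"] = GeneticMap("HapMapII_GRCh37", "Frazer et al., 2007")
homsap.demographic_models["OutOfAfrica_3G09"] = DemographicModel(
    "OutOfAfrica_3G09", "Gutenkunst et al., 2009"
)
catalog.add_species(homsap)

model = catalog.species("HomSap").demographic_models["OutOfAfrica_3G09"]
print(model.citation)  # Gutenkunst et al., 2009
```

Keeping genetic maps and demographic models as per-species dictionaries reflects the figure's point that every model and map belongs to exactly one species.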

Figure 2
Comparing estimates of N(t) in humans.

Here we show estimates of population size over time (N(t)) inferred using four different methods: smc++, stairway plot, and MSMC with n=2 and n=8 samples. Data were generated by simulating replicate human genomes under the OutOfAfricaArchaicAdmixture_5R19 model (Ragsdale and Gravel, 2019) and using the HapMapII_GRCh37 genetic map (Frazer et al., 2007). From top to bottom, we show estimates for each of the three populations in the model (YRI, CEU, and CHB). In shades of blue we show the estimated N(t) trajectories for each of three replicates. As a proxy for the ‘truth’, in black we show inverse coalescence rates as calculated from the demographic model used for simulation (see text).

Figure 3
Comparing estimates of N(t) in Drosophila.

Population size over time (N(t)) estimated from an African population sample. Data were generated by simulating replicate D. melanogaster genomes under the African3Epoch_1S16 model (Sheehan and Song, 2016) with the genetic map of Comeron et al., 2012. In shades of blue we show the estimated N(t) trajectories for each replicate. As a proxy for the ‘truth’, in black we show inverse coalescence rates as calculated from the demographic model used for simulation (see text).

Figure 4
Parameters estimated using a multi-population human model.

Here we show estimates of N(t) inferred using ∂a∂i, fastsimcoal2, and smc++. (A) Data were generated by simulating replicate human genomes under the OutOfAfrica_3G09 model using the HapMapII_GRCh37 genetic map inferred in Frazer et al., 2007. (B) For ∂a∂i and fastsimcoal2, we show parameters inferred by fitting the depicted IM model, which includes population sizes, migration rates, and a split time between the CEU and YRI samples. (C) Population size estimates for each population (rows) from ∂a∂i, fastsimcoal2, and smc++ (columns). In shades of blue, we show N(t) trajectories estimated from each simulation replicate, and in black, the simulated population sizes for the respective population. The population split time, TDIV, is shown at the bottom (simulated value in black, inferred values in blue) and shares its x-axis with the population size panels.

Appendix 1—figure 1
Validating the SLiM engine backend under a genetic map.

Here, we validate our integration of the SLiM (Haller et al., 2019; Haller and Messer, 2019) engine backend. We show quantile-quantile plots comparing the SLiM and msprime engines for three population genetic summary statistics: r², Tajima’s π, and Tajima’s D. We also show the runtime for each simulation replicate. Data were generated by simulating 100 replicates of human chromosome 22 under the AncientEurasia_9K19 model (Kamm et al., 2019) using the HapMapII_GRCh37 genetic map (Frazer et al., 2007). Twelve samples were drawn from each population (excluding basal Eurasians). From top to bottom, we show results using three scaling factors for the population sizes: Q = 1, Q = 10, and Q = 50. Kolmogorov-Smirnov two-sample test statistics (D) and p-values are shown, testing the null hypothesis that the SLiM and msprime values were drawn from the same continuous distribution.
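The scaling factor Q makes forward-time SLiM runs cheaper by shrinking the population. A minimal sketch of the conventional rescaling follows, assuming that sizes and times are divided by Q while per-generation rates are increased so that compound parameters such as θ = 4Nμ stay roughly constant. The `Epoch` class, the `rescale` and `theta` helpers, and the example history are hypothetical, and stdpopsim's SLiM engine may differ in details such as rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Epoch:
    start_time: float  # generations before present
    size: int          # diploid population size from start_time backward


def rescale(epochs, mutation_rate, recombination_rate, Q):
    """Rescale a piecewise-constant history and per-generation rates by Q.

    Sizes and times are divided by Q (sizes rounded to whole individuals),
    the mutation rate is multiplied by Q, and the recombination rate becomes
    the probability of an odd number of crossovers in Q generations, which
    is close to Q * r when r is small.  Q = 1 leaves the model unchanged
    (up to floating-point rounding).
    """
    if Q < 1:
        raise ValueError("scaling factor Q must be >= 1")
    scaled = [
        Epoch(start_time=e.start_time / Q, size=max(1, round(e.size / Q)))
        for e in epochs
    ]
    scaled_mu = mutation_rate * Q
    scaled_r = (1 - (1 - 2 * recombination_rate) ** Q) / 2
    return scaled, scaled_mu, scaled_r


def theta(size, mutation_rate):
    """Population-scaled mutation rate 4 * N * mu for a diploid population."""
    return 4 * size * mutation_rate


# Hypothetical two-epoch history and rates, chosen only to show the arithmetic.
history = [Epoch(start_time=0, size=10_000), Epoch(start_time=2_000, size=20_000)]
for Q in (1, 10, 50):
    scaled, mu_q, r_q = rescale(history, 1e-8, 1e-8, Q)
    # theta of the most recent epoch stays at about 4e-4 for every Q.
    print(Q, [e.size for e in scaled], [e.start_time for e in scaled], theta(scaled[0].size, mu_q))
```

Because only the compound parameters are preserved, larger Q values are expected to drift further from the unscaled model, which is what comparing Q = 1, 10, and 50 against msprime is designed to detect.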

Appendix 1—figure 2
Validating the SLiM engine backend under uniform recombination.

Here, we validate our integration of the SLiM (Haller et al., 2019; Haller and Messer, 2019) engine backend. We show quantile-quantile plots comparing the SLiM and msprime engines for three population genetic summary statistics: r², Tajima’s π, and Tajima’s D. We also show the runtime for each simulation replicate. Data were generated by simulating 100 replicates of human chromosome 22 under the AncientEurasia_9K19 model (Kamm et al., 2019) using a uniform recombination rate across the chromosome. Twelve samples were drawn from each population (excluding basal Eurasians). From top to bottom, we show results using three scaling factors for the population sizes: Q = 1, Q = 10, and Q = 50. Kolmogorov-Smirnov two-sample test statistics (D) and p-values are shown, testing the null hypothesis that the SLiM and msprime values were drawn from the same continuous distribution.
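The Kolmogorov-Smirnov statistic D reported in these validation figures (not to be confused with Tajima’s D) is the largest vertical gap between the empirical cumulative distribution functions of the two samples. A from-scratch sketch of that definition follows; the function name and the toy inputs are illustrative and not part of stdpopsim.

```python
from bisect import bisect_right


def ks_two_sample_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max_t |F_x(t) - F_y(t)|.

    F_x and F_y are the empirical CDFs of the two samples.  Both are step
    functions that only change at observed values, so the maximum is
    attained at one of the pooled observations and only those are checked.
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for t in xs + ys:
        f_x = bisect_right(xs, t) / len(xs)  # fraction of x values <= t
        f_y = bisect_right(ys, t) / len(ys)  # fraction of y values <= t
        d = max(d, abs(f_x - f_y))
    return d


# Toy inputs standing in for per-replicate statistics from two engines.
print(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

For the p-values, a library routine such as SciPy's `scipy.stats.ks_2samp`, which returns both the statistic and a p-value, is the practical choice; the sketch only shows how D itself is defined.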

Appendix 1—figure 3
Comparing simulated population sizes and inverse coalescence rates in humans.

Data are shown for human genomes simulated under the OutOfAfricaArchaicAdmixture_5R19 model (Ragsdale and Gravel, 2019) using the HapMapII_GRCh37 genetic map (Frazer et al., 2007). From left to right, we show sizes for each of the three populations in the model: YRI, CEU, and CHB. Simulated population sizes are plotted in black, and inverse coalescence rates calculated from the demographic model used for simulation (see text) are plotted in red. In this model the two measures are nearly identical, but in models with higher migration rates we expect a larger departure between them.
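A small numerical sketch helps show why migration pulls the two curves apart. It assumes a hypothetical model with two demes of constant size that exchange migrants symmetrically (not one of the catalog models), discrete generations, and a proxy defined as 1/(2h(t)), where h(t) is the per-generation coalescence hazard of a pair of lineages sampled in deme A; with that scaling the proxy equals N for a single panmictic diploid population. The function name and parameters are illustrative only, and the curves in the figures were computed from the full simulation models, not with this code.

```python
def inverse_coalescence_rate(n_a, n_b, m, generations):
    """Return 1 / (2 * h(t)) for generations t = 0, 1, ..., generations - 1.

    h(t) is the probability that two lineages sampled in deme A coalesce in
    generation t, given that they have not coalesced earlier.  Demes A and B
    have constant diploid sizes n_a and n_b, and each lineage independently
    switches deme with probability m per generation (backward in time).
    """
    c_a, c_b = 1 / (2 * n_a), 1 / (2 * n_b)  # pairwise coalescence probability per generation
    # Probabilities of the uncoalesced states: both in A, one in each, both in B.
    p_aa, p_ab, p_bb = 1.0, 0.0, 0.0
    stay, both_move, one_moves = (1 - m) ** 2, m ** 2, m * (1 - m)
    result = []
    for _ in range(generations):
        alive = p_aa + p_ab + p_bb
        coalescing = p_aa * c_a + p_bb * c_b
        result.append(alive / (2 * coalescing))
        # Remove the pairs that coalesce in this generation.
        p_aa *= 1 - c_a
        p_bb *= 1 - c_b
        # Then let each surviving lineage migrate independently.
        p_aa, p_ab, p_bb = (
            p_aa * stay + p_ab * one_moves + p_bb * both_move,
            2 * one_moves * (p_aa + p_bb) + p_ab * (stay + both_move),
            p_aa * both_move + p_ab * one_moves + p_bb * stay,
        )
    return result


panmictic = inverse_coalescence_rate(10_000, 10_000, 0.0, 5_000)
structured = inverse_coalescence_rate(10_000, 10_000, 1e-3, 5_000)
# With m = 0 every value equals n_a; with migration the proxy starts at n_a
# and then climbs well above it.
print(round(panmictic[-1]), round(structured[0]), round(structured[-1]))
```

With no migration the two lineages always share a deme, so the proxy tracks the deme size exactly. With migration, time spent in different demes (where they cannot coalesce) lowers the hazard, so the proxy rises above the size of deme A even though that size never changes.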

Appendix 1—figure 4
Comparing estimates of N(t) in humans.

Estimates of population size over time (N(t)) inferred using four different methods: smc++, stairway plot, and MSMC with n=2 and n=8 samples. Data were generated by simulating replicate human genomes under the OutOfAfrica_3G09 model (Gutenkunst et al., 2009) using the HapMapII_GRCh37 genetic map (Frazer et al., 2007). From top to bottom, we show estimates for each of the three populations in the model: YRI, CEU, and CHB. In shades of blue, we show the estimated N(t) trajectories for each replicate. As a proxy for the ‘truth’, in black we show inverse coalescence rates as calculated from the demographic model used for simulation (see text).

Appendix 1—figure 5
Comparing estimates of N(t) in humans.

Here, we show estimates of population size over time (N(t)) inferred using four different methods: smc++, stairway plot, and MSMC with n=2 and n=8 samples. Data were generated by simulating replicate human genomes under a constant-size population model with N = 10⁴ using the HapMapII_GRCh37 genetic map (Frazer et al., 2007). As a proxy for the ‘truth’, in black we show inverse coalescence rates as calculated from the demographic model used for simulation (see text).

Appendix 1—figure 6
Comparing estimates of N(t) in A. thaliana.

Here, we show estimates of population size over time (N(t)) inferred using four different methods: smc++, stairway plot, and MSMC with n=2 and n=8 samples. Data were generated by simulating replicate A. thaliana genomes under the African2Epoch_1H18 model (Durvasula et al., 2017) using the SalomeAveraged_TAIR7 genetic map (Salomé et al., 2012). As a proxy for the ‘truth’, in black we show inverse coalescence rates as calculated from the demographic model used for simulation (see text).

Appendix 1—figure 7
Migration rate estimates for the human Gutenkunst model.

Here, we show inferred migration rates from ∂a∂i and fastsimcoal2. Data were generated by simulating replicate human genomes under the Gutenkunst et al., 2009 model using the genetic map inferred in Frazer et al., 2007. Directional migration from Europe to Africa is represented as MIG_AF_EU and migration from Africa to Europe is represented as MIG_EU_AF. Note that the x-axis coordinates are arbitrary.

Appendix 1—figure 8
Parameters estimated using a two-population Drosophila model.

Here, we show estimates of N(t) inferred using ∂a∂i, fastsimcoal2, and smc++. Data were generated by simulating replicate Drosophila genomes under the Li and Stephan, 2006 model using the genetic map inferred in Comeron et al., 2012. See the legend of Figure 4 for details. In shades of blue, we show the estimated N(t) trajectories for each replicate. In black, we show the simulated population sizes.

Appendix 1—figure 9
Migration rate parameters estimated under a two-population Drosophila model.

Here, we show inferred migration rates from ∂a∂i and fastsimcoal2. Data were generated by simulating replicate Drosophila genomes under the Li and Stephan, 2006 model using the genetic map inferred in Comeron et al., 2012. Directional migration from Europe to Africa is represented as MIG_AF_EU and migration from Africa to Europe is represented as MIG_EU_AF. Note that the x-axis coordinates are arbitrary.

Appendix 1—figure 10
Workflow for our N(t) inference methods comparison.

Here, we show a single replicate for two chromosomes, chr22 and chrX, simulated under the HomSap OutOfAfrica_3G09 demographic model with the HapMapII_GRCh37 genetic map. Note that the input data for all inference methods (smc++, MSMC, and stairway plot) come from the same set of simulations.

Appendix 1—figure 11
Parameters estimated from a generic IM model.

Here, we show estimates of N(t) inferred using ∂a∂i, fastsimcoal2, and smc++. Data were generated by simulating under a generic IM model using the human genome and the Frazer et al., 2007 genetic map. In shades of blue, we show the estimated N(t) trajectories for each replicate. In black, we show the simulated population sizes.

Tables

Table 1
Initial set of demographic models in the catalog and summary of computing resources needed for simulation.

For each model, we report the CPU time, maximum memory usage and the size of the output tskit file, as simulated using the msprime simulation engine (version 0.7.4). In each case, we simulate 100 samples drawn from the first population, for the shortest chromosome of that species and a constant chromosome-specific recombination rate. The times reported are for a single run on an Intel i5-7600K CPU. Computing resources required will vary widely depending on sample sizes, chromosome length, recombination rates and other factors.

Model ID                            Citation                    CPU (s)  RAM (MB) File (MB)
HomSap (Homo sapiens)
  Africa_1T12                       Tennessen et al., 2012         10.0     194.2      23.3
  Zigzag_1S14                       Schiffels and Durbin, 2014      3.3     106.1       7.9
  AshkSub_7G19                      Gladstein and Hammer, 2019     13.8     216.3      26.4
  OutOfAfrica_3G09                  Gutenkunst et al., 2009        10.2     182.0      21.1
  OutOfAfrica_2T12                  Tennessen et al., 2012         10.7     198.4      24.1
  AncientEurasia_9K19               Kamm et al., 2019              63.1     304.4      41.2
  AmericanAdmixture_4B11            Browning et al., 2018          10.6     188.1      22.3
  PapuansOutOfAfrica_10J19          Jacobs et al., 2019           204.5     524.7      77.8
  OutOfAfricaArchaicAdmixture_5R19  Ragsdale and Gravel, 2019       8.8     185.4      21.7
DroMel (Drosophila melanogaster)
  OutOfAfrica_2L06                  Li and Stephan, 2006          252.8     678.0     106.7
  African3Epoch_1S16                Sheehan and Song, 2016          3.0     123.9      11.5
AraTha (Arabidopsis thaliana)
  African2Epoch_1H18                Huber et al., 2018              4.3     220.5      16.5
  African3Epoch_1H18                Huber et al., 2018              2.6     241.3      18.4
PonAbe (Pongo abelii)
  TwoSpecies_2L11                   Locke et al., 2011              7.2     171.9      14.7
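The Table 1 runs, like the uniform-recombination SLiM comparison, use a single constant recombination rate per chromosome instead of a genetic map. One natural way to collapse a piecewise-constant map into such a rate is the length-weighted mean, which keeps the expected number of crossovers per generation over the chromosome unchanged. This is a sketch under that assumption only: the text does not say how stdpopsim's chromosome-specific rates were derived, and the breakpoint/rate representation, function name, and example map are hypothetical.

```python
def mean_recombination_rate(breakpoints, rates):
    """Length-weighted mean rate of a piecewise-constant recombination map.

    breakpoints: strictly increasing positions in bp; interval i spans
        [breakpoints[i], breakpoints[i + 1]).
    rates: per-bp, per-generation recombination rate in each interval,
        so len(breakpoints) == len(rates) + 1.
    A uniform map at the returned rate has the same expected number of
    crossovers per generation across the whole region as the input map.
    """
    if len(breakpoints) != len(rates) + 1:
        raise ValueError("need exactly one more breakpoint than rates")
    expected_crossovers = 0.0
    for left, right, rate in zip(breakpoints[:-1], breakpoints[1:], rates):
        if right <= left:
            raise ValueError("breakpoints must be strictly increasing")
        expected_crossovers += (right - left) * rate
    return expected_crossovers / (breakpoints[-1] - breakpoints[0])


# Hypothetical three-interval map: a 1 Mb region with a central hotspot.
breakpoints = [0, 400_000, 600_000, 1_000_000]
rates = [1e-8, 5e-8, 1e-8]
print(mean_recombination_rate(breakpoints, rates))  # roughly 1.8e-08
```

A uniform rate preserves the total map length but not where crossovers fall, so local patterns such as linkage disequilibrium around hotspots can differ between the two kinds of simulation.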

Additional files

Cite this article

Jeffrey R Adrion, Christopher B Cole, Noah Dukler, Jared G Galloway, Ariella L Gladstein, Graham Gower, Christopher C Kyriazis, Aaron P Ragsdale, Georgia Tsambos, Franz Baumdicker, Jedidiah Carlson, Reed A Cartwright, Arun Durvasula, Ilan Gronau, Bernard Y Kim, Patrick McKenzie, Philipp W Messer, Ekaterina Noskova, Diego Ortega-Del Vecchyo, Fernando Racimo, Travis J Struck, Simon Gravel, Ryan N Gutenkunst, Kirk E Lohmueller, Peter L Ralph, Daniel R Schrider, Adam Siepel, Jerome Kelleher, Andrew D Kern (2020) A community-maintained standard library of population genetic models. eLife 9:e54967. https://doi.org/10.7554/eLife.54967