Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

  1. M Elise Lauterbur (corresponding author)
  2. Maria Izabel A Cavassim
  3. Ariella L Gladstein
  4. Graham Gower
  5. Nathaniel S Pope
  6. Georgia Tsambos
  7. Jeffrey Adrion
  8. Saurabh Belsare
  9. Arjun Biddanda
  10. Victoria Caudill
  11. Jean Cury
  12. Ignacio Echevarria
  13. Benjamin C Haller
  14. Ahmed R Hasan
  15. Xin Huang
  16. Leonardo Nicola Martin Iasi
  17. Ekaterina Noskova
  18. Jana Obsteter
  19. Vitor Antonio Correa Pavinato
  20. Alice Pearson
  21. David Peede
  22. Manolo F Perez
  23. Murillo F Rodrigues
  24. Chris CR Smith
  25. Jeffrey P Spence
  26. Anastasia Teterina
  27. Silas Tittes
  28. Per Unneberg
  29. Juan Manuel Vazquez
  30. Ryan K Waples
  31. Anthony Wilder Wohns
  32. Yan Wong
  33. Franz Baumdicker
  34. Reed A Cartwright
  35. Gregor Gorjanc
  36. Ryan N Gutenkunst
  37. Jerome Kelleher
  38. Andrew D Kern
  39. Aaron P Ragsdale
  40. Peter L Ralph
  41. Daniel R Schrider
  42. Ilan Gronau (corresponding author)
  1. Department of Ecology and Evolutionary Biology, University of Arizona, United States
  2. Department of Ecology and Evolutionary Biology, University of California, Los Angeles, United States
  3. Embark Veterinary, Inc, United States
  4. Section for Molecular Ecology and Evolution, Globe Institute, University of Copenhagen, Denmark
  5. Institute of Ecology and Evolution, University of Oregon, United States
  6. School of Mathematics and Statistics, University of Melbourne, Australia
  7. Ancestry DNA, United States
  8. 54Gene, Inc, United States
  9. Universite Paris-Saclay, CNRS, INRIA, Laboratoire Interdisciplinaire des Sciences du Numerique, France
  10. School of Life Sciences, University of Glasgow, United Kingdom
  11. Department of Computational Biology, Cornell University, United States
  12. Department of Cell and Systems Biology, University of Toronto, Canada
  13. Department of Biology, University of Toronto Mississauga, Canada
  14. Department of Evolutionary Anthropology, University of Vienna, Austria
  15. Human Evolution and Archaeological Sciences (HEAS), University of Vienna, Austria
  16. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Germany
  17. Computer Technologies Laboratory, ITMO University, Russian Federation
  18. Agricultural Institute of Slovenia, Department of Animal Science, Slovenia
  19. Entomology Department, The Ohio State University, United States
  20. Department of Genetics, University of Cambridge, United Kingdom
  21. Department of Zoology, University of Cambridge, United Kingdom
  22. Department of Ecology, Evolution, and Organismal Biology, Brown University, United States
  23. Center for Computational Molecular Biology, Brown University, United States
  24. Department of Genetics and Evolution, Federal University of Sao Carlos, Brazil
  25. Department of Genetics, Stanford University School of Medicine, United States
  26. Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Sweden
  27. Department of Integrative Biology, University of California, Berkeley, United States
  28. Department of Biostatistics, University of Washington, United States
  29. Broad Institute of MIT and Harvard, United States
  30. Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, United Kingdom
  31. Cluster of Excellence - Controlling Microbes to Fight Infections, Eberhard Karls Universität Tübingen, Germany
  32. School of Life Sciences and The Biodesign Institute, Arizona State University, United States
  33. The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, United Kingdom
  34. Department of Molecular and Cellular Biology, University of Arizona, United States
  35. Department of Integrative Biology, University of Wisconsin–Madison, United States
  36. Department of Mathematics, University of Oregon, United States
  37. Department of Genetics, University of North Carolina at Chapel Hill, United States
  38. Efi Arazi School of Computer Science, Reichman University, Israel
2 figures, 1 table and 1 additional file

Figures

Figure 1
Phylogenetic tree of species available in the stdpopsim catalog, including the six species we published in the original release (Adrion et al., 2020, in blue), and 15 species that have since been added (in orange).

Solid circles indicate species that have one (light gray) or more (dark gray) demographic models and recombination maps. Branch lengths were derived from the divergence times provided by TimeTree5 (Kumar et al., 2022). The horizontal bar below the tree indicates 500 million years (my). Source code for generating the tree is given in Figure 1—source code 1 and 2.

Figure 2
The species parameters and demographic model used for Anopheles gambiae in the stdpopsim catalog.

(A) The parameters associated with the genome build and species, including chromosome lengths, average recombination rates (per base per generation), and average mutation rates (per base per generation). (B) A graphical depiction of the demographic model, which consists of a single population whose size changes over the past 11,260 generations in 67 time intervals (note the log scale). The width at each point depicts the effective population size (Ne), with the horizontal bar at the bottom indicating the scale for Ne = 10^6. This figure is adapted from the data on the stdpopsim catalog documentation page (see Data availability) and plotted with POPdemog (Zhou et al., 2018). Source code for generating the figure is given in Figure 2—source code 1.
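
A single-population model whose size changes across a series of time intervals can be represented as a step function of time. The sketch below assumes the size is constant within each interval, and it uses four made-up epochs rather than the 67 intervals of the catalog model; the names `epoch_starts`, `epoch_sizes`, and `ne_at` are illustrative and are not part of stdpopsim.

```python
from bisect import bisect_right

# Hypothetical epochs (NOT the catalog values): each epoch starts at the given
# time, in generations before the present, and lasts until the next start.
epoch_starts = [0, 100, 1_000, 5_000]
epoch_sizes = [1_000_000, 500_000, 2_000_000, 800_000]


def ne_at(generations_ago):
    """Return the effective population size in force at a given time in the past."""
    if generations_ago < 0:
        raise ValueError("time must be measured backwards from the present (>= 0)")
    # bisect_right counts the epochs that have started by this time;
    # the most recently started one determines the size.
    index = bisect_right(epoch_starts, generations_ago) - 1
    return epoch_sizes[index]


for t in (0, 99, 100, 11_260):
    print(t, ne_at(t))
```

The oldest epoch extends indefinitely into the past, so any time beyond the last start returns the last size.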

Tables

Table 1
Guidelines for dealing with missing parameters.

For each parameter, we provide a suggested course of action, and mention the main discrepancies between simulated data and real genomic data that could be caused by misspecification of that parameter.

| Missing parameter | Suggested action | Possible discrepancies |
| --- | --- | --- |
| Mutation rate | Borrow from the closest relative with a citable mutation rate | Number of polymorphic sites |
| Recombination rate | Borrow from the closest relative with a citable recombination rate | Patterns of linkage disequilibrium |
| Gene conversion rate and tract length | Set the rate to 0 or borrow from the closest relative with a citable rate | Lengths of shared haplotypes across individuals |
| Demographic model | Set the effective population size (Ne) to a value that reflects the observed genetic diversity | Features of genetic diversity that are captured by the site frequency spectrum, such as the prevalence of low-frequency alleles |
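
One common way to choose an Ne that reflects observed genetic diversity is to invert the standard neutral expectation for nucleotide diversity, pi = 4 * Ne * mu for a diploid population of constant size. The table does not prescribe this formula, so the sketch below is only one possible reading of the guideline; the function name `ne_from_diversity` and the input values are hypothetical.

```python
def ne_from_diversity(pi, mu, ploidy=2):
    """Ne implied by the neutral expectation pi = 2 * ploidy * Ne * mu.

    pi: observed nucleotide diversity (per base)
    mu: mutation rate (per base per generation)
    ploidy: 2 gives pi = 4 * Ne * mu; 1 gives pi = 2 * Ne * mu
    """
    if pi < 0 or mu <= 0 or ploidy < 1:
        raise ValueError("pi must be >= 0, mu must be > 0, ploidy must be >= 1")
    return pi / (2 * ploidy * mu)


# Hypothetical inputs, for illustration only.
observed_pi = 0.001
mutation_rate = 1.25e-8
ne = ne_from_diversity(observed_pi, mutation_rate)
print(round(ne))
```

Because the estimate scales inversely with the mutation rate, a borrowed mutation rate that is off by some factor shifts the implied Ne by the same factor in the opposite direction.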

Additional files

Cite this article

M Elise Lauterbur, Maria Izabel A Cavassim, Ariella L Gladstein, et al. (2023) Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations. eLife 12:RP84874. https://doi.org/10.7554/eLife.84874.3