Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations
Figures

Phylogenetic tree of species available in the stdpopsim catalog, comprising the six species in the original release (Adrion et al., 2020; in blue) and the 15 species added since then (in orange).
Solid circles indicate species that have one (light gray) or more (dark gray) demographic models and recombination maps. Branch lengths were derived from the divergence times provided by TimeTree5 (Kumar et al., 2022). The horizontal bar below the tree indicates 500 million years (my). Source code for generating the tree is given in Figure 1—source code 1 and 2.
Figure 1—source code 1
R code for generating the figure.
- https://cdn.elifesciences.org/articles/84874/elife-84874-fig1-code1-v1.zip
Figure 1—source code 2
Newick-format tree used as input in R code.
- https://cdn.elifesciences.org/articles/84874/elife-84874-fig1-code2-v1.zip

The species parameters and demographic model used for Anopheles gambiae in the stdpopsim catalog.
(A) The parameters associated with the genome build and species, including chromosome lengths, average recombination rates (per base per generation), and average mutation rates (per base per generation). (B) A graphical depiction of the demographic model, which consists of a single population whose size changes over the past 11,260 generations in 67 time intervals (note the log scale). The width at each point depicts the effective population size (N_e), with the horizontal bar at the bottom indicating the scale for N_e. This figure is adapted from the data on the stdpopsim catalog documentation page (see Data availability) and plotted with POPdemog (Zhou et al., 2018). Source code for generating the figure is given in Figure 2—source code 1.
Figure 2—source code 1
R code for generating the figure.
- https://cdn.elifesciences.org/articles/84874/elife-84874-fig2-code1-v1.zip
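Figure 2B shows a single population whose size is piecewise constant across 67 time intervals. The sketch below illustrates what such a history implies. It looks up the size in effect at a given time and computes the expected coalescence time of two lineages under the standard diploid coalescent, where lineages coalesce at rate 1/(2N). It is a minimal illustration, not stdpopsim's or msprime's internal representation. The `(start_time, size)` epoch list, the helper names and the example values are all hypothetical. They are not the Anopheles gambiae model's parameters.

```python
import bisect
import math

# Hypothetical representation (not the stdpopsim/msprime data structure):
# a list of (start_time_in_generations_ago, diploid_size) pairs, sorted by
# start time and beginning at time 0. The last epoch extends indefinitely.


def size_at(epochs, t):
    """Return the population size in effect t generations ago."""
    starts = [start for start, _ in epochs]
    return epochs[bisect.bisect_right(starts, t) - 1][1]


def expected_pairwise_tmrca(epochs):
    """Expected coalescence time (generations) of two lineages sampled now.

    Two lineages coalesce at rate 1 / (2N) while the population has size N.
    E[T] is the integral over time of P(not yet coalesced), which is
    accumulated here one constant-size epoch at a time.
    """
    expected = 0.0
    not_coalesced = 1.0  # P(T > start of the current epoch)
    for i, (start, size) in enumerate(epochs):
        rate = 1.0 / (2.0 * size)
        if i + 1 < len(epochs):
            duration = epochs[i + 1][0] - start
            decay = math.exp(-rate * duration)
            expected += not_coalesced * (1.0 - decay) / rate
            not_coalesced *= decay
        else:
            # The final epoch lasts forever, so the remaining integral is P / rate.
            expected += not_coalesced / rate
    return expected


# Illustrative values only; these are NOT the Anopheles gambiae model's sizes.
history = [(0, 10_000), (1_000, 50_000), (5_000, 20_000)]
mu = 1e-8  # hypothetical per-base, per-generation mutation rate
print(size_at(history, 2_500))  # prints 50000
print(2 * mu * expected_pairwise_tmrca(history))  # expected per-site diversity
```

With a constant size N, the function returns 2N generations. The expected pairwise diversity per site is then 2μ · 2N = 4Nμ, the familiar θ. A size history with many epochs, like the one in Figure 2B, shifts this expectation according to when the population was large or small.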
Tables
Guidelines for dealing with missing parameters.
For each parameter, we provide a suggested course of action, and mention the main discrepancies between simulated data and real genomic data that could be caused by misspecification of that parameter.
Missing parameter | Suggested action | Possible discrepancies |
---|---|---|
Mutation rate | Borrow from the closest relative with a citable mutation rate | Number of polymorphic sites |
Recombination rate | Borrow from the closest relative with a citable recombination rate | Patterns of linkage disequilibrium |
Gene conversion rate and tract length | Set the rate to 0 or borrow from the closest relative with a citable rate | Lengths of shared haplotypes across individuals |
Demographic model | Set the effective population size (N_e) to a value that reflects the observed genetic diversity | Features of genetic diversity that are captured by the site frequency spectrum, such as the prevalence of low-frequency alleles |
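Two rows of the table are linked through one quantity. The last row suggests choosing N_e to match observed genetic diversity, and the first ties the mutation rate to the number of polymorphic sites. The sketch below shows both links, assuming a neutral, constant-size diploid population where θ = 4N_eμ (the guidelines themselves do not prescribe a formula). Under that assumption, N_e ≈ π/(4μ), where π is the per-site pairwise diversity. Watterson's expectation for the number of segregating sites in n sampled genomes is E[S] = θ L a_n, where L is the sequence length and a_n = Σ_{i=1}^{n-1} 1/i. The function names and input values are hypothetical.

```python
import math

# Assumes a neutral, constant-size diploid population, in which the expected
# per-site pairwise diversity is pi = theta = 4 * Ne * mu.


def ne_from_diversity(pi, mu):
    """Effective population size implied by observed per-site diversity pi."""
    return pi / (4.0 * mu)


def watterson_a(n):
    """a_n = sum_{i=1}^{n-1} 1/i for a sample of n haploid genomes."""
    return math.fsum(1.0 / i for i in range(1, n))


def expected_segregating_sites(ne, mu, length, n):
    """Expected number of polymorphic sites, E[S] = 4 * Ne * mu * L * a_n."""
    return 4.0 * ne * mu * length * watterson_a(n)


# Hypothetical inputs: an observed diversity and a mutation rate borrowed
# from a close relative.
pi_observed = 0.01
mu_borrowed = 2e-9
ne = ne_from_diversity(pi_observed, mu_borrowed)
print(ne)
print(expected_segregating_sites(ne, mu_borrowed, length=1_000_000, n=20))
```

If N_e is fitted to observed diversity using the same μ that is then used for simulation, the product 4N_eμ is unchanged whichever mutation rate was borrowed. The expected number of polymorphic sites therefore stays the same. What changes is the implied N_e, and with it the time scale of the history in generations.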