Sweepstakes reproductive success via pervasive and recurrent selective sweeps

  1. Einar Árnason (corresponding author)
  2. Jere Koskela
  3. Katrín Halldórsdóttir
  4. Bjarki Eldon
  1. Institute of Life- and Environmental Sciences, University of Iceland, Iceland
  2. Department of Organismal and Evolutionary Biology, Harvard University, United States
  3. Department of Statistics, University of Warwick, United Kingdom
  4. Leibniz Institute for Evolution and Biodiversity Science, Museum für Naturkunde, Germany
28 figures, 10 tables and 1 additional file

Figures

Figure 1 with 2 supplements
Neutrality test statistics and distribution of the neutrality index.

(a) Manhattan plots of Tajima’s D (Tajima, 1989) and Fay and Wu’s H (Fay and Wu, 2000) showed mostly negative values on all chromosomes, implying deviations from neutrality. Sliding window estimates (window size 100 kb with 20 kb step size) using GL1 genotype likelihoods for the South/south-east population. The value of each statistic under Kingman coalescent neutrality equilibrium is indicated with a magenta horizontal line. (b) Kernel density contours (Duong, 2022) of the -log10(p) significance of Fisher’s exact test associated with the McDonald–Kreitman test (McDonald and Kreitman, 1991) plotted against the neutrality index (Rand and Kann, 1996) expressed as -log NI, where NI=(Pn/Ps)/(Dn/Ds)=(Pn×Ds)/(Ps×Dn) and Pn, Ps, Dn, and Ds are the numbers of non-synonymous and synonymous polymorphic and fixed sites, respectively, for all genes of each chromosome. Negative values of -log NI imply purifying (negative) and background selection, and positive values imply positive selection (selective sweeps). The outgroup is Pacific cod (Gma). Overall, the cloud of positive values is denser than the cloud of negative values. The red horizontal line is at the nominal significance level of 0.05 for individual tests; no test reached the 0.05/n Bonferroni adjustment for multiple testing. The mean (green vertical line) and the median of -log NI were 0.27 and 0.21, respectively, and imply that the proportion of adaptive non-synonymous substitutions α=1-NI (Smith and Eyre-Walker, 2002) is 19–24%. Figure 1—figure supplement 1 shows neutrality statistics for the Þistilfjörður population. Figure 1—figure supplement 2 shows the distribution and violin plot of -log NI across each chromosome from the South/south-east population.
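The step from the reported mean and median of -log NI to the 19–24% range can be checked with a few lines of arithmetic. The sketch below assumes the logarithm is natural, which is an assumption that happens to reproduce the stated range; the function names and the example counts are hypothetical and not taken from the paper.

```python
import math

def neutrality_index(pn, ps, dn, ds):
    """NI = (Pn/Ps) / (Dn/Ds); requires nonzero Ps, Dn, and Ds."""
    return (pn / ps) / (dn / ds)

def alpha_from_neg_log_ni(neg_log_ni):
    """alpha = 1 - NI, given -log(NI) with an assumed natural logarithm."""
    return 1.0 - math.exp(-neg_log_ni)

# Hypothetical counts, for illustration only (not data from the paper).
ni = neutrality_index(pn=10, ps=20, dn=15, ds=20)
print("NI:", round(ni, 3), "-log NI:", round(-math.log(ni), 3))

# Mean and median of -log NI as reported in the caption.
for value in (0.27, 0.21):
    print(value, "->", round(alpha_from_neg_log_ni(value), 2))
```

A value of NI below 1 gives a positive -log NI and a positive α, matching the caption's reading that positive values indicate positive selection.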

Figure 1—figure supplement 1
Neutrality tests for Þistilfjörður population.

(a) Manhattan plots of Tajima’s D (Tajima, 1989) and Fay and Wu's H (Fay and Wu, 2000) for the Þistilfjörður population showed mostly negative values at all chromosomes implying deviations from neutrality. Sliding window estimates (window size 100 kb with 20 kb step size) using GL1 genotype likelihoods. Value of statistic under Kingman coalescent neutrality equilibrium indicated with magenta horizontal line.
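For readers who want to see how the two statistics relate to a site-frequency spectrum, here is a minimal sketch that computes Tajima's D and the unnormalized Fay and Wu's H from an unfolded spectrum of called allele counts. It is not the authors' pipeline, which works from genotype likelihoods in sliding windows; the function name and the input layout are assumptions made for illustration.

```python
import math

def tajimas_d_and_h(sfs):
    """Return (Tajima's D, Fay and Wu's H) from an unfolded SFS.

    sfs[i-1] is the number of sites with derived allele count i,
    for i = 1..n-1, where n is the number of sampled sequences.
    """
    n = len(sfs) + 1
    s = sum(sfs)                      # number of segregating sites
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs
    theta_h = sum(i * i * x for i, x in enumerate(sfs, start=1)) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    h = pi - theta_h
    return d, h

# Under the neutral Kingman expectation (counts proportional to 1/i)
# both statistics are zero, which is the reference line in the plots.
neutral = [100 / i for i in range(1, 10)]
print(tajimas_d_and_h(neutral))
```

An excess of singletons pushes D below zero, and an excess of high-frequency derived variants pushes H below zero, which is the pattern the caption describes.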

Figure 1—figure supplement 2
Neutrality Index and violin plot of neutrality index across chromosomes.

(a) The distributions of -log NI (neutrality index) per chromosome and (b) violin plots with quartiles were heavier on the positive side, implying more positive than negative selection.

Figure 2 with 1 supplement
Approximate Bayesian computation (ABC) joint estimation of parameters of the neutral Ξ-Beta(2-α,α) coalescent (random sweepstakes) and of population growth.

A kernel density estimator (Duong, 2022) for the joint ABC-posterior density of (α,β) ∈ Θ_B. The parameter α determines the skewness of the offspring distribution in the neutral Ξ-Beta(2-α,α) coalescent model, and the parameter β is a population-size rescaled rate of exponential population growth. Estimates using GL1 for the South/south-east population. A bivariate model-fitting analysis adding exponential population growth to the Ξ-Beta(2-α,α) coalescent does not improve model fit for random sweepstakes. The population growth parameter (β) only has an effect under maximal sweepstakes (low values of α). Figure 2—figure supplement 1 explores the random sweepstakes model with population growth using both GL1 and GL2 likelihood estimates of site-frequency spectra for both the South/south-east and Þistilfjörður populations, and for different ranges of parameter values.

Figure 2—figure supplement 1
Joint estimate of growth and coalescent parameter for other situations.

Approximate Bayesian computation (ABC) joint estimation of parameters of the neutral Ξ-Beta(2-α,α) coalescent (random sweepstakes) and of population growth. A kernel density estimator for the joint ABC-posterior density of (α,β) ∈ Θ_B. The parameter α determines the skewness of the offspring distribution in the neutral Ξ-Beta(2-α,α) coalescent model, and β is a population-size rescaled rate of exponential population growth. (b) Estimates using GL1 for the Þistilfjörður population, (c) using GL2 for the South/south-east population, (d) using GL2 for the Þistilfjörður population, and using GL1 for the South/south-east population with a narrower (e) and a wider (f) range of parameter values.

Figure 3 with 4 supplements
Fit of observations to models: the no-sweepstakes model, the random sweepstakes model, and the selective sweepstakes model.

(a) Mean observed site-frequency spectra for the 19 non-inversion chromosomes combined estimated with GL1 likelihood for the South/south-east population (sample size n=68). Error bars of observed data (dots) are ±2 standard deviations of the bootstrap distribution with 100 bootstrap replicates. Expected site-frequency spectra are the Kingman coalescent modelling no sweepstakes, the best approximate maximum likelihood estimates (Eldon et al., 2015) of the Ξ-Beta coalescent modelling random sweepstakes, and the approximate Bayesian computation (ABC) estimated Durrett–Schweinsberg coalescent (DS) modelling selective sweepstakes. (b) The observed site-frequency spectra of different sized fragments and various functional classes compared to expectations of the Durrett–Schweinsberg coalescent (DS) ABC estimated for the non-inversion chromosomes for the South/south-east population. The compound parameter c ranges from 5 to 11. Fragment sizes of 25 and 100 kb. The different functional groups are fourfold degenerate sites (4Dsites), intronic sites, non-selection sites (sites more than 500 kb away from peaks of selection scan, Appendix 6—figure 8), intergenic sites, promoters, exons, 3-UTR sites (3-UTRs), and 5-UTR sites (5-UTRs) in order of selective constraints. (c) Deviations from expectations of the Durrett–Schweinsberg model of recurrent selective sweeps of different sized fragments and functional groups for the South/south-east population. Figure 3—figure supplement 1 shows comparable results for the Þistilfjörður population. Figure 3—figure supplement 2 shows the site-frequency spectrum polarized with 100% consensus of walleye pollock (Gch), Pacific cod (Gma), and Arctic cod (Bsa) to minimize potential effects of SNP misorientation and low-level ancestral introgression (Appendix 10).
Figure 3—figure supplement 3 shows the site-frequency spectrum for transversions only, removing transition sites that are more likely to be at mutation saturation, to address potential SNP misorientation. Figure 3—figure supplement 4 shows the site-frequency spectrum truncated by removing singletons and doubletons and the n-1 and n-2 classes that are most sensitive to SNP misorientation and low-level ancestral introgression.
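As a point of reference for the no-sweepstakes expectation, the normalized expected site-frequency spectrum under the standard Kingman coalescent with constant population size is proportional to 1/i for derived allele count i. The sketch below computes that baseline; the function name is hypothetical, and n here counts sampled sequences (for example 136 haploid genomes for 68 diploid individuals), not individuals.

```python
def kingman_normalized_sfs(n):
    """Expected normalized SFS under the Kingman coalescent.

    Returns a list of length n-1 whose entry i-1 is the expected
    proportion of segregating sites with derived allele count i.
    """
    weights = [1 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]

expected = kingman_normalized_sfs(136)
print(len(expected), expected[0], expected[-1])
```

The monotone 1/i decay is what makes a U-shaped observed spectrum, with an excess in the right tail, stand out against this baseline.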

Figure 3—figure supplement 1
Site-frequency spectra and model fit for the replicate Þistilfjörður population.

(a) Site-frequency spectra of 19 non-inversion chromosomes compared to expectations of Kingman-, Ξ-Beta-, and Durrett–Schweinsberg (DS) coalescents for the Þistilfjörður population (sample size n=71). Error bars of observed data (dots) are ±2 standard deviations of the bootstrap distribution with 100 bootstrap replicates. (b) Site-frequency spectra of random fragments and various functional groups compared to expectations of the DS coalescent for the compound parameter c ranging from 5 to 8. (c) Deviations from DS expectations for random fragments and various functional groups.
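The error bars of ±2 standard deviations over 100 bootstrap replicates can be illustrated with a simple resampling sketch. The caption does not state the resampling unit, so the code below assumes, purely for illustration, that whole per-unit spectra (for example one normalized spectrum per chromosome) are resampled with replacement; the function name and seed are hypothetical.

```python
import random
import statistics

def bootstrap_sd(per_unit_sfs, replicates=100, seed=1):
    """Bootstrap standard deviation of the mean SFS, per frequency class.

    per_unit_sfs: list of spectra (equal-length lists), one per
    resampling unit. The choice of unit is an assumption of this sketch.
    """
    rng = random.Random(seed)
    k = len(per_unit_sfs)
    m = len(per_unit_sfs[0])
    replicate_means = []
    for _ in range(replicates):
        draw = [rng.choice(per_unit_sfs) for _ in range(k)]
        replicate_means.append([sum(u[j] for u in draw) / k for j in range(m)])
    # Standard deviation across replicates, separately for each class.
    return [statistics.stdev(column) for column in zip(*replicate_means)]

# Toy input: three units with slightly different normalized spectra.
units = [[0.5, 0.3, 0.2], [0.6, 0.25, 0.15], [0.55, 0.25, 0.2]]
sds = bootstrap_sd(units)
print([round(2 * sd, 4) for sd in sds])  # half-width of a +/- 2 SD bar
```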

Figure 3—figure supplement 2
Site-frequency spectra polarized using a 100% consensus of three outgroup taxa.

(a) Site-frequency spectra obtained using as outgroup sites that are in full agreement (100% consensus) among walleye pollock (Gch), Pacific cod (Gma), and Arctic cod (Bsa) compared with expectations of the Kingman, the Ξ-Beta, and the Durrett–Schweinsberg coalescents. The distribution (b) and traces (c) of the approximate Bayesian computation (ABC) estimation of the compound parameter c of the Durrett–Schweinsberg coalescent. Based on observed data of the 19 non-inversion chromosomes combined for the South/south-east population (sample size n=68).

Figure 3—figure supplement 3
Site-frequency spectra of transversions excluding transitions.

Site-frequency spectra and model fit of transversions of the 19 non-inversion chromosomes of the South/south-east population compared with expectations of the Kingman, the Ξ-Beta, and the Durrett–Schweinsberg coalescents (a). The distribution and traces of the approximate Bayesian computation (ABC) estimation of the compound parameter c of the Durrett–Schweinsberg coalescent (b, c). Based on observed data of the 19 non-inversion chromosomes combined for the South/south-east population.

Figure 3—figure supplement 4
Site-frequency spectra excluding singletons and doubletons.

(a) Truncated and full site-frequency spectra compared. The singleton and n-1 classes and the doubleton and n-2 classes were excluded and compared with the full site-frequency spectrum and with expectations of the Durrett–Schweinsberg coalescent (DS). (b) The distribution of the approximate Bayesian computation (ABC) estimation of the compound parameter c of the Durrett–Schweinsberg coalescent excluding singletons and the n-1 class, (c) also excluding doubletons and the n-2 class, and (d) the distribution for the full data. Based on observed data of the 19 non-inversion chromosomes combined for the South/south-east population.

Figure 4 with 1 supplement
Deviations from fit to the random sweepstakes model and the selective sweepstakes model.

(a, b) Deviations of site frequencies from approximate maximum likelihood best-fit expectations of the neutral Ξ-Beta(2-α,α) coalescent modelling random sweepstakes. Deviations of the mean site frequencies of non-inversion chromosomes 3–6, 8–11, and 13–23 estimated with genotype likelihoods GL1 from best-fit expectations of the Ξ-Beta(2-α,α) coalescent with α̂=1.16 for the South/south-east population (sample size n=68) (a) and with α̂=1.16 for the Þistilfjörður population (sample size n=71) (b). Deficiency of intermediate allele frequency classes and excess mainly at the right tail of the site-frequency spectrum. (c, d) Deviations of GL1 estimated site frequencies from expectations of the Durrett–Schweinsberg model of recurrent selective sweeps for the South/south-east population with a compound parameter c=8.25 and the Þistilfjörður population with a compound parameter c=6.3, respectively. Better fit than the random model but also with excess at the right tail of the site-frequency spectrum. Deviations reported as the log of the odds ratio (in blue), the difference of the observed and expected logit of site frequencies. The dashed red line at zero represents the null hypothesis of no difference between observed and expected. The darker and lighter shaded gray areas represent the 95% and the 99% confidence regions of the approximately normally distributed log odds ratio. Figure 4—figure supplement 1 shows comparable deviation from fit for the GL2 genotype likelihood data.
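The deviation measure described in the caption, the log odds ratio as the difference between the observed and expected logit of each site frequency, can be sketched as follows. The function names are hypothetical, the input frequencies are toy values, and the confidence regions shown in the figure are not computed here.

```python
import math

def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1 - p))

def log_odds_deviation(observed, expected):
    """Per-class log odds ratio: logit(observed) - logit(expected)."""
    return [logit(o) - logit(e) for o, e in zip(observed, expected)]

# Toy normalized frequencies, for illustration only.
observed = [0.40, 0.15, 0.05]
expected = [0.35, 0.18, 0.03]
print([round(x, 3) for x in log_odds_deviation(observed, expected)])
```

A value of zero means observed and expected agree for that class, positive values mark an excess of observed sites, and negative values mark a deficiency.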

Figure 4—figure supplement 1
Deviations from fit to the random sweepstakes model and the selective sweepstakes model for GL2 genotype likelihood data.

(a, b) Deviations of the mean site frequencies of non-inversion chromosomes 3–6, 8–11, and 13–23 estimated with genotype likelihoods GL2 from approximate maximum likelihood best-fit expectations of the neutral Ξ-Beta(2-α,α) coalescent (random sweepstakes model) for the South/south-east population with α̂=1.04 and for the Þistilfjörður population with α̂=1.12, respectively. (c, d) Deviations of the mean site frequencies of non-inversion chromosomes 3–6, 8–11, and 13–23 estimated with genotype likelihoods GL2 from expectations of the Durrett–Schweinsberg model of recurrent selective sweeps for the South/south-east population with a compound parameter ĉ=10.75 and the Þistilfjörður population with a compound parameter ĉ=9.25, respectively. Deviations reported as the log of the odds ratio (in blue), the difference of the observed and expected logit of site frequencies. The dashed red line at zero represents the null hypothesis of no difference between observed and expected. The darker and lighter shaded gray areas represent the 95% and the 99% confidence regions of the approximately normally distributed log odds ratio. Sample sizes n=68 and n=71 for the South/south-east and Þistilfjörður populations, respectively.

Approximate Bayesian computation (ABC) estimation of parameters of the Durrett-Schweinsberg coalescent (Durrett and Schweinsberg, 2005) (the selective sweepstakes model) for various functional regions of the genome.

For each category from top to bottom the mean, the median, and the mode of the ABC-posterior distribution of the compound parameter c ∈ Θ_DS using site-frequency spectra computed from likelihood GL1 and GL2 for the South/south-east (South) and Þistilfjörður (Þistilfj.) populations. The different functional groups are fourfold degenerate sites (4Dsites), intronic sites, non-selection sites (sites more than 500 kb away from peaks of selection scan, Appendix 6—figure 8), intergenic sites, promoters, exons, 3-UTR sites (3-UTRs), and 5-UTR sites (5-UTRs), regions ranging from less to more constrained by selection.

Genomic scans of selective sweeps by two methods.

(a) Manhattan plots from detection of selective sweeps using RAiSD (Alachiotis and Pavlidis, 2018) and (b) by using OmegaPlus (Alachiotis et al., 2012). The ω statistic of OmegaPlus (b) measures increased linkage disequilibrium in segments on either the left or the right sides of a window around selected site and a decrease in linkage disequilibrium between the segments across the selected site (Kim and Nielsen, 2004; Alachiotis and Pavlidis, 2018). The μ statistic of RAiSD (a) is a composite measure based on three factors, a reduction of genetic variation in a region around a sweep, a shift in the site-frequency spectrum from intermediate- towards low- and high-frequency derived variants, and a factor similar to ω that measures linkage disequilibrium on either side of and across the site of a sweep. Chromosomes with alternating colours. Indications of selective sweeps are found throughout each chromosome.
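To make the description of ω concrete, the sketch below computes the statistic for one split of a window into left and right segments, following the commonly cited form of Kim and Nielsen (2004): the mean pairwise r² within the two flanking segments divided by the mean r² between them. This is a minimal illustration, not the OmegaPlus implementation, which also searches over windows and split points; the function name and the toy matrix are hypothetical.

```python
from math import comb

def omega_statistic(r2, l):
    """omega for a window of S SNPs split after the first l SNPs.

    r2: symmetric S x S matrix (list of lists) of pairwise r^2 values.
    l:  number of SNPs in the left segment; both segments need >= 2 SNPs
        in this sketch. Cross-segment r^2 must not sum to zero.
    """
    s = len(r2)
    if l < 2 or s - l < 2:
        raise ValueError("each segment needs at least two SNPs")
    left = sum(r2[i][j] for i in range(l) for j in range(i + 1, l))
    right = sum(r2[i][j] for i in range(l, s) for j in range(i + 1, s))
    cross = sum(r2[i][j] for i in range(l) for j in range(l, s))
    within = (left + right) / (comb(l, 2) + comb(s - l, 2))
    between = cross / (l * (s - l))
    return within / between

# Toy window: high LD inside each flank, low LD across the putative site.
toy = [[1.0 if (i < 3) == (j < 3) else 0.1 for j in range(6)] for i in range(6)]
print(omega_statistic(toy, 3))
```

High LD within each flank combined with low LD across the split gives a large ω, which is the sweep signature the caption describes.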

Appendix 6—figure 1
Sampling localities at Iceland.

Sampling localities ranging from Vestmannaeyjar to Höfn on the south and south-east coast (blue circles, n=68) and Þistilfjörður in the north-east (red circles, n=71) on a map of Iceland. Depth contours are at −25, −50, −100, −200, −400, −600, and −800 m. The two localities serve as statistical replicates, the South/south-east and the Þistilfjörður population, respectively.

Appendix 6—figure 2
Neutrality test statistics in sliding windows across all chromosomes for GL2 estimates.

(a, b) Manhattan plots of Tajima’s D (Tajima, 1989), Fu and Li’s D (Fu and Li, 1993), Fay and Wu’s H (Fay and Wu, 2000), and Zeng’s E (Zeng et al., 2006) for the South/south-east population and the Þistilfjörður population, respectively. Sliding window estimates (window size 100 kb with 20 kb step size) using GL2 genotype likelihoods. Value of statistic under Kingman coalescent neutrality equilibrium indicated with magenta horizontal line.

Appendix 6—figure 3
The random sweepstakes model.

(a, b) Observed site-frequency spectra of non-inversion chromosomes and expectations of the Ξ-Beta(2-α,α) coalescent (the random sweepstakes model) for the South/south-east population (sample size n=68) and Þistilfjörður population (sample size n=71), respectively. The observed mean site-frequency spectrum of the non-inversion chromosomes 3–6, 8–11, and 13–23 polarized with Gma as outgroup and estimated under genotype likelihoods GL1 and GL2 and expected site-frequency spectrum of the Ξ-Beta(2-α,α) coalescent with α=1.35, α=1.20, α=1.16, α=1.12, α=1.04, and α=1.00, which are representative samples of the posterior estimates that coincide with the kernel density estimates (Figure 2). The values α=1.16, α=1.12, and α=1.04 represent the approximate maximum likelihood best estimates as detailed in Figure 4a, b and Figure 4—figure supplement 1a, b.

Appendix 6—figure 4
Piecewise comparison of expectations of the Λ-Beta(2-α,α) coalescent and deviations from fit.

The observed mean site-frequency spectrum of the non-inversion chromosomes 3–6, 8–11, and 13–23 polarized with Gma as outgroup and estimated under genotype likelihoods GL1 and GL2 and expected site-frequency spectrum of the Λ-Beta(2-α,α) coalescent with α=1.35, α=1.20, α=1.10, and α=1.00. Population South/south-east (sample size n=68) (a) and population Þistilfjörður (sample size n=71) (b). Deviations from the maximum likelihood estimated expectations of the Λ-Beta(2-α,α) coalescent for the South/south-east (c) and the Þistilfjörður population (d).

Appendix 6—figure 5
Fit to the selective sweepstakes model for GL2 estimated site-frequency spectra.

(a, b) Mean observed site-frequency spectra for the 19 non-inversion chromosomes combined estimated with GL2 likelihood for the South/south-east (sample size n=68) and Þistilfjörður populations (sample size n=71), respectively. Error bars of observed data are ±2 standard deviations of the bootstrap distribution with 100 bootstrap replicates. Expected site-frequency spectra are the Kingman coalescent (the no sweepstakes model), the best approximate maximum likelihood estimates (Eldon et al., 2015) of the Ξ-Beta model (the random sweepstakes model), and the Durrett–Schweinsberg coalescent (DS) (the selective sweepstakes model) approximate Bayesian computation (ABC) estimated for the non-inversion chromosomes (non-inv). (c, d) The observed site-frequency spectra of different sized fragments and various functional classes compared to expectations of the Durrett–Schweinsberg coalescent (DS) ABC estimated for the non-inversion chromosomes for the South/south-east population and the Þistilfjörður population, respectively. The compound parameter c ranges from 7 to 14. The different functional groups are fourfold degenerate sites (4Dsites), intronic sites, non-selection sites (sites more than 500 kb away from peaks of selection scan, Appendix 6—figure 8), intergenic sites, promoters, exons, 3-UTR sites (3-UTRs), and 5-UTR sites (5-UTRs) in order of selective constraints. (e, f) Deviations from expectations of the Durrett–Schweinsberg model of recurrent selective sweeps of different sized fragments and functional groups for the South/south-east population and the Þistilfjörður population, respectively.

Appendix 6—figure 6
Decay of linkage disequilibrium with distance: observed and under an extension of the Durrett–Schweinsberg model.

(a) Observed linkage disequilibrium (LD), measured as r², with distance in kb (kilobase). Non-inversion chromosomes from the South/south-east population as an example. LD decays rapidly to background values. (b) A subset of the distances from panel a (red × in circles) overlaid on the simulated empirical distribution of LD profiles (boxplot) obtained from the extension of the Durrett–Schweinsberg model described in Appendix 4. (c–f) Posterior distributions of parameters from which panel b has been sampled. The c parameter was constrained to lie between 5 and 12.5 to enforce consistency with the site-frequency spectrum (SFS)-based results in Figure 3 and Appendix 6—figure 7, while γ/s and θ were constrained between 0 and 10,000 to avoid transient approximate Bayesian computation (ABC)-MCMC chains.
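As a reminder of what r² measures, the sketch below computes it for a pair of biallelic sites from phased 0/1 haplotypes. This is a textbook illustration only: the study works from genotype likelihoods, for which LD estimation differs, and the function name and toy haplotypes are hypothetical.

```python
def r_squared(site_a, site_b):
    """r^2 between two biallelic sites from phased haplotypes.

    site_a, site_b: equal-length lists of 0/1 alleles, one entry per
    sampled haplotype. Both sites must be polymorphic in the sample.
    """
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n
    d = p_ab - p_a * p_b              # coefficient of disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom

print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # complete association
print(r_squared([0, 1, 0, 1], [0, 0, 1, 1]))  # no association
```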

Appendix 6—figure 7
Approximate Bayesian computation (ABC) estimation of parameters of the Durrett–Schweinsberg coalescent (Durrett and Schweinsberg, 2005).

(a–d) ABC-posterior densities of the compound parameter c ∈ Θ_DS using site-frequency spectra computed from likelihood GL1 (a, c) and GL2 (b, d) for the South/south-east and Þistilfjörður populations, respectively. (e–h) Corresponding trace plots demonstrating the good mixing of the ABC-MCMC.

Appendix 6—figure 8
Principal components based genomic scan of selection for South/south-east (top) and Þistilfjörður (bottom) populations.

Regions of 500 kb on either side of peaks exceeding a -log10(p) threshold of 4 were excluded to define regions of no selection for analysis in Figures 3 and 5.

Appendix 6—figure 9
Observed site-frequency spectra compared to SLiM simulated site-frequency spectra under no-sweepstakes reproduction and random sweepstakes reproduction with selection.

Forward simulation using SLiM (Haller and Messer, 2019) of negative (background) selection and positive selection with variable dominance and with no-sweepstakes and random sweepstakes models of reproduction. (a–f) The Wright–Fisher no-sweepstakes model (population size 2N=10⁴) with selection. Negative mutations are modelled as a shifted gamma distribution with mean and shape as shown in each panel and with dominance h=1, and positive mutations with fixed effects with dominance (h) and selective advantage (selection coefficient s) as shown in each panel. In b and c, there is no negative selection but only positive mutations of fixed effects with h and s as shown. In d, there are only negative mutations with the same configuration as in a. In e and f, both positive and negative mutations with configurations as shown. In g–i, random sweepstakes using a model in the domain of attraction of the Ξ-Beta(2-α,α)-coalescent with population size 2N=2000, α=1.1 (g) and α=1.25 (h, i), with both negative and positive mutations in g and h with configurations as shown, and only positive mutations in i. In all graphs a loess regression curve is fitted to the SLiM data points and compared to predictions of the Durrett–Schweinsberg (DS) coalescent with compound parameter c=6. The circles are the site-frequency spectrum of chromosome 3 from the South/south-east coast population estimated with GL1 genotype likelihood. The scripts to generate the graphs are available at https://github.com/eldonb/selective-sweepstakes (copy archived at swh:1:rev:3235fd1a87f2741b486cb9fe17a15ae85f605d26; Eldon, 2022b).

Appendix 6—figure 10
Observed site-frequency spectra compared to msprime simulated site-frequency spectra under Kingman coalescent with recurrent selective sweeps.

Backwards simulation using msprime (Baumdicker et al., 2021). (a, b) The standard Kingman coalescent model interrupted by randomly occurring hard sweeps. Each sweep with a selection coefficient s (and time dt between allele frequency updates) occurs at a random location on a chromosome of length 1 Mbp. msprime simulates the Kingman coalescent with hard sweeps occurring at random times, using a structured coalescent approach to model a sweep (Braverman et al., 1995) and a stochastic sweep trajectory generated according to a conditional diffusion model (Kim and Stephan, 2002; Coop and Griffiths, 2004). See the documentation of msprime for further details (tskit.dev/msprime/docs/stable/ancestry.html#sec-ancestry-models-selective-sweeps). The effective population size was Ne=10⁴, mutation rate μ=10⁻⁸, and recombination rate γ=10⁻⁷. The circles represent the site-frequency spectrum of chromosome 3 (GL1) from the South/south-east coast population, and the red line is the normalized exact expected branch-length spectrum predicted by the Durrett–Schweinsberg coalescent with parameter c=6. The scripts to produce the graphs are available at https://github.com/eldonb.

Appendix 6—figure 11
Estimated demographic history and frequency spectra from simulated demographic scenarios under the Kingman coalescent.

(top, left and right) Demographic history estimated with the stairway plot method (Liu and Fu, 2015; Liu and Fu, 2020) from the site-frequency spectra of the non-inversion chromosomes estimated with GL1 likelihoods of the South/south-east and Þistilfjörður population, respectively. Population expansion in the distant past and relative stability in more recent times. Demographic history estimated with smc++ (Terhorst et al., 2016) for the South/south-east population. smc++ run with default values (c) and treating runs of homozygosity as missing with the --missing-cutoff 10 flag (smcpp-noc-sharp) (d). Expected site-frequency spectra simulated using msprime (Kelleher et al., 2016; Baumdicker et al., 2021) based on the demographic scenarios of the stairway plot (e) and the smc++ (f) for the South/south-east population. The observed site-frequency spectra of the non-inversion chromosomes of the South/south-east population estimated using the GL1 and GL2 likelihoods and polarized using different outgroups (Bsa, Gch, and Gma) (e). For the smc++ comparison the observed data are the average of the non-inversion chromosomes of the South/south-east population estimated using the GL1 genotype likelihood and polarized with Gma as outgroup (f).

Appendix 6—figure 12
Stairway plots of demographic history of the populations of GL2 likelihood data.

Demographic history estimated from the site-frequency spectra of the non-inversion chromosomes based on GL2 likelihoods for the South/south-east (top) and the Þistilfjörður (bottom) populations, respectively, with the stairway plot method (Liu and Fu, 2015; Liu and Fu, 2020).

Appendix 6—figure 13
Groups from principal component analysis (PCA), conjectured as cryptic population structure, and observed site-frequency spectra compared to coalescent expectations.

(a, d, g, j) Groups revealed by PCA of variation at inversion chromosomes Chr01, Chr02, Chr07, and Chr12, respectively, conjectured to represent cryptic population structure that should extend to the whole genome. (b, e, h, k) The site-frequency spectra estimated for the groups of the respective inversion chromosome using variation at each inversion chromosome. (c, f, i, l) The site-frequency spectra estimated for the groups of the respective inversion chromosome using variation at the 19 non-inversion chromosomes. Observed site-frequency spectra compared to expectations based on the Kingman coalescent (Kingman, 1982) (no sweepstakes), the Ξ-Beta(2-α,α) coalescent (Schweinsberg, 2000) (random sweepstakes) with α=1 (the Bolthausen–Sznitman coalescent, BS), and the Durrett–Schweinsberg coalescent (Durrett and Schweinsberg, 2005) of recurrent selective sweeps (DS) approximate Bayesian computation (ABC) estimated for the PCA groups of each chromosome. Data from the South/south-east population. Non-inversion chromosomes show no peaks at intermediate frequencies as expected under the conjecture. The conjecture of cryptic population structure is rejected.

Appendix 6—figure 14
Population structure, isolation with migration, and population growth under the Kingman coalescent.

(a–i) Simulations using msprime (Kelleher et al., 2016; Baumdicker et al., 2021) of the effects of a mixed sample of two divergent populations evolving under the Kingman coalescent (no sweepstakes model) with population growth on the expected site-frequency spectrum. A two island model with migration rate m and per-generation population growth rate g. The effective number of migrants (Nem) increases from 0.02 to 2, and hence the degree of isolation decreases, going from left to right. The sample size of the minor population (the population from which fewer individuals are sampled) decreases from top to bottom (minor sample size 4…1). The effects of population growth g are displayed with different colours. Simulated model expectations (solid lines) compared to observed data (circles) of chromosome 4 estimated with GL1 and Gma as outgroup from the South/south-east population for comparison. Also included is the expectation of the Durrett–Schweinsberg coalescent (Durrett and Schweinsberg, 2005) (selective sweepstakes model, best-fit DS: triangle) to observations. (i) Only a minor sample size of one combined with the highest rate of migration and particular growth rates gives closest resemblance to observations.

Appendix 6—figure 15
Population structure, isolation with migration, and population growth under the Ξ-Beta coalescent.

(a–i) Simulations using msprime (Kelleher et al., 2016; Baumdicker et al., 2021) of the effects of a mixed sample of two divergent populations evolving under the Ξ-Beta(2-α,α) coalescent (the random sweepstakes model) on the expected site-frequency spectrum. A two island model with migration and different values of the α parameter (displayed with different colours). The effective number of migrants b (comparable to Nem in Appendix 6—figure 14) increases, and hence the degree of isolation decreases, going from left to right. The sample size of the minor population decreases from top to bottom (4–1). Simulated model expectations (solid lines) compared to observed data (circles) of chromosome 4 estimated with GL1 and Gma as outgroup from the South/south-east population for comparison. Also included is the expectation of the Durrett–Schweinsberg coalescent (Durrett and Schweinsberg, 2005) (selective sweepstakes model, best-fit DS: triangle) to observations. (i) Only a minor sample size of one combined with the highest rate of migration and particular values of α gives closest resemblance to observations.

Appendix 6—figure 16
Estimated site-frequency spectra with a leave-one-out approach.

Estimated site-frequency spectra for chromosome 4 of 67 individuals, leaving out each individual in turn from the 68 individuals of the South/south-east population. Circles are the site-frequency spectrum of the original sample of 68 individuals. Based on the simulation results in Appendix 6—figure 14 and Appendix 6—figure 15 that a minor sample size of one can resemble model expectations, one of the leave-one-out samples should be divergent if the sample of 68 individuals is composed of 67 individuals from one population and a single individual from a divergent population. None of the leave-one-out samples is divergent, so this conjecture is rejected.
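The leave-one-out logic can be sketched with called genotypes: recompute the site-frequency spectrum once for each individual removed and compare the resulting spectra. This is only an illustration under stated assumptions, since the study estimates spectra from genotype likelihoods rather than hard genotype calls; the function names and the toy genotype matrix are hypothetical.

```python
def sfs_from_genotypes(genotypes):
    """Unfolded SFS from diploid genotype calls.

    genotypes[k][site] is the derived allele count (0, 1, or 2) of
    individual k at that site. Returns a list of length 2n-1 whose
    entry i-1 counts sites with derived allele count i.
    """
    n_chrom = 2 * len(genotypes)
    sfs = [0] * (n_chrom - 1)
    for site in zip(*genotypes):
        count = sum(site)
        if 0 < count < n_chrom:       # keep segregating sites only
            sfs[count - 1] += 1
    return sfs

def leave_one_out_sfs(genotypes):
    """One SFS per individual removed, in input order."""
    return [
        sfs_from_genotypes(genotypes[:k] + genotypes[k + 1:])
        for k in range(len(genotypes))
    ]

# Toy data: 3 individuals, 3 sites.
toy = [[0, 1, 2], [1, 0, 0], [0, 0, 1]]
print(sfs_from_genotypes(toy))
for spectrum in leave_one_out_sfs(toy):
    print(spectrum)
```

If a single sampled individual came from a divergent population, the spectrum computed without that individual would be expected to stand apart from the others.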

Appendix 6—figure 17
Observed site-frequency spectra at inversion chromosomes and coalescent expectations for the South/south-east population.

(a–d) Observed site-frequency spectra estimated with GL1 for the four inversion chromosomes, chromosome 1 (Chr01), chromosome 2 (Chr02), chromosome 7 (Chr07), and chromosome 12 (Chr12), respectively, in the South/south-east population (sample size n=68). Error bars are ±2 standard deviations of the bootstrap distribution with 100 bootstrap replicates. Expected site-frequency spectra are the Kingman coalescent (the no sweepstakes model), the Durrett–Schweinsberg (DS) coalescent (selective sweepstakes model) approximate Bayesian computation (ABC) estimated for the non-inversion chromosomes (DS non-inv), and the best approximate maximum likelihood estimates (Eldon et al., 2015) of the Ξ-Beta model (the random sweepstakes model). The best estimated α̂ values were α̂_Ξ=1.16, α̂_Ξ=1.16, α̂_Ξ=1.16, and α̂_Ξ=1.12, for chromosomes 1, 2, 7, and 12, respectively.

Appendix 6—figure 18
Observed site-frequency spectra at inversion chromosomes and coalescent expectations for the Þistilfjörður population.

(a–d) Observed site-frequency spectra estimated with GL1 for the four inversion chromosomes, chromosome 1 (Chr01), chromosome 2 (Chr02), chromosome 7 (Chr07), and chromosome 12 (Chr12), respectively, for the Þistilfjörður population (sample size n=71). Error bars are ±2 standard deviations of the bootstrap distribution with 100 bootstrap replicates. Expected site-frequency spectra are the Kingman coalescent (the no sweepstakes model), the Durrett–Schweinsberg (DS) coalescent (selective sweepstakes model) approximate Bayesian computation (ABC) estimated for the non-inversion chromosomes (non-inv), and the best approximate maximum likelihood estimates (Eldon et al., 2015) of the Ξ-Beta model (the random sweepstakes model). The best estimated α̂ values are α̂_Ξ=1.18, α̂_Ξ=1.16, α̂_Ξ=1.17, and α̂_Ξ=1.08, for chromosomes 1, 2, 7, and 12, respectively.

Appendix 6—figure 19
Observed site-frequency spectra compared to SLiM forward simulated site-frequency spectra based on demographic scenarios with and without selective sweeps and with background selection and recurrent bottlenecks.

Forward simulation using SLiM (Haller and Messer, 2019). (a–c) Each scenario has two islands of initial population size 300. Both islands undergo exponential growth at per-generation rate g until a total size of 1000. The per-generation migration probability is m. The SLiM simulation is run until the whole population has a most recent common ancestor (MRCA), at which point 136 haploid genomes (as the sample from the South/south-east population) are drawn from the population. Each scenario is simulated 1000 times to estimate the mean normalized site-frequency spectrum. The genome length is set to 100 kb, and the recombination and mutation rates are 10⁻⁸ per site per generation. The ‘No sweeps’ scenario undergoes deleterious mutations with fitness effects described by a gamma distribution with mean d and shape parameter 0.2. The ‘Sweeps’ scenario has the same deleterious mutations, and also beneficial mutations with a fixed fitness effect of sm. The relative rate of these positive mutations to the deleterious ones is sr. The observed site-frequency spectrum is the mean of the 100 kb fragments across all non-inversion chromosomes. Only sweeps scenarios show U-shaped site-frequency spectra. (d) Results of simulations of background selection. In all cases a population of size N=10⁵ evolves according to the Wright–Fisher model assuming a chromosome segment of size 10⁵ bp with recombination rate 10⁻⁷ per site per generation that collects neutral or negative mutations with frequency μ=10⁻⁷ per site per generation. Negative mutations were modelled as Gamma-distributed with a negative sign, with mean −0.1 (a, b, d) or −0.05 (c, e, f), all with shape parameter 0.2. The relative frequency of negative versus neutral mutations was 1:1 for (a, b, d) and 1:9 for (c, e, f). The points represent the logits of the normalized site-frequency spectrum of a random sample of 136 chromosomes (corresponding to the sample size of the South/south-east population) averaged over 100 (a, b, c, e) and 10 (d, f) replicates and taken after 10⁵ generations (b, e), 2 × 10⁵ generations (a, c), and 10⁶ generations (d, f). U-shaped site-frequency spectra were found only for short runs (b, e). (e, f) Simulations were produced by the C++ simulation code forward, available at https://github.com/eldonb/forward (Eldon, 2022a), for individual-based forward-in-time simulations with random sweepstakes, randomly occurring bottlenecks, and selection. Haploid model in e and diploid model in f.
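
The logit transform of a normalized site-frequency spectrum mentioned in the caption can be sketched as follows. This is a minimal illustration, not the authors' plotting code: the function names and the toy counts are hypothetical, and the spectrum is assumed to be normalized by the total number of segregating sites.

```python
import math

def normalized_sfs(counts):
    """Divide each frequency class by the total number of segregating sites."""
    total = sum(counts)
    return [c / total for c in counts]

def logit(x):
    """Log-odds transform log(x / (1 - x)), defined for 0 < x < 1."""
    return math.log(x / (1.0 - x))

# Hypothetical unfolded spectrum (not data from the paper):
# entry i is the number of sites whose derived allele occurs i + 1 times.
toy_counts = [50, 20, 10, 5, 15]
toy_sfs = normalized_sfs(toy_counts)
toy_logits = [logit(x) for x in toy_sfs]
```

On the logit scale the rare high-frequency classes are stretched out, so an upturn in the right tail (a U shape) is easier to see than on the raw proportion scale.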

Appendix 6—figure 20
Neighbour joining tree of gadid taxa.

Based on p-distance (nucleotide substitutions per nucleotide site) of whole genome among the gadid taxa Atlantic cod (Gadus morhua, Gmo), walleye pollock (G. chalcogramma, Gch), Pacific cod (G. macrocephalus, Gma), Greenland cod (G. ogac, Gog), and Arctic cod (Boreogadus saida, Bsa). Under the assumption that the focal taxa, Atlantic cod and walleye pollock, diverged 3.5 × 10⁶ years ago (Vermeij, 1991; Vermeij and Roopnarine, 2008; Coulson et al., 2006; Carr and Marshall, 2008), the distance between these taxa is used for mutation rate estimation in Appendix 7—table 4.

Appendix 6—figure 21
Schematic illustration of the three coalescent models, Kingman (no sweepstakes), Xi-Beta (random sweepstakes), and DS (selective sweepstakes).

(a) In each generation, any given pair of diploid parents in a low-fecundity population produces only a small number of offspring, a no-sweepstakes scenario. At most two ancestral lineages (shown as blue lines) of a sample can, therefore, be involved in a given family with non-negligible probability in a large population, leading to at most two ancestral lineages merging each time when the ancestral tree is viewed on a coalescent timescale of N generations per coalescent time unit. (b) In a highly fecund population reproducing according to random sweepstakes reproduction, a given pair of diploid parents may produce a huge number of offspring, scooping up a number of ancestral lineages of a sample (shown as blue lines) in an instance of random sweepstakes. The resulting gene genealogy may include multiple and simultaneous multiple mergers of ancestral lineages of a sample. (c) An example of the effects of selective sweepstakes through repeated selective sweeps on the genealogy of a neutral site. Shown is a hypothetical history of ancestral lineages of a sample (blue lines) at the neutral site during a sweep of the beneficial allelic type B at a site different from the neutral site. At the start of a sweep a single chromosome not ancestral to the sample experiences a mutation to type B. During the sweep one of the ancestral chromosomes has several descendants while another (shown in dotted blue lines) manages to ‘escape’ a sweep by recombining onto a ‘b’ background. At the end of the sweep all chromosomes have a ‘B’ background; however, not all of the ancestral lineages will trace back to the initial B chromosome. Since we are only interested in the genealogy at the neutral site, only the ancestral relations of the neutral site are shown (blue lines). Viewed on a coalescent timescale of N time units per one coalescent time unit, the sweep happens instantaneously, and thus appears as an instantaneous merger of three lineages in the genealogy of the neutral site.

Appendix 6—figure 22
Relative diversity and the compound parameter c along chromosome 4.

The compound parameter c=δs²/γ of the Durrett–Schweinsberg model measures the rate of selective sweeps (δ) times the squared selection coefficient (s²) of the beneficial mutation over the recombination rate (γ) between the selected site and the neutral site of interest. The compound parameter c can be considered to be essentially the density of selection per map unit along a chromosome (Aeschbacher et al., 2017). The number of single nucleotide polymorphisms (SNPs) in a 25 kb fragment is proportional to branch length, which in turn is proportional to the compound parameter c. The relative diversity, the number of SNPs in a fragment normalized by the mean number of SNPs per fragment on the chromosome, is indicative of the density of selection along the chromosome.
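
As a small worked illustration of the two quantities in this caption, the sketch below evaluates c=δs²/γ and a relative diversity per fragment. The numerical inputs are hypothetical, the function names are illustrative, and the normalization (SNP count per fragment divided by the mean count over fragments of the same chromosome) is an assumed reading of the caption rather than the authors' exact procedure.

```python
def compound_parameter(delta, s, gamma):
    """Compound parameter c = delta * s**2 / gamma of the Durrett-Schweinsberg model."""
    return delta * s ** 2 / gamma

def relative_diversity(snp_counts):
    """SNP count of each fragment divided by the mean count over all fragments."""
    mean_count = sum(snp_counts) / len(snp_counts)
    return [n / mean_count for n in snp_counts]

# Hypothetical inputs, chosen only to make the arithmetic easy to follow
c = compound_parameter(delta=2.0, s=0.1, gamma=0.001)   # 2 * 0.01 / 0.001
rel = relative_diversity([120, 80, 100, 100])           # mean count is 100
```

With these toy numbers c is 20 and the four fragments have relative diversities 1.2, 0.8, 1.0, and 1.0.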

Tables

Appendix 7—table 1
Diversity and neutrality test statistics for the South/south-east population.

Watterson’s estimator of the population scaled mutation rate per nucleotide site θW, the pairwise nucleotide diversity per nucleotide site θπ, Tajima’s DT, Fu and Li’s DF, and number of nucleotide sites based on GL1 and GL2 likelihoods (sample size n=68).

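| Chromosome | θW (GL1) | θπ (GL1) | DT (GL1) | DF (GL1) | nSites (GL1) | θW (GL2) | θπ (GL2) | DT (GL2) | DF (GL2) | nSites (GL2) |
|---|---|---|---|---|---|---|---|---|---|---|
| Chr01 | 0.0046 | 0.0024 | −1.64 | −5.77 | 18332422 | 0.0056 | 0.0025 | −1.84 | −6.71 | 18332093 |
| Chr02 | 0.0050 | 0.0020 | −1.98 | −6.00 | 15828347 | 0.0060 | 0.0022 | −2.11 | −6.84 | 15828079 |
| Chr03 | 0.0053 | 0.0020 | −2.09 | −6.22 | 20202769 | 0.0063 | 0.0021 | −2.21 | −6.98 | 20202435 |
| Chr04 | 0.0054 | 0.0020 | −2.08 | −6.03 | 22584280 | 0.0065 | 0.0022 | −2.19 | −6.79 | 22583924 |
| Chr05 | 0.0053 | 0.0020 | −2.10 | −6.22 | 15542562 | 0.0064 | 0.0021 | −2.22 | −6.99 | 15542313 |
| Chr06 | 0.0052 | 0.0019 | −2.11 | −6.33 | 17720989 | 0.0062 | 0.0021 | −2.22 | −7.09 | 17720709 |
| Chr07 | 0.0056 | 0.0022 | −2.01 | −5.88 | 21080002 | 0.0066 | 0.0024 | −2.13 | −6.64 | 21079620 |
| Chr08 | 0.0054 | 0.0020 | −2.09 | −6.09 | 18353883 | 0.0065 | 0.0022 | −2.21 | −6.85 | 18353624 |
| Chr09 | 0.0053 | 0.0019 | −2.13 | −6.42 | 18195728 | 0.0063 | 0.0021 | −2.25 | −7.16 | 18195437 |
| Chr10 | 0.0051 | 0.0019 | −2.09 | −6.27 | 17450729 | 0.0061 | 0.0020 | −2.21 | −7.06 | 17450432 |
| Chr11 | 0.0050 | 0.0018 | −2.14 | −6.54 | 20138893 | 0.0059 | 0.0019 | −2.26 | −7.32 | 20138619 |
| Chr12 | 0.0043 | 0.0016 | −2.14 | −6.32 | 19448827 | 0.0053 | 0.0017 | −2.26 | −7.18 | 19448580 |
| Chr13 | 0.0049 | 0.0018 | −2.14 | −6.38 | 18651575 | 0.0059 | 0.0019 | −2.26 | −7.18 | 18651311 |
| Chr14 | 0.0053 | 0.0019 | −2.14 | −6.34 | 20704894 | 0.0063 | 0.0020 | −2.25 | −7.09 | 20704623 |
| Chr15 | 0.0054 | 0.0019 | −2.17 | −6.41 | 18100213 | 0.0064 | 0.0020 | −2.27 | −7.15 | 18099944 |
| Chr16 | 0.0051 | 0.0019 | −2.09 | −6.13 | 22233178 | 0.0061 | 0.0021 | −2.21 | −6.93 | 22232862 |
| Chr17 | 0.0053 | 0.0020 | −2.06 | −5.99 | 11813809 | 0.0063 | 0.0022 | −2.18 | −6.78 | 11813609 |
| Chr18 | 0.0053 | 0.0019 | −2.11 | −6.23 | 15931558 | 0.0063 | 0.0021 | −2.23 | −7.01 | 15931312 |
| Chr19 | 0.0055 | 0.0020 | −2.10 | −6.23 | 13858302 | 0.0065 | 0.0022 | −2.21 | −6.98 | 13858066 |
| Chr20 | 0.0050 | 0.0018 | −2.15 | −6.56 | 16371168 | 0.0059 | 0.0019 | −2.27 | −7.33 | 16370967 |
| Chr21 | 0.0052 | 0.0019 | −2.10 | −6.29 | 14440220 | 0.0062 | 0.0021 | −2.22 | −7.07 | 14440024 |
| Chr22 | 0.0054 | 0.0020 | −2.08 | −6.12 | 13838440 | 0.0065 | 0.0022 | −2.19 | −6.89 | 13838214 |
| Chr23 | 0.0052 | 0.0020 | −2.08 | −6.27 | 14698719 | 0.0062 | 0.0021 | −2.20 | −7.05 | 14698473 |
| All | 0.0052 | 0.0019 | −2.08 | −6.22 | 17631370 | 0.0062 | 0.0021 | −2.20 | −7.00 | 17631099 |
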
Appendix 7—table 2
Diversity and neutrality test statistics for the Þistilfjörður population.

θW Watterson’s estimator of the population scaled mutation rate per nucleotide site, θπ the pairwise nucleotide diversity per nucleotide site, Tajima’s DT, Fu and Li’s DF, and number of nucleotide sites based on GL1 and GL2 likelihoods (sample size n=71).

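| Chromosome | θW (GL1) | θπ (GL1) | DT (GL1) | DF (GL1) | nSites (GL1) | θW (GL2) | θπ (GL2) | DT (GL2) | DF (GL2) | nSites (GL2) |
|---|---|---|---|---|---|---|---|---|---|---|
| Chr01 | 0.0068 | 0.0037 | −1.51 | −5.99 | 16159362 | 0.0090 | 0.0040 | −1.84 | −7.55 | 16159148 |
| Chr02 | 0.0069 | 0.0030 | −1.86 | −6.18 | 14306627 | 0.0092 | 0.0034 | −2.10 | −7.65 | 14306351 |
| Chr03 | 0.0073 | 0.0029 | −1.99 | −6.38 | 18283815 | 0.0096 | 0.0033 | −2.19 | −7.76 | 18283555 |
| Chr04 | 0.0074 | 0.0030 | −1.97 | −6.14 | 20435443 | 0.0097 | 0.0034 | −2.17 | −7.52 | 20435122 |
| Chr05 | 0.0073 | 0.0029 | −2.00 | −6.36 | 13933982 | 0.0096 | 0.0032 | −2.20 | −7.74 | 13933752 |
| Chr06 | 0.0072 | 0.0028 | −2.00 | −6.46 | 16048768 | 0.0094 | 0.0032 | −2.21 | −7.84 | 16048531 |
| Chr07 | 0.0076 | 0.0034 | −1.83 | −6.06 | 19008270 | 0.0099 | 0.0038 | −2.05 | −7.46 | 19007926 |
| Chr08 | 0.0074 | 0.0030 | −1.98 | −6.20 | 16559106 | 0.0097 | 0.0033 | −2.18 | −7.59 | 16558861 |
| Chr09 | 0.0073 | 0.0028 | −2.03 | −6.59 | 16381498 | 0.0096 | 0.0032 | −2.23 | −7.93 | 16381249 |
| Chr10 | 0.0070 | 0.0028 | −1.98 | −6.42 | 15789838 | 0.0093 | 0.0032 | −2.19 | −7.83 | 15789584 |
| Chr11 | 0.0069 | 0.0026 | −2.04 | −6.73 | 18211081 | 0.0091 | 0.0029 | −2.24 | −8.12 | 18210846 |
| Chr12 | 0.0061 | 0.0024 | −2.03 | −6.52 | 17597347 | 0.0082 | 0.0027 | −2.24 | −8.07 | 17597135 |
| Chr13 | 0.0068 | 0.0026 | −2.04 | −6.58 | 16846892 | 0.0090 | 0.0029 | −2.24 | −8.01 | 16846697 |
| Chr14 | 0.0073 | 0.0028 | −2.04 | −6.52 | 18699877 | 0.0095 | 0.0031 | −2.23 | −7.89 | 18699625 |
| Chr15 | 0.0074 | 0.0028 | −2.06 | −6.54 | 16349327 | 0.0097 | 0.0031 | −2.25 | −7.86 | 16349118 |
| Chr16 | 0.0070 | 0.0028 | −1.98 | −6.27 | 20259494 | 0.0092 | 0.0032 | −2.19 | −7.71 | 20259231 |
| Chr17 | 0.0072 | 0.0030 | −1.93 | −6.09 | 10667396 | 0.0095 | 0.0033 | −2.15 | −7.52 | 10667225 |
| Chr18 | 0.0072 | 0.0029 | −2.00 | −6.39 | 14305479 | 0.0095 | 0.0032 | −2.21 | −7.79 | 14305261 |
| Chr19 | 0.0075 | 0.0030 | −1.98 | −6.33 | 12465223 | 0.0098 | 0.0034 | −2.18 | −7.68 | 12465024 |
| Chr20 | 0.0069 | 0.0026 | −2.06 | −6.73 | 14829191 | 0.0091 | 0.0029 | −2.25 | −8.11 | 14829009 |
| Chr21 | 0.0071 | 0.0029 | −1.99 | −6.43 | 13014009 | 0.0094 | 0.0032 | −2.20 | −7.83 | 13013813 |
| Chr22 | 0.0074 | 0.0030 | −1.97 | −6.30 | 12407034 | 0.0097 | 0.0034 | −2.17 | −7.70 | 12406815 |
| Chr23 | 0.0072 | 0.0029 | −1.97 | −6.40 | 13273011 | 0.0094 | 0.0032 | −2.18 | −7.81 | 13272801 |
| All | 0.0071 | 0.0029 | −1.97 | −6.37 | 15905742 | 0.0094 | 0.0032 | −2.18 | −7.78 | 15905508 |
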
Appendix 7—table 3
Demographic statistics, correction factor, C, and generation length, G, of the female component of Atlantic cod in Iceland.

Age-specific survival rate, li, was based, respectively, on the average and the 1948–1952 and the 1963–1967 instantaneous mortality estimated from tagging experiments of Icelandic cod (Jónsson, 1996). Age-specific fecundity, bi, was based on the average age-specific weight in catch (Anonymous, 2001) and fecundity by weight relationships (Marteinsdottir and Begg, 2002) and similar relationships for Newfoundland cod for comparison (May, 1967). C and G are, respectively, the correction factor for the effects of overlapping generations and the generation time based on demographic estimation (Jorde and Ryman, 1995; Jorde and Ryman, 1996; Laikre et al., 1998) and iteration of Equations 5–9 in Jorde and Ryman, 1996. The table is truncated at age class 15 for lack of population data on older age classes.

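| Age class | Age | l̄i | li (’48–’52) | li (’63–’67) | bi×10⁶ (GM) | bi×10⁶ (May) |
|---|---|---|---|---|---|---|
| 0 | 1 | 1.0000 | 1.0000 | 1.0000 | 0.00 | 0 |
| 1 | 2 | 0.3396 | 0.4966 | 0.2369 | 0.00 | 0 |
| 2 | 3 | 0.1153 | 0.2466 | 0.0561 | 0.00 | 0 |
| 3 | 4 | 0.0392 | 0.1225 | 0.0133 | 0.38 | 0.52 |
| 4 | 5 | 0.0133 | 0.0608 | 0.0032 | 0.62 | 0.78 |
| 5 | 6 | 0.0045 | 0.0302 | 0.0007 | 1.01 | 1.15 |
| 6 | 7 | 0.0015 | 0.0150 | 0.0002 | 1.59 | 1.67 |
| 7 | 8 | 0.0005 | 0.0074 | 0.0000 | 2.37 | 2.31 |
| 8 | 9 | 0.0002 | 0.0037 | 0.0000 | 3.28 | 3.03 |
| 9 | 10 | 0.0001 | 0.0018 | 0.0000 | 4.24 | 3.73 |
| 10 | 11 | 0.0000 | 0.0009 | 0.0000 | 5.30 | 4.48 |
| 11 | 12 | 0.0000 | 0.0005 | 0.0000 | 6.41 | 5.24 |
| 12 | 13 | 0.0000 | 0.0002 | 0.0000 | 7.68 | 6.07 |
| 13 | 14 | 0.0000 | 0.0001 | 0.0000 | 8.79 | 6.78 |
| 14 | 15 | 0.0000 | 0.0001 | 0.0000 | 10.42 | 7.79 |

C: 10.5, 7.9, 17.6, 20.0
G: 5.1, 6.3, 4.6, 4.6
C/G: 2.1, 1.3, 3.8, 3.8
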
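
To make the link between the life-table columns and the generation length G concrete, the sketch below computes the mean age of reproduction, G = Σ x·lx·bx / Σ lx·bx, from the average survival column and the GM fecundity column. This textbook definition is an assumption made for illustration; the authors cite the Jorde and Ryman equations, which are not reproduced here, and the correction factor C is not computed. Pairing the average survival schedule with the GM fecundity column is likewise an assumption.

```python
# Ages 1-15, average survival l_i, and GM fecundity b_i (x 10^6) as read from the table
ages = list(range(1, 16))
survival = [1.0000, 0.3396, 0.1153, 0.0392, 0.0133, 0.0045, 0.0015, 0.0005,
            0.0002, 0.0001, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]
fecundity = [0.00, 0.00, 0.00, 0.38, 0.62, 1.01, 1.59, 2.37,
             3.28, 4.24, 5.30, 6.41, 7.68, 8.79, 10.42]

def generation_length(ages, survival, fecundity):
    """Mean age of reproduction: sum(x * l_x * b_x) / sum(l_x * b_x)."""
    weights = [l * b for l, b in zip(survival, fecundity)]
    return sum(x * w for x, w in zip(ages, weights)) / sum(weights)

G = generation_length(ages, survival, fecundity)
```

With the rounded table entries this gives a value close to the first tabulated G of 5.1; small differences are expected because the inputs are rounded to four decimals.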
Appendix 7—table 4
Genetic divergence between the Atlantic cod and walleye pollock sister taxa and rate of evolution.

The p-distance, the proportion of nucleotide sites that differ between the sister taxa Atlantic cod and walleye pollock (Appendix 6—figure 20), was estimated with ngsDist (Vieira et al., 2015), setting the total number of sites (--tot_sites) equal to the number of sites that pass quality filtering in the estimation of site-frequency spectra (Appendix 7—table 7). The mutation rate μ, the p-distance per nucleotide site per year, is calculated under the assumption that these taxa diverged 3.5×10⁶ years ago (Vermeij, 1991; Vermeij and Roopnarine, 2008; Coulson et al., 2006; Carr and Marshall, 2008). The number of substitutions per year, based on the number of sites in each chromosome (chromosomal length, last column), and its inverse, the number of years per substitution, are the rates for either lineage. Also given are the average over the chromosomes and the whole-genome numbers. Based on the overall p-distances between the Atlantic cod sample from the South/south-east population (sample size n=68) and a sample of 36 walleye pollock from a single locality in the Gulf of Alaska.

| Chromosome | p per site | μ=p per site per year | Number of substitutions per year | Number of years per substitution | Number of sites |
|---|---|---|---|---|---|
| Chr01 | 0.00504 | 7.21×10⁻¹⁰ | 0.022 | 45 | 30875876 |
| Chr02 | 0.00500 | 7.14×10⁻¹⁰ | 0.021 | 49 | 28732775 |
| Chr03 | 0.00492 | 7.03×10⁻¹⁰ | 0.022 | 46 | 30954429 |
| Chr04 | 0.00490 | 7.00×10⁻¹⁰ | 0.031 | 33 | 43798135 |
| Chr05 | 0.00512 | 7.31×10⁻¹⁰ | 0.018 | 54 | 25300426 |
| Chr06 | 0.00508 | 7.25×10⁻¹⁰ | 0.020 | 50 | 27762770 |
| Chr07 | 0.00511 | 7.29×10⁻¹⁰ | 0.025 | 40 | 34137969 |
| Chr08 | 0.00497 | 7.11×10⁻¹⁰ | 0.021 | 47 | 29710654 |
| Chr09 | 0.00518 | 7.40×10⁻¹⁰ | 0.020 | 51 | 26487948 |
| Chr10 | 0.00513 | 7.33×10⁻¹⁰ | 0.020 | 50 | 27234273 |
| Chr11 | 0.00505 | 7.22×10⁻¹⁰ | 0.022 | 45 | 30713045 |
| Chr12 | 0.00495 | 7.08×10⁻¹⁰ | 0.022 | 46 | 30948897 |
| Chr13 | 0.00523 | 7.47×10⁻¹⁰ | 0.022 | 46 | 28829685 |
| Chr14 | 0.00508 | 7.26×10⁻¹⁰ | 0.021 | 47 | 29586942 |
| Chr15 | 0.00499 | 7.13×10⁻¹⁰ | 0.020 | 49 | 28657694 |
| Chr16 | 0.00498 | 7.12×10⁻¹⁰ | 0.025 | 40 | 34794352 |
| Chr17 | 0.00502 | 7.16×10⁻¹⁰ | 0.016 | 64 | 21723002 |
| Chr18 | 0.00513 | 7.33×10⁻¹⁰ | 0.018 | 55 | 24902675 |
| Chr19 | 0.00529 | 7.56×10⁻¹⁰ | 0.017 | 60 | 22015597 |
| Chr20 | 0.00506 | 7.23×10⁻¹⁰ | 0.018 | 56 | 24843429 |
| Chr21 | 0.00521 | 7.45×10⁻¹⁰ | 0.017 | 60 | 22358821 |
| Chr22 | 0.00516 | 7.37×10⁻¹⁰ | 0.018 | 57 | 23744039 |
| Chr23 | 0.00529 | 7.56×10⁻¹⁰ | 0.019 | 52 | 25242006 |
| Average | 0.00508 | 7.26×10⁻¹⁰ | 0.021 | 49 | 28406758 |
| Genome | 0.00507 | 7.25×10⁻¹⁰ | 0.474 | 2 | 653355439 |

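
The arithmetic behind the columns of this table can be checked with a short sketch. The factor of two in the denominator (divergence accumulates along both lineages since the split) is inferred from the tabulated values rather than stated in the caption, and the function names are illustrative.

```python
DIVERGENCE_YEARS = 3.5e6  # assumed split time of Atlantic cod and walleye pollock

def mutation_rate(p_distance, divergence_years=DIVERGENCE_YEARS):
    """Per-site, per-year rate along one lineage: p / (2 * T)."""
    return p_distance / (2.0 * divergence_years)

def substitutions_per_year(mu, n_sites):
    """Expected substitutions per year over n_sites sites for one lineage."""
    return mu * n_sites

# Chr01 row: p = 0.00504 and 30875876 sites
mu_chr01 = mutation_rate(0.00504)
subs_chr01 = substitutions_per_year(mu_chr01, 30875876)
years_per_sub_chr01 = 1.0 / subs_chr01
```

For Chr01 this gives a rate of about 7.2×10⁻¹⁰ per site per year, about 0.022 substitutions per year, and about 45 years per substitution, in line with the tabulated row up to rounding of the p-distance.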
Appendix 7—table 5
A list of key terms and a brief description.
| Term | Description |
|---|---|
| High fecundity | The ability of organisms (e.g. broadcast spawners) to produce huge numbers of offspring, or on the order of the population size |
| Sweepstakes reproduction | High variance and high skew in the distribution of number of offspring, where most of the time individuals produce small (relative to the population size) number of offspring, but occasionally a few individuals contribute the bulk of the offspring forming a new generation of reproducing individuals |
| Random sweepstakes | A chance matching of reproduction in a highly fecund population with favorable environmental conditions; random sweepstakes is one example of a mechanism turning high fecundity into sweepstakes reproduction |
| Selective sweepstakes | A mechanism turning high fecundity into sweepstakes reproduction, in which juveniles pass through selective filters during their development, resulting in highly skewed offspring distribution |
| Moran model | A population model of genetic reproduction, in which a single random individual produces one offspring replacing another individual that perishes to keep the population size constant |
| Genealogy | The ancestral relations of a sample of gene copies (see Appendix 6—figure 21) |
| Coalescent | A probabilistic model of the random ancestral relations of a hypothetical sample of gene copies |
| Multiple-merger coalescent | A coalescent process in which a random number of ancestral lineages merges each time (see Appendix 6—figure 21) |
| Ξ-Beta (2-α,α)-coalescent | A multiple-merger coalescent derived from a model of random sweepstakes |
| Durrett–Schweinsberg model | A model of recurrent selective sweeps of a new beneficial mutation each time approximating selective sweepstakes |
| Durrett–Schweinsberg coalescent | A coalescent model for the genealogy at a single site linked to a site experiencing beneficial mutation; during a sweep some lineages of the neutral site may escape a sweep through recombination (see Appendix 6—figure 21) |

Appendix 7—table 6
Approximate Bayesian computation (ABC) priors of parameters for the various analyses.
| Parameter | ABC prior |
|---|---|
| α for the Beta (2-α,α)-coalescent | Uniform between 1.01 and 1.99 |
| β, the growth rate for the Beta (2-α,α)-coalescent with population growth | Improper, uniform prior on the whole positive half-line |
| c for the single-locus DS model | Improper, uniform prior on the whole positive half-line |
| c for the DS model with recombination | Uniform between 10 and 25 (to force consistency with the posterior in the single-locus analysis) |
| γ/s, the ratio of the recombination rate and the selection coefficient, in the DS model with recombination | Uniform between 0 and 10,000 |
| θ, the mutation rate in the DS model with recombination | Uniform between 0 and 10,000 |
| Fraction of whole-chromosome sweeps in the DS model with recombination | Uniform between 0 and 1 |

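
For readers who want to experiment with the proper priors listed above, a minimal sketch of drawing one parameter set is given below. The dictionary keys are hypothetical names introduced here, the two improper priors are omitted because they cannot be sampled directly, and nothing in this sketch reproduces the authors' ABC implementation.

```python
import random

# Proper uniform priors from the table, as (lower, upper) bounds.
# Key names are illustrative only.
PRIORS = {
    "alpha": (1.01, 1.99),            # Beta(2 - alpha, alpha)-coalescent
    "c_recomb": (10.0, 25.0),         # c in the DS model with recombination
    "gamma_over_s": (0.0, 10000.0),   # recombination rate over selection coefficient
    "theta": (0.0, 10000.0),          # mutation rate in the DS model with recombination
    "whole_chrom_fraction": (0.0, 1.0),
}

def draw_from_priors(rng):
    """Draw one value per parameter, independently and uniformly within its bounds."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PRIORS.items()}

rng = random.Random(1)  # fixed seed so the draw is repeatable
draw = draw_from_priors(rng)
```

In an ABC analysis such draws would be passed to a simulator and kept or rejected according to how closely the simulated summary statistics match the observed ones; that step is outside the scope of this sketch.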
Appendix 7—table 7
Genetic divergence between the Atlantic cod and walleye pollock sister taxa and rate of evolution from GL1 estimated site-frequency spectra.

The site-frequency spectrum of the South/south-east population of Atlantic cod estimated with ANGSD and genotype likelihood GL1 (Korneliussen et al., 2014), using walleye pollock (Gch) as outgroup to polarize the spectrum, gives all sites that pass quality filtering L, the number of invariant sites I, the number of segregating sites S, and the number of fixed sites F between the focal population and the outgroup taxon. The number of substitutions per year and the number of years per substitution are calculated from fixed sites under the assumption that these taxa diverged 3.5×10⁶ years ago (Vermeij, 1991; Vermeij and Roopnarine, 2008; Coulson et al., 2006; Carr and Marshall, 2008). The average over the chromosomes and the whole-genome numbers are also given. Compare to Appendix 7—table 4 and Appendix 7—table 8.

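| Chromosome | All sites, L | Invariant sites, I | Segregating sites, S | Fixed sites, F | Substitutions per year | Years per substitution |
|---|---|---|---|---|---|---|
| Chr01 | 18350418 | 17736728 | 468247 | 145443 | 0.042 | 24 |
| Chr02 | 15850624 | 15269222 | 437440 | 143962 | 0.041 | 24 |
| Chr03 | 20231166 | 19467361 | 592044 | 171761 | 0.049 | 20 |
| Chr04 | 22623179 | 21742567 | 673837 | 206775 | 0.059 | 17 |
| Chr05 | 15557754 | 14963290 | 457852 | 136612 | 0.039 | 26 |
| Chr06 | 17738562 | 17090577 | 506727 | 141258 | 0.040 | 25 |
| Chr07 | 21107906 | 20282169 | 645738 | 180000 | 0.051 | 19 |
| Chr08 | 18381649 | 17681336 | 549023 | 151290 | 0.043 | 23 |
| Chr09 | 18212083 | 17528065 | 533448 | 150571 | 0.043 | 23 |
| Chr10 | 17472145 | 16829837 | 491408 | 150899 | 0.043 | 23 |
| Chr11 | 20157683 | 19439102 | 550466 | 168115 | 0.048 | 21 |
| Chr12 | 19475709 | 18838352 | 465219 | 172138 | 0.049 | 20 |
| Chr13 | 18669907 | 18002288 | 504278 | 163341 | 0.047 | 21 |
| Chr14 | 20723905 | 19946397 | 605101 | 172407 | 0.049 | 20 |
| Chr15 | 18123369 | 17435024 | 538832 | 149513 | 0.043 | 23 |
| Chr16 | 22268819 | 21460587 | 624520 | 183712 | 0.052 | 19 |
| Chr17 | 11831346 | 11376461 | 344921 | 109964 | 0.031 | 32 |
| Chr18 | 15955850 | 15348766 | 461840 | 145244 | 0.041 | 24 |
| Chr19 | 13869827 | 13314508 | 421341 | 133978 | 0.038 | 26 |
| Chr20 | 16390870 | 15807585 | 448550 | 134735 | 0.038 | 26 |
| Chr21 | 14455156 | 13911966 | 414247 | 128943 | 0.037 | 27 |
| Chr22 | 13854159 | 13314972 | 413965 | 125222 | 0.036 | 28 |
| Chr23 | 14714440 | 14154540 | 424496 | 135403 | 0.039 | 26 |
| Mean | 17652892 | 16997465 | 503197 | 152230 | 0.043 | 23 |
| Genome | 406016526 | 390941701 | 11573540 | 3501286 | 1.000 | 1 |
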
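
The relationship between the columns of this table can be sketched as below. Dividing the number of fixed sites by the divergence time without a factor of two is inferred from the tabulated values, not stated in the caption, and the function name is illustrative.

```python
DIVERGENCE_YEARS = 3.5e6  # assumed split time of Atlantic cod and walleye pollock

def fixed_site_rate(n_fixed, divergence_years=DIVERGENCE_YEARS):
    """Substitutions per year implied by the count of fixed differences."""
    return n_fixed / divergence_years

# Chr01 row of the GL1 table: all, invariant, segregating, and fixed sites
all_sites, invariant, segregating, fixed = 18350418, 17736728, 468247, 145443

# The three site classes partition the sites that pass filtering
partition_gap = all_sites - (invariant + segregating + fixed)

rate_chr01 = fixed_site_rate(fixed)
years_per_sub_chr01 = 1.0 / rate_chr01
```

For Chr01 the partition is exact and the rate is about 0.042 substitutions per year, or about 24 years per substitution. In some other rows the three classes sum to within one site of L, which is consistent with rounding of estimated spectra.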
Appendix 7—table 8
Genetic divergence between the Atlantic cod and walleye pollock sister taxa and rate of evolution from GL2 estimated site-frequency spectra.

The site-frequency spectrum of the South/south-east population of Atlantic cod estimated with ANGSD and genotype likelihood GL2 (Korneliussen et al., 2014), using walleye pollock (Gch) as outgroup to polarize the spectrum, gives all sites that pass quality filtering L, the number of invariant sites I, the number of segregating sites S, and the number of fixed sites F between the focal population and the outgroup taxon. The number of substitutions per year and the number of years per substitution are calculated from fixed sites under the assumption that these taxa diverged 3.5×10⁶ years ago (Vermeij, 1991; Vermeij and Roopnarine, 2008; Coulson et al., 2006; Carr and Marshall, 2008). The average over the chromosomes and the whole-genome numbers are also given. Compare to Appendix 7—table 4 and Appendix 7—table 7.

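| Chromosome | All sites, L | Invariant sites, I | Segregating sites, S | Fixed sites, F | Substitutions per year | Years per substitution |
|---|---|---|---|---|---|---|
| Chr01 | 18350189 | 17645066 | 561297 | 143825 | 0.041 | 24 |
| Chr02 | 15850406 | 15184885 | 523368 | 142153 | 0.041 | 25 |
| Chr03 | 20230947 | 19356488 | 704986 | 169473 | 0.048 | 21 |
| Chr04 | 22622912 | 21614191 | 804994 | 203726 | 0.058 | 17 |
| Chr05 | 15557576 | 14877918 | 544750 | 134908 | 0.039 | 26 |
| Chr06 | 17738379 | 16995520 | 603445 | 139414 | 0.040 | 25 |
| Chr07 | 21107635 | 20164813 | 765533 | 177289 | 0.051 | 20 |
| Chr08 | 18381450 | 17578929 | 653357 | 149164 | 0.043 | 23 |
| Chr09 | 18211894 | 17429901 | 633282 | 148712 | 0.042 | 24 |
| Chr10 | 17471932 | 16736760 | 586177 | 148995 | 0.043 | 23 |
| Chr11 | 20157484 | 19333140 | 658299 | 166045 | 0.047 | 21 |
| Chr12 | 19475530 | 18739806 | 565949 | 169775 | 0.049 | 21 |
| Chr13 | 18669720 | 17904983 | 603304 | 161433 | 0.046 | 22 |
| Chr14 | 20723717 | 19835886 | 717441 | 170390 | 0.049 | 21 |
| Chr15 | 18123186 | 17334782 | 640824 | 147580 | 0.042 | 24 |
| Chr16 | 22268589 | 21341070 | 746271 | 181248 | 0.052 | 19 |
| Chr17 | 11831198 | 11309988 | 412831 | 108379 | 0.031 | 32 |
| Chr18 | 15955648 | 15261569 | 550677 | 143402 | 0.041 | 24 |
| Chr19 | 13869662 | 13237797 | 499507 | 132359 | 0.038 | 26 |
| Chr20 | 16390728 | 15721255 | 536397 | 133077 | 0.038 | 26 |
| Chr21 | 14454994 | 13833280 | 494317 | 127397 | 0.036 | 27 |
| Chr22 | 13853971 | 13237823 | 492661 | 123487 | 0.035 | 28 |
| Chr23 | 14714263 | 14074517 | 506140 | 133606 | 0.038 | 26 |
| Mean | 17652696 | 16902190 | 600252 | 150254 | 0.043 | 23 |
| Genome | 406012010 | 388750367 | 13805807 | 3455837 | 0.987 | 1 |
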
Appendix 7—table 9
Hardy–Weinberg test of PCA groups as inversion genotypes.

Observed O and Hardy–Weinberg expected E haplotype frequencies, allele frequency p, X² test statistic distributed as χ², and probability P of the test statistic. Arranged by chromosome and by population. Based on the assumption that groups revealed by principal component analysis (PCA) represent composite genotypes of inversion haplotypes.

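| Chromosome | PCA group | O (South/south-east) | E (South/south-east) | p (South/south-east) | X² (South/south-east) | P (South/south-east) | O (Þistilfjörður) | E (Þistilfjörður) | p (Þistilfjörður) | X² (Þistilfjörður) | P (Þistilfjörður) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Chr01 | AA | 7 | 7.44 | 0.33 | 0.06 | 0.80 | 31 | 28.52 | 0.63 | 1.60 | 0.21 |
| Chr01 | AB | 31 | 30.11 | | | | 28 | 32.96 | | | |
| Chr01 | BB | 30 | 30.44 | | | | 12 | 9.52 | | | |
| Chr02 | CC | 41 | 30.76 | 0.76 | 0.69 | 0.41 | 36 | 39.56 | 0.75 | 4.99 | 0.03 |
| Chr02 | CD | 22 | 24.47 | | | | 34 | 26.87 | | | |
| Chr02 | DD | 5 | 3.76 | | | | 1 | 4.56 | | | |
| Chr07 | EE | 48 | 48.62 | 0.85 | 0.33 | 0.56 | 42 | 43.38 | 0.78 | 0.92 | 0.36 |
| Chr07 | EF | 19 | 17.76 | | | | 27 | 24.23 | | | |
| Chr07 | FF | 1 | 1.62 | | | | 2 | 3.38 | | | |
| Chr12 | GG | 62 | 61.13 | 0.96 | 0.14 | 0.70 | 62 | 61.35 | 0.93 | 1.38 | 0.24 |
| Chr12 | GH | 6 | 5.74 | | | | 8 | 9.30 | | | |
| Chr12 | HH | 0 | 0.13 | | | | 1 | 0.35 | | | |
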
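
The Hardy–Weinberg calculation summarized in this table can be sketched as follows, using the Chr01 counts for the South/south-east population. The single degree of freedom (three genotype classes, minus one, minus one estimated allele frequency) is the standard choice and is assumed here; the function name is illustrative and this is not the authors' code.

```python
import math

def hardy_weinberg_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of genotype counts against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)          # frequency of the first haplotype
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(x2 / 2.0))
    return p, expected, x2, p_value

# Chr01, South/south-east: observed AA, AB, BB counts
p_hat, expected, x2, p_value = hardy_weinberg_test(7, 31, 30)
```

For these counts the sketch gives an allele frequency of about 0.33, expected counts of about 7.44, 30.11, and 30.44, a test statistic of about 0.06, and a probability close to the tabulated 0.80.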
Appendix 7—table 10
Genetic diversity and background selection simulations.

The genetic variation accumulated under different cases in SLiM (Haller and Messer, 2019) simulations of background selection (Appendix 6—figure 19d). In all cases a population of size N=10⁵ evolves according to the Wright–Fisher model assuming a chromosome segment of size 10⁵ bp with recombination rate 10⁻⁷ per site per generation that collects neutral or negative mutations with frequency μ=10⁻⁷ per site per generation, as specified below. Negative mutations were modelled as Gamma-distributed with a negative sign, with mean −0.1 (A, B, D) or −0.05 (C, E, F), all with shape parameter 0.2. The relative frequency of negative versus neutral mutations was 1:1 for (A, B, D) and 1:9 for (C, E, F). The points represent the logits of the normalized site-frequency spectrum of a random sample of 136 chromosomes (corresponding to the sample size of the South/south-east population) averaged over 100 (A, B, C, E) and 10 (D, F) replicates and taken after 10⁵ generations (B, E), 2 × 10⁵ generations (A, C), and 10⁶ generations (D, F).

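| Case | Average number of segregating sites | Average Π | Average π per seg site |
|---|---|---|---|
| A | 8934.5 | 1257.0 | 0.14 |
| B | 7765.2 | 872.2 | 0.11 |
| C | 15568.8 | 2248.7 | 0.14 |
| D | 9896.6 | 1574.0 | 0.16 |
| E | 13001.8 | 1426.9 | 0.11 |
| F | 18857.7 | 3370.9 | 0.18 |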


  1. Einar Árnason
  2. Jere Koskela
  3. Katrín Halldórsdóttir
  4. Bjarki Eldon
(2023)
Sweepstakes reproductive success via pervasive and recurrent selective sweeps
eLife 12:e80781.
https://doi.org/10.7554/eLife.80781