Regions of the genome reflect correlated sets of genealogical relationships (A), each of which represents a set of ancestors with varying spatial positions back in time. We extract genotypes from …
Simulations were on a 50 × 50 landscape and error is expressed in map units. (A) True and predicted locations by population mean dispersal rate and number of SNPs. 450 randomly sampled individuals …
(A) Error for runs with 100,000 SNPs and varying numbers of training samples. (B) Error for runs with 450 training samples and varying numbers of SNPs.
The first three epochs (with very high loss) were excluded from the plot to improve axis scaling.
Number of SNPs per run is shown on the right. Both methods were run on randomly selected SNPs with minor allele count >2 from the first five million base pairs of chromosome 2L.
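The SNP selection described here (minor allele count >2, then a random subset) can be sketched as follows. This is a minimal illustration, not the code used for the figure: it assumes a diploid genotype matrix of alternate-allele counts (0, 1, 2) with no missing calls, and the function name `filter_by_minor_allele_count` and its parameters are hypothetical.

```python
import numpy as np

def filter_by_minor_allele_count(genotypes, min_count=2, n_snps=None, seed=0):
    """Keep SNPs whose minor allele count exceeds min_count.

    genotypes: (n_snps, n_samples) array of alternate-allele counts (0/1/2).
    Assumes diploid samples and no missing data (both are assumptions).
    Returns the filtered matrix and the row indices that were kept.
    """
    genotypes = np.asarray(genotypes)
    alt_count = genotypes.sum(axis=1)
    total_alleles = 2 * genotypes.shape[1]
    # The minor allele is whichever of ref/alt is rarer at each site.
    minor_count = np.minimum(alt_count, total_alleles - alt_count)
    keep = np.flatnonzero(minor_count > min_count)
    # Optionally draw a random subset of the passing SNPs without replacement.
    if n_snps is not None and n_snps < keep.size:
        rng = np.random.default_rng(seed)
        keep = np.sort(rng.choice(keep, size=n_snps, replace=False))
    return genotypes[keep], keep

# Toy example: 4 SNPs scored in 4 diploid samples.
toy = np.array([
    [0, 0, 0, 1],   # minor allele count 1
    [1, 1, 1, 0],   # minor allele count 3
    [2, 2, 2, 2],   # fixed, minor allele count 0
    [2, 2, 1, 2],   # minor allele count 1
])
filtered, kept_rows = filter_by_minor_allele_count(toy)
```

Only the second SNP passes the filter in the toy matrix, since it is the only site where the rarer allele appears more than twice.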
Black points are predictions from 2 Mbp windows, blue points are training sample locations, and the red point is the true location for each individual. Contours show the 95%, 50%, and 10% quantiles …
Here, we generate predictions (black points) from bootstrap samples of the complete genotype matrix (in contrast to using separate sets of SNPs extracted from windows as used for figures in the main …
The geographic centroid of per-window predictions for each individual is shown in black points, and lines connect predicted to true locations. Sample localities are colored by the mean test error …
Boxplots show the distribution of Euclidean distance between the true and predicted locations of validation samples across 10 replicate training runs. Network shapes are described on the horizontal …
The geographic centroid of per-window predictions for each individual is shown in black points, and lines connect predicted to true locations. Sample localities are colored by the mean test error …
scales from 0 (maximum complexity) to 1 (minimum complexity). The blue line shows a linear regression. High within-host diversity does not appear to explain outliers in Locator’s prediction …
The geographic centroid of per-window predictions for each individual is shown in black points, and lines connect predicted to true locations. Sample localities are colored by the mean test error …
(A) Windowed Locator predictions for Maya sample HGDP00871. (B) PCAs of all HGDP samples run on SNPs extracted from windows with predicted locations in western Europe (left) and west Africa (right). …
The top 2% of windows by test error were excluded from this analysis. The slope of the least-squares linear fit is −99.9723 km/(cM/Mbp) and has adjusted .
Triangles show approximate centromere locations.
Triangles show approximate centromere locations.
Despite differences in error among genomic windows (Figure 8—figure supplements 1 and 2), error in the mean genome-wide predicted location is very similar when using megabase (top) or centimorgan …
Black points show sampling locations. Arrows are colored by genotype at variant rs3827760 and point towards the predicted location. Frequency of the A allele by longitude is shown below the map.
Validation error in terms of map units and generations of mean population dispersal for Locator runs in simulations with 450 training samples and 100,000 SNPs.
Note that while absolute error increases along with dispersal rate, it is roughly constant when expressed in terms of generations of mean dispersal.
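The two error scales in this table can be related with a short sketch. It assumes that error in generations is the Euclidean error in map units divided by the mean per-generation dispersal distance; that conversion, and the function names below, are assumptions made for illustration rather than the analysis code behind the table.

```python
import numpy as np

def prediction_error(true_xy, pred_xy):
    """Euclidean distance between true and predicted locations, per sample."""
    diff = np.asarray(pred_xy, dtype=float) - np.asarray(true_xy, dtype=float)
    return np.linalg.norm(diff, axis=1)

def error_in_generations(error_map_units, dispersal_per_generation):
    """Rescale error by mean dispersal distance per generation (assumed conversion)."""
    return np.asarray(error_map_units, dtype=float) / dispersal_per_generation

# Toy example on a landscape measured in map units.
true_xy = [[0.0, 0.0], [10.0, 10.0]]
pred_xy = [[3.0, 4.0], [10.0, 12.0]]
err = prediction_error(true_xy, pred_xy)                        # map units
err_gen = error_in_generations(err, dispersal_per_generation=0.5)
```

Under this rescaling, a simulation with twice the dispersal distance and twice the absolute error has the same error in generations, which is the pattern the note above describes.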
Mean and median prediction error for Locator and SPASIBA run on simulations and Anopheles data as shown in Figure 3.
Error is in terms of map units for simulated data (total landscape width = 50).
Test error for windowed analyses of empirical datasets using the location with highest kernel density and the centroid of per-window predictions, as median (90% interval).
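The two per-individual summaries compared here, the centroid of per-window predictions and the location of highest kernel density, can be sketched as below. This is an illustrative sketch only: it uses a fixed-bandwidth Gaussian kernel evaluated on a regular grid, treats coordinates as planar, and reads the "90% interval" as the 5th to 95th percentiles. The bandwidth, grid size, percentile choice, and all function names are assumptions, not the settings used for the table.

```python
import numpy as np

def centroid(points):
    """Mean of the per-window predicted locations."""
    return np.asarray(points, dtype=float).mean(axis=0)

def kde_peak(points, bandwidth=1.0, grid_size=100):
    """Grid location with the highest Gaussian kernel density.

    bandwidth and grid_size are illustrative defaults (assumptions).
    """
    pts = np.asarray(points, dtype=float)
    xs = np.linspace(pts[:, 0].min(), pts[:, 0].max(), grid_size)
    ys = np.linspace(pts[:, 1].min(), pts[:, 1].max(), grid_size)
    gx, gy = np.meshgrid(xs, ys)
    grid = np.column_stack([gx.ravel(), gy.ravel()])
    # Squared distance from every grid cell to every predicted point.
    sq_dist = ((grid[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    # Unnormalized density; normalization does not change the argmax.
    density = np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1)
    return grid[np.argmax(density)]

def median_and_interval(errors, lower=5, upper=95):
    """Median with a central interval (5th-95th percentile assumed for 90%)."""
    lo, med, hi = np.percentile(np.asarray(errors, dtype=float), [lower, 50, upper])
    return med, (lo, hi)

# Toy example: five windows agree closely and one is a distant outlier.
windows = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1], [10, 10]])
center = centroid(windows)
peak = kde_peak(windows)
summary = median_and_interval(np.arange(101))
```

In the toy example the centroid is pulled toward the outlying window, while the density peak stays with the cluster of agreeing windows, which is why the two summaries can give different test errors.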