The spatiotemporal patterns of major human admixture events during the European Holocene

  1. Manjusha Chintalapati  Is a corresponding author
  2. Nick Patterson  Is a corresponding author
  3. Priya Moorjani  Is a corresponding author
  1. Department of Molecular and Cell Biology, University of California, Berkeley, United States
  2. Broad Institute of Harvard and MIT, United States
  3. Human Evolutionary Biology, Harvard University, United States
  4. Center for Computational Biology, University of California, Berkeley, United States
15 figures, 9 tables and 3 additional files

Figures

Figure 1 with 9 supplements
Simulation results.

We constructed n admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry using ~380,000 genome-wide SNPs for admixture dates ranging between 10 and 200 generations. To minimize any issues with overfitting, we used French and Yoruba from the Human Genome Diversity Panel as reference populations in DATES (Distribution of Ancestry Tracts of Evolutionary Signals). We show the true time of admixture (X-axis, in generations) and the estimated time of admixture (±1 SE) (Y-axis, in generations). Standard errors were calculated using a weighted block jackknife approach by removing one chromosome in each run (Materials and methods). (A) Effect of sample size: We varied the sample size (n) of the target group between 1 and 10 individuals. (B) Effect of data quality: To mimic the features of ancient genomes, we generated n=10 target individuals with pseudo-haploid genotypes and a missing genotype rate of 10% (orange), 30% (purple), and 60% (green). See Figure 1—figure supplements 1–9 for additional simulations to test the performance of DATES. R code to replicate this figure is available at: https://github.com/manjushachintalapati/DATES_EuropeanHolocene/blob/main/1.R.
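
The leave-one-chromosome-out standard error mentioned in this caption can be illustrated with a short sketch. It uses the standard weighted delete-m jackknife formulas and assumes that each chromosome is weighted by its SNP count; the function name and the input values are hypothetical, and this is not necessarily how DATES implements the calculation.

```python
import math

def weighted_block_jackknife(full_estimate, loo_estimates, block_weights):
    """Return (bias-corrected estimate, standard error).

    full_estimate: estimate computed from all chromosomes.
    loo_estimates: estimates recomputed with chromosome j removed.
    block_weights: weight m_j of chromosome j (assumed here: its SNP count).
    """
    g = len(loo_estimates)
    n = sum(block_weights)
    # Weighted jackknife point estimate.
    theta_j = g * full_estimate - sum(
        (1 - m / n) * t for t, m in zip(loo_estimates, block_weights)
    )
    # Variance from the weighted pseudo-values.
    variance = 0.0
    for t, m in zip(loo_estimates, block_weights):
        h = n / m
        pseudo = h * full_estimate - (h - 1) * t
        variance += (pseudo - theta_j) ** 2 / (h - 1)
    variance /= g
    return theta_j, math.sqrt(variance)

# Hypothetical admixture dates (generations) for five equally weighted blocks.
estimate, se = weighted_block_jackknife(
    100.0, [98.0, 102.0, 100.0, 101.0, 99.0], [1, 1, 1, 1, 1]
)
```

With equal weights the formulas reduce to the ordinary delete-one jackknife, which is a convenient sanity check on the implementation.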

Figure 1—figure supplement 1
Varying time of admixture up to 300 generations.

We simulated data for 10 admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry and varied the time of admixture between 10 and 300 generations. The X-axis shows the true time of admixture, and the Y-axis shows the estimated time of admixture (±1 SE) inferred using DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

Figure 1—figure supplement 2
Impact of sample size of the target (admixed) and reference populations.

We simulated n admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry and applied DATES with m reference samples of French and Yoruba ancestry. (A) Effect of sample size of target population. Each panel shows the results of simulations with n target individuals shown in the legend and m=28 French and m=21 Yoruba reference samples from each source group. (B) Effect of sample size (m) of reference populations. Each panel shows the results of simulations with n=10 target individuals and m reference samples from each source group shown in the legend. The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 3
Impact of admixture proportion.

We simulated data for 10 admixed individuals with European (CEU) ancestry (α) in the range of 1–40% (the rest derived from Africans). We ran DATES (Distribution of Ancestry Tracts of Evolutionary Signals) to infer the time of admixture and ancestry proportion. (A) Impact on the estimated time of admixture: Each panel shows the estimated date of admixture for a different value of α shown in the legend. The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis. (B) Impact on estimated ancestry proportion: Each panel shows the estimated proportion of admixture for a different value of α shown in the legend. The red dashed horizontal line further indicates the value of α used. The true time of admixture is shown on X-axis with the inferred proportion of admixture on Y-axis.

Figure 1—figure supplement 4
Impact of divergence between the ancestral population and reference populations used in DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

We simulated 10 admixed individuals with 80% European (CEU) and 20% African (YRI) ancestry. We applied DATES to infer the timing of admixture using reference populations. In each panel, we show the estimated dates of admixture using French and a group that is increasingly divergent from Yoruba (shown in the legend as the FST with Yoruba). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 5
Impact of divergence between the two source populations.

We simulated n admixed individuals with 20% European (CEU) and 80% ancestry from a range of populations with increasing relatedness to Europeans (shown in the legend as the FST to Europeans). Specifically, the other reference population we used was either West Africans (YRI), East Asians (CHB), South Americans (MXL) or Southern Europeans (TSI). We used the following reference populations for the inference: French (for all simulations) with one of the other references as either Yoruba, Tujia, Maya, or Italian, respectively. We show results for varying target sample sizes of (A) n=10 and (B) n=1. The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis. We note the inferred dates for CEU/TSI mixtures were not significant for older timescales and hence not shown.

Figure 1—figure supplement 6
Impact of using the admixed individuals themselves as one of the reference groups in DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

We simulated data for 10 admixed individuals with European (CEU) ancestry (α) in the range of 20–80% (the rest derived from Africans [YRI]) using CEU and YRI as reference populations. Using a non-overlapping set of CEU and YRI individuals, we generated 10 additional individuals that we used as reference samples in DATES. For each simulation, we ran DATES with Europeans (French) and a non-overlapping set of simulated admixed individuals as the reference populations (shown in blue), or Yoruba and simulated admixed individuals (shown in orange). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 7
Impact of sample size and data quality of target samples.

We simulated data for n admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry. In each panel, we varied three key features of the data from the target population, notably sample size (n=1 or 10), type of genotypes (diploid or pseudo-haploid) and missing genotype rate (between 10% and 60%). (A) Diploid genotypes with missing data for n=10 admixed individuals. Each panel shows the results of x% of missing diploid genotypes (shown in the legend). (B) Pseudo-haploid genotypes with missing data for n=10 admixed individuals. Each panel shows the results of x% of missing pseudo-haploid genotypes (shown in the legend). (C) Pseudo-haploid genotypes with missing data for n=1 admixed individuals. Each panel shows the results of x% of missing pseudo-haploid genotypes (shown in the legend). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.
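
As a rough illustration of how diploid genotypes can be degraded to mimic ancient DNA, the sketch below samples one allele at random at heterozygous sites and masks a fraction of sites as missing. The 0/2 coding of pseudo-haploid calls, the function name, and the random input are assumptions made for this example; the simulation code used for the figure may differ.

```python
import random

def to_pseudo_haploid(diploid, missing_rate, rng):
    """Convert diploid genotypes to pseudo-haploid calls with missing data.

    diploid: alternate-allele counts (0, 1, 2), or None if already missing.
    Returns calls coded 0 or 2 (one sampled allele, written as a
    homozygote), or None for a missing genotype.
    """
    calls = []
    for g in diploid:
        if g is None or rng.random() < missing_rate:
            calls.append(None)                 # masked as missing
        elif g == 1:
            calls.append(rng.choice((0, 2)))   # pick one allele at random
        else:
            calls.append(g)                    # homozygotes are unchanged
    return calls

rng = random.Random(1)
diploid = [rng.choice((0, 1, 2)) for _ in range(1000)]  # hypothetical genotypes
haploid = to_pseudo_haploid(diploid, 0.3, rng)          # 30% missing rate
```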

Figure 1—figure supplement 8
Impact of data quality of target and reference populations as a function of divergence between true and reference populations used in DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

We simulated data for n=10 admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry with pseudo-haploid genotypes. The reference populations used also had pseudo-haploid genotypes. We further varied three key features of the data, missing genotype rate in reference populations, missing genotype rate in target populations, and divergence between true source populations and reference population used for the analysis. In each row, we show the admixture dates using reference populations with increasing divergence to true source population (FST shown in the row title). In each column, we varied the missing genotype rate in the target population (shown in the column title). Further, each panel shows results of missing data in the reference genomes (shown in the legend). (a) Reference populations of French and Yoruba (FST(true, reference)~0). (b) Reference populations of French and Bantu Kenya (FST(true, reference)~0.009). (c) Reference populations of French and San (FST(true, reference)~0.103). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 1—figure supplement 9
Impact of small sample size and data quality of target and reference populations as a function of divergence between true and reference populations used in DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

We simulated data for n=1 admixed individuals with 20% European (CEU) and 80% African (YRI) ancestry with pseudo-haploid genotypes. The reference populations used also had pseudo-haploid genotypes. We further varied three key features of the data, missing genotype rate in reference populations, missing genotype rate in target populations, and divergence between true source populations and reference population used for the analysis. In each row, we show the admixture dates using reference populations with increasing divergence to true source population (FST shown in the row title). In each column, we varied the missing genotype rate in the target population (shown in the column title). Further, each panel shows results of missing data in the reference genomes (shown in the legend). (a) Reference populations of French and Yoruba (FST(true, reference)~0). (b) Reference populations of French and Bantu Kenya (FST(true, reference)~0.009). (c) Reference populations of French and San (FST(true, reference)~0.103). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Figure 2 with 2 supplements
Timeline of admixture events in ancient Europe.

We applied DATES (Distribution of Ancestry Tracts of Evolutionary Signals) to ancient samples from Europe. In the right panel, we show the sampling locations of the ancient specimens, and in the left panel, we show the admixture dates for each target group listed on the X-axis. The inferred dates in generations were converted to dates in BCE by assuming a mean generation time of 28 years (Moorjani et al., 2011) and accounting for the average sampling age (shown as gray dots) of all ancient individuals in the target group (Materials and methods). The top panel shows the formation of western hunter-gatherer (WHG)-eastern hunter-gatherer (EHG) cline (in blue) using Mesolithic hunter-gatherers (HGs) as the target and EHG and WHG as reference populations. The middle panel shows admixture dates of local HGs and Anatolian farmers (in orange) using Neolithic European groups as targets and Anatolian farmers-related groups and WHG-related groups as reference populations. The bottom panel shows the spread of Steppe pastoralist-related ancestry (in green) estimated using middle and late Neolithic, Chalcolithic, and Bronze Age samples from Europe as target populations and early Steppe pastoralist-related groups (Afanasievo and Yamnaya Samara) and a set of Anatolian farmers and WHG-related groups as reference populations. For the middle to late Bronze Age (MLBA) samples from Eurasia, we used the early Steppe pastoralist-related groups and the Neolithic European groups as reference populations. The cultural affiliation (Corded Ware Complex [CWC], Bell Beaker complex [BBC], or Steppe MLBA cultures) of the individuals is shown in the legend. See Figure 2—figure supplements 1 and 2 for decay curves for all samples and stratified dates for Iron Gates HGs. R code to replicate this figure is available at: https://github.com/manjushachintalapati/DATES_EuropeanHolocene/blob/main/3.R.
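
The conversion from generations to calendar dates described in this caption can be sketched as follows. The sketch assumes that sampling ages are given in years before present (BP, relative to 1950 CE) and that uncertainty in the sampling age is ignored; the function name and the input values are hypothetical.

```python
GENERATION_YEARS = 28      # mean generation time used in the caption
BP_REFERENCE_YEAR = 1950   # assumption: "before present" is relative to 1950 CE

def admixture_date_bce(generations, se_generations, mean_sample_age_bp):
    """Return (admixture date in BCE, standard error in years).

    generations: admixture time in generations before the individuals lived.
    mean_sample_age_bp: average sampling age of the target group, in years BP.
    """
    date_bp = mean_sample_age_bp + generations * GENERATION_YEARS
    return date_bp - BP_REFERENCE_YEAR, se_generations * GENERATION_YEARS

# Hypothetical target group: 20 ± 2 generations, sampled on average 5000 BP.
date_bce, se_years = admixture_date_bce(20, 2, 5000)
```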

Figure 2—figure supplement 1
DATES (Distribution of Ancestry Tracts of Evolutionary Signals) ancestry covariance decay curves.

We show the weighted ancestry covariance decay curves generated using DATES for all the target groups analyzed in the study. Each subplot shows the decay curve for one target population with the associated reference groups shown in the title. For each target, in the legend, we show the inferred average dates of admixture (±1 SE) in generations before the individual lived, in BCE that accounts for the average age of all the individuals in the target and the mean generation time of human populations (see Materials and methods). We also show the normalized root-mean-square deviation (NRMSD) values for all fitted curves and the plots with NRMSD >0.7 are shown in gray. For consistency, we use the same colors as Figure 2.

Figure 2—figure supplement 2
Timing of western hunter-gatherer (WHG) and eastern hunter-gatherer (EHG) admixture in Iron Gates hunter-gatherer (HG) samples.

We show the time of admixture in Iron Gates HG samples grouped into 500-year bins of C14 age. The C14 age is shown on the X-axis and the admixture time in BCE for the corresponding samples is shown on the Y-axis.

Figure 3
Genetic formation of early Anatolian farmers and early Bronze Age Steppe pastoralists.

The top panel shows a map with sampling locations of the target groups analyzed for admixture dating. The bottom panels show the inferred times of admixture for each target using DATES (Distribution of Ancestry Tracts of Evolutionary Signals) by fitting an exponential function with an affine term, $y = Ae^{-\lambda d} + c$, where d is the genetic distance in Morgans and λ = t+1, with t the number of generations since admixture (Materials and methods). We start the fit at a genetic distance (d) >0.5 cM (centiMorgans) to minimize confounding with background LD and estimate a standard error by performing a weighted block jackknife removing one chromosome in each run. For each target, in the legend, we show the inferred average dates of admixture (±1 SE) in generations before the individual lived, in BCE accounting for the average age of all the individuals and the mean human generation time, and the normalized root-mean-square deviation (NRMSD) values to assess the fit of the exponential curve (Materials and methods). The bottom left shows the ancestry covariance decay curve for early Anatolian farmers inferred using one reference group as a set of pooled individuals of western hunter-gatherer (WHG)-related and Levant Neolithic farmers-related individuals as a proxy of Anatolian hunter-gatherer (AHG) ancestry and the second reference group containing Iran Neolithic farmer-related individuals. The bottom right shows the ancestry covariance decay curve for early Steppe pastoralists groups, including all Yamnaya and Afanasievo individuals as the target group and eastern hunter-gatherer (EHG)-related and Iran Neolithic farmer-related groups as reference populations. R code to replicate this figure is available at: https://github.com/manjushachintalapati/DATES_EuropeanHolocene/blob/main/2.R.
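
To make the curve fit in this caption concrete, the sketch below fits an exponential with an affine term to a decay curve and reports t = λ − 1. It is a simplified stand-in, not the DATES optimizer: it scans a grid of λ values and solves for A and c by ordinary least squares at each one. The function name, the grid, and the synthetic data are assumptions made for the example.

```python
import math

MIN_DISTANCE = 0.005  # 0.5 cM expressed in Morgans; the fit starts above this

def fit_affine_exponential(distances, covariances, lambda_grid):
    """Fit y = A * exp(-lam * d) + c and return (lam, A, c)."""
    pts = [(d, y) for d, y in zip(distances, covariances) if d > MIN_DISTANCE]
    n = len(pts)
    mean_y = sum(y for _, y in pts) / n
    best = None
    for lam in lambda_grid:
        # For fixed lam the model is linear in x = exp(-lam * d).
        x = [math.exp(-lam * d) for d, _ in pts]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        sxy = sum((xi - mean_x) * (y - mean_y) for xi, (_, y) in zip(x, pts))
        a = sxy / sxx
        c = mean_y - a * mean_x
        rss = sum((y - (a * xi + c)) ** 2 for xi, (_, y) in zip(x, pts))
        if best is None or rss < best[0]:
            best = (rss, lam, a, c)
    _, lam, a, c = best
    return lam, a, c

# Synthetic, noise-free decay curve with lam = 51 (i.e. t = 50 generations).
distances = [0.001 * i for i in range(1, 201)]
covariances = [0.01 * math.exp(-51 * d) + 0.001 for d in distances]
lam, a, c = fit_affine_exponential(distances, covariances, range(1, 301))
generations = lam - 1
```

A grid search keeps the example dependency-free; a nonlinear least-squares routine would serve the same purpose on real, noisy curves.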

Appendix 1—figure 1
Impact of the discretization parameter (qbin) on accuracy.

We show three subplots for a sample size of n=1 (Panel A) and n=20 (Panel B). For each subplot, we simulated data for n admixed individuals with 20% ancestry from Europeans (1000 Genomes, CEU) and 80% ancestry from Africans (1000 Genomes, YRI) with the time of admixture (λ) shown on the X-axis and the estimated admixture time inferred using DATES on Y-axis. We ran DATES using varying qbin values shown in different colors.

Appendix 1—figure 2
Impact of the discretization parameter (qbin) on runtime.

We show three subplots for sample sizes of n=1 (top left), n=20 (top right), and n=100 (bottom). For each subplot, we simulated data for n admixed individuals with 20% ancestry from Europeans (1000 Genomes, CEU) and 80% ancestry from Africans (1000 Genomes, YRI) with the time of admixture (λ) of 100 generations ago. We show the impact of qbin (X-axis) on the runtime measured in seconds. For sample sizes n>1, the r² between qbin and runtime is >0.99.

Appendix 1—figure 3
Histogram of the normalized root-mean-square deviation (NRMSD) values computed as the normalized residual between the empirical and fitted decay curves, for all the ancient DNA populations reported in Figure 2—figure supplement 1.

The red vertical line represents the value NRMSD = 0.7, which we used as the threshold to exclude populations from our analysis because visual inspection of fitted curves above this threshold suggests the results are too noisy to make a reliable inference (see Figure 2—figure supplement 1 for all fitted decay curves).
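
A minimal sketch of the NRMSD filter is given below. It assumes that the root-mean-square residual is normalized by the range of the fitted values; the exact normalization is defined in Materials and methods and may differ. The function name and the example curves are hypothetical.

```python
import math

NRMSD_THRESHOLD = 0.7  # threshold quoted in this caption

def nrmsd(observed, fitted):
    """Root-mean-square residual divided by the range of the fitted curve."""
    n = len(observed)
    rmsd = math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / n)
    return rmsd / (max(fitted) - min(fitted))

# Hypothetical empirical and fitted decay values at three distances.
observed = [1.1, 0.4, 0.35]
fitted = [1.0, 0.5, 0.25]
value = nrmsd(observed, fitted)
keep_population = value <= NRMSD_THRESHOLD
```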

Appendix 1—figure 4
Ancestry covariance curves for the lowest (left) and highest (right) NRMSD values among the ancient DNA populations in our study.

For details of all curves and NRMSD estimation, see Figure 2—figure supplement 1.

Appendix 1—figure 5
Effect of missing genotypes on the performance of DATES (Distribution of Ancestry Tracts of Evolutionary Signals) and ALDER: We simulated data for 10 admixed individuals with varying proportions of missing data (shown in each panel).

The estimated admixture times (±1 SE) from DATES (green) and ALDER (pink) are shown on Y-axis and the true time of admixture is shown on X-axis. For a fair comparison with ALDER, the dates reported here for DATES are using exponential fit to λ-1 generations (instead of the default of λ generations).

Appendix 2—figure 1
Model for multiple pulses of admixture.

The admixed population (Target) derives ancestry from three populations through two gene flow events. The older pulse occurred t2 generations ago between PopA and PopB, with α1 and α2 ancestry respectively, and produced an intermediate group S. Group S then mixed with PopC, which contributed α3 ancestry, t1 generations ago (younger pulse).
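
Under this model, the final ancestry proportions in the target follow from simple arithmetic: the intermediate group contributes 1 − α3, split between PopA and PopB in the ratio α1 : α2. The sketch below (function name hypothetical) reproduces the proportions quoted for Models A and B in Appendix 2—figure 3.

```python
def final_ancestry(alpha1, alpha2, alpha3):
    """Return final (PopA, PopB, PopC) ancestry proportions in the target.

    alpha1, alpha2: proportions of PopA and PopB in the intermediate group
    (must sum to 1). alpha3: proportion contributed by PopC.
    """
    if abs(alpha1 + alpha2 - 1.0) > 1e-9:
        raise ValueError("alpha1 and alpha2 must sum to 1")
    remaining = 1.0 - alpha3
    return alpha1 * remaining, alpha2 * remaining, alpha3

model_a = final_ancestry(0.2, 0.8, 0.8)  # approximately 4%, 16%, 80%
model_b = final_ancestry(0.2, 0.8, 0.2)  # approximately 16%, 64%, 20%
```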

Appendix 2—figure 2
Multiple pulses of admixture with equal proportions of ancestry from sources.

We generated a target population that has ancestry from three groups (PopA, PopB, PopC) with ancestry proportions of 33%, 33%, and 33% respectively that mixed at two distinct times (t1=30,60,100 and t2=10 generations ago). We used CEU, YRI, and CHB as PopA, PopB, and PopC, varied the order of the three ancestral populations, and applied DATES with pairs of populations as the reference to infer the timing of the mixture. We show the expected dates (t1 or t2 depending on the references used): the orange dashed line corresponds to the older pulse and the blue dashed line corresponds to the younger pulse of admixture. The blue points correspond to DATES estimates using PopA and PopC or PopB and PopC as references. The orange points correspond to DATES estimates using PopA and PopB as references. Panel (A) shows the admixture scenario with PopA = CEU, PopB = YRI, and PopC = CHB. Panel (B) shows the admixture scenario with PopA = CEU, PopB = CHB, and PopC = YRI. Panel (C) shows the admixture scenario with PopA = CHB, PopB = YRI, and PopC = CEU.

Appendix 2—figure 3
Two pulses of admixture with unequal proportions of ancestry from reference populations.

We generated a target population that has ancestry from three groups with variation in ancestry from three sources. Model A: PopA, PopB, PopC with ancestry proportion of 4%, 16%, and 80% respectively that mixed at two distinct times (t1=30,60,100 and t2=10 generations ago). Model B: PopA, PopB, PopC with ancestry proportion of 16%, 64%, and 20% respectively that mixed at two distinct times (t1=30,60,100 and t2=10 generations ago). We used CEU, YRI, and CHB as PopA, PopB, and PopC, varied the order of the three ancestral populations, and applied DATES with pairs of populations as the reference to infer the timing of the mixture. The figure shows the true time of admixture on the X-axis and the inferred time on the Y-axis; the orange dashed line corresponds to the older pulse and the blue dashed line corresponds to the younger pulse of admixture. The blue points correspond to DATES estimates using PopA and PopC or PopB and PopC as references. The orange points correspond to DATES estimates using PopA and PopB as references. Panel (A) shows the admixture scenario with PopA = CEU, PopB = YRI, and PopC = CHB. Panel (B) shows the admixture scenario with PopA = CEU, PopB = CHB, and PopC = YRI. Panel (C) shows the admixture scenario with PopA = CHB, PopB = YRI, and PopC = CEU.

Appendix 2—figure 4
Impact of founder events on inferred dates of admixture.

(A) Schematic for demographic scenario shows that PopC was formed through admixture between PopA and PopB at time TA. Following admixture, PopC experienced a severe bottleneck that occurred TB generations ago where the effective population size decreased from 12,500 to NB for a duration of DB generations. After TB, the population recovered to the original population size. (B) We simulated data for 10 individuals with admixture occurring at 50, 100, and 200 generations with bottleneck post-admixture for a period of DB = 1, 5, or 10 generations (shown in the legend) with the effective population size during bottleneck as NB = 10, 100, 500, or 1000 individuals (shown as four panels). The true admixture time is shown on X-axis, and the estimated time of admixture (±1 SE) is shown on Y-axis.

Appendix 2—figure 5
Impact of founder event with no recovery in admixed population.

Schematic for the demographic history of the admixed group PopC that has ancestry from PopA and PopB followed by a severe bottleneck post-admixture without recovery (i.e., the low population size NB is maintained to the present). The FST (PopA, PopC) is 0.202 and FST (PopB, PopC) is 0.168.

Appendix 2—figure 6
Impact of models with no admixture, severe founder event.

(A) Demographic scenario with three populations. PopA and PopB diverged 1800 generations ago and PopB and PopC diverged 1000 generations ago. PopC had a bottleneck at TB generations ago with population size during bottleneck NB. (B) Ancestry covariance curves for PopC. We simulated data for 25 individuals from PopA and PopB and 10 individuals from PopC. We applied DATES (Distribution of Ancestry Tracts of Evolutionary Signals) on PopC with PopA and PopB as sources and show the decay curves for different timing and effective population size of founder events. Note, none of the simulations show significant exponential fits and all dates include 0 in the estimated CI.

Appendix 2—figure 7
Effect of drift, no admixture: impact of models with no admixture, severe founder event (without recovery).

(A) Demographic scenario with three populations. PopA and PopB diverged 1800 generations ago and PopB and PopC diverged 1000 generations ago. PopC had a bottleneck at TB generations ago with population size during bottleneck NB. The population size NB is maintained to present. (B) Ancestry covariance curves for PopC. We simulated data for 25 individuals from PopA and PopB and 10 individuals from PopC. We applied DATES (Distribution of Ancestry Tracts of Evolutionary Signals) on PopC with PopA and PopB as sources and show the decay curves for different timing and effective population size of founder events.

Tables

Appendix 1—table 1
Comparison of results using DATES (Distribution of Ancestry Tracts of Evolutionary Signals) v753 and v4010 using simulated data.
(A) Simulated data with the target sample size (n) of 10 individuals

| True time of admixture (generations) | DATES (v753) (mean ± SE) | DATES (v4010) (mean ± SE) |
|---|---|---|
| 10 | 10.0±0.5 | 11.0±0.6 |
| 20 | 19.1±1.5 | 19.6±1.5 |
| 30 | 28.0±1.5 | 28.9±1.3 |
| 40 | 46.1±2.1 | 45.7±1.8 |
| 50 | 55.5±2.9 | 55.6±2.8 |
| 60 | 59.4±2.2 | 60.5±2.4 |
| 70 | 69.3±4.2 | 69.8±4.0 |
| 80 | 84.2±4.4 | 84.0±3.9 |
| 90 | 97.7±3.7 | 93.7±3.6 |
| 100 | 107.4±5.4 | 106.7±4.5 |
| 110 | 113.6±5.5 | 112.9±4.7 |
| 120 | 122.7±5.4 | 124.3±5.5 |
| 130 | 138.6±7.7 | 134.1±6.2 |
| 140 | 153.2±9.2 | 152.0±8.4 |
| 150 | 147.5±9.0 | 146.6±8.2 |
| 160 | 181.4±9.8 | 176.6±8.2 |
| 170 | 178.0±8.1 | 175.9±7.4 |
| 180 | 180.7±10.6 | 182.3±9.3 |
| 190 | 172.9±15.3 | 174.8±12.7 |
| 200 | 204.7±17.1 | 208.8±13.4 |
| 210 | 194.5±13.3 | 196.3±11.9 |
| 220 | 255.8±17.0 | 250.7±13.8 |
| 230 | 251.9±18.6 | 237.0±13.0 |
| 240 | 234.7±18.3 | 241.5±14.2 |
| 250 | 228.3±13.8 | 233.2±11.9 |
| 260 | 254.2±21.7 | 253.0±16.1 |
| 270 | 291.4±22.4 | 292.1±20.0 |
| 280 | 252.4±25.2 | 248.1±22.4 |
| 290 | 277.9±22.4 | 285.4±20.5 |
| 300 | 318.6±23.3 | 315.1±20.3 |

(B) Simulated data with sample size (n) ranging between 1 and 20 individuals.

| True time of admixture (generations) | Sample size | DATES (v753) (mean ± SE) | DATES (v4010) (mean ± SE) |
|---|---|---|---|
| 10 | 1 | 6.9±2.5 | 7.8±2.6 |
| 10 | 5 | 8.8±0.8 | 9.9±0.8 |
| 10 | 10 | 9.9±1.2 | 10.8±1.2 |
| 10 | 15 | 10.9±0.7 | 11.8±0.7 |
| 10 | 20 | 10.3±0.6 | 11.3±0.6 |
| 50 | 1 | 51.7±7.1 | 51.5±7.5 |
| 50 | 5 | 59.7±3.5 | 58.6±3.2 |
| 50 | 10 | 48.7±2.7 | 50.1±2.7 |
| 50 | 15 | 54.2±2.1 | 54.7±2.1 |
| 50 | 20 | 52.9±1.9 | 53.1±1.9 |
| 100 | 1 | 124.5±17.6 | 122.5±13.2 |
| 100 | 5 | 107.2±7.6 | 108.2±7.5 |
| 100 | 10 | 103.3±7.7 | 100.1±3.8 |
| 100 | 15 | 99.4±4.5 | 103.4±3.3 |
| 150 | 1 | 136.4±29.2 | 144.2±26.3 |
| 150 | 5 | 142.6±11.4 | 143±11.7 |
| 150 | 10 | 156.9±9 | 158.6±7 |
| 150 | 15 | 142.9±7.8 | 146.1±6.7 |
| 150 | 20 | 156.5±5.4 | 152.9±4.3 |
| 200 | 1 | 195.4±88.4 | 160.2±73.9 |
| 200 | 5 | 210.9±20.7 | 206.7±18.7 |
| 200 | 10 | 225±18.7 | 219.8±18 |
| 200 | 15 | 200±10.6 | 197.7±9 |
| 200 | 20 | 189.4±11 | 190.3±9 |

Appendix 1—table 2
Comparison of results with Narasimhan et al., 2019.

| Population | Reference populations* | DATES (v753) (mean ± SE; in generations) | DATES (v4010) (mean ± SE; in generations) |
|---|---|---|---|
| Indus_Periphery_Pool | AASI and Iranian-farmer-related | 71±15 | 62±7 |
| SPGT | AASI and Steppe-pastoralist-related | 26±3 | 28±3 |

  1. * We used reference populations of AASI ancestry comprising South Asians from the 1000 Genomes Project (Phase 3), including Sri Lankan Tamil from the UK (STU.SG) and Indian Telugu from the UK (ITU.SG), as well as BIR.SG; Iranian farmer-related ancestry including Aigyrzhal_BA, Sarazm_EN, Geoksyur_EN, and Parkhai_Anau_EN; and Steppe-pastoralist-related ancestry including Central_Steppe_MLBA.

Appendix 1—table 3
Comparison of dates of the spread of Neolithic farming from Rivollat et al., 2020.

| Population | n | DATES (v753) | Population in our study (v44 1240K) | n (v44 1240K) | DATES (v4010)# |
|---|---|---|---|---|---|
| Bulgaria_MP_Neolithic | 9 | 8.4±2.3 | Bulgaria_MalakPreslavets_N | 3 | 8.05±3 |
| Serbia_Neolithic | 4 | | Serbia_EN | 3 | 22.8±9.8 |
| Romania_EN | 2 | 32.1±10.4 | Romania_EN* | 2 | 29.7±7.1 |
| Croatia_Impressa | 2 | | Croatia_EN_Impressa | 2 | |
| Hungary_ALPc_MN | 23 | 21.5±4.7 | Hungary_MN_ALPc | 21 | 21.9±1.6 |
| Hungary_LBK_MN | 10 | 12.8±5.2 | Hungary_MN_LBK | 6 | 18.6±7.4 |
| Hungary_ALBK_MN | 2 | 14.8±3.2 | Hungary_MN_ALBK_Szakalhat | 2 | 19.3±3.3 |
| Hungary_LN | 18 | 21.5±3.7 | Hungary_LN | 18 | 28.03±3.8 |
| Austria_LBK_EN | 8 | 15.5±4.6 | Austria_EN_LBK | 9 | 17.6±2.3 |
| Czech_MN | 5 | 18.3±7.7 | Czech_MN | 4 | 32.9±6.3 |
| France_MN | 3 | 26.5±5.6 | France_MN | 43 | 30±1.3 |
| Iberia_EN | 10 | 15.6±2.5 | Spain_EN | 11 | 20.6±3.6 |
| Iberia_MN | 7 | 52.4±4.3 | Spain_MLN | 42 | 56.3±4 |
| Germany_LBK_EN | 27 | 14.4±2.6 | Germany_EN_LBK | 54 | 17.4±2.7 |
| Germany_Blatterhohle_MN | 4 | 12.3±2.5 | Germany_Blatterhohle_MN | 4 | 16.2±2.9 |
| Germany_Esperstedt_MN | 1 | | Germany_MN_Esperstedt | 1 | |
| England_Neolithic | 29 | 45.5±5.5 | England_N.SG | 17 | |
| Wales_Neolithic | 6 | 45.3±7.4 | Wales_N | 4 | 50.7±3.3 |
| Scotland_Neolithic | 42 | 50.9±3.8 | Scotland_N | 30 | 56.6±2.9 |
| Ireland_Neolithic | 13 | 46.9±7.5 | Ireland_MN.SG | 26 | 50.8±2.2 |

  1. (Blue) indicates sample sizes that differ between the two studies.

  2. # For DATES (Distribution of Ancestry Tracts of Evolutionary Signals), we used pooled western hunter-gatherer (WHG) and Anatolian farmers as the reference populations except for samples marked with *.

  3. * Indicates cases where the results were not significant as the 95% CI includes 0.

Appendix 1—table 4
Comparison of ALDER and DATES (Distribution of Ancestry Tracts of Evolutionary Signals) for varying samples sizes and times of admixture.

We simulated data for 20 and 100 admixed individuals using the CEU and YRI from 1000G with the mixture proportion of 20% from European and 80% African ancestry. The dates reported here for DATES are using exponential fit to λ-1 generations.

| Time of admixture (gen) | ALDER, n=20 (mean ±1 SE, gen) | DATES, n=20 (mean ±1 SE, gen) | ALDER, n=100 (mean ±1 SE, gen) | DATES, n=100 (mean ±1 SE, gen) |
|---|---|---|---|---|
| 10 | 9.3±0.8 | 10.7±0.6 | 10.2±0.3 | 10±0.3 |
| 20 | 19.4±1.3 | 19.7±0.8 | 20.2±0.3 | 20.3±0.3 |
| 30 | 28.5±1.7 | 30.8±1.5 | 30.6±0.9 | 30.5±0.7 |
| 40 | 40.9±2 | 40.3±1.5 | 40.6±0.7 | 40.6±0.4 |
| 50 | 47.9±3.6 | 49.6±1.6 | 50±1.1 | 50.9±0.7 |
| 60 | 55.7±2.7 | 60.3±1.5 | 62±2.2 | 63.2±1 |
| 70 | 71.4±4 | 74±2.7 | 74±2.2 | 72.4±1.3 |
| 80 | 80.6±4.8 | 82.5±2.9 | 85.3±2.3 | 84.4±1.1 |
| 90 | 87.8±4.2 | 88.9±3 | 94.1±2.7 | 92.9±1.3 |
| 100 | 93.7±4.9 | 98.1±2.9 | 101.9±3.9 | 103.6±1 |
| 110 | 121.4±5.4 | 118.2±3.7 | 120.7±4.3 | 115.5±1.8 |
| 120 | 116.5±8.5 | 128.4±3.9 | 121.2±5.1 | 121.5±1.7 |
| 130 | 138.2±9.2 | 133.7±4.6 | 130.2±4.8 | 132.8±1.7 |
| 140 | 134.5±17.5 | 142.4±7 | 144.9±7.3 | 145.3±3.1 |
| 150 | 144.8±23.8 | 149.5±7.4 | 155.1±7 | 157.5±2.8 |
| 160 | 141.9±11.3 | 166.7±5.9 | 154.5±8.5 | 161.7±2.4 |
| 170 | 173.4±13.7 | 175.1±6.9 | 170.3±6.2 | 173.5±3 |
| 180 | 204.6±17.8 | 195.5±7.1 | 174.2±7 | 180.7±3.3 |
| 190 | 221.3±23.9 | 210.4±9.4 | 191.2±16.2 | 197.2±4.6 |
| 200 | 202.8±11.1 | 196±6.5 | 188.5±16.5 | 202.6±4.7 |

Appendix 1—table 5
Admixture dates in present-day populations inferred using ROLLOFF, Globetrotter, ALDER, and DATES (Distribution of Ancestry Tracts of Evolutionary Signals).

| Population | nk | Source1 | Source2 | ROLLOFF | Globetrotter | ALDER formal test | ALDER_2-ref dates | DATES | Comments |
|---|---|---|---|---|---|---|---|---|---|
| Hazara | 22 | Mongola (10) | Iranian (13) | 23±1 | 22±0.9 | Long-range LD | -- | 24.6±1.0 | |
| Uzbekistani | 15 | Mongola (10) | Iranian (13) | 20±1.4 | 19±1.1 | SUCCEEDS | 19.18±2.22 | 21.3±1.4 | |
| Uyghur | 10 | Mongola (10) | Iranian (13) | 23±2.6 | 22±1.3 | SUCCEEDS | 16.73±1.38 | 22.2±2.1 | |
| Makrani | 22 | Bantu Kenya (11) | Balochi (21) | 18±1.8 | 18±1.2 | Long-range LD | -- | 13.2±1.6 | |
| Druze | 42 | Yoruba (21) | Cypriot (12) | 39±7.3 | 37±1.9 | FAILS | 44.02±6.37 | 43.4±6.1 | |
| Mozabite | 25 | Yoruba (21) | Moroccan (22) | 23±1.9 | 21±1.3 | Long-range LD | -- | 21.6±1.8 | |
| Turkish | 17 | Mongola (10) | Iranian (13) | 28±3.2 | 24±1.5 | FAILS | 25.62±2.48 | 28.5±2.3 | |
| Brahui | 23 | Bantu Kenya (11) | Balochi (21) | 13±3.4 | 20±1.5 | Long-range LD | -- | 10.4±1.6* | Possibly multi-way admixture (Pagani et al., 2017) |
| Yemeni | 4 | Bantu Kenya (11) | Syrian (16) | 15±2.3 | 14±1.8 | FAILS | 6.29±2.89 | 12.7±1.6 | |
| Pima | 14 | Turkish (17) | Mayan (21) | 9±3.6 | 6±0.9 | SUCCEEDS | 6.29±0.89 | 7.8±1.1 | |
| Bantu South Africa | 8 | San Khomani (30) | Yoruba (21) | 26±2.5 | 25±2.3 | Long-range LD | -- | 27.9±2.2 | |
| Tu | 10 | Greek (20) | Han N-China (10) | 33±6.3 | 25±2.3 | FAILS | 28.83±2.8 | 31.3±1.96 | |
| West Sicilian | 10 | Yoruba (21) | East Sicilian (10) | 26±7.8 | 27±3.9 | FAILS | 42.72±16.34 | 37.4±16.4 | |
| Cambodian | 10 | Uyghur (10) | Han (34) | 17±4.7 | 20±2.7 | SUCCEEDS | 24.28±5.36 | 33.6±3.7* | |
| Georgian | 20 | Adygei (17) | Greek (20) | -- | 30±3.3 | FAILS | 3.15±1.22 | -- | |
| Romanian | 13 | Lithuanian (10) | East Sicilian (10) | -- | 31±2.6 | FAILS | -- | -- | |
| Bulgarian | 18 | Polish (16) | Cypriot (12) | -- | 28±3.5 | FAILS | 40.95±16.42 | 91.1±24.7* | Possibly multi-way admixture or different model of admixture (see Main text and Haak et al., 2015) |
| Hezhen | 8 | Tujia (10) | Mongola (10) | -- | 13±1.3 | FAILS | 2.92±1.4 | -- | |
| Oroqen | 9 | Yakut (25) | Mongola (10) | -- | 15±2 | Long-range LD | -- | -- | |
| Hungarian | 18 | Cypriot (12) | Polish (16) | 65±24 | 39±3.5 | FAILS | 54.83±25.27 | 61.8±19.1 | |
| Han N-China | 10 | Turkish (17) | Tujia (10) | 37±11.1 | 26±3.8 | FAILS | 48.17±10.36 | 44.3±5.1* | |
| Daur | 9 | Tujia (10) | Mongola (10) | -- | 21±1.7 | FAILS | -- | -- | |
| Greek | 20 | Polish (16) | Cypriot (12) | 69±18.5 | 36±3.7 | FAILS | 55.54±8.93 | 62.6±16.9 | |
| Melanesian | 10 | Papuan (16) | Cambodian (10) | 66±12.1 | 28±7.6 | FAILS | 64.91±5.42 | 68.6±7.1* | |
| Mandenka | 22 | Moroccan (22) | Yoruba (21) | 22±10.3 | 19±4.2 | FAILS | 17.25±6.05 | 85.8±19.0 *#Ω | Possibly multiple admixture events (Price et al., 2009) |
| Indian | 13 | Cambodian (10) | Sindhi (23) | 91±41.1 | 53±8.4 | FAILS | n/a | n/a | There are multiple “Indian” groups in the dataset making it unclear which target was used |
| North Italian | 12 | Cypriot (12) | French (28) | -- | 71±11.8 | FAILS | 12.44±4.32 | -- | |
| Polish | 16 | French (28) | Lithuanian (10) | -- | 31±5.1 | FAILS | -- | -- | |
| Tuscan | 8 | Cypriot (12) | French (28) | -- | 35±6.1 | FAILS | -- | -- | |
| San Namibia | 5 | Sandawe (28) | San Khomani (30) | -- | 48±8.9 | Long-range LD | -- | -- | |

  1. Columns 1–5 include results from Table S12 from Hellenthal et al., 2014. We only show significant dates (|Z| > 2).

  2. Following Hellenthal et al., we created a merged dataset of the Human Genome Diversity Panel, Henn et al. and Behar et al. containing 1642 individuals and 465543 SNPs. This dataset was used for ALDER and DATES analysis.

  3. Standard errors in DATES were estimated using chromosome jackknife (see Materials and methods).

  4. -- indicates results where the inferred results were not significant, either the method failed or the 95% CI included 0.

  5. * indicates DATES estimates that significantly differ from Globetrotter estimates (not within two SEs).

  6. # indicates DATES estimates that significantly differ from ROLLOFF results.

  7. Ω indicates DATES estimates that significantly differ from ALDER results.

  8. n/a indicates target population was unclear.
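
The significance rule in these notes (|Z| > 2, roughly equivalent to a 95% CI that excludes 0) can be sketched as below. The function name is hypothetical, and the sketch covers only the test of a single estimate against zero, not the between-method comparisons flagged with *, # and Ω.

```python
def is_significant(estimate, se, z_threshold=2.0):
    """Return True when |estimate / se| exceeds the Z threshold."""
    if se <= 0:
        return False  # no usable standard error, so the date is not reported
    return abs(estimate / se) > z_threshold

hazara = is_significant(24.6, 1.0)   # Hazara DATES estimate from the table
weak = is_significant(10.0, 6.0)     # hypothetical estimate with a large SE
```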

Appendix 1—table 6
Admixture dates for Neolithic European groups using DATES (Distribution of Ancestry Tracts of Evolutionary Signals) and modified ALDER (Lipson et al., 2017).
| Region | Population | Modified ALDER* (Lipson et al., 2017) | DATES |
| --- | --- | --- | --- |
| Germany | Blätterhöhle_MN | 18.5±4.6 | 14±3 |
| Germany | Germany_MN | 26.2±4.4 | 36±20 |
| Germany | LBK_EN | 14.9±2.4 | 18±3 |
| Hungary | LBK_EN | 17.8±2.0 | 22±2 |
| Hungary | Baden_CA | 27.6±3.8 | 49±11 |
| Hungary | Lasinja_CA | 29.3±5.2 | 32±6 |
| Hungary | LBKT_MN | 30.3±5.8 | 23±10 |
| Hungary | Protoboleraz_CA | 44.3±6.4 | 46±7 |
| Hungary | Starcevo_EN | 4.5±1.9 | 5±2 |
| Hungary | TDLN | 20.9±2.7 | 29±4 |
| Hungary | Tisza_LN | 18.2±6.6 | 27±9 |
| Spain | Iberia_CA | 49.6±5.2 | 55±6 |
| Spain | Iberia_EN | 19.4±2.3 | 20±3 |
| Spain | Iberia_MN | 49.9±7.7 | 52±8 |
  1. * Modified ALDER – We report dates from Extended Data Table 4, based on the average of individual-level dates calculated using Anatolian farmers and western hunter-gatherers (WHG) as sources and high-coverage Anatolian farmers as helper samples. For details, see Lipson et al., 2017.

  2. DATES – We used pooled WHG groups and Anatolian farmers as references in DATES.
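
One way to compare the two columns is to apply the two-SE criterion stated in the footnotes of the previous table. The sketch below is an illustration only: it assumes the two estimates are independent and combines their standard errors in quadrature, which the source does not specify. The function name is hypothetical, and the three rows are copied from the table above.

```python
import math


def differ_by_two_se(est_a, se_a, est_b, se_b, threshold=2.0):
    """Return (z, flag): z-score of the difference and whether |z| > threshold.

    Assumption: the estimates are independent, so the SE of the difference
    is sqrt(se_a^2 + se_b^2).
    """
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return z, abs(z) > threshold


# (Modified ALDER, SE, DATES, SE), in generations, copied from the table.
rows = {
    "Hungary Baden_CA": (27.6, 3.8, 49, 11),
    "Germany LBK_EN": (14.9, 2.4, 18, 3),
    "Hungary TDLN": (20.9, 2.7, 29, 4),
}

for name, (alder, alder_se, dates, dates_se) in rows.items():
    z, flag = differ_by_two_se(dates, dates_se, alder, alder_se)
    print(f"{name}: z = {z:.2f}, differs by more than 2 SE: {flag}")
```

A different convention, such as asking whether one estimate falls outside the other's ±2 SE interval, can give a different answer for borderline rows.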

Appendix 2—table 1
Impact of reference populations in two-way admixed groups.
Model: We generated target populations with two pulses of gene flow, where PopA and PopB mixed at time t1 generations ago with ancestry proportions α1 and α2, followed by gene flow from PopC at t2 generations ago with ancestry proportion α3.
| Target | t2/t1 | Ref 1 | Ref 2 | α1=20%, α2=80%, α3=80% | α1=50%, α2=50%, α3=50% | α1=20%, α2=80%, α3=20% | α1=20%, α2=80%, α3=10% |
| --- | --- | --- | --- | --- | --- | --- | --- |
| (A) Using reference populations PopA and PopC | | | | | | | |
| PopA = CEU, PopB = YRI, PopC = CHB | 100/10 | Han | French | 10.3 | 10.7 | 12.0 | 12.6 |
| PopA = CEU, PopB = CHB, PopC = YRI | 100/10 | Yoruba | French | 10.3 | 9.7 | 10.2 | 10.6 |
| PopA = CHB, PopB = YRI, PopC = CEU | 100/10 | French | Han | 10.8 | 11.6 | 20.7 | 44 |
| PopA = CEU, PopB = YRI, PopC = CHB | 60/10 | Han | French | 11.3 | 10.7 | 11.8 | 14.4 |
| PopA = CEU, PopB = CHB, PopC = YRI | 60/10 | Yoruba | French | 10.4 | 10.8 | 10.3 | 11.2 |
| PopA = CHB, PopB = YRI, PopC = CEU | 60/10 | French | Han | 11.2 | 12.6 | 19.9 | 40.7 |
| (B) Using reference populations PopB and PopC | | | | | | | |
| PopA = CEU, PopB = YRI, PopC = CHB | 100/10 | Han | Yoruba | 10.2 | 11.5 | 11.3 | 11.9 |
| PopA = CEU, PopB = CHB, PopC = YRI | 100/10 | Yoruba | Han | 10.1 | 9.7 | 10.3 | 10.5 |
| PopA = CHB, PopB = YRI, PopC = CEU | 100/10 | French | Yoruba | 10.1 | 12.8 | 12.2 | 14 |
| PopA = CEU, PopB = YRI, PopC = CHB | 60/10 | Han | Yoruba | 11.0 | 12.6 | 12.2 | 14.6 |
| PopA = CEU, PopB = CHB, PopC = YRI | 60/10 | Yoruba | Han | 10.3 | 11.1 | 10.6 | 11.4 |
| PopA = CHB, PopB = YRI, PopC = CEU | 60/10 | French | Yoruba | 10.4 | 13.4 | 12.4 | 18 |
| (C) Using reference populations PopC and ‘admixed’ individuals with ancestry from PopA and PopB (30% PopA/70% PopB) | | | | | | | |
| PopA = CEU, PopB = YRI, PopC = CHB | 100/10 | Han | Admixed (30% CEU/70% YRI) | 10.1 | 10.9 | 10.8 | 10.5 |
| PopA = CEU, PopB = CHB, PopC = YRI | 100/10 | Yoruba | Admixed (30% CEU/70% CHB) | 10.1 | 9.5 | 10.0 | 10 |
| PopA = CHB, PopB = YRI, PopC = CEU | 100/10 | French | Admixed (30% CHB/70% YRI) | 10.0 | 11.2 | 10.9 | 10.8 |
| PopA = CEU, PopB = YRI, PopC = CHB | 60/10 | Han | Admixed (30% CEU/70% YRI) | 11 | 11.3 | 10.9 | 11.8 |
| PopA = CEU, PopB = CHB, PopC = YRI | 60/10 | Yoruba | Admixed (30% CEU/70% CHB) | 10.3 | 10.8 | 10.3 | 10.5 |
| PopA = CHB, PopB = YRI, PopC = CEU | 60/10 | French | Admixed (30% CHB/70% YRI) | 10.2 | 11.3 | 10.6 | 12.1 |
| (D) Using reference populations PopC and pooled individuals of PopA and PopB ancestry | | | | | | | |
| PopA = CEU, PopB = YRI, PopC = CHB | 100/10 | Han | French + Yoruba | 10.1 | 10.99 | 10.4 | 9.5 |
| PopA = CEU, PopB = CHB, PopC = YRI | 100/10 | Yoruba | French + Han | 10.2 | 9.5 | 10.03 | 9.9 |
| PopA = CHB, PopB = YRI, PopC = CEU | 100/10 | French | Han + Yoruba | 9.9 | 10.4 | 10.6 | 10 |
| PopA = CEU, PopB = YRI, PopC = CHB | 60/10 | Han | French + Yoruba | 10.9 | 10.8 | 10.3 | 10.3 |
| PopA = CEU, PopB = CHB, PopC = YRI | 60/10 | Yoruba | French + Han | 10.3 | 10.7 | 10.1 | 10.5 |
| PopA = CHB, PopB = YRI, PopC = CEU | 60/10 | French | Han + Yoruba | 10.0 | 10.3 | 9.8 | 11 |
  1. Note: the estimated dates shown per scenario are averages of 10 simulations.
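
To make the four α scenarios in the column headers concrete, the sketch below works out the final ancestry composition of the target under the two-pulse model described in the table caption. It assumes that α3 is the fraction of ancestry contributed by PopC in the second pulse, so the earlier PopA and PopB ancestry is diluted by a factor of (1 − α3). That reading, and the function name, are assumptions made for illustration.

```python
def two_pulse_ancestry(alpha1, alpha2, alpha3):
    """Final ancestry fractions (PopA, PopB, PopC) after both pulses.

    Assumption: PopA and PopB mix first in proportions alpha1 and alpha2,
    then PopC contributes a fraction alpha3 of the resulting population.
    """
    if abs(alpha1 + alpha2 - 1.0) > 1e-9:
        raise ValueError("alpha1 and alpha2 should sum to 1")
    if not 0.0 <= alpha3 <= 1.0:
        raise ValueError("alpha3 should lie between 0 and 1")
    keep = 1.0 - alpha3  # share of the first-pulse ancestry that remains
    return alpha1 * keep, alpha2 * keep, alpha3


# The four scenarios from the column headers of the table above.
scenarios = [(0.2, 0.8, 0.8), (0.5, 0.5, 0.5), (0.2, 0.8, 0.2), (0.2, 0.8, 0.1)]

for a1, a2, a3 in scenarios:
    pop_a, pop_b, pop_c = two_pulse_ancestry(a1, a2, a3)
    print(f"alpha3={a3:.0%}: PopA={pop_a:.0%}, PopB={pop_b:.0%}, PopC={pop_c:.0%}")
```

Under this reading, a small α3 leaves PopC as a minor component of the target, which is the setting where the choice of reference populations matters most in the table.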

Appendix 2—table 2
Impact of continuous gene flow.

The table shows true and inferred times of admixture in PopC using PopA and PopB as the reference populations.

| True period of continuous admixture, λ (generations) | Inferred time of admixture, mean ±1 SE (generations) |
| --- | --- |
| 10–15 | 15±1 |
| 20–30 | 23±2 |
| 40–60 | 53±3 |
| 40–100 | 64±4 |

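
A simple way to read this table is to check whether each point estimate falls inside the true admixture period and how far it sits from the midpoint of that period. The sketch below does this for the four rows above. The midpoint is used only as a convenient reference point; the source does not claim that the method targets it, and the function name is hypothetical.

```python
# ((start, end) of the true period, (inferred mean, SE)), in generations.
rows = [
    ((10, 15), (15, 1)),
    ((20, 30), (23, 2)),
    ((40, 60), (53, 3)),
    ((40, 100), (64, 4)),
]


def summarize(period, estimate):
    """Return (midpoint, offset of the estimate from the midpoint, inside period?)."""
    start, end = period
    mean, _se = estimate
    midpoint = (start + end) / 2
    return midpoint, mean - midpoint, start <= mean <= end


for period, estimate in rows:
    midpoint, offset, inside = summarize(period, estimate)
    print(f"{period[0]}-{period[1]}: midpoint {midpoint}, offset {offset:+.1f}, inside: {inside}")
```
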
Appendix 2—table 3
Admixture time estimates from DATES (Distribution of Ancestry Tracts of Evolutionary Signals) for populations with extreme bottlenecks with a historically low population size that does not recover until the present.
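| Time of admixture (generations) | Ne before admixture | Ne post-admixture | Inferred time of admixture |
| --- | --- | --- | --- |
| 100 | 12,500 | 4000 | 96±5 |
| 100 | 12,500 | 3500 | 96±5 |
| 100 | 12,500 | 3000 | 88±4 |
| 100 | 12,500 | 2500 | 99±5 |
| 100 | 12,500 | 2000 | 92±6 |
| 100 | 12,500 | 1500 | 78±6 |
| 100 | 12,500 | 1000 | 86±6 |
| 100 | 12,500 | 500 | 54±9 |
| 100 | 12,500 | 100 | 42±10 |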

Additional files

Supplementary file 1

Data and admixture dates inferred using DATES (Distribution of Ancestry Tracts of Evolutionary Signals) for European groups during the Holocene (Excel sheet).

(A) Information on ancient samples used in our study. (B) Estimated dates of admixture for population mixture events during the European Holocene.

https://cdn.elifesciences.org/articles/77625/elife-77625-supp1-v2.xlsx
Supplementary file 2

Formal tests of admixture for populations in Europe using qpAdm and D-statistics with default parameters in ADMIXTOOLS (Excel sheet).

(A) Modeling population admixture of hunter-gatherer (HG) groups using qpAdm in ADMIXTOOLS. (B) D-statistics to assess the affinity of Mesolithic HG groups to western hunter-gatherers (WHGs) or Anatolian farmers. (C) Modeling population admixture of Near Eastern farmers using qpAdm in ADMIXTOOLS. (D) Modeling population admixture of Neolithic European groups using qpAdm in ADMIXTOOLS. (E) Modeling population admixture of Neolithic European groups per individual using qpAdm in ADMIXTOOLS. (F) D-statistics to explore the affinity of the target groups to Steppe pastoralists or Anatolian farmers. (G) Modeling population admixture of early Steppe pastoralist groups using qpAdm in ADMIXTOOLS. (H) Genetic distance (FST) in early Steppe pastoralist groups. (I) Modeling population admixture of Bronze Age groups using qpAdm in ADMIXTOOLS. (J) Modeling population admixture of Bronze Age groups per individual using qpAdm in ADMIXTOOLS. (K) Modeling population admixture of middle to late Bronze Age (MLBA) Steppe pastoralist groups using qpAdm in ADMIXTOOLS.

https://cdn.elifesciences.org/articles/77625/elife-77625-supp2-v2.xlsx
Transparent reporting form
https://cdn.elifesciences.org/articles/77625/elife-77625-transrepform1-v2.pdf

Cite this article

Manjusha Chintalapati, Nick Patterson, Priya Moorjani (2022) The spatiotemporal patterns of major human admixture events during the European Holocene. eLife 11:e77625. https://doi.org/10.7554/eLife.77625