Research Article

Admixture into and within sub-Saharan Africa

Wellcome Trust Centre for Human Genetics, United Kingdom
Wellcome Trust Sanger Institute, United Kingdom
Medical Research Council Unit, The Gambia
Royal Victoria Teaching Hospital, The Gambia
Centre National de Recherche et de Formation sur le Paludisme, Burkina Faso
University of Rome La Sapienza, Italy
Navrongo Health Research Centre, Ghana
Komfo Anokye Teaching Hospital, Ghana
University of Buea, Cameroon
KEMRI-Wellcome Trust Research Programme, Kenya
Kilimanjaro Christian Medical College, Tanzania
London School of Hygiene and Tropical Medicine, United Kingdom
College of Medicine, University of Malawi, Malawi
University of Bamako, Mali

Jun 21, 2016

Open access
Copyright information

Abstract
eLife digest
Introduction
Results
Discussion
Materials and methods
Data availability
References
Article and author information
Metrics

Abstract

Similarity between two individuals in the combination of genetic markers along their chromosomes indicates shared ancestry and can be used to identify historical connections between different population groups due to admixture. We use a genome-wide, haplotype-based, analysis to characterise the structure of genetic diversity and gene-flow in a collection of 48 sub-Saharan African groups. We show that coastal populations experienced an influx of Eurasian haplotypes over the last 7000 years, and that Eastern and Southern Niger-Congo speaking groups share ancestry with Central West Africans as a result of recent population expansions. In fact, most sub-Saharan populations share ancestry with groups from outside of their current geographic region as a result of gene-flow within the last 4000 years. Our in-depth analysis provides insight into haplotype sharing across different ethno-linguistic groups and the recent movement of alleles into new environments, both of which are relevant to studies of genetic epidemiology.

https://doi.org/10.7554/eLife.15266.001

eLife digest

Our genomes contain a record of historical events. This is because when groups of people are separated for generations, the DNA sequence in the two groups’ genomes will change in different ways. Looking at the differences in the genomes of people from the same population can help researchers to understand and reconstruct the historical interactions that brought their ancestors together. The mixing of two populations that were previously separate is known as admixture.

Africa as a continent has few written records of its history. This means that it is somewhat unknown which important movements of people in the past generated the populations found in modern-day Africa. Busby et al. have now attempted to use DNA to look into this and reconstruct the last 4000 years of genetic history in African populations.

As has been shown in other regions of the world, the new analysis showed that all African populations are the result of historical admixture events. However, Busby et al. could characterize these events to unprecedented level of detail. For example, multiple ethnic groups from The Gambia and Mali all show signs of sharing the same set of ancestors from West Africa, Europe and Asia who mixed around 2000 years ago. Evidence of a migration of people from Central West Africa, known as the Bantu expansion, could also be detected, and was shown to carry genes to the south and east. An important next step will be to now look at the consequences of the observed gene-flow, and ask if it has contributed to spreading beneficial, or detrimental, mutations around Africa.

https://doi.org/10.7554/eLife.15266.002

Introduction

Advances in DNA analysis technology and the drive to understand the genetic basis of human phenotypes has led to a rapid growth in the amount of genomic data that is available for analysis. Whilst tens of thousands of genetic variants have been associated with different diseases in populations of European descent (Welter et al., 2014), less progress has been made in studies of important diseases in Africa (Need and Goldstein, 2009). Several consortia are beginning to focus on understanding the genetic basis of infectious and non-communicable disease specifically in Africa (Malaria Genomic Epidemiology Network, 2008; 2015; H3Africa Consortium, 2014; Gurdasani et al., 2014), and a number of recent studies have described patterns of genetic variation across the continent (Campbell and Tishkoff, 2008; Tishkoff et al., 2009; Gurdasani et al., 2014). Analyses of the structure of genetic variation are important in the design, analysis, and interpretation of genetic epidemiology studies – which aim to uncover novel relationships between genes, the environment, and disease (Malaria Genomic Epidemiology Network, 2015) – and provide an opportunity to relate patterns of association to historical connections between different human populations.

Admixture occurs when genetically differentiated ancestral groups come together and mix, a process which is increasingly regarded as a common feature of human populations across the globe (Patterson et al., 2012; Hellenthal et al., 2014; Busby et al., 2015). Genome-wide analyses of African populations are refining previous models of the continent’s history and its impact on genetic diversity. One insight is the identification of clear, but complex, evidence for the movement of Eurasian ancestry back into the continent as a result of admixture over a variety of timescales (Pagani et al., 2012; Pickrell et al., 2014; Gurdasani et al., 2014; Hodgson et al., 2014a; Llorente et al., 2015). On a broad sample of 18 ethnic groups from eight countries, the African Genome Variation Project (AGVP) (Gurdasani et al., 2014) recreated a previous analysis to identify recent Eurasian admixture, within the last 1.5 thousand years (ky), in the Fulani of West Africa (Tishkoff et al., 2009; Henn et al., 2012) and several East African groups from Kenya; older Eurasian ancestry (2–5 ky) in Ethiopian groups, consistent with previous studies of similar populations (Pagani et al., 2012; Pickrell et al., 2014); and a novel signal of ancient ( $>$ 7.5 ky) Eurasian admixture in the Yoruba of Central West Africa (Gurdasani et al., 2014). Comparisons of contemporary sub-Saharan African populations with the first ancient genome from within Africa, a 4.5 ky Ethiopian individual (Llorente et al., 2015), provide additional support for limited migration of Eurasian ancestry back into East Africa within the last 3000 years.

Within this timescale, the major demographic change within Africa was the transition from hunting and gathering to pastoralist and agricultural lifestyles (Diamond and Bellwood, 2003; Smith, 2005; Barham and Mitchell, 2008; Li et al., 2014). This shift was long and complex and occurred at different speeds, instigating contrasting interactions between the agriculturalist pioneers and the inhabitant people (Mitchell, 2002; Marks et al., 2014). The change was initialised by the spread of pastoralism (i.e. the raising and herding of livestock) across Africa and the subsequent movement east and south from Central West Africa of agricultural technology together with the branch of Niger-Congo languages known as Bantu (Mitchell, 2002; Barham and Mitchell, 2008). The extent to which this cultural expansion was accompanied by people is an active research question, but an increasing number of molecular studies indicate that the expansion of languages was accompanied by the diffusion of people (Beleza et al., 2005; Berniell-Lee et al., 2009; Tishkoff et al., 2009; Pakendorf et al., 2011; de Filippo et al., 2012; Ansari Pour et al., 2013; Li et al., 2014; González-Santos et al., 2015).

The AGVP also found evidence of widespread hunter-gatherer ancestry in African populations, including ancient (9 ky) Khoesan ancestry in the Igbo from Nigeria, and more recent hunter-gatherer ancestry in eastern (2.5–4.5 ky) and southern (0.9–4 ky) African populations (Gurdasani et al., 2014). The identification of hunter-gatherer ancestry in non-hunter-gatherer populations together with the timing of these latter events is consistent with the known expansion of Bantu languages across Africa within the last 3 ky (Mitchell, 2002; Diamond and Bellwood, 2003; Smith, 2005; Barham and Mitchell, 2008; Marks et al., 2014; Li et al., 2014). These studies have described the novel and important influence of both Eurasian and hunter-gatherer ancestry on the population genetic history of sub-Saharan Africa and provide an important description of the movement of alleles and haplotypes into and within the continent, but questions remain of the extent and timing of key events, and their impact on contemporary populations.

Here we analyse genome-wide data from 12 Eurasian and 46 sub-Saharan African groups. Half (23) of the African groups represent subsets of samples collected from nine countries as part of the MalariaGEN consortium. Details on the recruitment of samples in relation to studying malaria genetics are published elsewhere (Malaria Genomic Epidemiology Network, 2014; 2015). The remaining 23 groups are from publicly available datasets from a further eight sub-Saharan African countries (Pagani et al., 2012; Schlebusch et al., 2012; Petersen et al., 2013) and the 1000 Genomes Project (1KGP), with Eurasian groups from the latter included to help understand the genetic contribution from outside of the continent (Figure 1—figure supplement 1). With the exception of Austronesian in Madagascar, African languages can be broadly classified into four major macro-families: Afroasiatic, Nilo-Saharan, Niger-Congo, and Khoesan (Blench, 2006); and although we have representative groups from each (Supplementary file 1), our sample represents a significant proportion of the sub-Saharan population in terms of number, but not does not equate to a complete picture of African ethnic diversity. We created an integrated dataset of genotypes at 328,000 high-quality SNPs and use established approaches for comparing population allele frequencies across groups to provide a baseline view of historical gene-flow. We then apply statistical approaches to phasing genotypes to obtain haplotypes for each individual, and use previously published methods to represent the haplotypes that an individual carries as a mosaic of other haplotypes in the sample (so-called chromosome painting [Li and Stephens, 2003]).

We present a detailed picture of haplotype sharing across sub-Saharan Africa using a model-based clustering approach that groups individuals using haplotype information alone. The inferred groups reflect broad-scale geographic patterns. At finer scales, our analysis reveals smaller groups, and often differentiates closely related populations consistent with self-reported ancestry (Tishkoff et al., 2009; Bryc et al., 2010; Hodgson et al., 2014a). We describe these patterns by measuring gene-flow between populations and relate them to potential historical movements of people into and within sub-Saharan Africa. Understanding the extent to which individuals share haplotypes (which we call coancestry), rather than independent markers, can provide a rich description of ancestral relationships and population history (Lawson et al., 2012; Leslie et al., 2015). For each group we use the latest analytical tools to characterise the populations as mixtures of haplotypes and provide estimates for the date of admixture events (Lawson et al., 2012; Hellenthal et al., 2014; Leslie et al., 2015; Montinaro et al., 2015). As well as providing a quantitative measure of the coancestry between groups, we identify the dominant events which have shaped current genetic diversity in sub-Saharan Africa. We close by discussing the relevance of these observations to studying genotype-phenotype associations in Africa.

Results

Broad-scale population structure reflects geography and language

Throughout this article we use shorthand current-day geographical and ethno-linguistic labels to describe ancestry. For example we write “Eurasian ancestry in East African Niger-Congo speakers”, where the more precise definition would be “ancestry originating from groups currently living in Eurasia in groups currently living in East Africa that speak Niger-Congo languages” (Pickrell et al., 2014). We also stress that the use of Khoesan in the current setting refers to groups with shared linguistic characteristics which does not necessarily imply shared close genealogical relationships (Güldemann and Fehn, 2014). Our combined dataset included 3283 individuals from 46 sub-Saharan different African ethnic groups and 12 non-African populations (Figure 1A and Figure 1—figure supplement 1). An initial fineSTRUCTURE analysis (outlined below and in Figure 1—figure supplement 2 and Figure 1—figure supplement 3) demonstrated sub-structure in two of the African ethnic groups, the Fula and Mandinka, so we split both of these populations into two groups, giving a final set of 48 African groups for all analyses.

Figure 1 with 3 supplements see all

Download asset Open asset

Sub-Saharan African genetic variation is shaped by ethno-linguistic and geographical similarity.

(A) the origin of the 46 African ethnic groups used in the analysis; ethnic groups from the same country are given the same colour, but different shapes; the legend describes the identity of each point. Figure 1—figure supplement 1 and Figure 1—source data 1 provide further detail on the provenance of these samples. (B) PCA shows that the first major axis of variation in Africa (PC1, y-axis) splits southern groups from the rest of Africa, each symbol represents an individual; PC2 (x-axis) reflects ethno-linguistic differences, with Niger-Congo speakers split from Afroasiatic and Nilo-Saharan speakers. Tick marks here and in (C) show the scale. (C) The third principle component (PC3, x-axis) represents geographical separation of Niger-Congo speakers, forming a cline from west to east Africans (D) results of the fineSTRUCTURE clustering analysis using copying vectors generated from chromosome painting; each row of the heatmap is a recipient copying vector showing the number of chunks shared between the recipient and every individual as a donor (columns);the tree clusters individuals with similar copying vectors together, such that block-like patterns are observed on the heat map; darker colours on the heatmap represent more haplotype sharing (see text for details); individual tips of the tree are coloured by country of origin, and the seven ancestry regions are identified and labelled to the left of the tree; labels in parentheses describe the major linguistic type of the ethnic groups within: AA = Afroasiatic, KS = Khoesan, NC = Niger-Congo, NS = Nilo-Saharan.

https://doi.org/10.7554/eLife.15266.003

Figure 1—source data 1 Overview of sampled populations describing the continent, region, numbers of individuals used, and the source of any previously published datasets.: https://doi.org/10.7554/eLife.15266.004
Download elife-15266-fig1-data1-v1.xlsx

As an initial description of the genetic structure of the samples we applied principal component analysis to the genotype data (Patterson et al., 2006). As in other regions of the world (Novembre et al., 2008; Behar et al., 2010), the leading principal components show that genetic relationships are broadly defined by geographical and ethno-linguistic similarity (Figure 1B,C). The first two principal components (PCs) reflect ethno-linguistic divides: PC1 splits southern Khoesan speaking populations from the rest of Africa, and PC2 splits the East African Afroasiatic and Nilo-Saharan speakers from sub-Saharan African Niger-Congo speakers. The third axis of variation defines east versus west Africa, suggesting that in general, population structure in Africa largely mirrors linguistic and geographic similarity (Tishkoff et al., 2009).

To access the information from the combination of markers along chromosomes we phased the genotype data into haplotypes, and applied a previously published implementation of chromosome painting (CHROMOPAINTER [Lawson et al., 2012]), to estimate the amount of an individual’s genome that is shared with each other individual in the data. More specifically, we paint each recipient individual’s genome as a mosaic of haplotype segments (chunks) copied from each other donor individual, and summarise these as copying vectors. We used the clustering algorithm implemented in fineSTRUCTURE (Lawson et al., 2012) to group individuals purely on the similarity of these copying vectors (Figure 1 and Figure 1—figure supplement 3). The pairwise coancestry between individuals can be visualised as a heatmap with each row being the copying vector for each sample (Figure 1D), and these are clustered hierarchically to form a tree which describes the inferred relationship between different groups (Figure 1—figure supplement 3).

The fineSTRUCTURE analysis identified 154 clusters of individuals, grouped on the basis of copying vector similarity (Figure 1—figure supplement 3). Some ethnic groups, such as the Yoruba, Mossi, Jola and Ju/’hoansi form clusters containing only individuals from their own ethnic group. In other populations, most notably from The Gambia and Kenya, individuals from several different ethnic groups cluster together. These are the two countries where the most ethnic groups were sampled, seven and four respectively, and differential sampling could partly explain this observation. Consistent with PCA, the fineSTRUCTURE analysis indicates that African populations tend to share more DNA with geographically proximate populations (dark colours on the diagonal; Figure 1D). Block-like structures on the diagonal indicate higher levels of haplotype sharing, as measured by the number of chunks copied, within groups. These patterns are strongest in a subset of the Khoesan speaking individuals (eg. the Ju/’hoansi), several groups from the East Africa (Sudanese, Ari, and Somali groups), and the Fulani and Jola from The Gambia.

Using the results of the PCA and fineSTRUCTURE analyses together with ethno-linguistic classifications and geography, we defined seven groups of populations within Africa (Supplementary file 1), which we refer to as ancestry regions (shown on the left of Figure 1D) when describing gene-flow across Africa. From this perspective, the heatmap also shows evidence for coancestry across the continent (more chunks copied away from the diagonal), which is indicative of historical connections between modern-day groups. For example, east Africans from Kenya, Malawi and Tanzania tend to share more DNA with west Africans (lower right) than vice versa (upper left), which suggests that more haplotypes may have spread from west to east Africa. These patterns of coancestry provide evidence of widespread sharing of haplotypes within and between ancestry regions.

Haplotypes reveal subtle population structure

To quantify population structure, we used two metrics to measure the difference between each of the 48 African and 12 Eurasian groups. First, we used the classical measure $F_{S T}$ (Hudson et al., 1992; Bhatia et al., 2013) which measures the differentiation in SNP allele frequencies between two groups. The second metric uses the difference in copying vectors between two groups to estimate the total variation distance (TVD) (Leslie et al., 2015) at the haplotypic level which provides an alternative measure of differentiation based on combinations of alleles at SNPs along chromosomes. Figure 2A shows these two metrics side by side in the upper and lower diagonal. When compared to the level of differentiation between Eurasian and African populations, $F_{S T}$ measured at our integrated set of SNPs is relatively low between many groups from West, Central, and East Africa (yellows on the upper right triangle). In contrast, TVD between the same populations highlights haplotypic differences within Africa which are as strong as between Europe and Asia (pink and purples in lower left triangle). Whilst pairwise TVD tends to increase with pairwise $F_{S T}$ the relationship is neither perfect (Pearson’s correlation $R^{2}$ = 0.79) nor linear (Figure 2B). For example, the Chonyi from Kenya have relatively low $F_{S T}$ but high TVD with West African groups, like the Jola (Chonyi-Jola $F_{S T}$ = 0.019; Chonyi-Jola TVD = 0.803) showing that, whilst allele frequency differences between the two populations are relatively low, when we compare the populations’ copying vectors, the haplotypic differences are some of the strongest between sub-Saharan groups.

Figure 2 with 1 supplement see all

Download asset Open asset

Haplotypes capture more population structure than independent loci.

(A) For each population pair, we estimated pairwise $F_{S T}$ (upper right triangle) using 328,000 independent SNPs, and $T V D$ (lower left triangle) using population averaged copying vectors from CHROMOPAINTER. $T V D$ measures the difference between two copying vectors. (B) Comparison of pairwise $F_{S T}$ and $T V D$ shows that they are not linearly related: some population pairs have low $F_{S T}$ and high $T V D$ . (Source data is detailed in Figure 2—source data 2 to Figure 2—source data 1).

https://doi.org/10.7554/eLife.15266.008

Figure 2—source data 1 Pairwise $T V D$ for Eurasian populations. $T V D$ has been multiplied by 1000.: https://doi.org/10.7554/eLife.15266.009
Download elife-15266-fig2-data1-v1.xlsx
Figure 2—source data 2 Pairwise $F_{S T}$ for Eurasian populations. We used smartpca to compute $F_{S T}$ for each pair of populations, upper right diagonal, together with standard errors computed using a block jacknife. $F_{S T}$ has been multiplied by 1000.: https://doi.org/10.7554/eLife.15266.010
Download elife-15266-fig2-data2-v1.xlsx
Figure 2—source data 3 Pairwise $F_{S T}$ for African populations. We used smartpca to compute $F_{S T}$ for each pair of populations, upper right diagonal, together with standard errors computed using a block jacknife. $F_{S T}$ has been multiplied by 1000.: https://doi.org/10.7554/eLife.15266.011
Download elife-15266-fig2-data3-v1.xlsx
Figure 2—source data 4 Pairwise $T V D$ for African populations. $T V D$ has been multiplied by 1000.: https://doi.org/10.7554/eLife.15266.012
Download elife-15266-fig2-data4-v1.xlsx

In Figure 2—figure supplement 1 we show a comparison of PCA, based on genotype data, and fineSTRUCTURE, which uses haplotypes, from a subset of individuals from the Central West African Niger-Congo ancestry region (from Nigeria, Ghana, and Burkina Faso). Whilst we observe some, limited, population structure with PCA, when we look at the copying vectors, we can see the subtle differences in copying that cause fineSTRUCTURE to separate the five ethnic groups into clusters containing only other individuals from their own ethnic group of individuals. The exception to this are the Namkam and Kasem, who are very genetically similar (pairwise $F_{S T}$ of $<$ 0.001) and are merged into a single group. So, consistent with results in European populations (Leslie et al., 2015; Busby et al., 2015), chromosome painting analyses of African groups can reveal subtle population structure that is hard to detect using approaches based on genotypes alone (for example PCA and $F_{S T}$ ). Taken together, these observations motivate using haplotype-based approaches to characterise population relationships, in addition to those which consider allele frequencies on their own.

Allele frequency differences show widespread evidence for admixture

As argued above, a full analysis of admixture best leverages haplotype structure, and we return to this below. To gain an initial understanding of admixture, we applied previously published approaches which analyse the correlations in allele frequencies within and between populations (Pickrell et al., 2014; Gurdasani et al., 2014). The first approach, the three-population test ( $f_{3}$ statistic [Reich et al., 2009]), estimates the proportion of shared genetic drift between a target population and two potential source populations to identify significant departures from the null model of no admixture. Negative values are indicative of canonical admixture events where the allele frequencies in the target population are intermediate between the two source populations. Consistent with recent research (Pickrell et al., 2014; Pickrell and Reich, 2014; Gurdasani et al., 2014; Llorente et al., 2015), the majority (83%, 40/48), but not all, of the African groups surveyed showed evidence of admixture ( $f_{3}$ $<$ -5). (Supplementary file 2). We do not infer admixture using this statistic in the Jola, Mossi, Kasem, Namkam, Yoruba, Sudanese, Gumuz, and Ju/’hoansi. In most other groups the most significant $f_{3}$ statistic includes either the Ju/’hoansi or a 1KGP European source (GBR, CEU, FIN, or TSI). Niger-Congo speaking groups from Central West and Southern Africa tend to show most significant statistics involving the Ju/’hoansi, whereas West and East African and Southern Khoesan speaking groups tended to show most significant statistics involving European sources, consistent with an recent analysis on a similar (albeit smaller) set of African populations (Gurdasani et al., 2014).

The second approach, ALDER (Loh et al., 2013; Pickrell et al., 2014) (Supplementary file 2) exploits the fact that correlations between allele frequencies along the genome decay over time as a result of recombination. Linkage disequilibrium (LD) can be generated by admixture events, and leaves detectable signals in the genome that can be used to infer historical processes (Loh et al., 2013). Following Pickrell et al. (2014) and the AGVP (Gurdasani et al., 2014), we computed weighted admixture LD curves using the ALDER (Loh et al., 2013) package and the HAPMAP recombination map to characterise the sources and timing of gene-flow events. Specifically, we estimated the y-axis intercept (amplitude) of weighted LD curves for each target population using curves from an analysis where one of the sources was the target population (self reference) and the other was, separately, each of the other (non-self reference) populations. Theory predicts that the amplitude of these 'one-reference' curves becomes larger the more similar the non-self reference population is to the true admixing source (Loh et al., 2013). As with the $f_{3}$ analysis outlined above, for many of the sub-Saharan African populations, Eurasian and hunter-gatherer groups (such as the Ju/’hoansi) produced the largest amplitudes (Figure 3—figure supplement 1 and Figure 3—figure supplement 2), reinforcing the contribution of these ancestries to our broad set of African populations.

We investigated the evidence for more complex admixture using MALDER (Pickrell et al., 2014), an implementation of ALDER which fits a mixture of exponentials to weighted LD curves to infer multiple admixture events (Figure 3 and Figure 3—source data 1). In Figure 3A, for each target population, we show the ancestry region of the two populations with the greatest MALDER curve amplitudes, together with the date of admixture, for at most two events. Throughout, we convert time since admixture in generations to a date by assuming a generation time of 29 years (Fenner, 2005). We note that the inferred admixture dates indicate when gene-flow occurred between populations and not the arrival of groups into an area, which may often be several generations earlier.

Figure 3 with 7 supplements see all

Download asset Open asset

Inference of admixture in sub-Saharan Africa using MALDER.

We used MALDER to identify the evidence for multiple waves of admixture in each population. (A) For each population, we show the ancestry region identity of the two populations involved in generating the MALDER curves with the greatest amplitudes (coloured blocks) for at most two events. The major contributing sources are highlighted with a black box. Populations are ordered by ancestry of the admixture sources and dates estimates which are shown $\pm$ 1.96 $\times$ s.e. For each event we compared the MALDER curves with the greatest amplitude to other curves involving populations from different ancestry regions. In the central panel, for each source, we highlight the ancestry regions providing curves that are not significantly different from the best curves. In the Jola, for example, this analysis shows that, although the curve with the greatest amplitude is given by Khoesan (green) and Eurasian (dark yellow) populations, curves containing populations from any other African group (apart from Afroasiatic) in place of a Khoesan population are not significantly smaller than this best curve (SOURCE 1). Conversely, when comparing curves where a Eurasian population is substituted with a population from another group, all curve amplitudes are significantly smaller ( $Z < 2$ ). (B) Comparison of dates of admixture $\pm$ 1.96 $\times$ s.e. for MALDER dates inferred using the HAPMAP recombination map and a recombination map inferred from European (CEU) individuals from Hinch et al. (2011). We only show comparisons for dates where the same number of events were inferred using both methods. Point symbols refer to populations and are as in Figure 1. (C) as (B) but comparison uses an African (YRI) map. Source data can be found in Figure 3—source data 1.

https://doi.org/10.7554/eLife.15266.014

Figure 3—source data 1 The evidence for multiple waves of admixture in African populations using MALDER and the HAPMAP recombination map. For each event in each ethnic group we show the largest inferred amplitude and date of an admixture event involving two reference populations (Pop1 and Pop2). We additionally provide the ancestry region identity of the two main reference populations, together with $Z$ scores for curve comparisons between this best curve and those containing populations from different ancestry regions. We use a cut-off of $Z <$ 2 to decide whether sources from multiple ancestries best describe the admixture source.: https://doi.org/10.7554/eLife.15266.015
Download elife-15266-fig3-data1-v1.xlsx
Figure 3—source data 2 The evidence for multiple waves of admixture in African populations using MALDER and the African recombination map. Columns as in Figure 3—source data 1.: https://doi.org/10.7554/eLife.15266.016
Download elife-15266-fig3-data2-v1.xlsx
Figure 3—source data 3 The evidence for multiple waves of admixture in African populations using MALDER and the European recombination map. Columns as in Figure 3—source data 1.: https://doi.org/10.7554/eLife.15266.017
Download elife-15266-fig3-data3-v1.xlsx
Figure 3—source data 4 The evidence for multiple waves of admixture in African populations using MALDER and the HAPMAP recombination map and a mindis of 0.5cM. Columns are as in Figure 3—source data 1. Here we show the results for the MALDER analysis where we over-ride any short-range LD and define a minimum distance of 0.5cM from which to start computing admixture LD curves: https://doi.org/10.7554/eLife.15266.018
Download elife-15266-fig3-data4-v1.xlsx

In general, we find that groups from similar ancestry regions tend to have inferred admixture events at similar times and involving similar sources (Figure 3), which suggests that genetic variation has been shaped by shared historical events. For every event, the curves with the greatest amplitudes involved a population from a (usually non-Khoesan) African ancestry region on one side, and either a Eurasian or Khoesan population on the other. To provide more detail on the composition of the admixture sources, we compared MALDER curve amplitudes using source groups from different ancestry regions (central panel Figure 3A). In general, we were unable to precisely define the ancestry of the African source of admixture, as curves involving populations from multiple different ancestry regions were not statistically different from each other ( $Z < 2$ ; SOURCE 1). Conversely, comparisons of MALDER curves when the second source of admixture was Eurasian (dark yellow) or Khoesan (green), showed that these groups were usually the single best surrogate for the second source of admixture (SOURCE 2).

MALDER uses as input a genetic map to model the expected decay in linkage disequilibirum. We observed a large amount of shared LD at short genetic distances between different African populations (Figure 3—figure supplement 3 and Figure 3—figure supplement 4). Such patterns may result from population genetic processes other than admixture, such as shared demographic history and population bottlenecks (Loh et al., 2013). In the main MALDER analysis we present, short-range LD is removed by computing curves at genetic distances $<$ 2cM where they are correlated between target and reference population. We provide supplementary analyses where this setting was over-ridden by allowing MALDER to start computing LD decay curves at short genetic distances (from 0.5cM), irrespective of any short-range correlations in LD between populations. The main difference between the two analyses is that we do not observe previously reported ancient admixture events in Central West African groups (Gurdasani et al., 2014) without allowing curves to be computed from 0.5cM. Interpretation of such results is therefore challenging.

Inference of older events relies on modelling the decay of LD over short genetic distances because recombination has had more time to break down correlations in allele frequencies between neighbouring SNPs. We investigated the effect of using European (CEU) and Central West African (YRI) specific recombination maps (Hinch et al., 2011) on the dating inference. Whilst dates inferred using the CEU map were consistent with those using the HAPMAP recombination map (Figure 3B), when using the African map dates were consistently older (Figure 3C), although still generally within the last 7ky. There was also variability in the number of inferred admixture events for some populations between the different map analyses (Figure 3—figure supplement 5 and Figure 3—figure supplement 6).

Many West African groups show evidence of admixture within the last 4 ky involving African and Eurasian sources. The Mossi from Burkina Faso have the oldest inferred date of admixture, at roughly 5000BCE. Across East Africa Niger-Congo speakers (orange) we infer admixture within the last 4 ky (and often within the last 1 ky) involving Eurasian sources on the one hand, and African sources containing ancestry from other Niger-Congo speaking African groups from the west, on the other. Despite events between African and Eurasian sources appearing older in the Nilo-Saharan and Afroasiatic speakers from East Africa, we see a similar signal of very recent Central West African ancestry in a number of Khoesan groups from Southern Africa, such as the Khwe and /Gui //Gana, together with Malawi-like (brown) sources of ancestry in recent admixture events in East African Niger-Congo speakers.

Most events involved sources where Eurasian (dark yellow in Figure 3A) groups gave the largest amplitudes. In considering this observation, it is important to note that the amplitude of LD curves will partly be determined by the extent to which a reference population has differentiated from the target. Due to the genetic drift associated with the out-of-Africa bottleneck and subsequent expansion, Eurasian groups will tend to generate the largest curve amplitudes even if the proportion of this ancestry in the true admixing source is small (Pickrell et al., 2014) (in our dataset, the mean pairwise $F_{S T}$ between Eurasian and African populations is 0.157; Figure 2A and Figure 2—source data 2). To some extent this also applies to Khoesan groups (green in Figure 3A), who are also relatively differentiated from other African groups (mean pairwise $F_{S T}$ between Ju/’hoansi and all other African populations in our dataset is 0.095; Figure 2A and Figure 2—source data 2). In light of this, and the observation that curves involving groups from different ancestry regions are often not different from each other, it is difficult to infer the proportion or nature of the African, Khoesan, or Eurasian admixing sources, only that the sources themselves contained African, Khoesan, or Eurasian ancestry. Moreover, given uncertainty in the dating of admixture when using different maps and MALDER parameters, these results should be taken as a guide to the general structure of genetics relationships between African groups, rather than a precise description of the gene-flow events.

Modelling gene-flow with haplotypes

Chromosome painting analysis provides an alternative approach to inferring admixture events which directly uses the similarity in haplotypes (combination of alleles) between pairs of individuals. Evidence of haplotype sharing suggests that the ancestors of two individuals must have been geographically proximal at some point in the past, and the distance over which haplotype sharing extends along chromosomes is inversely related to how far in the past coancestry events have occurred.

We can use copying vectors inferred through chromosome painting to help identify those populations that share ancestry with a recipient group by fitting each vector as a mixture of all other population vectors (Leslie et al., 2015; Montinaro et al., 2015; van Dorp et al., 2015). Figure 4A shows the contribution that each ancestry region makes to these mixtures (MIXTURE MODEL column). Almost all groups can best be described as mixtures of ancestry from different regions. For example, the copying vector of the Bantu ethnic group from Cameroon is best described as a combination of 40% Central West African Niger-Congo (sky blue), 30% Eastern Niger-Congo (orange), 25% Southern Niger-Congo (brown), and the remaining 5% coming from West African Niger-Congo (dark blue) and Khoesan-speaking (green) groups. The mixture model approach is useful for describing coancestry between populations which can result from both admixture and shared evolutionary history.

Figure 4 with 2 supplements see all

Download asset Open asset

Inference of admixture in sub-Saharan African using GLOBETROTTER.

(A) For each group we show the ancestry region identity of the best matching source for the first and, if applicable, second events. Events involving sources that most closely match FULAI and SEMI-BANTU are highlighted by golden and red colours, respectively. Second events can be either multiway, in which case there is a single date estimate, or two-date in which case 2ND EVENT refers to the earlier event. The point estimate of the admixture date is shown as a black point, with 95% CI shown with lines. MIXTURE MODEL: We infer the ancestry composition of each African group by fitting its copying vector as a mixture of all other population copying vectors. The coefficients of this regression sum to 1 and are coloured by ancestry region. 1ST EVENT SOURCES and 2ND EVENT SOURCES show the ancestry breakdown of the admixture sources inferred by GLOBETROTTER, coloured by ancestry region as in the key top right. (B) and (C) Comparisons of dates inferred by MALDER and GLOBETROTTER. Because the two methods sometimes inferred different numbers of events, in (B) we show the comparison based on the inferred number of events in the MALDER analysis, and in (C) for the number of events inferred by GLOBETROTTER. Point symbols refer to populations and are as in Figure 1 and source data can be found in Figure 4—source data 1.

https://doi.org/10.7554/eLife.15266.026

Figure 4—source data 1 Results of the main GLOBETROTTER analysis. $A n a l y s i s$ refers to whether the main or masked analysis was used to produce the final result. Admixture $P$ -values are based on 100 bootstrap replicates of the NULL procedure. Our resulting inference, $r e s$ can be: 1D (two admixing sources at a single date); 1MW (multiple admixing sources at a single date); 2D (admixture at multiple dates); NA (no-admixture); U (uncertain). max( $R_{1}$ ) refers to the $R^{2}$ goodness-of-fit for a single date of admixture, taking the maximum value across all inferred coancestry curves. $F Q_{1}$ is the fit of a single admixture event (i.e. the first principal component, reflecting admixture involving two sources) and $F Q_{2}$ is the fit of the first two principal components capturing the admixture event(s) (the second component might be thought of as capturing a second, less strongly-signalled event. $M$ is the additional $R^{2}$ explained by adding a second date versus assuming only a single date of admixture; we use values above 0.35 to infer multiple dates (although see Supplementary Text for details). As well as the final result, for each event we show the inferred dates, $α$ s and best matching sources for 1D, 1MW, and 2D inferences. Inferred dates are in years(+ 95% CI; B=BCE, otherwise CE); the proportion of admixture from the minority source (source 1) is represented by $α$ . Date confidence intervals are based on 100 bootstrap replicates of the date inference: https://doi.org/10.7554/eLife.15266.027
Download elife-15266-fig4-data1-v1.xlsx
Figure 4—source data 2 Results of the main GLOBETROTTER analysis. $A n a l y s i s$ refers to whether the main or masked analysis was used to produce the final result. Admixture $P$ -values are based on 100 bootstrap replicates of the NULL procedure. Our resulting inference, $r e s$ can be: 1D (two admixing sources at a single date); 1MW (multiple admixing sources at a single date); 2D (admixture at multiple dates); NA (no-admixture); U (uncertain). max( $R_{1}$ ) refers to the $R^{2}$ goodness-of-fit for a single date of admixture, taking the maximum value across all inferred coancestry curves. $F Q_{1}$ is the fit of a single admixture event (i.e. the first principal component, reflecting admixture involving two sources) and $F Q_{2}$ is the fit of the first two principal components capturing the admixture event(s) (the second component might be thought of as capturing a second, less strongly-signalled event. $M$ is the additional $R^{2}$ explained by adding a second date versus assuming only a single date of admixture; we use values above 0.35 to infer multiple dates (although see Supplementary Text for details). As well as the final result, for each event we show the inferred dates, $α$ s and best matching sources for 1D, 1MW, and 2D inferences. Inferred dates are in years(+ 95% CI; B=BCE, otherwise CE); the proportion of admixture from the minority source (source 1) is represented by $α$ . Date confidence intervals are based on 100 bootstrap replicates of the date inference: https://doi.org/10.7554/eLife.15266.028
Download elife-15266-fig4-data2-v1.xlsx

To explicitly test for and characterise admixture we applied GLOBETROTTER (Hellenthal et al., 2014) which is an extension of the mixture model approach described above. Admixture inference can be challenging for a number of reasons: the true admixing source population is often not well represented by a single sampled population; admixture could have occurred in several bursts, or over a sustained period of time; and multiple groups may have come together as complex convolution of admixture events. GLOBETROTTER aims to overcome some of these challenges, in part by using painted chromosomes to explicitly model the correlation structure among nearby SNPs, but also by allowing the sources of admixture themselves to be mixed (Hellenthal et al., 2014). In addition, the approach has been shown to be relatively insensitive to the genetic map used (Hellenthal et al., 2014), and therefore potentially provides a more robust inference of admixture events, the ancestries involved, and their dates. GLOBETROTTER uses the recombination distance between chromosomal chunks of the same ancestry to infer the time since historical admixture has occurred.

Throughout we refer to target populations as recipients, any other sampled populations used to describe the recipient population’s admixture event(s) as surrogates, and populations used to paint both recipient and surrogate populations as donors. Including closely related individuals in chromosome painting analyses can cause the resulting painted chromosomes to be dominated by donors from these close genealogical relationships, which can mask signals of admixture in the genome (Hellenthal et al., 2014; van Dorp et al., 2015). To help ameliorate this, we painted chromosomes for the GLOBETROTTER analysis by using CHROMOPAINTER to paint each individual from a recipient group with the set of donors which did not include individuals from within their own ancestry region. We additionally painted all (59) other surrogate populations with the same set of non-local donors, and used these copying vectors, together with the non-local painted chromosomes, to infer admixture. Using this approach, we found evidence of recent admixture in all African populations (Figure 4A). To summarise these events, we show the composition of the admixing source groups as barplots for each population coloured by the contribution from each African ancestry region and Eurasia, alongside the inferred date (with confidence interval determined by bootstrapping) and the estimated proportion of admixture (Figure 4). For each event we also identify the best matching donor population to the admixture sources.

Direct and indirect gene-flow from Eurasia back into Africa

Both MALDER and GLOBETROTTER analyses identified Eurasian gene-flow in many but not all African populations (Figure 4). In several groups from South Africa, and all from Central West Africa (Ghana, Nigeria, and Cameroon), we infer admixture between groups that are best represented by contemporary populations residing in Africa. As GLOBETROTTER is designed to identify the most recent admixture event(s) (Hellenthal et al., 2014), this observation does not rule out gene-flow from Eurasia back into these groups, but does suggest that subsequent movements between African groups were more important in generating current genetic diversity in these groups. We also do not observe Eurasian ancestry in all East African Niger-Congo speakers, instead finding more evidence for coancestry with Afroasiatic speaking groups. As we show later, Afroasiatic populations have a significant amount of ancestry from outside of Africa, so the observation of this ancestry in several African groups identifies a route by which Eurasian ancestry may have indirectly entered the continent (Pickrell et al., 2014).

Characterising admixture sources as mixtures allows GLOBETROTTER to infer whether Eurasian haplotypes are likely to have come directly into sub-Saharan Africa – in which case the admixture source will contain only Eurasian surrogates – or whether Eurasian haplotypes were brought indirectly together with sub-Saharan groups. In West African Niger-Congo speakers from The Gambia and Mali, we infer admixture involving minor admixture sources which contain mostly Eurasian (dark yellow) and Central West African (sky blue) ancestry, which most closely match the contemporary copying vectors of northern European populations (CEU and GBR) or the Fulani (FULAI, highlighted in gold in Figure 4A). The Fulani, a nomadic pastoralist group found across West Africa, were sampled in The Gambia, at the very western edge of their current range, and have previously reported genetic affinities with Niger-Congo speaking, Sudanic, Saharan, and Eurasian populations (Tishkoff et al., 2009; Henn et al., 2012), consistent with the results of our mixture model analysis (Figure 4A). Admixture in the Fulani differs from other populations from this region, with sources containing greater amounts of Eurasian and Afroasiatic ancestry, but appears to have occurred during roughly the same period (c. 0CE; Figure 5).

Figure 5

Download asset Open asset

A timeline of recent admixture in sub-Saharan Africa.

For all events involving recipient groups from each ancestry region (columns) we combine all date bootstrap estimates generated by GLOBETROTTER and show the densities of these dates separately for the minor (above line) and major (below line) sources of admixture. Dates are additionally stratified by the ancestry region of the surrogate populations (rows), with all dates involving Niger Congo speaking regions combined together (All Niger Congo). Within each panel, the densities are coloured by the ancestry region origin of the surrogates, and in proportion to the components of admixture involved in the admixture event. The integrals of the densities are proportional to the admixture proportions of the events contributing to them.

https://doi.org/10.7554/eLife.15266.031

The Fulani represent the best-matching surrogate to the minor source of recent admixture in the Jola and Manjago, which we interpret as resulting not from specific admixture from them into these groups, but because the mix of African and Eurasian ancestries in contemporary Fulani is the best proxy for the minor sources of admixture in this region. With the exception of the Fulani themselves, the major admixture source in groups across this region is a similar mixture of African ancestries that most closely matches contemporary Gambian and Malian surrogates (Jola, Serere, Serehule, and Malinke), suggesting ancestry from a common West African group within the last 3000 years. The Ghana Empire flourished in West Africa between 300 and 1200CE, and is one of the earliest recorded African states (Roberts, 2007). Whilst its origins are uncertain, it is clear that trade in gold, salt, and slaves across the Sahara, perhaps from as early as the Roman Period, as well as evolving agricultural technologies, were the driving forces behind its development (Oliver and Fagan, 1975; Roberts, 2007). It is possible these interactions through North Africa, catalysed by trade across the Sahara, allowed gene-flow from Europe and North Africa back into West Africa.

We infer more direct admixture from Eurasian sources in two populations from Kenya, where specifically South Asian populations (GIH, KHV) are the most closely matched surrogates to the minor sources of admixture (Figure 5). Interestingly, the Chonyi (1138CE: 1080-1182CE) and Kauma (1225CE: 1167-1254CE) are located on the Kenyan Swahili Coast, a region where Medieval trade across the Indian Ocean is historically documented (Allen, 1993), which might explain this Asian admixture. Alternatively, Blench (2010) notes that the expansion of Arab shipping down the east Coast of Africa in the 10th Century CE masked the Austronesian (i.e. Oceania and Asia) influence of the resident coastal culture. The implication is that Austronesians, who are known to have contributed genes to Madagascan populations (Tofanelli et al., 2009), may also have been in East Africa at about this time. Further work on these groups will help to understand whether the events we observed in the Chonyi and Kauma represent the first evidence of an Austronesian impact in mainland Africa.

In the Kambe, the third group from coastal Kenya, we infer two events, the more recent one involving local groups, and the earlier event involving a European-like source (GBR, 761CE: 461BCE-1053CE). In Tanzanian groups from the same ancestry region, we infer admixture during the same period, this time involving minor admixture sources with Afroasiatic ancestry: in the Giriama (1196CE: 1138-1254CE), Wasambaa (1312CE: 1254-1341CE), and Mzigua (1080: 1007-1138CE). Although the proportions of admixture from these minor sources differ, the major sources of admixture in East African Niger-Congo speakers are similar, containing a mix of Southern Niger-Congo (Malawi), Central West African, Afroasiatic, and Nilo-Saharan ancestries. These events may be an indirect route for European-like gene-flow into East Africa.

In the Afroasiatic speaking populations of East Africa, we infer admixture involving sources containing mostly Eurasian ancestry, which most closely matches the Tuscans (TSI, Figure 4). Visualising the temporal distribution of admixture contributions shows that this ancestry appears to have entered the Horn of Africa in two waves (at c. 1800 and 0CE in Figure 5) as result of admixture into the Afar (326CE: 7-587CE), Wolayta (268CE: 8BCE-602CE), Tigray (36CE: 196BCE-240CE), and Ari (689BCE:965-297BCE). There are no Middle Eastern groups in our analysis, and this group of events may represent previously observed migrations from the Arabian peninsular at the same time (Pagani et al., 2012; Hodgson et al., 2014a).

Although Afroasiatic and Nilo-Saharan speakers were sampled from the same part of East Africa, the ancestry of the major sources of admixture of the former do not contain much Nilo-Saharan ancestry and are predominantly Afroasiatic (pink). In Nilo-Saharan speaking groups (purple), the Sudanese (1341CE: 1225–1660), Gumuz (1544CE: 1384–1718), Anuak (703: 427-1037CE), and Maasai (1646CE: 1584-1743CE), we infer greater proportions of West (blue) and East (orange) African Niger-Congo speaking surrogates in the major sources of admixture, indicating both that the Eurasian admixture occurred into groups with mixed Niger-Congo and Nilo-Saharan/Afroasiatic ancestry, and a clear recent link with Central and West African groups.

Lastly, in two Khoesan speaking groups from South Africa, the $\neq$ Khomani and Karretjie, we infer very recent direct admixture involving Eurasian groups most similar to Northern European populations, with dates aligning to European colonial period settlement in Southern Africa (c. 5 generations or 225 years ago; Figure 5) (Hellenthal et al., 2014). Taken together, and in addition the MALDER analysis above, these observations suggest that gene-flow back into Africa from Eurasia has been common around the edges of the continent, has been sustained over the last 3000 years, and can often be attributed to specific and different historical time periods.

Population movements within Africa and the Bantu expansion

Before discussing the impact of the Bantu expansion, we highlight three inferred admixture events involving sources unconnected to that migration. We infer admixture in the Ju/’hoansi, a San group from Namibia, involving a source that closely matches a local southern African Khoesan group, the Karretjie, and an East African Afroasiatic, specifically Somali, source at 558CE (311-851CE). Another, older, event in the Maasai (254BCE: 764BCE-239CE) also involves an Afroasiatic source. In contrast the minor source in the event inferred in the Luhya (1486: 1428-1573CE) most closely matches Nilo-Saharan groups. The recent date of this event implies that Eastern Niger-Congo speaking groups (e.g. the Luhya) interacted with nearby Nilo-Saharan speakers after the putative arrival of Bantu-speaking groups to Eastern Africa which we discuss below.

Most of the sampled groups in this study, and indeed most sub-Saharan Africans, speak a language belonging to the Niger Congo linguistic phylum (Greenberg, 1972; Nurse and Philippson, 2003). A sub-branch of this group are the so-called 'Bantu' languages – a group of approximately 500 very closely related languages – that are of particular interest because they are spoken by the vast majority of Africans south of the line between Southern Nigeria/Cameroon and Somalia (Pakendorf et al., 2011). Given their high similarity and broad geographic range, it is likely that Bantu languages spread across Africa quickly. Bantu languages can themselves be divided into three major groups: northwestern, which are spoken by groups near to the proto-Bantu heartland of Nigeria/Cameroon; western Bantu languages, spoken by groups situated down the west coast of Africa; and eastern, which are spoken across East and Central Africa (Li et al., 2014).

Whilst there is linguistic and archaeological consensus that the Bantu heartland was in the general region of southern Nigeria and Cameroon (Nurse and Philippson, 2003), it is unclear whether eastern Bantu languages were a primary branch that split off before the western groups began to spread south (the early-split hypothesis), or whether this occurred after the start of the movement south (the late-split hypothesis) (Pakendorf et al., 2011). In a study based on glottochronology, Vansina (1995) suggests that the expansion started 5kya, whilst estimates based on linguistic diversity are slightly later, around 4kya (Blench, 2006). This latter date agrees well with the breakthrough of Neolithic technologies, such as tools and pottery, in the archaeology of the Cameroon proto-Bantu heartlands (Bostoen, 2007) and perhaps further south (Lavachery, 2001), linking the spread of technology and farming with the Bantu expansion.

The early split hypothesis suggests that the eastern Bantu migrated directly east from Cameroon, 3–2.5 kya (Nurse and Philippson, 2003) along the border north of the Congo rainforest, to the Great Lakes Region of East Africa (Pakendorf et al., 2011). The late-split hypothesis, on the other hand, suggests that there was an initial spread south, through the equatorial rainforest, with a sub-group splitting east under the rainforest, arriving later in East Africa, potentially around 2kya (Vansina, 1995). Regardless of the exact route, the expansion spread south, arriving in southern Africa by the late first millennium CE (Nurse and Philippson, 2003). Recent phylogenetic linguistic analysis shows that the relationships between contemporary languages better match predictions based on the late-split hypothesis (Holden, 2002; Currie et al., 2013; Grollemund et al., 2015), an observation supported by genetic analyses (Li et al., 2014).

The current dataset does not cover all of Africa. In particular, it contains no hunter-gather groups outside of southern Africa, and no representation of the western Bantu except the Herero from Namibia. Nevertheless, we explored whether our admixture approach could be used to gain insight into the Bantu expansion. Specifically, we wanted to see whether the dates of admixture and composition of admixture sources were consistent with either of the two major models of the Bantu expansion. In the remaining discussion, we make the following assumption: when we observe ancestry from contemporary groups residing in Cameroon (Semi-Bantu and Bantu) this is a proxy for direct gene-flow from the origin of the Bantu expansion. Alternatively, higher proportions of ancestry from Southern or Eastern Niger-Congo speakers are the result of subsequent indirect gene-flow through these groups, which we use together with the time of admixture to relate to the Bantu expansion. We note that our interpretation may change with future analyses involving populations from the relatively under-sampled central southern Africa.

The major sources of admixture in East African Niger-Congo speakers have both Central West and Southern Niger-Congo ancestry, although it is predominantly the latter (Figure 4). If admixture in Eastern Niger-Congo speakers results from early movements directly from Central West Africa (Cameroon surrogates) then we would expect to see sources with predominantly Central West African ancestry. However, all East African Niger-Congo speakers that we sampled have admixture ancestry from a Southern group (Malawi) within the last 2000 years, suggesting that Malawi is more closely related to their Bantu ancestors than Central West Africans on their own. In the SEBantu (1109:1051-1196CE) and AmaXhosa (1196CE: 1109-1283CE), from east southern Africa, we observe reciprocal admixture events involving major sources most similar to East African Niger-Congo speakers. In west southern Africa, on the other hand, we infer two admixture events in the Herero (1834CE: 1805-1892CE and 674CE: 124BCE-979CE), and a single date in the Khoesan-speaking Khwe (1312; 1152-1399CE), both of which involve sources with higher proportions of ancestry from Cameroon (Figure 4—source data 1). In a third west southern African group, the !Xun (1312CE: 1254-1385CE) from Angola, who do not speak a Bantu language, we also infer admixture from a Cameroon-like source at around the same time as the Khwe. The putative Bantu admixture events in Malawi and the Herero occur before those in the !Xun and Khwe (Figure 4). This suggests a separate, more recent, arrival for Bantu ancestry in west southern compared to east southern Africa, with the former coming directly down the west coast of Africa and the latter from earlier interactions in central southern Africa (de Filippo et al., 2012; Li et al., 2014).

To further explore Bantu ancestry in eastern and southern Africa, we performed additional GLOBETROTTER analyses where we restricted the surrogate populations used to infer admixture (Figure 4—figure supplement 1) to specifically identify ancestry from Cameroon. This analysis allows us to ask whether the indirect Bantu ancestry we observe in East and Southern Niger-Congo speakers can be traced back to the origin of the expansion. When we restrict East African Niger-Congo speakers from having admixture surrogates from either within their ancestry region (locally) or Malawi (Figure 4—figure supplement 1), the sources of admixture mainly contained surrogates from the other non-Malawi Southern African Niger-Congo groups (the AmaXhosa, SEBantu, and Herero), reinforcing the relationship between Southern and East African Niger-Congo speakers. With the exception of the Herero, Southern African Niger-Congo speakers show the reverse relationship, choosing East African surrogates when local groups are removed from the inference. Only when both Eastern and Southern African Niger-Congo speakers were restricted from having surrogates from themselves, and each other, did the admixture sources contain significant proportions of Cameroon ancestry (Figure 4—figure supplement 1). By contrast, regardless of which surrogates are removed, the Herero always have inferred major admixture sources that contain a majority of Cameroon ancestry (Figure 4—figure supplement 2). We discuss the restricted surrogate analysis in further detail in Supplementary file 3, Figure 4—figure supplement 1 and Figure 4—figure supplement 2.

In individuals from Malawi we infer a multi-way event with an older date (471: 340-631CE) involving a minor source which mostly contains ancestry from Cameroon, which is, as mentioned, at a similar date to the event seen in the Herero from Namibia. This Bantu admixture appears to have preceded that in other southern Africans by a few hundred years. Given that ancestry from Malawi is often observed in large proportions in the admixture sources of East and Southern African Niger-Congo speakers, and its position between eastern and the most southern groups, Malawi represents the closest proxy in our dataset for the intermediate group that split from the western Bantu. We also see an admixture source in Malawi with a significant proportion of non-Bantu (green) ancestry (2nd event, minor source in Figure 4), ancestry which we do not observe in the mixture model analysis, but which is also evident in the other east Southern Niger-Congo speakers (the AmaXhosa and SEBantu) implying that gene-flow must have occurred between the expanding Bantus and the resident hunter-gatherer groups (Marks et al., 2014).

In summary, the early date of Bantu admixture in Malawi, its presence as an admixture surrogate across eastern and southern Africa, and the observation of later direct Central West African (Bantu) admixture in western south African groups, highlight the complex dynamics, and multiple waves of migration associated with the movement of Bantu agriculturists from the region around Cameroon into southern and eastern Africa. Moreover, our analysis – in addition to evidence from linguistic phylogenetics (Currie et al., 2013; Grollemund et al., 2015) – provides genetic support for the late-split hypothesis, suggesting that the agriculturist Bantus migrated south around the Congo rainforest before travelling east.

A haplotype-based model of gene-flow in sub-Saharan Africa

Our haplotype-based analyses support a complex picture of recent historical gene-flow in Africa (Figure 6). Using genetics to infer historical demography will always depend on the available samples and methods used to infer population relationships. Our aim here is to highlight the key gene-flow events that chromosome painting allows us to detect, and to describe their affect on the structure of coancestry:

Figure 6 with 1 supplement see all

Download asset Open asset

The geography of recent gene-flow in Africa.

We summarise gene-flow events in Africa using the results of the GLOBETROTTER analysis. For each ethnic group, we inferred the composition of the admixture sources, and link recipient population to surrogates using arrows, the width of which is proportional to the amount it contributes to the admixture event. We separately plot (A) all events involving admixture source components from the Bantu and Semi-Bantu ethnic groups in Cameroon; (B) all events involving admixture sources from East and Southern African Niger-Congo speaking groups; (C) events involving admixture sources from West African Niger-Congo and East African Nilo-Saharan / Afroasiatic groups; (D) all events involving components from Eurasia. in (D) arrows are linked to the labelled 1KGP Eurasian groups. Arrows are coloured by country of origin, as in Figure 1—figure supplement 1. Numbers 1–8 in circles represent the events highlighted in section *A haplotype-based model of gene-flow in sub-Saharan Africa*. An alternative version of this plot, stratified by date, is shown in Figure 6—figure supplement 1.

https://doi.org/10.7554/eLife.15266.032

Colonial Era European admixture in the Khoesan. In two southern African Khoesan groups we see very recent admixture, within the last 250 years, involving northern European ancestry which likely resulted from Colonial Era movements from the UK, Germany, and the Netherlands into South Africa (Thompson, 2001).
The recent arrival of the Western Bantu expansion in southern Africa. Central West African, and in particular ancestry from Cameroon (red ancestry in Figure 6A), is seen in Southern African Niger-Congo and Khoesan speaking groups, the Herero, Khwe and !Xun, indicating that the gradual diffusion of Bantu ancestry reached the south of the continent only within the last 750 years. Bantu ancestry in Malawi appears prior to this event.
Medieval contact between Asia and the East African Swahili Coast. Specific Asian gene-flow is observed into two coastal Kenyan groups, the Kauma and Chonyi, which represents a distinct route of Eurasian, in this case Asian, ancestry into Africa, perhaps as a result of Medieval trade networks between Asia and the Swahili Coast around 1200CE.
Gene-flow across the Sahara. Over the last 3000 years, admixture involving sources containing northern European ancestry is seen on the Western periphery of Africa, in The Gambia and Mali. This ancestry in West Africa is likely to be the result of more gradual diffusion of DNA across the Sahara from northern Africa and across the Iberian peninsular, and not via the Middle East, as in the latter scenario we would expect to see Spanish (IBS) and Italian (TSI) in the admixture sources. We do see limited southern European ancestry in West Africa (Figures 5 and 6D) in the Fulani, suggesting that some Eurasian ancestry may also have entered West Africa via North East Africa (Henn et al., 2012).
Several waves of Mediterranean / Middle Eastern ancestry into north-east Africa. We observe southern European gene-flow into East African Afroasiatic speakers over a more prolonged time period over the last 3000 years, with a major wave 2000 years ago (Figures 5 and 6D). We do not have Middle-Eastern groups in our analysis, so the observed Italian ancestry in the minor sources of admixture – the Tuscans are the closest Eurasian group to the Middle East – is consistent with previous results using the same samples (Pagani et al., 2012; Hodgson et al., 2014a), indicating this region as a major route for the back migration of Eurasian DNA into sub-Saharan Africa (Pagani et al., 2012; Pickrell et al., 2014).
The late split of the Eastern Bantus. Admixture in East African Niger-Congo speakers occurs during the period 500-1500CE, with a peak around 1000CE. The major sources of admixture in these groups is consistently a mixture of Central West African and Southern Niger-Congo speaking groups, in particular Malawi. This result supports the hypothesis that Bantu speakers initially spread south along the western side of the Congo rainforest before splitting off eastwards, and interacting with local groups in central south Africa – for which Malawi is our best proxy – and then moving further north-east and south (Figure 6B).
Pre-Bantu pastoralist movements from East to South Africa. In the Ju/’hoansi we infer an admixture event involving an East African Afroasiatic source which we date to 311-851CE. This event precedes the arrival of Bantu-speaking groups in southern Africa, and is consistent with several recent results linking east to south Africa and the limited spread of cattle pastoralism prior to the Bantu expansion (Figures 5 and 6C) (Pickrell et al., 2014; Ranciaro et al., 2014; Macholdt et al., 2015; Barham and Mitchell, 2008).
Ancestral connections between West Africans and the Sudan. Concentrating on older events, we observe old 'Sudanese' (Nilotic) components in very small proportions in events The Gambia dating to c.0CE (Figure 4—figure supplement 1 and Figure 5) and which may represent ancient expansion relationships between East and West Africa. When we infer admixture in West and Central West African groups without allowing any West Africans to contribute to the inference, we observe a clear signal of Nilo-Saharan ancestry in these groups, consistent with bidirectional movements across the Sahel (Tishkoff et al., 2009) and coancestry with (unsampled) Nilo-Saharan groups in Central West Africa. Indeed, if we look again at the PCA in Figure 1C, we observe that the Nilo-Saharan speakers are between West and East African Niger-Congo speaking individuals on PC3, an affinity which is supported by the presence of West African components in non-Niger-Congo speaking East Africans (Figure 6C).
Ancient Eurasian gene-flow back into Africa and shared hunter-gatherer ancestry. The $f_{3}$ statistics show the general presence of ancient Eurasian and/or Khoesan ancestry across much of sub-Saharan Africa. We tentatively interpret these results as being consistent with recent research suggesting very old ( $>$ 10 kya) migrations back into Africa from Eurasia (Hodgson et al., 2014a), with the ubiquitous hunter-gatherer ancestry across the continent possibly related to the inhabitant populations present across Africa prior to these more recent movements. Future research involving ancient DNA from multiple African populations will help to further characterise these observations.

Discussion

We have presented an in-depth analysis of the genetic history of sub-Saharan Africa in order to characterise its impact on present day diversity. We show that gene-flow has taken place over a variety of different time scales which suggests that, rather than being static, populations have been sharing DNA, particularly over the last 3000 years. An important question in African history is how contemporary populations relate to those present in Africa before the transition to pastoralism that began in the Nile Valley some 9kya. The $f_{3}$ and MALDER analyses show evidence for deep Eurasian and some hunter-gatherer ancestry across Africa, to which our GLOBETROTTER analysis (Figure 4) provides further clarity on the composition of the admixture sources, as well as the timing of events and their impact on groups in our analysis (Figure 6). On the basis of our analysis, none of the African populations in our study has remained isolated and unchanged over the last 4000 years.

With a couple of exceptions (some of the events we have highlighted in Figure 6), the major signals of admixture in our analysis relate to the movement of Eurasian ancestry back into Africa and the movement of genes south and east from Central West Africa, likely as a result of the Bantu expansion. The transition from foraging to pastoralism and agriculture in Africa is likely to have been complex, with its impact on existing populations varying substantially. Our analysis provides an estimate of the timing of this expansion (Figure 5). It is important to note that dates of admixture inferred through genetics will always be more recent than the date at which two populations have come together. Our dataset is not an exhaustive sample of African populations, and there are likely to be other events than those reported here that have been important in generating the current genetic landscape of Africa.

Our analyses show that patterns of haplotype sharing across the sub-Sahara can be characterised by historical gene-flow events involving groups with ancestry from across and outside of the continent. We have identified gene-flow across Africa, implying that haplotypes have been moving over (potentially large) distances in a relatively short amount of time. As a rough estimate, given that events in southern African groups involving Bantu sources have occurred within the last 2000 years (Figure 6) and the distance between Cameroon and south-east Africa is around 4000km, haplotypes have moved across and into different environments at a rate of roughly 2 km/year.

Interpreting haplotype similarity as historical admixture

Analyses that model the correlations in allele frequencies (such as those performed here in the Allele frequency differences show widespread evidence for admixture section) provided initial evidence that the presence of Eurasian DNA across sub-Saharan Africa is the result of gene-flow back into the continent within the last 10,000 years (Gurdasani et al., 2014; Pickrell et al., 2014; Hodgson et al., 2014a), and that some groups have ancient (over 5 kya) shared ancestry with hunter-gather groups (Figure 3) (Gurdasani et al., 2014). Whilst the weighted admixture LD decay curves between pairs of populations used by MALDER suggests that this admixture involved particular groups, the interpretation of such events is difficult. Firstly, because our dataset includes closely related groups, it is not always possible to identify a single best matching reference, implying that sub-Saharan African groups share some ancestry with many different extant groups. On the basis of these analyses alone, it is not possible to characterise the composition of admixture sources. Secondly, when ancient events are identified with MALDER, such as in the Mossi from Burkina Faso, where we estimate admixture around 5000 years ago between a Eurasian (GBR) and a Khoesan speaking group (/Gui //Gana), we know that modern haplotypes are likely to only be an approximation of ancestral diversity (Pickrell and Reich, 2014). Even the Ju/’hoansi, a San group from southern Africa traditionally thought to have undergone limited recent admixture, has experienced gene-flow from non-Khoesan groups within this timeframe (Figure 4) (Pickrell et al., 2012; 2014).

There are complications in relating admixture sources to contemporary populations. For example, our analyses indicate that the Mossi share deep ancestry with Eurasian and Khoesan groups (Figure 3), but any description of the historical event leading to this observation is potentially biased by the discontinuity between extant populations and those present in Africa in the past. It is for this reason that, for older events, we define and refer to broader ancestry regions. So in this case, we describe Eurasian ancestry in general moving back into Africa, rather than British DNA in particular. GLOBETROTTER provides an alternative approach by characterising admixture as occurring between sources that themselves are mixtures of ancestry from contemporary groups. In the situation where no sample group provides a good representation of the admixture source, this additional complexity is likely to be a closer approximation to the truth, with the downside that it is not always possible to assign a specific population label to mixed admixture sources. Using contemporary populations as proxies for ancient groups is not the perfect approach and would be improved by DNA from significant numbers of ancient human individuals, at sufficient quality, with which to calibrate temporal changes in population genetics.

Spread of genes within Africa

When new haplotypes are introduced into a population by gene-flow their fate will be partly be determined by the selective advantage they confer, as well as the chance effects of genetic drift. Selection can occur in response to a number of different factors. Greenlandic Inuit, for example, have adapted genetically to a diet rich in polyunsaturated fatty acids (Fumagalli et al., 2015), and one of the strongest signals of selection in the genome is found around the LCT gene (Bersaglieri et al., 2004), mutations in which allow individuals to continue to digest milk into adulthood. Responding to changes in their environment, populations living at high altitudes have adapted convergently at different genes involved in hypoxic response: at BHLHE41 in Ethiopians (Huerta-Sánchez et al., 2013); EPAS1 and EGLN1 in Tibetans (Yi et al., 2010); and at a separate loci within EGLN1 in Andean groups (Bigham et al., 2010). There are also several examples of humans adapting in response to infectious disease, for example at the LARGE gene in West Africans (Grossman et al., 2013), in response to pressure from Lassa fever, and at CR1 in response to malaria (Gurdasani et al., 2014). Diseases such as malaria are caused by highly polymorphic parasites and movement into new environments might lead to exposure to new strains. An implication of widespread gene-flow is that it can provide a route for potentially beneficial novel mutations to enter populations allowing them to adapt to such change.

A recent example of this process is the observation of higher than expected frequencies of the Duffy-null mutation in populations from Madagascar as a result of admixture with African Bantu speaking groups (Hodgson et al., 2014b). The spread of the Duffy-null allele, an ancient mutation which is thought to have arose at least 30,000 years ago (Hamblin and Di Rienzo, 2000; Hamblin et al., 2002) and confers resistance to Plasmodium vivax malaria, throughout Africa is only possible through contact and gene-flow between populations right across the sub-Sahara. Conversely, the mutation responsible for the sickle cell phenotype, which offers protection against P. falciparum malaria, appears to have recently occurred five times independently in Africa, causing multiple distinct haplotypes to be observed (Hedrick, 2011). These mutations are young, within the order of 250–1750 years old (Currat et al., 2002; Modiano et al., 2008), so will have had limited opportunity to have been moved around by the gene-flow events that we describe. Further work is needed to understand the role of admixture in facilitating adaptation.

Admixture and genetic epidemiology

Epidemiology is the process of identifying the mechanisms that lead to changes in disease prevalence that could result from different environments, behaviours, or genetic backgrounds. Our study helps address these questions by providing a detailed guide to genetic similarity between different ethno-liguistic groups in different geographic locations. This is equally relevant for studies of important infectious disease (such as malaria), as it is for studies of non-communicable diseases which are associated with life-style changes in developing parts of Africa (see Rotimi and Jorde (2010) for a review). As an example, we detect consistent genetic differences between groups in Central West Africa (e.g. the Akans and Namkam/Kasem from Ghana in Figure 2—figure supplement 1), but not in groups from the West and East Africa Niger-Congo ancestry regions (The Gambia and Kenya; Figure 1—figure supplement 3). Within these groups we see individuals with a spectrum of different ancestral backgrounds. These observations are specific to groups in our analysis, and cannot be extended to other groups from similar populations; the seven ethnic groups from The Gambia were all collected in and around Banjul in the Western District, whereas the three ethnic groups from Ghana were collected from two hospitals in the north (Navrongo) and centre (Kumasi) of the country. Nonetheless, the potential for genetic differences to underlie difference in disease should be guided by analyses of haplotype sharing between groups.

Chromosome painting approaches also provide a quantitative measure of the extent to which self-reported ethnic labels capture genetic relationships, which is also important for controlling for the potential confounding effects of population structure (Marchini et al., 2004; Price et al., 2006), when genome-wide data are unavailable. We see that this can vary extensively from one population to the next (Figure 1—figure supplement 3); some individuals who report as coming from the same ethnic group cluster into different groups and other individuals from different ethnicities cluster together. We also show that there are differences in the inferred relationship between populations using analyses genotype based approaches such as PCA and $F_{S T}$ , and our haplotype based analysis (Figure 1) and TVD (Figure 2). These results suggest that for some groups, haplotype similarity to other ancestries can vary more substantially than allele frequencies alone. In designing genotyping and sequencing studies these differences can be important in ensuring that the breadth of variation in African populations is adequately covered (Kwiatkowski, 2005; Gurdasani et al., 2014). Africa has an exciting and important role in furthering understanding of human biology and disease. An understanding of its patterns of genetic diversity and the historical movements of its people should help in this endeavour.

Share this article

Cite this article

Sub-Saharan African genetic variation is shaped by ethno-linguistic and geographical similarity.

Figure 1—source data 1

Haplotypes capture more population structure than independent loci.

Figure 2—source data 1

Figure 2—source data 2

Figure 2—source data 3

Figure 2—source data 4

Inference of admixture in sub-Saharan Africa using MALDER.

Figure 3—source data 1

Figure 3—source data 2

Figure 3—source data 3

Figure 3—source data 4

Inference of admixture in sub-Saharan African using GLOBETROTTER.

Figure 4—source data 1

Figure 4—source data 2

A timeline of recent admixture in sub-Saharan Africa.

The geography of recent gene-flow in Africa.

Author details

George BJ Busby

Contribution

For correspondence

Competing interests

Gavin Band

Contribution

Competing interests

Quang Si Le

Contribution

Competing interests

Muminatou Jallow

Contribution

Competing interests

Edith Bougama

Contribution

Competing interests

Valentina D Mangano

Contribution

Competing interests

Lucas N Amenga-Etego

Contribution

Competing interests

Anthony Enimil

Contribution

Competing interests

Tobias Apinjoh

Contribution

Competing interests

Carolyne M Ndila

Contribution

Competing interests

Alphaxard Manjurano

Contribution

Competing interests

Vysaul Nyirongo

Contribution

Competing interests

Ogobara Doumba

Contribution

Competing interests

Kirk A Rockett

Contribution

Competing interests

Dominic P Kwiatkowski

Contribution

Competing interests

Chris CA Spencer

Contribution

For correspondence

Competing interests

Malaria Genomic Epidemiology Network

Competing interests

Citations by DOI

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

Categories and tags

Research organism

Further reading