The evolutionary history of human spindle genes includes back-and-forth gene flow with Neandertals

  1. Stéphane Peyrégne  Is a corresponding author
  2. Janet Kelso
  3. Benjamin M Peter
  4. Svante Pääbo  Is a corresponding author
  1. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Germany

Abstract

Proteins associated with the spindle apparatus, a cytoskeletal structure that ensures the proper segregation of chromosomes during cell division, experienced an unusual number of amino acid substitutions in modern humans after the split from the ancestors of Neandertals and Denisovans. Here, we analyze the history of these substitutions and show that some of the genes in which they occur may have been targets of positive selection. We also find that the two changes in the kinetochore scaffold 1 (KNL1) protein, previously believed to be specific to modern humans, were present in some Neandertals. We show that the KNL1 gene of these Neandertals shared a common ancestor with present-day Africans about 200,000 years ago due to gene flow from the ancestors (or relatives) of modern humans into Neandertals. Subsequently, some non-Africans inherited this modern human-like gene variant from Neandertals, but none inherited the ancestral gene variants. These results add to the growing evidence of early contacts between modern humans and archaic groups in Eurasia and illustrate the intricate relationships among these groups.

Editor's evaluation

Peyrégne et al., studied the genes encoding proteins of the spindle apparatus which exhibit an elevated number of nonsynonymous substitutions in modern humans. In one of these genes (KNL1), comparisons of modern and archaic humans identify that some Neanderthals carried the modern human haplotype at the KNL1 gene, raising the possibility that Neanderthals acquired it from modern humans. Based on the patterns observed in this gene and estimates of the time to the most recent common ancestor, the authors propose an evolutionary scenario that includes an introgression event from modern humans into Neanderthals around 200,000 years ago, and a more recent introgression event from Neanderthals into non-African populations. This study highlights how inspecting individual genomic regions can reveal a complex history of interactions between modern and archaic humans.

https://doi.org/10.7554/eLife.75464.sa0

Introduction

The ancestors of Neandertals and Denisovans diverged from those of modern humans between 522 and 634 thousand years ago (kya) (Prüfer et al., 2017). Differences between modern humans and Neandertals, Denisovans and other hominins can therefore reveal traits that changed in modern humans over the past half-million years and are specific to modern humans. The paleontological and archaeological records provide information about morphological and cultural traits (Trinkaus, 2006; Roebroeks and Soressi, 2016), yet other traits remain inaccessible by such approaches. The retrieval of DNA from archaic human remains and the reconstruction of Neandertal and Denisovan genomes (Green et al., 2010; Meyer et al., 2012; Prüfer et al., 2014) offer an additional approach to understanding what sets modern and archaic humans apart.

Because large numbers of present-day human genomes have been sequenced while few archaic genomes are available, the comparisons of Neandertal and Denisovan genomes are particularly useful for identifying those nucleotide changes that occurred on the modern human lineage and reached fixation (or high frequencies) in present-day people. According to one estimate (Prüfer et al., 2014), there are 31,389 such differences of which only 96 change the amino acid sequence of 87 proteins. Three of these missense changes occur in a single gene, sperm associated antigen 5 (SPAG5), which encodes a protein associated with the spindle apparatus (Mack and Compton, 2001; Chang et al., 2001). The spindle is a cytoskeletal structure composed of microtubules and associated proteins that attach to the chromosomes and ensures their proper segregation during cell division (Prosser and Pelletier, 2017). SPAG5 is the only gene in the genome with three missense changes, but four of the 87 genes carry two missense changes. One of these is kinetochore scaffold 1 (KNL1, previously CASC5) which encodes a protein in the kinetochore (Cheeseman et al., 2008), a protein structure at the centromere of chromosomes to which the spindle attaches during mitosis and meiosis. The occurrence of three missense changes in SPAG5 and two in KNL1, as well as one in the gene KIF18A, which encodes a protein involved in the movement of chromosomes along microtubules (Mayr et al., 2007), is intriguing as it suggests that components of the spindle may have been subject to natural selection during recent human evolution (Pääbo, 2014; Kuhlwilm and Boeckx, 2019).

We therefore set out to study genes with missense changes that are associated with the spindle apparatus and to reconstruct their evolutionary history in modern and archaic humans. We show that multiple spindle genes may have experienced positive selection on the modern human lineage. We also show that the variant of KNL1 that carries two derived missense mutations was transferred by gene flow from modern humans to early Neandertals and then again from late Neandertals to modern humans.

Results

Spindle-associated genes with missense changes in modern humans

According to the classification of genes in the Gene Ontology database (Ashburner et al., 2000; Gene Ontology Consortium, 2021) that represents current knowledge about the function of genes, there are eight spindle-associated genes among the 87 genes that carry missense changes that are fixed or almost fixed among present-day humans. These eight genes are SPAG5 and KNL1, with three and two changes, respectively, and ATRX, KATNA1, KIF18A, NEK6, RSPH1, and STARD9, which each carry one missense change. A total of 11 missense changes accumulated in these eight genes. Given the length of spindle-associated genes, six missense changes are expected making this a significant enrichment of amino acid changes compared to random expectation (one-tailed permutation test p=0.045; see Methods). Multiple missense changes in a single gene are also more than expected by chance (one-tailed p=0.044 for two changes; one-tailed p<0.001 for three changes). Note that it is the high number of changes in these genes that stands out and not the number of genes, because eight spindle-associated genes carrying at least one missense change is not more than expected (one-tailed p=0.282). The genes with single changes have a range of functions associated with cell division. ATRX is a chromatin remodeler required for normal chromosome alignment, cohesion and segregation during meiosis and mitosis (Ritchie et al., 2014; Ramamoorthy and Smith, 2015). KATNA1 and KIF18A are depolymerases that disassemble microtubules in the spindle apparatus and are essential for proper chromosome alignment at the equator of dividing cells (Mayr et al., 2007; Roll-Mecak and McNally, 2010). NEK6 is a kinase that is required for cell cycle progression through mitosis (Yin et al., 2003) and interacts with proteins of the centrosome (Vaz Meirelles et al., 2010), an organelle that serves as a microtubule-organizing center and is crucial for the spindle apparatus. RSPH1 localizes to the spindle and chromosomes during meiotic metaphase (Tsuchida et al., 1998), when the chromosomes align in the dividing cell. Finally, STARD9 localizes to the centrosome during cell division and is required for spindle assembly by maintaining centrosome integrity (Torres et al., 2011). We describe the location and predicted effects of the missense changes in Appendix 1—tables 1 and 2.

Evolutionary history of spindle proteins in modern humans

The enrichment of missense changes in genes associated with the spindle apparatus suggests that some of these changes may have been subject to positive selection in modern humans (Hudson et al., 1987). To test whether the eleven changes in the eight genes may have been beneficial, we applied a method to detect positive selection that occurred in the common ancestral population of modern humans after their split from archaic humans (Peyrégne et al., 2017). This method relies on a hidden Markov model to identify genomic segments where archaic human genomes fall outside of modern human variation, that is, where all modern humans share a common ancestor more recently than any common ancestor shared with archaic humans. Applying this approach to the genomes of a Neandertal (Prüfer et al., 2014), a Denisovan (Meyer et al., 2012) and 504 Africans (1000 Genomes Project Consortium et al., 2015), we identified such segments spanning 132–547,448 base pairs (bp) around the missense substitutions of each of the eight spindle genes (Figure 1; Figure 1—figure supplement 1). The genetic lengths of these segments are informative about how fast they spread in modern humans since the common ancestor shared with archaic humans. Segments longer than 0.025cM were not observed in neutral simulations (i.e. false positive rate <0.1%, Peyrégne et al., 2017) and therefore represent evidence for positive selection. We account for uncertainty in local recombination rates by using two recombination maps (African-American and deCODE maps, Hinch et al., 2011; Halldorsson et al., 2019). One gene, SPAG5, carries a segment around the missense substitutions longer than 0.025cM in both maps (Figure 2). The segments in two genes, ATRX and KIF18A, exceed this cutoff only for the African-American and deCODE recombination maps, respectively (Figure 2A). Thus, there is consistent evidence for positive selection on SPAG5, while evidence for positive selection affecting ATRX and KIF18A is dependent on the recombination maps used.

Figure 1 with 1 supplement see all
Genomic regions around spindle genes where archaic humans fall outside the modern human variation.

Each panel corresponds to the region around the missense change(s) (red stars) in a spindle gene. The grey boxes correspond to exons. The curves give the posterior probability (computed as in Peyrégne et al., 2017) that an archaic genome (Altai Neandertal in red, Denisova 3 in orange) is an outgroup to present-day African genomes at a particular position (dots on the curves correspond to informative positions, that is polymorphic positions or fixed derived substitutions in Africans from the 1,000 Genomes Project phase III, compared to four ape genomes). Chromosomal locations are given on top.

Evidence for selection in the spindle genes with age estimates of these substitutions.

(A) The genetic length of segments around the missense substitutions where the Altai Neandertal and Denisova 3 fall outside the human variation (Figure 1) using the African-American map, AAmap, or the deCODE map, deCODE19. The grey histogram corresponds to the length distribution of such segments in neutral simulations (Peyrégne et al., 2017). Candidate genes for selection (red) are those with segments longer than 0.025cM (Peyrégne et al., 2017) (vertical red dashed line). (B) Cumulative distributions of pairwise times to the most recent common ancestors (TMRCA) among present-day African chromosomes with the most distant relationships (red; see Methods), or between the chromosomes of present-day Africans and either present-day individuals with the ancestral versions of the missense substitutions (‘Modern (anc)’, in black) or archaic humans (other colors). The pink areas correspond to estimated time intervals for the origin of the missense substitutions and their bounds correspond to the average TMRCAs over the red curve and the next one (back in time), respectively. (C) Summary of ages of substitutions as described in panel B. Genes with evidence of positive selection are highlighted in red.

To gain further insights into the history of the missense substitutions in the spindle genes, we estimated the time when each of these mutations occurred. Note that this time differs from that of fixation, which may have happened much later, particularly in the absence of positive selection. As fixation events since the split with the ancestors of Neandertals and Denisovans are rare (Peyrégne et al., 2017), it is most parsimonious to assume that the mutations that reached fixation in the region of each spindle gene represent a single event of fixation (i.e. were linked to the missense substitution(s) as they spread in modern humans). Thus, estimates of the coalescence time of the haplotypes in which the substitutions are found today provides a most recent bound for when the mutations occurred. By contrast, the most recent common ancestors shared between modern and archaic humans (who carried the ancestral variants) will predate the occurrence of the mutations and therefore provide older bounds for the time of origin of the missense variants.

By computing pairwise differences among the high-quality genome sequences of 104 individuals from Africa (Human Genome Diversity Panel, Bergström et al., 2020; including two non-African individuals with the ancestral version of KATNA1 and KIF18A, respectively; Methods), and four archaic human genomes (Prüfer et al., 2017; Meyer et al., 2012; Prüfer et al., 2014; Mafessoni et al., 2020), we estimated the ages of the missense substitutions of KATNA1, KIF18A, KNL1, SPAG5, and STARD9 (Figure 2B and C, Appendix 2—table 1). We excluded NEK6 and RSPH1 as the regions identified around the missense substitutions were too short to estimate their age (4,104bp and 132bp, respectively). We also excluded ATRX, which is located on the X chromosome. The relative difference in age estimates suggests that the mutations in KATNA1, KNL1, and STARD9 occurred much earlier than the mutations in SPAG5 and KIF18A and may be more than a million years old. By contrast, the mutations in SPAG5 and KIF18A are more recent. This is consistent with that the latter two genes have some evidence of positive selection around their missense substitutions whereas the former genes lack evidence for selection.

A modern human-like KNL1 haplotype in Neandertals

The identification of modern human-specific missense changes was based on the high-coverage genomes of one Neandertal and one Denisovan (Prüfer et al., 2014). With the availability of additional archaic human genome sequences, it is now possible to explore whether the derived states may have occurred in some archaic humans. For five of the spindle genes, sequence information from seven to ten archaic humans is available at the relevant positions (Prüfer et al., 2017; Prüfer et al., 2014; Mafessoni et al., 2020; Hajdinjak et al., 2018; Peyrégne et al., 2019; Slon et al., 2018). In none of them, there is evidence for the presence of the derived missense variants (Appendix 3—table 1). For one gene, STARD9, one DNA fragment sequenced from a Neandertal individual (Denisova 5, 52-fold average sequence coverage) carries the derived variant whereas 29 carry the ancestral variant. This is likely to represent present-day human DNA contamination or a sequencing error. For another gene, SPAG5, which carries three missense substitutions, two DNA fragments sequenced from a Neandertal individual (Mezmaiskaya 1, 1.9-fold average sequence coverage) carry the derived variant at one of these positions (chr17:26,919,034; hg19), whereas three DNA fragments carry the ancestral variant. No information was available for this individual at the two other positions, but none of eight neighbouring positions where present-day Africans carry fixed derived alleles and where information is available for this Neandertal carries any modern human-like allele (Appendix 3—table 2). This could therefore represent present-day human DNA contamination, which amounts to 2–3% in the DNA libraries sequenced from this individual (Prüfer et al., 2017).

In contrast, between one and 27 DNA fragments that cover the two missense substitutions in KNL1 in four of the twelve Neandertals available carried only derived alleles (Figure 3). This includes the Chagyrskaya 8 genome, which is sequenced to 27-fold average genomic coverage. Several of the fragments carrying derived alleles also carry cytosine-to-thymine substitutions near their ends, which is typical of ancient DNA molecules and suggests that the fragments do not represent present-day human DNA contamination (Briggs et al., 2007).

A modern human-like haplotype in some Neandertals.

Genotypes from 13 archaic individuals (y-axis) are shown in a region around the two missense changes (dots) in KNL1. We only show positions (x-axis) that are derived in all Luhya and Yoruba individuals from the 1,000 Genomes Project compared to four great apes (Peyrégne et al., 2017) and at least one high-coverage archaic genome (Chagyrskaya 8, Denisova 3, Vindija 33.19 and Altai Neandertal, i.e., Denisova 5). The colors of the squares and dots represent the genotype, with ancestral and derived alleles. For the low coverage archaic genomes, we randomly sampled a sequence at each position. Red lines indicate the modern human-like haplotype.

Further evidence that the modern human-like alleles in KNL1 in the four Neandertals are not due to present-day DNA contamination comes from the observation that they carry modern human-like derived alleles that occur in all present-day Africans at 21 other positions in a 276kb-long region around the missense variants, whereas the other Neandertal and Denisovan individuals who carry the ancestral alleles do not (Figure 3). In addition, as there is no evidence of ancestral alleles at any of these informative positions, the four Neandertals probably all carried this ‘modern human-like’ haplotype in homozygous form.

To investigate how the divergence among Neandertals of the haplotypes with and without the derived missense substitutions in the KNL1 gene compares to other regions of Neandertal genomes, we calculated the divergence in non-overlapping 276kb-segments between the Altai Neandertal (Denisova 5), who carries ancestral KNL1 substitutions, and Chagyrskaya 8, who carries the derived substitutions (Figure 4A). The number of differences in the KNL1 region is about one order of magnitude higher than the average across the genome and in the top 0.27% of all regions in the genome. Thus, the KNL1 region stands out as unusually diverged between these two Neandertals.

Figure 4 with 1 supplement see all
The modern human-like KNL1 haplotype in Neandertals.

(A) Pairwise differences between two high coverage Neandertal genomes (Chagyrskaya 8 and Altai Neandertal (Denisova 5)) in non-overlapping sliding windows of 276 kb (histogram) and in the KNL1 region (vertical cyan line; chr15:40,818,035–41,094,166, hg19). Windows with less than 10,000 genotype calls for both Neandertals were discarded. (B) The expected length distributions under a model of incomplete lineage sorting based on local recombination rate estimates from the African-American (AA) and deCODE recombination maps and the length of the modern human-like KNL1 haplotype in Neandertals (vertical cyan line). (C) Left panel: Time to the most recent common ancestor (TMRCA) between the Chagyrskaya 8 Neandertal (who carries the modern human-like haplotype) and KNL1 haplotypes in present-day humans with their 95% confidence intervals (bars) for chr15:40,885,107–40,963,160 (hg19). The size of the points corresponds to the number of chromosomes carrying this haplotype in the HGDP dataset. The black rectangles highlight subsets of haplotypes with TMRCAs more recent than the modern-archaic population split time (Prüfer et al., 2017) (shaded pink area). Right panel: Distribution of pairwise TMRCAs between the Neandertal and present-day humans from the HGDP dataset in the region of KNL1 and two other regions with archaic haplotypes in present-day humans (Controls, Zeberg and Pääbo, 2020; Dannemann et al., 2016; COVID risk region: chr3:45,859,651–45,909,024; TLR6, 1 & 10: chr4:38,760,338–38,846,385). We used the Vindija 33.19 genome for the COVID risk haplotype and the Chagyrskaya 8 genome otherwise.

Introgression of the KNL1 haplotype into Neandertals

It is intriguing that the Neandertals who carry the two missense changes in KNL1 also carry modern human-like alleles shared by all (or nearly all) present-day Africans at multiple positions in the region of KNL1 and exhibit unusually high divergence to other Neandertal haplotypes. This raises the question if Neandertals may have inherited this haplotype from ancestors or relatives of modern humans. However, the age of the KNL1 missense mutations predates the divergence of Neandertals and modern humans (Figure 2C) and must thus have been segregating in the ancestral populations of the two groups. This may have resulted in the presence of the derived variants in both Neandertals and modern humans even in the absence of any gene flow (‘incomplete lineage sorting’). However, in this case, the segment carrying similarity between Neandertal and modern human genomes is expected to be shorter than if it entered the Neandertal population by gene flow because the number of generations over which recombination would have had the opportunity to shorten the haplotype would be larger.

We used local estimates of the recombination rate (Methods) and a published method (Huerta-Sánchez et al., 2014) to infer the expected length distribution for haplotypes inherited from the population ancestral to Neandertals and modern humans. The 276kb haplotype is longer than expected (Figure 4B and p ≤ 0.006) and it is therefore unlikely to be inherited from the common ancestral population. Thus, although the missense variants were segregating in the common ancestral population of Neandertals and modern humans, Neandertals did not inherit this haplotype from that population. Rather, the haplotype in Neandertals is likely the result of gene flow between Neandertals and modern humans. Since all present-day humans carry the derived KNL1 variants whereas only some Neandertals do, and since the modern human-like KNL1 haplotype is unusually diverged to other Neandertal haplotypes, gene flow was likely from modern humans (or their relatives) to Neandertals.

Age of KNL1 gene flow into Neandertals

Using the length of the haplotype, the local recombination rate (averaged over the African-American and deCODE recombination maps), the age of the most recent Neandertal carrying the haplotype (~40kya, Hajdinjak et al., 2018), and given the limitation that recombination among the fixed alleles in modern humans cannot be detected, we estimate an older age limit for the haplotypes seen in Neandertals and modern humans of 265kya (Methods). This means that gene flow occurred after this time.

It is possible to further refine the age estimate of this gene flow if some present-day humans are closer to the Neandertals carrying the haplotype than other present-day humans. By comparing the high-quality genomes of Neandertals with and without the modern-like KNL1 haplotype, we identified 206 positions that are derived on the modern-like KNL1 haplotype in Neandertals. We then computed the frequency of these alleles among 104 present-day African genomes and identified 19 positions where 32–39% of present-day people share the derived allele (Appendix 4—table 1). These derived alleles are physically linked (r2 >0.58) and tag a 78kb-long haplotype (102kb in some individuals, r2 >0.39) where some present-day individuals are closer to the Chagyrskaya 8 Neandertal than other present-day individuals. The individuals with the 102kb-long haplotype carry seven differences to the Chagyrskaya 8 Neandertal (among the 36,106 bases called). This number of differences yields an estimate for the most recent common ancestor with Neandertals carrying the modern-like KNL1 haplotype of 251kya (95% CI: 131-421kya; Figure 4C). Note that this haplotype is also too long to be inherited from the common ancestral population of Neandertals and modern humans half a million years ago (95% CI for the age of the last common ancestor: 49-304kya). The length of the 102kb-long haplotype in combination with the number of differences to Chagyrskaya 8 yields an age estimate of 202kya (95% CI: 118-317kya, Methods), consistent with the estimate of gene flow <265kya based on the haplotype length in Neandertals.

In conclusion, some Neandertals carry a haplotype encompassing the KNL1 gene that carries derived alleles also seen in all present-day humans. The length of this haplotype suggests that it entered the Neandertal population less than 265kya. Some present-day people carry a haplotype in this region that share a common ancestor with the modern human-like haplotype in Neandertals 202kya. Both estimates represent times after which the contact that contributed this haplotype to Neandertals must have occurred.

No evidence for selection on the KNL1 haplotype in Neandertals

The oldest Neandertal carrying the modern human-like KNL1 haplotype for whom we have genome sequence data is Chagyrskaya 8, who lived 60-80kya in the Altai Mountains (Mafessoni et al., 2020). It is not present in two older Neandertals from Europe (Peyrégne et al., 2019) and one from Siberia (Prüfer et al., 2014), nor in two other archaic individuals who lived around 60-80kya in Siberia and the Caucasus (Prüfer et al., 2017; Slon et al., 2018). However, it is present in three out of five Neandertals who lived 40-50kya (Prüfer et al., 2017; Hajdinjak et al., 2018) in Europe, which may hint at an increase in its frequency over time.

To test this hypothesis, we investigated how often frequencies of other variants (not necessarily introgressed) in the genomes of these Neandertals similarly increased over time. We identified 7,881 polymorphic transversions at least 50kb apart where the derived allele was absent among the three early Neandertals but present at least once in the Neandertals who lived 60-80kya and 40-50kya. Among these positions, at least three late Neandertals carried the derived alleles at 2,773 positions (35.2%), suggesting that such changes in frequency were common. Although this analysis is limited by the few Neandertal genomes available, it yields no evidence for positive selection on the KNL1 variants in Neandertals.

Reintroduction of the modern-like KNL1 haplotype into non-Africans

As several late Neandertals carried a modern human-like KNL1 haplotype, it is possible that it was reintroduced into modern humans when they met and mixed outside Africa approximately 44-54kya (Iasi et al., 2021). We would then expect that some non-Africans would carry a haplotype that is more closely related to that of Chagyrskaya 8 than the present-day humans analyzed above (those highlighted by a solid box in Figure 4C). Among 825 non-African individuals from around the world, we identified 12 individuals from several populations that carry one chromosome that differs at just 7–13 positions from the Chagyrskaya 8 genome in the KNL1 region (out of ~140kb with genotype calls in the 276kb region; Appendix 5—table 1). By comparison, other non-African individuals carry 54–179 differences to the Chagyrskaya 8 genome in this region (Figure 4C). We estimated that the 12 individuals share a most recent common ancestor with Chagyrskaya 8 about 96 to 139kya in this region of KNL1 (95% CI: 81-145kya to 95-198kya for the individuals with 7 and 13 differences to Chagyrskaya 8, respectively; Methods). These estimates are similar to those computed in Neandertal haplotypes previously described in present-day individuals (Figure 4C, right panel, Zeberg and Pääbo, 2020; Dannemann et al., 2016). Adjoining the 3’-end of this KNL1 region, there are seven positions where these 12 individuals share alleles with archaic genomes (including four alleles shared with Chagyrskaya 8; Figure 4—figure supplement 1) while no other present-day humans in the dataset do so. These observations suggest that these 12 present-day individuals carry a KNL1 haplotype inherited from Neandertals.

Although the missense alleles in KNL1 are fixed among the genomes of 2,504 present-day humans (1000 Genomes Project Consortium et al., 2015), some rare ancestral alleles may exist among present-day human genomes, for example, due to interbreeding with archaic humans or due to back mutations. We therefore looked at the exome and whole-genome sequences from the gnomAD database (v2.1.1, Karczewski et al., 2020). There are no ancestral alleles among the exome sequences (out of 227,420 alleles called). One of the 15,684 whole genomes carries one ancestral allele at one of the two positions carrying missense variants. This may be a back mutation (rs755472529; Appendix 6—table 1). Thus, although there is evidence that gene flow from Neandertals introduced the derived version of KNL1 into modern humans, there is no evidence of the ancestral version of KNL1, which was also present in Neandertals, in present-day people. This is compatible with that the ancestral variants may have been disadvantageous and therefore eliminated after admixture with Neandertals and Denisovans.

Discussion

Spindle genes experienced an unusually large number of missense changes in modern humans since the split from a common ancestor with archaic humans. This is intriguing, as mitotic metaphase has been shown to be prolonged in apical progenitors during human brain organoid development when compared to apes (Mora-Bermúdez et al., 2016; Mora‐Bermúdez et al., 2021). Some of the missense changes in spindle genes that occurred since the separation of modern and archaic humans may be involved in such differences.

One of the spindle genes carrying missense changes, KNL1, has a particularly interesting evolutionary history in that some Neandertals carried a haplotype sharing two missense variants with present-day humans that were hitherto believed to be specific to modern humans. Both the length of the modern-like KNL1 haplotype in Neandertals and the small number of differences between present-day humans and Neandertals in this region suggest that they shared a common ancestor more recently than the estimated split time between these two populations. While only a few Neandertals carry this KNL1 haplotype and its divergence to other Neandertal haplotypes is unusually high (Figure 4A), many variants on the haplotype are fixed or common among present-day humans. Therefore, we infer that ancestors (or relatives) of modern humans contributed this haplotype to Neandertals. Figure 5 summarizes the complex history of KNL1.

Schematic illustration of the history of KNL1.

The tree delineated in black corresponds to the average relationship between the modern and archaic human populations. The inner colored trees correspond to the relationship of different KNL1 lineages, with arrows highlighting the direction of gene flow between populations. The corresponding haplotypes are illustrated on the sides of the tree and show the recombination history in the region (e.g. the recombinant Neandertal haplotype with variants of putative archaic origin in non-Africans). Dots correspond to informative positions, and the stars illustrate the missense substitutions. The age of relevant ancestors are marked by horizontal dotted lines. MH: Modern human.

That groups related to early modern humans contributed a KNL1 haplotype to early Neandertals supports previous evidence based on the analyses of mitochondrial DNA (Meyer et al., 2016; Posth et al., 2017), Y chromosomes (Petr et al., 2020) as well as genome-wide data (Kuhlwilm et al., 2016; Hubisz et al., 2020) for early contacts between populations related to modern humans and Neandertals in Eurasia. The estimated ages of the most recent common ancestor of the KNL1 haplotype of ~202kya (based on pairwise differences and length of the shared haplotype) and ≤265kya (based on the length of the haplotype in Neandertals) suggest that at least some contacts were more recent than 265kya and are in agreement with some of the previous age estimates for such contacts (Hubisz et al., 2020). This could resolve the apparent discrepancy between the high mitochondrial diversity relative to their nuclear diversity among early Neandertals (Peyrégne et al., 2019) as the mitochondrial diversity would reflect diversity inherited from a potentially more diverse modern human population. Notably, human remains that may be morphologically similar to modern humans and are of a relevant age have been found in Greece ~210kya (Harvati et al., 2019) and in the Levant 177–194kya (Hershkovitz et al., 2018). In any event, the complex history of the KNL1 haplotype illustrates the close and intricate relationship between Neandertals and modern humans.

The enrichment of missense changes in spindle genes (Prüfer et al., 2014; Pääbo, 2014) suggests that the spindle apparatus may have played a special role during modern human evolution. We show evidence of positive selection in the region around the missense variants in SPAG5, and perhaps in ATRX and KIF18A. Although there is no evidence of selection for the other variants it should be kept in mind that, based on simulations, the method used only detects selective advantages of at least 0.5% with a 65% probability (Peyrégne et al., 2017). It is also unlikely to detect selection starting from standing variation. Therefore, the absence of evidence for selection does not exclude the possibility of selection. For instance, it is interesting that only derived KNL1 haplotypes introduced into modern humans from Neandertals persisted until present-day. This provides suggestive evidence that the missense mutations in KNL1 may also have been important for modern humans. Ultimately, functional characterization of the potential effects of the ancestral and derived variants in spindle genes is necessary to clarify if these changes had effects that could have been advantageous.

Note added in proof

We refer readers to Mora-Bermúdez et al., 2022 who show an effect of the missense changes in KIF18A and KNL1 on the prolongation of metaphase.

Methods

Identifying genes associated with the spindle apparatus

The human Gene Ontology Annotation database was downloaded on June 8th 2021 (goa_human.gaf.gz, from http://current.geneontology.org/products/pages/downloads.html), together with the basic Gene Ontology terms (http://purl.obolibrary.org/obo/go/go-basic.obo). We selected all the Gene Ontology terms that include the keyword ‘spindle’ (65 terms), identified all the human genes that are associated with at least one of these terms (562 genes out of 19,719) and overlapped them with the list of 87 genes that exhibit fixed missense changes in modern humans (Prüfer et al., 2014; Pääbo, 2014).

Testing for enrichment of missense changes in spindle genes

The dbNSFP (version 4.2), a database of all potential missense variants in the human genome, was downloaded on October 4th 2021 from ftp://dbnsfp@dbnsfp.softgenetics.com/dbNSFP4.2a.zip (Liu et al., 2020). After removing the variants that do not have a coordinate in hg19 and do not pass the minimal filters for the Altai Neandertal (Denisova 5) and Denisovan (Denisova 3) genomes (Prüfer et al., 2017), we randomly sampled 96 of the missense variants reported in this database (from the 62,961,368 remaining after filtering) a thousand times and recorded each time how many variants belonged to a spindle gene (as identified in the previous section). If a variant belonged to multiple genes, it was counted only once. The p-value reported in the Results section corresponds to the number of repetitions with eleven or more missense changes in spindle genes. Similarly, to test whether multiple missense changes in a single gene are expected by chance, we counted how often among the thousand repetitions there was at least one gene with two (or three) or more missense changes. Finally, to test whether eight spindle genes with missense changes is expected by chance, we instead counted how often there were eight or more spindle genes with at least one missense change in these resamples.

Processing the human genome datasets and great ape reference genomes

For most analyses, we used phased genotypes from the high-coverage genomes from the Human Genome Diversity Panel (HGDP, Bergström et al., 2020), as well as gVCF files for each individual to identify positions which have genotype calls. After subsetting to regions of interest (Appendix 2—table 1), gVCF files were converted into VCF files, concatenated with the phased genotypes and merged into a single VCF file using bcftools (Danecek et al., 2021). Genotypes that did not pass all the quality filters from Bergström et al., 2020 were then discarded. Finally, positions were lifted over to hg19 using picard tools (default parameters, http://broadinstitute.github.io/picard) and the chain file hg38ToHg19 from the University of California, Santa Cruz, (UCSC) Genome Browser (hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz).

Genotypes from four high-coverage archaic genomes (Denisova 3 (Meyer et al., 2012), Altai Neandertal (Prüfer et al., 2014), Vindija 33.19 (Prüfer et al., 2017) and Chagyrskaya 8 (Mafessoni et al., 2020), as generated in Prüfer et al., 2017 and Mafessoni et al., 2020 with snpAD [Prüfer, 2018]) and nine low-coverage archaic genomes (Goyet Q56-1, Mezmaiskaya 2, Les Cottés Z4-1514, Vindija 87 and Spy 94a from Hajdinjak et al., 2018, Hohlenstein-Stadel and Scladina I-4A from Peyrégne et al., 2019 and Mezmaiskaya 1 Prüfer et al., 2017) were merged into a single VCF file. For the low-coverage archaic genomes, one base with a quality of at least 30 was sampled randomly at each position. Positions outside the mappability track ‘map35_100’ (i.e. positions outside regions where all overlapping 35-kmers are unique in the genome, Prüfer et al., 2017) were filtered out, as well as positions without at least one genotype call from the high-coverage archaic genomes. Note that we also checked the bases carried by all sequences overlapping the substitutions in spindle genes from the published BAM files of these archaic human genomes using samtools tview (Danecek et al., 2021).

For each of these datasets, ancestral allele information was retrieved from great ape reference genome assemblies: chimpanzee (panTro4) (Chimpanzee and Analysis, 2005), bonobo (panPan1.1) (Prüfer et al., 2012), gorilla (gorGor3) (Scally et al., 2012), and orangutan (ponAbe2) (Locke et al., 2011) as LASTZ alignments to the human genome GRCh37/hg19 prepared in-house and by the UCSC Genome Browser (Harris, 2007). We defined the ancestral allele as the one carried by at least three of the four ape reference genome assemblies, allowing a third allele or missing information in only one ape reference genome.

Processing the recombination maps

To compute genetic distances, we used a recombination map obtained from crossovers between African and European ancestry tracts in African-Americans (Hinch et al., 2011, available in hg19 coordinates from http://www.well.ox.ac.uk/~anjali/) and a map based on crossovers in parent-offspring pairs from Iceland (deCODE, Halldorsson et al., 2019). The latter was lifted over to hg19 (originally in hg38 coordinates) with the liftover tool and the chain file hg38ToHg19 from UCSC (using a minimum match rate of 0.9 between bases of both assemblies). This resulted in both gaps and overlaps between windows of the recombination map. Therefore, we assumed that the recombination rate in a gap was the average of the two directly adjacent windows, and we truncated the windows that overlap with a previous window (or removed the window if it overlapped completely with the previous one).

Investigating signals of selection in modern humans

For the analysis of positive selection in modern humans, we used bi-allelic single nucleotide polymorphisms (SNPs) from 504 African individuals from the 1,000 Genomes Project phase III (1000 Genomes Project Consortium et al., 2015), excluding African populations that may have European ancestry, that is, African Caribbeans in Barbados (ACB) and individuals with African Ancestry in Southwest US (ASW). We also compiled a list of sites where Africans differ from the common ancestor with chimpanzee. These are positions that are not polymorphic among the 504 African individuals and where six high-coverage African genomes (Mbuti, San and Yoruban individuals from Prüfer et al., 2014) were identical but differed from four great ape reference genome assemblies (panTro4, panPan1.1, gorGor3, and ponAbe2). We then extracted the genotypes of the Altai Neandertal (Denisova 5) or Denisovan (Denisova 3) genomes at these positions, and only considered those that pass the minimal filters for each genome, respectively (Prüfer et al., 2017). Note that we used the genotypes called with snpAD (Prüfer, 2018) and generated in the same study as the minimal filters (Prüfer et al., 2017).

We tested whether the missense substitutions in the spindle genes overlap regions displaying signatures of ancient selective sweeps using the hidden Markov model described in Peyrégne et al., 2017. We executed that method independently for each chromosome and for both the Altai Neandertal and Denisova 3 genomes, using genetic distances computed from the African-American or deCODE recombination maps (Hinch et al., 2011; Halldorsson et al., 2019). We then identified regions around the missense substitutions of spindle genes where the archaic genome fall outside the human variation (‘external’ regions) with a posterior probability above 0.2. We applied this cutoff on the sum of the posterior probabilities of both ‘external’ states (i.e. generated either from drift or from a selective sweep, Peyrégne et al., 2017). We further intersected the regions called with the Altai Neandertal and Denisova 3 genomes and measured the genetic length of the overlap to determine whether there is evidence for selection.

Reconstructing the chronology of the missense substitutions

To get an upper age estimate of the missense substitutions, we estimated the TMRCA between Africans and archaic humans in the regions around the missense substitutions where archaic humans fall outside the human variation. We computed the number of pairwise differences between each African chromosome and each high-coverage archaic human genome (Prüfer et al., 2017; Meyer et al., 2012; Prüfer et al., 2014; Bergström et al., 2020; Mafessoni et al., 2020) sampling a random allele at heterozygous positions in the archaic genomes. The age of the common ancestor of two individuals conditional on the number D of pairwise differences follows a gamma distribution with shape parameter α=D + 1 and rate parameter β=θN+1 (56), where θ is the population scaled mutation rate (i.e. 4Neµ with µ and Ne denoting the mutation rate and the effective population size, respectively) and N is the number of bases with genotype calls in both individuals. This model assumes that the prior probability of coalescence follows an exponential distribution, with rate equal to 1 when time is scaled in units of the diploid effective population size (2Ne), and that both individuals are sampled in the present. To account for a branch shortening S associated with the age of ancient individuals, we truncated the posterior distribution so that the age of the common ancestor cannot be more recent than S and we added S2 as a correction for this branch shortening. The expected TMRCA is then:

E(T|TS)=1β(α+(βS2)αe βS2Γ(α, βS2))+S2(Equation 1, Appendix 7), with Γα,βS2 denoting the upper incomplete gamma function with lower limit βS2 , which we computed with the gammainc() function from the R library expint v0.1–6. We assumed µ is 1.45 × 10–8 mutations per base pair per generation (generation time of 29 years, Scally and Durbin, 2012) and a branch shortening of 50ky for Vindija 33.19 (Prüfer et al., 2017), 70ky for Denisova 3 (Prüfer et al., 2017), 80ky for Chagyrkaya 8 (Mafessoni et al., 2020), and 120ky for the Altai Neandertal (Denisova 5) (Prüfer et al., 2017). We computed the 95% confidence intervals with the qtgamma() function of the R library TruncatedDistributions v1.0. Note that we also estimated the TMRCA in these genomic regions between Africans and a Papuan (HGDP00548), a Sindhi (HGDP00163) or a Biaka (HGDP00461) because they carry the ancestral versions of the missense substitutions in KATNA1, KIF18A, and SPAG5, respectively, and constrain further the upper age estimates of the missense substitutions.

To get a lower age estimate of the missense substitutions, we computed the number of pairwise differences among African chromosomes carrying the derived versions of the missense substitutions from the HGDP dataset (all African individuals, except HGDP00461 for SPAG5). For each region, we identified the two chromosomes that are the most distantly related (with the highest number of differences) and classified all the other African chromosomes into two groups depending on which of the two former chromosomes they are the most closely related in this genomic region. A chromosome was assigned to a group if it shared at least two derived alleles with the African chromosome used for defining this group but no more than one shared derived allele with the African chromosome defining the other group. This removes potential recombinants that could bias downward the estimate of the TMRCA. For each individual, we then counted the number of pairwise differences with individuals from the other group, restricting this analysis to positions that are genotyped in the high-coverage archaic genomes. Finally, we converted these counts into estimates of the TMRCA using the same equation as above, albeit with no branch shortening (i.e. E(T)=αβ).

Testing the hypothesis of gene flow in the KNL1 gene

To test whether the KNL1 haplotype identified in Neandertals could be shared with modern humans because of incomplete lineage sorting, we computed the probability that a haplotype shared since the common ancestral population is as long as, or longer than, the haplotype identified in KNL1. This approach was previously applied to the EPAS1 haplotype shared between Tibetans and Denisovans (Huerta-Sánchez et al., 2014), but we briefly describe it here for completeness. The expected length L for a shared sequence is 1/(r x T), denoting r and T as the recombination rate and the length of the Neandertal and modern human branches since divergence (in generations of 29 years), respectively. As the ancestors of Neandertals and modern humans split from each other around 550kya (Prüfer et al., 2017), the most recent common ancestor shared between modern humans and Neandertals cannot be younger than this split time, in the absence of gene flow. As the modern-like KNL1 haplotype is present in Neandertals that lived about 40kya (Hajdinjak et al., 2018), we set T=550,000–40,000=510,000 years, which corresponds to 17,586 generations of 29 years. Note that we do not include here the length of the modern human branch to be conservative, as we do not know when the substitutions that define the haplotype reached fixation in modern humans. Relying on local recombination rate estimates from the African-American map (0.148cM/Mb, Hinch et al., 2011) and the deCODE map (0.191 cM/Mb, Halldorsson et al., 2019), the expected length L of a haplotype shared through ILS (i.e. 1/(r x T)) is 29.8kb or 38.5kb depending on the recombination map used. Assuming that the length of a shared sequence follows a Gamma distribution with shape parameter 2 and rate parameter 1 /L, the probability that such a sequence is as long as or longer than 276kb is then 1 − GammaCDF(276000, shape = 2, rate = 1 /L). This probability is ≤0.006 and hence a model without gene flow is rejected.

Estimating the time of gene flow

To estimate the time of gene flow, we used an analogous model to that described above (Albers and McVean, 2020, adapted here to account for the branch shortening associated with the age of ancient individuals). In this model, the age of the common ancestor of two individuals conditional on the length L of their shared haplotype follows a gamma distribution with shape parameter α=3 and rate parameter β=2ρL+1, where ρ is the population scaled recombination rate (i.e. 4Ner with r and Ne denoting the recombination rate and the effective population size, respectively). To account for a branch shortening S, we applied again Equation 1, assuming that S is 40,000 years. In the case of the 276kb haplotype in Neandertals that carries alleles that are fixed in present-day humans, we made the conservative assumption that recombinations could not be observed in modern humans (i.e. the alleles were already fixed at the time of introgression) and, therefore, multiplied the age estimates by 2. We did not apply this correction for the estimate based on the 102kb haplotype shared between some present-day humans and Neandertals, as the length of this haplotype depends on the number of recombination events in both Neandertals and modern humans. Using the average recombination rate over the African-American and deCODE recombination maps (0.169cM/Mb), the expected age is 131kya based on the 276kb haplotype in Neandertals and 138kya based on the 102kb shared haplotype. We computed the 95% confidence intervals (83-265kya for the 276kb haplotype and 49-304kya for the 102kb haplotype) with the qtgamma() function of the R library TruncatedDistributions v1.0.

As another estimate for the time of gene flow, we look at the genetic distance between modern humans and Neandertals. For this purpose, we estimated the TMRCAs between Chagyrskaya 8 (the only high coverage genome with this haplotype) and each present-day human genome from the HGDP dataset in the region chr15:40,885,107–40,963,160 (hg19). We randomly sampled one allele at heterozygous positions where the phase is unknown in these genomes (i.e., every heterozygous position of Chagyrskaya 8). The TMRCA conditional on the number D of pairwise differences follows a gamma distribution with parameters α=D + 1 and β=θN+1 (see dating analysis of the missense substitutions). However, conditional on both the observed divergence and the length of the shared haplotype, the TMRCA follows a gamma distribution with shape parameter α=D + 3 and rate parameter β=θN+2ρL+1. In both cases, we accounted for the branch shortening of Chagyrskaya 8 as described above (Equation 1, S=80,000 years) and assumed 1.45 × 10–8 mutations per base pair per generation (generation of 29 years). The 95% confidence intervals were computed as described above.

As these analyses might be sensitive to the value of µ, we tested whether µ in the region of KNL1 may differ from the genome-wide average by computing local estimates of the mutation rate in family trios from Iceland (Jónsson et al., 2017). We counted the number of de novo mutations in the genomes of the probands and divided this number by twice the length of the region and the number of trios in this dataset (1,548) to get a mutation rate estimate. This estimate was similar to the genome-wide average (Appendix 8).

Testing whether the modern-like KNL1 haplotype was under selection in Neandertals

To test whether the increase of the modern-like KNL1 haplotype frequency in Neandertals is unexpected, we quantified how often positions in the genome exhibit similar frequency changes. We identified positions where early Neandertals (the Hohlenstein-Stadel, Scladina I-4A and Denisova 5 Neandertals) carry the ape-like allele, whereas at least one individual carries the derived allele both among those that lived 60-80kya (Chagyrskaya 8, Mezmaiskaya 1, Denisova 11) and among those that lived 40-50kya (Vindija 33.19, Goyet Q56-1, Mezmaiskaya 2, Les Cottés Z4-1514 and Spy 94a). As noted above, only one random allele was considered for the low coverage genomes, in contrast to two for the high-coverage genomes. Positions with more than one missing allele among early Neandertals (out of four alleles) or three missing alleles (out of six) among the late Neandertals were filtered out. We further removed transitions and positions less than 50kb away from the previously ascertained position. We then computed the proportion of those positions where at least three of the later individuals (i.e. those that lived 40-50kya) carried the derived allele.

Detecting evidence of Neandertal gene flow into non-Africans

Twelve non-African genomes from the HGDP dataset share a 96 to 139ky-old common ancestor with Chagyrskaya 8 in the KNL1 region chr15:40,818,035–41,094,166 in hg19 coordinates (inferred with Equation 1 as described in Estimating the time of gene flow). Gene flow is necessary to explain this recent common ancestor, but determining the direction of this gene flow is complicated because Chagyrskaya 8 inherited a modern human-like haplotype in this region. To test whether these 12 non-Africans inherited this copy from Neandertals, we looked for evidence of putative archaic alleles linked to the haplotype in 40 kb windows adjacent to the segment that is modern human-like in Chagyrskaya 8 (i.e. chr15:40,778,035–40,818,035 and chr15:41,094,166–41,134,166, in hg19 coordinates). We defined these putative archaic variants as those shared with at least one Neandertal genome but absent from Africans. We used vcftools (Danecek et al., 2011) to compute the frequency of derived alleles in non-Africans and Africans, revealing that for seven positions in the downstream region (chr15:41,094,166–41,134,166, hg19) the twelve individuals are the only modern humans (out of 929) with a derived allele seen in Neandertal genomes (Figure 4—figure supplement 1).

To check whether these is evidence of rare ancestral alleles at positions with missense changes in spindle-associated genes, we used the browser of the gnomAD database (v2.1.1, Karczewski et al., 2020), which contains 125,748 exome sequences and 15,708 whole-genome sequences from unrelated individuals.

Appendix 1

Appendix 1—table 1
Location and predicted effects of the studied amino acid changes in spindle proteins, as reported in dbNSFP version 4.2 (48).

The predictions are for the ancestral variants. We put “damaging” in between quotation marks as the ancestral versions of ATRX and KATNA1 are unlikely to be damaging (as the ancestral amino acid residues are found in the proteins of many species), but that prediction rather supports a function for these amino acid changes.

proteinpositionamino acid changeprotein domain (Uniprot)effect prediction for the ancestral variant (MutPred)potentially “damaging” ancestral variant according to:
ATRX475D ->H--FATHMM;
M-CAP
KATNA1343A ->T--FATHMM;
PrimateAI
KIF18A67R ->Kkinesin motor (11-355)Loss of methylation (P=0.0087)-
KNL1159H ->Rinteraction domain with BUB1 and BUB1B (1-728)--
1,086G ->S2 × 104 AA approximate repeats
(855–1201)
Loss of phosphorylation (P=0.0382)-
NEK6291D ->Hprotein kinase domain
(45-310)
--
RSPH1213K ->Q---
SPAG543P ->S-Loss of phosphorylation (P=0.0244)
Gain of catalytic residue (P=0.0179)
-
162E ->G---
410D ->H---
STARD93,925A ->T---
Appendix 1—table 2
Deleteriousness and conservation scores at the studied positions with missense changes in spindle genes, as reported in dbNSFP version 4.2 (48).

A high CADD score indicates that the ancestral variant is likely to be deleterious (Kircher et al., 2014; Rentzsch et al., 2019; Pollard et al., 2010) and a high conservation score means that the nucleotide position is highly conserved across species (100 vertebrates for phyloP and phastCons (Siepel et al., 2005), and 34 mammals for GERP ++RS, Davydov et al., 2010). In contrast to the other scores that correspond to a single position, phastCons is a measure of the conservation in the region around the position. In dbNSFP, the scores range from –6.458163–18.301497 for CADD, from –12.3–6.17 for GERP ++RS, from –20.0–10.003 for phyloP and from 0 to 1 for phastCons.

geneposition (hg19) rs IDcorresponding amino acid changeDeleteriousnessConservation
CADD score (hg19)GERP
++RS score
phyloP 100way vertebrate scorephastCons 100way vertebrate score
KATNA16–149,918,766
rs73781249
A343T2.0510335.484.8341.000
SPAG517–26,925,570
NA
P43S–0.425670–3.57–1.4040.000
17–26,919,777
NA
E162G0.2963173.660.2800.001
17–26,919,034
NA
D410H1.0627435.42.0321.000
KNL115–40,912,860
rs755472529
H159R0.4754543.7–0.0160.001
15–40,915,640
NA
G1086S0.8017874.121.0260.054
KIF18A11–28,119,295
rs775297730
R67K1.1345892.620.5250.845
NEK69–127,113,155
rs146443565
D291H–0.293112–1.563.2841.000
ATRXX-76,939,325
rs146863015
D475H–0.1416063.640.7910.840
RSPH121–43,897,491
rs146298259
K213Q–0.0610531.810.6720.079
STARD915–42,985,549
rs573215252
A3925T–0.351117–2.510.0470.000

Appendix 2

Appendix 2—table 1
Age estimates of the missense substitutions in spindle genes.

The ages were estimated in the regions where the Altai Neandertal and Denisova 3 genomes fall outside the human variation (intersection of the regions identified with the African-American and deCODE recombination maps). The lower age corresponds to the mean age of the ancestor of multiple present-day African chromosome pairs. The upper age corresponds to the mean age of the common ancestor shared between each present-day African chromosome and either the archaic genome with the least number of differences (excluding Chagyrskaya 8 for KNL1) or a present-day human with an ancestral version of the missense variant(s).

GenechromosomeRegion (hg19)Lower age (kya)Upper age (kya)
ATRXX76,703,773–77,246,471NANA
KATNA16149,840,973–149,930,4258631,329
KIF18A1128,018,167–28,304,2938431,006
KNL11540,898,141–40,948,3061,0271,690
NEK69127,109,510–127,113,614NANA
RSPH12143,897,417–43,897,549NANA
SPAG51726,875,942–27,045,524677796
STARD91542,941,540–42,989,1609471,401

Appendix 3

Appendix 3—table 1
Coverage depth in archaic human genomes at positions with modern human-specific missense substitutions in spindle genes.

The numbers of DNA fragments carrying a particular base are reported in parentheses after the corresponding base. Bases in uppercase were sequenced in the forward orientation, whereas those in lowercase were sequenced in the reverse orientation. Bases that are modern human-like (derived) are highlighted with an asterisk and may represent present-day human DNA contamination or an allele shared with modern humans. The bases that are compatible with a damage-induced substitution (from the ancestral allele) are highlighted in bold (i.e., C-to-T and G-to-A in the forward and reverse orientation, respectively).

GeneATRXKATNA1KIF18AKNL1NEK6RSPH1SPAG5STARD9
Chr-PositionX-76,939,3256–149,918,76611–28,119,29515–40,912,86015–40,915,6409–127,113,15521–43,897,49117–26,919,03417–26,919,77717–26,925,57815–42,985,549
AncestralCCCAGGTCTGG
DerivedGTTGACGGCAA
Altai NeandertalC (21)
c (31)
C (22)
c (15)
T* (2)
C (19)
c (41)
T* (1)
A (24)
a (28)
G (33)
g (16)
G (23)
g (23)
T (18)
t (26)
C (17)
c (17)
T (18)
t (20)
a (1)
G (28)
g (17)
G (13)
g (16)
A* (1)
Chagyrskaya 8C (10)
c (9)
T (1)
t (1)
C (13)
c (5)
C (15)
c (19)
T* (3)
A (1)
G* (14)
g* (11)
A* (19)
a* (8)
G (5)
g (7)
a (1)
T (17)
t (13)
C (11)
c (7)
T (1)
T (16)
t (11)
G (7)
g (6)
a* (1)
G (8)
g (13)
Denisova 3C (19)
c (17)
C (15)
c (13)
T* (2)
C (17)
c (24)
A (20)
a (30)
G (25)
g (17)
a* (1)
G (15)
g (14)
T (19)
t (27)
C (20)
c (14)
T (16)
t (11)
G (12)
g (6)
G (17)
g (12)
Denisova 11C (1)
c (2)
NAC (1)a (2)g (1)G (1)
g (1)
T (4)
t (2)
c (2)
t (1)
NANAG (1)
Goyet Q56-1C (1)NAC (1)
c (5)
G* (1)A* (1)
a* (2)
G (1)
g (2)
NAC (1)T (1)G (4)
g (2)
G (2)
Hohlenstein-StadelNANANANANANANANANANANA
Les Cottés Z4-1514C (1)
c (1)
c (2)
T* (1)
C (5)
c (4)
A (2)
a (4)
G (3)
g (2)
NAT (2)
t (1)
NAT (1)G (1)G (1)
Mezmaiskaya 1c (3)NAC (1)
c (1)
NAG (2)NAt (2)C (2)
c (1)
g* (2)
NANANA
Mezmaiskaya 2C (1)C (1)
c (1)
T* (1)
C (1)g* (1)A* (1)G (2)
g (1)
NAC (1)T (1)G (2)g (3)
Scladina I-4ANANANANANANANANANANANA
Spy 94 ANAc (1)C (1)g* (1)A* (1)
a* (3)
NAT (1)C (1)T (1)g (1)NA
Vindija 33.19C (16)
c (18)
T (1)
C (8)
c (11)
C (17)
c (20)
T* (1)
A (13)
a (19)
G (20)
g (16)
a* (2)
G (8)
g (8)
T (22)
t (15)
C (12)
c (14)
T (15)
t (7)
G (15)
g (13)
a* (1)
G (14)
g (14)
Appendix 3—table 2
Coverage depth of the Mezmaiskaya 1 genome at positions with modern human-specific substitutions in SPAG5.

Only positions covered by at least one DNA sequence are reported. Bases in uppercase were sequenced in the forward orientation, whereas those in lowercase were sequenced in the reverse orientation. The numbers of DNA fragments carrying a particular base are reported in parentheses after the corresponding base.

NeandertalChr-position (rs ID)AncestralDerivedAllele counts
Mezmaiskaya 117–26,864,608
(rs188710272)
AGA (1)
17–26,891,162
(NA)
TGT (1)
17–26,892,376
(NA)
ATA (2)
17–26,913,024
(NA)
AGa (a)
17–26,919,034
(NA)
CGC (2)
c (1)
g (2)
17–26,948,236
(NA)
GAg (1)
17–26,967,723
(rs558276956)
AGA (3)
17–27,005,275
(NA)
GAG (1)
17–27,010,483
(NA)
GAg (1)

Appendix 4

Appendix 4—table 1
Positions defining the closely related haplotype between some modern humans and Neandertals.

At these positions, the Chagyrskaya 8 genome differs from other high-quality archaic genomes without the modern human-like haplotype but some African genomes from the HGDP dataset carry the same allele as Chagyrskaya 8. Note that the modern human-like haplotype in Neandertals is longer and defined by alleles that are shared with all modern humans (Figure 3).

ChromosomePosition (hg19) rs IDReferenceAlternative (Chagyrskaya 8-like)Chagyrskaya 8-like allele frequency in genomes from the HGDP dataset
AfricansNon-Africans
1540,885,107
rs16970851
AG0.320.41
40,886,017
rs8034043
CT0.320.41
40,886,020
rs8034048
CG0.320.40
40,892,601
rs11855923
GA0.350.40
40,893,573
rs12905162
CA0.380.40
40,905,450
rs11856438
CT0.370.41
40,908,904
rs11852670
AG0.390.41
40,910,707
rs12914743
TC0.380.41
40,915,045
rs8041534
TG0.380.41
40,915,894
rs11070285
TC0.390.41
40,925,214
rs11856802
TA0.390.41
40,926,654
rs11854986
CG0.350.40
40,929,814
rs11070286
TC0.370.41
40,937,647
rs3092979
AG0.380.41
40,959,413
rs73396515
GA0.360.10
40,959,624
rs35047458
GA0.360.40
40,960,432
rs12902568
GA0.360.40
40,963,160
rs7182530
AG0.370.41
40,987,528
rs1801320
GC0.380.11

Appendix 5

Appendix 5—table 1
Origin of the modern human genomes from the HGDP dataset (Bergström et al., 2020) with a KNL1 copy inherited from Neandertals.
samplepopulationregion
HGDP00125HazaraCentral South Asia
HGDP00547Papuan SepikOceania
HGDP00639BedouinMiddle East
HGDP00696PalestinianMiddle East
HGDP00714CambodianEast Asia
HGDP00774HanEast Asia
HGDP00822HanEast Asia
HGDP00954YakutEast Asia
HGDP00960YakutEast Asia
HGDP00966YakutEast Asia
HGDP01023HanEast Asia
HGDP01181YiEast Asia

Appendix 6

Appendix 6—table 1
Allele counts at positions with nearly fixed missense variants in the spindle genes of modern humans from the gnomAD database (v2.1.1), (Karczewski et al., 2020).

Columns 7–8 and 9–10 correspond to the allele counts among the 125,748 whole-exome sequences (WES) and the 15,708 whole-genome sequences (WGS), respectively. Anc = Ancestral

GeneChr-Position (rd ID)Anc(nearly) fixedAllelesVEP Annot.# Anc (WES)Total (WES)# Anc (WGS)Total (WGS)
ATRXX-76,939,325
(rs146863015)
CGG-Cmissense66182,7451122,042
KATNA16–149,918,766
(rs73781249)
CTT-Cmissense259251,19013131,400
KIF18A11–28,119,295
(rs775297730)
CTT-Cmissense26249,508131,396
KNL115–40,912,860
(rs755472529)
AGG-AmissenseNANA131,368
G-Tmissense1227,420NANA
15–40,915,640 (NA)GAA-GmissenseNANANANA
NEK69–127,113,155
(rs146443565)
GCC-Gmissense164250,1402631,404
RSPH121–43,897,491
(rs146298259)
TGG-Tmissense236251,4143031,386
G-Astop gained10251,414131,386
SPAG517–26,919,034
(NA)
CGG-CmissenseNANANANA
17–26,919,777 (NA)TCC-Amissense3251,430NANA
17–26,925,578 (NA)GAA-GmissenseNANANANA
A-Tmissense1251,066NANA
STARD915–42,985,549
(rs573215252)
GAA-Gmissense5139,342331,284
A-Cmissense5139,342NANA

Appendix 7

Derivation of Equation 1

To compute the Time to the Most Recent Common Ancestor (TMRCA) for pairs of modern and archaic humans, we used a published model (Albers and McVean, 2020) that we adapted to account for the branch shortening associated with the age of the archaic individual. We simply truncated the posterior distribution of the TMRCA obtained with this model so that the TMRCA cannot be more recent than the age of the archaic individual, and added a correction to account for missing mutations on the archaic branch. Here, we describe how we derived Equation 1 to compute the expected TMRCA with these modifications from the original model.

The expectation of the truncated distribution is:

E(X|XT)= E(X  XT)P(XT), with X denoting the time to coalescence and T the truncated time, =Tx Gamma(x|α,β) dxTGamma(x|α,β) dx , with Gamma(x|α,β) denoting the probability density function of the gamma distribution (with parameters α and β defined as in Albers and McVean, 2020),

=TxαeβxdxTxα1eβxdx=1βT(βx)αeβxdxT(βx)α1eβxdx=1ββTyαeydyβTyα1eydy

=1βΓ(α+1,βT)Γ(α,βT), with Γ(α,βT) denoting the upper incomplete gamma function.

By integration by parts, one can show that Γα+1,βT=αΓα,βT+(βT)αe-βT to obtain:

E(X|XT)=1β(α+(βT)αe βTΓ(α, βT))

As this model assumes that the two lineages diverge for the same amount of time (i.e., for a given coalescent time T the total branch length is 2T), we therefore set T=S2 , with S denoting the age of the ancient specimen, and added S2 as a correction for the branch shortening (i.e., adds S to the total branch length). This leads to Equation 1:

E(TMRCA|TMRCAS)=1β(α+(βS2)αe βS2Γ(α, βS2))+S2

Appendix 8

Local mutation rate in the region of KNL1

For estimating the age of common ancestors shared between modern humans and Neandertals in the region of KNL1, we assumed the genome-wide average of 1.45 × 10–8 mutations per base pair per generation (i.e., the mutation rate used to estimate the population split times between the ancestors of modern humans and Neandertals Prüfer et al., 2017; Prüfer et al., 2014; estimated from Scally and Durbin, 2012). As the age estimates may be sensitive to the mutation rate used, we also estimated the local mutation rate among family trios from Iceland (Jónsson et al., 2017). We counted the number of de novo mutations in the genomes of the probands and divided this number by twice the length of the region and the number of trios in this dataset (1,548) and arrive at a mutation rate estimate of 0.94 [95% Binomial CI: 0.4–1.8] x 10–8 mutations per base pair per generation in the KNL1 region where modern humans and Chagyrskaya 8 share a haplotype. However, there were only 8 de novo mutations among the Icelandic genomes in this region. If we extend that region by 500 kb on both sides to increase the accuracy of the estimate, the mutation rate is 1.1 [95% Binomial CI: 0.9–1.6] x 10–8 mutations per base pair per generation (45 de novo mutations in this extended region), which is similar to the genome-wide average mutation rate of 1.29 × 10–8 estimated in the original study of the trios (Jónsson et al., 2017). Therefore, there is no evidence that the mutation rate is different from the genome-wide average in this particular region of the genome.

Data availability

The current manuscript is a computational study, so no data have been generated for this manuscript. Software used are freely and widely available from previous publications and original statistical analyses are described in the main text and the Appendices.

The following previously published data sets were used
    1. Gene Ontology Consortium
    (2021) The Gene Ontology resource
    ID goa-human, go-basic. The human Gene Ontology Annotation and the basic Gene Ontology terms.

References

  1. Book
    1. Harris RS
    (2007)
    Improved Pairwise Alignment of Genomic DNA
    United States: The Pennsylvania State University.

Decision letter

  1. Emilia Huerta-Sanchez
    Reviewing Editor; Brown University, United States
  2. George H Perry
    Senior Editor; Pennsylvania State University, United States
  3. Christian Huber
    Reviewer; Pennsylvania State University, United States

Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "The evolutionary history of human spindle genes includes back-and-forth gene flow with Neandertals" for consideration by eLife. Your article has been reviewed by 2 peer reviewers (also read by the reviewing editor), and the evaluation has been overseen by a Reviewing Editor and George Perry as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Christian Huber (Reviewer #2).

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

All reviewers were enthusiastic about the manuscript, and with some revision it will clearly be suitable for publication in eLife. There were a number of clarifications that the reviewers felt could strengthen the paper. These included:

1) Comparisons with multiple versions of the variant calling of the Altai Neanderthal genome, and how using different versions could affect the results.

2) The computations of the p-values associated with multiple missense substitutions (see Reviewer 2's comments).

3) What is happening in spindle genes in the human lineage with respect to positive selection and purifying selection? Why did Neanderthals not inherit mutations that were present in the ancestral population of Neanderthals and modern humans (based on their age estimates)? A more explicit discussion of the role of natural selection in explaining the various patterns that are being observed will be helpful.

4) A complex evolutionary history is being proposed for the KNL1 gene. There were parts of the proposed model that require more justification/clarification:

a. Are the estimates of the variants in Neanderthals consistent with estimates from Hubisz et al., 2020?

b. The mutations have estimated ages that are very old, could the authors explain why introgression into Africans from an unrecognized archaic population in Africa or from an ancestral Neanderthal population in Africa is not a more parsimonious explanation than gene flow from humans into Neanderthals?

c. The evidence that the modern-like KNL1 haplotype was introgressed into non-Africans is not that convincing. There is no methods section for the section “Reintroduction of the modern-like KNL1 haplotype into non-Africans.” Who are these 12 individuals who have 7-13 differences? Is the haplotype present outside of Africa very different than the Neanderthal-like haplotype in Africans? Since this haplotype is also present in Africa (as it was called human specific), how do the authors rule out that what is present in those 12 individuals is not just shared variation between non-Africans and Africans? Has this region been previously identified as introgressed in available introgression maps?

d. Why is the ancestral version of KNL1 not found in present-day humans? Wouldn’t this haplotype have been at large frequency in Neanderthals?

Reviewers had a number of other additional suggestions that we felt would improve the manuscript. Please do review and respond to the individual reviewer comments as well.

Reviewer #1 (Recommendations for the authors):

There are multiple versions of the variant calling of archaic genomes (e.g., Altai 2013/2014 and Altai 2016 are both being used, but these two versions differ by ~1.5 fold in terms of the numbers of variants). Could the authors comment on how variant calling affects the number of recent human missense substitutions identified in all genes and how it affects their results? I imagine variant calling could significantly impact candidate genes studies like this one. Could the author address and discuss this issue and suggest a best practice for other such candidate genes studies.

Page 3. A p-value for the multiple missense substitutions is provided on page 3, lines 75-76. Based on their Methods, this seems to be computed by random sampling from observed polymorphism. However, this process does not rely on an evolutionary model, and in this test, “substitution event = fixation event” (as they are just random draws) and they behave independently. Yet, on page 4 (line 114), the authors propose using a single fixation event resulting in multiple substitutions to explain the substitutions. In this case, after the first mutation occurred, the second mutation happened in the background with the first mutation, then the two mutations was fixed through one fixation event. Is this much more likely than two independent fixation events? I’d like to see some discussion about it before two independent fixation hypothesis is dismissed. If one fixation is more likely, p-values computed based on the number of “two or more independent draws” does not seem to be valid. This distinction could be quite relevant as the entire survey of the spindle genes is based on spindle genes being statistical outliers.

Could the authors comment more on how they get the segment length and how confident/precise are the segments’ lengths? I only see point estimates for the segment lengths. Is it because a hard threshold is used?

I am a bit surprised that the synonymous substitutions in the spindle genes are not investigated in this study, as they also seem relevant here. Is there any particular technical reasons not working with them? Could the authors provide more information about the current spindle gene polymorphisms within human lineage? Suppose these spindle genes were hotspots in modern human adaptation, we might expect signals of positive selection followed by purifying selection in the human lineage (e.g., in AFR population), realized as πN/πS decrease with time? It seems possible to do this using inferred allele age. This could further justify the study on spindle genes.

The fact that three out of five late Neanderthals carry human-like KNL1 points to one of these possibilities: adaptive introgression, drift, incomplete lineage sorting, and high admixture fraction (with humans). Based on the fact that at least three late Neanderthals also carry 2,773 out of 7,881 possibly introgressed human sites, the author can only dismiss the adaptive introgression hypothesis. How about the others? In particular, does the high frequency contradict the ~3% introgression fraction estimated by Hubisz et al., 2020? If not, how would the authors explain the high observed frequency across all possibly human introgressed sites? Could this suggest contamination or other technical cause?

This study frequently used the length of the segment in testing/estimating gene flow, neutrality, and time. The lengths of the inferred introgressed segments depend on the time of gene flow and the admixture proportion. The observed human to archaic introgression fraction seems much higher than a small percentage and hence may not be negligible when predicting the segment length distribution. Could the authors explain more about whether simplifications/assumptions are involved in their inference? And how these may affect their results.

https://doi.org/10.7554/eLife.75464.sa1

Author response

Essential revisions:

All reviewers were enthusiastic about the manuscript, and with some revision it will clearly be suitable for publication in eLife. There were a number of clarifications that the reviewers felt could strengthen the paper. These included:

1) Comparisons with multiple versions of the variant calling of the Altai Neanderthal genome, and how using different versions could affect the results.

We have confirmed that the number of fixed missense variants in the spindle genes does not differ depending on whether the Altai genome is genotyped with GATK or snpAD.

We have used the most up-to-date and accurate genotypes available for all analyses. These are the genotypes generated by snpAD, which models the C-to-T substitutions originating from cytosine deamination correctly and therefore reduces the number of genotyping errors at deaminated sites.

2) The computations of the p-values associated with multiple missense substitutions (see Reviewer 2's comments).

As we describe in the response to Reviewer 2 below, what we test here is if the number of inferred amino acids substitutions in proteins involved in spindle function is unexpected given the overall number of amino acids substitutions in modern humans. This is the case. To us it suggests that spindle function may have changed in modern humans. This is what attracted our interest and motivated the analyses that are presented in the paper.

As the reviewer points out, it is possible, or even likely, that the fixations in KNL1 and SPAG5 may go back to single haplotypes that became fixed. We now clarify (lines 75-79; Methods section lines 387-389) that it is the number of amino acid substitutions in this group of functionally related proteins that attract our interest, not the number of genes affected, which is not significantly different from what might be expected by chance (p=0.28). This is illustrated by the fact that only five genes across the genome carry two or more fixed missense substitutions in modern humans, two of them being SPAG5 and KNL1.

3) What is happening in spindle genes in the human lineage with respect to positive selection and purifying selection? Why did Neanderthals not inherit mutations that were present in the ancestral population of Neanderthals and modern humans (based on their age estimates)? A more explicit discussion of the role of natural selection in explaining the various patterns that are being observed will be helpful.

Without direct evidence for selection we feel that any discussion about selection on spindle genes in general would be highly speculative. It is possible and compatible with the data that some new spindle function(s) evolved on the modern human lineage and was positively selected. Perhaps the epistatic interactions among two or more genes was beneficial. Eventually, functional analyses might be able to address these hypotheses (line 354-356).

As for the absence of the derived alleles in archaic humans, even more speculations are possible. If the alleles were at low frequency in the ancestral population, archaic humans may not have inherited them by chance. The ancestral population may have been structured so that the alleles spread mostly in the subpopulation that contributed to modern humans. Alternatively, archaic humans may have inherited the alleles but they could have been lost due to drift, particularly if there were some population bottlenecks or founder effects when archaic humans split from modern humans. Or they may not have been positively selected in the archaic groups due to some other genetic or environmental factor. We do not have any evidence or arguments to prefer any one of these scenarios and would prefer to abstain from what we feel would be excessive speculation.

4) A complex evolutionary history is being proposed for the KNL1 gene. There were parts of the proposed model that require more justification/clarification:

a. Are the estimates of the variants in Neanderthals consistent with estimates from Hubisz et al., 2020?

We do not bring any new data regarding the proportion of admixture from modern humans to Neandertals. Therefore, there is no inconsistency with the results from Hubisz et al., 2020.

b. The mutations have estimated ages that are very old, could the authors explain why introgression into Africans from an unrecognized archaic population in Africa or from an ancestral Neanderthal population in Africa is not a more parsimonious explanation than gene flow from humans into Neanderthals?

There are two scenarios raised here:

1: The KNL1 haplotype would come from an unrecognized archaic group. The age estimate of the common ancestor between the KNL1 haplotypes of some Neandertals and modern humans is ~200,000 years ago. This makes a scenario where an unknown archaic group would have had to contribute this haplotype to modern humans after 200,000 years ago and it would have spread to become fixed in all modern humans. This group would also have had to be present in Eurasia and contribute this haplotype to Neandertals. This scenario is also made complicated by the fact that the diversity among KNL1 haplotypes in humans suggest that the age of their MRCA is about 1,027,000 years ago.

2: Could the gene flow have been from early Neandertals to modern humans rather than vice versa. Whereas the KNL1 substitutions are fixed in modern humans and sit in a region were the MRCA is more than a million years old, the substitutions are present in just some Neandertals and, at least in the Chagyrskaya Neandertal (homozygous derived at those positions), the diversity in the region is small as both copies of the haplotype show few differences. Furthermore, the KNL1 haplotype with the derived amino acid substitutions is unique across the Neandertal genome in terms of being unusually diverged from the haplotypes without the derived amino acid substitutions (Figure 4A). This is most compatible with gene flow from early modern humans into Neandertals.

c. The evidence that the modern-like KNL1 haplotype was introgressed into non-Africans is not that convincing. There is no methods section for the section “Reintroduction of the modern-like KNL1 haplotype into non-Africans.” Who are these 12 individuals who have 7-13 differences? Is the haplotype present outside of Africa very different than the Neanderthal-like haplotype in Africans? Since this haplotype is also present in Africa (as it was called human specific), how do the authors rule out that what is present in those 12 individuals is not just shared variation between non-Africans and Africans? Has this region been previously identified as introgressed in available introgression maps?

The identifiers of the 12 individuals are provided in Supplement File 2, and we have now added Appendix 5 –Table 1 describing their populations of origin. We have also added a Methods section that describes this analysis (lines 563-579).

There is no evidence for the presence of such haplotype in Africans, as shown in Figure 4C (empty dotted square), and we describe the difference with other haplotypes on lines 289-292. The haplotype in these 12 individuals exhibits 7 to 13 differences to the Chagyrskaya genome. For comparison, other modern human genomes (Africans and non-Africans) exhibit 54 to 179 differences to the Chagyrskaya genome.

As suggested by the reviewer, we checked available introgression maps and the haplotype does not overlap any previously described Neandertal segment in the genomes of present-day people. This is not unexpected given that the haplotype is modern human-like and only seen in the Chagyrskaya Neandertal genome that became available after those maps were generated.

d. Why is the ancestral version of KNL1 not found in present-day humans? Wouldn’t this haplotype have been at large frequency in Neanderthals?

Although this is an interesting observation, the answer to the question “why?” will ultimately depend on functional studies of these changes. At the moment we simply speculate that the ancestral alleles in KNL1 were perhaps deleterious in modern humans (lines 310-312) and so were lost after contact with Neandertals.

Reviewers had a number of other additional suggestions that we felt would improve the manuscript. Please do review and respond to the individual reviewer comments as well.

Reviewer #1 (Recommendations for the authors):

There are multiple versions of the variant calling of archaic genomes (e.g., Altai 2013/2014 and Altai 2016 are both being used, but these two versions differ by ~1.5 fold in terms of the numbers of variants). Could the authors comment on how variant calling affects the number of recent human missense substitutions identified in all genes and how it affects their results? I imagine variant calling could significantly impact candidate genes studies like this one. Could the author address and discuss this issue and suggest a best practice for other such candidate genes studies.

We have used the most up-to-date and accurate genotypes available for all analyses. These genotypes were called with snpAD, which is now the standard genotype caller for ancient DNA. Older genotype calls generated with GATK, which does not model ancient DNA damage, i.e. C-to-T substitutions originating from cytosine deamination, include more errors as described by Prüfer et al., 2017 and Prüfer et al., 2018.

Page 3. A p-value for the multiple missense substitutions is provided on page 3, lines 75-76. Based on their Methods, this seems to be computed by random sampling from observed polymorphism. However, this process does not rely on an evolutionary model, and in this test, “substitution event = fixation event” (as they are just random draws) and they behave independently. Yet, on page 4 (line 114), the authors propose using a single fixation event resulting in multiple substitutions to explain the substitutions. In this case, after the first mutation occurred, the second mutation happened in the background with the first mutation, then the two mutations was fixed through one fixation event. Is this much more likely than two independent fixation events? I’d like to see some discussion about it before two independent fixation hypothesis is dismissed. If one fixation is more likely, p-values computed based on the number of “two or more independent draws” does not seem to be valid. This distinction could be quite relevant as the entire survey of the spindle genes is based on spindle genes being statistical outliers.

What we test here is if the number of inferred amino acids substitutions in proteins involved in spindle function is unexpected given the overall number of amino acids substitutions seen in modern humans. That this is the case suggests to us that spindle function may have changed in modern humans. This is what attracts our interest and motivates the analyses that are then presented in the paper.

As the reviewer points out, it is possible, or even likely, that the fixations in KNL1 and SPAG5 may go back to single haplotypes that became fixed. We now clarify (lines 77-79) that it is number of amino acid substitution in this group of functionally related proteins that attracts our interest, not the number of genes affected, which is not significantly different from what might be expected by chance (p=0.28). This is illustrated by the fact that only five genes across the genome carry two or more fixed missense substitutions in modern humans, two of them being SPAG5 and KNL1.

Could the authors comment more on how they get the segment length and how confident/precise are the segments’ lengths? I only see point estimates for the segment lengths. Is it because a hard threshold is used?

The segments’ length was determined by a hard cutoff on the posterior decoding probability. We assumed that the segment ended when that probability dropped below 0.2 (lines 445-447). As seen in Figure 1, the probability drops rather steeply so that different cutoffs do not substantially change the length of these regions.

I am a bit surprised that the synonymous substitutions in the spindle genes are not investigated in this study, as they also seem relevant here. Is there any particular technical reasons not working with them?

We studied non-synonymous changes because they are more likely to affect the function of the protein.

Could the authors provide more information about the current spindle gene polymorphisms within human lineage? Suppose these spindle genes were hotspots in modern human adaptation, we might expect signals of positive selection followed by purifying selection in the human lineage (e.g., in AFR population), realized as πN/πS decrease with time? It seems possible to do this using inferred allele age. This could further justify the study on spindle genes.

Our focus in this paper is to investigate the evolutionary history of the 11 amino acid changes in the spindle-associated proteins since the split with archaic humans. There is evidence for positive selection around the missense changes in SPAG5, KIF18A and ATRX (lines 106-110 and 129-133). There is also some hint that there could have been negative selection against the ancestral KNL1 haplotype in modern humans (lines 310-312), but there is no compelling evidence for positive selection for the modern human haplotype in Neandertals.

The fact that three out of five late Neanderthals carry human-like KNL1 points to one of these possibilities: adaptive introgression, drift, incomplete lineage sorting, and high admixture fraction (with humans). Based on the fact that at least three late Neanderthals also carry 2,773 out of 7,881 possibly introgressed human sites, the author can only dismiss the adaptive introgression hypothesis.

We think that there is a misunderstanding here, the 7,881 sites which we used to assess the change in frequency of derived alleles over time in Neandertals do not carry introgressed modern human alleles. They are a random set of sites that happen to be segregating in our sample of Neandertal genomes (some of which have very low coverage, which explains the low number of sites investigated).

How about the others? In particular, does the high frequency contradict the ~3% introgression fraction estimated by Hubisz et al., 2020? If not, how would the authors explain the high observed frequency across all possibly human introgressed sites? Could this suggest contamination or other technical cause?

We believe there is no contradiction with that previous study as mentioned in a response above, since the 2,773 SNPs for which we assess frequency changes in Neandertals are not introgressed from modern humans. We have clarified this in the text (lines 275-276)

This study frequently used the length of the segment in testing/estimating gene flow, neutrality, and time. The lengths of the inferred introgressed segments depend on the time of gene flow and the admixture proportion. The observed human to archaic introgression fraction seems much higher than a small percentage and hence may not be negligible when predicting the segment length distribution. Could the authors explain more about whether simplifications/assumptions are involved in their inference? And how these may affect their results.

We do not estimate the archaic introgression fraction as indicated in the two answers above. Hubisz et al., (2020) estimate 3% and this is low enough that recombination between different introgressed haplotypes is unlikely to be common. If the frequency of the KNL1 haplotype were high, recombination among copies of this haplotype may bias the estimate based on the haplotype lengths, but note that, in addition to the haplotype lengths, the conclusion of gene flow and its dating rely on the divergence between Neandertal and modern human haplotypes (i.e. lines 257-260).

https://doi.org/10.7554/eLife.75464.sa2

Article and author information

Author details

  1. Stéphane Peyrégne

    Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
    Contribution
    Conceptualization, Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review and editing
    For correspondence
    stephane_peyregne@eva.mpg.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-9823-9102
  2. Janet Kelso

    Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
    Contribution
    Conceptualization, Resources, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3618-322X
  3. Benjamin M Peter

    Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
    Contribution
    Conceptualization, Resources, Supervision, Writing – review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-2526-8081
  4. Svante Pääbo

    Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
    Contribution
    Conceptualization, Funding acquisition, Methodology, Project administration, Resources, Supervision, Writing – review and editing
    For correspondence
    paabo@eva.mpg.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-4670-6311

Funding

Max Planck Society

  • Stéphane Peyrégne
  • Janet Kelso
  • Svante Pääbo
  • Benjamin M Peter

European Research Council (694707)

  • Svante Pääbo

NOMIS Foundation

  • Svante Pääbo

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Felipe Mora-Bermúdez, Wieland Huttner, Divyaratan Popli and Hugo Zeberg for helpful discussions and Alba Bossoms Mesa and Leonardo N Iasi for suggestions for the figures. This work was supported by the Max Planck Society, the European Research Council [694707] and the NOMIS Foundation.

Senior Editor

  1. George H Perry, Pennsylvania State University, United States

Reviewing Editor

  1. Emilia Huerta-Sanchez, Brown University, United States

Reviewer

  1. Christian Huber, Pennsylvania State University, United States

Publication history

  1. Received: November 10, 2021
  2. Preprint posted: November 30, 2021 (view preprint)
  3. Accepted: June 14, 2022
  4. Version of Record published: July 11, 2022 (version 1)

Copyright

© 2022, Peyrégne et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 357
    Page views
  • 109
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Open citations (links to open the citations from this article in various online reference manager services)

Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)

  1. Stéphane Peyrégne
  2. Janet Kelso
  3. Benjamin M Peter
  4. Svante Pääbo
(2022)
The evolutionary history of human spindle genes includes back-and-forth gene flow with Neandertals
eLife 11:e75464.
https://doi.org/10.7554/eLife.75464

Further reading

    1. Evolutionary Biology
    Milo S Johnson, Michael M Desai
    Research Article Updated

    As an adapting population traverses the fitness landscape, its local neighborhood (i.e., the collection of fitness effects of single-step mutations) can change shape because of interactions with mutations acquired during evolution. These changes to the distribution of fitness effects can affect both the rate of adaptation and the accumulation of deleterious mutations. However, while numerous models of fitness landscapes have been proposed in the literature, empirical data on how this distribution changes during evolution remains limited. In this study, we directly measure how the fitness landscape neighborhood changes during laboratory adaptation. Using a barcode-based mutagenesis system, we measure the fitness effects of 91 specific gene disruption mutations in genetic backgrounds spanning 8000–10,000 generations of evolution in two constant environments. We find that the mean of the distribution of fitness effects decreases in one environment, indicating a reduction in mutational robustness, but does not change in the other. We show that these distribution-level patterns result from differences in the relative frequency of certain patterns of epistasis at the level of individual mutations, including fitness-correlated and idiosyncratic epistasis.

    1. Evolutionary Biology
    2. Genetics and Genomics
    Henrike Indrischek et al.
    Research Article Updated

    Despite decades of research, knowledge about the genes that are important for development and function of the mammalian eye and are involved in human eye disorders remains incomplete. During mammalian evolution, mammals that naturally exhibit poor vision or regressive eye phenotypes have independently lost many eye-related genes. This provides an opportunity to predict novel eye-related genes based on specific evolutionary gene loss signatures. Building on these observations, we performed a genome-wide screen across 49 mammals for functionally uncharacterized genes that are preferentially lost in species exhibiting lower visual acuity values. The screen uncovered several genes, including SERPINE3, a putative serine proteinase inhibitor. A detailed investigation of 381 additional mammals revealed that SERPINE3 is independently lost in 18 lineages that typically do not primarily rely on vision, predicting a vision-related function for this gene. To test this, we show that SERPINE3 has the highest expression in eyes of zebrafish and mouse. In the zebrafish retina, serpine3 is expressed in Müller glia cells, a cell type essential for survival and maintenance of the retina. A CRISPR-mediated knockout of serpine3 in zebrafish resulted in alterations in eye shape and defects in retinal layering. Furthermore, two human polymorphisms that are in linkage with SERPINE3 are associated with eye-related traits. Together, these results suggest that SERPINE3 has a role in vertebrate eyes. More generally, by integrating comparative genomics with experiments in model organisms, we show that screens for specific phenotype-associated gene signatures can predict functions of uncharacterized genes.