The evolutionary history of human spindle genes includes back-and-forth gene flow with Neandertals

  1. Stéphane Peyrégne (corresponding author)
  2. Janet Kelso
  3. Benjamin M Peter
  4. Svante Pääbo (corresponding author)

Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Germany
5 figures, 8 tables and 1 additional file

Figures

Figure 1 with 1 supplement
Genomic regions around spindle genes where archaic humans fall outside the modern human variation.

Each panel corresponds to the region around the missense change(s) (red stars) in a spindle gene. The grey boxes correspond to exons. The curves give the posterior probability (computed as in Peyrégne et al., 2017) that an archaic genome (Altai Neandertal in red, Denisova 3 in orange) is an outgroup to present-day African genomes at a particular position. Dots on the curves correspond to informative positions, that is, positions that are polymorphic or carry fixed derived substitutions in Africans from the 1000 Genomes Project phase III, compared to four ape genomes. Chromosomal locations are given on top.

Figure 1—figure supplement 1
Genomic regions where archaic humans fall outside the modern human variation, identified using the most recent deCODE recombination map (Halldorsson et al., 2019).

Figure 2
Evidence for selection in the spindle genes with age estimates of these substitutions.

(A) The genetic length of segments around the missense substitutions where the Altai Neandertal and Denisova 3 fall outside the human variation (Figure 1), using the African-American map (AAmap) or the deCODE map (deCODE19). The grey histogram corresponds to the length distribution of such segments in neutral simulations (Peyrégne et al., 2017). Candidate genes for selection (red) are those with segments longer than 0.025 cM (Peyrégne et al., 2017) (vertical red dashed line). (B) Cumulative distributions of pairwise times to the most recent common ancestors (TMRCA) among present-day African chromosomes with the most distant relationships (red; see Methods), or between the chromosomes of present-day Africans and either present-day individuals with the ancestral versions of the missense substitutions (‘Modern (anc)’, in black) or archaic humans (other colors). The pink areas correspond to estimated time intervals for the origin of the missense substitutions, and their bounds correspond to the average TMRCAs over the red curve and the next one (back in time), respectively. (C) Summary of ages of substitutions as described in panel B. Genes with evidence of positive selection are highlighted in red.
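
The 0.025 cM cutoff in panel A applies to a genetic length, which has to be read off a recombination map. The following is a minimal sketch of that conversion, assuming a map given as sorted physical positions with cumulative genetic positions in cM and linear interpolation between markers. The function names and the toy map values are hypothetical and are not taken from AAmap or deCODE19.

```python
from bisect import bisect_right

CANDIDATE_THRESHOLD_CM = 0.025  # cutoff quoted in the caption for panel A


def genetic_position(pos, map_positions, map_cm):
    """Linearly interpolate the cumulative genetic position (cM) at `pos`.

    `map_positions` must be sorted in increasing order. Positions outside
    the map are clamped to the first or last marker (an assumption of this
    sketch, not a statement about the published analysis).
    """
    if pos <= map_positions[0]:
        return map_cm[0]
    if pos >= map_positions[-1]:
        return map_cm[-1]
    i = bisect_right(map_positions, pos)
    x0, x1 = map_positions[i - 1], map_positions[i]
    y0, y1 = map_cm[i - 1], map_cm[i]
    return y0 + (y1 - y0) * (pos - x0) / (x1 - x0)


def segment_length_cm(start, end, map_positions, map_cm):
    """Genetic length (cM) of the physical segment [start, end]."""
    return (genetic_position(end, map_positions, map_cm)
            - genetic_position(start, map_positions, map_cm))


def is_candidate(start, end, map_positions, map_cm):
    """True if the segment is strictly longer than the 0.025 cM cutoff."""
    return segment_length_cm(start, end, map_positions, map_cm) > CANDIDATE_THRESHOLD_CM


if __name__ == "__main__":
    # Toy map: three markers with made-up cumulative cM values.
    toy_positions = [0, 100_000, 200_000]
    toy_cm = [0.0, 0.01, 0.05]
    print(segment_length_cm(50_000, 200_000, toy_positions, toy_cm))
    print(is_candidate(50_000, 200_000, toy_positions, toy_cm))
```

Because the same physical segment can have different genetic lengths under different maps, a segment may pass the cutoff with one map and not the other, which is why the caption reports both.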

Figure 3
A modern human-like haplotype in some Neandertals.

Genotypes from 13 archaic individuals (y-axis) are shown in a region around the two missense changes (dots) in KNL1. We only show positions (x-axis) that are derived in all Luhya and Yoruba individuals from the 1000 Genomes Project compared to four great apes (Peyrégne et al., 2017) and at least one high-coverage archaic genome (Chagyrskaya 8, Denisova 3, Vindija 33.19 and the Altai Neandertal, i.e., Denisova 5). The colors of the squares and dots represent the genotype, with ancestral and derived alleles. For the low-coverage archaic genomes, we randomly sampled a sequence at each position. Red lines indicate the modern human-like haplotype.
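
The caption notes that, for low-coverage genomes, one sequence was sampled at random at each position. A minimal sketch of such a sampling step is given here, assuming the input is a per-position dictionary of base counts (number of DNA fragments per base). The function name `sample_allele` and the example counts are hypothetical; the published pipeline may differ.

```python
import random


def sample_allele(base_counts, rng):
    """Return the base carried by one randomly chosen fragment.

    `base_counts` maps a base to the number of fragments carrying it.
    Each fragment is equally likely to be drawn, so a base is picked with
    probability proportional to its count. Returns None when the position
    has no coverage.
    """
    bases = [base for base, n in base_counts.items() if n > 0]
    if not bases:
        return None
    weights = [base_counts[base] for base in bases]
    return rng.choices(bases, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the draw is reproducible
    # Hypothetical low-coverage position: two fragments with C, one with T.
    print(sample_allele({"C": 2, "T": 1}, rng))
    # Uncovered position.
    print(sample_allele({}, rng))
```

Sampling a single fragment avoids calling heterozygous genotypes from very few reads, at the cost of representing each low-coverage individual by one pseudo-haploid sequence.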

Figure 4 with 1 supplement
The modern human-like KNL1 haplotype in Neandertals.

(A) Pairwise differences between two high-coverage Neandertal genomes (Chagyrskaya 8 and the Altai Neandertal, Denisova 5) in non-overlapping sliding windows of 276 kb (histogram) and in the KNL1 region (vertical cyan line; chr15:40,818,035–41,094,166, hg19). Windows with fewer than 10,000 genotype calls for both Neandertals were discarded. (B) The expected length distributions under a model of incomplete lineage sorting based on local recombination rate estimates from the African-American (AA) and deCODE recombination maps and the length of the modern human-like KNL1 haplotype in Neandertals (vertical cyan line). (C) Left panel: Time to the most recent common ancestor (TMRCA) between the Chagyrskaya 8 Neandertal (who carries the modern human-like haplotype) and KNL1 haplotypes in present-day humans with their 95% confidence intervals (bars) for chr15:40,885,107–40,963,160 (hg19). The size of the points corresponds to the number of chromosomes carrying this haplotype in the HGDP dataset. The black rectangles highlight subsets of haplotypes with TMRCAs more recent than the modern-archaic population split time (Prüfer et al., 2017) (shaded pink area). Right panel: Distribution of pairwise TMRCAs between the Neandertal and present-day humans from the HGDP dataset in the region of KNL1 and two other regions with archaic haplotypes in present-day humans (Controls, Zeberg and Pääbo, 2020; Dannemann et al., 2016; COVID risk region: chr3:45,859,651–45,909,024; TLR6, 1 & 10: chr4:38,760,338–38,846,385). We used the Vindija 33.19 genome for the COVID risk haplotype and the Chagyrskaya 8 genome otherwise.
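
The windowing described for panel A can be sketched as follows. This is an illustration only: it assumes one allele per genome per site (so a difference is a simple mismatch), and the function name and input layout are hypothetical. Only the window size (276 kb) and the minimum number of calls (10,000) come from the caption.

```python
from collections import defaultdict

WINDOW_SIZE = 276_000  # 276 kb, as in the caption
MIN_CALLS = 10_000     # windows with fewer calls in both genomes are discarded


def windowed_differences(calls, window_size=WINDOW_SIZE, min_calls=MIN_CALLS):
    """Count differing sites per non-overlapping window.

    `calls` is an iterable of (position, allele_a, allele_b) tuples for
    sites that have a genotype call in both genomes. Returns a dict mapping
    each window start to the number of differing sites, keeping only the
    windows with at least `min_calls` jointly called sites.
    """
    n_calls = defaultdict(int)
    n_diff = defaultdict(int)
    for pos, allele_a, allele_b in calls:
        start = (pos // window_size) * window_size
        n_calls[start] += 1
        if allele_a != allele_b:
            n_diff[start] += 1
    return {start: n_diff[start]
            for start in sorted(n_calls)
            if n_calls[start] >= min_calls}


if __name__ == "__main__":
    # Tiny toy input with a small window and threshold for readability.
    toy_calls = [(1, "A", "A"), (2, "A", "G"), (3, "C", "C"), (12, "T", "T")]
    print(windowed_differences(toy_calls, window_size=10, min_calls=2))
```

The resulting per-window counts form the genome-wide distribution against which the count in the KNL1 window is compared.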

Figure 4—figure supplement 1
Genotypes of the 12 non-African individuals that inherited one copy of KNL1 from archaic humans.

We show positions within 40 kb downstream of the modern-like KNL1 haplotype identified in Chagyrskaya 8 to highlight 7 positions (red marks) where only those 12 individuals (out of 929 individuals across worldwide populations; Bergström et al., 2020) carry a derived allele seen in at least one Neandertal genome. We only show positions where at least one of the 12 individuals is heterozygous. The upper panel (‘Archaics’) shows the alleles carried by high-coverage archaic human genomes without the modern-like KNL1 haplotype. The middle panel (‘Chagyrskaya 8-like’) shows the alleles carried by four Neandertal genomes with the modern human-like KNL1 haplotype and the 12 present-day non-African genomes that inherited one copy of KNL1 from archaic humans. The lower panel (‘Other haplotypes’) shows the alleles carried by the other chromosomes (that did not inherit a copy of KNL1 from archaic humans) of these 12 individuals. For the archaic human genomes, one allele was sampled randomly at heterozygous positions.

Figure 5
Schematic illustration of the history of KNL1.

The tree delineated in black corresponds to the average relationship between the modern and archaic human populations. The inner colored trees correspond to the relationship of different KNL1 lineages, with arrows highlighting the direction of gene flow between populations. The corresponding haplotypes are illustrated on the sides of the tree and show the recombination history in the region (e.g., the recombinant Neandertal haplotype with variants of putative archaic origin in non-Africans). Dots correspond to informative positions, and the stars illustrate the missense substitutions. The ages of relevant ancestors are marked by horizontal dotted lines. MH: Modern human.

Tables

Appendix 1—table 1
Location and predicted effects of the studied amino acid changes in spindle proteins, as reported in dbNSFP version 4.2 (48).

The predictions are for the ancestral variants. We put “damaging” in quotation marks because the ancestral versions of ATRX and KATNA1 are unlikely to be damaging (the ancestral amino acid residues are found in the proteins of many species); the prediction instead supports a function for these amino acid changes.

| Protein | Position | Amino acid change | Protein domain (UniProt) | Effect prediction for the ancestral variant (MutPred) | Potentially “damaging” ancestral variant according to |
|---|---|---|---|---|---|
| ATRX | 475 | D->H | - | - | FATHMM; M-CAP |
| KATNA1 | 343 | A->T | - | - | FATHMM; PrimateAI |
| KIF18A | 67 | R->K | kinesin motor (11-355) | Loss of methylation (P=0.0087) | - |
| KNL1 | 159 | H->R | interaction domain with BUB1 and BUB1B (1-728) | - | - |
| KNL1 | 1,086 | G->S | 2 × 104 AA approximate repeats (855-1201) | Loss of phosphorylation (P=0.0382) | - |
| NEK6 | 291 | D->H | protein kinase domain (45-310) | - | - |
| RSPH1 | 213 | K->Q | - | - | - |
| SPAG5 | 43 | P->S | - | Loss of phosphorylation (P=0.0244); Gain of catalytic residue (P=0.0179) | - |
| SPAG5 | 162 | E->G | - | - | - |
| SPAG5 | 410 | D->H | - | - | - |
| STARD9 | 3,925 | A->T | - | - | - |

Appendix 1—table 2
Deleteriousness and conservation scores at the studied positions with missense changes in spindle genes, as reported in dbNSFP version 4.2 (48).

A high CADD score indicates that the ancestral variant is likely to be deleterious (Kircher et al., 2014; Rentzsch et al., 2019; Pollard et al., 2010) and a high conservation score means that the nucleotide position is highly conserved across species (100 vertebrates for phyloP and phastCons, Siepel et al., 2005; 34 mammals for GERP++ RS, Davydov et al., 2010). In contrast to the other scores, which correspond to a single position, phastCons is a measure of the conservation in the region around the position. In dbNSFP, the scores range from -6.458163 to 18.301497 for CADD, from -12.3 to 6.17 for GERP++ RS, from -20.0 to 10.003 for phyloP and from 0 to 1 for phastCons.

| Gene | Position (hg19) | rs ID | Corresponding amino acid change | Deleteriousness: CADD score (hg19) | Conservation: GERP++ RS score | Conservation: phyloP 100way vertebrate score | Conservation: phastCons 100way vertebrate score |
|---|---|---|---|---|---|---|---|
| KATNA1 | 6-149,918,766 | rs73781249 | A343T | 2.051033 | 5.48 | 4.834 | 1.000 |
| SPAG5 | 17-26,925,570 | NA | P43S | -0.425670 | -3.57 | -1.404 | 0.000 |
| SPAG5 | 17-26,919,777 | NA | E162G | 0.296317 | 3.66 | 0.280 | 0.001 |
| SPAG5 | 17-26,919,034 | NA | D410H | 1.062743 | 5.4 | 2.032 | 1.000 |
| KNL1 | 15-40,912,860 | rs755472529 | H159R | 0.475454 | 3.7 | -0.016 | 0.001 |
| KNL1 | 15-40,915,640 | NA | G1086S | 0.801787 | 4.12 | 1.026 | 0.054 |
| KIF18A | 11-28,119,295 | rs775297730 | R67K | 1.134589 | 2.62 | 0.525 | 0.845 |
| NEK6 | 9-127,113,155 | rs146443565 | D291H | -0.293112 | -1.56 | 3.284 | 1.000 |
| ATRX | X-76,939,325 | rs146863015 | D475H | -0.141606 | 3.64 | 0.791 | 0.840 |
| RSPH1 | 21-43,897,491 | rs146298259 | K213Q | -0.061053 | 1.81 | 0.672 | 0.079 |
| STARD9 | 15-42,985,549 | rs573215252 | A3925T | -0.351117 | -2.51 | 0.047 | 0.000 |

Appendix 2—table 1
Age estimates of the missense substitutions in spindle genes.

The ages were estimated in the regions where the Altai Neandertal and Denisova 3 genomes fall outside the human variation (intersection of the regions identified with the African-American and deCODE recombination maps). The lower age corresponds to the mean age of the ancestor of multiple present-day African chromosome pairs. The upper age corresponds to the mean age of the common ancestor shared between each present-day African chromosome and either the archaic genome with the fewest differences (excluding Chagyrskaya 8 for KNL1) or a present-day human with an ancestral version of the missense variant(s).

| Gene | Chromosome | Region (hg19) | Lower age (kya) | Upper age (kya) |
|---|---|---|---|---|
| ATRX | X | 76,703,773-77,246,471 | NA | NA |
| KATNA1 | 6 | 149,840,973-149,930,425 | 863 | 1,329 |
| KIF18A | 11 | 28,018,167-28,304,293 | 843 | 1,006 |
| KNL1 | 15 | 40,898,141-40,948,306 | 1,027 | 1,690 |
| NEK6 | 9 | 127,109,510-127,113,614 | NA | NA |
| RSPH1 | 21 | 43,897,417-43,897,549 | NA | NA |
| SPAG5 | 17 | 26,875,942-27,045,524 | 677 | 796 |
| STARD9 | 15 | 42,941,540-42,989,160 | 947 | 1,401 |

Appendix 3—table 1
Coverage depth in archaic human genomes at positions with modern human-specific missense substitutions in spindle genes.

The numbers of DNA fragments carrying a particular base are reported in parentheses after the corresponding base. Bases in uppercase were sequenced in the forward orientation, whereas those in lowercase were sequenced in the reverse orientation. Bases that are modern human-like (derived) are highlighted with an asterisk and may represent present-day human DNA contamination or an allele shared with modern humans. The bases that are compatible with a damage-induced substitution (from the ancestral allele) are highlighted in bold (i.e., C-to-T and G-to-A in the forward and reverse orientation, respectively).

| Gene | ATRX | KATNA1 | KIF18A | KNL1 | KNL1 | NEK6 | RSPH1 | SPAG5 | SPAG5 | SPAG5 | STARD9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Chr-Position | X-76,939,325 | 6-149,918,766 | 11-28,119,295 | 15-40,912,860 | 15-40,915,640 | 9-127,113,155 | 21-43,897,491 | 17-26,919,034 | 17-26,919,777 | 17-26,925,578 | 15-42,985,549 |
| Ancestral | C | C | C | A | G | G | T | C | T | G | G |
| Derived | G | T | T | G | A | C | G | G | C | A | A |
| Altai Neandertal | C (21), c (31) | C (22), c (15), T* (2) | C (19), c (41), T* (1) | A (24), a (28) | G (33), g (16) | G (23), g (23) | T (18), t (26) | C (17), c (17) | T (18), t (20), a (1) | G (28), g (17) | G (13), g (16), A* (1) |
| Chagyrskaya 8 | C (10), c (9), T (1), t (1) | C (13), c (5) | C (15), c (19), T* (3) | A (1), G* (14), g* (11) | A* (19), a* (8) | G (5), g (7), a (1) | T (17), t (13) | C (11), c (7), T (1) | T (16), t (11) | G (7), g (6), a* (1) | G (8), g (13) |
| Denisova 3 | C (19), c (17) | C (15), c (13), T* (2) | C (17), c (24) | A (20), a (30) | G (25), g (17), a* (1) | G (15), g (14) | T (19), t (27) | C (20), c (14) | T (16), t (11) | G (12), g (6) | G (17), g (12) |
| Denisova 11 | C (1), c (2) | NA | C (1) | a (2) | g (1) | G (1), g (1) | T (4), t (2) | c (2), t (1) | NA | NA | G (1) |
| Goyet Q56-1 | C (1) | NA | C (1), c (5) | G* (1) | A* (1), a* (2) | G (1), g (2) | NA | C (1) | T (1) | G (4), g (2) | G (2) |
| Hohlenstein-Stadel | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Les Cottés Z4-1514 | C (1), c (1) | c (2), T* (1) | C (5), c (4) | A (2), a (4) | G (3), g (2) | NA | T (2), t (1) | NA | T (1) | G (1) | G (1) |
| Mezmaiskaya 1 | c (3) | NA | C (1), c (1) | NA | G (2) | NA | t (2) | C (2), c (1), g* (2) | NA | NA | NA |
| Mezmaiskaya 2 | C (1) | C (1), c (1), T* (1) | C (1) | g* (1) | A* (1) | G (2), g (1) | NA | C (1) | T (1) | G (2) | g (3) |
| Scladina I-4A | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| Spy 94 A | NA | c (1) | C (1) | g* (1) | A* (1), a* (3) | NA | T (1) | C (1) | T (1) | g (1) | NA |
| Vindija 33.19 | C (16), c (18), T (1) | C (8), c (11) | C (17), c (20), T* (1) | A (13), a (19) | G (20), g (16), a* (2) | G (8), g (8) | T (22), t (15) | C (12), c (14) | T (15), t (7) | G (15), g (13), a* (1) | G (14), g (14) |

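
The flagging rules stated in the caption of the table above (derived bases are starred; C-to-T in the forward orientation and G-to-A in the reverse orientation are compatible with damage) can be written as a small check. This is a sketch under the table's own conventions, where uppercase means forward and lowercase means reverse orientation; the function names are hypothetical.

```python
def is_damage_compatible(ancestral, observed):
    """True if `observed` could be a damage-induced change from `ancestral`.

    `observed` uses the table's convention: uppercase = forward
    orientation, lowercase = reverse orientation. Following the caption,
    only C-to-T (forward) and G-to-A (reverse) are damage-compatible.
    """
    ancestral = ancestral.upper()
    if observed.isupper():
        return ancestral == "C" and observed == "T"
    return ancestral == "G" and observed == "a"


def annotate(ancestral, derived, observed):
    """Return the set of flags that apply to one observed base."""
    flags = set()
    if observed.upper() == derived.upper():
        flags.add("derived")
    if is_damage_compatible(ancestral, observed):
        flags.add("damage-compatible")
    return flags


if __name__ == "__main__":
    # KATNA1: ancestral C, derived T. A forward T is both derived and
    # damage-compatible, so it cannot be interpreted unambiguously.
    print(annotate("C", "T", "T"))
    # NEK6: ancestral G, derived C. A reverse-orientation a is
    # damage-compatible but not derived.
    print(annotate("G", "C", "a"))
    # KNL1 (15-40,912,860): ancestral A, derived G. A reverse g is derived
    # and not explained by the damage patterns named in the caption.
    print(annotate("A", "G", "g"))
```

The KATNA1 and KIF18A columns illustrate why both flags matter: there the derived allele is itself a C-to-T change, so isolated T* observations in the forward orientation are weak evidence for the derived state.
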
Appendix 3—table 2
Coverage depth of the Mezmaiskaya 1 genome at positions with modern human-specific substitutions in SPAG5.

Only positions covered by at least one DNA sequence are reported. Bases in uppercase were sequenced in the forward orientation, whereas those in lowercase were sequenced in the reverse orientation. The numbers of DNA fragments carrying a particular base are reported in parentheses after the corresponding base.

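| Neandertal | Chr-position (rs ID) | Ancestral | Derived | Allele counts |
|---|---|---|---|---|
| Mezmaiskaya 1 | 17-26,864,608 (rs188710272) | A | G | A (1) |
| Mezmaiskaya 1 | 17-26,891,162 (NA) | T | G | T (1) |
| Mezmaiskaya 1 | 17-26,892,376 (NA) | A | T | A (2) |
| Mezmaiskaya 1 | 17-26,913,024 (NA) | A | G | a (a) |
| Mezmaiskaya 1 | 17-26,919,034 (NA) | C | G | C (2), c (1), g (2) |
| Mezmaiskaya 1 | 17-26,948,236 (NA) | G | A | g (1) |
| Mezmaiskaya 1 | 17-26,967,723 (rs558276956) | A | G | A (3) |
| Mezmaiskaya 1 | 17-27,005,275 (NA) | G | A | G (1) |
| Mezmaiskaya 1 | 17-27,010,483 (NA) | G | A | g (1) |
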
Appendix 4—table 1
Positions defining the closely related haplotype between some modern humans and Neandertals.

At these positions, the Chagyrskaya 8 genome differs from other high-quality archaic genomes without the modern human-like haplotype, but some African genomes from the HGDP dataset carry the same allele as Chagyrskaya 8. Note that the modern human-like haplotype in Neandertals is longer and defined by alleles that are shared with all modern humans (Figure 3).

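| Chromosome | Position (hg19) | rs ID | Reference | Alternative (Chagyrskaya 8-like) | Chagyrskaya 8-like allele frequency in HGDP Africans | Chagyrskaya 8-like allele frequency in HGDP non-Africans |
|---|---|---|---|---|---|---|
| 15 | 40,885,107 | rs16970851 | A | G | 0.32 | 0.41 |
| 15 | 40,886,017 | rs8034043 | C | T | 0.32 | 0.41 |
| 15 | 40,886,020 | rs8034048 | C | G | 0.32 | 0.40 |
| 15 | 40,892,601 | rs11855923 | G | A | 0.35 | 0.40 |
| 15 | 40,893,573 | rs12905162 | C | A | 0.38 | 0.40 |
| 15 | 40,905,450 | rs11856438 | C | T | 0.37 | 0.41 |
| 15 | 40,908,904 | rs11852670 | A | G | 0.39 | 0.41 |
| 15 | 40,910,707 | rs12914743 | T | C | 0.38 | 0.41 |
| 15 | 40,915,045 | rs8041534 | T | G | 0.38 | 0.41 |
| 15 | 40,915,894 | rs11070285 | T | C | 0.39 | 0.41 |
| 15 | 40,925,214 | rs11856802 | T | A | 0.39 | 0.41 |
| 15 | 40,926,654 | rs11854986 | C | G | 0.35 | 0.40 |
| 15 | 40,929,814 | rs11070286 | T | C | 0.37 | 0.41 |
| 15 | 40,937,647 | rs3092979 | A | G | 0.38 | 0.41 |
| 15 | 40,959,413 | rs73396515 | G | A | 0.36 | 0.10 |
| 15 | 40,959,624 | rs35047458 | G | A | 0.36 | 0.40 |
| 15 | 40,960,432 | rs12902568 | G | A | 0.36 | 0.40 |
| 15 | 40,963,160 | rs7182530 | A | G | 0.37 | 0.41 |
| 15 | 40,987,528 | rs1801320 | G | C | 0.38 | 0.11 |
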
Appendix 5—table 1
Origin of the modern human genomes from the HGDP dataset (Bergström et al., 2020) with a KNL1 copy inherited from Neandertals.

| Sample | Population | Region |
|---|---|---|
| HGDP00125 | Hazara | Central South Asia |
| HGDP00547 | Papuan Sepik | Oceania |
| HGDP00639 | Bedouin | Middle East |
| HGDP00696 | Palestinian | Middle East |
| HGDP00714 | Cambodian | East Asia |
| HGDP00774 | Han | East Asia |
| HGDP00822 | Han | East Asia |
| HGDP00954 | Yakut | East Asia |
| HGDP00960 | Yakut | East Asia |
| HGDP00966 | Yakut | East Asia |
| HGDP01023 | Han | East Asia |
| HGDP01181 | Yi | East Asia |

Appendix 6—table 1
Allele counts at positions with nearly fixed missense variants in the spindle genes of modern humans from the gnomAD database (v2.1.1; Karczewski et al., 2020).

Columns 7–8 and 9–10 correspond to the allele counts among the 125,748 whole-exome sequences (WES) and the 15,708 whole-genome sequences (WGS), respectively. Anc = Ancestral.

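| Gene | Chr-Position (rs ID) | Anc | (Nearly) fixed | Alleles | VEP Annot. | # Anc (WES) | Total (WES) | # Anc (WGS) | Total (WGS) |
|---|---|---|---|---|---|---|---|---|---|
| ATRX | X-76,939,325 (rs146863015) | C | G | G-C | missense | 66 | 182,745 | 11 | 22,042 |
| KATNA1 | 6-149,918,766 (rs73781249) | C | T | T-C | missense | 259 | 251,190 | 131 | 31,400 |
| KIF18A | 11-28,119,295 (rs775297730) | C | T | T-C | missense | 26 | 249,508 | 1 | 31,396 |
| KNL1 | 15-40,912,860 (rs755472529) | A | G | G-A | missense | NA | NA | 1 | 31,368 |
| KNL1 | 15-40,912,860 (rs755472529) | A | G | G-T | missense | 1 | 227,420 | NA | NA |
| KNL1 | 15-40,915,640 (NA) | G | A | A-G | missense | NA | NA | NA | NA |
| NEK6 | 9-127,113,155 (rs146443565) | G | C | C-G | missense | 164 | 250,140 | 26 | 31,404 |
| RSPH1 | 21-43,897,491 (rs146298259) | T | G | G-T | missense | 236 | 251,414 | 30 | 31,386 |
| RSPH1 | 21-43,897,491 (rs146298259) | T | G | G-A | stop gained | 10 | 251,414 | 1 | 31,386 |
| SPAG5 | 17-26,919,034 (NA) | C | G | G-C | missense | NA | NA | NA | NA |
| SPAG5 | 17-26,919,777 (NA) | T | C | C-A | missense | 3 | 251,430 | NA | NA |
| SPAG5 | 17-26,925,578 (NA) | G | A | A-G | missense | NA | NA | NA | NA |
| SPAG5 | 17-26,925,578 (NA) | G | A | A-T | missense | 1 | 251,066 | NA | NA |
| STARD9 | 15-42,985,549 (rs573215252) | G | A | A-G | missense | 5 | 139,342 | 3 | 31,284 |
| STARD9 | 15-42,985,549 (rs573215252) | G | A | A-C | missense | 5 | 139,342 | NA | NA |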
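
To get a sense of how rare the ancestral alleles are, the counts in the table above can be turned into frequencies. The sketch below pools the WES and WGS columns. Pooling is an assumption made here for illustration, not something the table itself reports, and `pooled_frequency` is a hypothetical helper. "NA" entries are represented as `None` and skipped.

```python
def pooled_frequency(count_total_pairs):
    """Pool (allele count, total alleles) pairs into one frequency.

    Pairs in which either value is None (an "NA" in the table) are
    skipped. Returns None if no pair has data.
    """
    count_sum = 0
    total_sum = 0
    for count, total in count_total_pairs:
        if count is None or total is None:
            continue
        count_sum += count
        total_sum += total
    if total_sum == 0:
        return None
    return count_sum / total_sum


if __name__ == "__main__":
    # ATRX, ancestral allele C: 66 of 182,745 (WES) and 11 of 22,042 (WGS).
    print(pooled_frequency([(66, 182_745), (11, 22_042)]))
    # KNL1 15-40,912,860, allele A: NA in WES, 1 of 31,368 in WGS.
    print(pooled_frequency([(None, None), (1, 31_368)]))
    # SPAG5 17-26,919,034: no ancestral allele observed in either dataset.
    print(pooled_frequency([(None, None), (None, None)]))
```

For ATRX this gives 77 ancestral alleles out of 204,787, a frequency below 0.1%, consistent with the table's description of these variants as nearly fixed.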

Additional files


Stéphane Peyrégne, Janet Kelso, Benjamin M Peter, Svante Pääbo (2022) The evolutionary history of human spindle genes includes back-and-forth gene flow with Neandertals. eLife 11:e75464. https://doi.org/10.7554/eLife.75464