
Detecting adaptive introgression in human evolution using convolutional neural networks

  1. Graham Gower (corresponding author)
  2. Pablo Iáñez Picazo
  3. Matteo Fumagalli
  4. Fernando Racimo
  1. Lundbeck GeoGenetics Centre, Globe Institute, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
  2. Department of Life Sciences, Silwood Park Campus, Imperial College London, United Kingdom
Tools and Resources
Cite this article as: eLife 2021;10:e64669 doi: 10.7554/eLife.64669
42 figures, 6 tables and 1 additional file

Figures

Figure 1 with 2 supplements
A schematic overview of how genomatnn detects adaptive introgression.

We first simulate a demographic history that includes introgression, such as Demographic Model A1 shown in (A), using the SLiM engine in stdpopsim. Parameter values for this model are given in Appendix 3—table 1. Three distinct scenarios are simulated for a given demographic model: neutral mutations only, a sweep in the recipient population, and adaptive introgression. The tree sequence file from each simulation is converted into a genotype matrix for input to the CNN. (B) shows a genotype matrix from an adaptive introgression simulation, where lighter pixels indicate a higher density of minor alleles, and haplotypes within each population are sorted left-to-right by similarity to the donor population (Nea). In this example, haplotype diversity is low in the recipient population (CEU), which closely resembles the donor (Nea). Thousands of simulations are produced for each simulation scenario, and their genotype matrices are used to train a binary-classification CNN (C). The CNN is trained to output Pr[AI], the probability that the input matrix corresponds to adaptive introgression. Finally, the trained CNN is applied to genotype matrices derived from a VCF/BCF file (D).
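The matrix preparation described for panel (B) can be sketched as follows. This is a minimal illustration and not the genomatnn implementation: the function names, the equal-width position bins, and the squared Euclidean distance to the donor's mean column are all assumptions made for the example.

```python
def bin_minor_alleles(positions, haplotypes, window_length, num_bins):
    """Count minor alleles per haplotype within equal-width position bins.

    positions:  site coordinates in [0, window_length)
    haplotypes: one row per site, one 0/1 entry per haplotype (1 = minor allele)
    returns:    num_bins rows, one count per haplotype
    """
    num_haplotypes = len(haplotypes[0])
    counts = [[0] * num_haplotypes for _ in range(num_bins)]
    for pos, row in zip(positions, haplotypes):
        bin_index = pos * num_bins // window_length
        for j, allele in enumerate(row):
            counts[bin_index][j] += allele
    return counts


def sort_by_donor_similarity(recipient, donor):
    """Reorder recipient columns so the most donor-like haplotype is leftmost.

    Similarity is measured (by assumption) as squared Euclidean distance
    between a recipient column and the mean donor column.
    """
    donor_mean = [sum(row) / len(row) for row in donor]

    def distance(j):
        return sum((row[j] - m) ** 2 for row, m in zip(recipient, donor_mean))

    order = sorted(range(len(recipient[0])), key=distance)
    return [[row[j] for j in order] for row in recipient]


# Toy data: 4 sites in a 100 bp window, 3 recipient and 2 donor haplotypes.
positions = [5, 15, 25, 95]
recipient_haps = [[1, 0, 1], [1, 0, 0], [0, 0, 1], [1, 1, 0]]
donor_haps = [[1, 1], [1, 1], [0, 0], [1, 1]]

recipient_binned = bin_minor_alleles(positions, recipient_haps, 100, 4)
donor_binned = bin_minor_alleles(positions, donor_haps, 100, 4)
sorted_matrix = sort_by_donor_similarity(recipient_binned, donor_binned)
```

Binning to a fixed number of rows gives every window the same matrix shape regardless of how many segregating sites it contains, which is what a CNN with a fixed input size requires.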

Figure 1—figure supplement 1
Schematic overview of Demographic Model A1 and A2.

Schematic overview of Demographic Model A1 (A) and A2 (B). Each population is depicted as a tube whose width is proportional to the population’s size at any given time. Horizontal lines with arrows indicate an ancestor/descendant relation (thick solid lines, open arrowheads), an admixture pulse (dashed lines, closed arrowheads), or a period of continuous migration (thin solid lines, closed arrowheads). The time at which each continuous-migration line is drawn was chosen randomly from the interval over which that migration occurs. A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

Figure 1—figure supplement 2
Schematic overview of Demographic Model B.

Overview of the Jacobs et al., 2019 demographic model (A), featuring two pulses of Denisovan gene flow into Papuans, which we implemented as the PapuansOutOfAfrica_10J19 model in stdpopsim. The same model is shown in (B), zoomed in to show more clearly the many events occurring between generations 800–2300. Each population is depicted as a tube whose width is proportional to the population’s size at any given time. Horizontal lines with arrows indicate an ancestor/descendant relation (thick solid lines, open arrowheads), an admixture pulse (dashed lines, closed arrowheads), or a period of continuous migration (thin solid lines, closed arrowheads). The time at which each continuous-migration line is drawn was chosen randomly from the interval over which that migration occurs. DenA and NeaA are the sampled populations corresponding to the Altai Denisovan and Altai Neanderthal, while Den1, Den2, and Nea1 correspond to introgressing lineages. A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

Figure 2 with 2 supplements
CNN performance on validation simulations for Demographic Model A.

The CNN was trained using only AI simulations in which the selected mutation had an allele frequency >0.25. (A) Confusion matrix. For the two prediction categories, either 'not AI' or AI, we show the proportion attributed to each of the true (simulated) scenarios. (B) Average CNN prediction for AI scenarios, binned by selection coefficient, s, and time of onset of selection, Tsel. (C) ROC curves, precision-recall curves and MCC-F1 curves. The positive condition is AI. The negative conditions are shown using different line styles/colours. The circles indicate the point in ROC-space (respectively Precision-Recall-space, and MCC-F1-space) when using the threshold Pr[AI]>0.5 for classifying a genotype matrix as AI. DFE: distribution of fitness effects. TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; MCC: Matthews correlation coefficient; F1: harmonic mean of precision and recall.

Figure 2—figure supplement 1
Performance evaluation for Demographic Model B.

CNN performance on validation simulations for Demographic Model B with unphased data. The CNN was trained using only AI simulations in which the selected mutation had an allele frequency > 25%. (A) Confusion matrix. For the two prediction categories, either 'not AI' or AI, we show the proportion attributed to each of the true (simulated) scenarios. (B) Average CNN prediction for AI scenarios, binned by selection coefficient, s, and time of onset of selection, Tsel. (C) ROC curves, precision-recall curves and MCC-F1 curves. The positive condition is AI. The negative conditions are shown using different line styles/colours. The circles indicate the point in ROC-space (respectively Precision-Recall-space, and MCC-F1-space) when using the threshold Pr[AI]>0.5 for classifying a genotype matrix as AI. DFE: distribution of fitness effects. TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; MCC: Matthews correlation coefficient; F1: harmonic mean of precision and recall.

Figure 2—figure supplement 2
Comparison to other methods and performance evaluation with misspecified demographic models.

Unit-normalised Matthews correlation coefficient (MCC) versus F1 score (the harmonic mean of precision and recall). A value of 0.5 on the vertical axis corresponds to the performance of a random classifier. The point at coordinate (1,1), marked with a black dot, corresponds to 100% true positives and 0% false positives. Lines in MCC-F1 space were drawn by calculating the MCC and F1 values for 100 false-positive rates between 0 and 100, and the point closest to (1,1) is indicated with the symbol shown in the legend. This point may not correspond to an acceptably low false-positive rate, but for the classifiers shown here it is indicative of the method’s overall performance. In all panels, the positive condition is the AI simulation scenario, and the negative condition varies by panel column (indicated at top). The 'weakly misspecified' row used simulations of Model A1 as the training/null, and evaluated the method on simulations of Model A2. The 'strongly misspecified' row used simulations of Model A1 as the training/null, and evaluated the method on simulations of Model B.
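The two axes of an MCC-F1 plot can be computed from a confusion matrix as in the sketch below. The function name is hypothetical, and the normalisation (MCC + 1)/2 is the standard mapping of MCC from [-1, 1] onto [0, 1], which is what places a random classifier at 0.5.

```python
import math


def unit_mcc_and_f1(tp, fp, tn, fn):
    """Return (unit-normalised MCC, F1) for one confusion matrix.

    Assumes a non-degenerate matrix, i.e. no row or column sums to zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator
    return (mcc + 1) / 2, f1


# A good classifier, and one that is no better than chance.
good = unit_mcc_and_f1(tp=90, fp=10, tn=90, fn=10)
chance = unit_mcc_and_f1(tp=50, fp=50, tn=50, fn=50)
```

Sweeping the classification threshold and calling the function at each step traces out a curve; the point on that curve nearest (1, 1) is the summary marked in the figure.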

Figure 3
Saliency maps, showing the CNN’s attention across the input matrices for each simulated scenario, calculated for the CNN trained on Demographic Model A, filtered for beneficial allele frequency >0.25.

Each panel shows the average gradient over 300 input matrices encoding either neutral (top), sweep (middle), or AI (bottom) simulations. Pink/purple colours indicate larger gradients, where small changes in the genotype matrix have a relatively larger influence over the CNN’s prediction. Columns in the input matrix correspond to haplotypes from the populations labelled at the bottom.

Figure 4 with 4 supplements
Comparison of Manhattan plots using beta-calibrated output probabilities for different class ratios.

Each row indicates a single CNN, with equivalent data filtering. Each column indicates different class ratios used for calibration (Neutral:Sweep:AI). AF = Minimum beneficial allele frequency.

Figure 4—figure supplement 1
Reliability plots for Demographic Model A1 with AF > 5%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model A1 with a minimum beneficial allele frequency of 5%. The variance-normalised sum of residuals (Z), which is approximately normally distributed for well-calibrated predictions (Turner et al., 2019), is inset in the upper left corner of each reliability plot.
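One way to read the Z statistic is sketched below. The formula is an assumption based on the phrase "variance-normalised sum of residuals" and is not confirmed by the caption: if each outcome y_i is Bernoulli with the predicted probability p_i, the residual sum has mean 0 and variance equal to the sum of p_i(1 - p_i), so dividing by the square root of that variance gives an approximately standard normal value.

```python
import math


def calibration_z(probs, outcomes):
    """Sum of residuals (y - p) divided by its standard deviation
    under the hypothesis that the predicted probabilities are correct."""
    residual_sum = sum(y - p for p, y in zip(probs, outcomes))
    variance = sum(p * (1 - p) for p in probs)
    return residual_sum / math.sqrt(variance)


# Four predictions of 0.5: two positives is exactly as expected (Z = 0),
# four positives is two standard deviations above expectation (Z = 2).
z_balanced = calibration_z([0.5] * 4, [1, 0, 1, 0])
z_overshoot = calibration_z([0.5] * 4, [1, 1, 1, 1])
```

A large positive Z suggests that the predicted probabilities are too low on average, and a large negative Z that they are too high.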

Figure 4—figure supplement 2
Reliability plots for Demographic Model A1 with AF > 25%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model A1 with a minimum beneficial allele frequency of 25%. The variance-normalised sum of residuals (Z), which is approximately normally distributed for well-calibrated predictions (Turner et al., 2019), is inset in the upper left corner of each reliability plot.

Figure 4—figure supplement 3
Reliability plots for Demographic Model B with AF > 5%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model B with a minimum beneficial allele frequency of 5%. The variance-normalised sum of residuals (Z), which is approximately normally distributed for well-calibrated predictions (Turner et al., 2019), is inset in the upper left corner of each reliability plot.

Figure 4—figure supplement 4
Reliability plots for Demographic Model B with AF > 25%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model B with a minimum beneficial allele frequency of 25%. The variance-normalised sum of residuals (Z), which is approximately normally distributed for well-calibrated predictions (Turner et al., 2019), is inset in the upper left corner of each reliability plot.

Figure 5
Application of the trained CNN to the Vindija and Altai Neanderthals, and the 1000 Genomes populations YRI and CEU.

The CNN was applied to overlapping 100 kbp windows, moving along the chromosome in steps of size 20 kbp. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.
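The window layout can be sketched as a small generator. The 1-based inclusive coordinates follow the style of the candidate regions reported in the appendix figures; dropping the final partial window is an assumption, and the function name is hypothetical.

```python
def sliding_windows(chrom_length, window=100_000, step=20_000):
    """Yield 1-based inclusive (start, end) windows along a chromosome.

    Windows that would run past the chromosome end are skipped
    (an assumption made for this sketch).
    """
    start = 1
    while start + window - 1 <= chrom_length:
        yield (start, start + window - 1)
        start += step


# A toy 200 kbp chromosome yields six overlapping windows.
windows = list(sliding_windows(200_000))
```

With a 20 kbp step and 100 kbp windows, every position away from the chromosome ends is covered by five windows.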

Figure 6
Application of the trained CNN to the Altai Denisovan and Altai Neanderthal, the 1000 Genomes YRI population, and IGDP Melanesians.

The CNN was applied to overlapping 100 kbp windows, moving along the chromosome in steps of size 20 kbp. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Appendix 4—figure 1
Haplotype plot for the candidate region chr1:104500001–104600000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 2
Haplotype plot for the candidate region chr2:109360001–109460000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 3
Haplotype plot for the candidate region chr2:160160001–160280000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 4
Haplotype plot for the candidate region chr3:114480001–114620000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 5
Haplotype plot for the candidate region chr4:54240001–54340000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 6
Haplotype plot for the candidate region chr5:39220001–39320000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 7
Haplotype plot for the candidate region chr6:28180001–28320000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 8
Haplotype plot for the candidate region chr8:143440001–143560000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 9
Haplotype plot for the candidate region chr9:16700001–16820000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 10
Haplotype plot for the candidate region chr12:85780001–85880000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 11
Haplotype plot for the candidate region chr19:20220001–20380000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 12
Haplotype plot for the candidate region chr19:33580001–33740000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 13
Haplotype plot for the candidate region chr20:62100001–62280000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 14
Haplotype plot for the candidate region chr21:25840001–25940000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 5—figure 1
Genotype plot for the candidate region chr2:129960001–130060000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 2
Genotype plot for the candidate region chr3:3740001–3840000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 3
Genotype plot for the candidate region chr4:41980001–42080000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 4
Genotype plot for the candidate region chr5:420001–520000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 5
Genotype plot for the candidate region chr6:74640001–74740000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 6
Genotype plot for the candidate region chr6:81960001–82060000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 7
Genotype plot for the candidate region chr6:137920001–138120000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 8
Genotype plot for the candidate region chr7:25100001–25200000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 9
Genotype plot for the candidate region chr7:38020001–38120000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 10
Genotype plot for the candidate region chr7:121160001–121260000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 11
Genotype plot for the candidate region chr8:3040001–3140000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 12
Genotype plot for the candidate region chr12:84640001–84740000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 13
Genotype plot for the candidate region chr12:108240001–108340000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 14
Genotype plot for the candidate region chr12:114020001–114280000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 15
Genotype plot for the candidate region chr14:61860001–61960000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 16
Genotype plot for the candidate region chr14:63120001–63220000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 17
Genotype plot for the candidate region chr14:96700001–96820000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 18
Genotype plot for the candidate region chr15:55260001–55400000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 19
Genotype plot for the candidate region chr16:62600001–62700000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 20
Genotype plot for the candidate region chr16:78360001–78460000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 21
Genotype plot for the candidate region chr18:22060001–22160000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 22
Genotype plot for the candidate region chr22:19040001–19140000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Tables

Table 1
Top ranking gene candidates corresponding to Neanderthal AI in Europeans.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.
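Merging of adjacent intervals can be sketched as below. The exact rule is an assumption (intervals on the same chromosome are merged when they overlap or abut), and the two constituent windows in the example are hypothetical; only the merged interval chr2:160160001–160280000 appears in the table.

```python
def merge_intervals(intervals):
    """Merge 1-based inclusive (start, end) intervals from one chromosome
    whenever they overlap or are directly adjacent."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            previous_start, previous_end = merged[-1]
            merged[-1] = (previous_start, max(previous_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical high-ranking windows on chromosome 2.
chr2_windows = [
    (160180001, 160280000),
    (160160001, 160260000),
    (109360001, 109460000),
]
chr2_merged = merge_intervals(chr2_windows)
```

Merging explains why some reported regions are longer than 100 kbp even though each window scored by the CNN is exactly 100 kbp.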

Chrom  Start      End        Genes
1      104500001  104600000
2      109360001  109460000  LIMS1; RANBP2; CCDC138; EDAR
2      160160001  160280000  TANC1; WDSUB1; BAZ2B
3      114480001  114620000  ZBTB20
4      54240001   54340000   SCFD2; FIP1L1; LNX1
5      39220001   39320000   FYB; C9; DAB2
6      28180001   28320000   ZSCAN16-AS1; ZSCAN16; ZKSCAN8; ZSCAN9; ZKSCAN4; NKAPL; PGBD1; ZSCAN31; ZKSCAN3; ZSCAN12; ZSCAN23
8      143440001  143560000  TSNARE1; BAI1
9      16700001   16820000   BNC2
12     85780001   85880000   ALX1
19     20220001   20380000   ZNF682; ZNF90; ZNF486
19     33580001   33740000   RHPN2; GPATCH1; WDR88; LRP3; SLC7A10
20     62100001   62280000   CHRNA4; KCNQ2; EEF1A2; PPDPF; PTK6; SRMS; C20orf195; HELZ2; GMEB2; STMN3; RTEL1; TNFRSF6B; ARFRP1; ZGPAT; LIME1; SLC2A4RG; ZBTB46
21     25840001   25940000

Table 2
Top ranking gene candidates corresponding to Denisovan AI in Melanesians.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom  Start      End        Genes
2      129960001  130060000
3      3740001    3840000    SUMF1; LRRN1
4      41980001   42080000   TMEM33; DCAF4L1; SLC30A9; BEND4
5      420001     520000     PDCD6; AHRR; C5orf55; EXOC3; CTD-2228K2.5; SLC9A3; CEP72
6      74640001   74740000
6      81960001   82060000
6      137920001  138120000  TNFAIP3
7      25100001   25200000   OSBPL3; CYCS; C7orf31; NPVF
7      38020001   38120000   EPDR1; NME8; SFRP4; STARD3NL
7      121160001  121260000
8      3040001    3140000    CSMD1
12     84640001   84740000
12     108240001  108340000  PRDM4; ASCL4
12     114020001  114280000  RBM19
14     61860001   61960000   PRKCH
14     63120001   63220000   KCNH5
14     96700001   96820000   BDKRB2; BDKRB1; ATG2B; GSKIP; AK7
15     55260001   55400000   RSL24D1; RAB27A
16     62600001   62700000
16     78360001   78460000   WWOX
18     22060001   22160000   OSBPL1A; IMPACT; HRH4
22     19040001   19140000   DGCR5; DGCR2; DGCR14; TSSK2; GSC2; SLC25A1; CLTCL1

Appendix 1—table 1
Top ranking gene candidates corresponding to Neanderthal AI in Europeans.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >5%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom  Start      End        Genes
1      39420001   39520000   RRAGC; MYCBP; GJA9; RHBDL2; AKIRIN1; NDUFS5; MACF1
2      159880001  160280000  TANC1; WDSUB1; BAZ2B
2      180060001  180160000  SESTD1
2      227800001  227900000  RHBDD1; COL4A4
2      238820001  238960000  LRRFIP1; RBM44; RAMP1; UBE2F; SCLY; ESPNL; KLHL30
3      114500001  114600000  ZBTB20
5      57960001   58060000   RAB3C
6      28160001   28380000   ZSCAN16-AS1; ZSCAN16; ZKSCAN8; ZSCAN9; ZKSCAN4; NKAPL; PGBD1; ZSCAN31; ZKSCAN3; ZSCAN12; ZSCAN23; GPX6
8      17060001   17160000   MICU3; ZDHHC2; CNOT7; VPS37A; MTMR7
8      91840001   91940000   TMEM64; NECAB1; TMEM55A
9      16700001   16860000   BNC2
10     11800001   11900000   ECHDC3; PROSER2; UPF2
11     37740001   37840000
19     20260001   20360000   ZNF90; ZNF486
19     33580001   33700000   RHPN2; GPATCH1; WDR88; LRP3; SLC7A10
20     14340001   14440000   MACROD2; FLRT3

Appendix 1—table 2
Top ranking gene candidates corresponding to Denisovan AI in Melanesians.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >5%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom  Start      End        Genes
1      2880001    2980000    ACTRT2; LINC00982; PRDM16
1      220080001  220180000  SLC30A10; EPRS; BPNT1; IARS2
2      221040001  221140000
3      15400001   15500000   SH3BP5; METTL6; EAF1; COLQ
4      41960001   42100000   TMEM33; DCAF4L1; SLC30A9; BEND4
5      135440001  135540000  TGFBI; SMAD5-AS1; SMAD5; TRPC7
6      81980001   82120000   FAM46A
7      121160001  121260000
9      95500001   95600000   IPPK; BICD2; ZNF484
10     59660001   59760000
12     80780001   80880000   OTOGL; PTPRQ
12     84620001   84740000
14     57620001   57760000   EXOC5; AP5M1; NAA30
17     29480001   29720000   NF1; OMG; EVI2B; EVI2A; RAB11FIP4
18     38180001   38320000
20     54340001   54440000

Appendix 2—table 1
Loss and accuracy for CNNs after training for three epochs, as reported by Keras/TensorFlow, for the training and validation datasets.

Binary cross-entropy was used for the loss function.
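For reference, the loss and accuracy reported in this table can be computed as in the sketch below. It is a plain-Python illustration with hypothetical function names, not the Keras/TensorFlow code used for training; the clipping constant and the 0.5 accuracy threshold are assumptions.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean of -[y*log(p) + (1 - y)*log(1 - p)] over all examples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def accuracy(labels, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    correct = sum((p > threshold) == bool(y) for y, p in zip(labels, probs))
    return correct / len(labels)


# Toy example: label 1 = AI, label 0 = not AI.
labels = [1, 0, 1, 0]
probs = [0.9, 0.1, 0.8, 0.6]
loss = binary_cross_entropy(labels, probs)
acc = accuracy(labels, probs)
```

The last example is misclassified with moderate confidence (0.6), so it contributes most of the loss while lowering accuracy by one quarter.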

Demographic model  Hyperparameters       Training loss  Training accuracy  Validation loss  Validation accuracy
A1                 AF>0.05               0.1592         0.9458             0.1618           0.9468
A1                 AF>0.25               0.1224         0.9585             0.1265           0.9578
A1                 AF>0.25; unphased     0.1347         0.9537             0.1368           0.9530
B                  AF>0.05; unphased     0.3415         0.8439             0.3441           0.8439
B                  AF>0.25; unphased     0.3546         0.8372             0.3583           0.8376

Appendix 3—table 1
Parameter values used for simulating Demographic Model A1.

A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

Parameter Description                       Value               Units               Source
N_Anc     ancestral pop. size               18500                                   Kuhlwilm et al., 2016
N_Nea     Neanderthal pop. size             3400                                    Kuhlwilm et al., 2016
N_YRI     YRI pop. size                     27600                                   Kuhlwilm et al., 2016
N_CEU0    CEU bottleneck pop. size          1080                                    Ragsdale and Gravel, 2019
N_CEU1    CEU growth-start pop. size        1450                                    Ragsdale and Gravel, 2019
N_CEU2    CEU current pop. size             13377
r_CEU     CEU growth rate                   0.00202                                 Ragsdale and Gravel, 2019
T_CEU2    CEU time at growth start          31.9                kya                 Ragsdale and Gravel, 2019
T_0       Nea/other split time              550                 kya                 Prüfer et al., 2017
T_1       CEU/YRI split time                65.7                kya                 Ragsdale and Gravel, 2019
T_2       time of Nea → CEU gene flow       55                  kya                 Prüfer et al., 2017
g         generation time                   29                  years               Prüfer et al., 2017
α         Nea → CEU admixture proportion    2.25                                    Prüfer et al., 2017
T_Altai   sampling time                     115                 kya                 Prüfer et al., 2017
T_Vindija sampling time                     55                  kya                 Prüfer et al., 2017
n_Nean    sample size                       2                   diploid individuals
n_Afr     sample size                       108                 diploid individuals
n_Eur     sample size                       99                  diploid individuals
s         selection coefficient             10^Unif(-4, -1)
T_sel1    selection onset (sweep)           Unif(1, T_1)        kya
T_mut1    mutation (sweep)                  Unif(T_sel1, T_1)   kya
T_sel2    selection onset (AI)              Unif(1, T_2)        kya
T_mut2    mutation (AI)                     Unif(T_2, T_0)      kya
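Two entries in this table lend themselves to a quick numerical check. The sketch below assumes exponential growth with the rate expressed per generation (the table does not state the growth model or the rate's units) and shows that the current CEU size follows from the growth-start size, growth rate, growth-start time and generation time. It also draws a selection coefficient as 10 raised to a uniform exponent. The variable and function names are hypothetical.

```python
import math
import random

# Values copied from the parameter table for Demographic Model A1.
N_CEU1 = 1450            # CEU population size when growth starts
R_CEU = 0.00202          # CEU growth rate (assumed to be per generation)
T_GROWTH_YEARS = 31_900  # time since growth start: 31.9 kya
GENERATION_YEARS = 29    # generation time in years

# 31.9 kya at 29 years per generation is 1100 generations of growth.
generations = T_GROWTH_YEARS / GENERATION_YEARS
n_ceu_now = N_CEU1 * math.exp(R_CEU * generations)


def sample_selection_coefficient(rng):
    """Draw s = 10**U with U ~ Unif(-4, -1), i.e. log-uniform on [1e-4, 1e-1]."""
    return 10 ** rng.uniform(-4, -1)


rng = random.Random(1)
s = sample_selection_coefficient(rng)
```

Under these assumptions `n_ceu_now` evaluates to roughly 13377, in agreement with the tabulated current CEU population size, which has no listed source and so is presumably derived in this way.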
