Detecting adaptive introgression in human evolution using convolutional neural networks
Figures

A schematic overview of how genomatnn detects adaptive introgression.
We first simulate a demographic history that includes introgression, such as Demographic Model A1 shown in (A), using the SLiM engine in stdpopsim. Parameter values for this model are given in Appendix 3—table 1. Three distinct scenarios are simulated for a given demographic model: neutral mutations only, a sweep in the recipient population, and adaptive introgression. The tree sequence file from each simulation is converted into a genotype matrix for input to the CNN. (B) shows a genotype matrix from an adaptive introgression simulation, where lighter pixels indicate a higher density of minor alleles, and haplotypes within each population are sorted left-to-right by similarity to the donor population (Nea). In this example, haplotype diversity is low in the recipient population (CEU), which closely resembles the donor (Nea). Thousands of simulations are produced for each simulation scenario, and their genotype matrices are used to train a binary-classification CNN (C). The CNN is trained to output Pr[AI], the probability that the input matrix corresponds to adaptive introgression. Finally, the trained CNN is applied to genotype matrices derived from a VCF/BCF file (D).
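The matrix construction described in (B) can be sketched roughly as follows. This is a minimal illustration, not the genomatnn implementation: the function names, the equal-width position binning, and the use of Euclidean distance to the donor's mean column as the similarity measure are all assumptions made for the example, and allele `1` is assumed to already mark the minor allele.

```python
import numpy as np


def bin_minor_alleles(haplotypes, positions, seq_len, n_bins):
    """Count minor alleles per haplotype within equal-width position bins.

    haplotypes: (n_sites, n_haplotypes) array of 0/1, where 1 is assumed
    to mark the minor allele. Returns an (n_bins, n_haplotypes) array.
    """
    haplotypes = np.asarray(haplotypes)
    positions = np.asarray(positions)
    # Map each site to a bin index; clamp so the last position stays in range.
    bins = np.minimum((positions * n_bins) // seq_len, n_bins - 1)
    out = np.zeros((n_bins, haplotypes.shape[1]), dtype=np.int64)
    # Unbuffered add, so several sites falling in the same bin accumulate.
    np.add.at(out, bins, haplotypes)
    return out


def sort_by_donor_similarity(recipient, donor):
    """Order recipient columns from most to least similar to the donor.

    Similarity is taken (as an assumption) to be Euclidean distance to the
    donor's per-row mean.
    """
    donor_mean = donor.mean(axis=1, keepdims=True)
    dist = np.linalg.norm(recipient - donor_mean, axis=0)
    return recipient[:, np.argsort(dist, kind="stable")]
```

Sorting within each population in this way would place donor-like haplotypes next to one another, which is the kind of structure the caption describes for the CEU block.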

Schematic overview of Demographic Model A1 and A2.
Schematic overview of Demographic Model A1 (A) and A2 (B). Each population is depicted as a tube, where the tube’s width is proportional to the population’s size at any given time. Horizontal lines with arrows indicate either an ancestor/descendant relation (thick solid lines, open arrow heads), an admixture pulse (dashed lines, closed arrow heads), or a period of continuous migration (thin solid lines, closed arrow heads). The time at which each continuous-migration line is drawn was chosen at random from the time interval over which the migration occurs. A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

Schematic overview of Demographic Model B.
Overview of the Jacobs et al., 2019 demographic model (A), featuring two pulses of Denisovan gene flow into Papuans, which we implemented as the PapuansOutOfAfrica_10J19 model in stdpopsim. The same model is shown in (B), zoomed in to show more clearly the many events occurring between generations 800 and 2300. Each population is depicted as a tube, where the tube’s width is proportional to the population’s size at any given time. Horizontal lines with arrows indicate either an ancestor/descendant relation (thick solid lines, open arrow heads), an admixture pulse (dashed lines, closed arrow heads), or a period of continuous migration (thin solid lines, closed arrow heads). The time at which each continuous-migration line is drawn was chosen at random from the time interval over which the migration occurs. DenA and NeaA are the sampled populations corresponding to Altai Denisovan and Altai Neanderthal, while Den1, Den2, and Nea1 correspond to introgressing lineages. A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

CNN performance on validation simulations for Demographic Model A.
The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25. (A) Confusion matrix. For the two prediction categories, either 'not AI' or AI, we show the proportion attributed to each of the true (simulated) scenarios. (B) Average CNN prediction for AI scenarios, binned by selection coefficient and time of onset of selection. (C) ROC curves, precision-recall curves and MCC-F1 curves. The positive condition is AI. The negative conditions are shown using different line styles/colours. The circles indicate the point in ROC-space (respectively Precision-Recall-space, and MCC-F1-space) when using the threshold Pr[AI]>0.5 for classifying a genotype matrix as AI. DFE: distribution of fitness effects. TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; MCC: Matthews correlation coefficient; F1: harmonic mean of precision and recall.

Performance evaluation for Demographic Model B.
CNN performance on validation simulations for Demographic Model B with unphased data. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%. (A) Confusion matrix. For the two prediction categories, either 'not AI' or AI, we show the proportion attributed to each of the true (simulated) scenarios. (B) Average CNN prediction for AI scenarios, binned by selection coefficient and time of onset of selection. (C) ROC curves, precision-recall curves and MCC-F1 curves. The positive condition is AI. The negative conditions are shown using different line styles/colours. The circles indicate the point in ROC-space (respectively Precision-Recall-space, and MCC-F1-space) when using the threshold Pr[AI]>0.5 for classifying a genotype matrix as AI. DFE: distribution of fitness effects. TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; MCC: Matthews correlation coefficient; F1: harmonic mean of precision and recall.

Comparison to other methods and performance evaluation with misspecified demographic models.
Unit-normalised Matthews correlation coefficient (MCC) versus F1 score (the harmonic mean of precision and recall). A value of 0.5 on the vertical axis corresponds to the performance of a random classifier. The point at coordinate (1, 1), marked with a black dot, corresponds to 100% true positives and 0% false negatives. Lines in MCC-F1 space were drawn by calculating the MCC and F1 values for 100 false-positive rates between 0 and 100, and the point closest to (1, 1) is indicated with the symbol shown in the legend. This point may not correspond to an acceptably low false-positive rate, but for the classifiers shown here it is indicative of the method’s overall performance. In all panels, condition positive is the AI simulation scenario, and the condition negative varies by panel column (indicated at top). The 'weakly misspecified' row used simulations of Model A1 as the training/null, and evaluated the method on simulations of Model A2. The 'strongly misspecified' row used simulations of Model A1 as the training/null, and evaluated the method on simulations of Model B.
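For reference, both axes of an MCC-F1 plot can be computed from confusion-matrix counts using the standard definitions. The sketch below is illustrative only; the function name is hypothetical and it does not reproduce the paper's plotting code.

```python
import math


def unit_mcc_and_f1(tp, fp, tn, fn):
    """Return (unit-normalised MCC, F1) from confusion-matrix counts.

    Assumes every marginal total is non-zero, so no denominator vanishes.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    # MCC lies in [-1, 1]; rescale to [0, 1] so that 0.5 means "random".
    return (mcc + 1) / 2, f1
```

A classifier with equal counts in all four cells gives a unit-normalised MCC of 0.5, matching the random-classifier level on the vertical axis, while a classifier with no false positives or false negatives lands at (1, 1).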

Saliency maps, showing the CNN’s attention across the input matrices for each simulated scenario, calculated for the CNN trained on Demographic Model A, filtered for beneficial allele frequency >0.25.
Each panel shows the average gradient over 300 input matrices encoding either neutral (top), sweep (middle), or AI (bottom) simulations. Pink/purple colours indicate larger gradients, where small changes in the genotype matrix have a relatively larger influence over the CNN’s prediction. Columns in the input matrix correspond to haplotypes from the populations labelled at the bottom.

Comparison of Manhattan plots using beta-calibrated output probabilities for different class ratios.
Each row indicates a single CNN, with equivalent data filtering. Each column indicates different class ratios used for calibration (Neutral:Sweep:AI). AF = Minimum beneficial allele frequency.

Reliability plots for Demographic Model A1 with AF > 5%.
Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model A1 with a minimum beneficial allele frequency of 5%. The variance-normalised sum of residuals is inset in the upper left corner of each of the reliability plots; for well-calibrated predictions this statistic is approximately normally distributed (Turner et al., 2019).
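One natural reading of a "variance-normalised sum of residuals" is sketched below. It assumes each label is a Bernoulli draw with the predicted probability, so that the sum of residuals has variance equal to the sum of p(1 − p). The exact form used for the figure is not given in the caption, so both the formula and the function name here are assumptions.

```python
import math


def residual_z(probs, labels):
    """Sum of residuals (label - prob) divided by its standard deviation.

    Under perfect calibration each label is Bernoulli(p), so the sum of
    residuals has mean 0 and variance sum(p * (1 - p)).
    Assumes at least one probability lies strictly between 0 and 1.
    """
    resid = sum(y - p for p, y in zip(probs, labels))
    var = sum(p * (1 - p) for p in probs)
    return resid / math.sqrt(var)
```

Values far from zero under this reading would suggest systematic over- or under-prediction.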

Reliability plots for Demographic Model A1 with AF > 25%.
Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model A1 with a minimum beneficial allele frequency of 25%. The variance-normalised sum of residuals is inset in the upper left corner of each of the reliability plots; for well-calibrated predictions this statistic is approximately normally distributed (Turner et al., 2019).

Reliability plots for Demographic Model B with AF > 5%.
Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model B with a minimum beneficial allele frequency of 5%. The variance-normalised sum of residuals is inset in the upper left corner of each of the reliability plots; for well-calibrated predictions this statistic is approximately normally distributed (Turner et al., 2019).

Reliability plots for Demographic Model B with AF > 25%.
Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model B with a minimum beneficial allele frequency of 25%. The variance-normalised sum of residuals is inset in the upper left corner of each of the reliability plots; for well-calibrated predictions this statistic is approximately normally distributed (Turner et al., 2019).

Application of the trained CNN to the Vindija and Altai Neanderthals, and the 1000 Genomes populations YRI and CEU.
The CNN was applied to overlapping 100 kbp windows, moving along the chromosome in steps of size 20 kbp. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.
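The windowing scheme can be sketched as a simple generator. The function name is hypothetical, and the 1-based inclusive coordinates are an assumption chosen to match the style of the candidate regions listed in this document (for example chr1:104500001–104600000).

```python
def sliding_windows(chrom_length, window=100_000, step=20_000):
    """Yield 1-based inclusive (start, end) windows along a chromosome.

    Only full-length windows are produced; a trailing partial window
    is skipped (an assumption of this sketch).
    """
    offset = 0
    while offset + window <= chrom_length:
        yield offset + 1, offset + window
        offset += step
```

With a 20 kbp step and 100 kbp windows, each position away from the chromosome ends is covered by five overlapping windows.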

Application of the trained CNN to the Altai Denisovan and Altai Neanderthal, the 1000 Genomes YRI population, and IGDP Melanesians.
The CNN was applied to overlapping 100 kbp windows, moving along the chromosome in steps of size 20 kbp. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Haplotype plot for the candidate region chr1:104500001–104600000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr2:109360001–109460000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr2:160160001–160280000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr3:114480001–114620000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr4:54240001–54340000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr5:39220001–39320000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr6:28180001–28320000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr8:143440001–143560000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr9:16700001–16820000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr12:85780001–85880000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr19:20220001–20380000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr19:33580001–33740000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr20:62100001–62280000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Haplotype plot for the candidate region chr21:25840001–25940000 in the Neanderthal-into-European AI scan.
Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Genotype plot for the candidate region chr2:129960001–130060000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr3:3740001–3840000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr4:41980001–42080000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr5:420001–520000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr6:74640001–74740000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr6:81960001–82060000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr6:137920001–138120000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr7:25100001–25200000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr7:38020001–38120000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr7:121160001–121260000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr8:3040001–3140000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr12:84640001–84740000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr12:108240001–108340000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr12:114020001–114280000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr14:61860001–61960000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr14:63120001–63220000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr14:96700001–96820000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr15:55260001–55400000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr16:62600001–62700000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr16:78360001–78460000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr18:22060001–22160000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Genotype plot for the candidate region chr22:19040001–19140000 in the Denisovan-into-Melanesian AI scan.
Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.
Tables
Top ranking gene candidates corresponding to Neanderthal AI in Europeans.
We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.
Chrom | Start | End | Genes |
---|---|---|---|
1 | 104500001 | 104600000 | |
2 | 109360001 | 109460000 | LIMS1; RANBP2; CCDC138; EDAR |
2 | 160160001 | 160280000 | TANC1; WDSUB1; BAZ2B |
3 | 114480001 | 114620000 | ZBTB20 |
4 | 54240001 | 54340000 | SCFD2; FIP1L1; LNX1 |
5 | 39220001 | 39320000 | FYB; C9; DAB2 |
6 | 28180001 | 28320000 | ZSCAN16-AS1; ZSCAN16; ZKSCAN8; ZSCAN9; ZKSCAN4; NKAPL; PGBD1; ZSCAN31; ZKSCAN3; ZSCAN12; ZSCAN23 |
8 | 143440001 | 143560000 | TSNARE1; BAI1 |
9 | 16700001 | 16820000 | BNC2 |
12 | 85780001 | 85880000 | ALX1 |
19 | 20220001 | 20380000 | ZNF682; ZNF90; ZNF486 |
19 | 33580001 | 33740000 | RHPN2; GPATCH1; WDR88; LRP3; SLC7A10 |
20 | 62100001 | 62280000 | CHRNA4; KCNQ2; EEF1A2; PPDPF; PTK6; SRMS; C20orf195; HELZ2; GMEB2; STMN3; RTEL1; TNFRSF6B; ARFRP1; ZGPAT; LIME1; SLC2A4RG; ZBTB46 |
21 | 25840001 | 25940000 | |
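Merging adjacent top-ranked windows into candidate regions, as the table caption describes, can be sketched as below. Treating "adjacent" as "overlapping or abutting on the same chromosome" is an assumption, as is the function name; chromosomes are given as integers so that sorting is numeric.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting (chrom, start, end) intervals.

    Coordinates are 1-based inclusive; the input may be in any order.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2] + 1:
            # Extend the previous interval rather than starting a new one.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]
```

For instance, two overlapping 100 kbp windows starting at 160160001 and 160180001 on chromosome 2 would merge into the single 120 kbp region 160160001–160280000 listed in the table.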
Top ranking gene candidates corresponding to Denisovan AI in Melanesians.
We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.
Chrom | Start | End | Genes |
---|---|---|---|
2 | 129960001 | 130060000 | |
3 | 3740001 | 3840000 | SUMF1; LRRN1 |
4 | 41980001 | 42080000 | TMEM33; DCAF4L1; SLC30A9; BEND4 |
5 | 420001 | 520000 | PDCD6; AHRR; C5orf55; EXOC3; CTD-2228K2.5; SLC9A3; CEP72 |
6 | 74640001 | 74740000 | |
6 | 81960001 | 82060000 | |
6 | 137920001 | 138120000 | TNFAIP3 |
7 | 25100001 | 25200000 | OSBPL3; CYCS; C7orf31; NPVF |
7 | 38020001 | 38120000 | EPDR1; NME8; SFRP4; STARD3NL |
7 | 121160001 | 121260000 | |
8 | 3040001 | 3140000 | CSMD1 |
12 | 84640001 | 84740000 | |
12 | 108240001 | 108340000 | PRDM4; ASCL4 |
12 | 114020001 | 114280000 | RBM19 |
14 | 61860001 | 61960000 | PRKCH |
14 | 63120001 | 63220000 | KCNH5 |
14 | 96700001 | 96820000 | BDKRB2; BDKRB1; ATG2B; GSKIP; AK7 |
15 | 55260001 | 55400000 | RSL24D1; RAB27A |
16 | 62600001 | 62700000 | |
16 | 78360001 | 78460000 | WWOX |
18 | 22060001 | 22160000 | OSBPL1A; IMPACT; HRH4 |
22 | 19040001 | 19140000 | DGCR5; DGCR2; DGCR14; TSSK2; GSC2; SLC25A1; CLTCL1 |
Top ranking gene candidates corresponding to Neanderthal AI in Europeans.
We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >5%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.
Chrom | Start | End | Genes |
---|---|---|---|
1 | 39420001 | 39520000 | RRAGC; MYCBP; GJA9; RHBDL2; AKIRIN1; NDUFS5; MACF1 |
2 | 159880001 | 160280000 | TANC1; WDSUB1; BAZ2B |
2 | 180060001 | 180160000 | SESTD1 |
2 | 227800001 | 227900000 | RHBDD1; COL4A4 |
2 | 238820001 | 238960000 | LRRFIP1; RBM44; RAMP1; UBE2F; SCLY; ESPNL; KLHL30 |
3 | 114500001 | 114600000 | ZBTB20 |
5 | 57960001 | 58060000 | RAB3C |
6 | 28160001 | 28380000 | ZSCAN16-AS1; ZSCAN16; ZKSCAN8; ZSCAN9; ZKSCAN4; NKAPL; PGBD1; ZSCAN31; ZKSCAN3; ZSCAN12; ZSCAN23; GPX6 |
8 | 17060001 | 17160000 | MICU3; ZDHHC2; CNOT7; VPS37A; MTMR7 |
8 | 91840001 | 91940000 | TMEM64; NECAB1; TMEM55A |
9 | 16700001 | 16860000 | BNC2 |
10 | 11800001 | 11900000 | ECHDC3; PROSER2; UPF2 |
11 | 37740001 | 37840000 | |
19 | 20260001 | 20360000 | ZNF90; ZNF486 |
19 | 33580001 | 33700000 | RHPN2; GPATCH1; WDR88; LRP3; SLC7A10 |
20 | 14340001 | 14440000 | MACROD2; FLRT3 |
Top ranking gene candidates corresponding to Denisovan AI in Melanesians.
We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >5%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.
Chrom | Start | End | Genes |
---|---|---|---|
1 | 2880001 | 2980000 | ACTRT2; LINC00982; PRDM16 |
1 | 220080001 | 220180000 | SLC30A10; EPRS; BPNT1; IARS2 |
2 | 221040001 | 221140000 | |
3 | 15400001 | 15500000 | SH3BP5; METTL6; EAF1; COLQ |
4 | 41960001 | 42100000 | TMEM33; DCAF4L1; SLC30A9; BEND4 |
5 | 135440001 | 135540000 | TGFBI; SMAD5-AS1; SMAD5; TRPC7 |
6 | 81980001 | 82120000 | FAM46A |
7 | 121160001 | 121260000 | |
9 | 95500001 | 95600000 | IPPK; BICD2; ZNF484 |
10 | 59660001 | 59760000 | |
12 | 80780001 | 80880000 | OTOGL; PTPRQ |
12 | 84620001 | 84740000 | |
14 | 57620001 | 57760000 | EXOC5; AP5M1; NAA30 |
17 | 29480001 | 29720000 | NF1; OMG; EVI2B; EVI2A; RAB11FIP4 |
18 | 38180001 | 38320000 | |
20 | 54340001 | 54440000 | |
Loss and accuracy for CNNs after training for three epochs, as reported by Keras/Tensorflow, for the training and validation datasets.
Binary cross-entropy was used for the loss function.
Demographic model | Hyperparameters | Training loss | Training accuracy | Validation loss | Validation accuracy |
---|---|---|---|---|---|
A1 | AF>0.05 | 0.1592 | 0.9458 | 0.1618 | 0.9468 |
A1 | AF>0.25 | 0.1224 | 0.9585 | 0.1265 | 0.9578 |
A1 | AF>0.25; unphased | 0.1347 | 0.9537 | 0.1368 | 0.9530 |
B | AF>0.05; unphased | 0.3415 | 0.8439 | 0.3441 | 0.8439 |
B | AF>0.25; unphased | 0.3546 | 0.8372 | 0.3583 | 0.8376 |
Parameter values used for simulating Demographic Model A1.
A Demes-format YAML file for each demographic model is available from the genomatnn git repository.
Description | Value | Units | Source |
---|---|---|---|
ancestral pop. size | 18500 | | Kuhlwilm et al., 2016 |
Neanderthal pop. size | 3400 | | Kuhlwilm et al., 2016 |
YRI pop. size | 27600 | | Kuhlwilm et al., 2016 |
CEU bottleneck pop. size | 1080 | | Ragsdale and Gravel, 2019 |
CEU growth-start pop. size | 1450 | | Ragsdale and Gravel, 2019 |
CEU current pop. size | 13377 | | |
CEU growth rate | 0.00202 | | Ragsdale and Gravel, 2019 |
CEU time at growth start | 31.9 | kya | Ragsdale and Gravel, 2019 |
Nea/other split time | 550 | kya | Prüfer et al., 2017 |
CEU/YRI split time | 65.7 | kya | Ragsdale and Gravel, 2019 |
time of Nea → CEU gene flow | 55 | kya | Prüfer et al., 2017 |
generation time | 29 | years | Prüfer et al., 2017 |
Nea → CEU admixture proportion | 2.25 | | Prüfer et al., 2017 |
sampling time | 115 | kya | Prüfer et al., 2017 |
sampling time | 55 | kya | Prüfer et al., 2017 |
sample size | 2 | diploid individuals | |
sample size | 108 | diploid individuals | |
sample size | 99 | diploid individuals | |
selection coefficient | | | |
selection onset (sweep) | Unif(1, ) | kya | |
mutation (sweep) | Unif(, ) | kya | |
selection onset (AI) | Unif(1, ) | kya | |
mutation (AI) | Unif(, ) | kya | |
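As a consistency check on the table, the CEU current population size (listed without a source) appears to follow from the other CEU entries if growth is exponential, N(t) = N0 · exp(r · t), with t measured in generations. The sketch below assumes that growth model and that the growth rate is per generation; neither is stated in the table, and the function names are hypothetical.

```python
import math

GENERATION_TIME_YEARS = 29  # generation time from the table


def kya_to_generations(kya, generation_time=GENERATION_TIME_YEARS):
    """Convert thousands of years ago into generations ago."""
    return kya * 1000 / generation_time


def exponential_size(n_start, rate, generations):
    """Population size after exponential growth at `rate` per generation."""
    return n_start * math.exp(rate * generations)


# Values taken from the table: growth starts 31.9 kya from 1450 individuals.
growth_generations = kya_to_generations(31.9)
ceu_now = exponential_size(1450, 0.00202, growth_generations)
```

Under these assumptions `ceu_now` comes out within one individual of the tabulated 13377.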