Figures and data in Detecting adaptive introgression in human evolution using convolutional neural networks

42 figures, 6 tables and 1 additional file

Figures

Figure 1 with 2 supplements

A schematic overview of how genomatnn detects adaptive introgression.

We first simulate a demographic history that includes introgression, such as Demographic Model A1 shown in (A), using the SLiM engine in stdpopsim. Parameter values for this model are given in Appendix 3—table 1. Three distinct scenarios are simulated for a given demographic model: neutral mutations only, a sweep in the recipient population, and adaptive introgression. The tree sequence file from each simulation is converted into a genotype matrix for input to the CNN. (B) shows a genotype matrix from an adaptive introgression simulation, where lighter pixels indicate a higher density of minor alleles, and haplotypes within each population are sorted left-to-right by similarity to the donor population (Nea). In this example, haplotype diversity is low in the recipient population (CEU), which closely resembles the donor (Nea). Thousands of simulations are produced for each simulation scenario, and their genotype matrices are used to train a binary-classification CNN (C). The CNN is trained to output Pr[AI], the probability that the input matrix corresponds to adaptive introgression. Finally, the trained CNN is applied to genotype matrices derived from a VCF/BCF file (D).
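
The haplotype sorting described for panel (B) can be sketched as follows. This is a minimal illustration and not the genomatnn implementation: the function names are hypothetical, similarity is assumed to be the Hamming distance to the donor's per-site majority allele, and the most similar haplotypes are assumed to be placed first.

```python
def donor_consensus(donor_haps):
    """Per-site majority allele (0/1) across donor haplotypes; ties go to 0."""
    n_sites = len(donor_haps[0])
    return [
        1 if 2 * sum(h[i] for h in donor_haps) > len(donor_haps) else 0
        for i in range(n_sites)
    ]


def sort_by_donor_similarity(pop_haps, donor_haps):
    """Order haplotypes by Hamming distance to the donor consensus (closest first)."""
    ref = donor_consensus(donor_haps)

    def distance(hap):
        return sum(a != b for a, b in zip(hap, ref))

    return sorted(pop_haps, key=distance)


# Toy data: each inner list is one haplotype, each entry one site (1 = minor allele).
donor = [[1, 1, 0, 1], [1, 1, 0, 0]]
recipient = [[0, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
ordered = sort_by_donor_similarity(recipient, donor)
```

With this toy input the donor-like haplotype `[1, 1, 0, 0]` comes first and the most divergent haplotype `[0, 0, 1, 0]` comes last.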

Figure 1—figure supplement 1

Schematic overview of Demographic Model A1 and A2.

Schematic overview of Demographic Model A1 (A) and A2 (B). Each population is depicted as a tube, where the tube’s width is proportional to the population’s size at any given time. Horizontal lines with arrows indicate either an ancestor/descendant relation (thick solid lines, open arrow heads), an admixture pulse (dashed lines, closed arrow heads), or a period of continuous migration (thin solid lines, closed arrow heads). The position of each continuous-migration line along the time axis was drawn at random from the interval over which the migration occurs. A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

Figure 1—figure supplement 2

Schematic overview of Demographic Model B.

Overview of the Jacobs et al., 2019 demographic model (A), featuring two pulses of Denisovan gene flow into Papuans, which we implemented as the PapuansOutOfAfrica_10J19 model in stdpopsim. The same model is shown in (B), zoomed in to show more clearly the many events occurring between generations 800 and 2300. Each population is depicted as a tube, where the tube’s width is proportional to the population’s size at any given time. Horizontal lines with arrows indicate either an ancestor/descendant relation (thick solid lines, open arrow heads), an admixture pulse (dashed lines, closed arrow heads), or a period of continuous migration (thin solid lines, closed arrow heads). The position of each continuous-migration line along the time axis was drawn at random from the interval over which the migration occurs. DenA and NeaA are the sampled populations corresponding to Altai Denisovan and Altai Neanderthal, while Den1, Den2, and Nea1 correspond to introgressing lineages. A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

Figure 2 with 2 supplements

CNN performance on validation simulations for Demographic Model A.

The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25. (A) Confusion matrix. For the two prediction categories, either 'not AI' or AI, we show the proportion attributed to each of the true (simulated) scenarios. (B) Average CNN prediction for AI scenarios, binned by selection coefficient, $s$, and time of onset of selection, $T_{sel}$. (C) ROC curves, precision-recall curves and MCC-F₁ curves. The positive condition is AI. The negative conditions are shown using different line styles/colours. The circles indicate the point in ROC-space (respectively Precision-Recall-space, and MCC-F₁-space) when using the threshold Pr[AI]>0.5 for classifying a genotype matrix as AI. DFE: distribution of fitness effects; TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; MCC: Matthews correlation coefficient; F₁: harmonic mean of precision and recall.
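
The circled operating point at the threshold Pr[AI]>0.5 can be illustrated with a small sketch. The function names and the toy probabilities below are hypothetical and are not taken from the paper or from genomatnn; the code only shows how TPR, FPR and precision follow from the TP/FP/TN/FN counts defined above.

```python
def confusion_counts(probs, labels, threshold=0.5):
    """Count TP, FP, TN, FN when calling AI for Pr[AI] > threshold."""
    tp = fp = tn = fn = 0
    for p, is_ai in zip(probs, labels):
        called_ai = p > threshold
        if called_ai and is_ai:
            tp += 1
        elif called_ai:
            fp += 1
        elif is_ai:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def operating_point(tp, fp, tn, fn):
    """Return TPR (recall), FPR and precision, using 0.0 when a ratio is undefined."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {"TPR": tpr, "FPR": fpr, "precision": precision}


# Toy CNN outputs: three AI simulations followed by three non-AI simulations.
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
counts = confusion_counts(probs, labels)
point = operating_point(*counts)
```

The pair (FPR, TPR) is one point in ROC-space and (TPR, precision) is one point in Precision-Recall-space; sweeping `threshold` traces out the full curves.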

Figure 2—figure supplement 1

Performance evaluation for Demographic Model B.

CNN performance on validation simulations for Demographic Model B with unphased data. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%. (A) Confusion matrix. For the two prediction categories, either 'not AI' or AI, we show the proportion attributed to each of the true (simulated) scenarios. (B) Average CNN prediction for AI scenarios, binned by selection coefficient, $s$, and time of onset of selection, $T_{sel}$. (C) ROC curves, precision-recall curves and MCC-F₁ curves. The positive condition is AI. The negative conditions are shown using different line styles/colours. The circles indicate the point in ROC-space (respectively Precision-Recall-space, and MCC-F₁-space) when using the threshold Pr[AI]>0.5 for classifying a genotype matrix as AI. DFE: distribution of fitness effects; TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; MCC: Matthews correlation coefficient; F₁: harmonic mean of precision and recall.

Figure 2—figure supplement 2

Comparison to other methods and performance evaluation with misspecified demographic models.

Unit-normalised Matthews correlation coefficient (MCC) versus F₁ score (the harmonic mean of precision and recall). A value of 0.5 on the vertical axis corresponds to the performance of a random classifier. The point at coordinate $(1, 1)$, marked with a black dot, corresponds to perfect classification, with no false positives and no false negatives. Lines in MCC-F₁ space were drawn by calculating the MCC and F₁ values for 100 false-positive rates between 0 and 100, and the point closest to $(1, 1)$ is indicated with the symbol shown in the legend. This point may not correspond to an acceptably low false-positive rate, but for the classifiers shown here it is indicative of the method’s overall performance. In all panels, condition positive is the AI simulation scenario, and the condition negative varies by panel column (indicated at top). The 'weakly misspecified' row used simulations of Model A1 as the training/null, and evaluated the method on simulations of Model A2. The 'strongly misspecified' row used simulations of Model A1 as the training/null, and evaluated the method on simulations of Model B.
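
A rough sketch of how a curve in MCC-F₁ space and its point closest to $(1, 1)$ could be computed is given below. It is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the unit normalisation is taken to be (MCC + 1) / 2 so that a random classifier sits at 0.5, and the sweep runs over 100 probability thresholds instead of the false-positive rates mentioned in the caption.

```python
import math


def f1_and_unit_mcc(tp, fp, tn, fn):
    """Return (F1, unit-normalised MCC) for one set of confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return f1, (mcc + 1) / 2


def closest_to_perfect(probs, labels, n_thresholds=100):
    """Sweep thresholds; return (distance, threshold, F1, unit MCC) nearest to (1, 1)."""
    best = None
    for k in range(n_thresholds):
        t = k / n_thresholds
        tp = sum(1 for p, y in zip(probs, labels) if p > t and y)
        fp = sum(1 for p, y in zip(probs, labels) if p > t and not y)
        fn = sum(1 for p, y in zip(probs, labels) if p <= t and y)
        tn = len(labels) - tp - fp - fn
        f1, unit_mcc = f1_and_unit_mcc(tp, fp, tn, fn)
        dist = math.hypot(1 - f1, 1 - unit_mcc)
        if best is None or dist < best[0]:
            best = (dist, t, f1, unit_mcc)
    return best


# Toy, perfectly separable scores: two AI and two non-AI examples.
best = closest_to_perfect([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

For the separable toy input the sweep reaches $(1, 1)$ exactly; real classifier outputs would give a curve whose nearest point lies at some positive distance.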

Figure 3

Saliency maps, showing the CNN’s attention across the input matrices for each simulated scenario, calculated for the CNN trained on Demographic Model A, filtered for beneficial allele frequency >0.25.

Each panel shows the average gradient over 300 input matrices encoding either neutral (top), sweep (middle), or AI (bottom) simulations. Pink/purple colours indicate larger gradients, where small changes in the genotype matrix have a relatively larger influence over the CNN’s prediction. Columns in the input matrix correspond to haplotypes from the populations labelled at the bottom.

Figure 4 with 4 supplements

Comparison of Manhattan plots using beta-calibrated output probabilities for different class ratios.

Each row indicates a single CNN, with equivalent data filtering. Each column indicates different class ratios used for calibration (Neutral:Sweep:AI). AF = Minimum beneficial allele frequency.
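
One way to build a calibration set with a chosen Neutral:Sweep:AI class ratio is to subsample each class. The sketch below only computes the per-class sample counts; the function name, the class labels and the available counts are hypothetical, and the resampling procedure actually used by the authors is not described in this section.

```python
def resample_counts(available, ratio):
    """Largest per-class counts that match `ratio` without exceeding `available`."""
    # The class that runs out first fixes the overall scale.
    scale = min(available[c] / ratio[c] for c in ratio if ratio[c] > 0)
    return {c: round(scale * ratio[c]) for c in ratio}


# Hypothetical validation-set sizes and the 1:0.1:0.02 ratio used for the scans.
available = {"neutral": 10000, "sweep": 10000, "AI": 10000}
ratio = {"neutral": 1, "sweep": 0.1, "AI": 0.02}
counts = resample_counts(available, ratio)
```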

Figure 4—figure supplement 1

Reliability plots for Demographic Model A1 with AF > 5%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model A1 with a minimum beneficial allele frequency of 5%. The variance-normalised sum of residuals ($Z$) is inset in the upper left corner of each reliability plot; for well-calibrated predictions, $Z$ is approximately normally distributed (Turner et al., 2019).
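
The quantities behind a reliability plot can be sketched as below. This is a hedged illustration: the function names and the ten equal-width bins are assumptions, and the statistic is written as the sum of residuals divided by the square root of the summed Bernoulli variances, which is one reading of "variance-normalised sum of residuals" and may differ in detail from the definition in Turner et al., 2019.

```python
import math


def reliability_bins(probs, labels, n_bins=10):
    """Per non-empty bin: (mean predicted probability, observed positive fraction, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b), len(b))
        for b in bins
        if b
    ]


def z_statistic(probs, labels):
    """Sum of residuals (label - probability) scaled by its expected standard deviation."""
    residual_sum = sum(y - p for p, y in zip(probs, labels))
    variance = sum(p * (1 - p) for p in probs)
    return residual_sum / math.sqrt(variance)


# Toy predictions of 0.5 that are right half of the time are perfectly calibrated.
probs = [0.5, 0.5, 0.5, 0.5]
calibrated = z_statistic(probs, [1, 0, 1, 0])
overcautious = z_statistic(probs, [1, 1, 1, 1])
```

In a reliability plot each bin contributes one point, with well-calibrated predictions lying near the diagonal where the observed fraction equals the mean predicted probability.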

Figure 4—figure supplement 2

Reliability plots for Demographic Model A1 with AF > 25%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model A1 with a minimum beneficial allele frequency of 25%. The variance-normalised sum of residuals ($Z$) is inset in the upper left corner of each reliability plot; for well-calibrated predictions, $Z$ is approximately normally distributed (Turner et al., 2019).

Figure 4—figure supplement 3

Reliability plots for Demographic Model B with AF > 5%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model B with a minimum beneficial allele frequency of 5%. The variance-normalised sum of residuals ($Z$) is inset in the upper left corner of each reliability plot; for well-calibrated predictions, $Z$ is approximately normally distributed (Turner et al., 2019).

Figure 4—figure supplement 4

Reliability plots for Demographic Model B with AF > 25%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model B with a minimum beneficial allele frequency of 25%. The variance-normalised sum of residuals ($Z$) is inset in the upper left corner of each reliability plot; for well-calibrated predictions, $Z$ is approximately normally distributed (Turner et al., 2019).

Figure 5

Application of the trained CNN to the Vindija and Altai Neanderthals, and 1000 Genomes populations YRI and CEU.

The CNN was applied to overlapping 100 kbp windows, moving along the chromosome in steps of size 20 kbp. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.
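
The window layout described here (100 kbp windows moved in 20 kbp steps) can be sketched with a short generator. The function name is hypothetical, coordinates are assumed to be 1-based and inclusive to match the region labels used elsewhere on this page, and dropping a partial window at the chromosome end is an assumption rather than documented behaviour.

```python
def sliding_windows(chrom_length, window=100_000, step=20_000):
    """Yield 1-based inclusive (start, end) windows that fit entirely on the chromosome."""
    start = 1
    while start + window - 1 <= chrom_length:
        yield start, start + window - 1
        start += step


# A toy 200 kbp chromosome gives six overlapping windows.
windows = list(sliding_windows(200_000))
```

With these settings consecutive windows share 80 kbp, so most positions are scored in five different windows.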

Figure 6

Application of the trained CNN to the Altai Denisovan and Altai Neanderthal, the 1000 Genomes YRI population, and IGDP Melanesians.

The CNN was applied to overlapping 100 kbp windows, moving along the chromosome in steps of size 20 kbp. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Appendix 4—figure 1

Haplotype plot for the candidate region chr1:104500001–104600000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 2

Haplotype plot for the candidate region chr2:109360001–109460000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 3

Haplotype plot for the candidate region chr2:160160001–160280000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 4

Haplotype plot for the candidate region chr3:114480001–114620000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 5

Haplotype plot for the candidate region chr4:54240001–54340000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 6

Haplotype plot for the candidate region chr5:39220001–39320000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 7

Haplotype plot for the candidate region chr6:28180001–28320000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 8

Haplotype plot for the candidate region chr8:143440001–143560000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 9

Haplotype plot for the candidate region chr9:16700001–16820000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 10

Haplotype plot for the candidate region chr12:85780001–85880000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 11

Haplotype plot for the candidate region chr19:20220001–20380000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 12

Haplotype plot for the candidate region chr19:33580001–33740000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 13

Haplotype plot for the candidate region chr20:62100001–62280000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 14

Haplotype plot for the candidate region chr21:25840001–25940000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 5—figure 1

Genotype plot for the candidate region chr2:129960001–130060000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 2

Genotype plot for the candidate region chr3:3740001–3840000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 3

Genotype plot for the candidate region chr4:41980001–42080000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 4

Genotype plot for the candidate region chr5:420001–520000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 5

Genotype plot for the candidate region chr6:74640001–74740000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 6

Genotype plot for the candidate region chr6:81960001–82060000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 7

Genotype plot for the candidate region chr6:137920001–138120000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 8

Genotype plot for the candidate region chr7:25100001–25200000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 9

Genotype plot for the candidate region chr7:38020001–38120000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 10

Genotype plot for the candidate region chr7:121160001–121260000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 11

Genotype plot for the candidate region chr8:3040001–3140000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 12

Genotype plot for the candidate region chr12:84640001–84740000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 13

Genotype plot for the candidate region chr12:108240001–108340000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 14

Genotype plot for the candidate region chr12:114020001–114280000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 15

Genotype plot for the candidate region chr14:61860001–61960000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 16

Genotype plot for the candidate region chr14:63120001–63220000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 17

Genotype plot for the candidate region chr14:96700001–96820000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 18

Genotype plot for the candidate region chr15:55260001–55400000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 19

Genotype plot for the candidate region chr16:62600001–62700000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 20

Genotype plot for the candidate region chr16:78360001–78460000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 21

Genotype plot for the candidate region chr18:22060001–22160000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 22

Genotype plot for the candidate region chr22:19040001–19140000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Tables

Table 1

Top ranking gene candidates corresponding to Neanderthal AI in Europeans.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom	Start	End	Genes
1	104500001	104600000
2	109360001	109460000	LIMS1; RANBP2; CCDC138; EDAR
2	160160001	160280000	TANC1; WDSUB1; BAZ2B
3	114480001	114620000	ZBTB20
4	54240001	54340000	SCFD2; FIP1L1; LNX1
5	39220001	39320000	FYB; C9; DAB2
6	28180001	28320000	ZSCAN16-AS1; ZSCAN16; ZKSCAN8; ZSCAN9; ZKSCAN4; NKAPL; PGBD1; ZSCAN31; ZKSCAN3; ZSCAN12; ZSCAN23
8	143440001	143560000	TSNARE1; BAI1
9	16700001	16820000	BNC2
12	85780001	85880000	ALX1
19	20220001	20380000	ZNF682; ZNF90; ZNF486
19	33580001	33740000	RHPN2; GPATCH1; WDR88; LRP3; SLC7A10
20	62100001	62280000	CHRNA4; KCNQ2; EEF1A2; PPDPF; PTK6; SRMS; C20orf195; HELZ2; GMEB2; STMN3; RTEL1; TNFRSF6B; ARFRP1; ZGPAT; LIME1; SLC2A4RG; ZBTB46
21	25840001	25940000
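
The merging of adjacent intervals mentioned in the table legend can be sketched as follows. The function name and the input windows are hypothetical (the individual top-ranked windows are not listed on this page), and "adjacent" is assumed to mean overlapping or directly abutting on the same chromosome.

```python
def merge_intervals(intervals):
    """Merge overlapping or abutting (chrom, start, end) intervals, 1-based inclusive."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2] + 1:
            last[2] = max(last[2], end)  # extend the previous interval
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


# Hypothetical high-ranking 100 kbp windows from a scan with 20 kbp steps.
top_windows = [
    (3, 114500001, 114600000),
    (3, 114480001, 114580000),
    (3, 114520001, 114620000),
    (9, 16700001, 16800000),
]
merged = merge_intervals(top_windows)
```

Three overlapping windows on chromosome 3 collapse into a single 140 kbp region, which is why some rows in the table span more than 100 kbp.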

Table 2

Top ranking gene candidates corresponding to Denisovan AI in Melanesians.

Chrom	Start	End	Genes
2	129960001	130060000
3	3740001	3840000	SUMF1; LRRN1
4	41980001	42080000	TMEM33; DCAF4L1; SLC30A9; BEND4
5	420001	520000	PDCD6; AHRR; C5orf55; EXOC3; CTD-2228K2.5; SLC9A3; CEP72
6	74640001	74740000
6	81960001	82060000
6	137920001	138120000	TNFAIP3
7	25100001	25200000	OSBPL3; CYCS; C7orf31; NPVF
7	38020001	38120000	EPDR1; NME8; SFRP4; STARD3NL
7	121160001	121260000
8	3040001	3140000	CSMD1
12	84640001	84740000
12	108240001	108340000	PRDM4; ASCL4
12	114020001	114280000	RBM19
14	61860001	61960000	PRKCH
14	63120001	63220000	KCNH5
14	96700001	96820000	BDKRB2; BDKRB1; ATG2B; GSKIP; AK7
15	55260001	55400000	RSL24D1; RAB27A
16	62600001	62700000
16	78360001	78460000	WWOX
18	22060001	22160000	OSBPL1A; IMPACT; HRH4
22	19040001	19140000	DGCR5; DGCR2; DGCR14; TSSK2; GSC2; SLC25A1; CLTCL1

Appendix 1—table 1

Top ranking gene candidates corresponding to Neanderthal AI in Europeans.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >5%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom	Start	End	Genes
1	39420001	39520000	RRAGC; MYCBP; GJA9; RHBDL2; AKIRIN1; NDUFS5; MACF1
2	159880001	160280000	TANC1; WDSUB1; BAZ2B
2	180060001	180160000	SESTD1
2	227800001	227900000	RHBDD1; COL4A4
2	238820001	238960000	LRRFIP1; RBM44; RAMP1; UBE2F; SCLY; ESPNL; KLHL30
3	114500001	114600000	ZBTB20
5	57960001	58060000	RAB3C
6	28160001	28380000	ZSCAN16-AS1; ZSCAN16; ZKSCAN8; ZSCAN9; ZKSCAN4; NKAPL; PGBD1; ZSCAN31; ZKSCAN3; ZSCAN12; ZSCAN23; GPX6
8	17060001	17160000	MICU3; ZDHHC2; CNOT7; VPS37A; MTMR7
8	91840001	91940000	TMEM64; NECAB1; TMEM55A
9	16700001	16860000	BNC2
10	11800001	11900000	ECHDC3; PROSER2; UPF2
11	37740001	37840000
19	20260001	20360000	ZNF90; ZNF486
19	33580001	33700000	RHPN2; GPATCH1; WDR88; LRP3; SLC7A10
20	14340001	14440000	MACROD2; FLRT3

Appendix 1—table 2

Top ranking gene candidates corresponding to Denisovan AI in Melanesians.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >5%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom	Start	End	Genes
1	2880001	2980000	ACTRT2; LINC00982; PRDM16
1	220080001	220180000	SLC30A10; EPRS; BPNT1; IARS2
2	221040001	221140000
3	15400001	15500000	SH3BP5; METTL6; EAF1; COLQ
4	41960001	42100000	TMEM33; DCAF4L1; SLC30A9; BEND4
5	135440001	135540000	TGFBI; SMAD5-AS1; SMAD5; TRPC7
6	81980001	82120000	FAM46A
7	121160001	121260000
9	95500001	95600000	IPPK; BICD2; ZNF484
10	59660001	59760000
12	80780001	80880000	OTOGL; PTPRQ
12	84620001	84740000
14	57620001	57760000	EXOC5; AP5M1; NAA30
17	29480001	29720000	NF1; OMG; EVI2B; EVI2A; RAB11FIP4
18	38180001	38320000
20	54340001	54440000

Appendix 2—table 1

Loss and accuracy for CNNs after training for three epochs, as reported by Keras/TensorFlow, for the training and validation datasets.

Binary cross-entropy was used for the loss function.

Demographic model	Hyperparameters	Training loss	Training accuracy	Validation loss	Validation accuracy
A1	AF>0.05	0.1592	0.9458	0.1618	0.9468
A1	AF>0.25	0.1224	0.9585	0.1265	0.9578
A1	AF>0.25; unphased	0.1347	0.9537	0.1368	0.9530
B	AF>0.05; unphased	0.3415	0.8439	0.3441	0.8439
B	AF>0.25; unphased	0.3546	0.8372	0.3583	0.8376
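
For readers unfamiliar with the two reported quantities, the following sketch computes mean binary cross-entropy and accuracy for a set of predictions. It is a plain-Python illustration, not the Keras/TensorFlow code path: the function names, the clipping constant `eps` and the 0.5 decision threshold are assumptions made here.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)], with p clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def accuracy(probs, labels, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    correct = sum((p > threshold) == bool(y) for p, y in zip(probs, labels))
    return correct / len(probs)


# Two confident, correct toy predictions.
loss = binary_cross_entropy([0.9, 0.1], [1, 0])
acc = accuracy([0.9, 0.1], [1, 0])
```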

Appendix 3—table 1

Parameter values used for simulating Demographic Model A1.

A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

Parameter	Description	Value	Units	Source
$N_{Anc}$	ancestral pop. size	18500		Kuhlwilm et al., 2016
$N_{Nea}$	Neanderthal pop. size	3400		Kuhlwilm et al., 2016
$N_{YRI}$	YRI pop. size	27600		Kuhlwilm et al., 2016
$N_{CEU0}$	CEU bottleneck pop. size	1080		Ragsdale and Gravel, 2019
$N_{CEU1}$	CEU growth-start pop. size	1450		Ragsdale and Gravel, 2019
$N_{CEU2}$	CEU current pop. size	13377
$r_{CEU}$	CEU growth rate	0.00202		Ragsdale and Gravel, 2019
$T_{CEU2}$	CEU time at growth start	31.9	kya	Ragsdale and Gravel, 2019
$T_{0}$	Nea/other split time	550	kya	Prüfer et al., 2017
$T_{1}$	CEU/YRI split time	65.7	kya	Ragsdale and Gravel, 2019
$T_{2}$	time of Nea → CEU gene flow	55	kya	Prüfer et al., 2017
$g$	generation time	29	years	Prüfer et al., 2017
$α$	Nea → CEU admixture proportion	2.25		Prüfer et al., 2017
$T_{Altai}$	sampling time	115	kya	Prüfer et al., 2017
$T_{Vindija}$	sampling time	55	kya	Prüfer et al., 2017
$n_{Nean}$	sample size	2	diploid individuals
$n_{Afr}$	sample size	108	diploid individuals
$n_{Eur}$	sample size	99	diploid individuals
$s$	selection coefficient	$10^{Unif(-4,-1)}$
$T_{sel1}$	selection onset (sweep)	Unif(1, $T_{1}$)	kya
$T_{mut1}$	mutation (sweep)	Unif($T_{sel1}$, $T_{1}$)	kya
$T_{sel2}$	selection onset (AI)	Unif(1, $T_{2}$)	kya
$T_{mut2}$	mutation (AI)	Unif($T_{2}$, $T_{0}$)	kya
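
The random draws in the last five rows of the table can be sketched as below. The function names are hypothetical and the snippet is not taken from genomatnn or stdpopsim; it only restates the table's distributions, with times in kya and a conversion to generations using the generation time $g$ of 29 years.

```python
import random

GENERATION_TIME = 29  # years per generation (g)
T0 = 550.0            # Nea/other split time, kya
T1 = 65.7             # CEU/YRI split time, kya
T2 = 55.0             # time of Nea -> CEU gene flow, kya


def kya_to_generations(kya, g=GENERATION_TIME):
    """Convert thousands of years ago to generations ago."""
    return kya * 1000 / g


def draw_selection_coefficient(rng):
    """s = 10^Unif(-4, -1), i.e. log-uniform between 0.0001 and 0.1."""
    return 10 ** rng.uniform(-4, -1)


def draw_sweep_parameters(rng):
    """Sweep scenario: selection starts after the CEU/YRI split, mutation arises earlier."""
    t_sel = rng.uniform(1, T1)
    t_mut = rng.uniform(t_sel, T1)
    return {"s": draw_selection_coefficient(rng), "T_sel": t_sel, "T_mut": t_mut}


def draw_ai_parameters(rng):
    """AI scenario: mutation arises before the gene flow, selection starts after it."""
    t_sel = rng.uniform(1, T2)
    t_mut = rng.uniform(T2, T0)
    return {"s": draw_selection_coefficient(rng), "T_sel": t_sel, "T_mut": t_mut}


rng = random.Random(42)  # fixed seed so the example is repeatable
example = draw_ai_parameters(rng)
gene_flow_generations = kya_to_generations(T2)
```

Under these values the Nea → CEU gene flow at 55 kya falls roughly 1900 generations before the present.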

Additional files

Transparent reporting form: https://cdn.elifesciences.org/articles/64669/elife-64669-transrepform-v2.pdf

Graham Gower, Pablo Iáñez Picazo, Matteo Fumagalli, Fernando Racimo (2021). Detecting adaptive introgression in human evolution using convolutional neural networks. eLife 10:e64669. https://doi.org/10.7554/eLife.64669
