The AUPRC values were calculated from performance on a test set of 1000 bp sequences from chr2, held out from training and validation. The boxplots summarize performance across …
The AUROC values were calculated from performance on a test set of 1000 bp sequences from chr2, held out from training and validation. The boxplots summarize performance across …
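The two metrics named in these captions can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names and the toy labels and scores are hypothetical, AUROC is computed through the pairwise (Mann-Whitney) identity, and AUPRC is approximated by average precision, which is one common estimator among several.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive scored above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimated as the mean precision at the rank of each positive.

    Assumption: tied scores are broken by sort order, which is adequate
    for a sketch but not for a careful evaluation.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


# Hypothetical held-out labels (1 = feature present) and model scores.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auroc(toy_labels, toy_scores), average_precision(toy_labels, toy_scores))
```

AUPRC is sensitive to class imbalance in a way AUROC is not, which is one reason to report both for sparse epigenomic features.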
Boxplots summarize 100 individual CNN models differing in the size of convolutional filters of the first layer. Informative filters are filters with a standard deviation of filter …
(A) Distribution of CNN-predicted regulatory variants (q < 0.05) in the six broader CNN feature groups. (B) Enrichment of variants predicted to affect the CNN feature groups within variant list …
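The q < 0.05 threshold used in these captions can be illustrated with a short sketch. The captions do not state how the q-values were derived, so the Benjamini-Hochberg procedure below is an assumption chosen for illustration, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical per-variant p-values; not data from the study.
toy_pvalues = [0.001, 0.02, 0.03, 0.5]
toy_qvalues = benjamini_hochberg(toy_pvalues)
significant = [i for i, q in enumerate(toy_qvalues) if q < 0.05]
print(toy_qvalues, significant)
```

In this toy input the first three variants pass the q < 0.05 threshold and the fourth does not.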
Summary of CNN predictions for all variants from T2D GWAS credible sets.
(A) Regulatory variants (black) are enriched among variants with the highest genetic PPAs (gPPAs) over permuted background (blue). (B) Regulatory variants (black) are enriched among variants with …
(A) Comparison of -log10-transformed q-values from the islet CNN ensemble with functional significance scores generated by the omni-tissue DeepSEA model. (B) Comparison of -log10-transformed q-values …
Genetic PPAs (gPPAs) are shown in the top panels as blue points, functional PPAs (fPPAs) are shown in the middle panels as green points, and -log10-transformed q-values from CNN predictions are …
(A) Genetic PPA (gPPA), functional PPA (fPPA) and -log10(q-value) of the CNN islet regulatory predictions for both variants. (B) Allelic imbalance in open chromatin across four pancreatic islets …
Luciferase intensity values.
Sequence logos of representative CNN filters are shown. Transcription factor binding motif redundancy was removed by a Tomtom motif similarity search against other motifs detected by CNNs with q < 0.05…
Motif name/TF | Representative CNN filter logo | Motif logo | CNNs with filter match q < 0.05 | Similar TF motifs discovered |
---|---|---|---|---|
M6114_1.02 FOXA1 | ![]() | ![]() | 838 | M6234_1.02 FOXA3, M6241_1.02 FOXJ2, M4567_1.02 FOXA2… |
M4427_1.02 CTCF | ![]() | ![]() | 833 | M4612_1.02 CTCFL |
M1906_1.02 SP1 | ![]() | ![]() | 677 | M2314_1.02 SP2, M6482_1.02 SP3, M6535_1.02 WT1… |
M2296_1.02 MAFK | ![]() | ![]() | 629 | M4629_1.02 NFE2, M4572_1.02 MAFF, M4681_1.02 BACH2… |
M2292_1.02 JUND | ![]() | ![]() | 571 | M4623_1.02 JUNB, M2278_1.02 FOS, M4619_1.02 FOSL1… |
M1528_1.02 RFX6 | ![]() | ![]() | 556 | M4476_1.02 RFX5, M1529_1.02 RFX7, M5777_1.02 RFX4… |
M4640_1.02 ZBTB7A | ![]() | ![]() | 530 | M6539_1.02 ZBTB7B, M6552_1.02 ZNF148, M6422_1.02 PLAGL1… |
M1970_1.02 NFIC | ![]() | ![]() | 484 | M5664_1.02 NFIX, M5660_1.02 NFIA, M5662_1.02 NFIB |
M2277_1.02 FLI1 | ![]() | ![]() | 442 | M6222_1.02 ETV4, M2275_1.02 ELF1, M5398_1.02 ERF… |
M6281_1.02 HNF1A | ![]() | ![]() | 418 | M6282_1.02 HNF1B, M6546_1.02 ZFHX3 |
Supplementary Tables.
(STable 1) Summary of publicly available datasets used to train the CNN models of human pancreatic islet epigenomic features. Where indicated, the original raw data was reprocessed with the default settings of either the ATAC-seq/DNase-seq pipeline (available from: https://github.com/kundajelab/atac_dnase_pipelines) or the AQUAS TF and histone ChIP-seq pipeline (available from: https://github.com/kundajelab/chipseq_pipeline), using the human genome GRCh37 as reference.
(STable 2) Tested sets of CNN hyperparameters. The sets differ in the numbers and sizes of convolutional filters. Each set was used to train 100 convolutional neural networks, for a total of 1000 CNNs.
(STable 3) Full list of transcription factor binding motifs with a sequence match at <5% FDR to motifs activating convolutional filters from the first layers of the 1000-CNN ensemble. No motif redundancy removal was applied here.
(STable 4) CNN regulatory predictions at 28 T2D association signals fine-mapped to a single most likely causal variant with genetic PPA (gPPA) ≥ 0.80 or functional PPA (fPPA) ≥ 0.80.
(STable 5) CNN regulatory predictions at signals with at least two variants with functional PPAs (fPPAs) ≥ 0.2. These are the signals where incorporating CNN predictions downstream of fine-mapping can yield the largest benefits. The table lists all variants with fPPA ≥ 0.05 at these signals, together with their CNN q-value (lowest_Q), the corresponding top-scoring CNN feature, and the mean predicted score difference across the 1000 trained models.
(STable 6) Primer sequences used for cloning of the Prox1 enhancer. Prox1_enhancer_Forward (Reverse)_internal were designed for sequence validation. Restriction enzymes NheI and XhoI were used for all subsequent cloning. SDM = site-directed mutagenesis.