Deep learning models predict regulatory variants in pancreatic islets and refine type 2 diabetes association signals

Abstract

Genome-wide association analyses have uncovered multiple genomic regions associated with type 2 diabetes (T2D), but identification of the causal variants at these loci remains a challenge. There is growing interest in the potential of deep learning models, which predict epigenome features from DNA sequence, to support inference concerning the regulatory effects of disease-associated variants. Here, we evaluate the advantages of training convolutional neural network (CNN) models on a broad set of epigenomic features collected in a single disease-relevant tissue (pancreatic islets in the case of T2D), as opposed to models trained on multiple human tissues. We report convergence of CNN-based metrics of regulatory function with conventional approaches to variant prioritization: genetic fine-mapping and regulatory annotation enrichment. We demonstrate that CNN-based analyses can refine association signals at T2D-associated loci and provide experimental validation for one such signal. We anticipate that these approaches will become routine in downstream analyses of genome-wide association studies (GWAS).

Data availability

The datasets analysed during the current study are available in public repositories under the accessions listed in Supplementary Table 1.

The following previously published datasets were used:

(2018) Integration of human pancreatic islet genomic data refines regulatory mechanisms at Type 2 Diabetes susceptibility loci
European Genome-Phenome Archive, EGAS00001002592.

https://ega-archive.org/studies/EGAS00001002592
1. Ackermann AM
2. Wang Z
3. Schug J
4. Naji A
5. Kaestner KH
(2015) Integration of ATAC-seq and RNA-seq Identifies Human Alpha Cell and Beta Cell Signature Genes
NCBI Gene Expression Omnibus, GSE76268.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE76268
(2014) Pancreatic islet epigenomics reveals enhancer clusters that are enriched in Type 2 diabetes risk variants
EMBL-EBI ArrayExpress, E-MTAB-1919.

https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1919/
1. ENCODE DCC
(2011) Duke_DnaseSeq_PanIslets
NCBI Gene Expression Omnibus, GSM816660.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM816660
1. ENCODE DCC
(2012) UNC_FaireSeq_PanIslets
NCBI Gene Expression Omnibus, GSM864346.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM864346
1. ENCODE DCC
(2012) DNaseI/FAIRE/ChIP Synthesis from ENCODE/OpenChrom(Duke/UNC/UTA)
NCBI Gene Expression Omnibus, GSE40833.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE40833
1. Bhandare R
2. Schug J
3. Le Lay J
4. Fox A
5. Smirnova O
6. Liu C
7. Naji A
8. Kaestner KH
(2010) ChIP-Seq of human normal pancreatic islets with anti-histone antibodies to analyse histone modifications
EMBL-EBI ArrayExpress, E-MTAB-189.

https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-189/
1. Bernstein BE
2. Meissner A
(2010) BI Human Reference Epigenome Mapping Project: ChIP-Seq in human subject
NCBI Gene Expression Omnibus, GSE19465.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE19465
(2009) UCSF-UBC Human Reference Epigenome Mapping Project
NCBI Gene Expression Omnibus, GSE16368.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16368
1. Bramswig NC
2. Everett LJ
3. Schug J
4. Dorrell C
5. Liu C
6. Luo Y
7. Streeter PR
8. Naji A
9. Grompe M
10. Kaestner KH
(2013) Epigenomic plasticity enables human pancreatic alpha to beta cell reprogramming
NCBI Gene Expression Omnibus, GSE50386.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50386
1. Stitzel ML
2. Sethupathy P
3. Pearson DS
4. Chines PS
5. Song L
6. Erdos MR
7. Welch R
8. Parker SC
9. Boyle AP
10. Scott LJ
11. Margulies EH
12. Boehnke M
13. Furey TS
14. Crawford GE
15. Collins FS
(2010) Global epigenomic analysis of primary human pancreatic islets provides insights into type 2 diabetes susceptibility loci
NCBI Gene Expression Omnibus, GSE23784.

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE23784

Article and author information

Author details

Agata Wesolowska-Andersen

The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Competing interests
No competing interests declared.

ORCID iD: 0000-0001-8688-2814
Grace Zhuo Yu

Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom

Competing interests
No competing interests declared.
Vibe Nylander

Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, United Kingdom

Competing interests
No competing interests declared.
Fernando Abaitua

The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Competing interests
No competing interests declared.
Matthias Thurner

The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Competing interests
No competing interests declared.

ORCID iD: 0000-0001-7329-9769
Jason M Torres

The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Competing interests
No competing interests declared.

ORCID iD: 0000-0002-7537-7035
Anubha Mahajan

The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Competing interests
No competing interests declared.
Anna L Gloyn

The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Competing interests
No competing interests declared.

ORCID iD: 0000-0003-1205-1844
Mark I McCarthy

Department of Human Genetics, Genentech, South San Francisco, United States

For correspondence
mark.mccarthy@drl.ox.ac.uk

Competing interests
Mark I McCarthy, Senior editor, eLife; has served on advisory panels for Pfizer, NovoNordisk, Zoe Global; has received honoraria from Merck, Pfizer, NovoNordisk and Eli Lilly; has stock options in Zoe Global and has received research funding from Abbvie, Astra Zeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, NovoNordisk, Pfizer, Roche, Sanofi Aventis, Servier & Takeda. As of June 2019, is an employee of Genentech, and holds stock in Roche.

ORCID iD: 0000-0002-4393-0510

Funding

Wellcome (099673/Z/12/Z)

Matthias Thurner

Horizon 2020 Framework Programme (T2D Systems)

Anna L Gloyn

NIH Clinical Center (U01-DK105535)

Anna L Gloyn
Mark I McCarthy

NIH Clinical Center (U01-DK085545)

Anna L Gloyn

National Institute for Health Research (NF-SI-0617-10090)

Anna L Gloyn

Wellcome (090532)

Mark I McCarthy

Wellcome (106130)

Anna L Gloyn
Mark I McCarthy

Wellcome (098381)

Mark I McCarthy

Wellcome (203141)

Anna L Gloyn
Mark I McCarthy

Wellcome (212259)

Mark I McCarthy

Wellcome (095101)

Anna L Gloyn

Wellcome (200837)

Anna L Gloyn

Medical Research Council (MR/L020149/1)

Anna L Gloyn

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.