1. Genomics and Evolutionary Biology
  2. Human Biology and Medicine
Download icon

Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures

  1. Xinchen Wang
  2. Nathan R Tucker
  3. Gizem Rizki
  4. Robert Mills
  5. Peter HL Krijger
  6. Elzo de Wit
  7. Vidya Subramanian
  8. Eric Bartell
  9. Xinh-Xinh Nguyen
  10. Jiangchuan Ye
  11. Jordan Leyton-Mange
  12. Elena V Dolmatova
  13. Pim van der Harst
  14. Wouter de Laat
  15. Patrick T Ellinor
  16. Christopher Newton-Cheh
  17. David J Milan Is a corresponding author
  18. Manolis Kellis Is a corresponding author
  19. Laurie A Boyer Is a corresponding author
  1. Massachusetts Institute of Technology, United States
  2. Broad Institute of MIT and Harvard, United States
  3. Massachusetts General Hospital, United States
  4. Hubrecht Institute-KNAW, University Medical Center Utrecht, Netherlands
  5. University Medical Center Utrecht, Netherlands
  6. University of Groningen, Netherlands
  7. Center for Human Genetic Research, Massachusetts General Hospital, United States
Research Article
Cited
3
Views
3,431
Comments
0
Cite as: eLife 2016;5:e10557 doi: 10.7554/eLife.10557

Abstract

Genetic variants identified by genome-wide association studies explain only a modest proportion of heritability, suggesting that meaningful associations lie 'hidden' below current thresholds. Here, we integrate information from association studies with epigenomic maps to demonstrate that enhancers significantly overlap known loci associated with the cardiac QT interval and QRS duration. We apply functional criteria to identify loci associated with QT interval that do not meet genome-wide significance and are missed by existing studies. We demonstrate that these 'sub-threshold' signals represent novel loci, and that epigenomic maps are effective at discriminating true biological signals from noise. We experimentally validate the molecular, gene-regulatory, cellular and organismal phenotypes of these sub-threshold loci, demonstrating that most sub-threshold loci have regulatory consequences and that genetic perturbation of nearby genes causes cardiac phenotypes in mouse. Our work provides a general approach for improving the detection of novel loci associated with complex human traits.

https://doi.org/10.7554/eLife.10557.001

eLife digest

Most complex traits are governed by a large number of genetic contributors, each playing only a modest effect. This makes it difficult to identify the genetic variants that increase disease risk, hindering the discovery of new drug targets and the development of new therapeutics.

To overcome this limitation in discovery power, the field of human genetics has traditionally sought increasingly large groups, or cohorts, of afflicted and non-afflicted individuals. Studies of large cohorts are a powerful approach for discovering new disease genes, but such groups are often impractical and sometimes impossible to obtain. However, it has become possible to complement the genetic evidence found in disease association studies with biological evidence of the effects of disease-associated genetic variants.

Wang et al. focus specifically on genetic sites, or loci, that do not affect protein sequence but instead affect the non-coding control regions. These are known as enhancer elements, as they can enhance the expression of nearby genes. These loci constitute the majority of disease regions, and thus are extremely important, but their discovery has been hindered by our relatively poor understanding of the human genome.

Chemical modifications known as epigenomic marks are indicative of enhancer regions. By studying the factors that affect heart rhythm, Wang et al. show that specific combinations of epigenomic marks are enriched in known trait-associated regions. This knowledge was then used to prioritize the further investigation of genetic regions that genome-wide association studies had only weakly linked to heart rhythm alterations. Wang et al. directly confirmed that genetic differences in “sub-threshold” regions indeed alter the activity of these regulatory regions in human heart cells. Furthermore, mutating or perturbing the predicted target genes of the sub-threshold enhancers caused heart defects in mouse and zebrafish.

Wang et al. have demonstrated that epigenome maps can help to distinguish which sub-threshold regions from genome-wide association studies are more likely to contribute to a disease. This allows for the discovery of new disease genes with much smaller cohorts than would be needed otherwise, thus speeding up the development of new therapeutics by many years.

https://doi.org/10.7554/eLife.10557.002

Introduction

Genome-wide association studies (GWAS) hold the promise of identifying genetic loci that drive complex disease, however realizing this goal has been challenging due to the modest effect sizes of most common variants that require extremely large cohorts to detect with significance. The recent demonstration that disease-associated single nucleotide polymorphisms (SNPs) reside preferentially in enhancer elements provides a unique opportunity to leverage epigenomic maps of regulatory elements for understanding the function of known GWAS loci and for prioritizing new loci missed in current studies (Ernst et al., 2011; Cowper-Sal·lari et al., 2012; Maurano et al., 2012; Trynka et al., 2013). Despite increasingly large GWAS cohort sizes, the current catalog of genome-wide significant loci still explains only a modest proportion of the heritability for any given trait, with an excess of low p-value loci still below the genome-wide significance threshold (Arking et al., 2014). These observations suggest that many more signals with 'sub-threshold' significance remain to be identified, however, the recognition of biologically relevant sub-threshold loci is hindered by a higher false positive rate (Hindorff et al., 2009; Maher, 2008; Altshuler et al., 2008). Thus, new computational approaches that integrate genetic data with genome-wide epigenomic profiles are needed to use existing cohorts to discover new loci and genes that influence complex traits and diseases.

Here, we use epigenomic maps of 127 tissues from the Roadmap Epigenomics Project as a guide to systematically identify biologically relevant sub-threshold variants (Roadmap Epigenomics Consortium, 2015). As proof of concept, we focused on two cardiac traits with clinical significance: electrocardiographic QT interval reflecting myocardial repolarization and QRS duration reflecting cardiac conduction. These two traits have a clear tissue of origin and published GWASs have reported over a hundred QT/QRS loci, making these traits ideal for testing variants with sub-threshold significance (Supplementary file 1) (Hindorff et al., 2009; Maher, 2008; Altshuler et al., 2008). In particular, variation within QT interval length plays an important role in human disease, where extreme QT prolongation is associated with sudden cardiac death and can occur as an unintended side effect of many non-cardiac medications (Rabkin et al., 1982; Heist and Ruskin, 2010). We combine genome-wide maps of cardiac enhancer activity with the results from a large study of QT interval duration to identify dozens of novel QT loci with sub-threshold statistical significance. We provide multiple lines of evidence to show that these sub-threshold loci can alter enhancer activity, and we implicate specific genes through which these loci act to influence QT interval length. Importantly, we demonstrate that epigenetic signals can distinguish true biological signals from noise, thus bypassing the higher false positive rate that has previously hindered study of sub-threshold loci. We expect our work will uncover new genes involved in cardiac electrophysiology, aid in the identification of patients at risk for sudden cardiac death, and enable development of new treatments for susceptible individuals. More broadly, our work demonstrates the power of integrating epigenomics with existing GWAS to discover sub-threshold genetic loci and novel genes associated with complex human disease.

Results

QT/QRS-associated variants are enriched in cardiac enhancers

We compiled a list of 112 QT/QRS loci from the NHGRI GWAS database (accessed July 2013, Supplementary file 1) and identified SNPs in strong linkage disequilibrium (r2>0.8) using genotype data from the 1000 Genomes Project (Phase 1, CEU population) (1000 Genomes Project Consortium, 2010). We also collected GWAS loci from a later meta-analysis of QT interval studies, published in June 2014 by Arking et al., which we held out from the aforementioned 112 QT/QRS loci as a validation dataset for subsequent analyses (Arking et al., 2014). Because only 22 of 112 loci (20%) harbor SNPs that overlap exons, we examined whether QT/QRS variants are enriched in predicted enhancer elements across the genome using chromatin maps across 127 tissues generated by the Roadmap Epigenomics Project including adult left ventricle (LV), adult right ventricle (RV), fetal heart (FH) and adult right atrium (RA) (Roadmap Epigenomics Consortium, 2015). QT/QRS variants have greatest overlap with predicted enhancers (as defined by high levels of H3K4me1 and low H3K4me3 using ChromHMM) from the four cardiac tissues compared to the other 123 non-cardiac tissues (red circles, Figure 1b, Supplementary file 1) (Ernst et al., 2011). This enrichment persists over randomly sampled sets of control loci with matched genetic properties such as minor allele frequency, number of SNPs in LD, distance to nearest gene, number of nearby genes, and presence on an Affymetrix 660W genotyping array (Figure 1a, Materials and methods). Enhancers from the LV showed the strongest enrichment of the four cardiac tissues (z-scores=7.67, empirical p<1x10−5, 105 permutations), demonstrating that an unbiased analysis can resolve the causal tissue with high precision, as QT interval and QRS duration are primarily reflective of myocardial repolarization in the ventricles.

Figure 1 with 3 supplements see all
GWAS repolarization loci preferentially overlap cardiac enhancers.

(a) Enrichment of human left ventricle enhancers in 112 QT/QRS loci. The number of loci that contain a SNP overlapping an enhancer are computed for the 112 QT/QRS loci, and compared against 100,000 permutations of randomly sampled control loci matched for LD block size (number of SNPs), MAF, distance to nearest gene, number of nearby genes, and presence on genotyping array. (b) Top, Enrichment of enhancers from 127 human tissues in QT/QRS loci. Bottom, Enrichment of enhancers from non-cardiac tissues for QT/QRS loci is substantially weaker following removal of enhancers active in any of the four cardiac tissues. (c) Top, QT/QRS SNPs are more likely to disrupt motifs corresponding to expressed TFs compared to 100,000 sets of matched control loci. Bottom, Weaker enrichment was observed between repolarization and matched control loci when the sequence of the TF motif was randomly shuffled and re-mapped to the genome (10,000 permutations).

https://doi.org/10.7554/eLife.10557.003

Focusing on the left ventricle, we analyzed the enrichment of both coding annotations using GENCODE and non-coding annotations using individual chromatin marks and chromatin states defined by ChromHMM as well as DNase I hypersensitivity (DHS) maps available in heart tissue (Ernst et al., 2011; Harrow et al., 2012; Thurman et al., 2012). We observed that intergenic enhancers are the most strongly enriched annotated genomic region (z-score > 7.5) in QT/QRS loci, followed by gene transcription regions (z-score between 3 and 6) (Figure 1—figure supplements 1 and 2). This enrichment increased significantly (z-score from 7.67 to 9.31 for left ventricle) when restricting the analysis to 'strong' enhancers (i.e. H3K4me1 enhancers that are also marked by H3K27ac). Our results indicate that predicted enhancers are highly informative for annotating trait-associated variants compared to other classes of genomic regions.

Enhancers overlapping QT/QRS loci differ in functional characteristics from all enhancers.

Several functional characteristics were compared between enhancers overlapping QT/QRS loci (red) and non-GWAS left ventricle enhancers (blue). Fold change represents fold change between median values for the two groups, and p-values were calculated using the Mann-Whitney U test. See Materials and methods for comparison methodology between GWAS QT/QRS enhancers and non-GWAS enhancers for each functional or epigenomic feature. For primate conservation, LV enhancers (blue) were size-matched (+/-1 kb) to GWAS enhancers to control for skewed enrichments driven by larger GWAS enhancer size.

https://doi.org/10.7554/eLife.10557.007

We next asked whether LV enhancers that overlap QT/QRS loci have features that distinguish them from putative LV enhancers identified by ChromHMM that do not overlap QT/QRS loci (Figure 2). First, we considered the density of H3K27ac marks, as the co-enrichment of H3K4me1 and H3K27ac correlates with strong enhancer activity (Creyghton et al., 2010; Rada-Iglesias et al., 2011). We found that the 65 enhancers overlapping 45 QT/QRS loci have a 3.1-fold higher H3K27ac density compared to non-GWAS LV enhancers (p=1.53x10--4). In fact, incorporating H3K27ac into ChromHMM enhancer predictions resulted in substantially greater enrichment of QT/QRS loci (z-score = 9.31 vs. 7.67 for left ventricle); 44 of the 45 QT/QRS loci overlap an H3K27ac-defined 'strong' enhancer. QT/QRS LV enhancers are also more likely to be marked by either H3K4me1 or H3K27ac in at least one of the other three heart tissues (fetal, right atrium, right ventricle) compared to non-GWAS LV enhancers (p-values between 0.004 and 0.04, Figure 2) and less likely to be active in non-cardiac tissues (p=9x10-3, Figure 2).

Left ventricular QT/QRS enhancers are significantly more hypomethylated than predicted LV enhancers not overlapping QT/QRS loci (hypomethylation p=1.07x10-6, hypermethylation p=0.60, Figure 2). Similar to H3K27ac, CpG hypomethylation correlates with increased enhancer activity, possibly through modulation of TF binding site accessibility (Hon et al., 2013; Stadler et al., 2011). Consistent with this idea, 22 of the 45 GWAS loci contain an enhancer SNP that alters a predicted motif for a cardiac-expressed TF (empirical p=0.002, 105 permutations) (Figure 1c). Moreover, QT/QRS GWAS enhancers are enriched for DHS and Cap Analysis Gene Expression (CAGE) signals in human fetal heart, both of which are marks of greater enhancer activity (Figure 2) (Thurman et al., 2012; Andersson et al., 2014). Finally, QT/QRS left ventricular enhancers show significant evolutionary conservation across the primate lineage compared to non-GWAS LV enhancers (p=6.82x10-5 compared to 105 size-matched sets of LV enhancers), suggesting that perturbation of these enhancers is under stronger negative selection. Taken together, QT/QRS loci preferentially overlap conserved enhancers that show cardiac-restricted activity, suggesting that common variants associated with these loci play roles in regulating cardiac functions that drive human phenotypes.

Common features in GWAS cardiac enhancers identify novel sub-threshold loci

Current GWAS loci collectively explain only a small fraction of the estimated heritability of a complex trait in part due to strict Bonferroni thresholds for multiple hypothesis testing (p<5x10-8) and the limited statistical power of existing studies to discover variants with modest effect sizes (Maher, 2008; Yang et al., 2011). We hypothesized that knowledge of the genomic properties associated with existing GWAS loci can guide the search for additional genetic signals that cannot be detected without increasing GWAS cohort sizes, and that these loci with weaker 'sub-threshold' p-values (i.e. 0.05>p>5x10-8) might reveal novel genes and biological pathways that contribute to complex disease. To test this idea, we used SNP summary statistics from the Arking et al. (2014) QT interval GWAS study we had earlier held out as a validation dataset (Arking et al., 2014). These summary statistics include the 112 QT/QRS loci identified by prior GWASs (red dots, bottom, Figure 3), as well as loci that reach genome-wide significance in the larger meta-analysis cohort but were not discovered in any previous GWAS (and therefore were not included in the 112 QT/QRS loci used for enrichment analyses above, gold dots, bottom, Figure 3). We observed that active LV enhancers are strongly enriched for loci harboring SNPs with p-values between 1x10-4 and 5x10-8 (Figure 3a, black line). Furthermore, the combination of functional features identified for above-threshold QT/QRS enhancers (Figure 2) substantially improves sub-threshold locus enrichment across a wide range of p-value thresholds (Figure 3a, colored lines, Figure 3—figure supplement 1).

Figure 3 with 4 supplements see all
Cardiac enhancers harbor additional sub-threshold QT loci.

(a) Top, Enhancer characteristics learned on above-threshold QT/QRS loci from Figure 2 are predictive for additional sub-threshold loci (colored lines). Each point on a curve represents the fold difference in proportion of SNPs with p-value below the cutoff in the enhancer set versus the whole genome. Bottom, Manhattan plot of p-values for all SNPs from Arking et al. (2014) QT interval GWAS. 112 QT/QRS loci and all SNPs within 1 Mb are highlighted in red. Genome-wide significant loci newly discovered by Arking et al. and not in the 112 QT/QRS loci are highlighted in yellow. (b) Top, Enrichment signals for sub-threshold SNPs in left ventricle enhancers persists following removal of the 112 GWAS loci and nearby SNPs (+/- 1 Mb). Bottom, Manhattan plot of p-values for all SNPs from Arking et al. (2014) QT interval GWAS following removal of 112 QT/QRS loci and all SNPs within 1 Mb. Genome-wide significant loci newly discovered by Arking et al. and not in the 112 QT/QRS loci are highlighted in yellow.

https://doi.org/10.7554/eLife.10557.008

Whether the enrichment of SNPs in the sub-threshold significance range represents linkage disequilibrium with existing above-threshold GWAS SNPs or novel biologically relevant loci remains an unresolved question (Maurano et al., 2012). In fact, an enrichment analysis using only SNPs nearby above-threshold GWAS loci produced a strong enrichment signature in the sub-threshold significance range (Figure 3—figure supplement 2). To distinguish between the two possibilities, we took a conservative approach and removed all SNPs within 1Mb of the initial 112 QT/QRS loci. Remarkably, the enrichment for LV enhancers persists and increases in the sub-threshold range (i.e. p=1x10-4 to 5x10-8, Figure 3b), likely due to removal of nominally significant SNPs that are in LD with above-threshold QT/QRS loci and do not represent true association signals. The enrichment for active LV enhancers in sub-threshold loci is not driven by biases in MAF, LD block size, distance to nearest gene, number of nearby genes, or presence on a SNP genotyping array (Figure 3—figure supplement 3). In total, we identified 2075 SNPs with p<1x10-4 that are independent of the 112 published QT/QRS loci, of which 208 SNPs overlap LV enhancers (Supplementary file 2).

Epigenomic prioritization identifies sub-threshold loci with molecular functions

Because the enrichment of sub-threshold SNPs in cardiac enhancers suggests that epigenetic prioritization can be used as a starting point for more in-depth investigations of sub-threshold signals from GWAS, we sought to directly test the molecular hypothesis that these sub-threshold loci impact the transcriptional regulation of cardiac genes (Figure 4a). We grouped all 2075 sub-threshold SNPs using linkage disequilibrium data (minimum r2=0.2) to identify 287 independent sub-threshold loci in the genome (Materials and methods). We prioritized loci where a sub-threshold SNP overlapped an active LV enhancer and either (i) also overlapped a fetal heart DNase I hypersensitivity peak or (ii) was an expression quantitative trait locus (eQTL) for a nearby gene. In total, we cloned allele-specific enhancer fragments from 22 cardiac enhancers that overlap SNPs from 18 independent sub-threshold loci, and performed quantitative luciferase assays in human iPSC-derived cardiomyocytes to determine whether the sub-threshold SNP genotypes influence enhancer activity (Materials and methods). We observed that 13 of 18 sub-threshold loci (72.2%) contain an enhancer that drives luciferase activity in an allele-specific manner (Figure 4b,d, Figure 4—figure supplement 1). Moreover, we estimate that between 51.1%-89.8% (95% Bayesian confidence interval) of prioritized sub-threshold loci show allele-specific activity on transcription, suggesting that the majority of sub-threshold loci identified by epigenomic prioritization do in fact have an impact on transcriptional enhancer activity.

Figure 4 with 3 supplements see all
Sub-threshold loci prioritized by epigenomics alter enhancer activity.

(a) Model detailing how sub-threshold SNPs overlapping enhancers can affect QT interval. Green text: methods used to test mechanistic step in model. (b) Summary of luciferase enhancer reporter experiments. Left, luciferase enhancer reporter construct. Right, number of loci tested in panel d that exhibits significant allelic activity (p<0.05 between two haplotypes). (c), Left, schematic of a 3-D enhancer-promoter chromatin interaction detectable by 4C-seq. Right, number of loci tested in panel d where an enhancer-promoter interaction is observed in human iPS-derived cardiomyocytes by 4C-seq. (d) Experimental evidence that sub-threshold SNPs alter enhancer activity and that sub-threshold enhancers interact with gene promoters. Fold below GWS column represents degree to which sub-threshold locus is below genome-wide significance (5x10-8); Luciferase reporter column colored green if significant allelic difference in activity (p<0.05, Figure 4—figure supplement 1); Enhancer-promoter interactions column colored green if there is a detectable enhancer-promoter interaction by 4C-seq (Figure 4—figure supplement 3).

https://doi.org/10.7554/eLife.10557.013

We also performed chromosome conformation capture combined with high-throughput sequencing (4C-seq) to experimentally test whether predicted enhancers in sub-threshold loci can form contacts with promoters, and to identify potential target genes of sub-threshold enhancers. We used 4C-seq to test ten predicted enhancers from eight sub-threshold loci in human iPSC-derived cardiomyocytes (van de Werken et al., 2012). Eight enhancers in six loci formed enhancer-promoter interactions in the proximal 500 kb region (Figure 4c,  Figure 4 — figure supplement 3). These analyses provides evidence that the novel QT loci enhancers have regulatory activity and that the sub-threshold SNPs identified in our analyses can alter the activity of cardiac enhancers.

Epigenomic prioritization discriminates biologically relevant sub-threshold loci

We next tested whether epigenomic prioritization can distinguish biologically relevant sub-threshold loci by comparing properties of sub-threshold loci that do or do not overlap cardiac enhancers. From the 287 independent sub-threshold loci in the genome, we selected two subsets to compare: 60 loci that contain sub-threshold SNPs directly overlapping predicted active LV enhancers, and as a negative control, 129 sub-threshold loci that do not contain any SNPs (r2>0.2) overlapping a cardiac enhancer.

Evidence from genome-wide association studies

We reasoned that if sub-threshold loci that overlap active cardiac enhancers represent true biological signals, they should have stronger GWAS association signals than the negative control set. We present multiple lines of evidence supporting this hypothesis below (Figure 5a):

  1. The 60 enhancer-overlapping sub-threshold loci have significantly stronger p-values than the 129 negative control loci, despite the application of the same p=1x10-4 threshold for both sets (p=1.95x10-5, left, Figure 5a).

  2. 9 of the 60 enhancer-overlapping sub-threshold loci are among the loci that reach genome-wide significance in the larger held out meta-analysis cohort (and not included in the 112 QT/QRS loci used for enrichment analyses in Figure 1 and 2), compared to only 3 of the 129 sub-threshold loci that do not overlap enhancers (6.45-fold enrichment, p=1.92x10-3, middle, a).

  3. The 60 enhancer-overlapping sub-threshold loci are more likely to reach nominal significance (p<0.05) in a related GWAS study of QRS duration (see Materials and methods for individuals shared between both studies)(van der Harst, unpublished). In the QRS duration GWAS, p-values are available for 56 of 60 enhancer-overlapping sub-threshold QT loci and 110 of 129 negative control sub-threshold loci. 31 of 56 (55.4%) enhancer-overlapping sub-threshold loci are nominally significant in the QRS GWAS, a rate 2.9-fold higher than the 129 negative control loci (21 of 110 loci, p=3.28x10-6, right, Figure 5a), suggesting that epigenetic prioritization is more likely to identify sub-threshold SNPs that replicate in subsequent GWASs.

These analyses demonstrate that genome-wide maps of predicted enhancers can facilitate the detection of true sub-threshold loci.

Evidence from organismal phenotypes

Our identification of a high-confidence set of sub-threshold loci based on epigenomic signals provides a unique opportunity to discover new genes that contribute to cardiac electrophysiological traits. As enhancers can regulate genes up to 1Mb away, it is difficult to identify targets using a simple nearest gene approach (Fullwood et al., 2009). To circumvent this limitation, we developed a computational enhancer-gene linking method that prioritizes gene targets based on correlated activity patterns between enhancer-gene pairs across 59 human tissues (Materials and methods). Using this approach, we identified 106 candidate genes predicted to be regulated by the 60 enhancer-overlapping sub-threshold loci (Supplementary file 2). Notably, 11 of the 15 observed 4C-seq interactions were predicted by our computational approach, compared to 3 of 15 by the commonly applied approach of assigning the enhancer target to the nearest gene.

We used the output of the enhancer-gene linking method to test whether these candidate genes have roles in QT interval. To this end, we studied mouse phenotypes for directed knockouts and genetic perturbations of the 106 predicted gene targets of sub-threshold enhancers. We identified 49 of the 106 genes where mouse mutant models were available with documented phenotypes (Bello et al., 2015). Genetic perturbation in 11 of the 49 genes resulted in altered cardiac conduction or cardiac contractility: both processes that are also influenced by genes nearby above-threshold QT interval loci and genes implicated in the Mendelian Long QT syndrome. This represents a 4.11-fold enrichment compared to genes linked to all active LV enhancers (181 of 3311, p=6.84x105, black bar, Figure 5b). In contrast, phenotypes arising from genetic perturbation of LV-expressed genes nearby the 129 negative control sub-threshold loci outside enhancers are 7.30-fold less likely to result in altered cardiac conduction or contractility compared to our 60 prioritized sub-threshold loci (p=1.92x10-3, perturbation of 2 of 65 genes nearby the negative control subset have relevant cardiac phenotypes, light grey bar, Figure 5b).

The study of biologically relevant sub-threshold loci has been hampered by a high false positive rate that makes the detailed investigation of any sub-threshold locus experimentally more difficult and less attractive than above-threshold loci. The data presented here provide multiple independent lines of evidence that epigenomic signatures can be used to prioritize sub-threshold GWAS loci with a significantly greater likelihood of being biologically relevant.

Sub-threshold locus at rs1743292/rs1772203 functionally disrupts enhancer activity

Only a very small number of above-threshold GWAS loci, including SORT1 for LDL cholesterol, the FTO/IRX3 locus for obesity, and the SCN5A/SCN10A locus for QRS duration, have been investigated in detail (Musunuru et al., 2010; van den Boogaard et al., 2012; van den Boogaard et al., 2014; Arnolds et al., 2012; Smemo et al., 2014). These studies all identified SNPs within non-coding regulatory elements that disrupt expression of a nearby gene that plays a critical role in controlling a human phenotype. In contrast, no sub-threshold locus has been experimentally studied or validated to date. We selected one locus on chromosome 6 where our results from Figures 4 and 5 suggest that sub-threshold SNPs disrupt enhancer activity and therefore expression of a gene involved in cardiac electrophysiology. We set out to investigate whether this locus can serve as an example for future investigations of other sub-threshold loci.

Epigenomic prioritization distinguishes biologically relevant sub-threshold loci

(a) Left, Sub-threshold loci overlapping enhancers have significantly stronger association signals than loci outside enhancers in the QT interval GWAS. Middle, Loci overlapping enhancers have significantly more likely to be newly genome-wide significant in the held-out QT interval GWAS, than loci outside enhancers. Right, Sub-threshold loci overlapping enhancers are significantly more likely to be nominally significant (p<0.05) in QRS GWAS than sub-threshold loci not overlapping enhancers. (b) Genetic perturbation of genes with predicted links to 60 enhancer-overlapping sub-threshold loci are significantly more likely to result cardiac conduction or contractility phenotypes than genes linked to all LV enhancers and genes nearby non-enhancer overlapping sub-threshold loci.

https://doi.org/10.7554/eLife.10557.017

The sub-threshold locus on chromosome 6 contains 8 SNPs with reported p-values less than 1x10-4 and another 2 SNPs in LD that do not have calculated p-values. We focused on the 3 SNPs in this locus that overlap active LV enhancers: rs1743292 (p=6.48x10-5) and rs112332323 (p-value not available) that both overlap a 3.6 kb predicted enhancer, and rs1772203 (p=5.87x10-5) that overlaps a 2.8 kb predicted enhancer (Figure 6a,b). We cloned fragments corresponding to both enhancers upstream of a minimal promoter driving the luciferase gene, and compared luciferase activity between constructs carrying either the major or minor haplotypes at each site (rs1743292 enhancer: Figure 6c, rs1772203 enhancer: Figure 6e, SNPs differing between cloned constructs listed at bottom of Figure 6b). We observed that the activity of both enhancers is dependent on the sub-threshold haplotype: at the rs1743292 enhancer, the major haplotype has 45% greater activity (p=2.99x10-12), while the minor haplotype is 28% more active in the rs1772203 enhancer (p=1.79x10-5).

In the fetal human heart, rs1743292 overlaps a strong DNase I hypersensitivity peak marking a local region of open chromatin signifying potential transcription factor binding (DHS track, Figure 6b) (Neph et al., 2012). Thus, to provide evidence that the rs1743292 locus alters enhancer activity in humans, we re-aligned the DHS sequencing reads from heterozygous human individuals in an allele-specific manner to assess the difference in the number of reads that map to either allele (Maurano et al., 2012). In fetal heart tissue from one individual sequenced to high depth, rs1743292 shows a significant allelic imbalance for DHS reads with 97 reads mapping to the major C allele and 300 reads mapping to the minor T allele (left, Figure 6d, p=3.1x10-25, binomial test). This trend is consistent in all five additional human individuals heterozygous at rs1743292 sequenced at lower depth (right, Figure 6d), suggesting that rs1743292 can affect enhancer activity potentially through altering chromatin accessibility or transcription factor binding. Moreover, using motif analysis, we observed that rs1743292 alters a predicted binding site for the cardiac-expressed nuclear factor NF-I family (Figure 6f), which contains a family member (NF-1a) that itself has been associated by GWAS with cardiac electrophysiology (Ritchie et al., 2013).

We used 4C-seq to identify genes that could be regulated by the rs1743292 or rs1772203 enhancers. We observed that both enhancers form interactions with promoters of the upstream popeye-domain containing (POPDC) family members BVES/POPDC1 and POPDC3, and with predicted enhancers situated within introns of the downstream PREP gene (Figure 6a,g). This suggests that both enhancers may contribute to regulating the gene expression of BVES and POPDC3, of note because the POPDC protein family of transmembrane proteins has recently reported roles in cardiac pacemaking (Froese et al., 2012; Kirchmaier et al., 2012).

We sought to investigate the roles of the three candidate target genes (BVES, POPDC3, PREP) of the rs1743292/rs1772203 locus in regulating myocardial repolarization. Consistent with the genetic association between this locus and QT interval length, we found that mice homozygous for loss-of-function copies of BVES exhibit cardiac conduction and pacemaker defects (Figure 6h) (Bello et al., 2015; Froese et al., 2012). In contrast, POPDC3 and PREP mouse loss-of-function models have no reported cardiac abnormalities, and instead show altered body fat, suggesting that this genetic locus alters QT interval length through the BVES gene (Bello et al., 2015).

Strengthening our evidence implicating BVES in QT interval, we observed that across 59 human tissues, BVES is most highly expressed in human left ventricle, whereas POPDC3 has much lower expression in cardiac tissue than skeletal muscle, and PREP is constitutively expressed across a wide range of tissues (Figure 6—figure supplement 1). We also used antisense morpholino oligonucleotides to knockdown transcripts from the BVES, POPDC3 and PREP orthologs in zebrafish, observing that bves knockdown leads to a reproducible shortening of the zebrafish ventricular action potential duration (APD), the cellular correlate of the QT interval, (p=0.002 and 0.09 for two independent morpholino sequences), whereas there is no reproducible difference in ventricular APD following loss of popdc3 or prep transcripts (Figure 6—figure supplement 2). Collectively, these data from multiple organisms provide evidence that SNPs within the rs1743292/rs1772203 locus alter QT interval duration through disruption of BVES expression.

Figure 6 with 2 supplements see all
The rs1743292/rs1772203 sub-threshold locus disrupts activity of cardiac enhancers that interact with BVES, a gene important for cardiac electrophysiology.

(a) Overview of rs1743292/rs1772203 sub-threshold locus. Gold rectangles represent predicted active LV enhancers, blue and green lines represent enhancer promoter interactions from the rs1743292 and rs1772203 enhancers, respectively (see panel g). (b) Detailed view of cardiac enhancers overlapping rs1743292 (left) and rs1772203 (right). (c) rs1743292 haplotypes differing at 6 SNPs (listed at bottom of panel b) affect activity of cardiac enhancer in human iPSC-derived cardiomyocytes, n=24 per haplotype. Error bars represent standard error of the mean. (d), Left, rs1743292 alters level of DNase I hypersensitivity in a heterozygous human fetal heart sample. Right, Allelic imbalance of DHS reads at rs1743292 observed for 5 of 5 human individuals. (e) rs1772203 allele affects activity of cardiac enhancer in human iPSC-derived cardiomyocytes, n=16 per allele. Error bars represent standard error of the mean. (f) rs1743292 SNP overlaps a predicted nuclear factor I (NF-I) motif. (g) 4C-seq analysis of the rs1743292 (blue) and rs1772203 (green) enhancers identifies enhancer-promoter interactions with nearby BVES, BVES-AS1 and POPDC3 genes, and additional enhancer-enhancer interactions within introns in PREP. (h) Genetic perturbation of Bves, but not Popdc3 or Prep leads to cardiac electrophysiological defects in mouse models.

https://doi.org/10.7554/eLife.10557.018

These results provide evidence that cardiac enhancers can be used to identify novel sub-threshold loci and genes associated with cardiac traits. As demonstrated with the luciferase enhancer reporter assays, and specifically the rs1743292/rs1772203 locus, sub-threshold loci harbor SNPs that affect enhancer activity and regulate genes involved in QT interval. In the current QT interval GWAS, rs1743292 had an effective sample size of 68,900 individuals with 12.76% power to detect the locus at genome-wide significance. To detect rs1743292 at genome-wide significance with 80% power would require a GWAS cohort of 146,700 individuals. Thus, our study demonstrates that genome-wide enhancer maps are a powerful tool for identifying sub-threshold loci with bona fide roles in human cardiovascular physiology that would have remained otherwise unrecognized from existing GWAS cohorts.

Discussion

A major limitation in the human genetics field is the inability to ascribe function to the vast majority of non-coding SNPs associated with complex human traits. Using enhancer annotations from hundreds of cell types and tissues, we find ~50% of QT/QRS GWAS loci overlap enhancers, and that these enhancers share common characteristics, including H3K27ac marks, CpG hypomethylation, and greater evolutionary conservation. The high density of common variation we observed in non-coding enhancers may be due to weaker evolutionary selection against the subtle phenotypes that arise from disruption of transcriptional regulatory units compared to the more severe disruption of protein-coding sequences commonly observed in rare Mendelian diseases.

Studies of genetic heritability have indicated that many additional loci lie below the genome-wide significance threshold (Yang et al., 2011). Our study contributes fundamental insights to overcoming the difficult problem of discovering the biologically relevant sub-threshold genetic signals that are orders of magnitude weaker than discovered by traditional GWAS. Three prior studies have observed the general enrichment of either sub-threshold SNPs or SNPs that explain a disproportionately high amount of heritability in cell type-specific regulatory elements (Maurano et al., 2012; Finucane et al., 2015; Gusev et al., 2014). However, our study is unique in demonstrating the advantage of combining different epigenomic features to produce greater enrichments of sub-threshold loci. Critically, no previous study to our knowledge has implicated any specific sub-threshold locus in any complex human trait, whereas we establish that 13 of the 18 sub-threshold loci tested in this study are capable of altering enhancer activity. We also leverage GWAS summary statistics and genetic perturbations in mouse to demonstrate that epigenetic marks can discriminate true positive sub-threshold signals from noise, a key problem that, until now, has prevented the study of these loci. Finally, we perform an in-depth molecular dissection of the rs1743292/rs1772203 sub-threshold locus and implicate the popeye-domain containing family of transmembrane proteins in regulating myocardial repolarization. The study of above-threshold GWAS loci is generating more biological insights on new causal genes contributing to human disease, however there remains a wealth of untapped signals in the sub-threshold region. The work presented here represents a first step towards deciphering this signal and opens the door for the discovery of greater numbers of disease loci, genes, and pathways.

Our study focused on QT interval and QRS duration due to their clear tissue of origin and a wealth of existing GWAS data, however we believe our approach could generalize to any well-powered GWAS on any trait. To this end, we chose two recently published, well-powered GWASs that relate to human diseases affecting large segments of the population: LDL cholesterol levels and Alzheimer’s disease. For both traits, we observed the enrichment of SNPs well into the sub-threshold significance range, that the enrichment signature persists following removal of all above-threshold loci, and that functional features that improve enrichment of QT-associated sub-threshold loci are also effective when applied to sub-threshold loci associated with LDL cholesterol and Alzheimer’s disease (Figure 3—figure supplement 4). These results suggest that epigenomics can be applied more broadly to identify new loci with sub-threshold statistical significance from GWAS of many complex human diseases. One important future extension of this work would be to build a formal machine learning classifier that can be first trained on above-threshold GWAS loci before being applied to quantitatively rank sub-threshold loci by predicted biological relevance.

Finally, investigating the differences between above-threshold and sub-threshold loci to elucidate the factors that drive loci to different degrees of association with a trait will be an important area of future investigation. Many reported genome-wide significant loci have been discovered by GWAS despite low power, likely due to the existence of many other variants of similar effect that go undetected, termed the 'winner’s curse', and thus this difference could be driven in part by random chance. However, we also hypothesize that sub-threshold loci with weaker effect sizes may act in different pathways from loci with stronger effect sizes, and that sub-threshold variants could have weaker effects on gene expression.

In summary, our results provide a critical roadmap for the systematic analysis and re-analysis of genome-wide association studies to prioritize novel biologically relevant loci with weak association signals. As demonstrated with the rs1743292/rs1772203 locus, these loci would otherwise require substantially greater cohort sizes to reach statistical significance. Thus, we expect that this approach can be exploited to broadly improve the understanding of the biological pathways that contribute to complex human traits and disease.

Materials and methods

Identifying GWAS loci associated with cardiac traits

We compiled a list of all SNPs associated with electrocardiographic QT interval (reflecting myocardial repolarization) or QRS duration (reflecting cardiac conduction) from the NHGRI GWAS catalog of published GWAS (accessed on July 09, 2013), and removed loci identified from studies with small sample sizes (<5000 individuals). As the GWAS catalog reports SNPs with p<1x10-6, we performed a sensitivity analysis using only loci with p<5x10-8 to demonstrate that two different cut-offs does not meaningfully affect enrichment results for left ventricle (Figure 1—figure supplement 3). We used genotype data from the 1000 Genomes project to identify all SNPs in LD (r2>0.8, CEU population) with the lead SNPs. For cases where two lead SNPs were in LD with each other (i.e. different studies reported different SNPs from the same haplotype block), we merged the resulting loci. To avoid over-counting, if the sets of LD SNPs from two independent lead SNPs overlapped, we randomly assigned each of the shared LD SNPs to only one of the two lead SNPs.

RNA-seq data and enhancer annotations

Epigenome roadmap datasets.

Processed RNA-seq data for 59 human tissues and enhancer annotations (for 127 H3K4me1-defined and 88 'strong' H3K4me1/H3K27ac-defined enhancer sets) were downloaded from the Roadmap Epigenomics Project (Roadmap Epigenomics Consortium, 2015). Initial analyses across all 127 tissues were performed on cardiac enhancers defined by ChromHMM by the Roadmap Epigenomics Project using five chromatin modifications including H3K4me1 but not H3K27ac (15-state model). 'Strong' cardiac enhancers, available for a subset of 88 tissues, were defined by ChromHMM by the Roadmap Epigenomics Project using six chromatin modifications including both H3K4me1 and H3K27ac (18-state model).

Human differentiated cardiomyocyte RNA-seq dataset.

hESCs were differentiated to cardiomyocytes as previously described (Elliott et al., 2011) and were obtained from David Elliott at Monash University. RNA was extracted using TRIzol reagent according to the manufacturer's instructions. 10 µg RNA was used for library construction according to Illumina RNA-seq library kit with minor modifications. Briefly, mRNA was isolated using Dynabeads mRNA Purification Kit (Invitrogen, Catalog #61006) followed by fragmentation and ethanol precipitation. First and second strand synthesis were performed followed by end repair, A-tailing, paired end adaptor ligation and size selection on a Beckman Coulter SPRI TE nucleic acid extractor. 200-400 bp dsDNA was enriched by 15 cycles of PCR with Phusion High-Fidelity DNA Polymerase (NEB, Catalog #M0530) followed by gel purification of 250 bp fragments from the amplified material. Amplified libraries were sequenced on an Illumina GAIIx sequencer. Reads were mapped against the hg19 version of the human genome using RSEM v. 1.2.3 and bowtie v. 0.12.7 using flags "rsem-calculate-expression --phred64-quals -p 4 --output-genome-bam --calc-ci --paired-end --bowtie-chunkmbs 1024, without in-silico polyA addition to the transcripts.

Enrichment of genomic features in QT/QRS loci

We used genomic features annotated by combinations of histone modifications (e.g. enhancers and promoters using ChromHMM by the Roadmap Epigenomics Project) or by GENCODE (e.g. protein-coding exons). Previous studies have compared the number of GWAS SNPs overlapping a feature against the number expected for a randomly chosen region of similar size (Maurano et al., 2012; Hnisz et al., 2013). However, this approach does not control for biases associated with the location of GWAS SNPs. We controlled for these biases by following the Variant Set Enrichment approach where we generate a background distribution for genomic feature enrichment in loci around sets of 112 randomly sampled control lead SNPs (Cowper-Sal·lari et al., 2012). We chose control lead SNPs from a genome-wide genotyping array (Affymetrix 660W) matched for size of the LD block (+/- 5 SNPs), minor allele frequency of the lead SNP (+/- 0.1), distance to the nearest gene (+/- 25 kb if outside gene), and number of nearby genes within a +/- 500 kb interval (+/- 3 genes). We also considered differences in local GC content (+/-25 nt) but did not observe a strong difference between GWAS and control lead SNPs (p=0.06). To calculate enrichment of genomic regions in GWAS loci, we compared the number of GWAS loci that overlapped an enhancer to 100,000 sets of equally sized randomly sampled control lead SNPs. The 112 GWAS SNPs compiled from the NHGRI GWAS catalog includes 57 loci with p-values between 1x10-6 and 5x10-8 that have a higher false positive rate. In a sensitivity analysis, we examined the subset of 55 loci that met the more stringent 5x10-8 statistical threshold and found that sets of cardiac enhancers (specifically fetal heart and adult left ventricle) were also most highly enriched in these loci compared to the 123 non-cardiac tissues (Figure 1—figure supplement 3).

Comparing differences between QT/QRS-associated LV enhancers and all LV enhancers

H3K27ac, DNase I Hypersensitivity and CAGE-seq read enrichment

To score the presence of epigenomic marks in enhancers, we averaged the wig signal tracks over every enhancer with the UCSC bigWigAverageOverBed tool. Fold difference in signals between QT/QRS enhancers and all LV enhancers were calculated by comparing the median signal values of the two groups. P-values were calculated using the Mann-Whitney U test. Activity in other cardiac and non-cardiac tissues: Overlap with enhancers in other tissues was calculated using the intersectBed function in the BEDTools suite (Quinlan and Hall, 2010). CpG hypomethylation and hypermethylation: Whole-genome bisulfite sequencing data for 37 human tissues, including the left ventricle, was obtained from the Roadmap Epigenomics Project (Roadmap Epigenomics Consortium, 2015). We identified LV-specific hypo and hypermethylated CpGs as those that differed in percent methylation with the mean of 34 non-cardiac tissues by both (i) 2 standard deviations and (ii) at least a difference in absolute percent methylation of 15 percent. Evolutionary Conservation: We calculated evolutionary conservation of enhancers using the methodology outlined by Nord et al. (2013)(Nord et al., 2013). Briefly, we first identified the 100 bp region of each enhancer with greatest average evolutionary conservation across primates (primate subset of 46-way phyloP conservation track obtained from UCSC). To quantify differences in evolutionary conservation of GWAS enhancers against all LV enhancers, we randomly selected 1000 size-matched sets of LV enhancers (size within +/-1 kb of corresponding QT/QRS enhancer), as the 100 bp segment of greatest conservation in longer enhancers is statistically more likely to have greater conservation than a shorter segment.

Comparing differences in TF motif disruption

We obtained TF motif instances in the human genome (hg19) for 651 human motifs from the ENCODE project (ENCODE Project Consortium, 2012), and filtered these to only consider 287 motifs that correspond to TFs expressed in the left ventricle (>1 RPKM by RNA-seq). We quantified the number of QT/QRS loci containing a SNP that disrupted an enhancer motif corresponding to an expressed TF in the left ventricle, and compared this against randomly sampled sets of control loci matched for MAF, LD block size, distance to the nearest gene and presence on the Affymetrix 660 W genotyping array.

Enrichment of QT SNPs below genome-wide significance in enhancers

Enrichment analysis

We used a sliding -log(p-value) threshold from 0 to 10 with steps of 0.1. At each cut-off, we computed the proportion of SNPs in enhancers with p-values more significant than the cut-off (foreground) against the proportion of SNPs in the whole genome. Grouping SNPs in LD. For each pair of SNPs, if the two SNPs are in LD (r2>0.2, CEU population from 1000 Genomes project) we remove the SNP with the weaker p-value.

Enrichment of LDL cholesterol and Alzheimer’s disease-associated sub-threshold loci in enhancers

Summary GWAS data for LDL cholesterol was obtained from Willer et al. (2013), and summary GWAS data for Alzheimer’s disease (AD) was obtained from Lambert et al. (2013)(Lambert et al., 2013). Enrichment analyses were performed as described above for QT interval. For enrichment of Alzheimer’s disease-associated SNPs, the region encompassing the HLA locus was excluded (chr6:24,182,924–34,537,546 in hg19), as this region contained approximately 25% of all low p-value SNPs (p<1x10-5) in the genome therefore and could skew enrichment results.

The liver tissue was chosen for LDL cholesterol enrichment based on biological relevance. Tissue choice for AD SNPs was made using genome-wide enrichment analyses performed by Gjoneska et al. (2015). For this analysis, we chose the second-most enriched tissue from Gjoneska et al. (peripheral blood monocytes, with most significant p-value) instead of the most enriched tissue (peripheral blood mononuclear cells, PBMCs, with second-most significant p-value) because the enrichment of AD SNPs in PBMC enhancers was substantially weaker than peripheral blood monocytes following removal of SNPs within the HLA locus. For AD GWAS, removal of SNPs within +/- 1 Mb of above-threshold loci was performed using 13 above-threshold loci with p<5x10-8 (Stage 1 analysis) listed in Table 2 of Lambert et al. For LDL cholesterol analyses, we first attempted to remove all SNPs within +/- 1 Mb of above-threshold loci reported in Supplementary files 2 & 3 of Willer et al., however many SNPs with p<5x10-8 remained. Therefore, we performed LD pruning (r2>0.2 from CEU population) on summary-level p-value data from Willer et al. to define above-threshold loci and then removed 68 unique genomic intervals from the analysis. Enhancer functional characteristics applied to the enhancer sets were chosen based on the availability of additional data for the chosen tissue. DNase I hypersensitivity data not available for human liver, and genome-wide CpG methylation data was not available for peripheral blood monocytes.

Comparison of QT sub-threshold loci in QRS GWAS data

To assess whether QT sub-threshold loci overlapping enhancers are more likely to represent true biological signals, we queried the p-values of these loci in a related GWAS of QRS duration. In total, the QT GWAS we used to identify the sub-threshold loci consisted of 76,061 individuals, while the QRS GWAS queried consisted of 60,255 individuals. We compared the total sizes of each cohort used in the two studies and calculated that a minimum total of 46,452 individuals must be different between the two studies. Specifically, there are at least 31,129 individuals present in QT GWAS that are not present in the QRS GWAS, and at least 15,323 individuals present in the QRS GWAS that are not present in the QT GWAS.

We used summary-level p-value data from the QRS GWAS testing four clinically applied QRS traits: Sokolow-Lyon, Cornell, 12-lead-voltage duration products, and QRS duration.

For each SNP, the assigned p-value represented the minimum p-value across these four traits. For each sub-threshold locus, we identified all SNPs in strong LD (r2>0.8, CEU population from 1000 Genomes project), and assigned the p-value as the minimum of all p-values for LD SNPs in the QRS GWAS data.

Identifying candidate genes near sub-threshold loci using activity correlation across human tissues

From the Roadmap Epigenomics Project, we were able to obtain matching 'strong' enhancer annotations and RNA-seq data for 59 of the 127 tissues, including LV. For each LV enhancer, we considered all genes with expression ≥1 RPKM in LV and in vitro differentiated human cardiomyocytes and distance within +/-500 kb as potential targets. We then split the RNA-seq data for the 59 tissues into two groups, depending on whether the enhancer is present or absent in each tissue, and applied a one-sided Mann-Whitney U test to ask whether each potential target gene showed significantly greater expression in tissues where the enhancer was active. Genes differentially expressed between tissues with active and inactive enhancers (p<0.05) were considered computationally-determined potential target genes. For determining targets of sub-threshold enhancers, we first filtered our set of sub-threshold enhancers to remove those unlikely to be associated with QT interval. To do this, we excluded sub-threshold SNPs if the -log(p-value) was lower than 80% of the -log(p-value) of the most statistically significant SNP in LD (r2>0.2), as these are unlikely to be causal.

Cardiac phenotypes for genes with mutations in mouse

For sub-threshold loci overlapping enhancers, and the set of all active LV enhancers, we identified nearby genes using the enhancer-gene linking method described above. This methodology was not applicable to the 129 sub-threshold loci that do not overlap enhancers, and therefore we identified the two nearest genes within 1 Mb using GREAT v2.0.2 and selected only genes with expression in adult human left ventricle data (>1 RPKM). Mouse orthologs of human genes were identified using the Ensembl Genes 79 database through BioMart, and all queries of the MGI mouse phenotypes database were made between April 26, 2015 and May 6, 2015. We used three search terms relevant to QT interval: 'ventricle muscle contractility', 'cardiac contractility' and 'conduction' (excluding non-cardiac conduction terms).

Quantifying allelic imbalance at SNPs

We used DNase I hypersensitivity and digital genomic footprinting data from the ENCODE and Roadmap Epigenomics Projects because samples were sequenced to a greater depth than the chromatin modification ChIP-seq data, and there were data available from more individuals Roadmap Epigenomics Consortium, 2015. To quantify allelic imbalance, we mapped DHS/DGF reads to a version of the human genome (hg19) downloaded from the UCSC genome browser with all SNPs (dbSNP141) masked by ambiguous nucleotides (N’s) using Bowtie2 (v2.2.0, flags: -N 1, --sensitive, --end-to-end, --no-unal). As genotypes were not available, we considered a sample heterozygous at a particular SNP if reads from the hg19-defined reference and alternate alleles each mapped to 3 or more unique positions. Using this methodology, we observed the median difference in reads mapping to the reference versus alternate alleles to be 0. In total, reads mapped to the reference allele more often than alternate at 6537 of 13,3826 heterozygous SNPs, and vice versa at 5884 of 13,826 heterozygous SNPs, with equal numbers of reads mapping to both alleles at the remaining 1405 SNPs. To quantify statistical significance of allelic imbalance at SNPs, we followed Maurano et al. (2012) and considered only SNPs with more than 21 reads. We performed a binomial test under the null hypothesis where reads map to both alleles at equal frequency, followed by Benjamini-Hochberg multiple testing correction across all heterozygous enhancer-overlapping SNPs.

4C-Seq Methods

Human iPSC-derived cardiomyocytes (iCMs) (Cellular Dynamics, Catalog #CMC-100-010-001) were thawed according to manufacturer’s instructions and diluted to a final plating density of 0.2x106 cells per mL with plating medium (Cellular Dynamics, Catalog#CMM-100-110-001). After 7 days in culture, iCMs were homogenized using a douncer, cross-linked and further processed as 4C template using DpnII as the first restriction enzyme and Csp6I as the second enzyme following the procedure outlined in van de Werken et al. (2012). The median spacing between GATC fragments (recognized by DpnII) in the hg19 human genome is 264 nt. Sequencing of the 4C-Seq library was performed on an Illumina HiSeq 2000, and sequencing reads were aligned to a reduced genome consisting of sequences that flank DpnII restriction sites. Primer sequences used for sequencing the 4C-seq library are listed in Supplementary file 3. The human genome (hg19) was used as reference genome for mapping 4C sequence captures. Non-unique sequences that flank a restriction site were removed from the analysis.

To map 4C-seq reads to the genome, we first binned reads according to the reading primers used in each lane. We allow a single mismatch in the reading primer that overlaps the primary restriction cut site (DpnII). The binned sequences were mapped to an in silico library of potential fragment ends generated based on the restriction enzymes used for the 4C template preparation. We did not allow any mismatch in the fragment-end, and for analysis we focused on the unique fragends only (excluding repetitive fragment ends). As biases from sequencing yield or restriction cutting may be introduced by 4C-seq, we computed 4C-seq coverage in a genomic region by averaging mapped reads in running windows of 21 4C-seq fragment-ends. For peak-calling in a single 4C experiment, we perform explicit background modeling of the up- and downstream genomic regions independently. We assume that in a completely unstructured chromatin fiber the contact probability monotonically decreases as a function of the distance to the viewpoint. We model this by performing monotonic regression of the 4C signal as a function of the distance to the viewpoint. For this we use the R package isotone, which implements the monotonic regression (Mair et al., 2009). We then compare the observed 4C signal to the predicted value from the background model and call the extremes that reach a significance threshold as peaks. For a given threshold q and a distribution F of residuals from the background model, every observation greater than Q3(F)+q*IQR(F), where Q3 is the third quartile of F and IRF(F) the inter-quartile range, is considered significant.

Generating enhancer reporter constructs

Sub-threshold loci were considered candidates for testing by the luciferase reporter assay if the sub-threshold SNP overlapping the active LV enhancer either (i) overlaps a fetal heart DNase I hypersensitivity site, or (ii) is an eQTL in the left ventricle (i.e. the SNP genotype is associated with differential expression of a nearby gene). We generated allele-specific enhancer constructs using two strategies outlined below: (i) PCR from genotyped heterozygous individuals, or (ii) direct synthesis of enhancer fragments. (i) Enhancer cloning from heterozygous individuals: We designed primer sequences to clone the entire predicted enhancer sequence defined by ChromHMM, and appended a 5’CACC sequence to forward primers to permit directional TOPO cloning. We designed primer sequences to clone fragments of up to 3 kb. For enhancers annotated as larger than 3 kb, we either selected a 3 kb fragment centered at the region of greatest histone modification density (H3K4me1, H3K27ac), or generated multiple fragments spanning the enhancer. Primer sequences and samples for human genomic DNA (Coriell Cell Repositories) are listed in Supplementary file 3. We PCR amplified enhancers from human genomic DNA using Q5 High-Fidelity DNA Polymerase (NEB, Catalog #M0491S) and purified fragments corresponding to the correct length using a QIAquick Gel Extraction Kit (Qiagen, Catalog #28706). (ii) Direct synthesis of enhancer fragments: Enhancer fragments up to 1 kb in size were chosen so that the fragment covers both the sub-threshold SNP as well as peak within the DNase I hypersensitivity signal, and a 5’CACC sequence was appended to permit directional TOPO cloning. Fragments were synthesized using the gBlocks Gene Fragments service from Integrated DNA Technologies (sequences are listed in Supplementary file 3). Enhancer fragments from both methods were cloned into Gateway-compatible entry vectors using a pENTR/D-TOPO Cloning Kit (Life Technologies, Catalog # K2400) and transformed into TOP10 E. coli bacteria following manufacturers guidelines. We used Sanger sequencing to verify that purified entry vectors carried enhancers with the correct insertion orientation and no mutations beyond the expected polymorphisms. Entry vectors were then Gateway-cloned using LR Clonase II Plus (Life Technologies, Catalog # 12538-120) into a Gateway-converted pGL4.23 destination vector (Promega, Catalog # E8411) for luciferase assays in human cell lines (Fisher et al., 2006). We used Sanger sequencing to confirm a second time the correct enhancer orientation and sequence inside the destination vectors.

Human cardiomyocyte luciferase assays

Human iCMs (Cellular Dynamics, Catalog #: CMC-100-010-001) were thawed according to manufacturer’s instructions and diluted to a final plating density of 0.2x106 cells per mL with plating medium (Cellular Dynamics, Catalog#: CMM-100-110-001). 96-well tissue culture treated plates were coated with 0.1 mL of 0.1%(w/v) gelatin per well and incubated at 37°C for at least two hours. The gelatin solution was aspirated off and wells rinsed with 100 uL of PBS, aspirated, and let sit in the tissue culture hood. Using a multichannel pipette, 100 uL of cells were seeded per well to obtain a target density of 20x103 iCMs. The plates were kept on a flat bench at room temperature for 10-15 minutes to allow for cells to settle down uniformly, followed by incubation at a tissue culture incubator set at 37°C and 7% CO2. 48 hours post-seeding, the iCM plating medium was replaced with 100 uL of Maintenance Medium (Cellular Dynamics, Catalog #:CMM-100-120-001). The Maintenance Medium was replaced every other day.

3-4 days post-plating, iCMs began beating spontaneously and 7 days post-plating, they formed electrically connected syncytial layers that beat simultaneously. At this stage, the cells were transfected with the appropriate Luciferase reporter constructs and controls for downstream analyses. Media was replaced an hour before transfections. For each well, 95 ng of enhancer firefly Luciferase reporter (cloned into pGL4.23, Promega) and 5 ng of Renilla Luciferase transfection control vector (pGL4.73, Promega) was mixed with 10 ul of Opti-MEM (Life Technologies, Catalog #:51985-034). 0.2 uL of Viafect transfection reagent was added to the DNA/Opti-MEM mixture. After mixing, the transfection cocktail was incubated at room temperature for 5 min and 10 ul dispensed into the well with iCMs and plates transferred to 37°C. Media was changed 24 hr after transfection. 8 independent wells of iCMs were transfected per construct to account for variability in plating and transfection efficiencies. A mammalian expression vector, pEF-GFP (Addgene, Plasmid 11154), was used to visually monitor transfection efficiency. At least 65–70% of the population of iCMs expressed GFP 24 hr hours post-transfection.

Luciferase activity was measured 24 hrhr after transfections using the Dual-Luciferase Reporter Assay System (Promega, Catalog#:E1980). After aspirating media, cells were rinsed with PBS once, and lysed with 20 uL of 1X Passive lysis buffer in the Luciferase assay kit. 15 min minutes after gentle shaking on an orbital shaker and complete lysis, the plate was stored at -80°C until further processing. Samples were prepared and luminescence measured according to Manufacturer’s Assay protocol for 96-well plates using the Varioskan Flash Multimode Reader (Thermo Scientific).

Data analysis

For all transfection wells, luminescence values of a blank non-transfection control were subtracted from all measured activity values. Firefly luciferase activity was then normalized to Renilla luciferase activity to control for transfection efficiency in each well. As luciferase reporter assay reagents decrease in activity during regular storage, the reference and alternate alleles of each reporter construct were spotted on the same 96-well plates to control for plate-to-plate variability in reagent activity. For each enhancer, we merged readings from multiple days by normalizing the activity of reporters to the reference allele. Each reporter construct was transfected into wells of at least two separate 96-well plates and readings for all wells were merged. Wells where Renilla luciferase activity (transfection control) was substantially lower (>90%) than neighboring wells were excluded from analyses. Statistical significance was determined by unpaired Student’s t-test assuming equal variance. Minimum sample size of n=8 per enhancer construct was chosen to achieve 95% power for effect size (Cohen’s d) of 2 (0.2 difference in activity between haplotypes with standard deviation of 0.1 normalized luciferase activity units) at p=0.05.

Zebrafish antisense morpholino oligonucleotide-mediated knockdown and optical voltage mapping

Zebrafish (TuAB strain) were cared for according to standard techniques. All animal experiments were approved by the Partners Subcommittee on Research Animal Care (SRAC) and were conducted in compliance with the regulations published in the US National Institute of Health Guide for the Care and Use of Laboratory Animals. At the single cell stage, fertilized oocytes were injected with standardized concentrations and volumes of antisense morpholino oligonucleotides (5’CAATAGATGGCGCTGTGTACCTGTC3’ and 5’AGAGCAGCCTGAAAGACAATAAAGA3’ for bves, 5’GGTTAATCCACTCACCTGCCTGAAA3’ and 5’CCGTCACTCGTATCCTGTTTTAGTG3’ for popdc3, 5'3' and 'AGAAGTGTTTGCTCAGGTCACCTGT3' for prep, 5’GTTCAATTGTTTCTCACCTGCCAGA3’ and 5’CTAATCCTGTGAAAGCAGAAGATCC3’ for popdc2) dissolved in Danieau’s solution (58 mM NaCl, 0.7mM KCl, 0.4 mM MgSO4, 0.6 mM Ca(NO3)2, 5.0 mM HEPES pH 7.6). Controls were injected with an equivalent dose of non-targeting morpholino of equal length but differing nucleotide composition (5’ATCCTCTTGAGGCGAACAAAGAGTC3’). RNA was harvested at 72 hr using TRIzol (Life Technologies) according to the manufacturer’s instructions, cDNA synthesized by iScript reverse transcriptase (Bio-Rad Laboratories, Hercules, CA, Catalog #1708840) and semi-quantitative PCR was used to assess relative percentage of gene knockdown. All studies of morpholino efficacy are a result of samples obtained from three independent injections. For evaluation of ventricular action potential duration, embryo hearts were microdissected at 72 hr hours of development and stained with di-8-ANEPPS (Invitrogen, Catalog #). Cardiac contraction was arrested with 15 uM blebbistatin (Sigma-Aldrich). Hearts were then field paced at 2Hz and imaged at 1000 frames per second. Analysis of action potential durations was performed using an in-house developed MatLab program. The action potential duration at 80% repolarization was utilized for all analyses. A minimum n of 9 embryos was required for all ventricular action potential studies, based on power calculations for effect size (Cohen’s d) of 1.5 at p=0.05. No animals were excluded from analyses unless ventricular depolarization could not be induced at 120 paces per minute. No randomization of samples or blinding of investigators was utilized during these protocols. Statistical comparisons were performed using one-way ANOVA with Fisher’s Least Significant Difference testing with all comparisons being to clutchmate controls. All distributions were normal, and variances between control and experimental groups were not statistically significant.

References

  1. 1
  2. 2
  3. 3
  4. 4
    Genetic association study of QT interval highlights role for calcium signaling pathways in myocardial repolarization
    1. DE Arking
    2. SL Pulit
    3. L Crotti
    4. P van der Harst
    5. PB Munroe
    6. TT Koopmann
    7. N Sotoodehnia
    8. EJ Rossin
    9. M Morley
    10. X Wang
    11. AD Johnson
    12. A Lundby
    13. DF Gudbjartsson
    14. PA Noseworthy
    15. M Eijgelsheim
    16. Y Bradford
    17. KV Tarasov
    18. M Dörr
    19. M Müller-Nurasyid
    20. AM Lahtinen
    21. IM Nolte
    22. AV Smith
    23. JC Bis
    24. A Isaacs
    25. SJ Newhouse
    26. DS Evans
    27. WS Post
    28. D Waggott
    29. LP Lyytikäinen
    30. AA Hicks
    31. L Eisele
    32. D Ellinghaus
    33. C Hayward
    34. P Navarro
    35. S Ulivi
    36. T Tanaka
    37. DJ Tester
    38. S Chatel
    39. S Gustafsson
    40. M Kumari
    41. RW Morris
    42. ÅT Naluai
    43. S Padmanabhan
    44. A Kluttig
    45. B Strohmer
    46. AG Panayiotou
    47. M Torres
    48. M Knoflach
    49. JA Hubacek
    50. K Slowikowski
    51. S Raychaudhuri
    52. RD Kumar
    53. TB Harris
    54. LJ Launer
    55. AR Shuldiner
    56. A Alonso
    57. JS Bader
    58. G Ehret
    59. H Huang
    60. WH Kao
    61. JB Strait
    62. PW Macfarlane
    63. M Brown
    64. MJ Caulfield
    65. NJ Samani
    66. F Kronenberg
    67. J Willeit
    68. JG Smith
    69. KH Greiser
    70. H Meyer Zu Schwabedissen
    71. K Werdan
    72. M Carella
    73. L Zelante
    74. SR Heckbert
    75. BM Psaty
    76. JI Rotter
    77. I Kolcic
    78. O Polašek
    79. AF Wright
    80. M Griffin
    81. MJ Daly
    82. DO Arnar
    83. H Hólm
    84. U Thorsteinsdottir
    85. JC Denny
    86. DM Roden
    87. RL Zuvich
    88. V Emilsson
    89. AS Plump
    90. MG Larson
    91. CJ O'Donnell
    92. X Yin
    93. M Bobbo
    94. AP D'Adamo
    95. A Iorio
    96. G Sinagra
    97. A Carracedo
    98. SR Cummings
    99. MA Nalls
    100. A Jula
    101. KK Kontula
    102. A Marjamaa
    103. L Oikarinen
    104. M Perola
    105. K Porthan
    106. R Erbel
    107. P Hoffmann
    108. KH Jöckel
    109. H Kälsch
    110. MM Nöthen
    111. M den Hoed
    112. RJ Loos
    113. DS Thelle
    114. C Gieger
    115. T Meitinger
    116. S Perz
    117. A Peters
    118. H Prucha
    119. MF Sinner
    120. M Waldenberger
    121. RA de Boer
    122. L Franke
    123. PA van der Vleuten
    124. BM Beckmann
    125. E Martens
    126. A Bardai
    127. N Hofman
    128. AA Wilde
    129. ER Behr
    130. C Dalageorgou
    131. JR Giudicessi
    132. A Medeiros-Domingo
    133. J Barc
    134. F Kyndt
    135. V Probst
    136. A Ghidoni
    137. R Insolia
    138. RM Hamilton
    139. SW Scherer
    140. J Brandimarto
    141. K Margulies
    142. CE Moravec
    143. F del Greco M
    144. C Fuchsberger
    145. JR O'Connell
    146. WK Lee
    147. GC Watt
    148. H Campbell
    149. SH Wild
    150. NE El Mokhtari
    151. N Frey
    152. FW Asselbergs
    153. I Mateo Leach
    154. G Navis
    155. MP van den Berg
    156. DJ van Veldhuisen
    157. M Kellis
    158. BP Krijthe
    159. OH Franco
    160. A Hofman
    161. JA Kors
    162. AG Uitterlinden
    163. JC Witteman
    164. L Kedenko
    165. C Lamina
    166. BA Oostra
    167. GR Abecasis
    168. EG Lakatta
    169. A Mulas
    170. M Orrú
    171. D Schlessinger
    172. M Uda
    173. MR Markus
    174. U Völker
    175. H Snieder
    176. TD Spector
    177. J Ärnlöv
    178. L Lind
    179. J Sundström
    180. AC Syvänen
    181. M Kivimaki
    182. M Kähönen
    183. N Mononen
    184. OT Raitakari
    185. JS Viikari
    186. V Adamkova
    187. S Kiechl
    188. M Brion
    189. AN Nicolaides
    190. B Paulweber
    191. J Haerting
    192. AF Dominiczak
    193. F Nyberg
    194. PH Whincup
    195. AD Hingorani
    196. JJ Schott
    197. CR Bezzina
    198. E Ingelsson
    199. L Ferrucci
    200. P Gasparini
    201. JF Wilson
    202. I Rudan
    203. A Franke
    204. TW Mühleisen
    205. PP Pramstaller
    206. TJ Lehtimäki
    207. AD Paterson
    208. A Parsa
    209. Y Liu
    210. CM van Duijn
    211. DS Siscovick
    212. V Gudnason
    213. Y Jamshidi
    214. V Salomaa
    215. SB Felix
    216. S Sanna
    217. MD Ritchie
    218. BH Stricker
    219. K Stefansson
    220. LA Boyer
    221. TP Cappola
    222. JV Olsen
    223. K Lage
    224. PJ Schwartz
    225. S Kääb
    226. A Chakravarti
    227. MJ Ackerman
    228. A Pfeufer
    229. PI de Bakker
    230. C Newton-Cheh
    231. CARe Consortium
    232. COGENT Consortium
    233. DCCT/EDIC
    234. eMERGE Consortium
    235. HRGEN Consortium
    (2014)
    Nature Genetics 46:826–836.
    https://doi.org/10.1038/ng.3014
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
  23. 23
  24. 24
    Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease
    1. JC Lambert
    2. CA Ibrahim-Verbaas
    3. D Harold
    4. AC Naj
    5. R Sims
    6. C Bellenguez
    7. AL DeStafano
    8. JC Bis
    9. GW Beecham
    10. B Grenier-Boley
    11. G Russo
    12. TA Thorton-Wells
    13. N Jones
    14. AV Smith
    15. V Chouraki
    16. C Thomas
    17. MA Ikram
    18. D Zelenika
    19. BN Vardarajan
    20. Y Kamatani
    21. CF Lin
    22. A Gerrish
    23. H Schmidt
    24. B Kunkle
    25. ML Dunstan
    26. A Ruiz
    27. MT Bihoreau
    28. SH Choi
    29. C Reitz
    30. F Pasquier
    31. C Cruchaga
    32. D Craig
    33. N Amin
    34. C Berr
    35. OL Lopez
    36. PL De Jager
    37. V Deramecourt
    38. JA Johnston
    39. D Evans
    40. S Lovestone
    41. L Letenneur
    42. FJ Morón
    43. DC Rubinsztein
    44. G Eiriksdottir
    45. K Sleegers
    46. AM Goate
    47. N Fiévet
    48. MW Huentelman
    49. M Gill
    50. K Brown
    51. MI Kamboh
    52. L Keller
    53. P Barberger-Gateau
    54. B McGuiness
    55. EB Larson
    56. R Green
    57. AJ Myers
    58. C Dufouil
    59. S Todd
    60. D Wallon
    61. S Love
    62. E Rogaeva
    63. J Gallacher
    64. P St George-Hyslop
    65. J Clarimon
    66. A Lleo
    67. A Bayer
    68. DW Tsuang
    69. L Yu
    70. M Tsolaki
    71. P Bossù
    72. G Spalletta
    73. P Proitsi
    74. J Collinge
    75. S Sorbi
    76. F Sanchez-Garcia
    77. NC Fox
    78. J Hardy
    79. MC Deniz Naranjo
    80. P Bosco
    81. R Clarke
    82. C Brayne
    83. D Galimberti
    84. M Mancuso
    85. F Matthews
    86. S Moebus
    87. P Mecocci
    88. M Del Zompo
    89. W Maier
    90. H Hampel
    91. A Pilotto
    92. M Bullido
    93. F Panza
    94. P Caffarra
    95. B Nacmias
    96. JR Gilbert
    97. M Mayhaus
    98. L Lannefelt
    99. H Hakonarson
    100. S Pichler
    101. MM Carrasquillo
    102. M Ingelsson
    103. D Beekly
    104. V Alvarez
    105. F Zou
    106. O Valladares
    107. SG Younkin
    108. E Coto
    109. KL Hamilton-Nelson
    110. W Gu
    111. C Razquin
    112. P Pastor
    113. I Mateo
    114. MJ Owen
    115. KM Faber
    116. PV Jonsson
    117. O Combarros
    118. MC O'Donovan
    119. LB Cantwell
    120. H Soininen
    121. D Blacker
    122. S Mead
    123. TH Mosley
    124. DA Bennett
    125. TB Harris
    126. L Fratiglioni
    127. C Holmes
    128. RF de Bruijn
    129. P Passmore
    130. TJ Montine
    131. K Bettens
    132. JI Rotter
    133. A Brice
    134. K Morgan
    135. TM Foroud
    136. WA Kukull
    137. D Hannequin
    138. JF Powell
    139. MA Nalls
    140. K Ritchie
    141. KL Lunetta
    142. JS Kauwe
    143. E Boerwinkle
    144. M Riemenschneider
    145. M Boada
    146. M Hiltuenen
    147. ER Martin
    148. R Schmidt
    149. D Rujescu
    150. LS Wang
    151. JF Dartigues
    152. R Mayeux
    153. C Tzourio
    154. A Hofman
    155. MM Nöthen
    156. C Graff
    157. BM Psaty
    158. L Jones
    159. JL Haines
    160. PA Holmans
    161. M Lathrop
    162. MA Pericak-Vance
    163. LJ Launer
    164. LA Farrer
    165. CM van Duijn
    166. C Van Broeckhoven
    167. V Moskvina
    168. S Seshadri
    169. J Williams
    170. GD Schellenberg
    171. P Amouyel
    172. European Alzheimer's Disease Initiative (EADI)
    173. Genetic and Environmental Risk in Alzheimer's Disease
    174. Alzheimer's Disease Genetic Consortium
    175. Cohorts for Heart and Aging Research in Genomic Epidemiology
    (2013)
    Nature Genetics 45:1452–1458.
    https://doi.org/10.1038/ng.2802
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
    Discovery and refinement of loci associated with lipid levels
    1. CJ Willer
    2. EM Schmidt
    3. S Sengupta
    4. GM Peloso
    5. S Gustafsson
    6. S Kanoni
    7. A Ganna
    8. J Chen
    9. ML Buchkovich
    10. S Mora
    11. JS Beckmann
    12. JL Bragg-Gresham
    13. HY Chang
    14. A Demirkan
    15. HM Den Hertog
    16. R Do
    17. LA Donnelly
    18. GB Ehret
    19. T Esko
    20. MF Feitosa
    21. T Ferreira
    22. K Fischer
    23. P Fontanillas
    24. RM Fraser
    25. DF Freitag
    26. D Gurdasani
    27. K Heikkilä
    28. E Hyppönen
    29. A Isaacs
    30. AU Jackson
    31. A Johansson
    32. T Johnson
    33. M Kaakinen
    34. J Kettunen
    35. ME Kleber
    36. X Li
    37. J Luan
    38. LP Lyytikäinen
    39. PK Magnusson
    40. M Mangino
    41. E Mihailov
    42. ME Montasser
    43. M Müller-Nurasyid
    44. IM Nolte
    45. JR O'Connell
    46. CD Palmer
    47. M Perola
    48. AK Petersen
    49. S Sanna
    50. R Saxena
    51. SK Service
    52. S Shah
    53. D Shungin
    54. C Sidore
    55. C Song
    56. RJ Strawbridge
    57. I Surakka
    58. T Tanaka
    59. TM Teslovich
    60. G Thorleifsson
    61. EG Van den Herik
    62. BF Voight
    63. KA Volcik
    64. LL Waite
    65. A Wong
    66. Y Wu
    67. W Zhang
    68. D Absher
    69. G Asiki
    70. I Barroso
    71. LF Been
    72. JL Bolton
    73. LL Bonnycastle
    74. P Brambilla
    75. MS Burnett
    76. G Cesana
    77. M Dimitriou
    78. AS Doney
    79. A Döring
    80. P Elliott
    81. SE Epstein
    82. GI Eyjolfsson
    83. B Gigante
    84. MO Goodarzi
    85. H Grallert
    86. ML Gravito
    87. CJ Groves
    88. G Hallmans
    89. AL Hartikainen
    90. C Hayward
    91. D Hernandez
    92. AA Hicks
    93. H Holm
    94. YJ Hung
    95. T Illig
    96. MR Jones
    97. P Kaleebu
    98. JJ Kastelein
    99. KT Khaw
    100. E Kim
    101. N Klopp
    102. P Komulainen
    103. M Kumari
    104. C Langenberg
    105. T Lehtimäki
    106. SY Lin
    107. J Lindström
    108. RJ Loos
    109. F Mach
    110. WL McArdle
    111. C Meisinger
    112. BD Mitchell
    113. G Müller
    114. R Nagaraja
    115. N Narisu
    116. TV Nieminen
    117. RN Nsubuga
    118. I Olafsson
    119. KK Ong
    120. A Palotie
    121. T Papamarkou
    122. C Pomilla
    123. A Pouta
    124. DJ Rader
    125. MP Reilly
    126. PM Ridker
    127. F Rivadeneira
    128. I Rudan
    129. A Ruokonen
    130. N Samani
    131. H Scharnagl
    132. J Seeley
    133. K Silander
    134. A Stancáková
    135. K Stirrups
    136. AJ Swift
    137. L Tiret
    138. AG Uitterlinden
    139. LJ van Pelt
    140. S Vedantam
    141. N Wainwright
    142. C Wijmenga
    143. SH Wild
    144. G Willemsen
    145. T Wilsgaard
    146. JF Wilson
    147. EH Young
    148. JH Zhao
    149. LS Adair
    150. D Arveiler
    151. TL Assimes
    152. S Bandinelli
    153. F Bennett
    154. M Bochud
    155. BO Boehm
    156. DI Boomsma
    157. IB Borecki
    158. SR Bornstein
    159. P Bovet
    160. M Burnier
    161. H Campbell
    162. A Chakravarti
    163. JC Chambers
    164. YD Chen
    165. FS Collins
    166. RS Cooper
    167. J Danesh
    168. G Dedoussis
    169. U de Faire
    170. AB Feranil
    171. J Ferrières
    172. L Ferrucci
    173. NB Freimer
    174. C Gieger
    175. LC Groop
    176. V Gudnason
    177. U Gyllensten
    178. A Hamsten
    179. TB Harris
    180. A Hingorani
    181. JN Hirschhorn
    182. A Hofman
    183. GK Hovingh
    184. CA Hsiung
    185. SE Humphries
    186. SC Hunt
    187. K Hveem
    188. C Iribarren
    189. MR Järvelin
    190. A Jula
    191. M Kähönen
    192. J Kaprio
    193. A Kesäniemi
    194. M Kivimaki
    195. JS Kooner
    196. PJ Koudstaal
    197. RM Krauss
    198. D Kuh
    199. J Kuusisto
    200. KO Kyvik
    201. M Laakso
    202. TA Lakka
    203. L Lind
    204. CM Lindgren
    205. NG Martin
    206. W März
    207. MI McCarthy
    208. CA McKenzie
    209. P Meneton
    210. A Metspalu
    211. L Moilanen
    212. AD Morris
    213. PB Munroe
    214. I Njølstad
    215. NL Pedersen
    216. C Power
    217. PP Pramstaller
    218. JF Price
    219. BM Psaty
    220. T Quertermous
    221. R Rauramaa
    222. D Saleheen
    223. V Salomaa
    224. DK Sanghera
    225. J Saramies
    226. PE Schwarz
    227. WH Sheu
    228. AR Shuldiner
    229. A Siegbahn
    230. TD Spector
    231. K Stefansson
    232. DP Strachan
    233. BO Tayo
    234. E Tremoli
    235. J Tuomilehto
    236. M Uusitupa
    237. CM van Duijn
    238. P Vollenweider
    239. L Wallentin
    240. NJ Wareham
    241. JB Whitfield
    242. BH Wolffenbuttel
    243. JM Ordovas
    244. E Boerwinkle
    245. CN Palmer
    246. U Thorsteinsdottir
    247. DI Chasman
    248. JI Rotter
    249. PW Franks
    250. S Ripatti
    251. LA Cupples
    252. MS Sandhu
    253. SS Rich
    254. M Boehnke
    255. P Deloukas
    256. S Kathiresan
    257. KL Mohlke
    258. E Ingelsson
    259. GR Abecasis
    260. Global Lipids Genetics Consortium
    (2013)
    Nature Genetics 45:1274–1283.
    https://doi.org/10.1038/ng.2797
  44. 44

Decision letter

  1. Gilean McVean
    Reviewing Editor; Oxford University, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your work entitled "Discovery and validation of sub-threshold GWAS loci using epigenomic signatures" for consideration by eLife. Your article has been favorably evaluated by Stylianos Antonarakis (Senior editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.

The reviewers have discussed the reviews with one another and the Reviewing editor has drafted this decision to help you prepare a revised submission.

Summary:

This paper presents a series of analyses that demonstrate clear evidence for enrichment of functionally-relevant genetic variants among sub genome-wide significant (GWS) loci from studies of heart parameters, and the value of using epigenetic signatures to further identify subsets that are likely to be real. The paper identifies one locus in particular and presents experimental data to support a hypothesis for a particular variant-gene interaction. Zebra-fish knock downs are also used to further support the involvement of a particular class of genes in QT interval / QRS duration. The authors suggest that the set of approaches they use have wide applicability. In the last couple of years several authors have demonstrated the value of epigenetic signatures as a way of homing in on signals in GWAS – or decomposing heritability. However, this is the first paper that we are aware of to follow up such work with experimental validation of sub-GWS signals.

There is a very large amount of work presented here and on the whole it is well-designed, logical and plausible. However, all three reviewers felt that there were substantial gaps in each of the pieces of work presented, which would have to be addressed in a revision. The key things that need to be revised are:

1) There has to be a quantitative attempt to estimate the FDR associated with the strategy of epigenetic prioritisation. It is very important for experimental researchers to know that a locus is 'real' before follow-up work starts – this is precisely why the GWS significance threshold is so stringent. An augmented set of experimental data along the lines presented here would enable this to be estimated.

2) In the luciferase and 4C experiments, only loci from the likely set are considered. In order to address the FDR issue, the same would have to be done for loci drawn from the no epigenetic signal set of loci with the same p value distribution. Such experiments would directly address the FDR issue.

3) In the analysis of matched, control SNPs, gene density needs to be accounted for.

4) All analyses should be carried out excluding GWS loci. In a paper about sub-GWS loci, including these is problematic.

5) In the fine-mapping example, we need to understand the prioritisation of rs1743292 better. There are other SNPs in LD with this in regulatory regions. Why were these not considered? We need to see a fine-mapping analysis of the signal with imputation (which is valid even though the region is not GWS) prior to focusing on the particular SNP.

6) In the morpholino experiments, there is apparently a lot of noise – such that the differences between those loci classed as positive and negative look relatively small. Additional replicates are needed to be convincing here.

7) There has to be a more up-to-date discussion of the relevant literature around biological signal lying in sub-GWS loci.

We appreciate that a lot is being asked for here and you may feel that such extensive revision is not feasible. One possibility is to remove some elements of the paper – e.g. the morpholino work – and focus on presenting a more complete story around the FDR part.

https://doi.org/10.7554/eLife.10557.030

Author response

There is a very large amount of work presented here and on the whole it is well-designed, logical and plausible. However, all three reviewers felt that there were substantial gaps in each of the pieces of work presented, which would have to be addressed in a revision. The key things that need to be revised are: 1) There has to be a quantitative attempt to estimate the FDR associated with the strategy of epigenetic prioritisation. It is very important for experimental researchers to know that a locus is 'real' before follow-up work starts – this is precisely why the GWS significance threshold is so stringent. An augmented set of experimental data along the lines presented here would enable this to be estimated.

We thank the reviewers for this helpful comment. We agree that it is important to demonstrate that sub-threshold loci prioritized by epigenetic signals have a sufficiently high “true positive” rate to warrant follow-up investigations. In accordance with the reviewers’ request, we assayed the allele-specific differences in activity for an additional 11 enhancers that include 8 new sub-threshold loci.

In total, we now test 22 enhancers from 18 sub-threshold loci, observing that 13 loci (72%) display allele- specific activity in reporter assays. We quantitatively estimate that between 51.1%-89.8% (95% Bayesian confidence interval) of prioritized sub-threshold loci show allele-specific activity on transcription. These results suggest that the majority of sub-threshold loci prioritized by epigenomic annotations represent attractive candidates for follow-up by experimental researchers (Figure 4B,D). We reproduce the text from the main paper:

“In total, we cloned allele-specific enhancer fragments from 22 cardiac enhancers that overlap SNPs from 18 independent sub-threshold loci, and performed quantitative luciferase assays in human iPSC-derived cardiomyocytes to determine whether the sub-threshold SNP genotypes influence enhancer activity (Methods). We observed that 13 of 18 sub-threshold loci (72.2%) contain an enhancer that drives luciferase activity in an allele-specific manner (Figure 4B,D, Figure 4—figure supplement 1). Moreover, we estimate that between 51.1%-89.8% (95% Bayesian confidence interval) of prioritized sub-threshold loci show allele-specific activity on transcription, suggesting that the majority of sub-threshold loci identified by epigenomic prioritization do in fact have an impact on transcriptional enhancer activity.”

We also address the FDR concern by combining independent lines of evidence indicating that sub-threshold loci prioritized by epigenomics are significantly more attractive candidates for further study than those lack a relevant epigenetic signature.

2) In the luciferase and 4C experiments, only loci from the likely set are considered. In order to address the FDR issue, the same would have to be done for loci drawn from the no epigenetic signal set of loci with the same p value distribution. Such experiments would directly address the FDR issue. The reviewers raise an important point regarding the importance of demonstrating that sub-threshold loci prioritized by epigenomics have greater biological plausibility than those with no epigenetic signal.

We followed the reviewers’ recommendations and tested five sub-threshold loci marked by little or no epigenetic signal in human heart tissue using luciferase reporter assays (it was not feasible for us to perform additional 4C-seq experiments in the time frame). Four of these five loci show no allelic difference in activity, while surprisingly, two haplotypes at the fifth sub-threshold locus (rs9504919) differ in enhancer activity by 15% (p=0.0047, Figure 4—figure supplement 2A).

We hypothesize that the allelic difference in activity at rs9504919 can be explained by the presence of an undiscovered cardiac enhancer. Specifically, the individuals contributing heart tissue to the Roadmap Epigenomics project are likely homozygous for the less active “C” allele at this locus (within the 1000 Genomes Project, 100% of 503 EUR individuals are homozygous C/C, and across all human populations, the C allele is present at a 95% frequency). Thus, the enhancer may only be detectable by epigenomic profiling studies if an individual with the correct genotype (A/C or A/A) is recruited. This touches upon a broader limitation of current epigenomic profiling studies that profile single “reference” individuals – these studies are unable to discover epigenomic signatures at sites where the individual carries a less active genotype, suggesting that more regulatory elements remain to be discovered by the profiling of the same tissues from additional individuals.

However, we respectfully wish to mention that the suggested experiment of testing the allele-specific activity of “negative control” sub-threshold loci can address the FDR issue. A negative result in this experiment (i.e. no allelic difference in activity) does not mean that these unmarked sub-threshold loci are not biologically relevant, but only that if they are “real”, they do not act through enhancer activity.

To better address the spirit of the reviewers’ comment on the effectiveness of using epigenetic signals to discriminate real sub-threshold loci from noise, we have re-written the text to emphasize two lines of evidence that support this point (subsection “Epigenomic prioritization discriminates biologically relevant sub-threshold loci”). We show that while all sub-threshold loci were identified using the same p-value cut-off of p<1x10-4, the loci marked by epigenetic signatures have significantly stronger GWAS signals in both the QT interval GWAS and a related GWAS for QRS duration (Figure 5A).

Furthermore, we demonstrate that genetic perturbation of genes nearby the prioritized sub-threshold loci is significantly more likely to lead to cardiac conduction or contractility abnormalities compared to “negative control” sub-threshold loci outside cardiac enhancer elements (Figure 5B). Notably, the tests in Figure 5 do not rely on assuming that sub-threshold loci affect enhancer activity, and thus the results demonstrate that epigenetic marks can prioritize biologically relevant sub-threshold loci.

3) In the analysis of matched, control SNPs, gene density needs to be accounted for. We thank the reviewers for this suggestion. We now include gene density (within a surrounding +/-500kb window) as a feature to control for in sampling the background distribution of control SNPs. We observe that the same patterns of greater enrichment of QT/QRS GWAS loci overlapping cardiac enhancers persist after account for gene density (Author response image 1 adapted from Figure 1B, but consistent patterns also observed everywhere else control SNPs were sampled for a background distribution).

4) All analyses should be carried out excluding GWS loci. In a paper about sub-GWS loci, including these is problematic. We apologize for the confusion regarding this important point. We started our study by identifying 112 QT/QRS GWAS loci from GWASs published up to July 2013 and used this set to discover the common characteristics of GWAS loci outlined in Figures 1 and 2 (e.g. cardiac enhancer overlap and the greater density of specific epigenetic marks). The GWS loci in Figure 3B that remain after removal of the 112 QT/QRS GWAS loci are loci that only reach genome-wide significance in the larger 2014 Arking et al.GWAS. These newly GWS loci were purposefully held out from the initial set of 112 QT/QRS loci so that they can act as an independent validation set for testing the effectiveness of epigenomic prioritization, as these newly GWS loci are almost certainly biologically real.

We have clarified this point in the text at multiple places:

”We also collected GWAS loci from a later meta-analysis of QT interval studies, published in June 2014 by Arking et al., which we held out from the aforementioned 112 QT/QRS loci as a validation dataset for subsequent analyses.”

“To test this idea, we used SNP summary statistics from the Arking et al.(2014) QT interval GWAS study we had earlier held out as a validation dataset (Arking et al., 2014). These summary statistics include the 112 QT/QRS loci identified by prior GWASs (red dots, bottom, Figure 3), as well as loci that reach genome-wide significance in the larger meta-analysis cohort but were not discovered in any previous GWAS (and therefore were not included in the 112 QT/QRS loci used for enrichment analyses above, gold dots, bottom, Figure 3).”

However, to address the reviewers’ concern regarding the additional GWS loci, we have performed the same enrichment analysis after also excluding the held-out GWS and any SNPs within +/- 1Mb. The plot is included in Author response image 2, and we observe the same enrichment pattern as in Figure 3.

5) In the fine-mapping example, we need to understand the prioritisation of rs1743292 better. There are other SNPs in LD with this in regulatory regions. Why were these not considered? We need to see a fine-mapping analysis of the signal with imputation (which is valid even though the region is not GWS) prior to focusing on the particular SNP. We thank the editors and reviewers for this suggestion. Unfortunately, we do not have access to the individual-level genotype information to perform statistical fine-mapping analysis. However, to address the reviewers’ concern, in the revised manuscript we now consider all putative causal SNPs in the rs1743292 locus as determined by the combination of GWAS signal and epigenetic signal. As described in the subsection “Sub-threshold locus at rs1743292/rs1772203 functionally disrupts enhancer activity”, the rs1743292 locus contains 15 SNPs (r2≥0.8). 8 of 15 have a p-value reported by Arking et al.of less than 1x10-4 and an additional 2 do not have reported p-values. We considered these 10 SNPs as possible causal SNPs in the locus, and in our revised manuscript we study the 3 SNPs out of 10 that overlap active LV enhancers (rs1743292, rs112332323, rs1772203).

As also described in the aforementioned subsection, both the rs1743292/rs112332323 enhancer and the rs1772203 enhancer show allele-specific enhancer activity, and both enhancers form detectable 3D interactions with the BVES and POPDC3 promoters (Figure 6A, C, E, Gg).

Thus, we now believe that genetic variation in the rs1743292 locus affects both enhancers, and that rs1772203 is an additional potential causal SNP that contributes to the activity of this locus.

6) In the morpholino experiments, there is apparently a lot of noise – such that the differences between those loci classed as positive and negative look relatively small. Additional replicates are needed to be convincing here. We thank the reviewers for the suggestion. It was not feasible for us to perform additional morpholino experiments in the time frame, so we followed the editors’ suggestion below and moved most of the zebrafish section from the main text to a supplemental figure (Figure 6 —figure supplement 2). In the revised manuscript, we now focus primarily on the organism-level phenotype validation of the rs1743292 locus using mouse phenotypes, showing that loss of Bves, but not Popdc3 or Prep, leads to cardiac contractility and conduction defects (Figure 6G). However, we still present the zebrafish optical voltage mapping results in the supplemental figure because we believe the phenotypic agreement between bves knockdown in zebrafish and Bves knockout in mouse strengthens our case for BVES as the causal gene in the locus.

7) There has to be a more up-to-date discussion of the relevant literature around biological signal lying in sub-GWS loci. We thank the reviewers for this suggestion. In the revised manuscript we now include an up-to-date discussion of the literature around sub-GWS loci. Briefly, three recent studies (PMIDs: 22955828, 26414678, 25439723) have observed the general enrichment of either sub-threshold SNPs or SNPs that explain a disproportionately high amount of heritability in cell type-specific regulatory elements, however to our knowledge no study has moved beyond this observation to investigate any individual sub- threshold locus or tracked down the relevant genes or underlying pathways. This latter point is critical, as the editors and reviewers mention in point #1: it is “very important for experimental researchers to know that a locus is 'real' before follow-up work starts”.

We have added the following to the Discussion: “Studies of genetic heritability have indicated that many additional loci lie below the genome-wide significance threshold (Yang et al., 2011). […] The work presented represents a first step towards deciphering this signal and opens the door for the discovery of greater numbers of disease loci, genes, and pathways.”

https://doi.org/10.7554/eLife.10557.031

Article and author information

Author details

  1. Xinchen Wang

    1. Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
    2. Broad Institute of MIT and Harvard, Cambridge, United States
    3. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    XW, Designed the project, Performed human genetics and epigenomics analyses, Performed 4C-seq experiments and analysis, Constructed luciferase reporter constructs and performed human cardiomyocyte luciferase assays, Wrote the manuscript, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
    ORCID icon 0000-0002-5198-6581
  2. Nathan R Tucker

    Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    Contribution
    NRT, Performed zebrafish morpholino experiments, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  3. Gizem Rizki

    Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    GR, Performed 4C-seq experiments and analysis, Constructed luciferase reporter constructs and performed human cardiomyocyte luciferase assays, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  4. Robert Mills

    Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    Contribution
    RM, Performed zebrafish morpholino experiments, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  5. Peter HL Krijger

    1. Hubrecht Institute-KNAW, University Medical Center Utrecht, Utrecht, Netherlands
    2. University Medical Center Utrecht, Utrecht, Netherlands
    Contribution
    PHLK, Performed 4C-seq experiments and analysis, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  6. Elzo de Wit

    1. Hubrecht Institute-KNAW, University Medical Center Utrecht, Utrecht, Netherlands
    2. University Medical Center Utrecht, Utrecht, Netherlands
    Contribution
    EdW, Performed 4C-seq experiments and analysis, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  7. Vidya Subramanian

    Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    VS, Constructed luciferase reporter constructs and performed human cardiomyocyte luciferase assays, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  8. Eric Bartell

    Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    EB, Constructed luciferase reporter constructs and performed human cardiomyocyte luciferase assays, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  9. Xinh-Xinh Nguyen

    Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    Contribution
    X-XN, Performed zebrafish morpholino experiments, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  10. Jiangchuan Ye

    Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    Contribution
    JY, Performed zebrafish morpholino experiments, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  11. Jordan Leyton-Mange

    Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    Contribution
    JL-M, Performed zebrafish morpholino experiments, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  12. Elena V Dolmatova

    Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    Contribution
    EVD, Performed zebrafish morpholino experiments, Acquisition of data, Analysis and interpretation of data
    Competing interests
    The authors declare that no competing interests exist.
  13. Pim van der Harst

    1. Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
    2. Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
    Contribution
    PvdH, Acquisition of data, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  14. Wouter de Laat

    1. Hubrecht Institute-KNAW, University Medical Center Utrecht, Utrecht, Netherlands
    2. University Medical Center Utrecht, Utrecht, Netherlands
    Contribution
    WdL, Performed 4C-seq experiments and analysis, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  15. Patrick T Ellinor

    1. Broad Institute of MIT and Harvard, Cambridge, United States
    2. Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    Contribution
    PTE, Performed zebrafish morpholino experiments, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    Competing interests
    The authors declare that no competing interests exist.
  16. Christopher Newton-Cheh

    1. Broad Institute of MIT and Harvard, Cambridge, United States
    2. Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    3. Center for Human Genetic Research, Massachusetts General Hospital, Boston, United States
    Contribution
    CN-C, Provided association results and analyses for sub-threshold QT interval loci, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents
    Competing interests
    The authors declare that no competing interests exist.
  17. David J Milan

    Cardiovascular Research Center, Massachusetts General Hospital, Boston, United States
    Contribution
    DJM, Performed zebrafish morpholino experiments, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article
    For correspondence
    dmilan@mgh.harvard.edu
    Competing interests
    The authors declare that no competing interests exist.
  18. Manolis Kellis

    1. Broad Institute of MIT and Harvard, Cambridge, United States
    2. Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    MK, Designed the project, Wrote the manuscript, Analysis and interpretation of data
    For correspondence
    manoli@mit.edu
    Competing interests
    The authors declare that no competing interests exist.
  19. Laurie A Boyer

    Department of Biology, Massachusetts Institute of Technology, Cambridge, United States
    Contribution
    LAB, Designed the project, Wrote the manuscript
    For correspondence
    lboyer@mit.edu
    Competing interests
    The authors declare that no competing interests exist.

Funding

Canadian Institutes of Health Research (Doctoral Foreign Research Award)

  • Xinchen Wang

Fondation Leducq (14CVD01)

  • Wouter de Laat
  • Patrick T Ellinor

National Institutes of Health (R01HL109004)

  • David J Milan

National Institutes of Health (R01HG004037)

  • Manolis Kellis

National Institutes of Health (R01HG008155)

  • Manolis Kellis

National Institutes of Health (R01GM113708)

  • Manolis Kellis

National Institutes of Health (U41HG007000)

  • Manolis Kellis

National Institutes of Health (U01HL098179)

  • Laurie A Boyer

American Heart Association (15GRNT25670044)

  • Laurie A Boyer

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We are thankful to E Gjoneska for advice on mammalian luciferase assays, P. Kheradpour for advice on TF motif analysis, G Quon and J. Wamstad for advice on epigenomics analyses and D Altshuler, L. Ward and V Agarwala for advice on human genetics analyses. We are grateful to the members of the Boyer and Kellis laboratories for helpful suggestions and discussions.

Ethics

Animal experimentation: Zebrafish (TuAB strain) were cared for according to standard techniques. All animal experiments were approved by the Partners Subcommittee on Research Animal Care (SRAC, protocol #2005N000025) and were conducted in compliance with the regulations published in the US National Institute of Health Guide for the Care and Use of Laboratory Animals.

Reviewing Editor

  1. Gilean McVean, Reviewing Editor, Oxford University, United Kingdom

Publication history

  1. Received: October 5, 2015
  2. Accepted: April 4, 2016
  3. Version of Record published: May 10, 2016 (version 1)

Copyright

© 2016, Wang et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 3,431
    Page views
  • 682
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Comments

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Developmental Biology and Stem Cells
    2. Genomics and Evolutionary Biology
    Jian Ming Khor, Charles A Ettensohn
    Research Article
    1. Biophysics and Structural Biology
    Jing Li et al.
    Research Article Updated