Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes
Abstract
One life-threatening outcome of cardiovascular disease is myocardial infarction, where cardiomyocytes are deprived of oxygen. To study inter-individual differences in response to hypoxia, we established an in vitro model of induced pluripotent stem cell-derived cardiomyocytes from 15 individuals. We measured gene expression levels, chromatin accessibility, and methylation levels in four culturing conditions that correspond to normoxia, hypoxia, and short- or long-term re-oxygenation. We characterized thousands of gene regulatory changes as the cells transition between conditions. Using available genotypes, we identified 1,573 genes with a cis expression quantitative locus (eQTL) in at least one condition, as well as 367 dynamic eQTLs, which are classified as eQTLs in at least one, but not in all conditions. A subset of genes with dynamic eQTLs is associated with complex traits and disease. Our data demonstrate how dynamic genetic effects on gene expression, which are likely relevant for disease, can be uncovered under stress.
Introduction
Cardiovascular disease (CVD), which ultimately damages heart muscle, is a leading cause of death worldwide (WHO, 2018). CVD encompasses a range of pathologies including myocardial infarction (MI), where ischemia or a lack of oxygen delivery to energy-demanding cardiomyocytes results in cellular stress, irreparable damage, and cell death. Genome-wide association studies (GWAS) have identified hundreds of loci associated with coronary artery disease (Nikpay et al., 2015), MI, and heart failure (Shah et al., 2020), indicating the potential contribution of specific genetic variants to disease risk. Most disease-associated loci do not localize within coding regions of the genome, often making inference about the molecular mechanisms of disease challenging. That said, because most GWAS loci fall within non-coding regions, these variants are thought to have a role in regulating gene expression. One of the main goals of the Genotype-Tissue Expression (GTEx) project has been to bridge the gap between genotype and organismal level phenotypes by identifying associations between genetic variants and intermediate molecular level phenotypes such as gene expression levels (GTEx Consortium et al., 2017). The GTEx project has identified tens of thousands of expression quantitative trait loci (eQTLs); namely, variants that are associated with changes in gene expression levels, across dozens of tissues including ventricular and atrial samples from the heart. However, the eQTLs reported by GTEx explain a modest proportion of GWAS loci, and while increasing the diversity of tissues and sample sizes will enable further insight, orthogonal approaches also need to be considered.
It is becoming increasingly evident that many genetic variants that are not associated with gene expression levels at steady state, may be found to impact dynamic programs of gene expression in specific contexts. This includes specific developmental stages (Cuomo et al., 2020; Strober et al., 2019), or specific exposure to an environmental stimulus such as endoplasmic reticulum stress (Dombroski et al., 2010), hormone treatment (Maranville et al., 2011), radiation-induced cell death (Smirnov et al., 2012), vitamin D exposure (Kariuki et al., 2016), drug-induced cardiotoxicity (Knowles et al., 2018), and response to infection (Alasoo et al., 2018; Barreiro et al., 2012; Çalışkan et al., 2015; Kim-Hellmuth et al., 2017; Manry et al., 2017; Nédélec et al., 2016). The studies of context-specific dynamic eQTLs highlight the need to determine the effects of genetic variants in the relevant environment. Therefore, if we are to fully understand the effects of genetic variation on disease, we must assay disease-relevant cell types and disease-relevant perturbations. Most of the aforementioned studies were performed in whole blood or immune cells, which means that there are many cell types and disease-relevant states that have yet to be explored.
With advances in pluripotent stem cell technology, we can now generate otherwise largely inaccessible human cell types through directed differentiation of induced pluripotent stem cells (iPSCs) reprogrammed from easily accessible tissues such as fibroblasts or B-cells. One of the advantages of iPSC-derived cell types as a model system is that the environment can be controlled, and thus we can specifically test for genetic effects on molecular phenotypes in response to controlled perturbation. This is particularly useful for studies of complex diseases such as CVD, which result from a combination of both genetic and environmental factors.
The heart is a complex tissue consisting of multiple cell types, yet the bulk of the volume of the heart is comprised of cardiomyocytes (Donovan et al., 2019; Pinto et al., 2016), which are particularly susceptible to oxygen deprivation given their high metabolic activity. iPSC-derived cardiomyocytes (iPSC-CMs) have been shown to be a useful model for studying genetic effects on various cardiovascular traits and diseases, as well for studying gene regulation (Banovich et al., 2018; Benaglio et al., 2019; Brodehl et al., 2019; Burridge et al., 2016; de la Roche et al., 2019; Ma et al., 2018; McDermott-Roe et al., 2019; Panopoulos et al., 2017; Pavlovic et al., 2018; Ward and Gilad, 2019).
In humans, coronary artery disease can lead to MI (Dzau et al., 2006) which results in ischemia and a lack of oxygen delivery to energy-demanding cardiomyocytes. Given the inability of cardiomyocytes to regenerate, this cellular stress ultimately leads to tissue damage. Advances in treatment for MI, such as surgery to restore blood flow and oxygen to occluded arteries, have improved clinical outcomes. However, a rapid increase in oxygen levels post-MI can generate reactive oxygen species leading to ischemia-reperfusion (I/R) injury (Giordano, 2005). Both MI and I/R injury can thus ultimately influence the amount of damage in the heart. iPSC-CMs allow us to mimic the I/R injury process in vitro by manipulating the oxygen levels that cardiomyocytes are exposed to in vivo.
We thus designed a study aimed at developing an understanding of the genetic determinants of the response to a universal cellular stress, oxygen deprivation, in a disease-relevant cell type, mimicking a disease-relevant process. To do so, we established an in vitro model of oxygen deprivation (hypoxia) and re-oxygenation in a panel of iPSC-CMs from 15 genotyped individuals (Banovich et al., 2018). We collected data for three molecular level phenotypes: gene expression, chromatin accessibility, and DNA methylation to understand both the genetic and regulatory responses to this cellular stress. This framework allowed us to identify eQTLs that are not evident at steady state, and assess their association with complex traits and disease.
Results
We differentiated iPSC-CMs from iPSCs of 15 Yoruba individuals that were part of the HapMap project (Banovich et al., 2018). To obtain a measure of variance associated with the differentiation process, and to more effectively account for batch effects, we replicated the iPSC-CM differentiation from three individuals three times, yielding 21 differentiation experiments in total. The proportion of iPSC-CMs in each cell culture was enriched by metabolic purification (see Materials and methods). On Day 20, cardiomyocyte differentiation cultures from each individual were split into sub-cultures for each of the four subsequent oxygen conditions and for assessment of cardiomyocyte purity (Figure 1—figure supplement 1). iPSC-CMs were matured by electrical pulsing and maintenance in cell culture for 30 days. On Day 30 (+/- 1 day), the median cardiomyocyte purity from one representative sub-culture across individuals was 81% (40–97% range), determined by flow cytometry as the proportion of cells that were positive for the cardiac-specific marker, TNNT2 (Figure 1—figure supplement 2; Supplementary file 1; see Materials and methods).
We studied the response of the iPSC-CMs to hypoxia and re-oxygenation (Figure 1A). To do so, we first cultured the iPSC-CMs at oxygen levels that are close to physiological oxygen levels (10% oxygen - Condition A) for 7 days. We then subjected the iPSC-CMs to 6 hours of hypoxia (1% oxygen - Condition B), followed by re-oxygenation for 6 hours (10% oxygen - Condition C), or 24 hours (10% oxygen - Condition D) as previously described (Ward and Gilad, 2019). Oxygen levels were reproducibly controlled in cell culture (Figure 1B, Supplementary file 1). In order to determine whether the cardiomyocytes were affected by the changes in oxygen levels, we measured the enzymatic activity of released lactate dehydrogenase throughout the experiment, as a proxy for cytotoxicity. We also measured released BNP, a clinical marker of heart failure (Maeda et al., 1998). As expected, both cytotoxicity (p=0.01, Figure 1—figure supplement 3A–B) and BNP (p=5×10−6, Figure 1—figure supplement 3C) levels increased following hypoxia and long-term re-oxygenation.

Study design to test the response of iPSC-derived cardiomyocytes to hypoxia and re-oxygenation.
(A) Experimental design of the study. Cardiomyocytes differentiated from iPSCs (iPSC-CMs) from 15 Yoruba individuals were cultured in normoxic conditions (10% oxygen - condition A, brown) and subjected to 6 hr of hypoxia (1% oxygen - condition B, blue) followed by 6 and 24 hr of re-oxygenation (10% oxygen - conditions C [coral] and D [red]). Immunocytochemistry of a representative cardiomyocyte culture where green: TNNT2; blue: nuclei. (B) Peri-cellular oxygen levels of each condition. Each data point represents one individual undergoing the oxygen stress experiment. (C) Molecular phenotypes collected from each individual in each condition.
With this system established, we sought to understand the contribution of the global gene regulatory response to the molecular and cellular response to hypoxia and re-oxygenation. To do so, we collected global gene expression data (using RNA-seq; n = 15 individuals), chromatin accessibility data (using ATAC-seq; n = 14), and DNA methylation data (using the EPIC arrays; n = 13; Figure 1C) in each condition. With these data we studied both the gene regulatory response to oxygen perturbation, as well as the interaction of the response with the underlying genotype of the assayed individuals.
Gene expression changes in response to hypoxia and re-oxygenation
We first sought to identify those genes important for regulating the response by analyzing the gene expression (RNA-seq) data. We processed samples in batches as described in Supplementary file 1 and mapped and filtered sequencing reads to prevent allelic mapping biases (Figure 2—figure supplement 1; Supplementary file 1; see Materials and methods; van de Geijn et al., 2015). We observed that one sample (18852A) was a clear outlier when comparing read counts for 18,226 autosomal genes across all samples, and thus excluded it from further analysis (Figure 2—figure supplement 2). We filtered out genes with low expression levels (see Materials and methods) to yield a final set with data from 12,347 expressed genes (see Materials and methods). We performed a number of correlation-based analyses using the data from the technical replicates (Figure 2—figure supplement 3), and confirmed that the quality of the data is high and that, in line with our flow cytometry data, our iPSC-CMs express a range of cardiomyocyte marker genes across conditions including MYH7 and TNNT2 (Figure 2—figure supplement 4).
We took advantage of the fact that we have replicate experiments from three individuals to correct the data for unwanted variation (see Materials and methods; Risso et al., 2014). Following this procedure, our samples clustered both by oxygen level and individual (Figure 2—figure supplement 5). To identify genes that respond to hypoxia and re-oxygenation, we first tested for differential expression between pairs of conditions using a linear model with a fixed effect for ‘condition’, a random effect for ‘individual’, and four unwanted factors of variation, learned from the data, as covariates. At an FDR of 10%, we identified thousands of genes that are differentially expressed between conditions (A vs. B = 4,983, B vs. C = 6,311; B vs. D = 6,792; A vs. D = 2,835; Figure 2 and Figure 2—figure supplement 6A). In order to identify a single set of genes which respond to hypoxia across conditions, we used Cormotif (Wei et al., 2015) to jointly model pairs of tests. This approach led to the classification of 2,113 genes (17% of all expressed genes) as responding to hypoxia, which we term ‘response genes’ (Figure 2, Figure 2—figure supplement 6B–C), and 9,949 genes that do not change their expression level throughout the experiment which we term ‘non-response genes’. Response genes are enriched for genes previously identified to respond to hypoxia in a Caucasian population of individuals (Chi-squared test; p<2.2×10−16; Ward and Gilad, 2019), and are highly enriched for gene ontologies in transcription-related processes (see Materials and methods, modified Fisher’s exact test; p=1×10−19).

Thousands of gene expression changes occur following hypoxia and re-oxygenation.
(A) Volcano plots representing genes that are differentially expressed (DE) between pairs of conditions (orange dots). (B) Bayesian information criterion (BIC) at increasing numbers of Cormotif correlation motifs following joint modeling of pairs of tests. Genes with a posterior probability of being differentially expressed p>0.5 across all tests are defined as ‘response genes’ and those with p<0.5 as ‘non-response genes’. Inset shows the proportion of all expressed genes that are classified as response genes (green), and non-response genes (magenta). (C) Expression levels of JUN, a response gene, during the course of the experiment. (D) Expression levels of ACTN1, a non-response gene.
Dynamic eQTLs are revealed following hypoxia
Having established that oxygen stress initiates a transcriptional response affecting thousands of genes, we sought to identify eQTLs, either before or after oxygen stress. Using the combined haplotype test (CHT), an approach that leverages allele-specific information in small sample sizes (see Materials and methods; van de Geijn et al., 2015), we identified 1,886 SNPs which associate with gene expression (eSNPs) in at least one condition resulting in 1,573 genes with eQTLs (eGenes) in at least one condition (q-value <0.1; A: 613; B: 564; C: 564 and D: 464; Figure 3A–B; Supplementary file 2). Given that cardiomyocytes were split from a single differentiation culture for each condition, we do not expect cell composition to bias the identification of eQTLs in each condition. Indeed, the distribution of eQTL effect sizes is similar across conditions (Figure 3—figure supplement 1A), and the eQTL effect size values are correlated across conditions (Figure 3—figure supplement 1B). In addition, we included the same number of individuals in each condition, RNA extraction batches included all conditions from a given individual, RIN scores are similar across conditions, and the number of sequencing reads are similar across conditions (Supplementary file 1). Together, these results suggest that our overall power to detect eQTLs is similar across conditions. We refer to the 613 eGenes identified in condition A as ‘baseline eGenes’.

Hundreds of dynamic eQTLs are revealed following hypoxia and re-oxygenation.
(A) QQ plot illustrating an enrichment of associations between genetic variants and gene expression levels in each condition. Numbers represent the number of eGenes in each condition. (B) Union of eQTLs identified in each condition. Each row represents a SNP that is an eQTL in at least one condition. Color represents the strength of the association p-value. (C) Heatmap illustrating the 367 SNPs that are classified as dynamic eQTLs including 330 induced eQTLs and 37 suppressed eQTLs. (D) An example of a shared eQTL, HEPB2. (E) Examples of each of the two dynamic eQTL categories. Top panel: genes that become an eQTL following hypoxia – induced eQTLs e.g. ZNF845. Bottom panel: genes that are an eQTL in normoxic condition A but not following hypoxia – suppressed eQTLs e.g. RFC2. (F) The proportion of all genes, baseline eGenes (all those identified in condition A), and induced eGenes that are also response genes, and transcription factor genes curated by Lambert et al. Asterisk denotes a statistically significant difference between gene categories (*p<0.05).
Our goal was to identify dynamic eQTLs, which are either revealed or suppressed as the cells transition between conditions. Due to the small sample size of our study, we have incomplete power to detect eQTLs in any condition; thus, a naive comparison of eQTLs classified as ‘significant’ across conditions will result in an over-estimation of the number of dynamic eQTLs. Indeed, the p-values of genes whose expression levels are not significantly associated with a SNP in any condition deviate from the expected uniform distribution (Kolmogorov–Smirnov test; p<2.2×10−16; Figure 3—figure supplement 2A). To address this challenge, we first considered eQTLs identified using a q-value <0.1 in at least one condition, and visualized the p-value distributions of the corresponding eQTL associations in all other conditions. These p-value distributions are expected to be uniform if we had complete power to detect eQTLs in any condition (because in that theoretical case, even a naive comparison of eQTLs classified as ‘significant’ across conditions will result in the identification of true condition-specific eQTLs). Due to incomplete power, this is clearly not the case (KS test; p<2.2×10−16; Figure 3—figure supplement 2B); however, this distribution allowed us to choose a lenient secondary p-value-based cutoff, where values deviate from the uniform distribution, to classify dynamic QTLs (p=0.15; Figure 3—figure supplement 2B).
We specifically focused on two dynamic scenarios. First, we defined suppressed eQTLs, as eQTLs that are identified in condition A at a q-value <0.1 but not in any of the other conditions, with a p-value greater than 0.15 (37 instances; Figure 3C). Second, we defined induced eQTLs as eQTLs identified in conditions B, C, or D at a q-value <0.1, but not in A, with a p-value greater than 0.15 (330 instances; Figure 3C). This set of 367 dynamic eQTLs corresponds to 328 unique dynamic eGenes (see Materials and methods) and includes ZNF845 and RFC2 (Figure 3E) as well as PPARGC1, C8orf82 and CELF1 (Figure 3—figure supplement 3). While our choice of the particular statistical cutoffs is somewhat arbitrary, we can evaluate the false discovery rate associated with our chosen cutoff. Based on the p-value distributions of the corresponding eQTL associations in all other conditions, we estimate that our approach to classify dynamic eQTLs is associated with a false discovery rate of 48%. The relatively high FDR associated with our choice of statistical cutoffs does not indicate that these loci are not eQTLs; rather it means that if we had a larger sample size, roughly half of our dynamic eQTLs should have been classified as eQTLs in more conditions, potentially in all of them. As expected, using a more stringent p-value cutoff of 0.9, we find fewer dynamic eQTLs (19) and a lower false discovery rate of 7%.
We next wanted to determine whether the dynamic eQTLs we identified in iPSC-CMs are also eQTLs in primary heart tissue. To do so, we compared our 367 dynamic eSNPs and eSNPs regulating the same gene in all four conditions (‘shared eQTLs’, n = 20) to eQTLs identified in left ventricle heart tissue and 49 tissues assayed from hundreds of individuals in the GTEx study (GTEx Consortium et al., 2017). Thirty-two of 326 dynamic eSNP-eGene pairs tested in GTEx are eQTLs in heart tissue (9.8%), and 140 are eQTLs in at least one other assayed tissue (42.9%) demonstrating that many of these genes have context-dependent inter-individual variation in expression. Six of 19 shared eSNP-eGene pairs tested in GTEx are eQTLs in heart tissue (31.6%), and six are an eQTL in at least one other tissue (31.6%). Shared eSNPs are therefore more likely to be eSNPs in heart tissue than dynamic eSNPs (Chi-squared test; p=0.01). To further investigate eQTL concordance, we compared the eQTL effect size of our dynamic and shared eQTLs to the eQTL effect size in heart tissue determined by the GTEx consortium. Shared eQTLs overlapping heart tissue eQTLs (n = 6) have a higher concordance of effect than dynamic eQTLs overlapping heart tissue eQTLs (n = 32; Spearman correlation = 0.73 vs. 0.12) suggesting that our perturbation study has revealed novel eQTL effects.
Dynamic eGenes are enriched for response genes and transcription factors
To determine whether induced eGenes coincide with expression changes of the same genes following hypoxia, we integrated the results of our eQTL and differential expression analyses. For this analysis, we compared eGenes identified in condition A, baseline eGenes, with the set of induced eGenes. In line with our previous findings, baseline eGenes, as well as left ventricle (LV) and atrial appendage (AA) eGenes found in primary heart tissue (GTEx), are depleted for response genes (Chi-squared test; p<0.02, Ward and Gilad, 2019). However, we found a significant enrichment in response genes amongst induced eGenes (57 of 277 genes; Chi-squared test; p=0.01, Figure 3F) when compared to baseline eGenes, suggesting that induced eQTLs often impact the regulation of genes that respond to hypoxic stress.
Given that thousands of genes are differentially expressed in response to hypoxia, and many of these genes correspond to transcription-related processes, we next investigated the role of 1,639 genes annotated as transcription factors in humans, which may drive the transcriptome changes we observe (Lambert et al., 2018). We found a significant enrichment of genes annotated as transcription factors amongst the genes responding to oxygen stress compared to non-response genes (327 of 1,639 annotated human TFs, chi-squared test; p<2×10−16; Figure 3F). Given that stress affects transcription factor gene expression, we asked whether induced eGenes are also enriched for transcription factor genes. Indeed, genes annotated as transcription factors are enriched in induced eGenes compared to baseline eGenes (32 TFs; p=0.004), including MITF and PPARA, both of which are TFs that have been previously implicated in hypoxic response (Feige et al., 2011; Narravula and Colgan, 2001). TFs amongst induced eGenes are more likely to be response genes than non-response genes (p=0.02); however, there is no difference in the proportion of response genes amongst baseline eGenes annotated as TFs and induced eGenes annotated as TFs (38% vs. 39%). The enrichment we observe in TFs amongst our induced eGenes suggests that latent genetic variation can have multiple downstream effects on gene expression including gene targets of TF genes. This could provide a mechanism for the appearance of induced eGenes that are neither response genes nor TF genes.
Chromatin accessibility changes following hypoxia and re-oxygenation
We next asked whether the hundreds of transcription factor expression changes following hypoxic stress are accompanied by global chromatin accessibility changes. To examine this, we performed ATAC-seq experiments in all four conditions to identify regions of open chromatin (we were only able to collect these data from 14 of the 15 individuals; Figure 1—figure supplement 1, see Materials and methods). We filtered the ATAC-seq reads to include only those reads that map unambiguously to the nuclear genome (see Materials and methods, Figure 4—figure supplement 1). We identified a set of open chromatin regions in each sample, and merged samples across individuals within each condition. Genomic regions identified as accessible in each condition were then merged to yield a set of 128,672 open chromatin regions across conditions (with a median length of 312 bp). Regions with low read counts were filtered out, resulting in a final set of 110,128 regions. Analysis of various metrics revealed the data to be of good quality (Figure 4—figure supplements 2–3).
We sought to identify chromatin regions that are differentially accessible across pairs of conditions. Using a sensitive adaptive shrinkage-based approach with a False Sign Rate of 10% (Stephens, 2017), we could not detect changes in accessibility between baseline and hypoxia across 110,128 regions; however, we identified 831 differentially accessible regions (DARs) between hypoxia and short-term re-oxygenation (BC-DARs; 429 regions with increased accessibility and 402 with decreased accessibility), and 71 DARs between hypoxia and long-term re-oxygenation (BD-DARs; Figure 4A). Despite the fact that we do not identify any DARs between normoxia and hypoxia, there is a strong anti-correlation in the effect size between the normoxia (A) and hypoxia (B) conditions, and the hypoxia (B) and re-oxygenation (C) conditions across regions (Spearman correlation = −0.62; sign test p=4.6×10−14; Figure 4B) suggesting minor changes in accessibility in response to hypoxia that do not meet our significance threshold. Conversely, there is a strong correlation in effect sizes between hypoxia and short-term re-oxygenation (BC-DARs), and hypoxia and long-term re-oxygenation (BD-DARs; Spearman correlation = 0.74; Figure 4C), and 59 of the 71 BD-DARs are amongst the 831 BC-DARs (83%), suggesting that most regions have returned to baseline levels of accessibility by the first re-oxygenation condition. This includes a region within the intron of the FOXO1 gene, a master regulator of the oxidative stress response (Figure 4D). We therefore considered the 831 BC-DARs, henceforth DARs, in further analysis. These DARs represent 0.8% of the total number of accessible regions.

Hundreds of chromatin accessibility changes occur following hypoxia and re-oxygenation.
(A) Estimated effect size (Beta) following adaptive shrinkage (ash) between each pair of conditions, and the numbers of chromatin regions that are differentially accessible (DARs) between pairs of conditions. (B) Estimated Beta for each region when comparing A vs. B and B vs. C. (C) Estimated Beta for each region when comparing B vs. C, and B vs. D. (D) Chromatin accessibility levels at a chromatin region within a FOXO1 intron. (E) Expression levels of the hypoxia-responsive gene ADM following hypoxia, and chromatin accessibility levels at a DAR, overlapping an induced HIF1α-bound region, within 500 bp of the ADM gene.
Linking chromatin accessibility changes with gene expression changes
We next sought to integrate our gene expression data with our chromatin accessibility data. We found that when considering a 50 kb window around the TSS of expressed genes, DARs are enriched near response genes compared to non-response genes (Chi-squared test; p=0.03). Of 2,113 response genes, 113 have a DAR within 50 kb of the TSS. This set includes an accessible region, overlapping a HIF1α site, within 500 bp of the 3’ end of the classic hypoxia response gene, ADM (Figure 4E).
We asked whether the changes in chromatin accessibility coincide with the appearance of dynamic eQTLs. We found that DARs are no more likely to be near dynamic eGenes than shared or baseline eGenes. In line with previous estimates of the proportion of eQTLs in open chromatin regions, 24 baseline eQTL SNPs (613 total SNPs) and 19 dynamic eQTL SNPs (367 total SNPs) overlap with accessible chromatin regions (GTEx Consortium et al., 2017). One dynamic eQTL SNP overlaps a DAR, near the actin filament binding protein gene, FGD4. This gene was also shown to be differentially expressed between children with congenital heart defects where the defect leads to a chronic hypoxic state (cyanotic disease), and children with a similar defect but where oxygen levels are not affected (acyanotic disease; Ghorbel et al., 2010).
Genetic variants within transcription factor binding sites can influence chromatin accessibility at these regions, leading to effects on gene expression. To directly test whether there are genetic effects on chromatin accessibility, independent of gene expression, we sought to identify chromatin accessibility QTLs (caQTLs) that is genetic variants located within the 128,672 accessible regions, which coincide with different levels of accessibility based on genotype. We identified few caQTLs per condition (q-value <0.1; A: 10, B: 1, C: 7, D: 6; Figure 4—figure supplement 4A). Six of these caQTLs are classified as dynamic caQTLs that is induced or suppressed in response to hypoxia using the same definitions as used for the dynamic eQTLs, and include regions at the TSS of the mRNA decapping enzyme gene DCPS, and a region within 100 kb of the C1orf99 gene (Figure 4—figure supplement 4B–C).
Genomic features associated with differentially accessible regions (DARs)
We next wanted to determine what distinguishes DARs from constitutively accessible regions (CARs). To do so, we investigated three classes of genomic features: (1) promoter- and enhancer-associated marks, (2) transcription factor binding locations, and (3) underlying DNA sequence features. We found that DARs are more likely to overlap TSS than constitutively accessible regions (43% overlap vs. 11% overlap; chi-squared test, p<2×10−16; Figure 5A) suggesting that DARs may be involved in the gene regulatory response. Indeed, DARs are more likely to coincide with active histone marks in left ventricle heart tissue than constitutively accessible regions (H3K4me3: 57% overlap DARs vs. 24% overlap constitutively accessible regions; chi-squared test; p<2.2×10−16; H3K4me1: 86% overlap DARs vs. 52% overlap constitutively accessible regions; p<2.2×10−16; Figure 5B).

Differentially accessible regions are enriched for active chromatin features.
(A) The proportion of differentially accessible regions (DARs) and constitutively accessible regions (CARs) that overlap with annotated TSS. (B) The proportion of DARs and CARs that overlap with the locations of histone marks determined by ChIP-seq in human heart tissue (ENCODE Project Consortium, 2012). (C) The proportion of DARs and CARs that overlap with HIF1α and HIF2α binding locations determined by ChIP-seq in a breast cancer cell line (Schödel et al., 2011). (D) The most significant motif identified to be differentially enriched in DARs compared to all ARs that is putatively recognized by ZNF770, E2F3 and E2F4. We classify E2F4 as a response gene and therefore determined the proportion of DARs and CARs that overlap with E2F4 ChIP-seq binding locations identified in a human LCL line (Lee et al., 2011). (E) The proportion of DARs and CARs that overlap four major transposable element (TE) classes – LTR, LINE, SINE, DNA. The proportion of DARs and CARs that overlap with CpG islands (CGIs) that is proximal to the TSS (+/- 2 kb from the TSS), and distal to the TSS. Asterisk denotes a statistically significant difference between DARs and CARs (*p<0.05, **p<0.005, ***p<0.0005).
To determine whether sequence-specific hypoxia-responsive transcription factors associate with differentially accessible chromatin, we integrated DARs with published chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) data for the well-studied hypoxia-inducible factors HIF1α and HIF2α (Schödel et al., 2011). A total of 234 of the 356 HIF1α ChIP-seq binding sites (66%), and 150 of the 301 HIF2α binding sites (50%) overlap with our set of 110,128 accessible chromatin regions. We found that these HIF1α and HIF2α binding sites are more likely to overlap the 831 differentially accessible regions than the 109,275 constitutively accessible regions (Fisher test; p=0.03; Figure 5C).
We next took an unbiased approach to identify transcription factor binding motifs that are enriched in DARs compared to all accessible regions using MEME-ChIP software (see Materials and methods). We found two motifs to be enriched in DARs compared to all regions (Figure 5D). Motif 1 (p=2×10−2) is recognized by HTF4 and TFE2, both of which are non-response genes in our system. Motif 2 (p=6.2×10−42) is posited to be recognized by ZN770, E2F3, and E2F4. Both ZN770 and E2F4 are response genes in our system. DARs arise between the hypoxia and re-oxygenation conditions, and E2F4 expression increases following re-oxygenation, suggesting that it may be involved in the response. To test this hypothesis, we obtained a published ChIP-seq data set for E2F4 (Lee et al., 2011), and overlapped the 16,245 E2F4-bound regions with DARs and constitutively accessible regions. E2F4-bound regions are significantly enriched in DARs compared to constitutively accessible regions (Chi-squared test; p<2.2×10−16; Figure 5D). E2F4 is important for survival following ischemia in neurons, and has been suggested to be an anti-apoptotic factor in cardiomyocytes (Dingar et al., 2012; Iyirhiaro et al., 2014).
To identify additional sequence features that associate with DARs, we asked whether transposable elements (TE), a potential source of regulatory sequence subjected to chromatin-level regulation, are enriched in these sites (Du et al., 2016; He et al., 2019). We found that while three of the main TE classes, LINEs, LTRs, and DNA elements are similarly enriched in DARs compared to constitutively accessible regions; SINEs are specifically enriched in DARs (p=6.5×10−6; Figure 5E). There is an enrichment of both Alu and MIR SINE family members in DARs (p=3×10−5 and p=0.006). AluS elements, and the AluSq and AluSp sub-families, are particularly enriched within the Alu family (p=0.007 and p=0.02 respectively). A different cellular stress, heat shock, has previously been shown to remodel chromatin accessibility at Alu elements in cervical cancer cells (Kim et al., 2001).
As Alu and MIR TE sequences are notably CpG dense (Medstrand et al., 2002), we next asked about the enrichment of CpG-dense CpG islands (CGIs) in our differentially accessible regions. We found that CpG islands are enriched in DARs compared to constitutively accessible regions, whether these regions fall within 1 kb of TSS, which are typically enriched for CGIs, or not (p<2.2×10−16; Figure 5F).
DNA methylation state at stress-responsive genes and chromatin regions
Genes with CGI promoters are thought to allow flexibility in TSS choice compared to genes without CGI promoters (Carninci et al., 2006), and to allow for the rapid induction of gene expression in response to stimuli (Ramirez-Carrozzi et al., 2009). We therefore asked whether this promoter feature is enriched in the stress response genes. Indeed, we find that response genes are more likely to have CGI promoters than non-response genes (Chi-squared test; p=0.002).
Given the enrichment of CpG islands in gene promoters and chromatin regions that are responsive to stress, we asked whether this feature corresponded to differences in CpG DNA methylation levels in these same regions. We measured global DNA methylation levels at 766,658 CpG sites in all conditions from 13 of our individuals (no data was collected from two of the 15 individuals), together with 23 replicate samples from three individuals (see Materials and methods; Supplementary file 1). We found the expected bimodal distribution of DNA methylation Beta-values across CpGs (Beta-values represent the ratio of intensities between the methylated and unmethylated alleles; Figure 6—figure supplement 1A). Additional analyses indicated the data to be of good quality (Figure 6—figure supplements 1–2).
To determine whether steady-state DNA methylation levels mark genes or regions that will change their expression level in response to stress, we investigated baseline DNA methylation levels in the promoters of genes classified as response genes and non-response genes, as well as accessible regions and DARs. To do so, we assessed the DNA methylation level at CpGs within 200 bp upstream of the TSS in the baseline condition (condition A). The majority of the assayed CpGs were hypomethylated with a median Beta-value of less than 0.2 across genes and regions (Figure 6A). While there is no difference in median DNA methylation levels between response and non-response genes, we found that the median DNA methylation level is lower in DARs compared to all accessible regions (p<2.2×10−16). These data suggest that responsive chromatin regions may have specific epigenetic profiles which poise them for rapid response to stress.

Minimal DNA methylation changes following hypoxia.
(A) Mean DNA methylation levels (Beta-values) in the baseline condition (A) at CpGs within 200 bp upstream of the TSS of non-response genes (NR-G) and response genes (RG), and within all accessible regions (ARs) and differentially accessible regions (DARs; *p<0.05, ***p<0.0005). (B) Volcano plots representing the differential DNA methylation analysis across 766,658 CpGs. There are no differentially methylated (DM) CpGs across any pair of conditions. (C) Numbers of differentially methylated CpGs, when restricting the test set to include only CpGs within CpG islands, across pairs of conditions. (D) Gene expression and DNA methylation levels at a differentially methylated CpG within an intron of the EGR2 response gene.
DNA methylation levels are largely stable following hypoxia and re-oxygenation
Given that DNA methylation levels can associate with gene expression levels, we asked whether any CpGs are differentially methylated during the course of the oxygen perturbation experiment, which induces thousands of gene expression changes. When considering all 766,658 CpGs we did not find any differentially methylated CpGs across any pair of conditions (10% FDR; Figure 6B), and all p-values are estimated to be true null p-values (pi0 = 1 when estimated by q-value across all pairs of conditions; Figure 6—figure supplement 3). We were also unable to detect any differentially methylated CpGs when considering two estimates of DNA methylation levels: Beta-values or M-values (log2 ratio of intensities of methylated versus unmethylated alleles).
To increase the likelihood of identifying differentially methylated CpGs, we reduced the number of statistical tests for identifying differentially methylated CpGs by restricting our test set of CpGs to those within annotated regions of interest. Because of the CpG island enrichment in our response gene promoters, we first selected CpGs present within CpG islands (143,587 of the 766,658 measured CpGs). Again, we found no differentially methylated CpGs between baseline (A) and hypoxia (B), and hypoxia (B) and the short-term re-oxygenation condition (C). However, we identified four differentially methylated CpGs (DMCpGs) between the baseline (A) and long-term re-oxygenation condition (D; Figure 6C). This set includes a CpG in the intron of the EGR2 response gene, which shows increased DNA methylation levels over time (Figure 6D). Methylation at CpG islands within the intron of EGR2 has been shown to confer enhancer activity in cancer cells (Unoki and Nakamura, 2003). If we only select CpGs located within the promoters of the 2,113 response genes, we find one DMCpG within the promoter of the FTSJ2 gene, a rRNA methyltransferase, that is differentially methylated between the hypoxia (B) and long-term re-oxygenation (D) conditions. Selecting CpGs located only within the 831 DARs reveals two DMCpGs between baseline (A) and hypoxia (B), and one DMCpG between baseline (A) and long-term re-oxygenation (D). We therefore were only able to identify a handful of differentially methylated CpGs during our experimental timecourse which elicited thousands of gene expression changes.
To further investigate the apparent differences in response to hypoxia across our three molecular phenotypes, we calculated the proportion of variance explained by ‘individual’, ‘condition’ and ‘replicate’. We observed a lower proportion of variance explained by ‘condition’ in the DNA methylation data (0.8%) and chromatin accessibility data (3%) compared to the gene expression data (6%, t-test, p<2.2×10−16), and a corresponding increase in the residual variance (Figure 6—figure supplement 4). For the DNA methylation data in particular, there is a similar proportion of variance explained by individual (44% for gene expression, 40% for DNA methylation, and 28% for chromatin accessibility) and approximately an order of magnitude less variance explained by condition. These results suggest that noise alone cannot explain the relatively small number of chromatin accessibility and DNA methylation changes.
Dynamic eQTLs associate with traits and disease
Finally, we wanted to determine whether any of the dynamic eQTL SNPs or genes that we identified might also be associated with complex traits or disease. To maximize the number of potentially phenotypically-relevant genomic loci, we performed three independent analyses. We first took an unbiased, SNP-based approach by searching within a catalog of genetic variants associated with a variety of traits assayed in thousands of GWAS for overlap with our dynamic eSNPs (Buniello et al., 2019). By intersecting these two data sets, we found one induced dynamic eSNP (rs8053350) that is also associated with a measured phenotype in GWAS – varicose veins (Supplementary file 1; Figure 7—figure supplement 1A-B).
Given that our eQTL data are from the Yoruba population from West Africa and that most GWAS studies are performed in European populations, which have an inherently different LD structure, we may not expect to see strong overlap at the level of individual variants. We therefore next took an orthogonal gene-based approach, using the same GWAS catalog, to specifically investigate the genes implicated in three phenotypes associated with cardiovascular function or response to oxygen deprivation: MI, heart failure, and stroke (see Materials and methods). If the expression of these CVD genes is found to vary across individuals following hypoxia, it suggests that they may be important for mediating disease risk. Indeed, we found that six of our dynamic eGenes are implicated in these three disease states by GWAS gene mapping approaches (Supplementary file 1). This list includes the DNA damage and apoptosis factor ZC3HC1, which is implicated in MI and stroke (Figure 7—figure supplement 1C–D). Importantly, ZC3HC1 is not an eGene in LV or AA, but the SNP-gene pair is an eQTL in other tissues.
Lastly, we performed an in-depth SNP-based analysis using full summary statistics for coronary artery disease (CAD) and MI from the CARDIoGRAMplusC4D Consortium (Stitziel et al., 2016; Nikpay et al., 2015; Webb et al., 2017). We tested whether the lead eSNP from our set of dynamic eQTLs is also associated with CAD or MI and identified two loci (Supplementary file 1). This set includes the eSNP (rs12588981) for the dynamic eQTL EIF2B2 which is nominally associated with CAD (GWAS association p=4.7×10−5) and is in high and statistically significant LD with the lead GWAS variant (GWAS association p=9.9×10−8) at this locus (rs3832966; R2 = 0.96, p<0.001; Figure 7; Figure 7—figure supplement 2). Given that we have incomplete power to detect dynamic eQTLs, we also used the same GWAS dataset to investigate whether any of the significant eQTL SNPs in any condition are associated with CAD or MI. We determined that eSNP (rs8105092), which regulates CARM1 in the hypoxic condition (condition B), is nominally associated with MI (GWAS association p=1.79×10−5) and in moderate, yet significant LD, with the lead GWAS variant (GWAS association p=2.8×10−9) at this locus (rs4804142; R2 = 0.32, p<0.001; Figure 7—figure supplement 3), and is not an eGene in LV or AA. A handful of the eQTLs that we highlight here were tested for association with gene expression by the GTEx consortium, yet are not eQTLs in heart tissue. These results suggest that perturbation studies in relevant cell types can give insight into the molecular basis for the genetic association with complex traits and disease, which might not be gleaned from the study of post-mortem tissues.

Dynamic eQTL for EIF2B2 overlaps a coronary artery disease GWAS locus.
(A) EIF2B2 is a dynamic eGene. (B) Expression levels of EIF2B2 during the course of the experiment following aggregation of all individuals. (C) Locus zoom plots in each condition illustrating the tested SNPs in the CARDIoGRAMplusC4D GWAS study (light blue), and the SNPs tested in our eQTL analyses in each condition (A: brown, B: blue, C: coral, D: red). All tested SNPs within a 1 Mb region around the dynamic eQTL SNP on chromosome 14 are shown.
Discussion
Studying gene expression across individuals in response to stress can reveal latent effects of genetic variation, which may contribute to higher-order phenotypes and disease. In order to understand the effects of genetic variation in a disease-relevant cell type and a disease-relevant process, we differentiated cardiomyocytes from a panel of genotyped individuals, and subjected them to hypoxia and re-oxygenation. We found hundreds of eQTLs that are revealed or suppressed following hypoxic stress (dynamic eQTLs), several of which have been associated with phenotypes measured in GWAS.
Steady-state and dynamic eQTLs may help understand CVD
Attempts have been made to identify genetic variants that associate with gene expression levels and CVD phenotypes in easily accessible biological samples such as blood. However, less than half of CVD/MI GWAS loci are associated with an eQTL in whole blood when thousands of individuals are tested (Joehanes et al., 2017). To determine the effects of genetic variation on gene expression specifically in the heart, more targeted studies have taken advantage of left ventricle tissue (GTEx Consortium et al., 2017; Koopmann et al., 2014), left atrium tissue (Lin et al., 2014; Sigurdsson et al., 2017), and right atrial appendage tissue (GTEx Consortium et al., 2017) obtained during cardiac surgeries or post-mortem. Using fewer than a hundred individuals, a handful of identified eQTL SNPs correspond to SNPs associated with cardiac traits, thus linking specific genes to organismal-level phenotypes. A compelling example is the association between MYOZ1 expression and atrial fibrillation (Lin et al., 2014; Sigurdsson et al., 2017). Across tissues, the GTEx consortium reported that ~ 50% of eQTLs are also associated with variation in other measured complex traits (GTEx Consortium et al., 2017), and Heinig et al. have shown that 20% of left ventricle eQTLs relate to heart-associated loci (Heinig et al., 2017). However, these variants, identified in healthy individuals, are unlikely to represent all genetic variants that have consequences on disease. Indeed, Heinig et al. identified 100 dilated cardiomyopathy–specific eQTLs (not seen in healthy individuals) in a case-control study of 97 individuals with dilated cardiomyopathy and 108 healthy donors (Heinig et al., 2017). Similarly, by collecting samples pre- and post-surgically-induced ischemia, Stone et al. identified genetic associations that are only detected under stress (Stone et al., 2019). Although these studies provide a set of gene targets for further investigation, there are many loci that remain unexplained.
The heart is a complex tissue consisting of multiple cell types. The effects of some genetic variants in specific cell types might well be masked when considering heterogeneous tissue samples. As we are now able to direct iPSCs toward a cardiac fate, we can test for genetic effects on specific cell types such as cardiomyocytes (Panopoulos et al., 2017). As one would expect, iPSC-CMs are better suited to study cardiovascular traits than the immortalized B-cells or iPSCs from which they are derived (Banovich et al., 2018). However, given the high degree of eQTL sharing across diverse tissues (GTEx), identifying eQTLs in the disease-relevant terminal cell type at steady state may not give substantial insight into disease biology. A significant advantage of using iPSC-CMs is that these cells provide a system to interrogate gene expression dynamics.
Cellular stressors that perturb gene expression levels and the cell state can unmask additional layers of regulatory variation (Alasoo et al., 2018; Barreiro et al., 2012; Çalışkan et al., 2015; Kim-Hellmuth et al., 2017; Knowles et al., 2018; Manry et al., 2017; Nédélec et al., 2016). Intermediate developmental cell states can similarly provide insight into GWAS loci where eQTL analysis in terminal cell types cannot (Strober et al., 2019). Further evidence for the notion that steady state eQTLs may have limited applicability to disease states comes from our previous work, where we used a comparative evolutionary approach to investigate the response to stress. We showed that genes that respond to oxygen stress in iPSC-CMs from both humans and chimpanzees, and are therefore likely relevant for disease, are depleted for eQTLs identified in heart tissue compared to genes that do not respond in either species (Ward and Gilad, 2019).
In the current study, by subjecting iPSC-CMs from a panel of individuals to perturbation (oxygen deprivation), we were able to identify a handful of potentially interesting trait-relevant loci. Using a broad GWAS catalog, we found one of our dynamic eQTL SNPs (rs8053350) to be associated with varicose veins, and the level of RNF166 expression (Fukaya et al., 2018). This SNP falls within an intron of the PIEZO1 gene. Varicose veins are associated with a risk for developing deep vein thrombosis and other vascular diseases (Chang et al., 2018). When we performed an analogous analysis focused on genes previously associated with three relevant traits – MI, heart failure and stroke, we identified a novel heart eGene, ZC3HC1, encoding the NIPA protein, which is implicated in MI, coronary artery disease, and ischemic stroke (IBC 50K CAD Consortium, 2011; Nikpay et al., 2015; Schunkert et al., 2011). This dynamic eQTL SNP is also associated with bronchodilator responsiveness in chronic obstructive pulmonary disease (Hardin et al., 2016). Using full summary statistics from a CAD and MI GWAS (Stitziel et al., 2016; Nikpay et al., 2015), we found minimal overlap between significant GWAS loci and significant or dynamic eSNPs. This may be due in part to differences in genetic ancestry and LD structure between the predominant GWAS population (of European descent) and our eQTL study population (Yoruba). Nevertheless, we highlight two interesting loci EIF2B2 and CARM1. While the EIF2B2 locus does not reach genome-wide significance in the GWAS analysis, integration with our dynamic eQTL data highlights this region as being potentially relevant to the disease. Indeed, EIF2B2, a GTP exchange protein, has been predicted to be a gene target in CAD using summary data-based Mendelian randomization (Pavlides et al., 2016). CARM1, also known as PRMT4, has previously been implicated in myocardial infarction and apoptosis in mice (Wang et al., 2019).
Mechanisms behind response genes and dynamic eQTLs
Changes in gene expression can associate with other molecular-level phenotypes. The response to hypoxia is mediated by the HIF1α transcription factor (Samanta and Semenza, 2017), but given that there are hundreds of HIF1α binding locations and thousands of differentially expressed genes, regulation by this factor alone cannot directly explain all the transcriptional changes. We explored two additional molecular phenotypes in the context of oxygen deprivation – the locations and level of accessibility of open chromatin regions, and DNA methylation levels. We did not find either to contribute substantially to the gene expression response we observed. There are minimal changes in accessibility following hypoxia, which is in contrast to observations of studies that considered stimulation of immune cell types (Alasoo et al., 2018; Calderon et al., 2019; Pacis et al., 2015). This could reflect cell type specificity in response to stress, or the specificity of the cellular response to different stressors. Indeed, despite large gene expression changes in response to various stimulants in endothelial cells, there are a relatively small number of differentially accessible regions (Findley et al., 2019). We speculate that the transcriptional response to oxygen stress could result in the induction of transcription factors, which bind already accessible regions of open chromatin, and that cells are primed for a quick response to this universal cellular stress. Indeed, it has been shown that chromatin contacts exist between HIF1α binding sites and hypoxia-inducible genes in the normoxic state (Platt et al., 2016). Conversely, it has been suggested that hypoxia results in the induction of HIF1α, and significant changes in histone methylation (Batie et al., 2019). As we did not measure histone marks in our system, these changes may occur in the absence of chromatin accessibility changes, but we also cannot rule out the possibility that the choice of a single timepoint following 6 hr of hypoxia, or insufficient statistical power in our sample size, contributed to the minimal differences in accessibility that we observed.
Using an approach designed to measure small effect sizes between conditions, we did identify a set of 831 DARs between hypoxia and short-term re-oxygenation that are enriched for marks of active chromatin, CpG islands, and TEs. These regions do not appear to explain many of the gene expression differences we observed. Hypoxia and oxidative damage are likely to also affect the genome in ways that do not directly impact gene expression. Indeed, the distribution of oxidative DNA damage sites varies across the genome following stress such that TEs and active chromatin regions are enriched for DNA damage, while promoters are depleted (Poetsch et al., 2018). We found enrichment for TEs, specifically Alu SINE elements, in DARs. Interestingly, TEs, and DNA transposons in particular, are also enriched in regions that become accessible in macrophages in response to bacterial infection; suggesting sequence-specific effects of TEs in response to different cellular stressors (Bogdan et al., 2020). Alu elements have previously been found to associate with the response to stress in other contexts. Serum starvation induces binding of TFIIIC, which recruits RNA polymerase III, to Alu elements (Ferrari et al., 2020), and heat shock increases chromatin accessibility around Alu elements (Kim et al., 2001).
There are several studies, which suggest that DNA methylation levels are dynamic and change in response to stressors such as hypoxia. We did not find any notable differences in DNA methylation levels pre- and post-hypoxia and re-oxygenation, which suggests that like chromatin accessibility, DNA methylation levels do not make large contributions to changes in gene expression levels or the appearance of dynamic eQTLs in our system. Many of the DNA methylation changes that have been described in response to hypoxia occur in chronic and intermittent hypoxia, and not acute hypoxia as investigated in our study (Hartley et al., 2013; Robinson et al., 2012; Watson et al., 2014). DNA methylation levels are also altered in response to other stressors such as bacterial infection (Pacis et al., 2015); however, the importance of timing is highlighted by the fact that, in this system, gene expression responses precede DNA methylation changes (Pacis et al., 2019). It is also important to note that our study considers baseline oxygen levels to be 10% oxygen, which is closer to physiological oxygen levels (5–13%) than atmospheric oxygen levels (21%; Brahimi-Horn and Pouysségur, 2007; Carreau et al., 2011; Jagannathan et al., 2016). Most studies define normoxia as 21% oxygen saturation, and while this likely leads to larger effect size differences in known hypoxia response genes following hypoxia, these comparisons may not give meaningful insight into the in vivo state.
One can speculate about different mechanisms that might lead to the appearance or disappearance of dynamic eQTLs. In the context of the immune response, it has been shown that the same response variants affect both gene expression and chromatin accessibility (Alasoo et al., 2018). This is in line with the general notion that changing cellular environments results in differences in chromatin accessibility at transcription factor binding sites, which leads to gene expression changes. We found that this does not appear to be a major mechanism in our system as there are minimal changes in accessibility following hypoxia. We observed that there is an enrichment of response genes amongst dynamic eQTLs suggesting that the change in environment results in a change in expression levels that is dependent on the associated genotype. We also find enrichment for TFs amongst response genes and dynamic eQTLs, suggesting that dynamic eQTLs can appear through secondary trans effects.
Potential limitations of our model
To understand the effects of genetic variation on human heart tissue, and how this variation might contribute to the MI and I/R injury etiologies of CVD, we carefully perturbed oxygen levels that cardiomyocytes in culture are exposed to. This in vitro approach is by design a model system, and therefore will likely not fully recapitulate the in vivo state. However, we previously found that out of 2,549 genes that respond to hypoxia in iPSC-CMs from humans and chimpanzees, only 16% are differentially expressed between iPSC-CMs and heart tissue (Pavlovic et al., 2018; Ward and Gilad, 2019). This suggests that our in vitro system is applicable to heart tissue. There is still a possibility that the dynamic eQTLs that we identify in our in vitro system are not physiologically relevant.
Our study comprised a small number of individuals (15), far fewer than what is typical for identifying eQTLs. Our work is therefore a first step toward understanding the effects of genetic variation on gene expression in response to stress. Nevertheless, with a small number of individuals we were able to identify a couple of hundred dynamic eQTLs that are revealed or suppressed under stress, suggesting that this paradigm is worth exploring further in larger cohorts. Under the simplifying assumption of a single causal variant, we determined that we have ~6% power to detect an effect which explains ~38% of the heritability, and an equal false positive rate to call it a dynamic eQTL (see Materials and methods; Figure 7—figure supplement 4). The median eQTL heritability over all genes is 0.16, as previously reported by Gusev et al., 2016; Wheeler et al., 2016, which illustrates the relative power of this study compared to the GTEx and Depression Genes and Networks studies (Battle et al., 2014) for eQTL mapping. This suggests that the impact of stress on genotype-dependent effects on gene expression will be even greater in studies which have higher power to detect smaller effects of genotype. For perspective, early eQTL studies were similarly powered to our study, using 70 individuals; yet these studies still led to important insights opening an avenue of research focused on assaying the consequences of genetic variation by RNA-seq (Pickrell et al., 2010).
In summary, there have been few studies assessing the effects of genetic variation in response to CVD-relevant perturbations in cardiomyocytes. Here, we profiled the response to oxygen deprivation in cardiomyocytes from a panel of genotyped individuals. We find that eQTLs can appear and disappear in response to oxygen deprivation, and that some of these eQTLs have effects on relevant complex traits and disease.
Materials and methods
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Cell line (H. sapiens, Female) | 18499 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 18505 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 18511 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 18520 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 18852 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 18855 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 18858 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 18870 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 18912 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Male) | 19098 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Male) | 19101 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 19108 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 19116 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 19128 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL | |
Cell line (H. sapiens, Female) | 19160 iPSC | 10.1101/gr.224436.117 | Derived from HapMap Yoruba LCL |
Samples
We randomly selected individuals from the Yoruba YRI HapMap population. iPSCs were reprogrammed from lymphoblastoid cell lines from these individuals (Banovich et al., 2018). We used fifteen biological replicates (individuals), and three technical replicates (independent cardiomyocyte differentiation and oxygen stress experiments) from three individuals. This number of individuals has been shown to be sufficient to be able to identify eQTLs in small sample sizes (van de Geijn et al., 2015). Experiments were designed and performed such that technical variables (such as sample processing batch) were not confounded with the variable of interest (condition). All cell lines tested negative for mycoplasma contamination.
Cardiomyocyte differentiation from iPSCs
Request a detailed protocoliPSCs were maintained in a feeder-independent state in Essential 8 Medium (A1517001, ThermoFisher Scientific, Waltham, MA, USA) with Penicillin/Streptomycin (30002, Corning, NY, USA) on Matrigel hESC-qualified Matrix (354277, Corning, Bedford, MA, USA) at a 1:100 dilution. Cells were passaged at ~70% confluence every 3–4 days with dissociation reagent (0.5 mM EDTA, 300 mm NaCl in PBS), and seeded with ROCK inhibitor Y-27632 (ab12019, Abcam, Cambridge, MA, USA).
Cardiomyocyte differentiations were performed largely as previously described (Ward and Gilad, 2019), except the duration and concentration of the Wnt agonist and antagonist differed for this panel of individuals, which included only human samples. Briefly, on Day 0, iPSC lines at 70–100% confluence in 100 mm plates were treated with 12 μM GSK3 inhibitor CHIR99021 trihydrochloride (4953, Tocris Bioscience, Bristol, UK) in 12 ml Cardiomyocyte Differentiation Media [500 mL RPMI1640 (15–040 CM ThermoFisher Scientific), 10 mL B-27 Minus Insulin (A1895601, ThermoFisher Scientific), 5 mL Glutamax (35050–061, ThermoFisher Scientific), and (5 mL Penicillin/Streptomycin)], and a 1:100 dilution of Matrigel. 24 hr later, on Day 1, the media was replaced with Cardiomyocyte Differentiation Media. 48 hr later, on Day 3, 2 μM of the Wnt inhibitor Wnt-C59 (5148, Tocris Bioscience), diluted in Cardiomyocyte Differentiation Media, was added to the cultures. Cardiomyocyte Differentiation Media was replaced on Days 5, 7, 10, and 12. Cardiomyocytes were purified by metabolic purification by the addition of glucose-free, lactate-containing media (Purification Media) [500 mL RPMI without glucose (11879, ThermoFisher Scientific), 106.5 mg L-Ascorbic acid 2-phosphate sesquimagenesium salt (sc228390, Santa Cruz Biotechnology, Santa Cruz, CA, USA), 3.33 ml 75 mg/ml Human Recombinant Albumin (A0237, Sigma-Aldrich, St Louis, MO, USA), 2.5 mL 1 M lactate in 1 M HEPES (L(+)Lactic acid sodium (L7022, Sigma-Aldrich)), and 5 ml Penicillin/Streptomycin] on Days 14, 16, and 18. A total of 1.5 million cardiomyocytes were re-plated per well of a six-well plate on Day 20 in Cardiomyocyte Maintenance Media (500 mL DMEM without glucose [A14430-01, ThermoFisher Scientific], 50 mL FBS [S1200-500, Genemate], 990 mg Galactose [G5388, Sigma-Aldrich], 5 mL 100 mM sodium pyruvate [11360–070, ThermoFisher Scientific], 2.5 mL 1 M HEPES [SH3023701, ThermoFisher Scientific], 5 mL Glutamax [35050–061, ThermoFisher Scientific], 5 mL Penicillin/Streptomycin). iPSC-CMs were matured in culture for a further 10 days with Cardiomyocyte Maintenance Media replaced on Days 23, 25, 27, 28, and 30.
On Day 25, iPSC-CMs were transferred to a 10% oxygen environment (representative of in vivo levels) in an oxygen-controlled incubator (HERAcell 150i CO2 incubator, ThermoFisher Scientific). From Day 27 onwards, iPSC-CMs were pulsed at a voltage of 6.6 V/cm, frequency of 1 Hz, and pulse frequency of 2 ms using an IonOptix C-Dish and C-Pace EP Culture Pacer to further mature the cells and synchronize beating.
Flow cytometry
Request a detailed protocolPurity of the cardiomyocyte cultures was assessed ~Day 30 as previously described (Ward and Gilad, 2019). Briefly, cells were stained with Zombie Violet Fixable Viability Kit (423113, BioLegend), and PE Mouse Anti-Cardiac Troponin T antibody (564767, clone 13–11, BD Biosciences, San Jose, CA, USA), and analyzed on a BD LSRFortessa Cell Analyzer together with negative control samples of iPSCs, and iPSC-CMs that are incubated without the troponin antibody, or without either the troponin antibody or viability stain.
Hypoxia experiment
Request a detailed protocolOn Day 31/32, iPSC-CMs were subjected to the hypoxia experiment. At time = 0, condition A samples remained at 10% O2 (normoxia), while samples for conditions B, C and D were transferred to an incubator set at 1% O2 (hypoxia). After 6 hr, conditions A and B were harvested, while plates C and D were returned to normoxic oxygen conditions. Plate C was harvested 6 hr following the hypoxic treatment, and Plate D was harvested 24 hr following the hypoxic treatment. Oxygen levels, experienced by the cells in culture, were measured in cultures from each experimental batch using an oxygen-sensitive sensor (SP-PSt3-NAU-D5-YOP, PreSens Precision Sensing GmbH, Regensburg, Germany), optical fiber (NWDV29, Coy, Grass Lake, MI, USA), and oxygen meter (Fibox 3 Transmitter NWDV16, Coy).
Material collection
Cell culture media for ELISA and cytotoxicity assays
Request a detailed protocolAliquots of cell culture media from each experiment were centrifuged at 10,000 rpm for 10 min at 4°C to remove cellular debris. The supernatant was stored at −80°C until further use.
Nuclei for ATAC-seq
Request a detailed protocolCardiomyocytes from each well of a six-well plate were washed twice with cold PBS on ice before collection by manual scraping in 1.5 ml PBS. A total of 200 μl of cells were pelleted by centrifugation at 500 g for 5 min. Cell pellets were re-suspended in 50 μl cold ATAC-seq lysis buffer (10 mM Tris-HCl pH 7.4, 10 mM NaCl, 3 mM MgCl2, 0.1% Igepal CA630, dH2O). Nuclei were pelleted by centrifugation at 500 g for 5 min at 4°C. Nuclei were re-suspended in 50 μl transposition mix (25 μl 2xTD buffer, 2.5 μl Tn5 transposase, 22.5 μl nuclease-free dH2O) from the Nextera DNA sample kit (FC-121–1031, Illumina). The transposition reaction was performed at 37°C for 30 min. Transposed DNA was purified with Qiagen MinElute Kit (28004, Qiagen, MD, USA), re-suspended in 12 μl elution buffer, and stored at −20°C.
Cell pellets for RNA-seq and DNA methylation arrays
Request a detailed protocolCells from each well of a six-well plate were washed twice with cold PBS on ice before collection by manual scraping in 1.5 ml PBS. Cells were pelleted by centrifugation at 7,000 rpm for 8 min at 4°C, flash-frozen and stored at −80°C.
RNA/DNA extraction
Request a detailed protocolRNA and DNA were extracted from the same frozen cell pellets using the ZR-Duet DNA/RNA MiniPrep kit (D7001, Zymo, CA, USA) according to the manufacturer’s instructions. All four conditions from three or four individuals were extracted in the same batch. RNA samples had a median RIN score of 8.5 with similar median scores across conditions (A = 9.4, B = 8.7, C = 9.1, D = 9.2) (Supplementary file 1, Figure 2—figure supplement 1). All samples had RIN scores greater than eight except for two samples: 18852A RIN = 6, 18852D RIN = not determined. 18852A is an outlier sample in Figure 2—figure supplement 2. Given that both of these samples come from an individual for which we have replicate samples, these particular samples were not selected for the differential expression analysis and eQTL analysis that are only able to handle one biological replicate.
RNA-seq library preparation
Request a detailed protocolWe prepared RNA-seq libraries from 15 individuals, and three replicates from three individuals (18852, 18855, 18511). A toal of 500 ng of RNA were used to prepare sequencing libraries using the Illumina TruSeq RNA Sample Preparation Kit v2 (RS-122–2001 and −2002, Illumina). Libraries were pooled into five master mixes containing 12 or 16 samples. Each pool was sequenced 50 bp, single-end on the HiSeq2500 or HiSeq4000 according to the manufacturer’s instructions.
DNA methylation array
Request a detailed protocolWe measured DNA methylation in 13 of the 15 individuals we collected gene expression data for (we do not have data from any condition for individuals 18870 and 19116; Supplementary file 1). As we did for the RNA-seq data, we collected three replicates for three individuals; except for individual 18852 which only had two replicates. Nine chips (eight samples per chip) with 60–1000 ng DNA were bisulfite-converted and processed on an Illumina Infinium MethylationEPIC array at the University of Chicago Functional Genomics facility.
ATAC-seq
Request a detailed protocolWe performed ATAC-seq in 14 of the 15 individuals we had gene expression data for (we do not have data from any condition for individual 18858; Supplementary file 1). One sample was collected for each individual and condition. ATAC-seq libraries were prepared using the Illumina Nextera DNA sample kit. Libraries were amplified for 10–16 cycles depending on the amplification rate of each library. Each library was amplified in a PCR reaction containing 10 μl DNA, 10 μl dH2O, 15 μl NMP (PCR master mix), 5 μl PPC (PCR primer cocktail), 5 μl index N5, and 5 μl index N7. PCR conditions were set at 72°C for 5 min, 98°C for 30 s, 98°C for 10 s, 63°C for 30 s, 72°C for 1 min, repeat steps 3–5 4x and hold at 4°C. The number of cycles per library was determined using a qPCR side reaction as described in Buenrostro et al., 2013. Libraries were purified using Agencourt AMPure XP beads (A63880, Beckman Coulter, IN, USA), and bioanalyzed to determine library quality. Twelve or 16 samples were pooled together to generate four master mixes. Each master mix was sequenced 50 bp paired-end on the HiSeq4000 according to the manufacturer’s instructions.
Lactate dehydrogenase activity assay
Request a detailed protocolLactate dehydrogenase activity (LDH) was measured in 5 μl cell culture media using the Lactate Dehydrogenase Activity Assay Kit (MAK066, MilliporeSigma, MO, USA) according to the manufacturer’s instructions. Each sample was assayed in triplicate. LDH activity was measured as the difference in absorbance prior to the addition of the substrate, and 10 min after the initiation of the enzymatic reaction, calculated relative to a standard curve. Measurements are standardized relative to A, and reported as A (A-A), B (B-A), C (C-B) and D (D-B).
BNP ELISA
Request a detailed protocolA total of 125 μl of cell culture media was assayed to quantify the level of secreted BNP using the Brain Natriuretic Peptide EIA kit (RAB0386, MilliporeSigma). Each sample was assayed in duplicate on two 96-well plates. BNP levels were quantified relative to a standard curve using 4- and 5-parameter logistic models using the R package drc. Measurements are standardized relative to A, and reported as A (A-A), B (B-A), C (C-B), and D (D-B).
RNA-seq analysis
Reads were aligned to hg19 using subread align (Liao et al., 2013). The mapped reads were then reprocessed to reduce reference bias for downstream analyses using the WASP pipeline (van de Geijn et al., 2015). Briefly, reads overlapping polymorphisms segregating in our population were remapped to the genome using the true read, and a version of the read with the alternative allele. Only reads that mapped uniquely to the same locations with both possible alleles were kept. The median number of reads across conditions was similar (A: 34,353,716; B: 33,493,298; C: 33,883,532; D: 38,147,083). The number of filtered reads mapping to genes was quantified using featureCounts within subread (Liao et al., 2014). We obtained measurements for 19,081 genes. Sample 18852A was an outlier when considering read count correlations between pairs of samples, and was therefore removed prior to subsequent analyses.
Differential expression analysis
Request a detailed protocolWe selected autosomal genes for downstream analysis (18,226). Log2-transformed counts per million were calculated (Robinson et al., 2010), and genes with a mean log2cpm < 0 were excluded. We used the fact that we have replicate data from three individuals to remove unwanted variation in our data. We used the RUVs function in the RUVSeq package in R (Risso et al., 2014) to identify such factors. By manual inspection, our data segregated by individual or condition after correction with four factors. For the differential expression analysis, we excluded sample replicate one to avoid the outlier sample and randomly selected replicate two, instead of replicate three, for individuals with replicate samples. We used the RUV factors as covariates in our differential expression analysis using the TMM-voom-limma pipeline (Law et al., 2014; Robinson et al., 2010; Smyth, 2004). We used fixed effects for each condition (A, B, C, D), the RUVs factors as covariates, and a random effect for individual, which was implemented using duplicateCorrelation. Genes with a Benjamini and Hochberg FDR < 0.1 are classified as differentially expressed (Benjamini and Hochberg, 1995).
Gene expression trajectory analysis
Request a detailed protocolTo identify response genes, we used the Cormotif package in R (Wei et al., 2015) to jointly model pairs of tests. We used TMM-normalized log2cpm values as input and considered the following pairs of tests: A vs B, B vs. C, and B vs. D to determine which genes are changing their expression during the course of the experiment. The best fit was determined to correspond to two correlation motifs or clusters using BIC and AIC. We classified genes as response genes if the probability of differential expression between conditions was >0.5 in all pairs of tests.
eQTL identification
Request a detailed protocolTo map eQTLs, we analyzed the same samples considered in the differential expression analysis. Given the sample size in this study, we utilized the combined haplotype test (CHT) to identify eQTLs (van de Geijn et al., 2015). This test models both allelic imbalance and total read depth at a region to identify QTLs. We require 50 total counts and 10 allele-specific counts for each gene, and tested variants 25 kb upstream and 25 kb downstream of the TSS, resulting in 1,040,874 shared tests (A: 1,215,476; B: 1,211,099, C: 1,224,612, D: 1,201,078). As previously reported, we found that null p-values for the CHT were not calibrated in our data. To calibrate the p-values, we estimated the null distribution of the CHT by permuting the data 100 times and fitting a Beta distribution to the permuted p-values for each SNP-gene pair (previously proposed by Delaneau et al., 2017; Ongen et al., 2016). We then computed an adjusted p-value for each SNP-gene pair by taking the CDF of the fitted Beta distribution, evaluated at the reported CHT p-value.
To call significant eQTLs, we estimated q-values for the set of adjusted p-values for each phenotype, and took tests with q < 0.1. The number of eGenes in each condition was determined by taking the most significant SNP-gene association in each condition (i.e. the top SNP). We defined dynamic eQTLs as either: (1) significant only in A (q<0.1 in A and permutation-adjusted p>0.15 in B and C and D; suppressed eQTL); (2) significant in at least one of B, C, or D (q < 0.1) and not nominally significant in A (adjusted p>0.15; induced eQTL).
Power analysis
Request a detailed protocolFor QTL mapping, we assume a linear model
where denotes the phenotype of individual and denotes the genotype of individual at a single SNP of interest. We estimate an effect size
where is the sample size. Let be the standardized effect size. Then,
and
where denotes the significance level and denotes the standard Gaussian CDF. To simplify the analysis, we consider (i.e. Bonferroni correction; this is equivalent to controlling the FDR when all tests are null, and is conservative otherwise). Assume there is a single causal variant. Then, the phenotypic variance explained is:
We defined a dynamic eQTL as either significant only in A, or significant (after Bonferroni correction, in this analysis) in one of B, C, or D and not significant in A. To estimate the false-positive rate of dynamic eQTL calling, we asked what was the probability of a SNP passing this definition, assuming the standardized effect size was identical in all four conditions. We then computed phenotypic variance explained, power to detect an eQTL, and false-positive rate to call a dynamic eQTL for every choice of standardized effect size .
Overlapping response genes and eGenes with existing gene sets
Gene ontology analysis
Request a detailed protocolGene set enrichment analysis was performed on response genes, and a background set of all expressed genes using the DAVID genomic annotation tool (Huang et al., 2009a; Huang et al., 2009b). GO Terms related to Biological Processes were selected, and those with a Benjamini-Hochberg controlled FDR < 0.05 were designated as significantly enriched. Each of the five significantly enriched processes relates to transcription (‘DNA-templated transcription’, ‘DNA-templated regulation of transcription’, ‘DNA-templated negative regulation of transcription’, ‘negative regulation of transcription from RNA polymerase II promoter’, ‘positive regulation of transcription from RNA polymerase II promoter’). The most significantly enriched GO terms related to Molecular Functions include ‘transcription factor activity, sequence-specific DNA binding’, ‘nucleic acid binding’ and ‘DNA binding’.
Transcription factors
Request a detailed protocolA list of 1,637 annotated human TFs was obtained from Lambert et al., 2018, and intersected with our gene sets.
GTEx eQTLs
Request a detailed protocolThree hundred and sixty-seven dynamic eSNPs were interrogated for overlap with eQTL data from left ventricle heart tissue and 49 other tissues assayed in GTEx v8 (http://www.gtexportal.org). A total of 326 dynamic eSNPs were tested in GTEx. We determined whether each dynamic eSNP was identified as a significant eQTL in left ventricle heart tissue or any other tissue. To define shared eSNPs we identified the 61 eGenes present in all four conditions. We then identified the most significant eSNP for that gene in condition A and determined whether this SNP was significantly associated with the expression of that gene in conditions B, C, and D. This yielded 20 shared eSNPs, 19 of which were tested by the GTEx consortium. We compared our dynamic eQTL and shared eQTL effect sizes to the effect size in GTEx left ventricle heart tissue. In this analysis, for the dynamic eQTLs we selected the condition with the largest effect size, and for the shared eQTLs we used the effect size in condition A to compare to the GTEx effect size.
ATAC-seq analysis
Paired-end sequencing reads were aligned to hg19 using bowtie2 with default settings (Langmead and Salzberg, 2012). Reads were filtered using Picard Tools (https://broadinstitute.github.io/picard/) to remove duplicate reads, and reads mapping to the mitochondrial genome. Reads were then remapped using the WASP pipeline as described above. We retained a similar median number of reads across conditions (A: 28,998,060; B: 33,662,261; C: 30,161,640; D: 34,534,416). Across conditions, there is no significant difference in the number of mapped reads, number of regions identified, or fraction of reads mapped to open chromatin regions (Figure 4—figure supplement 2A–C). All libraries, across conditions, show the expected fragment size distribution, enrichment of reads at transcription start sites (TSS), and footprints at well-defined CTCF motifs (Figure 4—figure supplement 2D–F). Correlation analysis of read counts between pairs of samples revealed clustering by individual and condition (Figure 4—figure supplement 3). As expected, the correlation of read counts between samples at the 10,633 regions overlapping the TSS is higher than the correlation across all regions (median rho = 0.83 vs. 0.56). Pairs of samples from the same condition are marginally more correlated in their accessibility profiles than pairs of samples across all conditions (median rho = 0.84 vs. 0.83 at the TSS).
Identification of accessible chromatin regions
Request a detailed protocolTo generate a unified list of regions with accessible chromatin across conditions and samples, we first used MACS2 (Zhang et al., 2008) to identify peaks within each sample independently. Next, we used BEDtools (Quinlan and Hall, 2010) with the multiIntersectBed function to identify overlapping peaks within each condition separately. Within each condition, we retained peaks with support from more than three individuals and used the mergeBed function to create a condition-specific consensus. We then combined and merged the bed files across the four conditions to make a final consensus file containing all the filtered accessible regions. The number of reads mapping to accessible chromatin regions was quantified using featureCounts within subread (Liao et al., 2014).
Identification of differentially accessible regions (DARs)
Request a detailed protocolThe 128,673 open chromatin regions associated with count data were filtered to include only those regions on the autosomes, and those which had mean log2cpm values > 0 for each region. First, to identify differentially accessible regions we used the same limma framework described above for the RNA-seq data. To test for differences between conditions, a linear model with a fixed effect for condition was used together with a random effect for individual. We did not identify any significantly differentially accessible regions with a Benjamini and Hochberg FDR < 0.1. To identify regions with small effect size differences between conditions, we used an adaptive shrinkage method implemented in the ashr package in R (Stephens, 2017). We used the regression estimates (regression coefficients, posterior standard errors, and posterior degrees of freedom) generated by limma to calculate a posterior mean (shrunken regression coefficients), FDR, and False Sign Rate (FSR, probability that the sign of the effect size is wrong). We considered regions to be differentially accessible at FSR <0.1. We denote regions that are not differentially accessible as constitutively accessible regions.
Overlap of DARs with genomic features
TSS
Request a detailed protocolTranscription start sites were obtained from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) using ‘txStart’ from Ensembl genes (Karolchik et al., 2004). TSS were defined based on the TSS of the 5’ most transcript on the sense strand and 3’ most transcript on the anti-sense strand. TSS regions, and subsequent genomic features, were intersected with DARs and constitutively accessible regions requiring a 1 bp overlap using bedtools intersect (Quinlan and Hall, 2010).
Histone marks
Request a detailed protocolWe obtained histone mark data (.bed files) for human heart tissue from the ENCODE consortium (ENCODE Project Consortium, 2012; Davis et al., 2018) ENCODE portal, (https://www.encodeproject.org). We selected H3K4me3 (Experiment ENCSR181ATL), H3K4me1 (Experiment ENCSR449FRQ), H3K36me3 (Experiment ENCSR799KLF), H3K27me3 (Experiment: ENCSR613PPL), and H3K9me3 (Experiment ENCSR803MVC) ChIP-seq data from heart left ventricle tissue from a 51-year-old female individual (Biosample ENCBS684IAD).
Transcription factor binding locations
Request a detailed protocolWe obtained ChIP-seq data for the hypoxia-responsive factors HIF1α and HIF2α assayed in the MCF-7 breast cancer cell line (Schödel et al., 2011), and E2F4 in the GM06990 lymphoblastoid cell line (Lee et al., 2011). Genome co-ordinates of the 356 HIF1α-, 301 HIF2α- and 16,245 E2F4-bound regions were converted from hg18 to hg19 using the liftOver tool in the Galaxy platform (http://galaxyproject.org/; Afgan et al., 2018).
Motif enrichment analysis in DARs
Request a detailed protocolWe obtained sequences for all accessible regions and differentially accessible regions using the Galaxy platform (Afgan et al., 2018). We used the MEME-ChIP tool within The MEME Suite (Bailey et al., 2009; Machanick and Bailey, 2011) in Differential Enrichment mode to identify motifs differentially enriched in DARs compared to all accessible regions using a hypergeometric distribution.
TEs
Request a detailed protocolWe obtained repeat annotations from the RepeatMasker track (Jurka, 2000; Smith et al., 2010) from the UCSC Table browser (Karolchik et al., 2004). We intersected the Repeatmasker track with our accessible regions and reported those elements where 50% of their length overlaps a DAR or constitutively accessible region. We stratified TEs by TE class: LINE, SINE, DNA, and LTR, and then by TE family and type within the SINE class.
CpG islands
Request a detailed protocolWe obtained CpG island annotations from the UCSC Table Browser, and overlapped these regions with DARs and constitutively accessible regions.
caQTL identification
Request a detailed protocolThe caQTLs were identified in the same manner as described for the eQTLs. However, in the caQTL analyses, we limited tested SNPs to those falling within the 128,672 accessible regions, as opposed to testing variants within 25 kb of the region as we did for eQTLs. Dynamic caQTLs were identified as for dynamic eQTLs.
DNA methylation analysis
To allow for accurate quantification of DNA methylation levels we removed probes overlapping SNPs with a minor allele frequency of >0.1, and only retained probes with a detection p-value of >0.75 across samples. Beta-values (ratio of methylated probe intensity and overall probe intensity, and bounded between 0 and 1) were quantile normalized using lumiN, and, when appropriate, converted to M-values (log2 ratio of intensities of methylated probe versus unmethylated probe) using lumi (Du et al., 2008).
The methylation level of CpGs coincides with the expected distribution based on their annotated genomic location that is low levels of DNA methylation in CpG islands, and higher levels in CpG island shores, and CpG island shelves respectively (Figure 6—figure supplement 1). Correlation analysis across all pairs of samples, including replicate samples, reveals clustering primarily by individual rather than condition (Figure 6—figure supplement 2).
To measure the DNA methylation level at gene set promoters, we selected CpGs 200 bp upstream of the TSS (TSS200 defined on the array). We considered all CpGs when overlapping with DARs.
Identification of differentially methylated CpGs (DMCpGs)
Request a detailed protocolDifferentially methylated CpGs were identified using the same limma framework as described for the RNA-seq data. Analysis was run using both Beta-values and M-values.
Variance partition of three molecular phenotypes
Request a detailed protocolTo identify the proportion of variance explained by individual, condition and replicate in the gene expression, chromatin accessibility, and DNA methylation data, we used a linear mixed model with a random effect for all of the variables. The variance was normalized to sum to one and the proportion of variance attributed to each variable was calculated at each locus. We used a t-test to compare the mean proportion of variance explained by the same variable between data types.
Integration with GWAS-implicated variants and genes
Request a detailed protocolWe intersected the Reference SNP cluster ID of our dynamic QTLs with the 158,654 SNPs in the NHGRI-EBI GWAS Catalog available from the UCSC Table Browser (Buniello et al., 2019) in August 2019.
We also considered the ‘mapped genes’ results from GWAS from thee relevant traits: myocardial infarction (EFO_0000612, 89 genes), heart failure (EFO_0003144, 164 genes) and stroke (EFO_0000712, 255 genes), downloaded from the NHGRI-EBI GWAS Catalog in August 2019. Gene lists were intersected with our response eGenes.
Full GWAS summary statistics for CAD and MI were obtained from the CARDIoGRAMplusC4D Consortium (Stitziel et al., 2016; Nikpay et al., 2015). We tested if the lead eSNP of the dynamic eQTLs, and all significant eQTLs in each condition is associated with CAD or MI. We created locus zoom plots in each condition by plotting the -log10 p-values of the GWAS and eQTL SNPs in a 1 Mb window around eSNPs. We calculated LD from the YRI population using the LDlink program (Machiela and Chanock, 2015). We used the LDmatrix function to generate the LD matrix plots choosing all tested SNPs 5 kb upstream and downstream from the eSNP or lead GWAS variant. To generate the R2 value and p value of the LD between the eSNP and GWAS variant we used the LDpair function.
Data access
Request a detailed protocolAll RNA-seq, ATAC-seq and DNA methylation data have been deposited in the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) under accession number GSE144426.
Data availability
Sequencing data have been deposited in GEO under accession codes GSE144426.
-
NCBI Gene Expression OmnibusID GSE144426. Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes.
References
-
The galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 updateNucleic Acids Research 46:W537–W544.https://doi.org/10.1093/nar/gky379
-
MEME SUITE: tools for motif discovery and searchingNucleic Acids Research 37:W202–W208.https://doi.org/10.1093/nar/gkp335
-
Controlling the false discovery rate: a practical and powerful approach to multiple testingJournal of the Royal Statistical Society: Series B 57:289–300.https://doi.org/10.1111/j.2517-6161.1995.tb02031.x
-
Transposable elements have contributed human regulatory regions that are activated upon bacterial infectionPhilosophical Transactions of the Royal Society B: Biological Sciences 375:20190332.https://doi.org/10.1098/rstb.2019.0332
-
Oxygen, a source of life and stressFEBS Letters 581:3582–3591.https://doi.org/10.1016/j.febslet.2007.06.018
-
Human induced pluripotent Stem-Cell-Derived cardiomyocytes as models for genetic cardiomyopathiesInternational Journal of Molecular Sciences 20:4381.https://doi.org/10.3390/ijms20184381
-
Genome-wide analysis of mammalian promoter architecture and evolutionNature Genetics 38:626–635.https://doi.org/10.1038/ng1789
-
Why is the partial oxygen pressure of human tissues a crucial parameter? small molecules and hypoxiaJournal of Cellular and Molecular Medicine 15:1239–1253.https://doi.org/10.1111/j.1582-4934.2011.01258.x
-
The encyclopedia of DNA elements (ENCODE): data portal updateNucleic Acids Research 46:D794–D801.https://doi.org/10.1093/nar/gkx1081
-
A complete tool set for molecular QTL discovery and analysisNature Communications 8:15452.https://doi.org/10.1038/ncomms15452
-
Anti-apoptotic function of the E2F transcription factor 4 (E2F4)/p130, a member of retinoblastoma gene family in cardiac myocytesJournal of Molecular and Cellular Cardiology 53:820–828.https://doi.org/10.1016/j.yjmcc.2012.09.004
-
Gene expression and genetic variation in response to endoplasmic reticulum stress in human cellsThe American Journal of Human Genetics 86:719–729.https://doi.org/10.1016/j.ajhg.2010.03.017
-
Lumi: a pipeline for processing illumina microarrayBioinformatics 24:1547–1548.https://doi.org/10.1093/bioinformatics/btn224
-
Clinical and genetic determinants of varicose veinsCirculation 138:2869–2880.https://doi.org/10.1161/CIRCULATIONAHA.118.035584
-
Transcriptomic analysis of patients with tetralogy of fallot reveals the effect of chronic hypoxia on myocardial gene expressionThe Journal of Thoracic and Cardiovascular Surgery 140:337–345.https://doi.org/10.1016/j.jtcvs.2009.12.055
-
Oxygen, oxidative stress, hypoxia, and heart failureJournal of Clinical Investigation 115:500–508.https://doi.org/10.1172/JCI200524408
-
A genome-wide analysis of the response to inhaled β2-agonists in chronic obstructive pulmonary diseaseThe Pharmacogenomics Journal 16:326–335.https://doi.org/10.1038/tpj.2015.65
-
Regulation of ischemic neuronal death by E2F4-p130 protein complexesJournal of Biological Chemistry 289:18202–18213.https://doi.org/10.1074/jbc.M114.574145
-
Oxidative stress under ambient and physiological oxygen tension in tissue cultureCurrent Pharmacology Reports 2:64–72.https://doi.org/10.1007/s40495-016-0050-5
-
Repbase update: a database and an electronicJournal of Repetitive Elements Trends Genet 16:418–420.https://doi.org/10.1016/S0168-9525(00)02093-X
-
The UCSC table browser data retrieval toolNucleic Acids Research 32:493–496.https://doi.org/10.1093/nar/gkh103
-
Fast gapped-read alignment with bowtie 2Nature Methods 9:357–359.https://doi.org/10.1038/nmeth.1923
-
The subread aligner: fast, accurate and scalable read mapping by seed-and-voteNucleic Acids Research 41:e108.https://doi.org/10.1093/nar/gkt214
-
MEME-ChIP: motif analysis of large DNA datasetsBioinformatics 27:1696–1697.https://doi.org/10.1093/bioinformatics/btr189
-
Revisiting cardiac cellular compositionCirculation Research 118:400–409.https://doi.org/10.1161/CIRCRESAHA.115.307778
-
Normalization of RNA-seq data using factor analysis of control genes or samplesNature Biotechnology 32:896–902.https://doi.org/10.1038/nbt.2931
-
Genetic variation in radiation-induced cell deathGenome Research 22:332–339.https://doi.org/10.1101/gr.122044.111
-
Linear models and empirical bayes methods for assessing differential expression in microarray experimentsStatistical Applications in Genetics and Molecular Biology 3:1–25.https://doi.org/10.2202/1544-6115.1027
-
False discovery rates: a new dealBiostatistics 18:275–294.https://doi.org/10.1093/biostatistics/kxw041
-
Coding variation in ANGPTL4, LPL, and SVEP1 and the risk of coronary diseaseThe New England Journal of Medicine 374:1134–1144.https://doi.org/10.1056/NEJMoa1507652
-
Sex differences in gene expression in response to ischemia in the human left ventricular myocardiumHuman Molecular Genetics 28:1682–1693.https://doi.org/10.1093/hmg/ddz014
-
PRMT4 overexpression aggravates cardiac remodeling following myocardial infarction by promoting cardiomyocyte apoptosisBiochemical and Biophysical Research Communications 520:645–650.https://doi.org/10.1016/j.bbrc.2019.10.085
-
Systematic evaluation of pleiotropy identifies 6 further loci associated with Coronary Artery DiseaseJournal of the American College of Cardiology 69:823–836.https://doi.org/10.1016/j.jacc.2016.11.056
-
Model-based analysis of ChIP-Seq (MACS)Genome Biology 9:R137.https://doi.org/10.1186/gb-2008-9-9-r137
Decision letter
-
Oliver StegleReviewing Editor; European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
-
Patricia J WittkoppSenior Editor; University of Michigan, United States
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This manuscript describes an in vitro system to study inter-individual variation of transcriptional and epigenetic responses to hypoxia using iPSC derived cardiomyocytes. Changes in gene expression are found without changes in chromatin accessibility or methylation, providing insight into key questions about the necessity of epigenetic changes for changes in gene expression. It also identifies regions of the genome responsible for context dependent-changes in gene expression.
Decision letter after peer review:
Thank you for submitting your article "Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Patricia Wittkopp as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Senior Editor has drafted this decision to help you prepare a revised submission.
Both reviewers agree that this manuscript is in principle appropriate for publication in eLife without any further experimentation; however, they both also raise a number of concerns about the data analysis and presentation that would need to be addressed satisfactorily before we can consider this work further. Although the typical practice for eLife is to provide authors with a single consolidated review, I felt it would be more beneficial in this case to provide the two reviews as is. They are generally consistent and both explain why they recommend the changes they do in detail.
Reviewer #2:
1) The authors present the range of cardiomyocyte purity. However, I think it would be helpful for this data to be presented in a table. Further, how do the purities correlate between the individual at different time points collected? Could some of the eQTLs identified really be a result of the variation in purity of the differentiations? Investigation and results regarding this possibility would be helpful if included in the manuscript.
2) The inferences made in this paper about the identification of dynamic eQTLs (and their absence) for a given loci rests on the assumption that the power to detect an eQTL is the same in every cell-type by condition. This is addressed in subsection “Dynamic eQTLs are revealed following hypoxia”, where it is clearly show (and acknowledged) that this is not the case. To avoid this problem the authors chose a lenient secondary cut-off at the point where p-values deviate from uniformity. Firstly, it would be helpful to formally test for deviation of uniformity here (rather than appearance on a graph). More importantly, utilising the same p-value threshold when the distribution of p-values is different will lead to variation in the false positive / false negative rates between conditions. This would lead to bias in the identification of dynamic eQTLs, as variation in eQTL detection is a function of the FP/FN rates. Can this be addressed, and it clearly demonstrated that the comparison between conditions is based on the same ability to detect a true eQTL?
3) In subsection “Dynamic eQTLs are revealed following hypoxia”, the authors state that the genes that were not eQTLs in any condition were uniform. However, the figures suggest significant skewing to more significant p-values. Please test for the uniform distribution with a chi-squared test. The authors suggest that they would expect the eQTLs identified in one condition to be uniform in another condition. This relies a the huge assumption that the largest contributing factor to significant eQTLs in each condition is the condition itself, not the cell type and that there will not be a significant number of eQTLs that are consistent across all conditions. However, the authors have provided no evidence to support this claim or this assumption. Please provide the requisite evidence to support these claims and assumptions.
4) Related to this, in paragraph three, it is indicated that based on the FDR calculations, half of the dynamic eQTLs that they identified are not truly dynamic. However, given this is the main conclusion of the paper, it really would be beneficial if some additional work could be done to ascertain the true from false dynamic eQTLs. Particularly if this work is going to be used for any translational research. One option would be to create a null distribution of the effect sizes for genes in each condition and test for enrichment of the "dynamic eQTLs" across the different conditions to test demonstrate whether they are actually identifying truly dynamic eQTLs. Further, I think different FDR cutoffs would have been more appropriate for identifying truly dynamic eQTLs since that is the main focus of the manuscript.
5) The analysis of the GTEx and GWAS overlap is unclear and would benefit from some more careful explanations. Reading this section, I had the following concerns:
i) How were the 14 tissue types randomly selected? And why not include all GTEx tissue types?
ii) How was overlap between dynamic eQTL and GTEx eQTL determined? Many eGenes are regulated by different eQTL – was the overlap based ona. Rank test of p-values between datasets for example? That would help provide confidence it is a true shared effect rather than unlinked eQTL.
iii) Are the allelic effects in the same direction?
iv) Why perform two stage comparison between eQTL and GWAS variants from all traits, then cardiac traits? Was multiple testing corrected for?
v) How do you rule out whether the overlap is just by chance, pleiotropy, linkage, or a causal relationship? These different scenarios have significant impact on the conclusions that are drawn. As it currently stands, it is implied that the overlap is a shared causal relationship.
6) One of the premises of this work is the statement made at the end of paragraph one of the Introduction, where the authors conclude that because the majority GTEx eQTLs are shared across tissues, they probably do not contribute to disease in tissue-specific manners. However, this statement makes two significant assumptions that are not addressed. 1) They assume that the downstream effects and signalling are the same regardless of cell type given they have the same eQTL. However, because different genes and pathways are activated in different tissues, this assumption may not hold true across all tissues. 2) They ignore the ascertainment bias associated with large additive cis-eQTL (which are more likely to overlap between tissues). The mean proportion of expression h2 that can be found to overlap between tissues is 0.02 (2%) in GTEX. The assumption that the remaining 98% are shared between tissues isn't supported by evidence from trans-eQTL mapping or single cell eQTL analyses. In my opinion, clarifying this statement in the Introduction as justification for the work would be beneficial.
7) I found that manuscript in general to be difficult to read and easily interpret the analysis conducted, and conclusions based on specific results. I would suggest a careful eye on editing a review would be very helpful. But here are a few points that I noted specifically
i) The notation used for supplementary figures and tables presented as "Figure X – Supplement Y". It would help to refer to the supplementary figures as "Supplementary Figure X" instead.
ii) Subsection “Dynamic eGenes are enriched for response genes and transcription factors” – it is not always entirely clear whether the authors are referring to transcription factor genes or transcription factors regulating the genes. You could include the word "gene" following "transcription factor" throughout the paragraph.
iii) Defining clearly what response genes and why they are important.
iv) Subsection “Potential limitations of our model” states that the power analysis indicates a 6% power to detect 38% heritability. Firstly, I assume, but it is not clear, that the test is against H0: h2=0? I understand the maths behind this, but that seems a really odd way of explaining power. Why not give a power curve?
v) How are the genomic features tested for enrichment of DARs?
vi) Subsection “Genomic features associated with differentially accessible regions (DARs)” paragraph two and Discussion paragraph four are confusing and unclear. It is hard to understand what the has been done.
vii) Throughout the manuscript, it is difficult to identify which timepoint were used for each assay. Please clarify this when introducing a new assay. Altering Figure 1 to make this clear graphically would help as well.
8) Some of the RIN numbers in RIN table are very low (i.e. 0 for 18852 in condition D). It's not clear if these samples were included in the analyses, but if they were some justification is needed. I worry about impact on the analysis, as essentially it would induce bias in those samples.
9) The section "DNA methylation levels are largely stable following hypoxia and re-oxygenation" is confusing and difficult to read. It is first stated that no DMCpGs were identified and then later stated that they were identified.
10) In subsection “Chromatin accessibility changes following hypoxia and re-oxygenation”, it is stated that ATAC-seq reads were selected that did not show allelic mapping biases. However, later testing for differential accessible regions. This filtering step would remove the ability to effectively identify differentially accessible regions with this step. Could you please either explain the reasoning for removing allelic mapping biases at this step, and/or explain why it would not impact differential accessibility analyses downstream.
11) It is concluded that the data indicate that the "underlying DNA sequence features can affect their epigenetic profile". However, I do not think that the results demonstrate this. They only demonstrate differential methylation in differentially accessible regions which are themselves not associated with genetic variants. I would suggest altering this sentence by removing "that the underlying DNA sequence features of these regions can affect their epigenetic profile"
Reviewer #3:
The authors established an in vitro system to study inter-individual variation of transcriptional and epigenetic responses to hypoxia using iPSC derived cardiomyocytes. Despite the modest sample size, they discovered more than a thousand eQTLs in total, of which 367 are dynamic eQTLs modulating the transcriptional response to hypoxia with implications for disease susceptibility.
Although the analyses demonstrated in the manuscript are solid and the results are reliable, authors failed to present their results effectively. I would recommend the authors to rewrite the paper to increase the accuracy and readability in general (as I suggested below).
In addition, I have one particular concern regarding the following important (but vague) claim presented in this paper, that is the transcriptional response to hypoxia is not mediated through epigenetic changes. Although the authors repeated the claim multiple times throughout the paper, it is hugely confounded by the power to detect epigenetic variations between conditions/individuals (i.e., DARs, caQTLs and DMCpGs). If the authors would keep the claim, they need to address the power issue appropriately.
Here are my specific comments:
1) Figure 1A should explain the overview of the experiment in more detail. There are three replicates for the three of 15 donors (mentioned in the main text) which are hidden in the panel. It is also nice to combine panel A and D in one panel to illustrate the whole experiment. According to the main text, one donor was dropped in ATAC-seq and two donors were dropped in methylation data which should also be shown in the panel.
2) It would be nice to summarise the 4 x 21 RNA-seq samples in the main figure. The authors used a linear mixed model for differential expression (DE) analysis. I'm wondering if the model can be used to perform a variance components analysis to decompose the total variance (for each gene) into “condition”, “individual”, “replication”, and residual variances (where the condition and replication are also treated as random effects). The estimated variance parameters across all genes are then summarised by a boxplot or similar where the residual variance is standardised as 1.0. The authors could repeat the analysis for ATAC-seq and methylation data to compare the difference of explained variances between molecular phenotypes. The analysis will provide clear interpretation on why detectable DARs, caQTLs and DMCpGs are so few. The similar analysis was performed e.g., in (Kilpinen et al., 2017).
3) Figure 2H is not so informative. Instead, the authors could extend the heatmap in Figure 2F to show how much of eGenes are classified into baseline eGenes, dynamic eGenes and shared eGenes by showing all genes with eQTLs at least in one condition. The authors could also add additional columns next to the heatmap showing which eGenes are response genes and/or transcription factors to support the enrichment analysis in the section “Dynamic eGenes are enriched for response genes and…”.
4) The DE analysis result is frequently used in the manuscript, but there is no summary data shown in the main text (e.g., volcano plots, DE genes shared/not shared with different conditions, etc.). It would be ideal to have an independent figure for DE analysis. Currently, Figure 2A-C are the result of DE analysis which are inconsistent with the figure title (i.e., “Hundreds of dynamic eQTLs are revealed…”).
5) How many dynamic eGenes are overlapping with the baseline eGenes? The Chi-square test performed in the section “Dynamic eGenes are enriched for response genes and transcription factors” seems inappropriate because the authors double-count the same genes in baseline eGenes as the background distribution. They need to subtract dynamic eGenes from the baseline eGenes to perform a proper Chi-square test.
6) The authors performed the enrichment analysis of transcription factors (TFs) in dynamic eGenes which could be confounded by response genes, because dynamic eGenes are likely to be response genes. In addition, even if dynamic eGenes are enriched with TFs, what is the biological interpretation? Why do regulatory variants affect TF dynamic eGenes more than non-TF dynamic eGenes?
7) It is often the case that the main figures are used to show anecdotal examples of biological statements raised in the main text and important data/results are hidden in the supplementary figures. The main figures must be used to support the author's claims in general (without cherry-picking an example gene). For example, the authors stated “most DARs have returned to baseline levels of accessibility by the first re-oxygenation condition.” with just numbers of DARs and the FOXO1 example which do not strongly support the claim. The authors should perform and highlight global analyses to support each claim from different angles. In fact, the claim was partly supported by the supplementary figure 3 (supplement 4A-C). Therefore, the authors first consider rearranging the figures to better support each claim made in the main text. Then, they can clarify which analysis is needed to make a strong statement.
8) The statement “These results suggests that gene expression changes…” was supported only by a limited number of DARs and not totally addressed yet. Recently, the “gene activity score” was proposed in the single-cell ATAC-seq analysis to overcome the sparseness of chromatin data (Granja et al., 2020). I'm wondering if it improves the power to detect DARs in the paper. This quantification also allows the authors to easily colocalise DARs with eGenes/response genes to prove (or disprove) the statement.
9) It is very surprising that CHT found only tens of caQTLs, while it mapped more than a thousand eQTLs. The authors should comment on that. If the authors use only variants overlapping with the accessibility region, does the power to detect caQTL increase? It seems 25Kb cis-region is too broad.
10) The statement “Changes in DNA methylation are therefore…” is hugely confounded by the power to detect DMCpGs with the higher noise rate in DNA methylation arrays. It is not conclusive with the modest sample size.
11) The authors should perform a colocalisation analysis (Giambartolomei et al., 2014) if GWAS summary statistics are available. If not, the authors are still able to calculate linkage disequilibrium (LD) between the GWAS lead variant and eQTL lead variant from CHT to make sure the GWAS locus and the eQTL are reasonably colocalised. The authors should show locuszoom plots in conjunction with boxplots in Figure 6 to demonstrate both the magnitude of statistical significance of eQTLs and LD (r2 index) with GWAS lead variants.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for submitting your article "Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Patricia Wittkopp as the Senior Editor. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.
As stated by the reviewers in response to the first submission, the paper presents a timely, interesting and relevant advance. The initial comments raised have been largely addressed, however both reviewers continue raise concerns regarding the eQTL and GWAS overlap / colocalization analysis. We agree that these concerns are fundamental and should be addressed.
First, concerning the overlap with GTEX (R2), it seems entirely doable to perform this analysis on the level of individual variants, which also permit assessing consistency of effect directions, etc.
Second, both reviewers comment on the GWAS overlap/colocalization analysis. While we acknowledge that the resolution to definitely identify colocalization events will be limited, the authors should extend their analysis and/or town down the conclusions. The approach proposed by reviewer 3 seems sensible and viable, at least for a selected set of loci.
For reference, we have also included the complete reviewer reports.
Reviewer #2:
The authors have addressed most of the comments raised, although there are a couple of points that I feel would be beneficial to investigate further.
1) GTEX overlap – in the original review, a suggestion of testing for concordance of allelic effect direction be shown for the replication of the eQTL identified in this work and GTEX tissues. It is unclear what “analysis at the level of the gene” exactly means, but given that many genes are controlled by independent eQTL (loci) in difference tissues/cell types, overlapping an observation of eGenes doesn't allow conclusions of shared eQTL to be determined. Testing for the concordance of allelic effects for eSNPs identified from your analysis with GTEX eSNP results would provide much stronger evidence to support this conclusion.
2) GWAS overlap – "We next chose to analyze the data using a less stringent gene-based approach using GWAS traits of interest. The second analysis allows for the identification of potentially interesting regions where the causal SNP may be different in the eQTL and GWAS data" I am not sure what this means biologically and what conclusions can be drawn from this. What does it mean if there is no overlap in the SNPs between a eQTL and GWAS loci but that they are both in the same gene region? Is there an enrichment over what would be expected by chance? the statements "We did not mean to imply that the overlap is a shared causal relationship. We have changed the language to reflect this. Our goal is to illustrate that the dynamic eQTLs we identified may play a role in higher-level phenotypes." These sentences seems in opposition to one another.
Reviewer #3:
The authors have now addressed most of my questions and concerns. Yet, the result of GWAS colocalisation is still in question. I have previously suggested to perform a proper colocalisation analysis using COLOC package or to report the linkage disequilibrium index (r2 value) between GWAS index variant and eQTL lead variant. Unfortunately the authors just provided only P-values for associations which does not support the GWAS locus and eQTL are colocalised. The authors have to expand the cis-window (up to 1Mb) for a handful number of genes overlapping with GWAS loci to perform a proper colocalisation analsysis. It is crucial to provide the posterior probability of colocalisation for such a small number of GWAS overlapping genes.
https://doi.org/10.7554/eLife.57345.sa1Author response
Reviewer #2:
1) The authors present the range of cardiomyocyte purity. However, I think it would be helpful for this data to be presented in a table. Further, how do the purities correlate between the individual at different time points collected? Could some of the eQTLs identified really be a result of the variation in purity of the differentiations? Investigation and results regarding this possibility would be helpful if included in the manuscript.
Please see Supplementary file 1 as indicated in the text for the purity of each cell line.
We did not measure purity at different time points, only at day 30 +/- 1 day, before the treatment on day 31-32. Because we split treatment cultures from a single differentiation culture, and all experiments were concluded within 48h after purity was determined by flow cytometry, it is unlikely that there are purity differences between conditions.
We have made this clearer in the Results and we have added a schematic of the experimental workflow as Figure 1—figure supplement 1. We have also amended Figure 2—figure supplement 4 to include cardiomyocyte marker gene expression levels across all conditions instead of just condition A as a representative condition. This data illustrates that there is no systematic bias in cardiomyocyte marker gene expression levels across conditions suggesting that the purity is indeed similar across conditions.
2) The inferences made in this paper about the identification of dynamic eQTLs (and their absence) for a given loci rests on the assumption that the power to detect an eQTL is the same in every cell-type by condition. This is addressed in subsection “Dynamic eQTLs are revealed following hypoxia”, where it is clearly show (and acknowledged) that this is not the case. To avoid this problem the authors chose a lenient secondary cut-off at the point where p-values deviate from uniformity. Firstly, it would be helpful to formally test for deviation of uniformity here (rather than appearance on a graph). More importantly, utilising the same p-value threshold when the distribution of p-values is different will lead to variation in the false positive / false negative rates between conditions. This would lead to bias in the identification of dynamic eQTLs, as variation in eQTL detection is a function of the FP/FN rates. Can this be addressed, and it clearly demonstrated that the comparison between conditions is based on the same ability to detect a true eQTL?
We tested for deviation from uniform distributions and added that data to the figure legend of Figure 3—figure supplement 2A. We believe that the assumption that power to detect eQTLs is similar in all conditions is a reasonable one as we now note because: (i) the cells originated from the same differentiation culture for each individual so the only factor distinguishing these samples should be the condition. (ii) We have the same number of individuals in each condition. (iii) RNA was extracted from all time points of the same individual at the same time (Supplementary file 1). (iv) RIN scores are similar across conditions (Supplementary file 1). (v) Sequencing read counts are similar across conditions (Figure 2—figure supplement 1). Moreover, the distribution of eQTL effect sizes across conditions is similar (new supplementary figure: Figure 3—figure supplement 1), and the values are correlated, further suggesting we have similar power to detect eQTLs across conditions.
The reviewer is correct that reliance on an arbitrary P-value when the distributions are different will lead to a different FDR. For that reason, our first – more stringent – cutoff is based on an FDR value, not a P-value. The secondary cutoff is based on a nominal P-value, which was determined based on visualization of the distributions. Importantly, we provide the FDR associated with different secondary P-value cutoffs in reviewer comment four below, thereby addressing the reviewer’s concern.
3) In subsection “Dynamic eQTLs are revealed following hypoxia”, the authors state that the genes that were not eQTLs in any condition were uniform. However, the figures suggest significant skewing to more significant p-values. Please test for the uniform distribution with a chi-squared test. The authors suggest that they would expect the eQTLs identified in one condition to be uniform in another condition. This relies a the huge assumption that the largest contributing factor to significant eQTLs in each condition is the condition itself, not the cell type and that there will not be a significant number of eQTLs that are consistent across all conditions. However, the authors have provided no evidence to support this claim or this assumption. Please provide the requisite evidence to support these claims and assumptions.
The experiments were concluded within 48 hours of testing a representative sample from each individual for purity. It does not seem reasonable that during that time the composition of the cultures has changed, certainly not in a way that will create a biased cell composition association with a specific treatment condition. We now acknowledge in the text that our conclusion relies on this assumption, and explain why we believe that it is quite a reasonable assumption to make.
We have tested for deviation from a uniform distribution and indicated the statistics in the figure legend of Figure 3—figure supplement 2B, and in the main text. However, given that the outcome of a chi-square test (rejection/non-rejection) could depend on how the data are binned, and it is not obvious how to do so in this case, we used the Kolmogorov–Smirnov test to determine whether our P-value distribution deviates from a uniform distribution as a simpler alternative.
4) Related to this, in paragraph three, it is indicated that based on the FDR calculations, half of the dynamic eQTLs that they identified are not truly dynamic. However, given this is the main conclusion of the paper, it really would be beneficial if some additional work could be done to ascertain the true from false dynamic eQTLs. Particularly if this work is going to be used for any translational research. One option would be to create a null distribution of the effect sizes for genes in each condition and test for enrichment of the "dynamic eQTLs" across the different conditions to test demonstrate whether they are actually identifying truly dynamic eQTLs. Further, I think different FDR cutoffs would have been more appropriate for identifying truly dynamic eQTLs since that is the main focus of the manuscript.
As the reviewer appreciates, it is not obvious how to distinguish between the true and false positives in this case. We supposed that one can use priors based on gene functions, but this approach seems prone to errors. We have now determined the number of dynamic eQTLs at different statistical cutoffs and their associated FDR (P=0.5: 117 dynamic eQTLs with 30% FDR; P=0.75: 53 eQTLs with 13% FDR; P=0.8: 37 eQTLs with 11% FDR;, P=0.9: 19 eQTLs with 7% FDR). We have added the results of one of these more stringent P-value cutoffs, where the FDR is quite low (7%) to the main text. We also now provide a supplementary figure highlighting six additional examples of dynamic eQTLs (Figure 3—figure supplement 3) indicating that our approach identifies dynamic eQTLs as expected.
5) The analysis of the GTEx and GWAS overlap is unclear and would benefit from some more careful explanations. Reading this section, I had the following concerns:
i) How were the 14 tissue types randomly selected? And why not include all GTEx tissue types?
This number and the specific tissues were chosen in an arbitrary way. We were looking for a modest number of diverse tissue types to interrogate. This analysis illustrates that 98% of dynamic eGenes are eGenes in at least one of 14 tissues suggesting we are reaching saturation already. Adding more tissues therefore can’t add much more insight because we are already at 98% overlap. We clarified that the choice of tissues was arbitrary – effectively random in the text.
ii) How was overlap between dynamic eQTL and GTEx eQTL determined? Many eGenes are regulated by different eQTL – was the overlap based ona. Rank test of p-values between datasets for example? That would help provide confidence it is a true shared effect rather than unlinked eQTL.
As indicated in the Materials and methods this analysis was performed at a gene level; i.e. a gene’s expression level is known to be affected by genetic variation (eGene) in GTEx data. We asked how often do these GTEx eGenes overlap with our eGenes. We have made the approach we chose more explicit in the Results.
A more nuanced SNP-based analysis is actually quite challenging to perform and will require a different approach. As we are uncertain what the potential insight would be from such an analysis, on balance, we decided against it.
iii) Are the allelic effects in the same direction?
As the analysis we performed was based on a gene level we are unable to compare the genotype/allelic effects. There is overwhelming overlap between our findings and GTEx. While we agree that a more nuanced approach will provide more details, we argue that this is a minor point in the paper, and further analysis in this section will not substantially add insight.
iv) Why perform two stage comparison between eQTL and GWAS variants from all traits, then cardiac traits? Was multiple testing corrected for?
To identify genomic loci that may be relevant for complex phenotypes, we performed two independent analyses to intersect our eQTL data with data from GWAS studies. We chose two different analytical approaches to identify as many potentially interesting loci as possible. We first chose to analyze the data using an unbiased, stringent SNP-based approach using all available GWAS traits. We next chose to analyze the data using a less stringent gene-based approach using GWAS traits of interest. The second analysis allows for the identification of potentially interesting regions where the causal SNP may be different in the eQTL and GWAS data.
In both of these analyses we searched for direct overlaps of either SNPs or genes. We did not correct for multiple testing. In line with the comment below we have changed the language to more accurately reflect the analysis we performed.
We have also now included a locus-specific co-localization-based analysis as requested by reviewer three. We have clarified the choice of these three different approaches in the text.
v) How do you rule out whether the overlap is just by chance, pleiotropy, linkage, or a causal relationship? These different scenarios have significant impact on the conclusions that are drawn. As it currently stands, it is implied that the overlap is a shared causal relationship.
We did not mean to imply that the overlap is a shared causal relationship. We have changed the language to reflect this. Our goal is to illustrate that the dynamic eQTLs we identified may play a role in higher-level phenotypes.
6) One of the premises of this work is the statement made at the end of paragraph one of the Introduction, where the authors conclude that because the majority GTEx eQTLs are shared across tissues, they probably do not contribute to disease in tissue-specific manners. However, this statement makes two significant assumptions that are not addressed. 1) They assume that the downstream effects and signalling are the same regardless of cell type given they have the same eQTL. However, because different genes and pathways are activated in different tissues, this assumption may not hold true across all tissues. 2) They ignore the ascertainment bias associated with large additive cis-eQTL (which are more likely to overlap between tissues). The mean proportion of expression h2 that can be found to overlap between tissues is 0.02 (2%) in GTEX. The assumption that the remaining 98% are shared between tissues isn't supported by evidence from trans-eQTL mapping or single cell eQTL analyses. In my opinion, clarifying this statement in the Introduction as justification for the work would be beneficial.
As the reviewer notes this is a complex issue, which was overly simplified in our statement about the consequences of sharing of eQTLs across tissues. We have removed this sentence and instead include a statement about how a variety of approaches are necessary to identify the molecular basis of disease-associated loci.
7) I found that manuscript in general to be difficult to read and easily interpret the analysis conducted, and conclusions based on specific results. I would suggest a careful eye on editing a review would be very helpful. But here are a few points that I noted specifically
We have addressed each point below.
i) The notation used for supplementary figures and tables presented as "Figure X – Supplement Y". It would help to refer to the supplementary figures as "Supplementary Figure X" instead.
The figures have been referenced in the format specified by eLife.
ii) Subsection “Dynamic eGenes are enriched for response genes and transcription factors” – it is not always entirely clear whether the authors are referring to transcription factor genes or transcription factors regulating the genes. You could include the word "gene" following "transcription factor" throughout the paragraph.
By “transcription factor” we mean any gene that is annotated as a transcription factor by Lambert et al., not transcription factors that regulate a particular set of genes. This is indicated in the Materials and methods but we have made this more explicit in the Results.
iii) Defining clearly what response genes and why they are important.
We should have explicitly defined response genes. We have now done so in the text.
iv) Subsection “Potential limitations of our model” states that the power analysis indicates a 6% power to detect 38% heritability. Firstly, I assume, but it is not clear, that the test is against H0: h2=0? I understand the maths behind this, but that seems a really odd way of explaining power. Why not give a power curve?
We clarify that the test is against H0: λ = 0, where λ is the standardized effect size.
This corresponds to the typical regression-based approach to map QTLs, assuming that genotypes and phenotypes have been standardized.
We plotted the power curve as a function of standardized effect size, fixing the sample size to the size of the current study, in Figure 7—figure supplement 3 (green curve).
Assuming a single causal variant, the standardized effect size be used to derive the heritability explained by that variant.
We reported power to detect an effect of a given true effect size, holding the sample size fixed to the size of the current study. We gave the true effect size in terms of heritability explained, rather than giving the standardized effect size, in order to make the results of the power analysis easier to interpret. The standardized effect size does not correspond to log fold change, which would be an alternative, interpretable choice of units.
In particular, the median eQTL heritability over all genes is h2 = 0.16, as previously reported by Gusev et al., 2016 and Wheeler et al., 2016. This reference point makes clear the relative power of this study compared to the GTEx and DGN studies for eQTL mapping.
We have added the Gusev and Wheeler studies for reference to the Discussion.
v) How are the genomic features tested for enrichment of DARs?
We have added the statistical test used to identify promoter- and enhancer-associated marks (chi-squared) to the Results. We have also added more details in the Materials and methods to describe how MEME identifies enriched motifs.
vi) Subsection “Genomic features associated with differentially accessible regions (DARs)” paragraph two and Discussion paragraph four are confusing and unclear. It is hard to understand what the has been done.
We have re-written these sections to make them clearer.
vii) Throughout the manuscript, it is difficult to identify which timepoint were used for each assay. Please clarify this when introducing a new assay. Altering Figure 1 to make this clear graphically would help as well.
We used all time points in all assays and differences between all timepoints were tested in all assays.
We have clarified this in the text, re-designed Figures 1 and 2 and added a supplementary figure with the experimental setup (Figure 1—figure supplement 1) to illustrate this point.
8) Some of the RIN numbers in RIN table are very low (i.e. 0 for 18852 in condition D). It's not clear if these samples were included in the analyses, but if they were some justification is needed. I worry about impact on the analysis, as essentially it would induce bias in those samples.
The median RIN score across 84 samples is 9.1 (9.4 for A, 8.7 for B, 9.1 for C and 9.2 for D). All RIN scores are >8, except 18852A which is 6 and 18852D which is listed as 0. The RIN score for 18852D could not be calculated by the bioanalyzer software, despite the two ribosomal subunits evident in the sample. We should have listed this score as “not detected” not “0”. Given that we have multiple replicates for 18852, and that only one replicate can be used in the eQTL and differential expression analysis, we chose not to include this replicate in downstream analysis.
We have added clarification text to where we mention RIN scores in the Materials and methods, changed the RIN score of 18852D to “not detected” in the table to more accurately reflect this score, and plotted the RIN values in Figure 2—figure supplement 1A to highlight this data.
9) The section "DNA methylation levels are largely stable following hypoxia and re-oxygenation" is confusing and difficult to read. It is first stated that no DMCpGs were identified and then later stated that they were identified.
We did not find any differentially methylated CpGs when considering all CpGs. We did find a handful of differentially methylated CpGs when restricting our analysis to a smaller set of CpGs (those in CpG islands etc).
We have made this distinction, and the rationale behind it, clearer in the text. We also now do not use the term DMCpGs when considering all CpGs. We only introduce the term when we subset CpGs and actually identify differences between conditions. This distinction is also now illustrated in re-designed Figure 6.
10) In subsection “Chromatin accessibility changes following hypoxia and re-oxygenation”, it is stated that ATAC-seq reads were selected that did not show allelic mapping biases. However, later testing for differential accessible regions. This filtering step would remove the ability to effectively identify differentially accessible regions with this step. Could you please either explain the reasoning for removing allelic mapping biases at this step, and/or explain why it would not impact differential accessibility analyses downstream.
We treated the ATAC-seq data as we did the RNA-seq data i.e. used the WASP algorithm to reprocess mapped reads to reduce reference bias as described in the Materials and methods. This step does not impact the ability to identify differentially accessible regions across conditions, it merely removes ambiguous reads. We have clarified in the Results that we only maintain ATAC-seq reads that map unambiguously to the genome.
11) It is concluded that the data indicate that the "underlying DNA sequence features can affect their epigenetic profile". However, I do not think that the results demonstrate this. They only demonstrate differential methylation in differentially accessible regions which are themselves not associated with genetic variants. I would suggest altering this sentence by removing "that the underlying DNA sequence features of these regions can affect their epigenetic profile"
We have altered this sentence as suggested.
Reviewer #3:
The authors established an in vitro system to study inter-individual variation of transcriptional and epigenetic responses to hypoxia using iPSC derived cardiomyocytes. Despite the modest sample size, they discovered more than a thousand eQTLs in total, of which 367 are dynamic eQTLs modulating the transcriptional response to hypoxia with implications for disease susceptibility.
Although the analyses demonstrated in the manuscript are solid and the results are reliable, authors failed to present their results effectively. I would recommend the authors to rewrite the paper to increase the accuracy and readability in general (as I suggested below).
In addition, I have one particular concern regarding the following important (but vague) claim presented in this paper, that is the transcriptional response to hypoxia is not mediated through epigenetic changes. Although the authors repeated the claim multiple times throughout the paper, it is hugely confounded by the power to detect epigenetic variations between conditions/individuals (i.e., DARs, caQTLs and DMCpGs). If the authors would keep the claim, they need to address the power issue appropriately.
We used 13-14 biological replicates (different individuals) in our study. Previous studies have generally used just 1-6 biological replicates to identify differences in chromatin profiles or DNA methylation levels under conditions of stress (Aref-Eshghi, Am J Physio, Cell Physiol, 2020; Marr, Plos Path., 2014; Pacis, GR, 2015). It is also our experience that when we measure both gene expression and DNA methylation changes between sample groups, the number of DNA methylation changes often far outnumber the gene expression changes (Burrows/Banovich, Plos Genetics, 2016; Natri/Bobowik, Plos Genetics, 2020). Lack of power is always a reasonable explanation, but compared to previous studies, our observations are quite surprising. Nevertheless, we have now softened our claims about the lack of differences we observed in the Results section, and removed some of the conclusions we drew.
Here are my specific comments:
1) Figure 1A should explain the overview of the experiment in more detail. There are three replicates for the three of 15 donors (mentioned in the main text) which are hidden in the panel. It is also nice to combine panel A and D in one panel to illustrate the whole experiment. According to the main text, one donor was dropped in ATAC-seq and two donors were dropped in methylation data which should also be shown in the panel.
We have re-designed Figure 1 to combine A and D and now indicate exactly which samples were collected.
2) It would be nice to summarise the 4 x 21 RNA-seq samples in the main figure. The authors used a linear mixed model for differential expression (DE) analysis. I'm wondering if the model can be used to perform a variance components analysis to decompose the total variance (for each gene) into “condition”, “individual”, “replication”, and residual variances (where the condition and replication are also treated as random effects). The estimated variance parameters across all genes are then summarised by a boxplot or similar where the residual variance is standardised as 1.0. The authors could repeat the analysis for ATAC-seq and methylation data to compare the difference of explained variances between molecular phenotypes. The analysis will provide clear interpretation on why detectable DARs, caQTLs and DMCpGs are so few. The similar analysis was performed e.g., in (Kilpinen et al., 2017).
We have now created an extra main text figure where we show the differential expression analysis (Figure 2). These results are largely in line with the findings from our previous work (Ward and Gilad, 2019), which is why we had not initially highlighted them in this paper.
Using a linear mixed model, we calculated the proportion of variance explained by individual, condition, replicate (we only have replicate data for the RNA-seq and DNA methylation data), and the residual variance. A boxplot of variance components for all three phenotypes is now included as Figure 6—figure supplement 4. We now note in the main text that we observe a lower proportion of variance explained by “condition” in the DNA methylation data (0.8%) and chromatin accessibility data (3%) compared to the gene expression data (6%, t-test, P < 2x10-16), and a corresponding increase in the residual variance. Comparing the gene expression data with the DNA methylation data in particular, reveals that while there is approximately an order of magnitude lower variance in the “condition” component in the DNA methylation data, there is a similar proportion of variance explained by “individual” (44% for gene expression, 40% for DNA methylation, and 28% for chromatin accessibility). These results suggest that noise alone cannot explain the relatively small number of chromatin accessibility and DNA methylation changes.
In terms of the low number of caQTLs, we specified in the Results section that we restricted tested SNPs to those within the 128,672 open chromatin regions we identified. We chose to do this so that any effects would be interpretable i.e. a SNP affects a transcription factor binding site, altering accessibility of the region. This means we were testing many fewer variants than the eQTL analyses. We may well be underpowered to detect these effects in our modest sample size. We have added text to the manuscript to make this point more explicit.
3) Figure 2H is not so informative. Instead, the authors could extend the heatmap in Figure 2F to show how much of eGenes are classified into baseline eGenes, dynamic eGenes and shared eGenes by showing all genes with eQTLs at least in one condition. The authors could also add additional columns next to the heatmap showing which eGenes are response genes and/or transcription factors to support the enrichment analysis in the section “Dynamic eGenes are enriched for response genes and…”.
We appreciate the suggestion on how to make this figure more informative. Unfortunately this didn’t work in practice partly because of the distinction between eSNP-based and eGene-based analyses and partly because of the available software. Instead, to address the comment we have now made a clear distinction between SNP-based results and eGene-based results, added an extra panel with a heatmap which includes all eQTLs identified in all conditions to provide a more global view of the data and to show how much “sharing” there is between conditions (Figure 3B), and re-designed a panel to include the response gene and TF data (Figure 3F).
4) The DE analysis result is frequently used in the manuscript, but there is no summary data shown in the main text (e.g., volcano plots, DE genes shared/not shared with different conditions, etc.). It would be ideal to have an independent figure for DE analysis. Currently, Figure 2A-C are the result of DE analysis which are inconsistent with the figure title (i.e., “Hundreds of dynamic eQTLs are revealed…”).
We have made a new Figure 2 which displays pairwise and joint tests in the RNA-seq data using panels from the original Figure 2 and supplement.
5) How many dynamic eGenes are overlapping with the baseline eGenes? The Chi-square test performed in the section “Dynamic eGenes are enriched for response genes and transcription factors” seems inappropriate because the authors double-count the same genes in baseline eGenes as the background distribution. They need to subtract dynamic eGenes from the baseline eGenes to perform a proper Chi-square test.
This was an error on our part. As the reviewer noted a subset of dynamic eGenes are present at baseline – these are the 37 suppressed eQTLs. We have now excluded these genes to compare only baseline eGenes and induced eGenes so that this is now a mutually exclusive set. This in fact increases the enrichment of both response genes and transcription factors, which we now report in the text.
6) The authors performed the enrichment analysis of transcription factors (TFs) in dynamic eGenes which could be confounded by response genes, because dynamic eGenes are likely to be response genes. In addition, even if dynamic eGenes are enriched with TFs, what is the biological interpretation? Why do regulatory variants affect TF dynamic eGenes more than non-TF dynamic eGenes?
We have now investigated the enrichment of TFs in induced eGenes that are response genes compared to induced eGenes that are non-response genes. TFs amongst induced eGenes are more likely to be response genes than non-response genes (P = 0.02); however there is no difference in the proportion of response genes between TFs amongst baseline eGenes and TFs amongst induced eGenes (38% vs. 39%). We have added these results to the main text.
The enrichment we observe in TFs amongst our induced eGenes suggests that latent genetic variation can have multiple downstream effects on gene expression including gene targets of TF genes. This could provide a mechanism for the appearance of induced eGenes that are neither response genes nor TF genes. We have added the implication of our findings to the main text.
7) It is often the case that the main figures are used to show anecdotal examples of biological statements raised in the main text and important data/results are hidden in the supplementary figures. The main figures must be used to support the author's claims in general (without cherry-picking an example gene). For example, the authors stated “most DARs have returned to baseline levels of accessibility by the first re-oxygenation condition.” with just numbers of DARs and the FOXO1 example which do not strongly support the claim. The authors should perform and highlight global analyses to support each claim from different angles. In fact, the claim was partly supported by the supplementary figure 3 (supplement 4A-C). Therefore, the authors first consider rearranging the figures to better support each claim made in the main text. Then, they can clarify which analysis is needed to make a strong statement.
We have re-designed Figure 4 and 6 to include the genome-wide analyses performed for the ATAC-seq and DNA methylation data that were previously shown in the supplementary figures, and we have edited the text accordingly.
8) The statement “These results suggests that gene expression changes…” was supported only by a limited number of DARs and not totally addressed yet. Recently, the “gene activity score” was proposed in the single-cell ATAC-seq analysis to overcome the sparseness of chromatin data (Granja et al., 2020). I'm wondering if it improves the power to detect DARs in the paper. This quantification also allows the authors to easily colocalise DARs with eGenes/response genes to prove (or disprove) the statement.
The referenced statement refers to the identification of caQTLs not DARs. However, given the other reviewer comments we have removed this particular statement and softened the language in this section. The referenced gene activity score is for aggregating regulatory regions across single cells. We aggregate open chromatin regions across individuals within each condition (by merging peaks of read enrichment) and then look for differences in read count between conditions and individuals. It is not clear to us how this tool would work with our bulk data, or how it would improve power to detect differences between conditions.
9) It is very surprising that CHT found only tens of caQTLs, while it mapped more than a thousand eQTLs. The authors should comment on that. If the authors use only variants overlapping with the accessibility region, does the power to detect caQTL increase? It seems 25Kb cis-region is too broad.
As we stated in the Results and in the Materials and methods, we tested for caQTLs as we did for eQTLs except we did not use a 25 kb window, we restricted SNPs to those within the 128,672 open chromatin regions.
10) The statement “Changes in DNA methylation are therefore…” is hugely confounded by the power to detect DMCpGs with the higher noise rate in DNA methylation arrays. It is not conclusive with the modest sample size.
We would argue that most studies aimed at identifying differences between two conditions do not have 13 biological replicates. Nevertheless, we have removed this statement.
11) The authors should perform a colocalisation analysis (Giambartolomei et al., 2014) if GWAS summary statistics are available. If not, the authors are still able to calculate linkage disequilibrium (LD) between the GWAS lead variant and eQTL lead variant from CHT to make sure the GWAS locus and the eQTL are reasonably colocalised. The authors should show locuszoom plots in conjunction with boxplots in Figure 6 to demonstrate both the magnitude of statistical significance of eQTLs and LD (r2 index) with GWAS lead variants.
We only have a small number of overlapping results here, so we chose an alternative approach: We were able to obtain full summary statistics from the CARDIoGRAMplusC4D Consortium for coronary artery disease (CAD) and myocardial infarction (MI). From both significant and dynamic eQTLs we tested the lead eSNP against the full GWAS results and identified two new associations. The first is an association between a dynamic eQTL (regulating EIF2B2) and CAD. The p-value of the dynamic eSNP is 4.7x10-5 in the GWAS and the most significant SNP in the GWAS peak has a p-value of 9.9x10-8. As we tested a relatively narrow window around genes in our eQTL analysis, our summary statistics do not overlap the full peak. We also identified an association between a significant eQTL in condition B that is significantly associated with MI (CARM1). We have added both of these loci to the manuscript, and now highlight EIF2B2 with locus zoom plots in Figure 7 and CARM1 in Figure 7—figure supplement 2.
The two original loci that we highlighted were RNF166, associated with varicose veins, and ZC3HC1, associated with MI. Varicose vein GWAS summary statistics were only available for the significant associations. We include a locus zoom plot of the RNF166 region with only the significant GWAS hits (Author response image 1A). The GWAS region around ZC3HC1 does not show a clear peak despite being a known MI locus (Author response image 1B). We felt that neither of these locus zoom plots are striking as examples to illustrate in the main paper. However, we believe that these are nevertheless potentially two interesting loci worth mentioning. We have moved the original RNF166 and ZC3HC1 loci eQTL boxplots to Figure 7—figure supplement 1.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Reviewer #2:
The authors have addressed most of the comments raised, although there are a couple of points that I feel would be beneficial to investigate further.
1) GTEX overlap – in the original review, a suggestion of testing for concordance of allelic effect direction be shown for the replication of the eQTL identified in this work and GTEX tissues. It is unclear what “analysis at the level of the gene” exactly means, but given that many genes are controlled by independent eQTL (loci) in difference tissues/cell types, overlapping an observation of eGenes doesn't allow conclusions of shared eQTL to be determined. Testing for the concordance of allelic effects for eSNPs identified from your analysis with GTEX eSNP results would provide much stronger evidence to support this conclusion.
We have now performed the GTEx overlap analysis of dynamic and shared eQTLs at the level of eSNP-eGene pairs rather than eGenes. 326 of 367 dynamic eSNPs are tested in GTEx. 32 dynamic eSNP-eGene pairs are eQTLs in left ventricle heart tissue (9.8%), and 140 are eQTLs in at least one of the other 49 tissues assayed by GTEx (42.9%). For the shared eQTL analysis we selected the 61 eGenes present in all conditions, identified the most significant eSNP in condition A and then determined whether this eSNP was significant in conditions B,C and D. 19 of 20 such shared eSNPs are tested in GTEx. 6 shared eSNP-eGene pairs are eQTLs in heart tissue (31.6%), and 6 are an eQTL in at least one other tissue (31.6%). In line with our previous eGene analysis, shared eSNPs are more likely to be eSNPs in heart tissue than dynamic eSNPs (Chi-squared test; p=0.01). To further investigate eQTL concordance, we compared the eQTL effect size of our dynamic and shared eQTLs to the eQTL effect size in heart tissue determined by the GTEx consortium. Shared eQTLs overlapping heart tissue eQTLs (n=6) have a higher concordance of effect than dynamic eQTLs overlapping heart tissue eQTLs (n=32; r2=0.73 vs. r2=0.12). We are only able to access the significant eQTLs identified by the GTEx consortium and were therefore unable to determine broader concordance across data sets.
We have replaced the GTEx eGene analysis with this new variant-based analysis in the Results section and Materials and methods section.
2) GWAS overlap – "We next chose to analyze the data using a less stringent gene-based approach using GWAS traits of interest. The second analysis allows for the identification of potentially interesting regions where the causal SNP may be different in the eQTL and GWAS data" I am not sure what this means biologically and what conclusions can be drawn from this. What does it mean if there is no overlap in the SNPs between a eQTL and GWAS loci but that they are both in the same gene region? Is there an enrichment over what would be expected by chance? the statements "We did not mean to imply that the overlap is a shared causal relationship. We have changed the language to reflect this. Our goal is to illustrate that the dynamic eQTLs we identified may play a role in higher-level phenotypes." These sentences seems in opposition to one another.
Our study population are the Yoruba from West Africa. Unfortunately, the vast majority of GWAS studies, including the CAD and MI studies reported here, are carried out in predominantly European populations which have an inherently different underlying LD structure. Thus, while our GWAS analysis includes a SNP based approach, we also felt that casting a wider net connecting eQTLs with genes which have orthogonal evidence of being involved in heart disease was prudent. Specifically, given the underlying differences in LD structure between European and African populations we anticipate that some variants identified by GWAS studies will have weak co-localization with eQTLs identified in our study, but the fact that these SNPs are acting to alter the expression of genes implicated in heart disease suggests they may be important for disease risk.
We have included the caveats of this analysis and explained our approach in the manuscript.
Reviewer #3:
The authors have now addressed most of my questions and concerns. Yet, the result of GWAS colocalisation is still in question. I have previously suggested to perform a proper colocalisation analysis using COLOC package or to report the linkage disequilibrium index (r2 value) between GWAS index variant and eQTL lead variant. Unfortunately the authors just provided only P-values for associations which does not support the GWAS locus and eQTL are colocalised. The authors have to expand the cis-window (up to 1Mb) for a handful number of genes overlapping with GWAS loci to perform a proper colocalisation analsysis. It is crucial to provide the posterior probability of colocalisation for such a small number of GWAS overlapping genes.
We appreciate the comments from the reviewer and indeed, failed to include the requested LD analysis in the first revision of this paper. Unfortunately, performing a full colocalization analysis using an extended set of variants that overlap the lead GWAS variant is non-trivial as we cannot easily add new variants to the combined haplotype test without performing a near complete reanalysis of the data. However, we can and have carried out the suggested LD analysis. Using LDlink we tested for the r2 value between the lead GWAS variants near EIF2B2 and CARM1 and our eSNPs (in the YRI population). We found the lead GWAS variant associated with CAD risk located near the EIF2B2 gene (rs3832966) was in high LD with our dynamic eSNP (rs12588981) associated with expression of the EIF2B2 gene (r2 = 0.96, p < 0.001). Similarly, the lead GWAS variant associated with MI near the CARM1 gene (rs4804142) is in lower, but still significant, LD with our condition B eSNP (rs8105092), which is associated with CARM1 expression (r2 0.32 p < 0.001).
We have added text to describe this analysis in the manuscript in the Results section, and the Materials and methods section. We have also added two supplementary figures of the LD matrix plots (Figure 7—figure supplement 2 and Figure 7—figure supplement 3D).
References:
Giambartolomei, Claudia, Damjan Vukcevic, Eric E. Schadt, Lude Franke, Aroon D. Hingorani, Chris Wallace, and Vincent Plagnol. 2014. "Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics." PLoS Genetics 10 (5): e1004383.
Granja, Jeffrey M., M. Ryan Corces, Sarah E. Pierce, S. Tansu Bagdatli, Hani Choudhry, Howard Y. Chang, and William J. Greenleaf. 2020. "ArchR: An Integrative and Scalable Software Package for Single-Cell Chromatin Accessibility Analysis." bioRxiv. https://doi.org/10.1101/2020.04.28.066498.
Kilpinen, Helena, Angela Goncalves, Andreas Leha, Vackar Afzal, Kaur Alasoo, Sofie Ashford, Sendu Bala, et al. 2017. "Common Genetic Variation Drives Molecular Heterogeneity in Human iPSCs." Nature 546 (7658): 370-75.
https://doi.org/10.7554/eLife.57345.sa2Article and author information
Author details
Funding
National Heart, Lung, and Blood Institute (HL092206)
- Yoav Gilad
EMBO (Long-Term Fellowship ALTF 751-2014)
- Michelle C Ward
National Institute on Aging (F31 AG044948)
- Nicholas E Banovich
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Kristen Patterson and Amy Mitrano for experimental assistance, David Knowles and Benjamin Strober for preliminary data exploration, all members of the Gilad lab and Luis Barreiro for helpful discussions, and Natalia Gonzales for comments on the manuscript. We thank the Genomics Core Facility at the University of Chicago for sequencing the libraries and processing the DNA methylation arrays. We thank The Genotype-Tissue Expression (GTEx) Project, supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS, for providing data. The data used for the analyses described in this manuscript were obtained from the GTEx portal v8 on December 30th 2020. We thank the ENCODE Consortium and the Bernstein Lab at Broad for generating and making the histone mark ChIP-seq data available.
Senior Editor
- Patricia J Wittkopp, University of Michigan, United States
Reviewing Editor
- Oliver Stegle, European Molecular Biology Laboratory, European Bioinformatics Institute, United Kingdom
Publication history
- Received: March 28, 2020
- Accepted: February 6, 2021
- Accepted Manuscript published: February 8, 2021 (version 1)
- Version of Record published: February 25, 2021 (version 2)
Copyright
© 2021, Ward et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 2,875
- Page views
-
- 284
- Downloads
-
- 15
- Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Chromosomes and Gene Expression
- Neuroscience
Sensory feedback is required for the stable execution of learned motor skills, and its loss can severely disrupt motor performance. The neural mechanisms that mediate sensorimotor stability have been extensively studied at systems and physiological levels, yet relatively little is known about how disruptions to sensory input alter the molecular properties of associated motor systems. Songbird courtship song, a model for skilled behavior, is a learned and highly structured vocalization that is destabilized following deafening. Here, we sought to determine how the loss of auditory feedback modifies gene expression and its coordination across the birdsong sensorimotor circuit. To facilitate this system-wide analysis of transcriptional responses, we developed a gene expression profiling approach that enables the construction of hundreds of spatially-defined RNA-sequencing libraries. Using this method, we found that deafening preferentially alters gene expression across birdsong neural circuitry relative to surrounding areas, particularly in premotor and striatal regions. Genes with altered expression are associated with synaptic transmission, neuronal spines, and neuromodulation and show a bias toward expression in glutamatergic neurons and Pvalb/Sst-class GABAergic interneurons. We also found that connected song regions exhibit correlations in gene expression that were reduced in deafened birds relative to hearing birds, suggesting that song destabilization alters the inter-region coordination of transcriptional states. Finally, lesioning LMAN, a forebrain afferent of RA required for deafening-induced song plasticity, had the largest effect on groups of genes that were also most affected by deafening. Combined, this integrated transcriptomics analysis demonstrates that the loss of peripheral sensory input drives a distributed gene expression response throughout associated sensorimotor neural circuitry and identifies specific candidate molecular and cellular mechanisms that support the stability and plasticity of learned motor skills.
-
- Chromosomes and Gene Expression
In songbirds, deafening leads to changes in gene expression which have now been mapped at the single-cell level across the neural circuit involved in song production.