Anthracycline-induced cardiotoxicity (ACT) is a key limiting factor in setting optimal chemotherapy regimes, with almost half of patients expected to develop congestive heart failure given high doses. However, the genetic basis of sensitivity to anthracyclines remains unclear. We created a panel of iPSC-derived cardiomyocytes from 45 individuals and performed RNA-seq after 24 hr exposure to varying doxorubicin dosages. The transcriptomic response is substantial: the majority of genes are differentially expressed and over 6000 genes show evidence of differential splicing, the later driven by reduced splicing fidelity in the presence of doxorubicin. We show that inter-individual variation in transcriptional response is predictive of in vitro cell damage, which in turn is associated with in vivo ACT risk. We detect 447 response-expression quantitative trait loci (QTLs) and 42 response-splicing QTLs, which are enriched in lower ACT GWAS -values, supporting the in vivo relevance of our map of genetic regulation of cellular response to anthracyclines.https://doi.org/10.7554/eLife.33480.001
Many cancers, including leukaemia, lymphoma and breast cancer, are treated with potent chemotherapy drugs such as anthracyclines. However, anthracyclines have strong side effects known as anthracycline cardiotoxicity, which affect the health of the heart. Almost half of the patients given high doses of anthracyclines develop chronic heart failure.
While anthracycline cardiotoxicity is very common, people’s genes may contribute to how sensitive they are to these drugs but it is not understood which genes can cause this effect. Previous studies using only a small number of participants have not been able to pin down the genetic factors that make some patients respond well to anthracyclines, and others prone to developing heart failure when taking these drugs.
To find out which genes affect anthracycline cardiotoxicity, Knowles, Burrows et al. transformed blood cells from 45 individuals into stem cells, which were then developed into heart muscle cells. Then, the activity of genes was analyzed by measuring the amount of RNA (the template molecules used to make proteins) produced by those genes.
After the cells had been exposed for 24 hours to the anthracycline drug doxorubicin, hundreds of gene activity differences could be found in the heart muscle cells between individuals. Some of these differences were linked to poorer health of the cells after treatment with the drug. As a result, a number of genetic variants that could predispose patients to the side effects of doxorubicin were discovered. The experiments also revealed how doxorubicin disrupts an important process that separates ‘junk’ parts of the RNA from the parts that are used as a template for proteins.
Being able to predict who is likely to be sensitive to drugs such as doxorubicin could help doctors to tailor chemotherapy treatments more effectively, minimising the risk of heart failure. In future, larger studies could lead to accurate predictions of a patient’s response to a particular chemotherapy drug to personalize their cancer treatment.https://doi.org/10.7554/eLife.33480.002
Anthracyclines, including the prototypical doxorubicin, continue to be used as chemotherapeutic agents treating a wide range of cancers, particularly leukemia, lymphoma, multiple myeloma, breast cancer, and sarcoma. A well-known side-effect of doxorubicin treatment is anthracycline-induced cardiotoxicity (ACT). For some patients ACT manifests as an asymptomatic reduction in cardiac function, as measured by left ventricular ejection fraction (LVEF), but in more extreme cases ACT can lead to congestive heart failure (CHF). The risk of CHF is dosage-dependent: an early study (Von Hoff et al., 1979) estimated 3% of patients at 400 mg/m2, 7% of patients at 550 mg/m2, and 18% of patients at 700 mg/m2 develop CHF, where a more recent study puts these numbers at 5%, 26% and 48% respectively (Swain et al., 2003). Reduced LVEF shows a similar dosage-dependent pattern, but is not fully predictive of CHF.
Perhaps most daunting for patients is that CHF can occur years after treatment: out of 1807 cancer survivors followed for 7 years in a recent survey a third died of heart diseases compared to 51% of cancer recurrence (Vejpongsa and Yeh, 2014).
Various candidate gene studies have attempted to find genetic determinants of ACT, but are plagued by small sample sizes and unclear endpoint definitions, resulting in limited replication between studies. Two ACT genome-wide association studies (GWAS) have been published (Aminkeng et al., 2015; Schneider et al., 2017). While neither found genome-wide significant associations using their discovery cohorts, both found one variant that they were able to replicate in independent cohorts.
A nonsynonymous coding variant, rs2229774, in RARG (retinoic acid receptor ) was found to be associated with pediatric ACT using a Canadian European discovery cohort of 280 patients (Aminkeng et al., 2015), and replicated in both a European () and non-European cohort (). Modest signal () supporting rs2229774’s association with ACT was also reported in a recent study primarily focused on trastuzumab-related cardiotoxicity (Serie et al., 2017). RARG negative cell lines have reduced retinoic acid response element (RAREs) activity and reduced suppression of Top2b (Aminkeng et al., 2015), which has been proposed as a mediator of ACT.
In a different study, a GWAS in 845 patients with European-ancestry from a large adjuvant breast cancer clinical trial, 51 of whom developed CHF, found no variants at genome-wide significance levels (Schneider et al., 2017). However, one of the most promising variants, rs28714259 ( in discovery cohort), was genotyped in two further cohorts and showed modest replication (). rs28714259 falls in a glucocorticoid receptor protein binding peak, which may play a role in cardiac development.
An exciting approach to studying complex phenotypes, including disease, in human is to use induced pluripotent stem cells (iPSC) and derived differentiated cells as in vitro model systems. Work by us and others has demonstrated that iPSCs and iPSC-derived cell-types are powerful model systems for understanding cell-type specific genetic regulation of transcription (Thomas et al., 2015; Burrows et al., 2016; Banovich et al., 2018; Kilpinen et al., 2017; Alasoo et al., 2017), but it is less established whether these systems can be used to model the interplay of genetic and environmental factors in disease progression. Encouragingly, the response of iPSC-derived cardiomyocytes (ICs) to doxorubicin was recently extensively characterized (Burridge et al., 2016). ICs derived from four individuals who developed ACT after doxorubicin treatment (‘DOXTOX’ group) and four who did not (‘DOX’ group), showed clear differences in viability (via apoptosis), metabolism, DNA damage, oxidative stress and mitochondrial function when exposed to doxorubicin. These observations suggest that ICs recapitulate in vivo inter-individual differences in doxorubicin sensitivity. Gene expression response differences between the DOX and DOXTOX groups were found using RNA-sequencing data, but the sample size was insufficient (RNA-seq was generated for only three individuals in each group) to attempt mapping of genetic variants that might explain the observed functional differences between individuals.
Here we used a panel of iPSC-derived cardiomyocytes from 45 individuals, exposed to five different drug concentrations, to map the genetic basis of inter-individual differences in doxorubicin-sensitivity. We find hundreds of genetics variants that modulate the transcriptomic response, including 42 that act on alternative splicing. We show that the IC transcriptomic response predicts cardiac troponin levels in culture (indicative of cell lysis) in these cell-lines, and that troponin level is itself predictive of ACT. Finally we demonstrate that the mapped genetic variants show significant enrichment in lower ACT GWAS -values.
We generated iPSC-derived cardiomyocytes (ICs) for 45 Hutterite individuals (Figure 1a). iPSC quality was confirmed using qPCR (Figure 1—figure supplement 1), global gene expression profiling (Figure 1—figure supplement 2), the embryoid body test (Supplementary Data), and EBV integration analysis (Figure 1—figure supplement 3, Figure 1—figure supplement 4). Cardiomyocyte identity was confirmed by FACS for cardiac troponin I and T, with mean purity (72 ±12)% (Figure 1—figure supplement 5). We exposed all 45 IC lines to doxorubicin at five different concentrations for 24 hr, after which samples were processed for RNA-sequencing. We obtained sufficient read depth (10M exonic reads) for downstream analysis for 217 of the individual-concentration pairs, and confirmed sample identity by calling exonic SNPs (see Methods). We observed a strong gene regulatory response to doxorubicin across all concentrations, with 98% (12038/12317) of quantifiable genes (5% FDR) showing differential expression across the different treatment concentrations. Our data shows excellent concordance with the smaller RNA-seq dataset of (Burridge et al., 2016) (Figure 1—figure supplement 6). Principal component analysis (PCA, Figure 1b) confirms that the main variation in the data is driven by doxorubicin concentration and that the effect of concentration on gene expression is nonlinear. For some individuals the expression data following doxorubicin treatment with is closer to the data from treatment with , whereas for others it is closer to data from treatment with . This general pattern provides the first indication in our data that that there is systematic variation in how different individuals respond to doxorubicin exposure. Since the majority of genes appear responsive to doxorubicin we clustered genes into six distinct response patterns using a mixture model approach (Figure 1c, see Materials and methods). From largest to smallest, these clusters represent genes that, through the gradient from low to high concentration treatments, are (1) down regulated (2) initially up-regulated, then further down-regulated (3) up-regulated (4) down-regulated only at lower dosages (5) up-regulated only at lower dosages (6) down-regulated then partially recover (Supplementary file 1). Gene set enrichments (Figure 1—figure supplement 7, Supplementary file 2) for the up-regulated cluster include metabolic, mitochrondrial and extracellular processes, as well as known doxorubicin response genes in breast cancer cell lines from (Graessmann et al., 2007) (647 overlapping genes of 1090 in term, hypergeometric ). The down-regulated cluster shares genes with those down-regulated in response to UV light, which, like doxorubicin, causes DNA-damage (413 overlapping genes of 470 in term, hypergeometric ). Targets of p53, a transcription factor that responds to DNA damage, are overrepresented in clusters 2 and 5; these clusters involve up-regulation at low concentrations () but down-regulation at higher concentrations (486 overlapping genes out of 1057 in term, hypergeometric ). Promoter analysis (Figure 1—figure supplement 8, Supplementary file 3) revealed 21, 45, and 6 significantly enriched transcription factor (TF) binding motifs for clusters 1, 2 and 3 respectively (and none for cluster 4–6). Examples include binding sites for ZNF143, a TF that promotes GPX1 activity and protects cells from oxidative damage during mitochondrial respiratory dysfunction (Lu et al., 2012), which is enriched in cluster 1 (down regulation w/dox, 318 overlapping genes out of 3555 ZNF143 targets, hypergeometric ); RONIN, a regulator of mitochrondrial development and function (Poché et al., 2016), which is enriched in clusters 1 and 2 (217 and 210 overlapping genes out of 2295 targets, and respectively); and MEF2, myocyte enhancer factor 2, involved in regulating muscle development, stress-response and p38-mediated apoptosis (Zarubin and Han, 2005), enriched in cluster 4 (32 overlapping genes out of 741 targets, hypergeometric ).
We next sought to map single nucleotide polymorphisms (SNPs) that modulate the observed inter-individual transcriptomic response to doxorubicin, leveraging available genetic variation across the 45 individuals (Livne et al., 2015). We developed a linear mixed model approach, called suez, that extends the PANAMA framework (Fusi et al., 2012) to account for relatedness amongst individuals, repeat measurements, multiple conditions and latent confounding. Testing SNPs within 1 Mb of the transcription start site (TSS), 518 genes have a variant with a detectable marginal effect on expression (5% FDR, Supplementary file 4). Using a mutual information approach (see Methods) which, unlike a naive replication analysis (Figure 2—figure supplement 1), controls for differential power across GTEx tissues, we find our expression quantitative trait loci (eQTLs) show stronger overlap with the two heart tissues than any other GTEx tissue (Figure 2a, Figure 2—figure supplement 2). Remarkably, even with our moderate number of individuals, we are able to detect many response-eQTLs (reQTLs), i.e. variants that modulate (directly or indirectly) transcriptomic response to doxorubicin. We found reQTLs for 376 genes at a nominal 5% FDR (Supplementary file 5), which we estimate using a parametric bootstrap corresponds to a true FDR of (Figure 2b). We explored leveraging allele specific expression (ASE) extending our previous work (Knowles et al., 2017; van de Geijn et al., 2015). We fit a beta-binomial generalized linear model (GLM) where the response variable corresponds to alternative vs reference read counts and the independent variable is the heterozygosity of the test regulatory eSNP. We found it impractical to directly relate effect sizes in the total expression and ASE models so we instead combined likelihood ratios from the beta-binomial GLM and suez likelihood into a single test statistic. This approach yielded 447 reQTLs at 5% FDR (Supplementary file 6), an increase of 19% over using total expression alone. We hypothesize that this relatively modest increase in power is due to a) suez already being reasonably well powered in this direct perturbation setting and b) the somewhat low sequencing depth of our samples.
To characterize the detected reQTLs we assigned the response of the major and minor allele to one of the six clusters previously learned (Figure 1c), with heterozygotes expected to display the average of the two homozygous responses. 172 (46%) of reQTLs result in a qualitatively distinct response as determined by the two alleles being assigned to different clusters. The most common transition, occurring for 33 reQTLs, is that the major allele is associated with simple down-regulation (cluster 1) in response to doxorubicin, whereas the minor allele shows up-regulation at low concentration followed by down-regulation at higher concentration (cluster 2).
We further broke-down the significant reQTLs by considering the effect of genotype on expression at each concentration ( in Equation 6). We normalized the effect sizes relative to the with the largest absolute value, i.e. we consider , so that the largest genotype effect always corresponds to a normalized value of 1. The resulting normalized effect profiles were split into nine clusters using -means clustering (Figure 2d). The largest cluster (cluster 1, 85 reQTLs) represents reQTLs with a modest effect size at low concentrations () which is amplified at higher concentrations (Figure 2e shows a highly significant example). Cluster two corresponds to reQTLs whose effect size is attenuated at the treatment: examples of reQTLs in this cluster tend to be associated with higher expression level at the treatment (e.g. rs16853200’s association with ABCA12 response, Figure 2—figure supplement 3).
A non-synonymous coding variant in RARG, rs2229774, was previously associated with ACT (Aminkeng et al., 2015). Since RARG codes for a transcription factor we searched transcriptome-wide for rs2229774 trans-eQTLs: genes where the expression response to doxorubicin appears to be different for different RARG alleles. Only two of the individuals in our panel carry the alternative A allele (as heterozygotes) with the rest being homozygous reference (GG). While this limits statistical power, suez detects one marginal effect (RECQL) and five response trans-eQTLs (NMRK1, VMA21, PAQR3, SGIP1 and LRRC2) at 5% FDR (Figure 2—figure supplement 4). Interestingly PAQR3, a membrane protein localized to the Golgi apparatus, is a negative regulator of antioxidant response through the Nrf2-Keap1 pathway (Zhang et al., 2016). LRRC2 is a mitochondrial protein whose RNA expression level has been previous linked with heart failure (McDermott-Roe et al., 2017).
Oxidative stress, a major downstream consequence of doxorubicin exposure, disrupts splicing of individual genes including HPRT, POLB (Disher and Skandalis, 2007), and SMA (Seo et al., 2016). We queried the extent to which doxorubicin exposure disrupts splicing patterns across the transcriptome using LeafCutter (Li et al., 2017). Across all samples LeafCutter detected 27769 alternative splicing ‘clusters’ (referred to here as ‘ASCs’ to avoid confusion with -means clusters), which correspond approximately to splicing events, with a median of 3.0 splice junctions per ASC. Of 17755 ASCs with sufficient coverage to test, 10430 (59%), corresponding to 6398 unique genes, showed an effect of doxorubicin exposure on splicing outcomes (5% FDR, Supplementary files 7–8). To characterize these changes we calculated the entropy of the splicing choices made for each significant ASC at each concentration and used -means clusters patterns of change in entropy (Figure 3a ). The largest cluster has 6166 ASCs (59%), and corresponds to the null of no clear change in entropy across concentrations. Clusters 2 () and 5 () correspond to increasing entropy with concentration, and clusters 3, 4, 6, 8 and 9 correspond to the maximum entropy being at different concentrations and reaching different maximum levels. Interestingly, only the relatively small cluster 8 ( of ACSs) corresponds to a reduction in entropy at higher concentrations, suggesting the dominant behavior is reduced splicing fidelity and increased alternative splicing in response to doxorubicin.
We further tested the hypothesis that splicing fidelity decreases in the presence of doxorubicin by comparing patterns of intronic percent excised () with canonical vs cryptic (unannotated) splice site usage. We clustered the 7792 introns in significantly differentially spliced ASC, that have a change in percent excised for some pair of concentrations, into eight response patterns based on their relative excision proportions across concentrations. For each cluster we calculated the proportion of member introns with neither end annotated, one end unannotated, or both ends annotated (Figure 3b). The clusters representing increased with concentration (clusters 2, 4, 6 and 7) all show enrichment for cryptic splice site usage. The two most populous clusters (1 and 2) correspond to decreasing and increasingly continuously with doxorubicin concentration, respectively, and the difference in levels of cryptic splicing is extremely apparent (hypergeometric , odds ratio for one annotated end vs two is ).
We additionally used LeafCutter quantification of percentage spliced in (PSI) for each splice junction to map splicing QTLs (sQTL) and response-splicing QTLs (rsQTL) using suez. We tested SNPs within 100 kb of either end of the splice junction. At 5% FDR we found 467 ASCs with a marginal effect sQTL (Supplementary file 9) and 42 with a rsQTL (Supplementary file 10). An example rsQTL is rs72922482’s association with inclusion of exon 2 of APAF1 Interacting Protein (APIP). Under the major T allele exon skipping is extremely rare: the LeafCutter PSI for the spanning junction ranges from to across concentrations (Figure 3c). In rs72922482 heterozygotes, however, the exon is skipped in a significant proportion of transcripts, and this effect is most pronounced in the data collected after treatment at , with approximately 50% exon inclusion, suggesting the minor C allele results in very low inclusion of the cassette exon. Another interesting example is NDUFAF6, another mitochrondrial Complex I protein, where doxorubicin exposure (particularly at ) results in increased use of an alternative downstream transcription start site (TSS) which unmasks the influence of rs896853 on a cassette exon between the two alternative TSS (Figure 3—figure supplement 1).
We used the level of cardiac troponin released into the culture media by lysed cardiomyocytes (see Methods, Supplementary file 11) to estimate damage occurring as a result of doxorubicin exposure at different concentrations. We observed significant variation in measurable damage caused by doxorubicin across individuals, with 13 of 45 cell lines having a significant correlation between doxorubicin dose and troponin measurement (Figure 4a). We first sought to determine whether the inter-individual variation in troponin in culture could be explained by variation in the overall gene expression response. Since we are interested in this case in inter-individual differences rather than differences between concentrations we normalized the troponin measurements to have 0 mean and variance of 1 across samples at each doxorubicin treatment. We found 96.1% (95% credible interval ) or 91.5% of the variance in this normalized troponin level could be explained using gene expression levels (we excluded the troponin genes TNNT1-3 and TNNI1-3 from the analysis) at the corresponding doxorubicin concentrations, using a GREML-analysis (Yang et al., 2010) or leave-out-one cross validated (LOOCV) lasso (Tibshirani, 1996) respectively. The optimal lasso model included 118 genes (Supplementary file 12). To test whether gene expression mediates a link from genotype to troponin level we performed a transcriptome-wide association study (TWAS, Gamazon et al., 2015). For each gene we built an elastic-net predictor of expression at each doxorubicin concentration using SNPs within 100 kb, with 10-fold cross-validation to choose the regularization parameters. The fitted predictions (the ‘pre-validation’ values) represent the genetically-determined component of expression. We used the 3840 genes with a statistically significant genetic component (at 1% FDR) to predict troponin level using LOOCV lasso regression. 89% of the variance in normalized troponin level can be explained by the genetic component of 102 genes (Supplementary file 13). This analysis is analogous to two-stage least squares Mendelian randomization (Angrist and Imbens, 1995) analysis and therefore suggests the existence of a causal link from genotype through gene expression to troponin level, and highlights potential mediating genes. However, further assumptions — in particular that the SNPs and troponin level are independent conditional on gene expression — would be required to formally establish a causal connection.
To further explore the relationship between transcriptomic response and troponin presence in culture, we analyzed differential expression (DE) with respect to troponin measurement at each doxorubicin concentration separately. We found 0, 7, 78, 2984 and 2863 differentially expressed genes (5% FDR, Supplementary file 14) at the five concentrations respectively (Figure 4b). The most strongly DE gene (with respect to effect size) at the treatment is DUSP13, a known regulator of ASK1-mediated apoptosis (Park et al., 2010). The large number of DE genes at the and treatments are broadly shared (nominal replication rate 82 to 85%), and DE genes at the treatment generally represent the most strongly DE genes at the higher concentrations (Figure 4c).
To compare troponin measurements to transcriptomic response we determined an overall per-individual level of transcriptomic response with respect to doxorubicin concentration. To this end we fit a principal curve (Hastie and Stuetzle, 1989) through all gene expression samples, initializing the curve to pass sequentially through the successive doxorubicin concentrations (Figure 4d). Projecting every sample on the principal curve gives a single measure of ‘progression’ through response to doxorubicin at increasing concentrations. We then regressed these values against concentration for each individual to obtain a progression rate. We found the troponin measurement slope is significantly negatively correlated (Spearman , Figure 4e) with the transcriptomic response rate, suggesting that much of the gene expression program being activated in response to doxorubicin is in fact protective against cardiac damage.
Using previously published data (Burridge et al., 2016), we built a predictive model of ACT risk trained on RNA-seq of ICs exposed to doxorubicin from doxorubicin-treated patients who did (‘DOXTOX’, ) or did not (‘DOX’, ) develop ACT. Using lasso with fixed the optimal model included 17 genes as features (Supplementary file 15). We applied this model to our expression data from the treatment (since this concentration shows excellent concordance with the data of Burridge et al., see Figure 1—figure supplement 6) to obtain predicted log-odds of ACT. While these log-odds are unlikely to be well-calibrated due to differences in the training and test datasets, they may still accurately represent relative risk of ACT across our 45 individuals. Indeed, the log-odds correlated significantly with the troponin measurement slope (Spearman correlation , Figure 4f), suggesting our troponin measurements, and by extension our expression response data, recapitulate in vivo cellular response to doxorubicin.
To determine the disease-relevance of our molecular QTLs we obtained summary statistics for the largest ACT GWAS to date (Schneider et al., 2017). While this GWAS was not sufficiently powered to find genome-wide significant associations, 11 variants representing nine independent loci have , with the most significant (rs2184559) at . Of the 8 GWAS variants with either tested in our eQTL mapping, or in high LD () with a tested SNP, seven have a nominally significant marginal eQTL (, the 8th has ) and four have a reQTL with . The one replicated variant in this GWAS, rs28714259, was not genotyped in our data but is in high LD () with rs11855704 which is a nominally significant marginal eQTL for tubulin gamma complex associated protein 5 (TUBGCP5, Figure 3—figure supplement 2). rs4058287 (GWAS -value ) has a marginal effect on Alpha-Protein Kinase 2 (ALPK2, also known as ‘Heart Alpha-Protein Kinase’ since it was discovered in mouse heart (Ryazanov et al., 1999) and is expressed in few other tissues (Melé et al., 2015)) expression () as well as a weak interaction effect (, see Figure 5a). Interestingly, ALPK2 has been shown to upregulate DNA repair genes and to enable caspase-3 cleavage and apoptosis in a colorectal cancer model (Yoshida et al., 2012). The replicating variant from Aminkeng et al., 2015, rs2229774 only occurs in two individuals in our cohort (who are heterozygous) making eQTL mapping infeasible. Additionally we find a marginal effect eQTL (, Figure 5—figure supplement 1) on SLC28A3 for rs885004, which has previously been associated with ACT in a candidate gene study (Visscher et al., 2013). rs885004 is intronic, is in LD () with another ACT implicated variant, rs7853758 (Visscher et al., 2012), and falls in a DNase I hypersensitivity and H3K27ac peak present in numerous ENCODE cell lines (and is open in our ICs according to ATAC-seq data, see Figure 5—figure supplement 2).
To determine whether our molecular QTLs are more useful than published QTLs for interpreting ACT risk variants we first sought to obtain the best powered GWAS data possible. Since the Schneider et al. GWAS was overall underpowered, we obtained additional ACT GWAS summary statistics from a more recent study (Serie et al., 2017) and performed a meta-analysis with Schneider et al. We used this data to assess whether there was detectable enrichment of low GWAS p-values for our regulatory QTLs. When considering eQTL with nominal p<10−5 (corresponding approximately to 5% FDR) we found no enrichment for GWAS p<0.05 for three GTEx tissues (heart, brain and lymphoblastoid cell lines—LCLs), our marginal effect eQTLs or baseline (no doxorubicin) only eQTLs (Figure 5—figure supplement 3). However, considering SNPs that are either main effect or response eQTL we see significant enrichment (one-sided hypergeometric p=6 × 10−20, OR = 1.40). Similarly for ‘combined’ eQTL where we explicitly test for any effect of genotype (main or interaction effect, see Methods) we see enrichment (p=5 × 10−12, OR = 1.29). Furthermore, focusing on response eQTL we see a stronger enrichment (p=2 × 10−37, OR = 1.95, Figure 5b), suggesting that the enrichment in combined eQTL is driven by this signal. Response eQTLs mapped using allelic-specific expression as well as total expression show the strongest enrichment (p=6 × 10−60, OR = 2.22). When considering splicing QTLs (Figure 5—figure supplement 4) we found no enrichment for marginal sQTLs mapped in LCLs (Li et al., 2016). Interestingly, in contrast to the total expression QTLs we found a significant enrichment (p=4 × 10−17, OR = 1.36, Figure 5c) for IC marginal sQTLs although the enrichment in response sQTLs was still higher in absolute terms (p=2 × 10−5, OR = 1.57). These findings are indicative that molecular response QTL mapping has potential for understanding the molecular basis of environmentally-dependent human disease.
Finally, we attempted colocalization analysis for our response eQTLs and meta-analyzed GWAS using coloc (Giambartolomei et al., 2014), a Bayesian method based on summary statistics. For each region (gene in our case) coloc infers a posterior probability for each of five possibilities: (H0) no association, (H1) association for the eQTL only, (H2) association for the GWAS only, (H3) independent variants, or (H4) colocalization to one variant. Out of 43 genes with a SNP with a reQTL at p<10−5 and GWAS SNP at p<0.05 coloc gave maximum posterior probability to the null hypothesis (H0, no association) for 32 genes, association only for reQTL (H1) for 9, and colocalization (H4) for one gene, NOL10 (posterior probability of colocalization 0:54, Figure 5—figure supplement 5, Supplementary file 16). While these results suggest our data is not sufficiently well-powered for colocalization analysis, we note that the posterior probability of colocalization (H4) is higher than that for independent signal (H3) in 40/43 tested genes.
Human iPSC-derived somatic cells provide a powerful, renewable and reproducible tool for modeling cellular responses to external perturbation in vitro, especially for non-blood cell-types such as cardiomyocytes which are extremely challenging to collect and even then are typically only available post-mortem. We established a sufficiently large iPSC panel to effectively query the transcriptomic response of differentiated cardiomyocytes to doxorubicin. We were also able to characterize the role of genetic variation in modulating this response, both in terms of total expression and alternative splicing. There are, of course, caveats associated with using an in vitro system, which may not accurately represent certain aspects cardiac response to doxorubicin in vivo. That said, the replication of GTEx heart eQTLs, association of troponin levels with predicted ACT-risk (Burridge et al., 2016), and the observed GWAS enrichment, all support the notion that the IC system recapitulates substantial elements of in vivo biology. It is challenging to quantify this agreement, and there are in vivo factors that are certainly not represented. For example, excessive fibrosis may contribute to ACT (Cascales et al., 2013; Zhan et al., 2016; Farhad et al., 2016; Heck et al., 2017), although is unclear how substantial this contribution is as well as whether fibroblasts are directly activated by doxorubicin exposure or simply respond indirectly to cardiomyocyte damage. While our FACS analysis shows cardiomyocytes are the dominant cell type in our cultures, heterogeneity remains and other cell types could be mediating some of the observed changes.
For many diseases such as ACT which involve an environmental perturbation it is reasonable to suppose that eQTLs detected at steady-state are only tangentially relevant when attempting to interpret disease variants. Such concerns motivated us to focus on response eQTLs, that is, variants that that have functional consequences under specific cellular conditions because they interact, directly or indirectly, with the treatment. We used a statistical definition of reQTLs corresponding to cases where gene expression levels are significantly better explained using a model including an interaction term between genotype and treatment (represented as a categorical variable), compared to a model with only additive effects for genotype and treatment. Our characterization of the detected reQTL demonstrates that these variants are indeed candidate drivers of differences in individual transcriptomic response to doxorubicin. The strongest reQTL effects correspond to completely different response patterns for the major and minor alleles, while weaker effects correspond to more subtle modulation of the same response pattern. We note that it is not necessarily the case that such reQTLs are the only functionally relevant eQTLs. eSNPs with a marginal (additive) effect on expression of a gene responsive to doxorubicin (as most genes are) could still be important if the relationship between expression and ACT-risk is nonlinear, for example involving thresholding effects.
We observed a statistical enrichment of expression and (to a lesser extent) splicing QTLs in ACT GWAS. However, with no reproducible genome-wide significant associations available, fine-mapping of causal variants remains fraught. We anticipate our findings will be increasingly valuable as larger-scale ACT GWAS become available.
We derived ICs from healthy individuals so we do not known which individuals would develop ACT if they required anthracycline treatment. Mapping molecular response QTLs in larger panels of ICs from patients treated with anthracyclines who do or do not develop ACT symptoms would allow stronger conclusions to be drawn about the contribution of the detected (r)eQTLs to disease etiology.
We used a panel of Hutterites individual since this homogeneous population offers unique advantages for mapping genetic traits: exposure to a fairly uniform environment and less variable genetic background, despite still representing much of European diversity (Newman et al., 2004). However, the genetic basis of ACT susceptibility is likely complex and some relevant genetic variation may not be well represented in this cohort.
Finally, an interesting observation in our study is that splicing fidelity is reduced upon doxorubicin exposure. This is not completely unexpected since a key downstream side-effect of doxorubicin is increased oxidative stress, which has been previously associated with dysregulated splicing of specific genes (Disher and Skandalis, 2007; Seo et al., 2016). Our finding that this effect is prevalent across the transcriptome poses further questions about what known effects of doxorubicin might, in fact, be mediated by changes in RNA splicing.
Generation of lymphoblastoid cell lines (LCLs) and genome-wide genotyping of many individuals from a multi-generational pedigree were performed previously. Briefly, lymphocytes were isolated from whole blood samples using Ficoll-Paque and immortalized using Epstein Barr Virus (EBV) (Cusanovich et al., 2012; Cusanovich et al., 2016). Phased genotypes were obtained by combining pedigree information, genotypes from SNP arrays, and genotypes from whole genome sequencing of related individuals (Livne et al., 2015).
We reprogrammed 75 LCLs to iPSCs using episomal plasmid vectors, containing OCT3/4, p53 shRNA, SOX2, KLF4, L-MYC, and LIN28 which avoids integrating additional transgenes (Okita et al., 2011). Initially, the lines were generated on mouse embryonic fibroblasts (MEF), which coated the well and served as feeder cells to create an environment supportive of pluripotent stem cells. The colony was then mechanically passaged on MEF and tested for expression of pluripotency-associated markers by immunofluorescence staining and RT-PCR. The lines were passaged for at least 10 weeks on MEF to ensure lines had stabilized.
All iPSC lines were characterized as described previously (Gallego Romero et al., 2015). Briefly, we initially performed qPCR using 1μg of total RNA, converted to cDNA, from all samples to confirm the endogenous expression of pluripotency genes: OCT3/4, NANOG, and SOX2 (Figure 1—figure supplement 1). We next confirmed pluripotency using PluriTest (Müller et al., 2011). All samples were classified as pluripotent and had a low novelty score (Figure 1—figure supplement 2). Additionally, we confirmed the ability of all iPSC lines to differentiate into the three main germ layers using the embryoid body (EB) assay (Supplementary Data). Finally, we tested for the presence and expression of the EBV gene EBNA-1 using PCR (Figure 1—figure supplement 3, Figure 1—figure supplement 4). We tested all samples for both genomic integrations and vector-based EBV. If the cells were positive (two positive and one indeterminate case was identified), we further tested the origin of the EBV (genomic or episomal) using primers specific to the LMP-2A gene found in EBV or part of the sequence specific to the episomal plasmid (Figure 1—figure supplement 3). We concluded that two lines still had EBV present in the genome, this was also reflected in EBNA-1 gene expression for these individuals (Figure 1—figure supplement 4). We retained these individuals because they passed all quality control metrics and were not outliers based on genome-wide gene expression. It should also be noted that gene expression levels are extremely similar between iPSC lines. This relative homogeneity further demonstrates the quality of our iPSC lines. In summary, all iPSC lines showed expression of pluripotent genes quanti1ed by qPCR, generated EBs for all three germ layers, and were classified as pluripotent based on PluriTest.
iPSC lines were transitioned to feeder-free conditions, which was necessary to prime the iPSCs for differentiation. Next we differentiated the iPSCs to cardiomyocytes (Lian et al., 2013; Burridge et al., 2014). iPSC lines were covered with a 1:60 dilution matrigel overlay for 24 hr. On day 0 iPSC lines were treated with 12M of the GSK3 inhibitor, CHIR99021, in RPMI+ B27 medium (RPMI1640, 2 nM L-glutamine and 1x B27 supplement minus insulin) for 24 hr at which time media was replaced with fresh RPMI + B27. 72 hr after the addition of CHIR99021 (Day 3), 2M of the Wnt inhibitor Wnt C-59 was added for 48 hr. Fresh RPMI + B27 was added on Days 5, 7 and 10. Beating cells appeared between Days 8–10. These cardiomyocytes consisted of ventricular, atrial and pacemaker-like cells. The cells formed thick layers and contract throughout the well. Metabolic selection was used to purify the cardiomyocytes (Tohyama et al., 2013) from Day 14 to Day 20 when glucose-free RPMI media supplemented with the components essential for cardiomyocyte differentiation (Burridge et al., 2014), ascorbic acid and human serum albumin, together with lactate, a substrate uniquely metabolized by cardiomyocytes, was added to cells. Because this lactate media can only be metabolized by cardiomyocytes, the non-cardiomyocytes in the culture were removed over the 6 day treatment. On day 20 the cardiomyocytes, now at a high cTnT purity, were replated for experiments in media that contains only galactose and fatty acids as an energy source. This galactose media forces the cardiomyocytes to undergo aerobic respiration, rather than anaerobic glycolysis common in cultured cells.
At day 20 when ICs were plated for doxorubicin exposure, a portion of cells were collected to assess purity. Cells were harvested from plates by incubating with TrypLE Express (Thermo Fisher Scientific, Cat. No 12604013) for 5 min at 37°C. Once removed, cells were manually dissociated further by passing through a 100m and then 40m strainer to create a single cell suspension. Cells were then resuspended, fixed, and permeabilized (Foxp3/Transcription Factor Staining Buffer Set; eBioscience/Affymetrix, Cat. No 00-5523-00) according to the manufacturer’s instructions. Cells were stained with directly conjugated antibodies to cTnI (Alexa Fluor647 Mouse Anti-Cardiac Troponin I Clone C5; BD Biosciences, Cat. No 564409) and cTnT (PE Mouse Anti-Cardiac Troponin T Clone 13–11; BD Biosciences, Cat. No 564767). The Zombie VioletTM Fixable Viability Kit (BioLegend, Cat. No 423113) was used to assess cell viability at the time of fixation. The following isotype controls were used: Alexa Fluor647 Mouse IgG2b, Isotype Control Clone 27–35 (BD Biosciences Catalog No. 558713) and PE Mouse IgG1, Isotype Control Clone MOPC-21 (BD Biosciences Catalog No. 554680). Cells were analyzed using a FACS Canto or LSR-II flow cytometers (BD Biosciences), and the data were analyzed with FlowJo software (v10.0.7, Tree Star). All gates were established such that <2% of cells stained with isotype controls were positive and dead cells were excluded.
We incubated the cardiomyocytes in 0, 0.625, 1.25, 2.5, or 5 M doxorubicin. After 24 hr, we collected the serum and cells from each condition. From the serum, we measured cardiac Troponin T levels using the ABNOVA Troponin I (Human) ELISA kit (cat. no. KA0233). From the cells, we extracted RNA for sequencing. Cells from each individual were treated separately, but batches of experiments were performed on different days. Each treatment batch contained 1 to 4 individuals. RNA quality was assessed with the Agilent Bioanalyzer.
We prepared libraries using the Illumina TruSeq Library Kit and generated 50 bp single-end reads on a HiSeq 4000 at the University of Chicago Functional Genomics Facility. We confirmed sequencing quality using FastQC and MultiQC (Ewels et al., 2016). We confirmed sample identity by (1) comparing allelic counts (quantified using samtools mpileup [Li et al., 2009]) of exonic SNPs to the known genotypes and (2) running verifyBamID (Jun et al., 2012).
We aligned RNA-seq reads using STAR version 2.5.2a (Dobin et al., 2013) to GRCh38/GENCODE release 24. We counted reads using feature Counts (Liao et al., 2014) and calculated counts per million reads (cpm) using `cpm` from the `edgeR` ‘R package (version 3.18.1) (Robinson et al., 2010). We discarded samples with exonic reads and genes with median less than 0.
We performed differential expression (DE) analysis across all five doxorubicin concentrations jointly, using either a linear model on quantile normalized cpm value or Spearman correlation, followed by Benjamini-Hochberg False Discovery Rate (FDR) control. Since the vast majority of genes showed differential expression we did not investigate better powered DE methods such as DESeq2.
We clustered genes into ‘response patterns’ using a -component mixture model
where is a prior probability vector over cluster assignments, Dir is the Dirichlet distribution, is cluster from which gene is generated, is the expression of gene in individual at concentration , is the mixture parameter (mean) across concentrations for cluster , and is a shared noise variance. We marginalize (sum) over and optimize with respect to using the rstan R package (Carpenter et al., 2016) (version 2.16.2). The hyperparameters of the Dirichlet distribution are set such that in the limit of large the model approximates a Dirichlet process mixture (MacEachern and Müller, 1998) which automatically learns of an appropriate number of mixture components to use from data.
Gene set and promoter motif enrichment were performed using HOMER v4.9.1 (Heinz et al., 2010) using default parameters and without de novo motif search.
We developed an extension of the PANAMA (Fusi et al., 2012) linear mixed model (LMM) framework to map eQTLs and response eQTLs while accounting for latent confounding, which we call suez. suez entails a two step procedure. Step one is used to learn latent factors from all genes, using the model
where are latent factors, are per gene, per concentration fixed effects. We integrate over and , which results in a per gene multivariate normal,
where refers to the vector of expression for gene across all individuals and concentrations (i.e. all ‘samples’ where a sample is an individual-concentration pair), is a matrix mapping concentrations to samples (i.e. iff sample is at concentration ) and is a matrix of which samples are for the same individual (i.e. if sample and sample come from the same individual). We optimize and the variances jointly across all genes .
In step 2 we test individual gene-SNP pairs while accounting for confounding using the covariance matrix
which includes both latent confounding, individual random effects and similarity due to kinship. We consider three LMMs, all with the same parameterization of the covariance where and are optimized along with the fixed effects to allow the extent to which each gene follows the global covariance pattern to be adapted. The simple structure of this covariance also allows pre-computation of the eigen-decomposition of which enables linear (rather than cubic) time evaluation of the likelihood and its gradient.
Model 0 involves no effect of the SNP (and can therefore be fit once for a gene) and a fixed effect for concentration. Model 1 adds a marginal effect of the SNP genotype dosage . Finally model 2 adds an interaction effect between concentration and genotype, which is equivalent to a concentration-specific genotype effect. In summary:
We optimize , and the regression coefficients for each of the three models separately, and use likelihood ratio tests (LRT) to compare the models. Comparing Model 1 vs 0 (one degree of freedom) tests whether there is a marginal effect of the variant. Comparing Model 2 vs 1 ( degrees of freedom, where is the number of conditions/concentrations) tests whether there is an interaction effect, i.e. whether the genetic effect on expression is different at different concentrations (or equivalently whether the response to doxorubicin is different for different genotypes). Finally Model 2 vs 0 ( degrees of freedom) tests whether there is any effect of genotype on expression, either in terms of a marginal or concentration-specific effect (we refer to these as ‘combined’ eQTL). We use the conservative approach of using Bonferroni correction across SNPs for a gene, followed by Benjamini-Hochberg FDR control.
We quantile normalize the expression levels across all samples for each gene to a standard normal distribution so that the distributional assumptions of our linear mixed model are reasonable. However, optimizing the variance parameters and means that the distribution for the LRT will only hold asymptotically and -values for finite sample sizes will tend to be somewhat anti-conservative. To account for this for response-eQTLs, we use a parametric bootstrap since there is no fully valid permutation strategy for testing interaction effects. This involves first fitting Model 1 and then simulating new expression data under the fitted model. Models 1 and 2 are then (re)fit to this data and compared using an LRT. We then perform Bonferroni correction across SNPs for each gene to obtain an empirical null distribution of per gene -values which we use to estimate the true FDR for our response-eQTL results.
For significant reQTLs we assigned the response of the minor allele and major allele to the previously determined clusters using the model
where is the expression for individual at concentration , and are the cluster assignments for the major and minor allele respectively, is the genotype dosage, and and are fixed at the values learned in Equation 1. For each reQTL separately we calculate the likelihood of given all possible pairs of assignments and choose the maximum likelihood solution.
As for all -means clustering in the paper, we used KMeans_rcpp function of the R package ClusterR v1.0.6, taking the best of 10 initializations using the k-means++ option, to cluster the normalized genotype effect profiles of the significant associations. The choice of 9 clusters was determined manually.
We (Knowles et al., 2017; van de Geijn et al., 2015) and others (Kumasaka et al., 2016) have demonstrated that modeling allele-specific expression can improve power to detect both cis eQTLs (van de Geijn et al., 2015; Kumasaka et al., 2016) and reQTLs (Knowles et al., 2017). Here we employ a combination of ideas from these methods:
We assume our computational phasing between the regulatory and exonic SNP(s) is correct, since we have previously shown that errors in phasing reduce power but do not inflate false positives (van de Geijn et al., 2015).
We use a beta-binomial (denoted ) likelihood allowing exact likelihood calculations and straightforward maximum likelihood parameter estimation via LBFGS in Stan (Carpenter et al., 2016). We use the parametrization where is the total count, is the mean, and is the concentration. The usual pseudo-count parametrization is recovered as .
We model multiple exonic SNPs per gene to ease integration with the total expression signal from suez.
Under the hypothesis that the allelic effect of the test regulatory SNP varies across concentrations (analogous to Model 2 in Equation 6), we have
where there are exonic SNPs in a gene, with alternative allele counts and read coverage . is the logistic function. is an intercept term to account for reference mapping bias, and is a per-gene concentration (reciprocal of over dispersion) parameter. We regularize using a prior. is the phased heterozygosity of the test regulatory SNP in individual , with if the regulatory SNP is homozygous (these individuals are included to help estimate and ) and or if the regulatory SNP is heterozygous and in phase (1) or opposite phase (-1) with SNP . is the effect size in concentration . The null (no interaction effect, corresponding to Model 1 in Equation 6) is that for all all .
To integrate evidence from total and allelic specific expression it is valid to add the log likelihood ratios, which can be seen as either fitting one model with likelihood terms for the two components, or as a result of random variables being closed under addition. This approach has substantially better power than applying Fisher’s combined probability test to -values from testing the total and allelic expression components separately. Twice the summed log likelihood ratio is asymptotically with degrees of freedom being simply the sum of the degrees of freedom of the two components (so usually in our case). In practice we only fit the total expression model if there are at least alternative alleles observed for the test regulatory SNP, and only fit the allele-specific model if there are at least supporting allelic reads for the gene, so some regulatory SNPs are only tested using one component or the other. In addition, for some genes whose expression is very low for specific concentrations there may be no allelic reads for a concentration, in which case the degrees of freedom for the allele-specific component will be reduced since no is learned for that concentration.
We initially compared our eQTLs to GTEx eQTLs by estimating Storey’s π1 (Storey and Tibshirani, 2003), using the q value R package v2.8.0, for GTEx nominal p-values for our significant eQTLs (at a nominal p<10−5). While the GTEx heart tissues show higher replication than most tissues, surprising tissues ranked higher (Figure 2—figure supplement 1). We reasoned that differential power across the GTEx tissues due to differing sample size and noise levels confound this simple approach. We therefore used an extension of Storey’s π1 to test for overlap between two sets of p-values. For each GTEx tissue we fit (using LBFGS in Stan) a mixture model
where is the -value for SNP-gene pair in tissue , is the uniform distribution on (corresponding to -values coming from the null) and is the beta distribution (corresponding to non-null -values). correspond to mixture weights are estimates of the proportion of SNP-gene pairs that are (a) null for both tissues, (b) null for tissue one and non-null for tissue 2, (c) non-null for tissue one and null for tissue 2, (d)non-null for both tissues. Note that sums to . We constrain the hyperparameters and to encode the assumption that non-null SNP-gene pairs should have low -values. Due to the large number of SNP-gene pairs tested, in practice we bin -values on a regular grid and use the bin counts to weight the likelihood. Finally we estimate the mutual information between the pair of tissues as:
where and are marginal probabilities. This approach explicitly estimates the proportion of null tests in tissue 1 () and tissue 2 () as well as the proportion of tests that are non-null in both (). This approach both controls for power in both tissues and negates the need to choose arbitrary significance thresholds.
We ran LeafCutter v0.2.6_dev (using default settings) which allows joint differential intron excision testing across more than two conditions. For each Alternative Splicing Cluster (ASC) LeafCutter fits a set of probability vectors , across detected splice junctions , at each concentration . For ASCs determined to be significantly (5% FDR) differential spliced across concentrations, we calculated the entropy at each concentration . We normalized these profiles as and clustered these profiles, using KMeans_rcpp as above.
To investigate the relative usage of cryptic splice sites we first determined the set of 7792 splice junctions that (a) fell in ASCs determined to be significantly differentially spliced (5% FDR) and (b) had . We obtained normalized intron excision rates by subtracting the per intron mean and dividing by the per intron standard deviation. These profiles were clustered using KMeans_rcpp. Cryptic splice site usage was determined by considering all exons in Gencode v26 and ignoring transcript structure (i.e. a junction spanning two splice sites used but only in different transcripts would still be considered ‘annotated’).
For (response) splicing QTL we calculated within ASC intron excision with pseudocount of , and set entries with denominator (no reads for that ASC in that sample) to the mean across all other samples. These values were then (1) -score normalized across samples and (2) quantile normalized to a normal across introns. QTL mapping was then performed using suez considering each intron as a ‘gene’.
We assessed the proportion of variance in cardiac troponin explained by gene expression response. Let represent the troponin level measured in individual at doxorubicin concentration , normalized to have 0 mean and variance 1 across individuals at each concentration. Let be the expression of gene (in individual at concentration ), -score normalized across samples. We consider the linear model
where is noise and the coefficients are given a prior where is the number of genes in the analysis. Integrating over we have
We optimize this model wrt and to obtain an estimate of the percent variance of explained by . A Bayesian credible interval for is obtained under this model using 8000 iterations of Hamiltonian Monte Carlo (with the first 4000 discarded as burnin) implemented using RStan (Carpenter et al., 2016) (v2.16.2).
For the transcriptome-wide association study for cardiac troponin levels we use the R package glmnet (v2.0–13) to build elastic-net predictors of gene expression for each gene, using 10-fold cross-validation to choose and which we found gave comparable performance to higher values. A single model was learnt per gene jointly across concentrations, including main effects for all SNPs within 100 kb of the gene TSS, main effects for doxorubicin dosage (encoded categorically) and interaction terms between each SNP and the dosage factor. The fitted values on the test-folds from the cross-validation are known as the ‘prevalidation’ response. To test which genes have a significant genetic component we tested (using analysis-of-variance) whether the observed expression was better predicted under a linear model including the prevalidation values and the dosage variable than by dosage alone. The prevalidated response for the 3840/12317 genes (1% FDR) genes that are predictable from genotype are then used to predict troponin level, normalized as for Equation 10, using leave-out-one-cross-validated lasso regression.
All the custom analysis scripts used for this project are available at https://github.com/davidaknowles/dox (Knowles and Blischak, 2017; copy archived at https://github.com/elifesciences-publications/dox). The suez response eQTL mapping R package is available at https://github.com/davidaknowles/suez (Knowles, 2017; copy archived at https://github.com/elifesciences-publications/suez). The following data are available as Supplementary Data: (1) differential expression cluster assignments, (2) significant (5% FDR) eQTLs and sQTLs, (3) differential splicing results, (4) levels of cardiac troponin and the predicted transcriptomic response. In addition to the Supplementary Data included with this paper additional results are hosted at http://web.stanford.edu/~dak33/dox/ and Dryad (doi:10.5061/dryad.r5t8d04) including (1) gene-by-sample matrix of RNA-seq quantification (log counts per million), (2) LeafCutter intron excision quantification (3) p-values for all tested eQTLs, reQTLs, sQTLs, and rsQTLs, (4) RARG variant response and marginal trans-eQTLs, (5) RIN, RNA concentration and other technical covariates, (6) embryoid body imaging for all iPSC lines. The RNA-seq FASTQ files will be added to the dbGaP database (Tryka et al., 2014) under dbGaP accession phs000185 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000185). The genotype data files cannot be shared because releasing genotype data from a subset of individuals in the pedigree would enable the reconstruction of genotypes of other members of the pedigree, which would violate the original protocol approved by the research ethics board (Livne et al., 2015). The summary statistics for the ACT GWAS were given to us by the authors of the studies (Schneider et al., 2017; Serie et al., 2017).
Two-Stage least squares estimation of average causal effects in models with variable treatment intensityJournal of the American Statistical Association 90:431–442.https://doi.org/10.1080/01621459.1995.10476535
Impact of regulatory variation across human iPSCs and differentiated cellsGenome research, 28, https://www.biorxiv.org/content/early/2016/12/05/091660, 10.1101/gr.224436.117, 29208628.
Russ B Altman, and Joseph C Wu. human induced pluripotent stem cell-derived cardiomyocytes recapitulate the predilection of breast Cancer patients to doxorubicin-induced cardiotoxicity.-suppleNature Medicine 220:547–556.
Chemically defined generation of human cardiomyocytesNature Methods 11:855–860.https://doi.org/10.1038/nmeth.2999
Stan: a probabilistic programming languageJournal of Statistical Software 20:1–37.
Integrated analyses of gene expression and genetic association studies in a founder populationHuman Molecular Genetics 25:2104–2112.https://doi.org/10.1093/hmg/ddw061
Effect of candesartan and metoprolol on myocardial tissue composition during anthracycline treatment: the PRADA trialEuropean heart journal cardiovascular Imaging, 10.1093/ehjci/jex159, 29106497.
Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype dataThe American Journal of Human Genetics 91:839–848.https://doi.org/10.1016/j.ajhg.2012.09.004
Fine-mapping cellular QTLs with RASQUAL and ATAC-seqNature Genetics 48:206.https://doi.org/10.1038/ng.3467
PRIMAL: fast and accurate pedigree-based imputation from sequence data in a founder populationPLOS Computational Biology 11:e1004139.https://doi.org/10.1371/journal.pcbi.1004139
Estimating mixture of dirichlet process modelsJournal of Computational and Graphical Statistics 70:223–238.
A bioinformatic assay for pluripotency in human cellsNature Methods 8:315–317.https://doi.org/10.1038/nmeth.1580
Are common disease susceptibility alleles the same in outbred and founder populations?European Journal of Human Genetics 12:584–590.https://doi.org/10.1038/sj.ejhg.5201191
Positive regulation of apoptosis signal-regulating kinase 1 by dual-specificity phosphatase 13ACellular and Molecular Life Sciences 67:2619–2629.https://doi.org/10.1007/s00018-010-0353-3
Genome-Wide association study for Anthracycline-Induced congestive heart failureClinical Cancer Research 23:43–51.https://doi.org/10.1158/1078-0432.CCR-16-0908
Genome-wide association study of cardiotoxicity in the NCCTG N9831 (Alliance) adjuvant trastuzumab trialPharmacogenetics and Genomics 27:378–385.https://doi.org/10.1097/FPC.0000000000000302
Regression shrinkage and selection via the lassoJournal of the Royal Statistical Society Series B (Methodological) 58:267–288.
NCBI's Database of Genotypes and Phenotypes: dbGaPNucleic Acids Research 42:D975–D979.https://doi.org/10.1093/nar/gkt1211
Prevention of anthracycline-induced cardiotoxicity: challenges and opportunitiesJournal of the American College of Cardiology 64:938–945.https://doi.org/10.1016/j.jacc.2014.06.1167
Pharmacogenomic prediction of anthracycline-induced cardiotoxicity in childrenJournal of Clinical Oncology 30:1422–1428.https://doi.org/10.1200/JCO.2010.34.3467
Risk factors for doxorubicin-induced congestive heart failureAnnals of Internal Medicine 91:710–717.https://doi.org/10.7326/0003-4819-91-5-710
Ataxia telangiectasia mutated in cardiac fibroblasts regulates doxorubicin-induced cardiotoxicityCardiovascular Research 110:85–95.https://doi.org/10.1093/cvr/cvw032
Identification of an adaptor protein that facilitates Nrf2-Keap1 complex formation and modulates antioxidant responseFree Radical Biology and Medicine 97:38–49.https://doi.org/10.1016/j.freeradbiomed.2016.05.017
Gilean McVeanReviewing Editor; Oxford University, United Kingdom
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "Determining the genetic basis of anthracycline-cardiotoxicity by response QTL mapping in induced cardiomyocytes" for consideration by eLife. Your article has been favorably evaluated by Mark McCarthy (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors. The following individual involved in review of your submission has agreed to reveal his identity: Douglas Sawyer (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
This work presents a detailed analysis of how IPSCs from healthy volunteers can be used to investigate the molecular and cellular processes involved in differential response to a widely used chemotherapy, which has a high rate of adverse effects. The work provides new insights into the molecular functions, such as splicing, disrupted by doxorubicin, and some evidence that genetic factors that influence responses in gene expression correlate with risk for adverse events. Although previous work has established the value of IPSCs in this context, this paper presents several important advances.
I have chosen to include the individual reviews as each provides some useful insight. However, there are a few key features that should be addressed when revising the manuscript. I do not expect a full response to every comment raised by each reviewer – just the three highlighted here.
1) The reviewers noted that the evidence linking genetic risk for adverse responses to the genetics of gene expression regulation is rather weak. There are two suggestions for how to advance this. One is to use allele-specific expression to add power to the genetic response in expression analysis (reviewer 3).
2) A second suggestion is to provide a breakdown of the genetic enrichment in the GWAS data by QTL type – see reviewer 1. Also to revisit some of the genes discussed earlier in the paper in the light of the GWAS data.
3) The work should include a more detailed characterisation of the state of the IPSC lines – see reviewer 3 (point 3). This is part of the QC process and important to include.
This paper uses functional genomic and genetic analyses of iPSC cardiomyocytes to investigate the phenomenon of cardiomyocyte toxicity arising from anthracycline chemotherapy. Given the high frequency of this adverse effect and the widespread use of doxorubicin (and related compounds) this is an important problem, though as yet rather little understood. There has been quite a lot of previous and somewhat related work – notably Burridge et al., 2016. However, this paper goes beyond that by using iPSC from 45 healthy individuals (compared to the 3 with ACT and 3 without) and by integrating genetic data.
From my reading, I think the advances made by this paper are:
1) Replication of some of the Burridge et al. results about molecular functions and processes that are influenced by doxorubicin and their correlation with ACT response.
2) A much broader investigation of the dosage effects and variation among individuals. Ultimately, providing convincing evidence that the iPSC system provides a good model for ACT and that variation seen among iPSC lines correlates with ACT susceptibility in vivo.
3) Evidence to support the hypothesis that doxorubicin dysregulates gene splicing.
4) Evidence that there are a large number of genetic variants that affect expression responses to doxorubicin and that these show strong (but by no means complete) overlap with variants that affect gene expression in primary tissue.
5) Evidence that genetic variants that influence risk of ACT in patients are enriched for variants that also affect the expression of genes in response to doxorubicin.
Overall, there is a lot to admire in the paper. It is an extremely impressive and comprehensive use of functional genomics to probe the expression response to doxorubicin in cardiomyocytes. By focusing on variation among individuals, it provides compelling evidence that a strong stress response is a good biomarker for positive outcome. However, the primary focus of the paper concerns the genetic basis of this response – and the (strong and specific) claim being made is that understanding the genetic influences on doxorubicin response in iPSC cardiomyocytes provides biological insight into ACT susceptibility. Personally, I think this claim is rather weak for two reasons.
1) First, the Burridge et al. paper established the relevance of iPSC cardiomyocytes as a model for ACT, so the novelty of this paper rests on light being shed by the integration of genetic data on the iPSC lines. However, the majority of the genetic differences affecting doxorubicin response seem to be seen in GTEx tissues – with only a modest increase enrichment in heart compared to other tissues. The key question is whether reQTLs – i.e. those that are only visible through the doxorubicin treatment – are more relevant to ACT than general QTLs. Otherwise, to analyse the authors could have simply used the gene expression data from Burridge, the GWAS data and the GTEx data to arrive at similar conclusions. A concrete suggestion would be to stratify the enrichment in Figure 5B by general eQTL, cardiomyocyte (GTEx) eQTL and doxorubicin-specific reQTL.
2) Second, the enrichment reported in the GWAS data is really very weak – and certainly not strong enough to direct any future experiments. Put a different way – it doesn't appear to me that genes with strong doxorubicin reQTLs are particularly strong causal determinants of ACT risk. Conversely – I was surprised to find no further analysis of RARG – given that it is the only established GWS risk variant for ACT. While I am convinced that gene expression can be a useful biomarker for ACT, I don't think this paper establishes expression of any specific gene (or pathway) as causal for ACT susceptibility. I would like to see something like a Mendelian randomisation analysis of the response trait identified in Figure 4D to provide evidence to support a causal link.
In summary, this is a rather beautifully executed and analysed experiment, but the central claim – that genetic variation influencing gene responses to doxorubicin treatment in iPSC cardiomyocytes can be used to understand the biology of the phenomenon – is not convincing.
The Investigators took an interesting approach to understand the effects of anthracyclines on the heart, using iPSCs differentiated into cardiomyocytes from a series of subjects to test differential response to doxorubicin. They present an incredibly rich dataset showing the complexity of anthracycline-induced transcriptional changes that vary among individuals. They compare their findings to published GWAS studies exploring the same questions, and present their findings in a clear and thoughtful manuscript. The figures are clear and communicative. This noninvasive approach to utilizing human cells to study this clinically important problem is interesting and potentially very valuable. If applied to a group of samples from patients with and without anthracycline cardiotoxicity, it would be even more interesting as the findings would be more likely interpretable as related to the clinical risk of anthracycline induced cardiotoxicity.
The investigators chose to use samples from a Hutterite population, presumably based upon how the concept that this homogeneous population is good for identifying single gene Mendelian disorders. The genetic basis for cardiotoxicity of anthracyclines is likely more complicated. This limitation should be acknowledged by the authors.
Experimentally the investigators chose a range of doxorubicin concentrations that are high relative to what is likely the exposure of the heart in vivo in the context of clinical doxorubicin use. Extending the concentration range to 0.1 or even 0.05 μm would be worthwhile, as others have shown that cell toxicity can occur in this range in cells in vitro.
The investigators collected serum for cardiac Troponin T measurements, looking for in vitro signs of cytotoxicity. However there may be other forms of toxicity to the heart that occur without myocyte cytolysis, including the changes in transcription and splicing as presented here, as well as effects on endothelial and mesenchymal stem cells. Along these lines, is there any consideration for changes in other cell types, as iPSC cultures are heterogeneous. These issues should be acknowledged by the authors.
ACT can arise years after chemotherapy, it would be interesting to see if the cells were treated and then analzyed at a later timepoint than 24 hours to see if there are further differences than the acute changes after a 24 hour exposure.
In the Discussion the authors mention fibrosis playing a role in ACT. While this has been shown in human populations (e.g. PMID 29106497) using MRI, it is a relatively minor effect. The authors may want to update the references used to support the statement about fibrosis, as well as acknowledge it remains unclear how mechanistic this is in ACT.
This paper presents an analysis of interindividual variation in ips-derived cardiomyocyte gene expression following exposure to doxorubicin (dox).
The authors characterise differential expression and splicing patterns at different dox concentrations, and find good evidence that the in vitromodel they have developed represents a reasonably accurate simulation of ACT in vivo, with the activation of apoptosis and DNA damage repair pathways. They also map QTLs that modulate gene expression and splicing at different dox concentrations, and suggest that these are enriched for GWAS loci for ACT. The paper is well written, the figures are pretty clear and the analyses seem sound. The study is underpowered (n=45) and this places limitations on the conclusions that can be drawn, however, I don't find that they overstate their findings. I have the following major comments:
1) I was surprised that they didn't use allele-specific signals to boost power for association detection because the lead author has recently published a method for doing exactly this in the GxE setting (Knowles et al., 2017). I would like to see whether the GWAS / reQTL examples they highlight are supported by patterns of allele-specific expression, some discussion of why AS signals weren't used in the analysis and what impact this might have on their results.
2) They should try colocalisation of their eQTL (response and otherwise) with the ACT GWAS and report the appropriate summary stats (e.g. posterior probs of shared causality if using "coloc"), as this can be helpful for others who may want to follow up individual associations experimentally.
3) One issue with the use of IPSCs is that there can be substantial variation in the make-up of the cell populations created. I would like them to do a better job of characterising the composition and maturation status of the IPS CMs they have derived – there are existing markers that they can use to distinguishwhether they have made more atrial / ventricular-like CMs, and whether there is variation in CM maturity (for example in this review: http://www.sciencedirect.com/science/article/pii/S0167527317300517)
4) For comparison, they should compute the pi1 scores for their eQTLs across all GTEx tissues (not just heart, brain and LCLs).https://doi.org/10.7554/eLife.33480.049
- Yoav Gilad
- Jonathan K Pritchard
- Jonathan K Pritchard
- Jonathan K Pritchard
- Courtney K Burrows
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
The idea for the study was developed through conversations with Julian Solway (University of Chicago). We gratefully acknowledge Bryan Schneider for providing GWAS summary statistics. This work was supported by the Howard Hughes Medical Institute, and the US National Institutes of Health (NIH grants HL092206, HG008140, HG009431). CKB was supported by a NIH (http://www.ncats.nih.gov/ctsa) Clinical and Translational Science Award 5 TL1 TR 432–7 pre-doctoral fellowship.
Human subjects: Human Subjects work was approved by the University of Chicago IRB (protocol 10-416-B). Written informed consent was obtained from all participants.
- Gilean McVean, Reviewing Editor, Oxford University, United Kingdom
© 2018, Knowles et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.