1. Stem Cells and Regenerative Medicine
Download icon

Deep learning detects cardiotoxicity in a high-content screen with induced pluripotent stem cell-derived cardiomyocytes

  1. Francis Grafton
  2. Jaclyn Ho
  3. Sara Ranjbarvaziri
  4. Farshad Farshidfar
  5. Anastasiia Budan
  6. Stephanie Steltzer
  7. Mahnaz Maddah
  8. Kevin E Loewke
  9. Kristina Green
  10. Snahel Patel
  11. Tim Hoey
  12. Mohammad Ali Mandegar  Is a corresponding author
  1. Tenaya Therapeutics, United States
  2. Cardiovascular Institute and Department of Medicine, Stanford University, United States
  3. Dana Solutions, United States
Research Article
  • Cited 0
  • Views 849
  • Annotations
Cite this article as: eLife 2021;10:e68714 doi: 10.7554/eLife.68714

Abstract

Drug-induced cardiotoxicity and hepatotoxicity are major causes of drug attrition. To decrease late-stage drug attrition, pharmaceutical and biotechnology industries need to establish biologically relevant models that use phenotypic screening to detect drug-induced toxicity in vitro. In this study, we sought to rapidly detect patterns of cardiotoxicity using high-content image analysis with deep learning and induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs). We screened a library of 1280 bioactive compounds and identified those with potential cardiotoxic liabilities in iPSC-CMs using a single-parameter score based on deep learning. Compounds demonstrating cardiotoxicity in iPSC-CMs included DNA intercalators, ion channel blockers, epidermal growth factor receptor, cyclin-dependent kinase, and multi-kinase inhibitors. We also screened a diverse library of molecules with unknown targets and identified chemical frameworks that show cardiotoxic signal in iPSC-CMs. By using this screening approach during target discovery and lead optimization, we can de-risk early-stage drug discovery. We show that the broad applicability of combining deep learning with iPSC technology is an effective way to interrogate cellular phenotypes and identify drugs that may protect against diseased phenotypes and deleterious mutations.

Introduction

Drug development is a lengthy and expensive endeavor, often requiring an estimated 10 years and $0.8–2.6 billion. The cost—and risk—grows exponentially as drugs advance toward the clinic (DiMasi et al., 2016). To reduce costs and risk, pharmaceutical companies need effective screening methods to prevent drug attrition at late stages of the development process.

Major causes of therapeutic attrition are drug-induced toxicity, including cardiotoxicity and hepatotoxicity. Cardiotoxicity alone accounts for approximately one-third of drugs withdrawn due to safety concerns (Guo et al., 2011; Mathur et al., 2013; Weaver and Valentin, 2019). To decrease the potential for toxicity, and for late-stage drug attrition, pharmaceutical and biotechnology industries seek in vitro systems that can identify drug-induced toxicity with phenotypic screening at early stages of development (Moffat et al., 2014). This screening enables interrogation of a large number of perturbagens (e.g., small molecules, siRNAs, CRISPR gRNAs) in a target-agnostic assay that measures phenotypic changes (Eder et al., 2014).

Some of the best in vitro models rely on human primary cells isolated directly from tissues. These models retain the morphological and physiological characteristics of their tissue of origin (Kaur and Dufour, 2012). However, the supply of primary cells is finite. Primary cells also have limited potential for proliferation, are technically complicated to culture for long periods, and are difficult to genetically manipulate. As a result, scientists often turn to immortalized cell lines, such as HEK293T cells, HepG2 human liver cancer cells, and HL-1 cardiac muscle cells. Many of these cell lines have been transformed (Ahuja et al., 2005; Prager et al., 2019), which may cause karyotypic abnormalities, and they do not fully recapitulate their in vivo counterparts.

More recently, researchers have turned to cell types derived from human induced pluripotent stem cells (iPSCs) (Takahashi et al., 2007). Cells derived from iPSCs more closely recapitulate human biology than immortalized cells (Moffat et al., 2017; Scannell and Bosley, 2016). These derived cells have been used for disease modeling (Judge et al., 2017; Lan et al., 2013; Pérez-Bermejo et al., 2020; Ribeiro et al., 2017; Sharma et al., 2017; Sun et al., 2012), proposed for drug discovery (Carlson et al., 2013; Grskovic et al., 2011; Robinton and Daley, 2012), and used to assess arrhythmogenic and structural liabilities of drugs (Guo et al., 2011; Maddah et al., 2020). Additionally, iPSCs can be derived from patients carrying deleterious mutations and genetically modified using nucleases (Judge et al., 2017; Kime et al., 2016; Mandegar et al., 2016; Miyaoka et al., 2012). They can also be expanded to sufficient quantities and cultured for extended periods that facilitate screening at scale (Burridge et al., 2014; Ghazizadeh and Majd, 2020; Sharma et al., 2017). Hence, iPSC-derived cell types enable high-throughput interrogation and screening using arrayed libraries of perturbagens.

As of early 2021, few studies have used iPSC-derived cells for phenotypic screening in a high-throughput manner that is truly scalable and, most importantly, has an assay readout with an appropriate window for screening and high signal-to-noise ratio. The screening window, commonly assessed using the Z-factor, ensures detection of true-positive hits and limits the number of false-negative and false-positive hits. Additionally, most published studies have been limited to single-readout phenotypes, such as cell survival (Sharma et al., 2017; Sun et al., 2017), expression of a single marker (Ghazizadeh and Majd, 2020), or reporter screens (McLendon et al., 2017). While these studies revealed great insight into cardiomyocyte biology, they have not described a general and unbiased screening strategy that can be applied to other cell types and used at scale.

Given the limitation of single-readout assays, researchers have turned to high-content screening to assess cellular features using traditional image processing (Moen et al., 2019; Salick et al., 2020). However, they must first choose which feature to quantify from a vast array of possible cellular phenotypes, such as nuclear fragmentation, protein aggregation, cell density, organelle damage and distribution, and changes to cytoskeleton organization. While quantifying a combination of these features may reflect the true state or importance of the phenotype, this approach can be challenging. Researchers may need to develop a new algorithm for each feature, and then decide on the biological relevance of each parameter. This strategy is time-consuming and subject to bias.

An alternative approach uses machine learning to decipher cellular features at high accuracies. For example, deep learning, a branch of artificial intelligence, uses a set of machine-learning techniques to train neural networks and learn input data (LeCun et al., 2015). Previously, we used phenotypic interrogation with deep learning to identify drug-induced structural toxicity in hepatocytes and cardiomyocytes (Maddah et al., 2020). Herein, we aimed to expand that work to induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) in a high-content screen (HCS) of 1280 bioactive compounds with putative primary targets. We then sought to validate identified hits using orthogonal assays to assess cellular stress, including mitochondrial respiration and brain natriuretic peptide (BNP) levels. Finally, we aimed to use our deep learning approach to screen a library of 1280 diverse compounds with no known targets and identify chemical structures and frameworks that show a cardiotoxicity signal in iPSC-CMs.

Materials and methods

Key resources table
Reagent type
(species) or
resource
DesignationSource or
reference
IdentifiersAdditional
information
Cell line (Homo sapiens)WTCGladstone InstitutesHuman iPSC line
AntibodyPrimary
Anti-MYBPC3
Mouse (monoclonal)
Santa CruzSc-1372371:200
AntibodyPrimary
Anti-ACTN2
Rabbit (monoclonal)
Thermo Fisher Scientific7019141:200
AntibodySecondary
Donkey anti-Rabbit IgG (H+L) Alexa Fluor 594
Thermo Fisher ScientificA-212021:500
AntibodySecondary
Donkey anti-Mouse IgG (H+L) Alexa Fluor 488
Thermo Fisher ScientificA-212071:500
Sequence-based reagentMYH7Thermo Fisher ScientificHs01110632_m1TaqMan qPCR probe, cardiac marker
Sequence-based reagentMYH6Thermo Fisher ScientificHs01101425_m1TaqMan qPCR probe, cardiac marker
Sequence-based reagentTNNI3Thermo Fisher ScientificHs00165957_m1TaqMan qPCR probe, cardiac marker
Sequence-based reagentTNNI1Thermo Fisher ScientificHs00913333_m1TaqMan qPCR probe, cardiac marker
Sequence-based reagentMYBPC3Thermo Fisher ScientificHs00165232_m1TaqMan qPCR probe, cardiac marker
Sequence-based reagentTNNT2Thermo Fisher ScientificHs00943911_m1TaqMan qPCR probe, cardiac marker
Sequence-based reagentGAPDHThermo Fisher ScientificHs99999905_m1TaqMan qPCR probe, housekeeping marker
Chemical compound, drugCHIR-99021SelleckchemS1263Small molecule
Chemical compound, drugY-27632SelleckchemS1049Small molecule
Chemical compound, drugIWP2SigmaI0536Small molecule
Chemical compound, drugBlasticidine S hydrochlorideThermo Fisher Scientific15205Small molecule
Chemical compound, drugBortezomibSelleckchemS1013Small molecule
Chemical compound, drugDoxorubicinSelleckchemS1208Small molecule
Chemical compound, drugGivinostatSelleckchemS2170Small molecule
Chemical compound, drugBafilomycinSelleckchemS1413Small molecule
Chemical compound, drugPaclitaxolSelleckchemS1150Small molecule
Chemical compound, drugCisaprideSelleckchemS4751Small molecule
Chemical compound, drugSorafenibSelleckchemS7397Small molecule
Chemical compound, drugFDA-approved Drug LibrarySelleckchemL1300Compound library
Chemical compound, drugDiscovery Diversity SetEnamineDDS-10-Y-10Compound library
Chemical compound, drugDimethyl sulfoxide (DMSO)SigmaD8418Solvent

iPSC culture, iPSC-CM differentiation, and blasticidin selection

Request a detailed protocol

WTC iPSCs (Mandegar et al., 2016) and derivative lines were maintained under feeder-free conditions on growth factor-reduced Matrigel (BD Biosciences) and fed daily with E8 medium (STEMCELL Technologies) (Ludwig et al., 2006). Accutase (STEMCELL Technologies) was used to enzymatically dissociate iPSCs into single cells. To promote cell survival during enzymatic passaging, cells were passaged with p160-Rho-associated coiled-coil kinase (ROCK) inhibitor Y-27632 (10 μM; Selleckchem) (Watanabe et al., 2007). iPSCs were frozen in 90% fetal bovine serum (HyClone) and 10% dimethyl sulfoxide (Sigma). iPSCs were differentiated into iPSC-CMs using the Wnt modulation-differentiation method (Lian et al., 2012) with 7 μM CHIR (Selleckchem) and 5 μM IWP2 (Sigma) in RPMI media supplemented with B27 (Thermo Fisher Scientific). iPSC-CMs were purified using 1 μM blasticidin (Thermo Fisher Scientific). Then, 15 days after adding CHIR, iPSC-CMs were frozen in 90% fetal bovine serum (FBS) with 10% dimethyl sulfoxide (DMSO). iPSC-CMs were thawed in RPMI with B27% and 10% FBS directly onto Matrigel-coated 384-well plates at a density of 20,000 cells/well. The next day, the media was switched to Tenaya’s cardiomyocyte (TCM) media for 6 days before screening. TCM media comprises Dulbecco's Modified Eagle Medium (DMEM) without glucose, 10% dialyzed FBS, 10 mM D-(+)-galactose, and 1 mM sodium pyruvate. To compare TCM against previously published maturation protocols, iPSC-CMs were cultured in RPMI without glucose, supplemented with 4 mM L-lactic acid and 100 nM triiodo-l-thyronine (RPMI/LT3 media) (Lin et al., 2017).

Generation of blasticidin-selectable iPSC line

Request a detailed protocol

A plasmid containing the Bsd open reading frame and a selectable marker containing puromycin and green fluorescent protein was knocked into the 3′ end of the MYH6 locus of WTC iPSCs. Cells were selected with puromycin. After the first round of selection, the selectable marker was removed using a plasmid expressing Cre recombinase (Figure 1—figure supplement 1A). Clones were isolated and verified using colony polymerase chain reaction (PCR) to ensure the presence of the Bsd marker and absence of the selectable cassette. Finalized clones showing the appropriate iPSC morphology were sent to WiCell for single-nucleotide polymorphism karyotyping (Figure 1—figure supplement 1B).

Compound library screening

Request a detailed protocol

A library containing 1280 bioactive compounds consisting of FDA-approved drugs, tool compounds, and pre-clinical drug candidates was sourced from Selleck Chemicals (Houston, TX). Screening and validation studies on the bioactive compounds were performed at three doses of 0.3 μM, 1.0 μM, and 3.0 μM in 0.1% DMSO. The library of diverse compounds was sourced from Enamine (Kyiv, Ukraine) and screened at 1.0 μM. StarDrop version 6.6.4.23412 (https://www.optibrium.com/stardrop/) was used to analyze data from the screen of diverse compounds.

Immunocytochemistry

Request a detailed protocol

iPSC-CMs were fixed for 15 min in 4% (v/v) paraformaldehyde (Thermo Fisher Scientific) in phosphate buffered saline (PBS) and permeabilized in 0.1% (v/v) Triton X-100 (Thermo Fisher Scientific) for 15 min. Cells were blocked in 5% (w/v) bovine serum albumin (BSA) (Sigma) with 0.1% (v/v) Triton X-100 in PBS for 60 min. Primary antibodies were diluted in 5% (w/v) BSA and 0.1% (v/v) Triton X-100 in PBS and incubated overnight at 4°C. After treatment with primary antibodies, cells were washed three times in PBS for 15 min each. Secondary antibodies were diluted in 5% (w/v) BSA and 0.1% (v/v) Triton X-100 in PBS, and then incubated for 1 hr at room temperature. After treatment with the secondary antibody, cells were washed three times in PBS for 15 min each. Nuclei were stained using Hoechst 33342 (Thermo Fisher Scientific) (1:1000 dilution). Images were taken using a Cytation 5 microscope (BioTek Instruments) at 10× magnification (nine images per 384-well plate). A list of primary and secondary antibodies with the appropriate dilution is listed in Supplementary file 4.

Sarcomere analysis using scanning gradient Fourier transform

Request a detailed protocol

Sarcomere organization and alignment were assessed using scanning gradient Fourier transform (SGFT) with a pattern size of 1.8, scanning resolution of 16, and a Fourier threshold of >0.8 (Salick et al., 2020). Briefly, SGFT performs gradient analysis on ACTN2 images to determine the myofibril directionality and then one-dimensional fast Fourier transforms to determine sarcomere organization and alignment. The code for this SGFT algorithm is available on GitHub: https://github.com/maxsalick/SGFT (Grafton, 2021 copy archived at swh:1:rev:1413b18726172659dcb9a09e6f91653063ab3361).

Contractility measurements

Request a detailed protocol

Contractility video measurements were taken with an SI8000 Cell Motion Imaging System (Sony Biotechnology) and analyzed using Pulse software (Maddah et al., 2015).

Construction of deep learning models and neural network architecture

Request a detailed protocol

To avoid complications associated with cell segmentation, we used block image segmentation and seeded a confluent monolayer of iPSC-CMs. Deep learning artificial intelligence models were built using the PhenoLearn platform (https://www.phenolearn.com/). We used PyTorch as the framework for the deep learning library, and a ResNet50 architecture, a 50-layer-deep convolutional neural network. Images from the DMSO control group were trained against the toxic groups (defined as either mildly toxic, toxic, or highly toxic). Each input image was divided into 12 square sub-images to have sizes ranging from 224 × 224 pixels to 300 × 300 pixels (Maddah et al., 2020). Each sub-image was flipped and rotated to create seven more augmented sub-images, and then fed into the input layer of ResNet50. Pseudo-image generation by rotation and flipping ensures enough diversity is seen by the network so that the algorithm is not biased based on the orientation of the images (Moen et al., 2019). We used 80% of the images to construct the neural network and the remaining 20% to validate the deep learning model. A consistent set of parameters were used for all training operations, including an initial learning rate of 0.01 and 20 epochs. For each training, the final neural network was selected from the epoch with the highest validation accuracy. Z-factor was calculated using the following formula:

Zfactor=1-3(σpσn)μpμn

proBNP assay

Request a detailed protocol

The proBNP/NPPB human sandwich ELISA kit (Invitrogen) was used to determine the level of secreted human proBNP. Cell culture media was collected from wells containing approximately 105 iPSC-CMs exposed to cardiotoxic drugs for 4 days. Media was diluted to obtain a proBNP concentration between 0.137 and 100 ng/mL (1:3-1:5). A standard mixture of recombinant human proBNP was diluted 1:3 from 0.137 to 100 ng/mL. Then 100 μL of diluted sample and standards were added to the human proBNP solid-phase sandwich ELISA microplate and incubated at room temperature for 2.5 hr with gentle shaking. The plate was then washed four times with 300 μL of 1× wash buffer. Next, 100 μL/well of biotin conjugate was added to a microplate and incubated at room temperature for 1 hr with gentle shaking. The plate was then washed four times with 300 μL 1× wash buffer. Next, 100 μL/well of streptavidin-horseradish peroxidase solution was added to the microplate and incubated at room temperature for 45 min with gentle shaking. The plate was then washed four times with 300 μL 1× wash buffer. Next, 100 μL/well of TMB substrate was added, and the plate was incubated in the dark at room temperature for 30 min. Then, 50 μL of stop solution was added directly to the TMB substrate and gently mixed. Absorbance was measured using a Cytation 5 microscope at 450 nm and 550 nm. For analysis, background signal was removed by subtracting the 550 nm signal from the 450 nm signal. A linear regression was fit to the standard absorbance measurements. ProBNP values from experimental samples were extrapolated from linear regression. Initial sample dilutions were accounted for when determining the final proBNP concentration.

RNA extraction and TaqMan qPCR analysis

Request a detailed protocol

Approximately 105 to 106 iPSC-CMs were lysed with TRI Reagent (Zymo Research) and frozen at −80°C to ensure complete cell lysis. Total RNA was extracted and washed from lysed cells using the Direct-zol-96 RNA Kit (Zymo Research) according to the manufacturer’s instructions. Samples were treated with DNase I for 15 min at room temperature. cDNA was reverse transcribed from 1 μg of RNA through random hexamers using the SuperScript-III kit (Invitrogen) according to the manufacturer’s instructions. Real-time qPCR reactions were performed using the TaqMan universal PCR master mix (Applied Biosystems) with the TaqMan probes listed in Supplementary file 5 (Life Technologies). RT-qPCR reactions were performed using the QuantStudio7 Flex Real-Time PCR systems (Life Technologies). Each reaction was performed in triplicate for a given RT-qPCR run, and each condition had four experimental replicates. Relative expression of the gene of interest was normalized to glyceraldehyde 3-phosphate dehydrogenase (GAPDH) as the housekeeping control using the 2–ΔΔCT method (Schmittgen and Livak, 2008).

High-throughput transcriptional analysis using RNA-seq

Request a detailed protocol

Raw RNA-seq reads were aligned with Salmon (Patro et al., 2017) (version 1.3.0) to the GENCODE (Frankish et al., 2019) version 33 reference transcript assembly (hg38 v.13) using best practice parameters to ensure mapping validity and reproducibility (--seqBias --gcBias --posBias --useVBOpt --rangeFactorizationBins 4 --validateMappings --mimicStrictBT2). Next, tximport (Soneson et al., 2015) was used to generate an expression matrix normalized to transcripts per million (TPM). To ensure consistency, we limited our analysis to genes with detectable expression in at least 90% of the samples. Protein-coding genes were determined using Ensembl release 100 human annotation (Cunningham et al., 2019) (GRCh38, Apr 2020) and extracted by biomaRt (Durinck et al., 2009) (version 2.45.9); non-protein-coding and mitochondrial genes were omitted from the analysis. After this step, expression values were re-normalized to TPM. After adding 1 as the pseudo-count, the expression matrix was log2-transformed. For the initial assessment, principal component analysis (PCA) models were generated in the R environment using the prcomp function, and the first two principal components were used for visualization. To quantify the distances between each drug’s samples and generate a PCA-based similarity matrix, we used Euclidean distance in the PCA space, as calculated by the pca2euclid function from the tcR package. We calculated the distance using all replicates and used the averaged expression in replicates for each drug. Visualizations were generated in R using the ggplot2 and ComplexHeatmap packages.

To compare the differential gene expression analysis of these clusters with the DMSO-treated group, we limited our downstream analysis to the genes in our curated list of genes with high relative expression in cardiac tissue. We also applied the following filters: (1) protein-coding genes (as defined by Ensembl) that had a median log2-transform TPM expression of >1 in at least one cluster and (2) showed a minimum of 0.25-fold change compared to the DMSO control. Transcriptomes in each cluster’s replicates were compared with DMSO-treated replicates by Welch’s t-test. A t-statistic value for each gene’s expression vector was used to rank-order the transcriptome in each cluster.

To evaluate functional perturbations in each cluster, we limited our analysis to two gene sets: those labeled with a ‘cardiac’ annotation in MSigDB (Liberzon et al., 2011) version 7.2 (127 gene sets), and the top 100 gene sets that significantly overlapped with our curated cardiac-rich expression gene set as defined by false discovery rate (FDR) q-value. The FDR q-value is the FDR analog of the hypergeometric p-value after correcting for multiple hypothesis testing according to the Benjamini and Hochberg method. In each drug cluster, Welch’s t-test statistical values were used to identify and compare most perturbed gene sets to the DMSO-treated group. A p-value < 0.05 was considered significant. We deposited our RNA-seq data on the Gene Expression Omnibus (GEO) database: GEO Submission (GSE172181), https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE172181.

Seahorse assay

Request a detailed protocol

The Agilent Seahorse XFe96 Analyzer was used to measure mitochondrial function in iPSC-CMs. The 96-well Seahorse plates were coated with Matrigel (1/100 dilution) in a phenol-free medium overnight. WTC iPSC-CMs were seeded at 30,000 cells per XFe96 well and recovered in TCM media for 1 week. Cardiotoxic compounds were diluted to 3 µM, 1 µM, 0.3 µM, and 0.1 µM in 1.0% DMSO in TCM media. Drugs were administered in fresh media for 4 days. Following 4 days of exposure to drugs, the cells were washed and incubated for 1 hr before the assay with Seahorse XF DMEM Basal Medium supplemented with 2 mM glutamine, 2 mM pyruvate, and 10 mM glucose. The Seahorse XFe96 cartridge was prepared according to the manufacturer’s guidelines. First, basal oxygen consumption rate (OCR) was measured. Next, the Mito Stress Test was performed with inhibitors injected in the following order: oligomycin (2.5 µM), carbonyl cyanide 4-(trifluoromethoxy) phenylhydrazone (FCCP; 1 µM), and rotenone and antimycin A (0.5 µM). OCR values were normalized to the total nuclear count per well as quantified by Hoechst staining. Basal respiration was calculated as (last rate measurement from the first oligomycin injection) – (minimum rate measurement after rotenone/antimycin A). Maximal respiration was calculated as (maximum rate measurement after FCCP injection) – (minimum rate measurement after rotenone/antimycin A). Spare respiratory capacity was calculated as (maximal respiration) – (basal respiration). ATP production was calculated as (last measurement before oligomycin injection) – (minimum rate after oligomycin injection).

Proteasome activity assay

Request a detailed protocol

Proteasome activity was measured in iPSC-CMs using the Proteasome-Glo Chymotrypsin-Like Assay according to the manufacturer’s instructions (Promega). Cells were incubated with the inhibitors (ranging from 2 to 5000 nM) for 1 hr before running the assay. IC50s were calculated using PRISM 8 software.

Results

iPSC-CM production, selection, and recovery after thaw

To efficiently and reproducibly assess cardiotoxicity in a highly enriched population of cardiomyocytes, we generated a blasticidin-selectable iPSC line by targeting the MYH6 locus (Figure 1—figure supplement 1A, B). Following standard differentiation of iPSC-CMs (Figure 1A), blasticidin selection reproducibly enriched a heterogenous population of iPSC-CMs to greater than 95% pure iPSC-CMs as measured by ACTN2, TNNT2, and MYBPC3 immunostaining (Figure 1B, C). Compared to previously published protocols, recovery in culture media leads to improved maturity metrics based on quantitative polymerase chain reaction (qPCR) markers, contractility (beat rate and velocity), and immunostaining (Figure 1—figure supplement 1C–F). To optimize recovery time of iPSC-CMs before screening, daily contractility metrics were performed using Pulse (Maddah et al., 2015). Then, 6–8 days after thaw, iPSC-CMs fully recovered and had stabilized contractility readings (Figure 1D).

Figure 1 with 1 supplement see all
Overview of induced pluripotent stem cell-derived cardiomyocytes (iPSC-CM) differentiation, selection, cryopreservation, and recovery for high-content screening.

(A) Schematic representation of protocol for iPSC-CM differentiation, selection, and cryopreservation for high-content screening. (B) iPSC-CMs taken from a representative differentiation stained with ACTN2, TNNT2, and MYBPC3 before and after blasticidin (Bsd) selection. (C) Representative immunostaining of an unpurified population of iPSC-CMs before and after selection with blasticidin. iPSC-CMs were seeded as a monolayer for screening. Hoechst stain represented as pseudocolored-to-white for better visualization. Scale bars = 100 μm. (D) Blasticidin-purified iPSC-CMs from a representative batch were thawed and recovered for 7 days. Daily contractility metrics were performed to identify the optimal time for recovery of iPSC-CMs after thaw. n = 15–32 technical replicates per day. Error bars = SD.

Figure 1—source data 1

Overview of iPSC-CM differentiation, selection, cryopreservation, and recovery for high-content screening.

https://cdn.elifesciences.org/articles/68714/elife-68714-fig1-data1-v2.xlsx

Known cardiotoxic compounds used to establish the extent of functional damage to iPSC-CMs

To establish training sets for deep learning models, we treated iPSC-CMs with bortezomib (proteasome inhibitor), doxorubicin (topoisomerase inhibitor), cisapride (serotonin 5-HT4 receptor agonist), sorafenib (tyrosine kinase inhibitor), givinostat (histone deacetylase inhibitor), bafilomycin (vacuolar-type H+-ATPase), paclitaxel (microtubule stabilizer), and JQ1 (bromodomain and extraterminal domain inhibitor) (Lamoureux et al., 2014). These compounds disrupt a diverse set of cellular pathways, so we included them to ensure that our deep learning models can identify a broad range of cardiotoxic compounds in an HCS.

To determine the level of drug-induced toxicity, we measured beat rate, velocity of contraction, and displacement as a function of dose and duration of exposure (Figure 2—figure supplement 1). Results from three representative cardiotoxins (bortezomib, doxorubicin, and bafilomycin) are displayed as a heatmap (Figure 2A). For each compound, beats per minute and velocity measured on the fourth day of drug exposure are displayed as a function of drug dose (Figure 2B, C). We observed that the cardiotoxins had both a dose- and time-dependent effect on contractility measurements. In most cases, the contraction velocity and displacement declined, and the beat rate reduced as a function of dose and time (Figure 2—figure supplement 1). In some instances (e.g., doxorubicin), immediately before spontaneous contractility completely stopped, we observed an increased beat rate and reduced contraction velocity and displacement. This change may reflect a compensatory mechanism by which the cells increase beat rate to induce more ‘output’ while the velocity and displacement are reduced (Burridge et al., 2016; Maddah et al., 2015).

Figure 2 with 1 supplement see all
Effect of representative cardiotoxins on inducing structural toxicity in induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs).

(A) Displacement of iPSC-CMs was measured as a function of dose and duration of exposure to drug. Results from three representative cardiotoxins (bortezomib, doxorubicin, and bafilomycin) are displayed as a heatmap. (B, C) Contractility measures of beats per minute and velocity on the fourth day of drug exposure as a function of drug dose. Data indicate a time- and dose-dependent decline in contractility. Error bars = SD; n = 8 technical replicates per drug dose, n = 26 for dimethyl sulfoxide (DMSO) group. (D) Signal intensities of sarcomeres stained with antibodies against MYBPC3 and ACTN2 were measured using total fluorescent staining and normalized to DMSO control. Nuclear count was quantified and normalized to DMSO control. Dashed gray line, DMSO control condition. Error bars = SD; n = 8 technical replicates per drug dose, n = 26 for DMSO group. (E) Relative sarcomere organization quantified using scanning gradient Fourier transform and normalized to DMSO control. The organizational index refers to the average Fourier strength measured across the image. Error bars = SD; n = 8 technical replicates per drug dose, n = 26 for DMSO group. (F) Levels of structural toxicity for each drug were binned into the three categories of highly toxic (class 3), toxic (class 2), and mildly toxic (class 1). (G) Representative immunostaining of levels of structural toxicity in iPSC-CMs treated with bortezomib (30 nM), doxorubicin (100 nM), and bafilomycin (100 nM). Magenta, MYBPC3; green, ACTN2. Scale bars = 100 μm.

Figure 2—source data 1

Effect of representative cardiotoxins on inducing structural toxicity in iPSC-CMs.

https://cdn.elifesciences.org/articles/68714/elife-68714-fig2-data1-v2.xlsx

Next, we assessed sarcomere organization and cell survival. Sarcomere staining intensity was measured using antibodies against MYBPC3 and ACTN2, and cell survival was measured using Hoechst nuclear stain (Figure 2D). To assess the structural integrity of the sarcomere and myofibril, which are negatively impacted by cardiotoxin-induced damage (Burridge et al., 2016; Judge et al., 2017; Maillet et al., 2016), sarcomere organization and alignment were also analyzed with the SGFT. This method can quantify subcellular myofibril alignment through one-dimensional fast Fourier transforms following sarcomere mapping (Salick et al., 2020; Figure 2E). Based on contractility metrics, sarcomeric staining, and nuclear count, we define three levels of structural toxicity binned into the three categories: highly toxic (class 3), toxic (class 2), and mildly toxic (class 1). DMSO-treated cells were designated as non-toxic (class 0) (Figure 2F and Supplementary file 1). Representative immunostaining of iPSC-CMs stained with MYBPC3 and ACTN2 show various levels of structural toxicity (Figure 2G).

Optimizing the deep learning model

Frozen iPSC-CMs were thawed and allowed to recover in 384-well plates to enable high-content screening. These plates were divided into three categories: a training plate, a validation plate, and library plates. The cells were then immunostained using MYBPC3 before capturing images. Images from the training plate (dosed with known cardiotoxins ranging from 10 nM to 10 μM for 4 days) were used to establish the deep learning models. Images from the validation plate were used to test the accuracy of the deep learning models. And images from the library plates were used to test the toxicity level of compounds. iPSC-CMs in the library plates were exposed to 1280 bioactive compounds at three doses (0.3 μM, 1.0 μM, and 3.0 μM for 4 days) (Figure 3A).

Figure 3 with 1 supplement see all
Schematic of drug screening and deep learning approach.

(A) Induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) were thawed and allowed to recover for 6 days. Two plates were designated for training and validating the deep learning models. iPSC-CMs were then treated with known cardiotoxins (doses ranging from 10 nM to 10 μM) in the training and validation plates for 4 days. In parallel, a library of 1280 bioactive compounds was added to iPSC-CMs at three doses (0.3 μM, 1.0 μM, and 3.0 μM) for 4 days. (B) Deep learning accuracies compared across the three models. All models show more than 95% accuracy in identifying the non-toxic dimethyl sulfoxide (DMSO)-treated condition (class 0). Model accuracies were lower when attempting to distinguish mildly toxic, toxic, and highly toxic classes. (C) The three deep learning models compared in a two-dimensional plot on the validation plate. Regardless of how each model was trained (based on the defined images fed into the neural network), all models were strongly correlated when analyzing the validation dataset (R2 > 0.93). The DMSO-treated cells scored the lowest in toxicity, and doxorubicin-treated (1 μM) cells scored the highest. The mildly toxic class (doxorubicin at 0.3 μM) showed an intermediate toxicity separated from the non-toxic and highly toxic classes.

Figure 3—source data 1

Schematic of drug screening and deep learning approach.

https://cdn.elifesciences.org/articles/68714/elife-68714-fig3-data1-v2.xlsx

Based on defined criteria for reduced contractility, loss of immunostaining, and nuclear count when cells were treated with various doses of cardiotoxins (Figure 2F and Supplementary file 1), we developed three deep learning models: 4-class, 3-class, and 2-class. The 4-class model distinguished highly toxic, toxic, and mildly toxic compound doses from the non-toxic DMSO-treated condition. The 3-class model combined highly toxic and toxic compound doses and kept mildly toxic compound doses separate. The 2-class model binned highly toxic and toxic compound doses from the non-toxic DMSO-treated condition. The total number of images used per class is outlined in Supplementary file 2.

We compared the deep learning accuracies across the three models. All models showed more than 95% accuracy in identifying the non-toxic DMSO-treated condition (class 0). For the 2-class model, the validation showed 100% accuracy in distinguishing the toxic from non-toxic conditions. For the 3- and 4-class models, the accuracies were lower when attempting to distinguish mildly toxic, toxic, and highly toxic classes (Figure 3B). This reduced accuracy may be a result of cellular toxicity signatures that follow a continuous spectrum, rather than clearly defined categories.

After training, the three deep learning models were independently used to score the validation plate (which the neural network had not seen before). The results from the three models on the validation plate are compared in a 2D plot in Figure 3C. The data suggested that all three models showed a strong correlation upon validation (R2 > 0.93, p<0.0001), regardless of how each deep learning model was trained (Figure 3C). As expected, the DMSO-treated cells scored the lowest on the cardiotoxicity scale. Cells treated with 1 µM doxorubicin showed the highest toxicity score, and cells treated with 0.3 µM doxorubicin (the mildly toxic class) showed an intermediate toxicity score separated from the non-toxic and highly toxic class.

Higher cell seeding densities lead to a higher model accuracy and screening window

We hypothesized that there may be an optimal cell density for screening that leads to best model performance and Z-factor. The Z-factor is a statistical measure of assay variability and reproducibility (Zhang et al., 1999). We posited that using a cell density that was too high would tightly pack cells and mask features, whereas using a cell density that was too low would not generate sufficient features for training. Interestingly, as we increased the cell density, we did not observe a bell-shaped effect in the performance of the deep learning models. We found that the higher the cell density, the higher the model accuracy and, most importantly, the higher the Z-factor. At approximately 3000 cells per well of a 384-well plate, the model accuracy and Z-factor plateaued and did not decline with increasing cell density (Figure 3—figure supplement 1A, B).

We also found that a minimum cell density is needed to ensure an appropriate screening window. For example, a minimum number of 1500 iPSC-CMs per well of a 384-well plate was required to reach a Z-factor of approximately 0.5, which is an acceptable level for an HCS in cell-based assays (Figure 3—figure supplement 1B). Training and validation accuracies were tracked as a function of epochs, or the number of times the entire training dataset passed through the neural network. Two representative cell seeding densities were analyzed, which showed that a higher cell density supports improved model performance (Figure 3—figure supplement 1C). Higher cell densities yielded higher cellular content for feature training, and they reduced cell edges and background. In addition, more cell-cell contact between iPSC-CMs led to sarcomere alignment that may reduce heterogeneity in training images. Given these factors, higher cell densities achieved a Z-factor greater than 0.5, which is ideal for HCS (Figure 3—figure supplement 1D).

Deep learning enables detection of toxic compounds in an HCS format

We calculated the dynamic range and Z-factors on the screening plates treated with DMSO (non-toxic control) as well as bortezomib and doxorubicin (cardiotoxic controls) (Figure 4A). With enough technical replicates, we saw significant differences in the nuclear count and sarcomere staining intensities (Figure 4A). However, when performing an HCS, only one or two technical replicates per test article are used. Thus, using parameters such as nuclear count and staining intensity will not be reliable for identifying hits in high-throughput screening (HTS) (Figure 4—figure supplement 1A, B). In this case, the associated Z-factors are far lower than a minimum de facto cutoff of 0.5 for HTS (Bray and Carpenter, 2017; Zhang et al., 1999). By using deep learning and the appropriate cell seeding density, we can increase the dynamic range and reduce the variability between treatment groups. These changes would increase the screening window coefficient (Z-factor > 0.5) for an HCS (Figure 4B).

Figure 4 with 1 supplement see all
Dynamic range and Z-factors of positive and negative controls in screening plates.

(A) Comparison of Z-factor of nuclear count and MYBPC3 and ACTN2 staining intensities for dimethyl sulfoxide (DMSO) vs. bortezomib (Bort; 0.1 μM) and doxorubicin (Doxo; 1 μM). The Z-factors (< 0) and the limited dynamic range of the nuclear count and sarcomere intensities prevent reliable separation of positive and negative controls. Error bars = SD. (B) While using deep learning (regardless of how the models are trained), the dynamic range and Z-factors (> 0.5) enable identification of the toxic controls from the DMSO condition. Error bars = SD. (C) Cardiotoxicity scores from all screening wells at three doses of DMSO, bortezomib, and doxorubicin controls are compared in various deep learning models. Regardless of how each deep learning model was trained, cardiotoxicity scores from all three models had strong correlation when applied to the screening plates (R2 > 0.95).

Figure 4—source data 1

Dynamic range and Z-factors of positive and negative controls in screening plates.

https://cdn.elifesciences.org/articles/68714/elife-68714-fig4-data1-v2.xlsx

We compared the cardiotoxicity scores from all screening wells at three doses, including the DMSO, bortezomib, and doxorubicin controls. Regardless of how each deep learning model was trained, all three models showed strong correlation (R2 > 0.95, p<0.0001) when applied to the screening plates (Figure 4C). All three models also accurately identified the non-toxic DMSO condition and enabled detection of the toxic controls (0.1 μM bortezomib and 1 μM doxorubicin). This result points to the robustness of hit detection and power of deep learning for analyzing large datasets. Given that the 4-class model was trained to separate the toxic and highly toxic classes, this model enables separation between the conditions treated with 0.1 μM bortezomib and 1 μM doxorubicin. To maximize assay sensitivity, it is more valuable to identify mildly toxic compounds than to distinguish highly toxic compounds from toxic compounds (given that highly toxic and toxic compounds show similarly strong phenotypes). To avoid overlooking a mildly toxic compound during the HCS, we proceeded to use the 3-class model to identify and further validate hits.

Screening using deep learning identifies compounds with cardiotoxic liabilities

We analyzed cardiotoxicity scores by screening 1280 bioactive compounds at three doses using our three deep learning models (Figure 5—figure supplement 1A, B). As mentioned, we focused on the 3-class model for further studies and validation rounds (Figure 5A). The cardiotoxicity scores from the top 25 hits are displayed as a heatmap from a scale of 0 to 2, such that the most cardiotoxic drugs have scores nearing 2 (red) and the non-toxic class have scores of nearly 0 (blue). For comparison, MYBPC3 and ACTN2 staining intensity and nuclear count are also displayed on a scale of 0 to 1. Most toxic compounds received a score of 1 (red), and least toxic compounds received a score of 0 (blue) (Figure 5B). Details about the identified cardiotoxic compounds are listed in Supplementary file 3.

Figure 5 with 2 supplements see all
Screen of bioactive compound library and identified cardiotoxic hits.

(A) Cardiotoxicity score from screening 1280 bioactive compounds at three doses. Data from the 3-class deep learning model are plotted. Higher scores correspond to a higher probability of a cardiotoxicity signature. Bortezomib and doxorubicin were used as cardiotoxic controls (Cntrls). (B) Cardiotoxicity heatmap scores of top 25 cardiotoxic hits based on deep learning signal. The immunostaining signal is also displayed as a heatmap. Drugs and doses with cardiotoxicity signals in induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) are indicated in red and yellow, while non-toxic drugs are indicated in blue. Signal intensity from immunostaining and nuclear count was normalized to the dimethyl sulfoxide (DMSO) control and converted to a toxicity scale. Based on the generated heatmaps, phenotypic screening with deep learning is more sensitive in detecting signals than pure immunostaining assays. (C) Target class composition of top cardiotoxic hits from the screen, including DNA intercalators and ion channel blockers, as well as epidermal growth factor receptor (EGFR), cyclin-dependent kinase (CDK), and multi-kinase inhibitors. (D) Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) protein-protein interaction network was used to identify interactions between drug targets (nodes) identified through the PRISM repurposing dataset. Seven clusters in four protein families of DNA (green), multi-kinase (light blue), ion channels (dark blue), and CDK (red) were found based on the highest number of interactions. The minimum required interaction score was set to 0.4, and the edge thickness indicated the degree of data support.

Figure 5—source data 1

Screen of bioactive compound library and identified cardiotoxic hits.

https://cdn.elifesciences.org/articles/68714/elife-68714-fig5-data1-v2.xlsx

A number of identified hits clustered based on mechanism of action and molecular pathways. The top hits showing a cardiotoxicity signal in iPSC-CMs included epidermal growth factor receptor (EGFR) inhibitors (WZ8040, AG-1478), cyclin-dependent kinase 1 (CDK1) inhibitors (BI-2536, PHA-767491), DNA intercalators and synthesis inhibitors (Adrucil, daunorubicin, streptozotocin), ion channel blockers (chlorpromazine, nitrendipine), and other multi-kinase inhibitors (regorafenib). The characterization of hits into target classes is summarized in Figure 5C. We classified the protein targets of our top 25 cardiotoxic hits with a PRISM repurposing dataset. Next, we built a protein-protein interaction network using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING). In the resulting network, we identified seven clusters of protein families, including DNA interactors, ion channel blockers, multi-kinase inhibitors, and cyclin-dependent kinase (CDK) inhibitors (Figure 5D).

The top compounds demonstrating a cardiotoxicity signal in iPSC-CMs included several FDA-approved drugs that clinical and pre-clinical studies have linked to adverse cardiovascular events (Supplementary file 3). For example, Adrucil (DNA synthesis inhibitor) had a side effect of cardiac toxicity (RxList, 2020). Betamethasome (oral glucocorticoid) was identified as a risk factor for heart failure (Souverein, 2004). Regorafenib (sorafenib analog, a multi-kinase inhibitor) was linked to increased risk of cardiovascular events in patients with solid tumors (Chen and Wang, 2018). Chlorpromazine (dopamine and potassium channel inhibitor) was linked to fast and irregular heart rate (WebMD, 2020). Anagrelide (PDE3 inhibitor) may cause cardiovascular effects, including fast, irregular, pounding, or racing heartbeat or pulse (Clinic M, 2020). Sotalol (beta-blocker and anti-arrhythmic) has serious cardiac side effects, including QT prolongation, heart failure, or bronchospasm (FDA.gov, 2020). Solifenacin (muscarinic receptor antagonist) overdose may cause fast heartbeat (Plus M, 2020). Daunorubicin (doxorubicin analog, topoisomerase II inhibitor, and DNA intercalator) causes cardiotoxicity (Menna et al., 2012; Sawyer et al., 2010). Rosiglitazone (PPAR-γ agonist used to treat patients with type 2 diabetes) was associated with a significant increase in the risk of myocardial infarction (Nissen and Wolski, 2007). These reports support the value of screening with deep learning and iPSC-CMs to detect early signs of cardiotoxicity.

Validation studies using deep learning and orthogonal assays

We further evaluated a subset of hits that showed cardiotoxic liabilities using deep learning analysis but did not show a strong toxic liability using immunostaining image analysis (Figure 5—figure supplement 2). This subset included six compounds that showed a cardiotoxic signal in iPSC-CMs: three pre-clinical compounds (WZ8040 [EGFR inhibitor], BI-2536 [CDK inhibitor], and PHA-767491 [CDK inhibitor]) and three FDA-approved compounds (danorubicin [DNA intercalator], nitrendipine [Ca2+-channel blocker], and solifenacin [muscarinic receptor antagonist]). As a positive control, we also included tegaserod (5-HT4 agonist), which was withdrawn from the market in 2007 due to adverse cardiovascular effects (MayoClinic, 2020). With these seven compounds, we performed a secondary round of validation using deep learning and additional orthogonal assays. This validation showed that all seven compounds had cardiotoxic liability (Figure 6A and Figure 6—figure supplement 1A).

Figure 6 with 1 supplement see all
Hit validation using deep learning and orthogonal assay analysis.

(A) Cardiotoxicity heatmap scores for validation of seven compounds: six cardiotoxic hits from the primary screen and tegaserod (known to cause cardiotoxicity and withdrawn from the market). (B) Kinetic plots of oxygen consumption rates for the seven compounds and bortezomib as another control. Kinetic data from only the 1 μM dose is plotted. Error bars = SD. (C) Heatmaps for basal respiration, adenosine triphosphate (ATP) production, maximal respiration, and spare respiratory capacity. BI-2535, daunorubicin, nitrendipine, and bortezomib had the largest effect on basal respiration, ATP production, and maximal respiration. The oxygen consumption rate (OCR) was measured as pmol/min/nuclear count. (D) All seven compounds show significantly different basal respiration and ATP production (one-way ANOVA). All drugs, except tegaserod, show significantly different maximal respiration. SRC: spare respiratory capacity. (E) Heatmaps for ProBNP levels show that PHA-767491 elevates ProBNP levels at all three doses. All other drugs showed no change or a slight decrease in ProBNP levels, suggesting that ProBNP is the least-sensitive marker to assess cardiotoxicity and cellular stress.

Figure 6—source data 1

Hit validation using deep learning and orthogonal assay analysis.

https://cdn.elifesciences.org/articles/68714/elife-68714-fig6-data1-v2.xlsx

To further assess those seven validated compounds, we treated iPSC-CMs with the compounds and measured their mitochondrial respiration 4 days later using the Seahorse XFe96 Analyzer. Based on real-time OCR, all seven compounds dose-dependently reduced basal respiration in iPSC-CMs (Figure 6B and Figure 6—figure supplement 1B), suggesting decreased respiratory function compared to the DMSO control. To indirectly measure OCR, we treated iPSC-CMs with the compounds and oligomycin (ATP synthase [complex V] inhibitor) and then measured their capacity to produce adenosine triphosphate (ATP). Similarly, we found a dose-dependent decrease in the capacity of iPSC-CMs to produce ATP. Finally, we measured whether these compounds affect the spare respiratory capacity (maximum OCR subtracted by basal OCR) in iPSC-CMs. The spare respiratory capacity represents the cell’s ability to respond to an energetic stress. When compared to DMSO control, all compounds resulted in a dose-dependent decrease in maximal respiration and spare respiratory capacity (Figure 6C, D and Figure 6—figure supplement 1C). These data suggest that all seven compounds dose-dependently impair mitochondrial bioenergetics. Thus, by analyzing mitochondrial respiration, we validated the cardiotoxic compounds that were identified by deep learning but not by immunostaining image analysis.

ProBNP levels are not a reliable marker for detecting cardiotoxicity in iPSC-CMs

BNP is natriuretic peptide and hormone secreted from cardiac tissue. During heart failure, BNP secretions are elevated (Bay et al., 2003; Doust et al., 2006; Januzzi et al., 2005). BNP is also expressed in iPSC-CMs, which secrete the hormone into their culture media.

We used a ProBNP ELISA kit as an orthogonal, follow-up assay on media collected from iPSC-CMs treated with the seven validated cardiotoxic compounds and evaluated the levels of BNP. We found an inconsistent pattern of ProBNP levels and cardiotoxicity signal in the media of treated cells. Only PHA-767491 (at all three doses) elevated ProBNP. In contrast, WZ8040, daunorubicin, nitrendipine, and solifenacin dose-dependently reduced ProBNP (Figure 6E). This reduction may be due to poor health of iPSC-CMs after treatment with the cardiotoxic compounds. We found that BI-2536 did not alter the levels of ProBNP and that tegaserod only reduced BNP at the highest dose (3 μM) (Figure 6E).

RNA-seq analysis confirms drug-specific changes in key pathways that regulate cardiac muscle contraction, development, and identity

To delineate perturbed transcriptional profiles in iPSC-CMs treated with drugs, we performed RNA sequencing on samples treated with our previously defined nine cardiotoxic compounds (two replicates/drug) and DMSO-treated iPSC-CMs (six replicates). Using PCA, four drugs (bortezomib, daunorubicin, doxorubicin, and BI-2536) fell into three distinct clusters, whereas the other five drugs (nitrendipine, PHA-767491, solifenacin, tegaserod, and WZ8040) clustered more closely with DMSO-treated controls (Figure 7A). Based on hierarchical clustering on each sample’s projected location in the PCA space with 23 components, we classified the transcriptomic profiles into four clusters (n = 24; cluster 1: bortezomib; cluster 2: doxorubicin and daunorubicin; cluster 3: BI-2536; cluster 4: nitrendipine, PHA-767491, solifenacin, tegaserod, and WZ8040). We generated the heatmap in Figure 7B by selecting the top overexpressed and underexpressed genes in each cluster that intersected with a curated set of 310 genes with enriched expression in cardiac tissue. We did not see any major difference in the clustering structure between these two methods. For simplicity, we show the averaged PCA-based similarity matrix for each drug cluster in Figure 7B.

RNA-seq analysis of induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) treated with candidate cardiotoxic drugs reveals distinct clusters associated with altered gene expression.

(A) Principal component analysis (PCA) scatter plot representing the extent of transcriptional perturbation compared with dimethyl sulfoxide (DMSO)-treated iPSC-CMs. Overlapping replicates for each drug indicate very high precision of the experiment. Daunorubicin and doxorubicin on principal component 1 (PC1) and bortezomib on PC2 show the most different transcriptional profile compared to the DMSO-treated group. Note that daunorubicin and doxorubicin (both DNA intercalators) cluster together. Clustering of other drugs with the control group indicates relatively modest changes when compared with bortezomib, daunorubicin, and doxorubicin. (B) PCA-based hierarchical clustering of similarity matrix shows four distinct clusters of gene expression pattern: bortezomib (cluster 1); daunorubicin and doxorubicin (cluster 2); BI-2536 (cluster 3); and nitrendipine, PHA-767491, solifenacin, tegaserod, and WZ8040 (cluster 4). Cluster 4 shows a gene expression profile closer to the DMSO control group. (C) Heatmap of the most differentially abundant cardiac tissue-enriched genes in each drug cluster versus the DMSO-treated group. Each drug cluster shows up to five most differentially expressed genes (false discovery rate [FDR] < 0.05), showing the presence of four distinct cluster-associated gene groups. (D) Heatmap of the most differentially enriched pathways in each drug cluster. Unbiased single-sample gene set enrichment analysis (ssGSEA) was performed on curated gene lists representing the gene sets most enriched or depleted in each drug cluster as compared with the DMSO-treated group and identified by Welch’s t-test.

Bortezomib-treated cardiomyocytes showed significant downregulation of major genes encoding structural proteins associated with different forms of familial cardiomyopathies (Haas et al., 2015). For example, TCAP (regulates sarcomere assembly and titin assembly; implicated in familial hypertrophic cardiomyopathy), MYL3 (myosin light chain 3; implicated in left ventricular hypertrophic cardiomyopathy and restrictive cardiomyopathy), LDB3, and DES were highly downregulated (Figure 7C). Doxorubicin- and daunorubicin-treated clusters were identified by substantial downregulation of ANKRD1 (associated with dilated cardiomyopathy), LMNA (Lamin A/C; known mutations result in cardiomyopathy), as well as NKX2-5,and GATA4 (two transcription factors essential for cardiac development and survival).

BI-2536-related cardiotoxicity in cluster 3 was associated with reduced expression of AGT (angiotensinogen; a critical component of the renin-angiotensin system), PRKCH (protein kinase C, eta; a calcium-dependent serine/threonine protein kinase), and NFATC4 (a cardiac transcription factor required for oxidative phosphorylation activity as well as cardiomyocyte proliferation and differentiation) (Bushdid et al., 2003). The fourth cluster with the remaining five drugs (nitrendipine, PHA-767491, solifenacin, tegaserod, and WZ8040) presented more subtle transcriptional changes, including reduced expression of KCNE2 (potassium voltage-gated channel subfamily E regulatory subunit 2; associated with arrhythmic abnormalities) and HSPB6 (a molecular chaperone with greater expression in response to stress or tissue damage) (Li et al., 2017; Figure 7C).

To evaluate functional perturbations in each cluster, we performed single-sample gene set enrichment analysis (ssGSEA) (Subramanian et al., 2005) and normalized enrichment score for each sample. Normalized enrichment scores were scaled and centered on the average scores of the DMSO group (Figure 7D).

Deep learning detects structural frameworks that cause cardiotoxicity in iPSC-CMs

To identify chemical structure frameworks that lead to cardiotoxicity in iPSC-CMs, we used our deep learning approach to assess a library of 1280 diverse compounds with no known targets. We used the same conditions as the bioactive library screen. However, to optimize the accuracy and applicability of the diverse library screen, we established a new set of training images and developed a new deep learning model that incorporated all generated datasets. From the 1280 diverse compounds, our deep learning approach identified 33 hits to cause cardiotoxicity in iPSC-CMs (Figure 8A, B). Analysis of the structure-activity relationship revealed three structural frameworks with two compounds (also classified as matched pairs) in each framework set (Figure 8C). For example, Framework 2a (cardiotoxicity score: 1.41) and its matched pair Framework 2b (cardiotoxicity score: 0.69) showed that adding an ethylene spacer increased the cardiomyocyte toxicity score by twofold (Figure 8C). Based on this result, high-content phenotypic analysis using deep learning is a powerful approach to identifying chemical frameworks with cardiotoxic liabilities. By using this approach during the lead optimization process, we can de-risk a clinical development program and potentially reduce drug attrition in late-stage clinical trials.

Screen of diverse compound library and identified cardiotoxic chemical frameworks.

(A) Cardiotoxicity scores of 1280 diverse compounds screened at 1 µM. Sorted data from the 3-class deep learning model are plotted. Higher scores correspond to a higher probability of a cardiotoxicity signature. Bortezomib and doxorubicin were used as cardiotoxic controls (Cntrls). (B) Cardiotoxicity heatmap scores of top 33 cardiotoxic hits based on deep learning signal. The immunostaining signal is also displayed as a heatmap. Drugs and doses with cardiotoxicity signals in induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) are indicated in red and yellow, whereas non-toxic drugs are indicated in blue. Signal intensity from immunostaining and nuclear count was normalized to the dimethyl sulfoxide (DMSO) control and converted to a toxicity scale. (C) Three structural frameworks with matched pairs were identified to lead to cardiotoxicity in iPSC-CMs: Framework 1 (1a and 1b, tetrahydronaphthalene-2-sulfonamide core), Framework 2 (2a and 2b, 4-benzylpiperidine core), and Framework 3 (3a and 3b, 3-azabicycloheptane core). (D) Heatmap of chymotrypsin-like biochemical activity of 10 proteasome inhibitors profiled in iPSC-CMs. (E) Cardiotoxicity scores of 10 proteasome inhibitors plotted as a function of chymotrypsin-like biochemical IC50. Six of the proteasome inhibitors showed a similar range of biochemical potencies (dashed box). Delanzomib showed the lowest cardiotoxicity score; carfilzomib, bortezomib, and epoxomicin showed the highest cardiotoxicity scores.

Figure 8—source data 1

Screen of diverse compound library and identified cardiotoxic chemical frameworks.

https://cdn.elifesciences.org/articles/68714/elife-68714-fig8-data1-v2.xlsx

Deep learning identifies proteasome inhibitors with reduced cardiotoxicity in iPSC-CMs

Bortezomib and carfilzomib are FDA-approved proteasome inhibitors used for oncology indications, including multiple myeloma and mantle cell lymphoma (Dimopoulos et al., 2016; Richardson et al., 2003). Patients given these drugs can develop cardiomyopathy, worsening heart rhythm, heart failure, and death due to cardiac arrest (Moslehi, 2016). To identify proteasome inhibitors that may reduce cardiotoxic liability, we profiled the potency of 10 proteasome inhibitors using the Proteasome-Glo biochemical assay (Figure 8D). For each inhibitor, we calculated the biochemical IC50 and measured the cardiotoxicity score using deep learning. We identified six inhibitors with a similar range of biochemical potencies (~12 to ~25 nM): carfilzomib, bortezomib, epoxomicin, delanzomib, ixazomib, and oprozomib. Carfilzomib, bortezomib, and epoxomicin had the highest cardiotoxicity score; delanzomib showed the lowest cardiotoxicity liability; and ixazomib and oprozomib showed medium levels of cardiotoxicity (Figure 8E). This analysis is an example of how this type of phenotypic screening would help to de-risk a small-molecule program at an early stage. By focusing on a chemical series (or pathway) that shows high levels of potency for the intended target, we can identify compounds in that series that limit undesired toxicity in a specific cell type with a major liability (such as cardiomyocytes and hepatocytes).

Discussion

In this study, we combined the power of human iPSC technology with high-content image analysis using deep learning to detect signatures of cardiotoxicity in a high-throughput chemical screen. First, we treated iPSC-CMs with drugs known to cause cardiotoxicity in the clinic. We combined the contractility readout, immunostaining, and nuclear count to stratify doses of compounds into defined categories: mildly toxic, toxic, and highly toxic. Next, we trained a neural network with fluorescently labeled images to classify the images by degrees of cardiotoxicity with a data-driven method. We then constructed three deep learning models to understand how the categories would differ based on how the toxicities were classified with imaging. We found that all models strongly agreed in identifying hits that cause cardiotoxic liabilities in iPSC-CMs.

Our screening identified classes of compounds that clustered into distinct mechanisms of action and putative target classes. Some of the compounds with the most likely cardiotoxic liabilities included DNA intercalators and ion channel blockers, as well as EGFR, CDK, and multi-kinase inhibitors. We validated these hits using a second round of deep learning and orthogonal assays that measured mitochondrial respiration and evaluated BNP as a marker of stress.

Cardiomyocytes in the adult human heart contain approximately 30% mitochondria by cell volume. These mitochondria critically regulate many cellular processes, such as metabolism, oxidative stress, cell survival, and apoptotic death (Ventura-Clapier et al., 2011). The adult human myocardium is highly metabolically active. Approximately 70–80% of its energy derives from oxidative phosphorylation in mitochondria fueled by fatty acid–based oxidative phosphorylation. Conversely, in immature neonatal myocardium or iPSC-CMs, ATP production is predominately through the glycolytic pathway (Sacchetto et al., 2019; Siasos et al., 2018). In previous studies, exposing iPSC-CMs to various clinically approved compounds with known cardiotoxicities led to a dose-dependent decrease in mitochondrial function in these cells (Burridge et al., 2016; Rana et al., 2012). Using mitochondrial bioenergetics in iPSC-CMs, we showed that cardiotoxic hits identified from our HCS negatively affected mitochondrial bioenergetics in iPSC-CMs, including reduced basal respiration, ATP production, and maximal respiration rate.

Using RNA-seq analysis, we confirmed that iPSC-CMs treated with drugs identified using our phenotypic cardiotoxicity screening approach resulted in downregulation of key structural and developmental genes, including TCAP, MYL3, LDB3, DES, ANKRD1, NKX2-5, and GATA4. In addition, ssGSEA showed downregulation of pathways involved in cardiac muscle contraction, development, and differentiation, further validating our phenotypic screening approach to identifying cardiotoxic drugs.

Use of iPSC-derived models for cardiotoxicity (Sharma et al., 2017; Sirenko et al., 2017) and hepatotoxicity screening has been proposed (Mann, 2015). Phenotypic assays with generic readouts, such as cell viability, are typically less sensitive in identifying subtle toxicity signatures and less likely to enable hit triaging at an early stage. Other toxicity assays in iPSC-CMs, such as electrophysiology and multi-electrode arrays (Takeda et al., 2018), are not scalable for HTS. Instead, high-content imaging, such as used in this study, enables researchers to use more complex assays for HTS lead discovery.

Essential components are needed to build a useful deep learning neural network. For example, constructing a relatively large and annotated dataset can strengthen the model. In general, the larger the dataset, the better the performance of the model. In addition, the quality of the dataset determines the success of the deep learning model. For instance, a more homogenous set of images in well-controlled studies enables better model performance. Conversely, heterogeneity in the cells leads to less accurate and reliable models. Furthermore, the experimenter should ensure that the classes are balanced in the training dataset. Without this balance, the training dataset will display bias toward the most represented class, and in extreme cases, may completely ignore the minority class (Johnson and Khoshgoftaar, 2019). In this study, we trained the model by feeding the neural network between 250 and 500 images of pure iPSC-CMs at high cell densities from each class. We also used eight drugs and hundreds of images from each drug to establish our training dataset. This approach ensured that we achieved a relatively large and balanced annotated dataset, while also reducing cellular heterogeneity. In future work in which researchers may aim to identify a cardiotoxicity score with deep learning using in vitro models, we recommend that they establish a deep learning model based on a large training dataset using known cardiotoxins based on clinical data. Then images from test articles can be analyzed through the neural net to compare the level of toxicity to reference cardiotoxic drugs.

Widespread use of deep learning methods in the biological sciences can be challenging. This challenge may be attributed to technical and cultural norms (Moen et al., 2019). For example, biological data is inherently complex, heterogeneous, and variable. Thus, a certain threshold of data and computational resources is needed for data annotation to build accurate and useful deep learning models. Another concern has been the ‘black box’ nature of the method, or not knowing exactly what features the deep learning net is identifying (Moen et al., 2019). We propose that the black box is ‘a feature and not a bug’ of deep learning. We believe the black box allows researchers to interrogate a large set of perturbations in a cell-agnostic, unbiased, and high-throughput manner. Another advantage of deep learning is that deep neural networks significantly outperformed random forest by 22–27% in four measured metrics (Idakwo et al., 2019), suggesting that this approach is more robust than traditional chemoinformatics analysis. Although we did not compare deep learning and chemoinformatics in this study, this comparison would be a valuable addition to future research.

In this study, we showcase the power of deep learning in uncovering new biological insights and how the technology can be more accessible to biologists. Deep learning can be used as an additional tool to interrogate subtle and difficult-to-define phenotypes. Our findings support that deep learning is a highly sensitive method to detect cardiotoxicity phenotypes that are undetectable by traditional single-readout assays. While the scope of our study was limited to identifying cardiotoxicity signatures in iPSC-CMs, the method can be applied to identifying toxicity in other cell types.

Screening using iPSC-derived cells with deep learning can also be applied to de-risk early-stage drug discovery and ensure that compounds with on- and off-target toxicity are triaged at an early stage in drug development. In particular, deep learning can be used to identify targets with high-throughput perturbation screening (using small molecules, siRNA, and CRISPR libraries). Our screening strategy is cell and modality agnostic, and it can be applied to a range of indications, such as orphan and neglected diseases. Thus, combining iPSC technology and deep learning can accelerate discovery efforts by supporting rapid model development and interrogation of targets that could be used in different areas of research and for multiple indications.

Data availability

Our RNA-Seq data has been deposited on the Gene Expression Omnibus (GEO) database, GEO Submission GSE172181.

The following data sets were generated
    1. Grafton F
    2. Ho J
    3. Ranjbarvaziri S
    4. Farshidfar F
    5. Budan A
    6. Steltzer S
    7. Maddah M
    8. Loewke KE
    9. Green K
    10. Patel S
    11. Hoey T
    12. Mandegar MA
    (2021) NCBI Gene Expression Omnibus
    ID GSE172181. Deep learning detects cardiotoxicity in a high-content screen with induced pluripotent stem cell-derived cardiomyocytes.

References

  1. Book
    1. Bray M
    2. Carpenter A
    (2017)
    Advanced Assay Development Guidelines for Image-Based High Content Screening and Analysis - Assay Guidance Manual
    Bethesda, MA: NCBI Bookshelf.
    1. Doust J
    2. Lehman R
    3. Glasziou P
    (2006)
    The role of BNP testing in heart failure
    American Family Physician 74:1893–1898.
  2. Website
    1. Plus M
    (2020) Solifenacin
    Accessed November 28, 2020.
  3. Website
    1. RxList
    (2020) Adrucil
    Accessed November 28, 2020.
    1. Ventura-Clapier R
    2. Garnier A
    3. Veksler V
    4. Joubert F
    (2011) Bioenergetics of the failing heart
    Biochimica Et Biophysica Acta (BBA) - Molecular Cell Research 1813:1360–1372.
    https://doi.org/10.1016/j.bbamcr.2010.09.006

Decision letter

  1. Arduino A Mangoni
    Reviewing Editor; Flinders Medical Centre, Australia
  2. Matthias Barton
    Senior Editor; University of Zurich, Switzerland
  3. Feixiong Cheng
    Reviewer; Vanderbilt University Medical Center, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This study presents a deep learning approach plus iPSC-based high-throughput screening data for risk assessment of cardiotoxicity in the early phase of drug discovery. The authors demonstrate reasonable accuracy using a library of 1,280 bioactive compounds. Overall, this is an interesting study which provides potential tools for risk assessment of cardiotoxicity the early phase of drug discovery if broadly applied.

Decision letter after peer review:

Thank you for submitting your article "Deep Learning Predicts Patterns of Cardiotoxicity in a High-Content Screen Using Induced Pluripotent Stem Cell-Derived Cardiomyocytes" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Dr. Feixiong Cheng (Reviewer #2).

The Reviewers and Editors have discussed their reviews with one another, and this decision letter is to help you prepare a revised submission.

Essential Revisions:

1) The article lacks practical discussion, considerations and recommendations on how to validate the cardiotoxicity score when aiming to predict clinical drug cardiotoxicity, which will require comparing with clinical data.

2) This reviewer recommends the authors to read the review manuscript by Walker and colleagues published in 2020 in the journal Archives of Toxicology and entitled "The evolution of strategies to minimise the risk of human drug‑induced liver injury (DILI) in drug discovery and development" to understand how the pharmaceutical industry uses image-based data to predict clinical toxicity. Despite describing applications in liver toxicity, since it focuses extensively on image-based approaches, the practical approaches of the article can be easily translatable to the predictive effort of the neural network method in the cardiotoxicity arena.

3) This reviewer especially recommends considering Cmax or Cmax unbound of tested drugs and investigating IC50 cutoff values for predicting drug safety ranges in relation to the proposed cardiotoxicity score.

4) It is not clear how the deep learning approach outperform traditional approaches, such as doi: 10.1126/scitranslmed.aaf2584.

5) For deep learning models, it is not clear how the authors perform hyperparameter tuning, a key issue for deep learning models.

6) The authors are suggested to compared iPSC-based image-based deep learning models with traditional chemoinformatics approaches, such as random forest-based structure-activity relationships approach.

7) The reviewer is confused why the authors presented ProBNP assays. The authors only show negative results of ProBNP; yet, how it related to the deep learning models in current manuscript.

8) Cardiotoxicity is highly time-dependent and dose-dependent. How the authors address time-dependent and dose-dependent cardio-toxicity in their deep learning models.

9) The authors presented transcriptomics and metabolism analysis. How these omics layers can be integrated into deep learning models in the future studies?

10) The codes and data should be provided in public domains, such as GitHub or other open source websites.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for submitting your article "Deep Learning Predicts Patterns of Cardiotoxicity in a High-Content Screen Using Induced Pluripotent Stem Cell-Derived Cardiomyocytes" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Matthias Barton as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Feixiong Cheng (Reviewer #2).

Essential Revisions:

Comment 1– It is difficult to understand how this method (cardiotoxicity score) is predictive of clinical cardiotoxicity if the score is dependent on the training data and drugs used because clinical cardiotoxicity due to drug-induced effects will occur for drugs independently of what would be the surrogate of training data or reference drugs in clinical trials. Probably authors could avoid stating this method as predictive, but instead should present it as an approach for comparing potential cardiotoxic effects between new drugs and reference drugs without predicting clinical safety ranges. A drug can be safe if the range of efficacious concentrations is safely below the toxic range. If the authors still aim to demonstrate that the presented method is predictive of clinical data, a more reflected response to this comment is recommended, taking in consideration the current approaches in the field to test the clinical predictiveness of in vitro data. Since so much clinical and animal data is publicly available for several of the used drugs, it is unconceivable for approaches that aim to be predictive of clinical data to ignore these.

Comments 2 and 3 – There is no proof of principle results or proposed/ described comprehensive strategy or recommendations to predict clinical data from the results derived from this method. In line with the previous comment, the authors seem to be pretending that most of the used drugs are new drugs and that no clinical data is available. The authors also seem to pretend that clinical data is not currently predicted from in vitro cellular data to estimate safety ranges of drug concentration. The authors should rethink their response, especially when considering that some of the used drugs have been used for so long clinically. These are not investigational drugs. If authors want to "pitch" this method as predictive, please add information on how to predict clinical data from it considering how other authors in the field already predict clinical toxic drug effects (safety ranges) from in vitro cellular data. The results from at least one drug should be demonstrated to be predictive and not false-positives or false-negatives. Alternatively, this effort will not be necessary if authors remove from the manuscript any claims that this method is predictive of drug cardiotoxicity.

https://doi.org/10.7554/eLife.68714.sa1

Author response

Essential Revisions:

1) The article lacks practical discussion, considerations and recommendations on how to validate the cardiotoxicity score when aiming to predict clinical drug cardiotoxicity, which will require comparing with clinical data.

In this study, we described the process of deep learning as a methodology. To use deep learning to predict a cardiotoxicity score, we recommend that researchers establish a deep learning model based on a large training dataset using known cardiotoxins based on clinical data (for example, we used eight drugs listed under Supplementary File 1 and Figure 2—figure supplement 1, and hundreds of images from each drug to establish our training dataset). Then images from test articles can be analyzed through the neural net to predict the level of toxicity. We added these details to the Discussion section. If researchers would like to use and/or access our deep learning models and images for their academic or commercial research, we invite them to contact the corresponding author at mandegar@tenayathera.com.

2) This reviewer recommends the authors to read the review manuscript by Walker and colleagues published in 2020 in the journal Archives of Toxicology and entitled "The evolution of strategies to minimise the risk of human drug‑induced liver injury (DILI) in drug discovery and development" to understand how the pharmaceutical industry uses image-based data to predict clinical toxicity. Despite describing applications in liver toxicity, since it focuses extensively on image-based approaches, the practical approaches of the article can be easily translatable to the predictive effort of the neural network method in the cardiotoxicity arena.

We thank the reviewer for this recommendation. We have addressed this suggestion under Comment 3.

3) This reviewer especially recommends considering Cmax or Cmax unbound of tested drugs and investigating IC50 cutoff values for predicting drug safety ranges in relation to the proposed cardiotoxicity score.

Thank you for this recommendation, which would be a valuable part of follow-up studies. We believe that drug toxicity may not be driven only by Cmax but also by AUC exposure (or both may depend on the off-targets). To elucidate which drug’s toxicity is driven by Cmax or AUC, we would need to conduct extensive and regulated follow-up studies based on human data and preclinical species surrogates. However, these studies would be molecule-specific based on analyses such as MetID and not comparable between molecules. Also, data from preclinical species would not be a good indicator for applying Cmax values to human cells. In our paper, the clinical data for the compounds we tested are limited in our libraries (they are composed of limited clinical molecules, preclinical tool molecules, and more). These diverse libraries lack published human pharmacokinetic data. We believe that this analysis would be helpful when more human data is generated and/or released in the public domain.

4) It is not clear how the deep learning approach outperform traditional approaches, such as doi: 10.1126/scitranslmed.aaf2584.

Sharma et al., (2017) evaluated cardiotoxicity mostly by measuring cytotoxicity and contractility (with a follow-up assay using calcium handling and targeted qPCR). To generate data for our deep learning training, we evaluated loss of contractility (Figure 2A-2C and Figure 2—figure supplement 1) and cytotoxicity of eight known cardiotoxins through nuclear count (Figure 2D). We found that measuring cytotoxicity (either through loss of nuclear count or sarcomere intensity measured by MYBPC3 and ACTN2) and contractility did not provide the dynamic range suitable for high-throughput screening. For example, in Figure 4, we show that measuring nuclear count and sarcomere staining intensity did not provide the appropriate Z-factors for hit detection when the methodology is applied in a high-throughput manner.

With deep learning, we achieved Z-factors >0.5, which provided a more reliable method for detecting subtle phenotypic differences with high-throughput screening. For example, a relatively low dose of bortezomib (0.1μM) would not be detected as cardiotoxic when using only cytotoxicity and sarcomere staining. But with deep learning, we can reliably detect patterns of toxicity through sarcomere damage (Figure 4). To clarify this idea, our Discussion section states:

“Our findings support that deep learning is a highly sensitive method to detect cardiotoxicity phenotypes that are undetectable by traditional single-readout assays.”

5) For deep learning models, it is not clear how the authors perform hyperparameter tuning, a key issue for deep learning models.

We built our models using PhenoLearn, without changing the default parameters. For this study, we did not optimize hyper-parameter tuning, but our default values for the parameters were set based on thousands of images and hundreds of trainings we performed on cellular images. We have described some of this methodology in a published study by Maddah et al., 2020, which we referenced in our manuscript.

6) The authors are suggested to compared iPSC-based image-based deep learning models with traditional chemoinformatics approaches, such as random forest-based structure-activity relationships approach.

We thank the reviewers for this comment. In our study, we did not perform traditional chemoinformatics, such as random forest-based structure-activity relationship analysis. We believe this analysis is outside the scope of our manuscript and better suited for a follow-up study. However, we revised our Discussion section to describe the lack of these data as a limitation that would be valuable to assess in a follow-up study. We also referenced a paper by Idakwo et al., 2019 in which deep neural networks significantly outperformed random forest by 22% to 27% in four measured metrics.

7) The reviewer is confused why the authors presented ProBNP assays. The authors only show negative results of ProBNP; yet, how it related to the deep learning models in current manuscript.

ProBNP is a clinical biomarker of cardiac stress. We used a ProBNP ELISA kit as an orthogonal, follow-up assay on media collected from iPSC-CMs treated with the seven validated cardiotoxic compounds and evaluated the levels of BNP. In the media of treated cells, we found inconsistent patterns of ProBNP levels and cardiotoxicity signals. In most cases, the ProBNP signal was lower when treating with a predicted cardiotoxic compound (suggesting loss of cardiac identity). In only one case (PHA-767491) did ProBNP levels increase. Based on our results, we believe that measuring ProBNP levels was not the most reliable marker for predicting cardiotoxicity in iPSC-CMs. We have revised this section in the manuscript for further clarification.

8) Cardiotoxicity is highly time-dependent and dose-dependent. How the authors address time-dependent and dose-dependent cardio-toxicity in their deep learning models.

We thank the reviewers for this insightful point. We agree that cardiotoxicity (or toxicity in general) is both time-dependent and dose-dependent. In our manuscript, we reported that a subset of known cardiotoxic drugs led to a time-dependent and dose-dependent loss of contractility (see Figure 2A and Figure 2—figure supplement 1).

When we applied deep learning to detect cardiotoxicity, we focused on a single timepoint on fixed cells. We did not perform live imaging because this approach, especially on beating cardiomyocytes, is time-consuming and not easily scalable for high-throughput screening. We could perform sarcomere-based live imaging by establishing stable reporter cell lines, or transfecting/transducing with a sarcomeric protein tagged with a fluorescent marker (eg, ACNT2-mKate2 reporter by Judge et al., 2017). However, our study focused on the scalability of the methodology. Thus, for our primary screen, we assessed a single time point and three doses. Also, to identify hits, most primary high-throughput screening is performed at a single dose. In our study, we broadened the dose range by including three doses in the primary screen to address the dose-dependency of cardiotoxicity.

We believe that performing deep learning on live cells over time would be a valuable addition to future work. We used such an approach in previous work (Maddah et al., 2020), in which we used deep learning on live images with a small number of compounds. In this study, we reported both a time-dependent and dose-dependent aspect to detecting patterns of hepatotoxicity (Maddah et al., 2020, Figure 4) and cardiotoxicity (Maddah et al., 2020, Figure 6). We have added some of these details to the Discussion section of our manuscript.

9) The authors presented transcriptomics and metabolism analysis. How these omics layers can be integrated into deep learning models in the future studies?

In our study, we focused on the deep learning score as the primary output to predict patterns of cardiotoxicity, and we used transcriptomics and metabolic studies as orthogonal assays to validate our findings. In this study, we did not have the resources to analyze the transcriptional landscape of enough samples from each drug-treatment group to create a rich transcriptional training dataset as an additional data source for deep learning modeling. When such a dataset is available, the expression matrix can be used along with imaging data as the training dataset, and generated layers can be merged in the subsequent steps. Such a dataset could also be analyzed separately by either deep learning or traditional machine learning methods, and the generated classifier can be provided to create an Ensemble model in the next step. In this approach, multivariate projection methods (e.g., methods based on partial least squares, multivariate regression methods, random forests) and Support Vector Machines could be used to develop transcriptional classifiers.

10) The codes and data should be provided in public domains, such as GitHub or other open source websites.

The code for the SGFT algorithm by Salick et al., 2020 is available on GitHub:

https://github.com/maxsalick/SGFT

(We added this link to our Materials and methods)

Our RNA-Seq data has been deposited on the Gene Expression Omnibus (GEO) database. GEO Submission (GSE172181):

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE172181

(We added this link to our Materials and methods)

The PhenoLearn code is proprietary. However, we provided information on how the code is constructed in the Materials and methods section:

“We used PyTorch as the framework for the deep learning library, and a ResNet50 193 architecture, a 50-layer-deep convolutional neural network. Each input image was divided into 12 square sub-images to have sizes ranging from 224 x 224 pixels to 300 x 300 pixels (Maddah et al., 2020). Each sub-image was flipped and rotated to create seven more augmented sub-images, and then fed into the input layer of ResNet50. Pseudo-image generation by rotation and flipping ensures enough diversity is seen by the network so that the algorithm is not biased based on the orientation of the images (Moen et al., 2019). We used 80% of the images to construct the neural network and the remaining 20% to validate the deep learning model. A consistent set of parameters were used for all training operations, including an initial learning rate of 0.01 and 20 epochs.”

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Essential Revisions:

Comment 1– It is difficult to understand how this method (cardiotoxicity score) is predictive of clinical cardiotoxicity if the score is dependent on the training data and drugs used because clinical cardiotoxicity due to drug-induced effects will occur for drugs independently of what would be the surrogate of training data or reference drugs in clinical trials. Probably authors could avoid stating this method as predictive, but instead should present it as an approach for comparing potential cardiotoxic effects between new drugs and reference drugs without predicting clinical safety ranges. A drug can be safe if the range of efficacious concentrations is safely below the toxic range. If the authors still aim to demonstrate that the presented method is predictive of clinical data, a more reflected response to this comment is recommended, taking in consideration the current approaches in the field to test the clinical predictiveness of in vitro data. Since so much clinical and animal data is publicly available for several of the used drugs, it is unconceivable for approaches that aim to be predictive of clinical data to ignore these.

Comments 2 and 3 – There is no proof of principle results or proposed/ described comprehensive strategy or recommendations to predict clinical data from the results derived from this method. In line with the previous comment, the authors seem to be pretending that most of the used drugs are new drugs and that no clinical data is available. The authors also seem to pretend that clinical data is not currently predicted from in vitro cellular data to estimate safety ranges of drug concentration. The authors should rethink their response, especially when considering that some of the used drugs have been used for so long clinically. These are not investigational drugs. If authors want to "pitch" this method as predictive, please add information on how to predict clinical data from it considering how other authors in the field already predict clinical toxic drug effects (safety ranges) from in vitro cellular data. The results from at least one drug should be demonstrated to be predictive and not false-positives or false-negatives. Alternatively, this effort will not be necessary if authors remove from the manuscript any claims that this method is predictive of drug cardiotoxicity.

We thank the reviewers for their insightful comments. We removed all claims that this method predicts clinical cardiotoxicity from the manuscript. We rephrased all relevant parts of the manuscript to indicate that this method can be used to detect/identify a cardiotoxicity signal in an in vitro system using induced pluripotent stem cell-derived cardiomyocytes. Also, we took the reviewer’s recommendation and modified part of our discussion to indicate that this method can be used to compare the level of toxicity to reference cardiotoxic drugs.

https://doi.org/10.7554/eLife.68714.sa2

Article and author information

Author details

  1. Francis Grafton

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Data curation, Formal analysis, Writing - original draft
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
  2. Jaclyn Ho

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Data curation, Software, Formal analysis, Methodology, Writing - original draft
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
  3. Sara Ranjbarvaziri

    Cardiovascular Institute and Department of Medicine, Stanford University, Stanford, United States
    Contribution
    Formal analysis, Writing - original draft
    Competing interests
    No competing interests declared
  4. Farshad Farshidfar

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Data curation, Software, Formal analysis, Writing - original draft
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
  5. Anastasiia Budan

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Data curation, Formal analysis
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
  6. Stephanie Steltzer

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Data curation, Formal analysis
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
  7. Mahnaz Maddah

    Dana Solutions, Palo Alto, United States
    Contribution
    Software
    Competing interests
    is affiliated with Dana Solutions. The author has no other competing interests to declare.
  8. Kevin E Loewke

    Dana Solutions, Palo Alto, United States
    Contribution
    Software
    Competing interests
    is affiliated with Dana Solutions. The author has no other competing interests to declare.
  9. Kristina Green

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Resources
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
  10. Snahel Patel

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Data curation, Formal analysis, Validation, Writing - original draft, Writing - review and editing
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
  11. Tim Hoey

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Resources
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
  12. Mohammad Ali Mandegar

    Tenaya Therapeutics, South San Francisco, United States
    Contribution
    Conceptualization, Resources, Data curation, Software, Formal analysis, Supervision, Funding acquisition, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    mandegar@tenayathera.com
    Competing interests
    is an employee of Tenaya Therapeutics and has stock holdings in the company.
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5323-7891

Funding

No external funding was received for this work.

Acknowledgements

We thank Crystal R Herron for editing the manuscript. We thank members of Tenaya’s drug discovery team for technical assistance and helpful comments on the manuscript.

Senior Editor

  1. Matthias Barton, University of Zurich, Switzerland

Reviewing Editor

  1. Arduino A Mangoni, Flinders Medical Centre, Australia

Reviewer

  1. Feixiong Cheng, Vanderbilt University Medical Center, United States

Publication history

  1. Preprint posted: March 24, 2021 (view preprint)
  2. Received: March 24, 2021
  3. Accepted: July 27, 2021
  4. Accepted Manuscript published: August 2, 2021 (version 1)
  5. Version of Record published: August 16, 2021 (version 2)

Copyright

© 2021, Grafton et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 849
    Page views
  • 136
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Stem Cells and Regenerative Medicine
    Diptiman Chanda et al.
    Short Report

    Multicellular organisms maintain structure and function of tissues/organs through emergent, self-organizing behavior. In this report, we demonstrate a critical role for lung mesenchymal stromal cell (L-MSC) aging in determining the capacity to form three-dimensional organoids or ‘alveolospheres’ with type 2 alveolar epithelial cells (AEC2s). In contrast to L-MSCs from aged mice, young L-MSCs support the efficient formation of alveolospheres when co-cultured with young or aged AEC2s. Aged L-MSCs demonstrated features of cellular senescence, altered bioenergetics, and a senescence-associated secretory profile (SASP). The reactive oxygen species generating enzyme, NADPH oxidase 4 (Nox4), was highly activated in aged L-MSCs and Nox4 downregulation was sufficient to, at least partially, reverse this age-related energy deficit, while restoring the self-organizing capacity of alveolospheres. Together, these data indicate a critical role for cellular bioenergetics and redox homeostasis in an organoid model of self-organization and support the concept of thermodynamic entropy in aging biology.

    1. Neuroscience
    2. Stem Cells and Regenerative Medicine
    Arantxa Cebrian-Silla et al.
    Research Article Updated

    The ventricular-subventricular zone (V-SVZ), on the walls of the lateral ventricles, harbors the largest neurogenic niche in the adult mouse brain. Previous work has shown that neural stem/progenitor cells (NSPCs) in different locations within the V-SVZ produce different subtypes of new neurons for the olfactory bulb. The molecular signatures that underlie this regional heterogeneity remain largely unknown. Here, we present a single-cell RNA-sequencing dataset of the adult mouse V-SVZ revealing two populations of NSPCs that reside in largely non-overlapping domains in either the dorsal or ventral V-SVZ. These regional differences in gene expression were further validated using a single-nucleus RNA-sequencing reference dataset of regionally microdissected domains of the V-SVZ and by immunocytochemistry and RNAscope localization. We also identify two subpopulations of young neurons that have gene expression profiles consistent with a dorsal or ventral origin. Interestingly, a subset of genes are dynamically expressed, but maintained, in the ventral or dorsal lineages. The study provides novel markers and territories to understand the region-specific regulation of adult neurogenesis.