A deep learning algorithm to translate and classify cardiac electrophysiology
Abstract
The development of induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) has been a critical in vitro advance in the study of patient-specific physiology, pathophysiology and pharmacology. We designed a new deep learning multitask network approach intended to address the low throughput, high variability and immature phenotype of the iPSC-CM platform. We combined the translation and classification tasks because the most likely application of the deep learning technology we describe here is to translate iPSC-CMs following application of a perturbation. The deep learning network was trained using simulated action potential (AP) data and applied to classify cells into drug-free and drugged categories and to predict the impact of electrophysiological perturbation across the continuum of aging, from immature iPSC-CMs to adult ventricular myocytes. The phase of the AP that is extremely sensitive to perturbation, owing to a steep rise in membrane resistance, was found to contain the key information required for successful network multitasking. We also demonstrated successful translation of both experimental and simulated iPSC-CM AP data, validating our network by using the latter to predict experimental drug-induced effects on adult cardiomyocyte APs.
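As a rough illustration of what combining a translation task with a classification task can look like, the sketch below computes a single weighted objective from the two task losses. It is not the authors' implementation: the function name `multitask_loss`, the mean-squared-error and binary cross-entropy terms, and the weighting parameter `alpha` are all assumptions made for this example.

```python
import numpy as np

def multitask_loss(ap_pred, ap_true, logit, label, alpha=0.5):
    """Hypothetical combined objective for a translation + classification network.

    alpha is an assumed task weighting; it is not taken from the article.
    """
    # Translation term: mean squared error between predicted and target AP traces.
    translation = np.mean((ap_pred - ap_true) ** 2)
    # Classification term: binary cross-entropy, drug-free (0) versus drugged (1).
    p = 1.0 / (1.0 + np.exp(-logit))
    eps = 1e-12  # guards against log(0)
    classification = -np.mean(
        label * np.log(p + eps) + (1 - label) * np.log(1 - p + eps)
    )
    total = alpha * translation + (1 - alpha) * classification
    return total, translation, classification
```

Under these assumptions, a shared network would be trained by minimizing `total`, so that errors in either task update the same underlying representation.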
Data availability
Because we used simulated data, we have made all drugged and drug-free iPSC-CM and adult-CM AP data used for training and testing the multitask network publicly available at the Clancy lab GitHub (https://github.com/ClancyLabUCD/Multitask_network/tree/master/data). In addition, we have illustrated the training and test datasets in Figure 1 and Figure 5. We have also shared the Jupyter notebook for preparing clean and organized data for training the network at the Clancy lab GitHub (https://github.com/ClancyLabUCD/Multitask_network/tree/master/jupyter). We have also made the experimental data used for model validation publicly available at the Clancy lab GitHub (https://github.com/ClancyLabUCD/Multitask_network/blob/master/data/clean_data/experiments.csv). Figure 7 illustrates the experimental data we used to validate the network.
Article and author information
Author details
Funding
Common Fund (OT2OD026580)
- Igor Vorobyov
- Colleen E Clancy Ph.D.
Texas Advanced Computing Center Leadership Resource Allocation (MCB20010)
- Igor Vorobyov
- Colleen E Clancy Ph.D.
Oracle cloud for research allocation
- Igor Vorobyov
- Colleen E Clancy Ph.D.
Common Fund (OT2OD025308‐01S2)
- Parya Aghasafari Ph.D.
American Heart Association (19CDA34770101)
- Igor Vorobyov
National Heart, Lung, and Blood Institute (R01HL152681)
- Igor Vorobyov
- Colleen E Clancy Ph.D.
National Heart, Lung, and Blood Institute (R01HL128170)
- Colleen E Clancy Ph.D.
National Heart, Lung, and Blood Institute (U01HL126273)
- Colleen E Clancy Ph.D.
Department of Physiology and Membrane Biology Research Partnership Fund
- Igor Vorobyov
- Colleen E Clancy Ph.D.
Extreme Science and Engineering Discovery Environment (MCB170095)
- Igor Vorobyov
- Colleen E Clancy Ph.D.
National Center for Supercomputing Applications Blue Waters Broadening Participation Allocation
- Igor Vorobyov
- Colleen E Clancy Ph.D.
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Thomas Hund, The Ohio State University, United States
Publication history
- Received: March 11, 2021
- Accepted: June 29, 2021
- Accepted Manuscript published: July 2, 2021 (version 1)
- Version of Record published: July 15, 2021 (version 2)
Copyright
© 2021, Aghasafari et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 1,584 page views
- 233 downloads
- 8 citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.