Large-scale synthetic data enable digital twins of human excitable cells

Pei-Chi Yang; Mao-Tsuen Jeng; Deborah K Lieu; Regan L Smithers; Gonzalo Hernandez-Hernandez; L Fernando Santana; Colleen E Clancy

doi:10.7554/eLife.110013.1

Introduction

Individual variability determines the emergence and severity of disease, as well as individual therapeutic response in the context of both inherited and acquired conditions^1–3. Traditional preclinical models are designed to minimize variability to isolate specific biological effects. While this reductionist approach simplifies experimental interpretation, it fails to reflect the heterogeneity that exists in human populations. The consequence is a persistent translational gap, incomplete mechanistic understanding of disease, and a higher risk of adverse drug reactions^4–8. Digital twins constitute a mechanistic approach to integrate individualized data with simulation-based inference to predict physiology and therapeutic outcomes at the level of an individual cell^9–12.

Human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) are a powerful platform for studying electrophysiology in a patient-specific context^13–19. However, experimental measurements of iPSC-CM action potentials (APs) or ionic currents provide only partial information about the underlying mechanisms^20–24. The relationship between measured signals and the large number of biophysical parameters governing ion channel kinetics, calcium handling, and membrane transport is complex and nonlinear, making direct parameter estimation from small experimental datasets impossible. Existing fitting approaches are slow, require extensive specialized expertise, and do not scale to capture individual variability^25–31.

Mathematical models of cardiac electrophysiology provide a complementary framework linking cellular dynamics to their underlying ionic mechanisms^32,33. Personalized digital twins have not yet been realized, in part because of the ongoing challenge of parameterizing models accurately and quickly from cell-specific experimental data. Recent advances in deep learning now make it possible to solve the associated inverse problem by learning the mapping from high-dimensional, time-dependent recordings to the underlying biophysical parameters. The key is massive synthetic datasets, simulated from large in silico cell populations, that supply the training data needed for accurate and generalizable inference. A comparable process cannot be developed with purely experimental data.

Here, we present a framework that integrates computational modeling and simulation, deep learning, and experimental recordings, and we show that it generates high-fidelity, cell-specific digital twins of iPSC-CMs from a single optimized voltage-clamp protocol. Large-scale synthetic populations of model iPSC-CMs were used to train a fully connected neural network to predict 52 biophysical parameters governing six major ionic currents. A deep learning-guided optimization loop was used to design a voltage-clamp protocol that maximizes information content for parameter recovery. Models reconstructed from the network-predicted parameters reproduce the AP waveform, calcium dynamics, and the contributions of individual ionic currents. When applied to experimental data, this approach generates accurate digital twins of recorded cells across temperatures and diverse AP morphologies.

This work addresses a major translational challenge by providing a scalable, temperature-agnostic, morphology-inclusive method for personalized cardiac modeling that bridges the gap between experimental recordings and mechanistic insight. Because the method can be applied to any electrically active cell type, the framework supports predictive digital twin applications in precision cardiovascular medicine and beyond. By uniting protocol optimization, large-scale synthetic training, and deep learning-based inference, this work advances the field toward scalable digital twin technology with broad potential for basic research, preclinical testing, and personalized cardiovascular medicine.

Results

Variability and Digital Twin Modeling of iPSC-CMs

To assess the electrophysiological variability inherent in populations of iPSC-CMs, we first generated a simulated population of 200,000 spontaneously beating cells by applying ±40% random variation to 52 biophysical model parameters (see online SI). These parameters govern six key ionic currents central to cardiac AP generation and repolarization: I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f. Twenty example APs illustrating population variability are shown in Figure 1A. The broad parameter perturbations produced extensive variability in the model population, comparable to that observed in experimental iPSC-CM preparations.
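The population-generation step can be sketched as independent multiplicative scaling of each baseline parameter. This is a minimal sketch under stated assumptions: the uniform distribution, the independence of parameters, the function name `sample_population`, and the placeholder baseline values are ours for illustration; the exact sampling scheme is described in the online SI.

```python
import numpy as np


def sample_population(baseline, n_cells, spread=0.40, seed=0):
    """Draw n_cells parameter sets by scaling each baseline parameter by a
    factor drawn uniformly from [1 - spread, 1 + spread).

    Assumption: independent, uniform, multiplicative perturbations.
    """
    rng = np.random.default_rng(seed)
    baseline = np.asarray(baseline, dtype=float)
    scale = rng.uniform(1.0 - spread, 1.0 + spread, size=(n_cells, baseline.size))
    return scale * baseline  # broadcasts to shape (n_cells, n_params)


# Hypothetical placeholder for the 52 baseline model parameters.
baseline = np.linspace(0.5, 2.0, 52)
population = sample_population(baseline, n_cells=1000)
print(population.shape)  # (1000, 52)
```

Each row is one synthetic cell; simulating every row under the model yields the population of APs and currents. The same function with `spread=0.20` and a network-predicted parameter set as `baseline` mirrors the ±20% populations used later for drug-response analysis.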

Extensive variability captured in simulated iPSC-derived cardiomyocyte populations.
**(A)** To illustrate population variability, 20 action potentials (APs) are shown, each resulting from ±40% random variation applied to 52 parameters governing six key ionic currents (I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f) in the baseline Kernik iPSC-CM model⁵⁰, drawn from a simulated population of 200,000 spontaneously beating cells. **(B)** These perturbations yielded a wide spectrum of APs, with substantial variation in both waveform and frequency. **(C)** The corresponding total ionic current and **(D–I)** its decomposition into the individual components I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f, respectively.

The simulations revealed a wide distribution of AP morphologies and spontaneous firing frequencies (examples in Figure 1B). APs varied substantially in upstroke velocity, plateau duration, and repolarization slope, with some cells exhibiting prolonged plateaus while others demonstrated rapid repolarization and shorter cycle lengths. Differences in spontaneous beat rate reflected variability in diastolic depolarization rates, which were closely linked to variations in I_f amplitude and I_K1 strength across the simulated population.

To dissect the mechanistic drivers of electrophysiological variability, we examined the total membrane current (I_total) and its decomposition into individual ionic components for representative cells across the population (Figure 1C–I). The heterogeneity in total current waveforms was accompanied by diverse current profiles across I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f. Some cells exhibited larger I_CaL amplitude with prolonged calcium entry during the plateau, while others showed enhanced I_K1- or I_Kr-mediated repolarizing currents. Differences in I_Na upstroke magnitude and I_f-driven diastolic depolarization contributed to variation in excitability and cycle length. Together, these simulations show the high degree of functional variability that can emerge from relatively modest variation in channel kinetics and conductance properties. The synthetic cell population mirrors the heterogeneity observed experimentally in iPSC-CM preparations and underscores the likely importance of accounting for cellular variability in predictive modeling and, ultimately, in evaluating individual drug responses.
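For reference, the relationship between the total and individual currents in Figure 1C–I follows the standard membrane current balance, written here for a spontaneously beating cell with no applied stimulus. The lumped term I_other is our own shorthand for the model's remaining currents (for example, exchanger and pump currents), which were not among the six perturbed.

```latex
C_m \frac{dV_m}{dt} = -I_{\mathrm{total}}, \qquad
I_{\mathrm{total}} = I_{Kr} + I_{CaL} + I_{Na} + I_{Ks} + I_{K1} + I_{f} + I_{\mathrm{other}}
```

Under voltage clamp, V_m is imposed rather than free, so once capacitive transients settle the recorded current reflects I_total; this is why a single whole-cell current trace can carry information about all six components at once.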

We next developed a deep learning-based workflow to generate fully parameterized, cell-specific models of iPSC-CM electrophysiology from whole-cell current data obtained with one simple voltage-clamp protocol (Figure 2). To create the training dataset, we generated a synthetic population of 200,000 iPSC-CMs by introducing random variations to 52 biophysical parameters governing six key ionic currents in the baseline model: I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f. The parameter perturbations produced an array of simulated APs and whole-cell currents, reflecting broad physiological variability (Figure 2, top left panel).

Digital twins of human iPSC-CMs from one simple voltage-clamp recording.
**Top row** (pipeline overview): A large population of synthetic iPSC-CMs (top left panel) is generated by introducing variation to 52 biophysical parameters governing key ionic currents in the baseline model: I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f (*highlighted with red asterisks* in the schematic on right). A computationally optimized voltage-clamp protocol (black trace, top left center panel) is applied to generate a distinct whole-cell current (I_Total, red trace) that engages each of the key ion channels. The simulated whole-cell currents I_Total from 200,000 synthetic cells serve as inputs to a fully connected neural network, trained to map raw current responses to the underlying 52 model parameters with high accuracy, as demonstrated by the low MSE across training and test sets (top center right panel). The deep learning model is trained to predict optimized parameter values by inferring gating kinetics and maximal conductance for each ionic species. An example formulation for the fast sodium current (I_Na) is shown, with inferred parameters (x₁–x₅) contributing to gating and conductance (top right panel). The **bottom row** shows the resulting digital twin: the inferred parameters are used to simulate the AP, the whole-cell current I_Total, and each key ionic current (I_Na, I_CaL, I_K1, I_Ks, I_Kr, and I_f).

As part of the workflow (Figure 2, top row), a voltage-clamp protocol was applied to each synthetic cell to generate complex current responses to a dynamic membrane potential input. The protocol was designed to sequentially engage the major ionic currents in the synthetic cell, ensuring that the training data contained informative current signatures for each underlying conductance. The resulting total current traces (I_Total; red trace in Figure 2, top left center panel) served as inputs to a fully connected neural network.

The network was trained to map the raw current inputs to the 52 underlying model parameters, and performance was evaluated by mean squared error (MSE) on held-out test data (Figure 2, top center right panel). MSE remained consistently low on both training and test datasets, indicating strong generalization. An example parameterization for the fast sodium current (I_Na) illustrates how the network infers gating kinetic parameters and maximal conductance (Figure 2, top right panel). The bottom row of Figure 2 shows the digital twin output: the inferred parameters are used to simulate the AP, the whole-cell current I_Total, and each key ionic current (I_Na, I_CaL, I_K1, I_Ks, I_Kr, and I_f).
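A minimal sketch of this kind of mapping, assuming NumPy, a single ReLU hidden layer, full-batch gradient descent on MSE, and normalized parameter targets, is given below. The architecture, layer width, learning rate, and the toy linear "simulator" used to fabricate training pairs are hypothetical illustrations, not the network or data used in this study.

```python
import numpy as np


def train_mlp(X, Y, hidden=64, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer fully connected network X -> Y by full-batch
    gradient descent on the mean squared error (MSE)."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, n_out))
    b2 = np.zeros(n_out)
    losses = []
    for _ in range(epochs):
        H = np.maximum(X @ W1 + b1, 0.0)   # ReLU hidden layer
        P = H @ W2 + b2                     # one linear output per model parameter
        err = P - Y
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the MSE gradient through both layers.
        dP = 2.0 * err / err.size
        dW2, db2 = H.T @ dP, dP.sum(axis=0)
        dH = (dP @ W2.T) * (H > 0)
        dW1, db1 = X.T @ dH, dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def predict(X_new):
        return np.maximum(X_new @ W1 + b1, 0.0) @ W2 + b2

    return predict, losses


# Toy data: 52 normalized "parameters" mapped linearly to 300-sample "current traces".
rng = np.random.default_rng(1)
n_cells, n_params, n_samples = 2000, 52, 300
Y = rng.uniform(-1.0, 1.0, (n_cells, n_params))
A = rng.normal(0.0, 1.0 / np.sqrt(n_params), (n_params, n_samples))
X = Y @ A + 0.01 * rng.normal(size=(n_cells, n_samples))

predict, losses = train_mlp(X[:1600], Y[:1600])
test_mse = float(np.mean((predict(X[1600:]) - Y[1600:]) ** 2))
print(losses[0], losses[-1], test_mse)
```

The held-out split (the last 400 toy cells) plays the role of the test set: comparing training loss with `test_mse` is the same check used to judge generalization in Figure 2 and Figure 4B.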

Deep Learning Approaches for Voltage Clamp Optimization and Parameter Estimation

To maximize the information content of experimental recordings for parameter inference, we developed a deep learning-guided optimization strategy to design an ideal voltage-clamp protocol (Figure 3). Starting from a broad set of candidate voltage steps, the algorithm iteratively evaluated the mean squared error (MSE) between predicted and true parameters across large synthetic iPSC-CM populations. In each cycle, the most informative test potential was selected and incorporated into the evolving protocol, while less informative steps were discarded.
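The selection loop can be expressed as a greedy search over candidate test potentials. In this sketch, `score_fn` is a hypothetical stand-in for the expensive inner step (simulating the synthetic population under the candidate protocol, training the network, and returning test MSE); the toy score used in the example is invented purely to make the loop runnable.

```python
from typing import Callable, List, Sequence


def greedy_protocol_search(candidates: Sequence[float],
                           score_fn: Callable[[List[float]], float],
                           n_steps: int) -> List[float]:
    """Grow a protocol one test potential at a time, keeping the candidate
    whose addition gives the lowest score (e.g. parameter-prediction MSE)."""
    protocol: List[float] = []
    for _ in range(n_steps):
        # Score every candidate appended to the current protocol; discard all but the best.
        best = min(candidates, key=lambda v: score_fn(protocol + [v]))
        protocol.append(best)
    return protocol


# Candidate test potentials: -120 mV to +50 mV in 10 mV increments (18 values).
CANDIDATES = list(range(-120, 51, 10))


def toy_score(protocol: List[float]) -> float:
    """Hypothetical stand-in for 'simulate population, train network, return test MSE':
    total distance from three invented 'informative' voltages to the nearest chosen step."""
    targets = (-120, -40, 10)
    return sum(min(abs(t - v) for v in protocol) for t in targets)


print(greedy_protocol_search(CANDIDATES, toy_score, n_steps=3))  # [-40, -120, 10]
```

Greedy selection evaluates only len(candidates) options per cycle instead of every possible ordering, which matters when each evaluation involves simulating and training on a large synthetic population.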

Deep learning guided optimization of an ideal voltage clamp protocol.
Each iteration began with a −100 mV holding potential for 250 ms, followed by sequential testing potentials from −120 mV to +50 mV in 10 mV increments. A total of 200,000 synthetic samples were generated per training cycle. The optimal testing potential, identified by the lowest MSE from the deep learning model, was then applied for 250 ms. This optimization cycle repeated every 7000 ms. At 6000 ms, the potential was transiently stepped to −120 mV for 250 ms before resuming the next testing potential sequence. The schematic illustrates the iterative loop of model evaluation, MSE-based selection, and protocol updating, leading to the optimized composite voltage waveform shown in the lower panel.

Each cycle began with a −100 mV holding potential for 250 ms, followed by systematic sweeps through test potentials from −120 mV to +50 mV in 10 mV increments. A population of 200,000 computational iPSC-CMs with randomized parameter sets was simulated under each protocol iteration, and the resulting whole-cell currents were used to train the deep learning model to predict all 52 ionic parameters. The potential producing the lowest MSE was then selected as the optimal step and applied for 250 ms before the sequence resumed. This iterative optimization loop was repeated every 7000 ms, with a transient step to −120 mV at 6000 ms to probe hyperpolarization-activated channels.
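To make the timing concrete, the following sketch assembles one 7000 ms cycle as a sampled, piecewise-constant command waveform: 250 ms at −100 mV, 250 ms per selected test potential, and a 250 ms step to −120 mV at 6000 ms. Filling the remaining time with the −100 mV holding potential, the 1 ms sampling interval, and the function name `build_cycle` are our assumptions.

```python
from typing import List, Sequence


def build_cycle(test_steps_mv: Sequence[float], step_ms: float = 250.0,
                hold_mv: float = -100.0, cycle_ms: float = 7000.0,
                hyperpol_start_ms: float = 6000.0, hyperpol_mv: float = -120.0,
                dt_ms: float = 1.0) -> List[float]:
    """Return one protocol cycle as command voltages (mV) sampled every dt_ms."""
    n = int(round(cycle_ms / dt_ms))
    k = int(round(step_ms / dt_ms))
    h0 = int(round(hyperpol_start_ms / dt_ms))
    v = [hold_mv] * n              # assumption: idle time sits at the holding potential
    start = k                      # first 250 ms: holding potential
    for step in test_steps_mv:
        if start + k > h0:
            raise ValueError("test steps overlap the -120 mV step")
        v[start:start + k] = [float(step)] * k
        start += k
    # Transient hyperpolarization to probe hyperpolarization-activated channels (I_f).
    v[h0:h0 + k] = [hyperpol_mv] * k
    return v


waveform = build_cycle([-40, 10])
print(len(waveform), waveform[250], waveform[500], waveform[6000])  # 7000 -40.0 10.0 -120.0
```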

This adaptive process converged on a composite voltage-clamp waveform that effectively exposed the amplitude and kinetics of all six major ionic currents while substantially reducing voltage variability throughout the protocol. The optimized sequence preserved temporally distinct current signatures for I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f, enabling robust parameter estimation.

When applied to held-out synthetic test cells, the optimized voltage-clamp protocol (Figure 4A, left) produced whole-cell currents (Figure 4A, right) that were effective as network training data, confirming that the protocol provided sufficient information for accurate model personalization. Training dataset size remained influential: predictive accuracy improved as the number of synthetic training cells increased from 1,000 to 10,000 to 200,000 (Figure 4B). For the smallest dataset (1,000 cells), the test error plateaued at an MSE of 0.0036, with a median mean absolute error (MAE) of 0.041 across parameters. The test error curve exhibited an asymmetric U-shape across epochs, suggesting overfitting to the limited training set (Figure 4B, left column). Increasing the dataset size to 10,000 cells improved test performance, reducing the median MAE to 0.038 (a 7.3% decrease) and narrowing the error distribution (Figure 4B, middle column). The largest dataset of 200,000 cells yielded very low prediction errors, with training and test curves converging to a median MAE of 0.020 (Figure 4B, right column). With the optimized protocol, the corresponding MAE distributions narrowed substantially as dataset size increased, indicating improved stability and generalizability of parameter predictions.
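The error summaries quoted above can be computed as follows. This sketch assumes parameters are compared on a normalized scale (the normalization is not stated here) and uses plain Python lists; the final line checks the arithmetic of the reported 7.3% decrease in median MAE from 0.041 to 0.038.

```python
import statistics
from typing import List, Sequence


def per_parameter_mae(y_true: Sequence[Sequence[float]],
                      y_pred: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute error for each parameter (column), averaged over test cells (rows)."""
    n_cells, n_params = len(y_true), len(y_true[0])
    return [sum(abs(y_true[i][j] - y_pred[i][j]) for i in range(n_cells)) / n_cells
            for j in range(n_params)]


def overall_mse(y_true: Sequence[Sequence[float]],
                y_pred: Sequence[Sequence[float]]) -> float:
    """Mean squared error over all cells and all parameters."""
    sq = [(t - p) ** 2 for row_t, row_p in zip(y_true, y_pred)
          for t, p in zip(row_t, row_p)]
    return sum(sq) / len(sq)


def percent_decrease(old: float, new: float) -> float:
    """Relative reduction of an error metric, in percent."""
    return 100.0 * (old - new) / old


# Two toy test cells, two toy parameters.
y_true = [[0.0, 1.0], [2.0, 3.0]]
y_pred = [[0.5, 1.0], [2.0, 2.0]]
mae = per_parameter_mae(y_true, y_pred)           # [0.25, 0.5]
print(statistics.median(mae))                     # 0.375
print(overall_mse(y_true, y_pred))                # 0.3125
print(round(percent_decrease(0.041, 0.038), 1))   # 7.3
```

The per-parameter MAE list is what a distribution panel like the bottom of Figure 4B summarizes; its median gives a single scalar per dataset size.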

Massive synthetic data to train and test a deep learning algorithm for ion channel parameter estimation.
**(A)** A deep learning derived optimized voltage-clamp protocol (left panel) was designed to activate a broad set of ionic currents in iPSC-CMs by applying dynamic membrane potential steps (black trace), producing distinctive whole-cell current responses (I_Total, red trace). The inset highlights fine-scale current kinetics captured during the protocol. These V_m-I_Total pairs serve as network inputs for parameter inference. **(B)** Prediction accuracy for individual model parameters was evaluated using training datasets of 1,000, 10,000, and 200,000 synthetic cells (left to right). For the smallest dataset, the test error exhibited an asymmetric U-shaped curve across training epochs, indicating limited generalizability. As dataset size increased, training and test errors converged, with median mean absolute errors (MAE) of 0.041, 0.038, and 0.020 for the three dataset sizes, respectively. Bottom panels show MAE distributions across parameters, demonstrating progressively narrower error ranges with larger training datasets. **(C)** APs (V_m), intracellular calcium transients (Cai), total ionic current (I_Total), and six major individual ionic currents (I_Kr, I_CaL, I_Ks, I_K1, I_Na, I_f) simulated from the predicted parameters of a single test cell using the deep learning network trained on 200,000 samples. Blue traces represent the original simulated data (input), and red traces show the model outputs. The close overlap confirms accurate parameter recovery and faithful reproduction of cell-specific electrophysiological behavior.

The network-predicted parameters produced model waveforms (red) that reproduced the ground-truth simulated data (blue) for all outputs, with high fidelity in both amplitude and temporal features (Figure 4C). AP morphology, Ca²⁺ transient dynamics, and the activation and inactivation profiles of each ion channel overlapped closely. These findings demonstrate that large, diverse training datasets are essential for high-fidelity parameter inference, enabling digital twins to accurately replicate both the quantitative dynamics and qualitative features of iPSC-CM electrophysiology. This scalability positions the framework for reliable application to experimental datasets where precise recovery of cell-specific ionic properties is critical.

Digital twins of human iPSC-CMs reproduce experimental electrophysiology and reveal intrinsic variability in drug response

While we have shown that the deep learning-based approach can create high-fidelity digital twins when trained and tested on in silico iPSC-CM models, the ultimate proving ground for the framework is application to real human iPSC-CM recordings. We therefore applied the optimized voltage-clamp protocol in the most demanding test: deriving digital twins from real cells (Figure 5).

Digital twin generation and predictive modeling from real cells in iPSC-CM experimental recordings.
(**A, B, top**) Ten simulated APs from a synthetic population of 1,100,000 induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) at room temperature, generated by introducing ±40% random variation from baseline to 52 biophysical model parameters governing six major ionic currents (I_Kr, I_CaL, I_Ks, I_K1, I_Na, I_f). Colored traces represent exemplar simulated cells from the training data; overlaid black traces are experimentally recorded APs from two representative iPSC-CMs from hiPSC cell line iPS-6-9-9T.B. Insets show the experimental whole-cell current responses to the deep learning-optimized voltage-clamp protocol (as described in Figure 3), used for parameter inference. (**C, D, bottom**) Digital twin models of the same experimental iPSC-CMs shown in panels A and B. Digital twins were created by extracting all 52 model parameters from the experimental whole-cell current using the deep learning inference pipeline. These parameters were used to instantiate cell-specific computational models, and AP simulations (red traces) were generated. The close overlay between the experimental traces (**black**) and digital twin predictions (**red**) demonstrates that the framework accurately derives cell-specific digital twins from real cells.

The experimental cells in this study exhibited remarkably large variability in AP morphology, cycle length, and plateau characteristics, reflecting the heterogeneity typical of iPSC-CM preparations. Capturing this breadth of behavior in the modeling pipeline required a correspondingly broad synthetic training dataset. To achieve this, we generated a large in silico population of 1,100,000 iPSC-CMs by applying ±40% random variation to 52 biophysical parameters governing six major ionic currents I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f relative to the baseline model. This expanded range ensured that extreme and rare phenotypes present in the experimental recordings were represented in the training space.

A second major consideration was temperature. While the iPSC-CM models were originally parameterized at 37 °C, experimental iPSC-CM data are commonly acquired at room temperature. To reconcile this mismatch, we systematically slowed all channel kinetics in the synthetic training population using a Q₁₀ factor of 3, a coarse but effective first approximation of the slower gating dynamics observed experimentally. This adjustment allowed the network to be trained on current waveforms that more closely matched the temporal characteristics of the experimental recordings.
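The Q₁₀ adjustment corresponds to the standard rate-scaling relation, in which rates are multiplied (and time constants divided) by Q₁₀^((T − T_ref)/10). The sketch below assumes a room temperature of 22 °C purely for illustration, since the exact recording temperature is not stated here; with Q₁₀ = 3 it implies gating roughly 5.2-fold slower than at 37 °C.

```python
def q10_rate_factor(q10: float, temp_c: float, ref_temp_c: float = 37.0) -> float:
    """Factor multiplying gating rates at temp_c relative to ref_temp_c."""
    return q10 ** ((temp_c - ref_temp_c) / 10.0)


def scale_time_constant(tau_ref_ms: float, q10: float, temp_c: float,
                        ref_temp_c: float = 37.0) -> float:
    """Time constants scale inversely with rates: slower kinetics mean larger tau."""
    return tau_ref_ms / q10_rate_factor(q10, temp_c, ref_temp_c)


ROOM_TEMP_C = 22.0  # assumption for illustration only
factor = q10_rate_factor(3.0, ROOM_TEMP_C)
print(round(factor, 4))                                        # 0.1925
print(round(scale_time_constant(10.0, 3.0, ROOM_TEMP_C), 2))   # 51.96
```

Applying one shared factor to every gating rate is what makes this a coarse approximation: real channels have individual Q₁₀ values, and conductances are often scaled separately.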

Whole-cell currents recorded from two live iPSC-CMs in response to the protocol described in Figure 3 (Figure 5A and B, insets) served as the sole input to the trained deep learning network, which inferred all 52 biophysical parameters governing the six major ionic currents I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f. These parameter sets were used to instantiate personalized digital twins for each experimental cell.

When simulated, the digital twins reproduced the experimentally recorded APs with excellent agreement, capturing fine-scale features of depolarization rate, plateau amplitude and duration, and repolarization slope (Figure 5C–D). This agreement indicates that the pipeline reconstructs the mechanistic basis of cell-specific electrophysiology from a single experimental protocol.

To investigate how intrinsic variability among iPSC-CMs influences proarrhythmic susceptibility, we compared the responses of two representative model cells with distinct repolarization dynamics (Figure 6A–B). Under control conditions at 37 °C, both Cell 1 and Cell 2 generated stable APs with similar morphology. Exposure to the selective I_Kr (rapid delayed rectifier potassium current) blocker E-4031 (50 nM) markedly prolonged repolarization in both models, but only Cell 1 exhibited early afterdepolarizations (EADs); Cell 2 maintained stable repolarization. These findings highlight the strong impact of intrinsic cellular variability on proarrhythmic susceptibility and emphasize the value of digital twin models in capturing

Cell-to-cell variability drives population-level responses to drug application in large synthetic iPSC-CM digital twins.
**(A)** iPSC-CM model **Cell 1** (corresponding to Figure 5A). **(B)** iPSC-CM model **Cell 2** (corresponding to Figure 5B). In both panels, the **top traces** show control simulations at physiological temperature (37 °C). The **middle row** shows responses in the presence of E-4031 (50 nM). **Early afterdepolarizations (EADs) occurred in Cell 1 (red) at 50 nM,** but not in **Cell 2 (blue)**. **(C)** We applied ±20% perturbations to all parameters governing six key ionic currents (I_K1, I_Kr, I_Ks, I_CaL, I_Na, I_f) in Cell 1 (orange) and Cell 2 (cyan) to generate a population of virtual cells (n = 4000) representing the spectrum of cell-to-cell variability within a cell line. Overlaid membrane potential traces (black) show APs from Cell 1 and Cell 2. The population average action potential duration at 90% repolarization (APD₉₀) was 403 ± 47 ms. **(D)** Incidence of EADs (%) in the virtual population as a function of E-4031 concentration. This population-based digital twin modeling framework enables statistical comparisons across cell lines or patient-specific digital twins.

To extend this analysis beyond single cells, we generated two in silico populations (n = 4000) derived from the network-predicted parameter sets of Cell 1 and Cell 2 (Figure 6C). Each population incorporated ±20% perturbations to all parameters governing the six major ionic currents (I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f), representing the intrinsic cell-to-cell variability expected within a single human iPSC line. Under control conditions, both populations showed stable spontaneous firing with overlapping AP distributions, reflecting baseline heterogeneity in electrophysiological properties.

The resulting population captured a broad distribution of repolarization phenotypes (average APD₉₀ = 403 ± 47 ms). Upon exposure to increasing concentrations of E-4031, the incidence of EADs increased in a dose-dependent manner, reaching approximately 3% of simulated cells at 50 nM (Figure 6D). These results demonstrate that modest cell-to-cell variability in ion-channel properties can substantially amplify the heterogeneity of proarrhythmic responses across a population. The digital-twin framework therefore enables statistically robust quantification of arrhythmia risk by bridging single-cell electrophysiological diversity to population-level drug sensitivity.
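Population metrics such as APD₉₀ and EAD incidence can be extracted from simulated traces along the lines of the sketch below. The detection criteria are hypothetical choices for illustration (activation at maximal upstroke dV/dt, an APD₉₀ threshold relative to the trace minimum, and an EAD defined as any positive dV/dt above a small threshold between the AP peak and 90% repolarization); published EAD criteria vary, and the ones used in this study are not specified here.

```python
from typing import Sequence, Tuple


def apd90_and_ead(v: Sequence[float], dt_ms: float,
                  ead_dvdt_threshold: float = 0.1) -> Tuple[float, bool]:
    """Return (APD90 in ms, EAD flag) for a single-beat trace sampled every dt_ms.
    Hypothetical criteria; see the lead-in for the assumptions."""
    dvdt = [(v[i + 1] - v[i]) / dt_ms for i in range(len(v) - 1)]
    i_up = max(range(len(dvdt)), key=dvdt.__getitem__)       # maximal upstroke velocity
    i_peak = max(range(i_up, len(v)), key=v.__getitem__)      # AP peak after the upstroke
    v_thr = v[i_peak] - 0.9 * (v[i_peak] - min(v))            # 90% repolarization level
    i_rep = next((i for i in range(i_peak, len(v)) if v[i] <= v_thr), None)
    if i_rep is None:
        raise ValueError("trace does not reach 90% repolarization")
    # EAD: any sustained depolarizing slope during repolarization.
    has_ead = any(d > ead_dvdt_threshold for d in dvdt[i_peak:i_rep])
    return (i_rep - i_up) * dt_ms, has_ead


def ead_incidence_percent(flags: Sequence[bool]) -> float:
    """Percentage of cells in a population flagged as having an EAD."""
    return 100.0 * sum(flags) / len(flags)


# Toy beat: rest at -80 mV, jump to +30 mV, linear repolarization back to -80 mV.
normal = [-80.0] * 10 + [30.0 - 0.5 * i for i in range(221)] + [-80.0] * 20
# Same beat with a small depolarizing bump during the plateau.
with_ead = list(normal)
for k in range(10):
    with_ead[60 + k] += 1.0 * k

apd_a, ead_a = apd90_and_ead(normal, dt_ms=1.0)
apd_b, ead_b = apd90_and_ead(with_ead, dt_ms=1.0)
print(apd_a, ead_a, ead_b, ead_incidence_percent([ead_a, ead_b]))
```

Running the same two functions over every cell at each E-4031 concentration gives the mean ± SD of APD₉₀ and the dose-dependent EAD incidence curve summarized in Figure 6C–D.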

Discussion

This study demonstrates, for the first time, the development of cell-specific digital twins of human iPSC-CMs directly from a single experimental recording, using an approach that is readily generalizable to any excitable cell type. By combining an optimized voltage-clamp protocol, large-scale synthetic training populations, and deep learning–based parameter inference, we reconstructed the full set of 52 ionic parameters underlying individual cell electrophysiology and reproduced AP and current waveforms with high fidelity. This study addresses a longstanding limitation in cardiac modeling: the inability to rapidly and accurately resolve the individualized ionic mechanisms that drive variability in both health and disease. The framework proved robust to two of the most challenging sources of variation in iPSC-CM studies, temperature differences and broad waveform diversity, highlighting its potential as a scalable and adaptable tool for basic research, preclinical safety testing, and precision medicine.

A broad range of AP morphologies and firing rates in our simulated iPSC-CM population were readily evoked with changes of ±40% across ionic parameters (Figure 1). This mirrors experimental reports of substantial heterogeneity in iPSC-CM electrophysiological properties, which may arise from differences in differentiation state, ion channel expression, and culture conditions^34–39. Such diversity has important implications for both mechanistic interpretation and predictive modeling. It suggests that population-based approaches, rather than reliance on single-cell models, are essential for capturing the spectrum of cellular responses^40–43. The framework presented here is well-suited to account for variability, providing a foundation for future studies that link experimental heterogeneity to underlying ionic mechanisms and for developing more robust, generalizable digital twin models.

Figure 2 highlights the key technology developed in this study, a deep learning-based workflow to recover the key ionic parameter set of a cell from a single voltage-clamp experiment. The synthetic training population of 200,000 iPSC-CMs, generated by introducing random variation to 52 parameters governing six key ionic currents I_Kr, I_CaL, I_Na, I_Ks, I_K1, and I_f, ensured that the network was exposed to a broad spectrum of AP morphologies and whole-cell current profiles.

We developed a data-driven protocol design process guided by deep learning (Figure 3). In this approach, candidate voltage segments were iteratively evaluated to minimize parameter prediction error in synthetic populations, enabling the discovery of less variable, more targeted protocols. This optimization strategy effectively probed multiple ionic currents while reducing voltage variation, thereby improving voltage control and cell tolerance. The adaptive nature of the design process ensured that protocols are both information-rich and experimentally viable, providing a robust foundation for accurate parameter inference and digital twin construction from live iPSC-CM recordings.

The iterative deep learning-guided approach to voltage-clamp protocol optimization offers clear advantages over traditional manual design strategies. Conventional protocols are typically based on heuristic knowledge of channel gating properties and are rarely tailored to maximize information for multi-parameter inference. Our adaptive framework directly couples experimental design with model performance metrics, using MSE as a quantitative feedback signal to refine the protocol. Each iteration therefore selectively enriches the data with the features most discriminative for parameter recovery, accelerating convergence to an optimal waveform. Importantly, because the optimization is performed in silico using large-scale synthetic populations, the resulting protocol is both data-driven and generalizable, capturing a wide range of ionic phenotypes. Once developed, such a protocol can be implemented in experimental systems without modification, offering a powerful route to improve the precision and efficiency of digital twin construction in human cardiomyocytes.

The clear improvement in predictive accuracy and stability with increasing training dataset size (Figure 4) highlights a fundamental consideration for applying deep learning to biophysically detailed cardiac modeling. Larger datasets capture a wider range of morphological and kinetic variability in ionic currents, reducing the risk of overfitting and enabling more reliable generalization to new experimental data. The largest dataset produced digital twins that matched ground truth across APs, calcium dynamics, and individual currents, underscoring the value of comprehensive, diverse training data for high-fidelity reproduction at the single-cell level.

The deep learning-guided protocol optimization framework described here lays the groundwork for the capstone result presented in Figure 5, where fully parameterized digital twins reproduced experimental iPSC-CM APs with excellent fidelity. By training on large-scale, information-rich synthetic datasets generated through optimized protocols, the platform achieves robustness to both temperature variation and broad morphological diversity in electrophysiological signals. This adaptability ensures that the same modeling pipeline can be rapidly retuned to diverse experimental conditions and phenotypes without sacrificing accuracy. As a result, the approach is not limited to in vitro modeling but is positioned for translational deployment, where personalized, physiology-grounded digital twins could guide patient-specific diagnostics, risk assessment, and therapy optimization. Moving seamlessly from protocol design to clinically relevant predictive modeling represents a decisive step toward the integration of digital twins into the practice of precision cardiovascular medicine.

Figure 6 illustrates how intrinsic electrophysiological variability among iPSC-CMs can profoundly influence responses to pharmacological perturbation, even among cells derived from the same genetic background. Small differences in ionic conductance and gating kinetics were sufficient to determine whether a cell remained stable or developed EADs under identical drug exposure.

At the population level, the dose-dependent emergence of EADs provides a quantitative biomarker of proarrhythmic risk that cannot be captured by single deterministic simulations. Importantly, this emergent variability does not require extreme outlier phenotypes but arises naturally from multivariate perturbations around experimentally constrained parameter sets. This highlights the predictive value of large-scale digital twin populations for mechanistic safety pharmacology, enabling the translation of cell-level uncertainty into statistically robust risk metrics. Beyond E-4031, this framework can be generalized to compare patient-derived iPSC-CM lines or therapeutic perturbations, supporting the development of precision cardiotoxicity assessment pipelines that account for both genetic and stochastic sources of electrophysiological diversity.
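One way to turn an incidence such as the approximately 3% of n = 4000 cells reported above into a statistical risk metric is to attach a binomial confidence interval. The Wilson score interval below is our illustrative choice, not necessarily the statistic used in this study, and the count of 120 EAD cells is simply 3% of 4000 for the example.

```python
import math
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z = 1.96 gives ~95%)."""
    p = k / n
    denom = 1.0 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Hypothetical count: 120 of 4000 virtual cells with EADs (3%).
lo, hi = wilson_interval(120, 4000)
print(f"{lo:.4f} to {hi:.4f}")
```

An interval of roughly 2.5% to 3.6% makes explicit how precisely a 4000-cell population pins down EAD risk, and non-overlapping intervals offer a simple first check when comparing cell lines or compounds.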

The success of our parameter inference framework relies critically on large, diverse synthetic datasets that could not be obtained through experimental recordings alone. Training the deep learning model required hundreds of thousands to millions of cellular signals. Acquiring high-fidelity voltage-clamp data at this scale experimentally would be prohibitively time-consuming, costly, and technically infeasible. In contrast, computationally generated data allow complete control over underlying parameters, precise labeling of ground truth values, and systematic exploration of parameter space well beyond what is directly observable in experiments. This approach not only accelerates model training but also ensures exposure to rare or extreme phenotypes and improves network generalization across diverse experimental cells.

In terms of scope, our approach has some limitations. The framework emphasizes six major ionic currents and a set of supporting mechanisms, but it does not yet incorporate all ionic currents relevant to cardiac electrophysiology. Including additional currents would require more parameters, increasing both model complexity and the amount of training data required. Future iterations will aim to address this limitation. Another limitation is the substantial computational resources needed to generate large synthetic datasets and train deep neural networks. Once training is complete, however, parameter inference is rapid and efficient, making the framework practical for downstream applications.

Although we developed and validated the digital twin framework both in silico and in vitro using human iPSC-derived cardiomyocytes, the deep learning–guided protocol optimization framework can be extended to a wide range of excitable cell systems and experimental contexts. In principle, the same adaptive feedback loop could be deployed in patch-clamp or multi-electrode array recordings from any animal or human cell type, including neurons, smooth muscle cells, or engineered tissue constructs, where capturing rich ionic dynamics is essential for mechanistic modeling. Furthermore, by incorporating additional physiological variables such as calcium handling or mechano-electric feedback, the approach could evolve into a multi-modal protocol design tool capable of constraining both electrical and mechanical parameters in digital twin models.

We anticipate that near-future studies will use the digital twin framework for high-throughput, cell-specific phenotyping in disease modeling. This would allow investigators to map the electrophysiology of patient-derived cardiomyocytes to individualized ionic mechanisms from a single experiment. The framework maintains accuracy across changes in temperature and accommodates wide variation in AP morphology. These properties make it suitable for multicenter studies in which experimental conditions differ, where it could improve reproducibility and enable harmonization of data. In preclinical safety assessment, temperature-independent digital twins could be used to evaluate the effects of new compounds at scale across a spectrum of cardiac phenotypes, identifying subgroups at increased risk. Looking further ahead, integrating this modeling platform with longitudinal patient data, including genetic information, imaging, and clinical outcomes, could give rise to living cardiac digital twins that evolve over time. Such models may ultimately support predictive diagnostics, therapy optimization, and continuous monitoring in personalized healthcare.

Methods

Human hiPSC-derived Ventricular Cardiomyocyte Differentiation and Culture

The hiPSC line iPS-6-9-9T.B was purchased from WiCell and cultured in StemMACS iPS-Brew XF medium (Miltenyi Biotec) on hESC-qualified Matrigel (Corning) at 37°C with 5% CO₂, and passaged at ∼70% confluency. Ventricular cardiomyocytes (VCMs) were differentiated by treating confluent hiPSCs with 6 µM CHIR99021 (Tocris) from day 0 to day 1 (D0–D1) and with 5 µM IWR-1 (Tocris) from D2 to D4 in RPMI 1640 supplemented with B27 without insulin (Thermo Fisher). From D7 onward, hiPSC-derived VCMs were cultured in cardiomyocyte (CM) maintenance medium consisting of RPMI 1640 with insulin-containing B27 supplement. The hiPSC-VCMs were replated at ∼D14 and allowed to proliferate in CM maintenance medium with 2 µM CHIR99021 for ∼7 days before being replated onto Matrigel-coated glass coverslips for patch-clamp experiments.

Experimental electrophysiology recordings

All electrophysiological recordings were performed at room temperature with an Axopatch 200A amplifier coupled to an Axon Digidata 1550B plus HumSilencer digitizer (Molecular Devices, San José, CA, USA). Cultured ventricular myocytes were placed on the stage of an inverted microscope (IX-71; Olympus Corporation, Tokyo, Japan) and continuously perfused with Tyrode’s solution. Patch pipettes were pulled from borosilicate glass capillaries (Sutter Instrument, Novato, CA, USA) and filled with the appropriate internal solution. The pipette solution contained (in mmol L⁻¹): 140 KCl, 5 NaCl, 10 EGTA, 5 Mg-ATP, and 10 HEPES, adjusted to pH 7.2 with KOH. Ionic currents were recorded in the whole-cell configuration of the patch-clamp technique at a sampling frequency of 10 kHz. After achieving the whole-cell configuration, the amplifier was switched to current-clamp mode for membrane potential measurements. The computationally optimized voltage-pulse protocols (see below) were used to efficiently probe membrane currents. For experimental validation datasets, any incomplete or failed recordings were excluded prior to analysis.

Computationally optimized voltage-clamp protocol

A computationally optimized voltage-clamp protocol was designed to activate individual ionic currents sequentially and selectively while capturing their dynamic interactions (see Figure 3). This protocol was applied to each synthetic cell to produce time-series data of the total ionic current (I_Total). Data were generated at a 0.1 ms time step and downsampled to match experimental acquisition rates. Once the final optimized protocol had been derived, the I_Total time series served as the input to the fully connected neural network. The network's output layer contained 52 units, one for each model parameter, enabling extraction of the model parameters.
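The sampling arithmetic can be sketched as follows. The protocol here is a hypothetical piecewise-constant command; the step voltages and durations are placeholders, not the optimized protocol of Figure 3. The point is that a 0.1 ms simulation step corresponds to 10 kHz, so matching the 10 kHz experimental rate gives a decimation factor of 1, while a slower acquisition rate would give a larger factor. The function names are illustrative.

```python
def sample_protocol(steps, dt_ms=0.1):
    """Expand (voltage_mV, duration_ms) steps into one command value per time step."""
    trace = []
    for v_mV, dur_ms in steps:
        n = round(dur_ms / dt_ms)  # round() absorbs floating-point error in dur/dt
        trace.extend([v_mV] * n)
    return trace


def decimate(trace, sim_dt_ms=0.1, acq_rate_hz=10_000):
    """Keep every k-th sample so the simulated series matches the acquisition rate."""
    sim_rate_hz = 1000.0 / sim_dt_ms  # a 0.1 ms step corresponds to 10 kHz
    k = round(sim_rate_hz / acq_rate_hz)
    if k < 1:
        raise ValueError("acquisition rate exceeds the simulation rate")
    return trace[::k], k


# Hypothetical 300 ms protocol: holding, two test steps, return to holding.
cmd = sample_protocol([(-80, 50), (-40, 100), (0, 100), (-80, 50)])
cmd_10k, k = decimate(cmd)  # k == 1: no samples are dropped at 10 kHz
```

Plain slicing is adequate for a piecewise-constant command. Decimating a simulated current trace by a factor greater than 1 would normally include low-pass filtering first to avoid aliasing.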

iPSC-CM electrophysiological model

A baseline iPSC-CM electrophysiological model was parameterized with 52 biophysical variables (model equations are detailed in ⁴⁴) representing maximal conductances, gating kinetics, and calcium handling parameters for six major ionic currents (I_K1, I_Kr, I_Ks, I_CaL, I_Na, I_f). We slowed all channel kinetics in the synthetic training population using a Q₁₀ factor of 3. Two synthetic populations of model cells were generated. The first consisted of 200,000 cells simulated at 310 K (physiological temperature), and the second included 1,100,000 cells simulated at 298 K (room temperature). In both populations, uniformly distributed random perturbations of ±40% relative to baseline values were applied to each model parameter to generate physiologically plausible variability. Because the training dataset was synthetically generated, no data were missing.
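The population-generation step can be sketched as follows, assuming independent uniform scale factors in [0.6, 1.4] for each parameter and the conventional Q₁₀ rule, under which rates scale by Q₁₀^((T − T_ref)/10). The parameter names and baseline values are hypothetical placeholders, not the 52 parameters of the published model. With Q₁₀ = 3, cooling from 310 K to 298 K scales rates by 3^(−1.2) ≈ 0.268.

```python
import random


def q10_factor(temp_K, ref_temp_K=310.0, q10=3.0):
    """Multiplicative rate factor from the Q10 rule: q10 ** ((T - T_ref) / 10)."""
    return q10 ** ((temp_K - ref_temp_K) / 10.0)


def sample_population(baseline, n_cells, spread=0.40, seed=0):
    """Draw n_cells parameter sets; each parameter is scaled by U(1 - spread, 1 + spread)."""
    rng = random.Random(seed)  # seeded so the population can be regenerated exactly
    return [
        {name: value * rng.uniform(1.0 - spread, 1.0 + spread)
         for name, value in baseline.items()}
        for _ in range(n_cells)
    ]


# Hypothetical baseline values (placeholders, not the published model).
baseline = {"g_Kr": 0.1, "g_CaL": 1.0, "tau_f_scale": 1.0}
population = sample_population(baseline, n_cells=5)
rate_scale_298K = q10_factor(298.0)  # about 0.268: kinetics slowed at room temperature
```

Seeding the generator makes the population reproducible from code alone. This is the property the Data availability statement relies on when it offers the generation code in place of the raw 1.1M-cell dataset.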

Fully connected layers

Fully connected deep neural networks with twenty-six double-precision hidden layers (1024 nodes each; tanh activation) were implemented in TensorFlow. Every neuron in a layer was connected to every neuron in the next layer ⁴⁵. Each fully connected layer computed a weighted sum of its inputs (for the first layer, the I_Total time series) and added a bias term. The result was then passed through a nonlinear activation function f to define the output of each neuron (Eqs. 1 and 2) ⁴⁶.
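In the notation defined below, one standard way to write Eqs. 1 and 2 is the following, with n = 26 hidden layers. This is our rendering of a conventional fully connected layer. The linear (identity) output layer in Eq. 2 is an assumption rather than a stated detail.

```latex
\begin{align}
a_j^{k} &= f\!\left(\sum_{i} W_{i,j}^{k}\, a_i^{k-1} + b_j^{k}\right), \qquad k \in \{1, \dots, n\},\; f = \tanh \tag{1}\\
\hat{p}_j = a_j^{n+1} &= \sum_{i} W_{i,j}^{n+1}\, a_i^{n} + b_j^{n+1} \tag{2}
\end{align}
```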

where k ∈ {1, …, n} indexes the hidden layers and (i, j) index the neurons in each pair of consecutive layers. The number of hidden layers and the number of neurons per layer were optimized by hyperparameter tuning. Here a^k denotes the neuron outputs of layer k, a⁰ ∈ {IC₁, …, IC_m} is the input to the fully connected layers, and aⁿ⁺¹ = p̂ is the network output, i.e., the predicted parameters. To start the search for the best network architecture, we first assigned random values to all network parameters θ_t. These comprised each neuron weight W_i,j and bias term b (a constant added when computing a neuron's output), as well as the network hyperparameters (the number of hidden layers, the number of neurons in each hidden layer, and the activation function of each hidden layer).
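To make the layer recursion concrete, here is a small pure-Python forward pass with random initial weights and biases. It applies tanh in the hidden layers as described above and, as an assumption, a linear output layer. The function names and toy sizes are illustrative: 10 input samples, two hidden layers of 8 neurons, and 3 outputs. The network described here used 26 hidden layers of 1024 neurons and 52 outputs, and was built in TensorFlow/Keras rather than plain Python.

```python
import math
import random


def init_network(layer_sizes, seed=0):
    """Random weights W[i][j] and biases b[j] for each pair of consecutive layers."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # keeps pre-activations roughly O(1)
        W = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
        b = [rng.gauss(0.0, 0.01) for _ in range(n_out)]
        params.append((W, b))
    return params


def forward(params, a0):
    """Apply Eq. 1 (tanh) in hidden layers, then an assumed linear output layer for p_hat."""
    a = list(a0)
    last = len(params) - 1
    for k, (W, b) in enumerate(params):
        # z_j = sum_i W[i][j] * a[i] + b[j]: weighted sum of the previous layer plus bias
        z = [sum(W[i][j] * a[i] for i in range(len(a))) + b[j] for j in range(len(b))]
        a = z if k == last else [math.tanh(v) for v in z]
    return a


# Toy sizes: 10 input samples -> two hidden layers of 8 -> 3 outputs.
params = init_network([10, 8, 8, 3])
i_total = [math.sin(0.3 * t) for t in range(10)]  # stand-in for a sampled I_Total trace
p_hat = forward(params, i_total)
```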

Network performance

Networks were trained using a mean squared error (MSE) loss and the Adam optimizer (learning rate 1 × 10⁻⁴). The MSE between the true and predicted model parameters (Eq. 3) also served as the evaluation metric ^47,48.
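Using the symbols defined below, the usual mean-squared-error form of Eq. 3 is as follows. This is our rendering and assumes a plain, unweighted average over the m terms.

```latex
\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left( p_i - \hat{p}_i \right)^{2} \tag{3}
```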

where m is the total number of whole-cell membrane current plus calcium transient inputs (IC_m), and p_i and p̂_i are the true and predicted (network output) values of the corresponding parameters for the sample cells. We used the hyperbolic tangent (tanh) as the activation function in Eq. 1 to compute the output of each hidden-layer neuron at each training iteration.

To evaluate the impact of dataset size on predictive performance, we trained networks on datasets of 1,000, 10,000, 200,000, and 1,100,000 cells. The largest dataset (1.1M cells) was selected to ensure broad exploration of parameter space. The subsets of 1,000, 10,000, and 200,000 cells were independently sampled from the full population to characterize learning-curve effects.

We split the data into training, validation, and test sets in a 70:10:20 ratio and implemented the network architecture in Keras ⁴⁹. Each mini-batch contained 1% of the samples, so one epoch required 70 iterations. While monitoring the MSE, we iteratively refined the voltage-clamp protocol to reduce the error, progressively approaching an optimal data-generation protocol for parameter extraction. To prevent overfitting, we computed the evaluation metrics on the validation data at each training iteration and compared them with those on the training data. Tuning of the network hyperparameters stopped once performance on the validation data began to degrade relative to the training data. Model accuracy was quantified as the median, across parameters, of the mean absolute error (MAE). This approach allowed us to extract key ion channel parameters from whole-cell currents, such that the reconstructed ionic current and AP traces closely mirror the physiological behavior of individual cardiac cells.
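The bookkeeping described above can be checked with a short sketch. It covers a 70:10:20 split of 1.1M cells and mini-batches of 1% of the full dataset, which gives 70 iterations per epoch. It also summarizes accuracy as the median, over parameters, of each parameter's mean absolute error. Treating the batch size as 1% of the whole dataset rather than of the training split is our reading, chosen because it reproduces the stated 70 iterations. The function names are hypothetical, and integer percentages are used to avoid floating-point rounding in the split.

```python
import statistics


def split_sizes(n_total, percents=(70, 10, 20)):
    """Integer train/validation/test sizes; any rounding remainder goes to the test set."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    n_train = n_total * percents[0] // 100
    n_val = n_total * percents[1] // 100
    return n_train, n_val, n_total - n_train - n_val


def iterations_per_epoch(n_total, batch_percent=1, train_percent=70):
    """Mini-batches needed to pass once over the training split."""
    batch = n_total * batch_percent // 100
    n_train = n_total * train_percent // 100
    return -(-n_train // batch)  # ceiling division


def median_mae(true_params, pred_params):
    """true/pred: one list of parameter values per cell, in the same parameter order."""
    n_params = len(true_params[0])
    maes = [
        statistics.fmean(abs(t[j] - p[j]) for t, p in zip(true_params, pred_params))
        for j in range(n_params)
    ]
    return statistics.median(maes)


print(split_sizes(1_100_000), iterations_per_epoch(1_100_000))
```

The median across parameters is less sensitive than the mean to a few poorly identifiable parameters with large errors. This is presumably why it suits a 52-parameter output.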

Population generation and drug testing

We generated a population by applying ±20% perturbations to all parameters associated with the six major ionic currents (I_K1, I_Kr, I_Ks, I_CaL, I_Na, I_f) around the network-predicted parameter sets for Cell 1 and Cell 2, both derived from the hiPSC line iPS-6-9-9T.B. This procedure produced a unified 4000-cell population that captures intrinsic cell-to-cell variability within a shared genetic background. Simulations were performed under control conditions and in the presence of a range of E-4031 concentrations using a Hill-type hERG block model.

Simulation of I_Kr Blockade

To simulate the inhibitory effects of E-4031 on I_Kr, we reduced the peak conductance G_Kr of each model cell in a concentration-dependent manner, using a concentration–response relationship with a Hill coefficient of 1 (n = 1):

$$G_{Kr} = \frac{G_{Kr,max}}{1 + \left([\text{E-4031}]/IC_{50}\right)^{n}}$$

where [E-4031] is the drug concentration, G_Kr,max is the nominal conductance obtained from each ventricular myocyte model, and IC₅₀ (7 nM) ^44,50 is the drug concentration that produces 50% inhibition of the targeted current.
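The Hill-type block with n = 1 and IC₅₀ = 7 nM can be sketched as follows. The function scales a cell's drug-free G_Kr by the unblocked fraction 1 / (1 + ([D]/IC₅₀)ⁿ). At [D] = IC₅₀ the conductance is halved, and at 63 nM (9 × IC₅₀) only 10% remains. The function and argument names are illustrative. In a population study, the same scaling would be applied to each cell's own G_Kr,max before running the AP simulation.

```python
def blocked_gkr(g_kr_max, conc_nM, ic50_nM=7.0, hill=1.0):
    """Return G_Kr under a Hill-type block: g_kr_max / (1 + (conc / IC50) ** hill)."""
    if conc_nM < 0:
        raise ValueError("concentration must be non-negative")
    return g_kr_max / (1.0 + (conc_nM / ic50_nM) ** hill)


# Remaining fraction of G_Kr,max across a hypothetical concentration series (nM).
for conc in (0.0, 7.0, 21.0, 63.0):
    print(f"{conc:5.1f} nM -> {blocked_gkr(1.0, conc):.3f} of G_Kr,max")
```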

Simulation and analysis code was written in C++, Python (with TensorFlow), and MATLAB R2018a (The MathWorks, Natick, MA, USA). The code was executed on two systems: a BIZON ZX5500 workstation with a 2.7 GHz 64-core AMD Threadripper Pro processor, an NVIDIA H100 GPU, Python 3.12, and TensorFlow 2.19; and a Mercury GPU408 server with a 2.55 GHz 192-core AMD EPYC 9684 processor, four NVIDIA H100 GPUs, Python 3.10, and TensorFlow 2.18. C++ code was compiled with GCC 10.5. Numerical results were visualized in MATLAB R2018a.

Data availability

Due to the large size of the synthetic datasets (∼1.1M cells), the raw data are not hosted online. However, a representative dataset of 100 cells is provided as an example, and the complete codebase for dataset generation, along with parameter distributions and simulation protocols, is publicly available at GitHub (https://github.com/ClancyLabUCD/Digital-Twin-for-the-Win-Personalized-Cardiac-Electrophysiology). This allows full regeneration of the training and validation datasets as described in the study.

Additional information

Contributions

P.C.Y., M.T.J., and C.E.C. designed the research study, conducted simulations, acquired data, and analyzed data. D.K.L. and R.S. performed sample preparation and differentiation of the iPSC lines. G.H.H. provided experimental support. L.F.S. conducted the electrophysiological recordings. P.C.Y., L.F.S., and C.E.C. analyzed data and wrote the manuscript.

Sources of funding

This work was supported in part by the National Institutes of Health under grants R01HL128537, OT2OD026580, and R01HL17400 (to C. E. Clancy and L. F. Santana) and R01HL159492 (to D. K. Lieu). G. Hernandez-Hernandez was supported by the UC Davis Chancellor’s Postdoctoral Fellowship. Additional support from the UC Davis Center for Precision Medicine and Data Sciences was provided to C. E. Clancy and P.-C. Yang and for computing facilities.

Funding

HHS | NIH | National Heart, Lung, and Blood Institute (NHLBI) (R01HL128537)

Colleen E Clancy
L Fernando Santana

HHS | NIH | National Heart, Lung, and Blood Institute (NHLBI) (OT2OD026580)

Colleen E Clancy
L Fernando Santana

HHS | NIH | National Heart, Lung, and Blood Institute (NHLBI) (R01HL17400)

L Fernando Santana
Colleen E Clancy

HHS | NIH | National Heart, Lung, and Blood Institute (NHLBI) (R01HL159492)

Deborah K Lieu

UC | University of California, Davis (UCD) (Chancellor's Postdoctoral Fellowship)

Gonzalo Hernandez-Hernandez

UC | University of California, Davis (UCD) (Center for Precision Medicine and Data Sciences)

Pei-Chi Yang
Colleen E Clancy
