ElectroPhysiomeGAN: Generation of Biophysical Neuron Model Parameters from Recorded Electrophysiological Responses

  1. Department of Electrical and Computer Engineering, University of Washington, Seattle, United States
  2. Department of Neuroscience, City University of Hong Kong, Hong Kong, Hong Kong
  3. Department of Applied Mathematics, University of Washington, Seattle, United States

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.


Editors

  • Reviewing Editor
    Upinder Bhalla
    National Centre for Biological Sciences, Bangalore, India
  • Senior Editor
    Panayiota Poirazi
    FORTH Institute of Molecular Biology and Biotechnology, Heraklion, Greece

Reviewer #1 (Public review):

The work by Kim et al. shows that a parameter generator for biophysical HH-like models can be trained through a GAN-based approach to reproduce experimentally measured voltage responses and IV curves.
A particularly interesting aspect of this generator is that, once it has been learned, it can be applied to new recordings to generate appropriate parameter sets at low computational cost, a feature missing from more commonplace evolutionary approaches.

I appreciate the changes the authors have made to the manuscript. The authors have clarified their inverse gradient method. They also provide a better validation and a rich set of ablations. However, I still have major concerns that should be addressed.

Major concerns:

(1) The bad equilibria of the model remain a concern, as do other features such as the transient overshoots that do not match the data. I think the authors could achieve more accuracy here by assigning more weight to such specific features, adding them explicitly as separate objectives for the generator. The traces contain a five-second current step, with one second before and one second after the step. This means that in the RMSE, the current-step amplitude will dominate as a feature, as this is simply the state for which the data trace contains the most time-points. Note that this is further exacerbated by using the IV curve as an auxiliary objective. I believe a better exploration of specific response features, incorporated as independently weighted loss terms for the generator, could improve the fit. E.g., one auxiliary term could be the equilibrium before and after the current step, another could penalise response traces that do not converge back to their initial equilibrium, etc.
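For concreteness, a minimal sketch of what such independently weighted feature terms could look like (illustrative PyTorch; the names v_pred, v_data, stim_on, stim_off and the weights are hypothetical, not from the manuscript):

  import torch

  def feature_weighted_loss(v_pred, v_data, stim_on, stim_off,
                            w_step=1.0, w_eq=1.0, w_ret=1.0):
      # v_pred, v_data: (batch, time) voltage traces; stim_on/stim_off are the
      # grid indices of current-step onset/offset; weights are hyperparameters.
      # Squared error during the step (the term that currently dominates the RMSE)
      step = ((v_pred[:, stim_on:stim_off] - v_data[:, stim_on:stim_off]) ** 2).mean()
      # Equilibrium before and after the step, weighted independently of the step
      eq = ((v_pred[:, :stim_on].mean(1) - v_data[:, :stim_on].mean(1)) ** 2).mean() \
         + ((v_pred[:, stim_off:].mean(1) - v_data[:, stim_off:].mean(1)) ** 2).mean()
      # Penalise traces that do not converge back to their initial equilibrium
      ret = ((v_pred[:, -1] - v_pred[:, :stim_on].mean(1)) ** 2).mean()
      return w_step * step + w_eq * eq + w_ret * ret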

(2) The explanation of what the authors mean by 'inverse gradient operation' is clear now. However, this term is mathematically imprecise, as the inverse gradient does not exist because the gradient operator is not injective. The method is simply forward integration under the assumption that the derivative of the voltage is known at the grid time-points, and should be described as such.
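(For concreteness, under this forward-integration reading with an assumed forward-Euler discretization, the reconstruction is

$$\hat{V}_{n+1} = \hat{V}_n + (t_{n+1} - t_n)\, f(\hat{V}_n, p), \qquad \hat{V}_0 = V(t_0),$$

where $f$ is the right-hand side of the HH equations evaluated with the generated parameters $p$; the dependence on the initial state $V(t_0)$ is then explicit.)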

(3) I appreciate that the authors' method provides parameters of models at a minimal computational cost compared to running an evolutionary optimization for every new recording. I also believe that with some tweaking of the objective, the method could improve in accuracy. However, I share reviewer 2's concerns that the evolutionary baseline methods are not sufficiently explored, as these methods have been used to successfully fit considerably more complex response patterns. One way out of the dilemma is to show that the EP-GAN estimated parameters provide an initial guess that considerably narrows the search space for the evolutionary algorithm. In this context, the authors should also discuss the recent gradient based methods such as Deistler et al. (https://doi.org/10.1101/2024.08.21.608979) or Jones et al (https://doi.org/10.48550/arXiv.2407.04025).

Reviewer #2 (Public review):

Summary:

Generating biophysically detailed computational models that capture the characteristic physiological properties of biological neurons for diverse cell types is an important and difficult problem in computational neuroscience. One major challenge lies in determining the large number of parameters of such models, which are notoriously difficult to fit to experimental data. As a result, the computational and energy costs can be significant. The study 'ElectroPhysiomeGAN: Generation of Biophysical Neuron Model Parameters from Recorded Electrophysiological Responses' by Kim et al. describes a computationally efficient approach for predicting model parameters of Hodgkin-Huxley neuron models using Generative Adversarial Networks (GANs) trained on simulation data. The method is applied to generate models for 9 non-spiking neurons in C. elegans based on electrophysiological recordings. While the generated models capture the responses of these neurons to some degree, they generally show significant deviations from the empirically observed responses in important features. Although EP-GAN shows clear benefits under limited compute, the results do not yet demonstrate the quality needed to match other state-of-the-art methods. Future work examining extended training, larger datasets, or hybrid approaches would help clarify whether EP-GAN can generate models of high quality. If so, this would indeed be a major step forward; if not, the computationally more expensive methods will remain essential.

Strengths:

The authors work on an important and difficult problem. A noteworthy strength of their approach is that once trained, the GANs can generate models from new empirical data with very little computational effort. The generated models reproduce the response to current injections reasonably well.

Weaknesses:

Major 1: Models do not faithfully capture empirical responses. While the models generated with EP-GAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are generally not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside the range of empirically observed membrane potentials. The authors trained an additional GAN (EP-GAN Extended) to improve the fit to the resting membrane potential. Interestingly, for one neuron (AWB), this improved the response during stimulation, which now reproduces the slowly rising membrane potentials observed empirically; however, the neuron still does not reliably return to its resting membrane potential. For the other two neurons, the authors report a decrease in accuracy in comparison to EP-GAN. While such deviations may appear small in the Root Mean Square Error (RMSE), they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron. The authors added a second metric during the revision: the percentage of predicted membrane potential trajectories within the empirical range. I appreciate this additional analysis. However, as the empirical ranges across neurons are far larger than the magnitude of the dynamical properties of the response ('slow ramps', etc.), this metric doesn't seem well suited to quantify to what degree these dynamical properties are captured by the models.

Major 2: Comparison with other approaches is potentially misleading. Throughout the manuscript, the authors claim that their approach outperforms the other approaches tested. But compare the responses of the models in the present manuscript (neurons RIM, AFD, AIY) to the ones provided for the same neurons in Naudin et al. 2022 (https://doi.org/10.1371/journal.pone.0268380). Naudin et al. present models that seem to match empirical data far more accurately than any model presented in the current study. Naudin et al. achieved this using DEMO, an algorithm that in the present manuscript is consistently shown to be among the worst of all algorithms tested. I therefore strongly disagree with the authors' claim that a "Comparison of EP-GAN with existing estimation methods shows EP-GAN advantage in the accuracy of estimated parameters". This may be true in the context of the benchmark performed in the study (i.e., a condition of very limited compute resources: 18 generations with a population size of 600, compared to the 2000 generations recommended in Naudin et al.), but while EP-GAN wins under these specific conditions (and yes, here the authors convincingly show that their EP-GAN produces by far the best results!), other approaches seem to win with respect to the quality of the models they can ultimately generate.

Major 3: As long as the quality of the models generated by the EP-GAN cannot be significantly improved, I am doubtful that it indeed can contribute to the 'ElectroPhysiome', as it seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations. If the authors want to motivate their study based on this very ambitious goal, they should illustrate that single neuron model generation with their approach is robust enough to warrant well-constrained network dynamics. Based on the currently presented results, I find the framing of the manuscript far too bold.

Major 4: The conclusion of the ablation study 'In addition the architecture of EP-GAN permits inference of parameters even when partial membrane potential and steady-state currents profile are given as inputs' does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40 mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off. While it may be that by their metric neurons in the 25% category are classified as 'preserving baseline accuracy', this doesn't seem justified given the voltage traces presented in the manuscript. It appears the metric is not strict enough.

Author response:

The following is the authors’ response to the original reviews.

We thank the reviewers for their valuable feedback and comments. Based on this feedback, we revised the manuscript and believe we have addressed most of the points the reviewers raised. Below we include a summary of key revisions and point-by-point responses to the reviewers' comments.

Abstract/Introduction

We further emphasized EP-GAN's strength in parameter inference for detailed neuron models, as opposed to specialized models with reduced parameters.

Results

We further elaborated on the method of training EP-GAN on synthetic neurons and validating it on both synthetic and experimental neurons.

We added a new section, Statistical Analysis and Loss Extension, which includes:

- Statistical evaluation of the baseline EP-GAN and other methods on neurons with multi-recording membrane potential response/steady-state current data: AWB, URX, HSN

- Evaluation of EP-GAN with an added resting potential loss and longer simulations to ensure stability of the membrane potential (EP-GAN-E)

Methods

We added a detailed explanation of the "inverse gradient process".

We added detailed current/voltage-clamp protocols for both synthetic and experimental validation and prediction scenarios (Table 6)

Supplementary

We added error distribution and representative samples for synthetic neuron validations (Fig S1)

We added membrane potential response statistical analysis plots for existing methods for AWB, URX, HSN (Fig S6)

We added steady-state current statistical analysis plots for EP-GAN and existing methods for AWB, URX, HSN (Fig S7)

We added mean membrane potential errors for AWB, URX, HSN normalized by empirical standard deviations for all methods (Table S4)

Please see our point-by-point responses to specific feedback and comments below.

Reviewer 1:

First, at the methodological level, the authors should explain the inverse gradient operation in more detail, as the reconstructed voltage will not only depend on the evaluation of the right-hand side of the HH-equations, as they write, but also on the initial state of the system. Why did the authors not simply simulate the responses?

We thank the reviewer for the feedback regarding the need for further explanation. We have revised the Methods section to provide a more detailed description of the inverse gradient process. The process uses a discrete integration method, similar to the forward Euler method, which takes the system's initial conditions into account. For the EP-GAN baseline, the initial states were picked soon after the start of the stimulus to reconstruct the voltage during the stimulation period. For EP-GAN with extended loss (EP-GAN-E), introduced in this revision in the sub-section Statistical Analysis and Loss Extension, initial states before/after stimulation were also taken into account to incorporate resting voltage states into the target loss.

Since EP-GAN is a neural network and we want the inverse gradient process to be part of the training process (i.e., making EP-GAN a "model-informed network"), the process must be implemented as a differentiable function of the generated parameters p. This enables the derivatives from the reconstructed voltages to be traced back to all network components via the back-propagation algorithm.

Computationally, this requires implementing the process as a combination of discrete array operations with "auto-differentiation", which allows automatic computation of derivatives for each operation. While explicit simulation of the responses using ODE solvers provides more accurate solutions, the algorithms used by these solvers typically do not support such specialized arrays, nor are they compatible with neural network training. We thus utilized PyTorch tensors [54], which support both auto-differentiation and vectorization, to implement the process.
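As a minimal sketch of what such a differentiable reconstruction can look like (hypothetical names; the actual HH right-hand side and tensor shapes follow [7] and are not reproduced here):

  import torch

  def reconstruct_voltage(p, v0, t, dvdt):
      # p: generated parameter tensor (connected to the generator's graph)
      # v0: initial membrane potential at the first grid point
      # t: 1-D tensor of time points; dvdt(v, p, i) returns dV/dt at grid index i
      v = [v0]
      for i in range(len(t) - 1):
          dt = t[i + 1] - t[i]
          # Forward-Euler step; every operation is a differentiable tensor op,
          # so gradients flow from the reconstructed trace back to p.
          v.append(v[-1] + dt * dvdt(v[-1], p, i))
      return torch.stack(v)

Because every step is a plain tensor operation, calling backward() on any loss computed from the returned trace propagates gradients through the generator that produced p.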

The authors did not allow the models time to equilibrate before starting their reconstruction simulations, as evidenced by the large transients observed before stimulation onset in their plots. To get a sense of whether the models reproduce the equilibria of the measured responses to a reasonable degree, the authors should allow sufficient time for the models to equilibrate before starting their stimulation protocol.

In the added sub-section Statistical Analysis and Loss Extension under the Results section, we added results for EP-GAN-E, where we simulate the voltage responses with 5 seconds of added stabilization period at the beginning of the simulations. The added period mitigates the voltage fluctuations observed during the initial simulation phase, and we observe that the simulated voltage responses indeed reach a stable equilibrium both prior to stimulation and for the zero-stimulus current-clamp protocol (Figure 5 bottom, Column 3).

In fact, why did the authors not explicitly include the equilibrium voltage as a target loss in their set of loss functions? This would be an important quantity that determines the opening level of all the ion channels and therefore would influence the associated parameter values.

The EP-GAN baseline does include the equilibrium voltage as a target loss, since all current-clamp protocols used in the study (both synthetic and experimental) include a membrane potential trace where the stimulus amplitude is zero throughout the entire recording duration (see the added Table 6 for current-clamp protocols), thus requiring EP-GAN to optimize the resting membrane potential alongside the non-zero stimulus current-clamp scenarios.

To further study EP-GAN's accuracy in resting potential, we evaluated EP-GAN with a supplemental resting potential target loss and report its performance in the sub-section Statistical Analysis and Loss Extension. The added loss, combined with 5 seconds of additional stabilization period, improved the accuracy of predicted resting potentials by mitigating voltage fluctuations during the early simulation phase, and made significant improvements to the predicted AWB membrane potential responses, where the EP-GAN baseline had overshot the resting potential.

The authors should provide a more detailed evaluation of the models. They should explicitly provide the IV curves (this should be easy enough, as they compute them anyway), and clearly describe the time-point at which they compute them, as their current figures suggest there might be strong transient changes in them.

We included predicted IV-curve vs ground truth plots in addition to the voltages in the supplementary materials (Figures S2, S5) of the originally submitted version of the manuscript. In this revision, we added IV-curve plots with statistical analysis for the neurons with multi-recording data (AWB, URX, HSN) in the supplementary materials (Figure S7).

For the evaluation of predicted membrane potential responses, we added further details in Validation Scenarios (Synthetic) under the Results section, clearly explaining the current-clamp protocols used for both synthetic and experimental neurons and the time interval over which the RMSE evaluations were performed.

In the sub-section Statistical Analysis and Loss Extension, we introduced a new statistical metric in addition to RMSE, applied to neurons AWB, URX, HSN, which evaluates the percentage of predicted voltages that fall within the empirical range (i.e., mean ± 2 std), as well as the voltage error normalized by empirical standard deviations (Table S4).

The authors should assess the stability of the models. Some of the models exhibit responses that look as if they might be unstable if simulated for sufficiently long periods of time. Therefore, the authors should investigate whether all obtained parameter sets lead to stable models.

In the sub-section Statistical Analysis and Loss Extension, we included individual voltage traces generated by both the EP-GAN baseline and EP-GAN-E (extended) with longer simulations (+5 seconds) to assess stability. EP-GAN-E produces equilibrium voltages that are indeed stable and within empirical bounds throughout the simulations for the zero-stimulus current-clamp scenario (Column 3) for the 3 tested neurons (AWB, URX, HSN).

Minor:

The authors should provide a description of the model and its trainable parameters. At the moment, it is unclear which parameters of the ion channels are actually trained by the methodology.

The detailed description of the model and its ion channels can be found in [7]. The supplementary materials also include an Excel table, 'predicted parameters', which lists all EP-GAN fitted parameters for the 9 neurons included in the study (+3 new parameter sets for AWB, URX, HSN using EP-GAN-E), the labels for trainability, and the respective lower/upper bounds used during training data generation. In the revised manuscript, we further elaborated on this information in the second paragraph of the Results section.

Reviewer 2:

Major 1: While the models generated with EP-GAN reproduce the average voltage during current injections reasonably well, the dynamics of the response are not well captured. For example, for the neuron labeled RIM (Figure 2), the most depolarized voltage traces show an initial 'overshoot' of depolarization, i.e. they depolarize strongly within the first few hundred milliseconds but then fall back to a less depolarized membrane potential. In contrast, the empirical recording shows no such overshoot. Similarly, for the neuron labeled AFD, all empirically recorded traces slowly ramp up over time. In contrast, the simulated traces are mostly flat. Furthermore, all empirical traces return to the pre-stimulus membrane potential, but many of the simulated voltage traces remain significantly depolarized, far outside of the ranges of empirically observed membrane potentials. While these deviations may appear small in the Root Mean Square Error (RMSE), the only metric used in the study to assess the quality of the models, they likely indicate a large mismatch between the model and the electrophysiological properties of the biological neuron.

EP-GAN's main contribution is parameter inference for detailed neuron model parameters in a compute-efficient manner. This is a difficult problem even for current state-of-the-art fitting algorithms. While EP-GAN is not perfect in capturing the dynamics of the responses, and RMSE does not fully reflect the quality of the predicted electrophysiological properties, RMSE is a generic error metric for time series that is easily interpretable and applicable to all methods. Using this metric, our studies show that EP-GAN's overall prediction quality exceeds that of existing methods when given identical optimization goals in a compute-normalized setup.
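For reference, the RMSE reported here follows the standard definition over the evaluated time window of $N$ grid points,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\big(V^{\mathrm{pred}}_n - V^{\mathrm{emp}}_n\big)^2}.$$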

In our revised manuscript, we included a new section, Statistical Analysis and Loss Extension, under the Results section, where we performed additional statistical evaluations (e.g., % of predicted responses within the empirical range) of EP-GAN's predictions for neurons with multi-recording data. The results show that the predicted voltage responses from the EP-GAN baseline (introduced in the original manuscript) are, in general, within the empirical range, with ~80% of its responses falling within ± 2 empirical standard deviations, higher than existing methods: DEMO (57.9%), GDE3 (37.9%), NSDE (38%), NSGA2 (60.2%).

Major 2: Other metrics than the RMSE should be incorporated to validate simulated responses against electrophysiological data. A common approach is to extract multiple biologically meaningful features from the voltage traces before, during and after the stimulus, and compare the simulated responses to the experimentally observed distribution of these features. Typically, a model is only accepted if all features fall within the empirically observed ranges (see e.g. https://doi.org/10.1371/journal.pcbi.1002107). However, based on the deviations in resting membrane potential and the return to the resting membrane potential alone, most if not all the models shown in this study would not be accepted.

In our original manuscript, because each of our neurons had a single set of recording data, RMSE was chosen as the most generic and interpretable error metric. We conducted additional electrophysiological recordings for 3 neurons in the prediction scenarios (AWB, URX, HSN) and performed statistical analysis of the generated models in the sub-section Statistical Analysis and Loss Extension. Specifically, we evaluated the percentage of predicted voltage responses that fall within the empirical range (empirical mean ± 2 std, p ~ 0.05), encompassing the responses before, during, and after the stimulus (Figure 5, Table 5), and the mean membrane potential error normalized by empirical standard deviations (Table S4).
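A minimal sketch of this within-range metric (illustrative NumPy; the names are hypothetical, and the per-time-point empirical mean/std are assumed to come from the repeated recordings):

  import numpy as np

  def pct_within_empirical_range(v_pred, emp_mean, emp_std, k=2.0):
      # v_pred: (n_traces, n_timepoints) predicted voltages
      # emp_mean, emp_std: (n_timepoints,) statistics across repeated recordings
      lo = emp_mean - k * emp_std
      hi = emp_mean + k * emp_std
      within = (v_pred >= lo) & (v_pred <= hi)  # broadcast over traces
      return 100.0 * within.mean()              # % of time-points within range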

The results show that the EP-GAN baseline achieves an average of ~80% of its predicted responses falling within the empirical range, higher than the other methods: DEMO (57.9%), GDE3 (37.9%), NSDE (38%), NSGA2 (60.2%). Supplementing EP-GAN with the additional resting potential loss (EP-GAN-E) increased the percentage to ~85%, with noticeable improvements in reproducing dynamical features for AWB (Figure 5). Evaluations of membrane potential errors normalized by empirical standard deviations showed similar results: the EP-GAN baseline and EP-GAN-E have average errors of 1.0 std and 0.7 std respectively, outperforming DEMO (1.7 std), GDE3 (2.0 std), NSDE (3.0 std), and NSGA2 (1.5 std) (Table S4).

Major 3: Abstract and introduction imply that the 'ElectroPhysiome' refers to models that incorporate both the connectome and individual neuron physiology. However, the work presented in this study does not make use of any connectomics data. To make the claim that ElectroPhysiomeGAN can jointly capture both 'network interaction and cellular dynamics', the generated models would need to be evaluated for network inputs, for example by exposing them to naturalistic stimuli of synaptic inputs. It seems likely that dynamics that are currently poorly captured, like slow ramps, or the ability of the neuron to return to its resting membrane potential, will critically affect network computations.

In the paper, EP-GAN is introduced as a parameter estimation method that can aid the development of the ElectroPhysiome, which is a network model; these are two different types of methods, and we do not claim that EP-GAN is a model that can capture network dynamics. To avoid possible confusion, we made further clarifications in the abstract/introduction that EP-GAN is a machine learning approach for neuron HH-parameter estimation.

I find it hard to believe that the methods EP-GAN is compared to could not perform any better. For example, multi-objective optimization algorithms are often successful in generating models that match empirical observations very well, but features used as target of the optimization need to be carefully selected for the optimization to succeed. Likely, each method requires extensive trial and error to achieve the best performance for a given problem. It is therefore hard to do a fair comparison. Given these complications, I would like to encourage the authors to rethink the framing of the story as a benchmark of EP-GAN vs. other methods. Also, the number of parameters does not seem that relevant to me, as long as the resulting models faithfully reproduce empirical data. What I find most interesting is that EP-GAN learns general relationships between electrophysiological responses and biophysical parameters, and likely could also be used to inspect the distribution of parameters that are consistent with a given empirical observation.

We thank the reviewer for providing this perspective. While it is indeed difficult to make a completely fair comparison between existing optimization methods and EP-GAN due to the fundamental differences in their algorithms, we believe that the current comparisons with other methods are justified, as they provide baseline performance metrics for testing EP-GAN in its intended use cases.

The main strength of EP-GAN, as previously mentioned, is its ability to efficiently navigate large, detailed HH-models with many parameters, so that it can aid the development of nervous system models such as the ElectroPhysiome, potentially fitting hundreds of neurons in a time-efficient manner.

While EP-GAN's ability to learn the general relationship between electrophysiological responses and parameter distributions is indeed interesting and warrants a more careful examination, this is not the main focus of the paper, since in this work we introduce EP-GAN as a methodology for parameter inference.

In this context, we believe the comparisons with other methods, conducted in a compute-normalized manner (i.e., each method is given the same number of simulations) and with identical optimization targets, provide an adequate framework for evaluating the aforementioned EP-GAN aim. Indeed, while EP-GAN excels with larger HH-models, it performs slightly worse than DE for smaller models, such as the one used by [16], despite being more compute-efficient (Table S2).

To emphasize this aim, we revised the manuscript's description of EP-GAN to focus on its intended use: parameter inference for detailed neuron models, as opposed to specialized models with reduced parameters.

I could not find important aspects of the methods. What are the 176 parameters that were targeted as trainable parameters? What are the parameter bounds? What are the remaining parameters that have been excluded? What are the Hodgkin-Huxley models used? Which channels do they represent? What are the stimulus protocols?

The detailed description and development of the HH-model that we use, and its ion channel list, can be found in [7]. The supplementary materials also include an Excel table, 'predicted parameters', which lists all EP-GAN fitted parameters for the 9 neurons (+3 new parameter sets for AWB, URX, HSN using EP-GAN-E), the labels for trainability, and the parameter bounds used during the generation of training data.

We also added a new table detailing the current/voltage-clamp protocols used for the 9 neurons, including those used for evaluating EP-GAN-E, which was supplemented with longer simulation times to ensure voltage stability (please see Table 6).

I could not assess the validation of the EP-GAN by modeling 200 synthetic neurons based on the data presented in the manuscript since the only reported metric is the RMSE (5.84 mV and 5.81 mV for neurons sampled from training data and testing data respectively) averaged over all 200 synthetic neurons. Please report the distribution of RMSEs, include other biologically more relevant metrics, and show representative examples. The responses should be carefully investigated for the types of mismatches that occur, and their biological relevance should be discussed. For example, is the EP-GAN biased to generate responses with certain characteristics, like the 'overshoot' discussed in Major 1? Is it generally poor at fitting the resting potential?

We thank the reviewer for the feedback regarding the need for additional supporting data for the synthetic neuron validations. In the revised supplementary materials (Figure S1), we included the distribution of RMSE errors for both groups of synthetic neuron validations (validation/test set) and representative samples for both the EP-GAN baseline and EP-GAN-E. Notably, the inaccuracies observed in the experimental neuron predictions (e.g., resting potential, voltage overshoot) do not necessarily generalize to synthetic neurons, indicating that such mismatches could stem from differences between the synthetic neurons used for training and the experimental neurons used for predictions. While synthetic neurons are generated according to empirically determined parameter bounds, some experimental neuron types are rarer than others and may also involve channels that have not been recorded or modeled in [7], which can affect the quality of the predicted parameters (see the 2nd and 4th paragraphs of the Discussion section for more detail). Also, properties such as recording error/noise that are often present in experimental neurons are not fully accounted for in synthetic neurons.

To further study how these mismatches can be mitigated, in the revision we added an extended version of EP-GAN in which the target loss was supplemented with an additional resting potential term and 5 seconds of stabilization period during simulations (EP-GAN-E, described in Statistical Analysis and Loss Extension). With these extensions, EP-GAN-E improved its accuracy on both resting potentials and dynamical features, with the most notable improvements on AWB, where the predicted voltage responses closely match the slowly rising voltage response during stimulation. EP-GAN-E is an example of further extensions to the loss function that account for additional experimental features.

Furthermore, the conclusion of the ablation study ('EP-GAN preserves reasonable accuracy up to a 25% reduction in membrane potential responses') does not seem to be justified given the voltage traces shown in Figure 3. For example, for RIM, the resting membrane potential stays around 0 mV, but all empirical traces are around -40mV. For AFD, all simulated traces have a negative slope during the depolarizing stimuli, but a positive slope in all empirically observed traces. For AIY, the shape of hyperpolarized traces is off.

Since the EP-GAN baseline optimizes the voltage responses during the stimulation period, RMSE was also evaluated with respect to this period. From these errors, we evaluated whether the predicted voltage error for each ablation scenario fell within 2 standard deviations of the mean error obtained from the synthetic neuron test data (i.e., the baseline performance). We found that for input ablation of voltage responses, the error was within this range up to a 25% reduction, whereas for steady-state current input ablation, the 25%, 50%, and 75% reductions all resulted in errors within the range.
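For concreteness, this acceptance criterion can be sketched as follows (illustrative names, not the actual implementation):

  import numpy as np

  def preserves_baseline_accuracy(ablation_rmse, test_set_rmses, k=2.0):
      # test_set_rmses: RMSEs over the synthetic-neuron test data (baseline)
      mu, sigma = np.mean(test_set_rmses), np.std(test_set_rmses)
      # An ablation scenario is accepted if its RMSE lies within k standard
      # deviations of the baseline mean error.
      return abs(ablation_rmse - mu) <= k * sigma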

We extended the “Ablation Studies” sub-section so that the above reasoning is better communicated to the readers.

Additionally, I found a number of minor issues:

Minor 1: Table 1 lists the number of HH simulations as '32k (11k · 3)'. Should it be 33k, since 11,000 × 3 = 33,000? Please specify the exact number of samples.

Minor 2: x- and y-ticks are missing in Fig 2, Fig 3, Fig S1, Fig S2, Fig S3 and Fig S4.

Minor 3: All files in the supplementary zip file should be listed and described.

Minor 4: Code for training the GAN, generation of training datasets and for reproducing the figures should be provided.

Minor 5: In the reference (Figure 3A, Table 1 Row 2): should this refer to Table 2?

Minor 6: 'the ablation is done on stimulus space where a 50% reduction corresponds to removing half of the membrane potential responses traces each associated with a stimulus.' - which half is removed?

We thank the reviewer for pointing out these errors in the original manuscript. The revised manuscript includes corrections for these items. We will publish the Python code reproducing the results in a public repository in the near future.
