1. Computational and Systems Biology
  2. Neuroscience
Download icon

Training deep neural density estimators to identify mechanistic models of neural dynamics

  1. Pedro J Gonçalves  Is a corresponding author
  2. Jan-Matthis Lueckmann  Is a corresponding author
  3. Michael Deistler  Is a corresponding author
  4. Marcel Nonnenmacher
  5. Kaan Öcal
  6. Giacomo Bassetto
  7. Chaitanya Chintaluri
  8. William F Podlaski
  9. Sara A Haddad
  10. Tim P Vogels
  11. David S Greenberg
  12. Jakob H Macke  Is a corresponding author
  1. Computational Neuroengineering, Department of Electrical and Computer Engineering, Technical University of Munich, Germany
  2. Max Planck Research Group Neural Systems Analysis, Center of Advanced European Studies and Research (caesar), Germany
  3. Machine Learning in Science, Excellence Cluster Machine Learning, Tübingen University, Germany
  4. Model-Driven Machine Learning, Institute of Coastal Research, Helmholtz Centre Geesthacht, Germany
  5. Mathematical Institute, University of Bonn, Germany
  6. Centre for Neural Circuits and Behaviour, University of Oxford, United Kingdom
  7. Institute of Science and Technology Austria, Austria
  8. Max Planck Institute for Brain Research, Germany
  9. Max Planck Institute for Intelligent Systems, Germany
Research Article
  • Cited 2
  • Views 2,067
  • Annotations
Cite this article as: eLife 2020;9:e56261 doi: 10.7554/eLife.56261

Abstract

Mechanistic modeling in neuroscience aims to explain observed phenomena in terms of underlying causes. However, determining which model parameters agree with complex and stochastic neural data presents a significant challenge. We address this challenge with a machine learning tool which uses deep neural density estimators—trained using model simulations—to carry out Bayesian inference and retrieve the full space of parameters compatible with raw data or selected data features. Our method is scalable in parameters and data features and can rapidly analyze new data after initial training. We demonstrate the power and flexibility of our approach on receptive fields, ion channels, and Hodgkin–Huxley models. We also characterize the space of circuit configurations giving rise to rhythmic activity in the crustacean stomatogastric ganglion, and use these results to derive hypotheses for underlying compensation mechanisms. Our approach will help close the gap between data-driven and theory-driven models of neural dynamics.

eLife digest

Computational neuroscientists use mathematical models built on observational data to investigate what’s happening in the brain. Models can simulate brain activity from the behavior of a single neuron right through to the patterns of collective activity in whole neural networks. Collecting the experimental data is the first step, then the challenge becomes deciding which computer models best represent the data and can explain the underlying causes of how the brain behaves.

Researchers usually find the right model for their data through trial and error. This involves tweaking a model’s parameters until the model can reproduce the data of interest. But this process is laborious and not systematic. Moreover, with the ever-increasing complexity of both data and computer models in neuroscience, the old-school approach of building models is starting to show its limitations.

Now, Gonçalves, Lueckmann, Deistler et al. have designed an algorithm that makes it easier for researchers to fit mathematical models to experimental data. First, the algorithm trains an artificial neural network to predict which models are compatible with simulated data. After initial training, the method can rapidly be applied to either raw experimental data or selected data features. The algorithm then returns the models that generate the best match.

This newly developed machine learning tool was able to automatically identify models which can replicate the observed data from a diverse set of neuroscience problems. Importantly, further experiments showed that this new approach can be scaled up to complex mechanisms, such as how a neural network in crabs maintains its rhythm of activity. This tool could be applied to a wide range of computational investigations in neuroscience and other fields of biology, which may help bridge the gap between ‘data-driven’ and ‘theory-driven’ approaches.

Introduction

New experimental technologies allow us to observe neurons, networks, brain regions, and entire systems at unprecedented scale and resolution, but using these data to understand how behavior arises from neural processes remains a challenge. To test our understanding of a phenomenon, we often take to rebuilding it in the form of a computational model that incorporates the mechanisms we believe to be at play, based on scientific knowledge, intuition, and hypotheses about the components of a system and the laws governing their relationships. The goal of such mechanistic models is to investigate whether a proposed mechanism can explain experimental data, uncover details that may have been missed, inspire new experiments, and eventually provide insights into the inner workings of an observed neural or behavioral phenomenon (Herz et al., 2006; Gerstner et al., 2012; O'Leary et al., 2015; Baker et al., 2018). Examples for such a symbiotic relationship between model and experiments range from the now classical work of Hodgkin and Huxley, 1952, to population models investigating rules of connectivity, plasticity and network dynamics (van Vreeswijk and Sompolinsky, 1996; Prinz et al., 2004; Vogels et al., 2005; Potjans and Diesmann, 2014; Litwin-Kumar and Doiron, 2012), network models of inter-area interactions (Sporns, 2014; Bassett et al., 2018), and models of decision making (Gold and Shadlen, 2007; Wang, 2008).

A crucial step in building a model is adjusting its free parameters to be consistent with experimental observations. This is essential both for investigating whether the model agrees with reality and for gaining insight into processes which cannot be measured experimentally. For some models in neuroscience, it is possible to identify the relevant parameter regimes from careful mathematical analysis of the model equations. But as the complexity of both neural data and neural models increases, it becomes very difficult to find well-fitting parameters by inspection, and automated identification of data-consistent parameters is required.

Furthermore, to understand how a model quantitatively explains data, it is necessary to find not only the best, but all parameter settings consistent with experimental observations. This is especially important when modeling neural data, where highly variable observations can lead to broad ranges of data-consistent parameters. Moreover, many models in biology are inherently robust to some perturbations of parameters, but highly sensitive to others (Gutenkunst et al., 2007; O'Leary et al., 2015), for example because of processes such as homeostastic regulation. For these systems, identifying the full range of data-consistent parameters can reveal how multiple distinct parameter settings give rise to the same model behavior (Foster et al., 1993; Prinz et al., 2004; Achard and De Schutter, 2006; Alonso and Marder, 2019). Yet, despite the clear benefits of mechanistic models in providing scientific insight, identifying their parameters given data remains a challenging open problem that demands new algorithmic strategies.

The gold standard for automated parameter identification is statistical inference, which uses the likelihood p(𝐱|𝜽) to quantify the match between parameters 𝜽 and data 𝐱. Likelihoods can be efficiently computed for purely statistical models commonly used in neuroscience (Truccolo et al., 2005; Schneidman et al., 2006; Pillow et al., 2008; Yu et al., 2009; Macke et al., 2011; Cunningham and Yu, 2014; Pandarinath et al., 2018), but are computationally intractable for most mechanistic models. Mechanistic models are designed to reflect knowledge about biological mechanisms, and not necessarily to be amenable to efficient inference: many mechanistic models are defined implicitly through stochastic computer simulations (e.g. a simulation of a network of spiking neurons), and likelihood calculation would require the ability to integrate over all potential paths through the simulator code. Similarly, a common goal of mechanistic modeling is to capture selected summary features of the data (e.g. a certain firing rate, bursting behavior, etc…), not the full dataset in all its details. The same feature (such as a particular average firing rate) can be produced by infinitely many realizations of the simulated process (such as a time-series of membrane potential). This makes it impractical to compute likelihoods, as one would have to average over all possible realizations which produce the same output.

Since the toolkit of (likelihood-based) statistical inference is inaccessible for mechanistic models, parameters are typically tuned ad-hoc (often through laborious, and subjective, trial-and-error), or by computationally expensive parameter search: a large set of models is generated, and grid search (Prinz et al., 2003; Tomm et al., 2011; Stringer et al., 2016) or a genetic algorithm (Druckmann et al., 2007; Hay et al., 2011; Rossant et al., 2011; Van Geit et al., 2016) is used to filter out simulations which do not match the data. However, these approaches require the user to define a heuristic rejection criterion on which simulations to keep (which can be challenging when observations have many dimensions or multiple units of measurement), and typically end up discarding most simulations. Furthermore, they lack the advantages of statistical inference, which provides principled approaches for handling variability, quantifying uncertainty, incorporating prior knowledge and integrating multiple data sources. Approximate Bayesian Computation (ABC) (Beaumont et al., 2002; Marjoram et al., 2003; Sisson et al., 2007) is a parameter-search technique which aims to perform statistical inference, but still requires definition of a rejection criterion and struggles in high-dimensional problems. Thus, computational neuroscientists face a dilemma: either create carefully designed, highly interpretable mechanistic models (but rely on ad-hoc parameter tuning), or resort to purely statistical models offering sophisticated parameter inference but limited mechanistic insight.

Here, we propose a new approach using machine learning to combine the advantages of mechanistic and statistical modeling. We present SNPE (Sequential Neural Posterior Estimation), a tool that makes it possible to perform Bayesian inference on mechanistic models in neuroscience without requiring access to likelihoods. SNPE identifies all mechanistic model parameters consistent with observed experimental data (or summary features). It builds on recent advances in simulation-based Bayesian inference (Papamakarios and Murray, 2016; Lueckmann et al., 2017; Greenberg et al., 2019; Cranmer et al., 2020): given observed experimental data (or summary features) 𝐱o, and a mechanistic model with parameters 𝜽, it expresses both prior knowledge and the range of data-compatible parameters through probability distributions. SNPE returns a posterior distribution p(𝜽|𝐱o) which is high for parameters 𝜽 consistent with both the data 𝐱o and prior knowledge, but approaches zero for 𝜽 inconsistent with either (Figure 1).

Goal: algorithmically identify mechanistic models which are consistent with data.

Our algorithm (SNPE) takes three inputs: a candidate mechanistic model, prior knowledge or constraints on model parameters, and data (or summary statistics). SNPE proceeds by (1) sampling parameters from the prior and simulating synthetic datasets from these parameters, and (2) using a deep density estimation neural network to learn the (probabilistic) association between data (or data features) and underlying parameters, that is to learn statistical inference from simulated data. (3) This density estimation network is then applied to empirical data to derive the full space of parameters consistent with the data and the prior, that is, the posterior distribution. High posterior probability is assigned to parameters which are consistent with both the data and the prior, low probability to inconsistent parameters. (4) If needed, an initial estimate of the posterior can be used to adaptively guide further simulations to produce data-consistent results.

Similar to parameter search methods, SNPE uses simulations instead of likelihood calculations, but instead of filtering out simulations, it uses all simulations to train a multilayer artificial neural network to identify admissible parameters (Figure 1). By incorporating modern deep neural networks for conditional density estimation (Rezende and Mohamed, 2015; Papamakarios et al., 2017), it can capture the full distribution of parameters consistent with the data, even when this distribution has multiple peaks or lies on curved manifolds. Critically, SNPE decouples the design of the model and design of the inference approach, giving the investigator maximal flexibility to design and modify mechanistic models. Our method makes minimal assumptions about the model or its implementation, and can for example also be applied to non-differentiable models, such as networks of spiking neurons. Its only requirement is that one can run model simulations for different parameters, and collect the resulting synthetic data or summary features of interest.

While the theoretical foundations of SNPE were originally developed and tested using simple inference problems on small models (Papamakarios and Murray, 2016; Lueckmann et al., 2017; Greenberg et al., 2019), here we show that SNPE can scale to complex mechanistic models in neuroscience, provide an accessible and powerful implementation, and develop validation and visualization techniques for exploring the derived posteriors. We illustrate SNPE using mechanistic models expressing key neuroscientific concepts: beginning with a simple neural encoding problem with a known solution, we progress to more complex data types, large datasets and many-parameter models inaccessible to previous methods. We estimate visual receptive fields using many data features, demonstrate rapid inference of ion channel properties from high-throughput voltage-clamp protocols, and show how Hodgkin–Huxley models are more tightly constrained by increasing numbers of data features. Finally, we showcase the power of SNPE by using it to identify the parameters of a network model which can explain an experimentally observed pyloric rhythm in the stomatogastric ganglion (Prinz et al., 2004)–in contrast to previous approaches, SNPE allows us to search over the full space of both single-neuron and synaptic parameters, allowing us to study the geometry of the parameter space, as well as to provide new hypotheses for which compensation mechanisms might be at play.

Results

Training neural networks to perform Bayesian inference without likelihood evaluations

SNPE performs Bayesian inference on mechanistic models using only model-simulations, without requiring likelihood evaluations. It requires three inputs: a model (i.e. computer code to simulate data from parameters), prior knowledge or constraints on parameters, and data (outputs from the model or the real system it describes, Figure 1). SNPE runs simulations for a range of parameter values, and trains an artificial neural network to map any simulation result onto a range of possible parameters. Importantly, a network trained to maximize log-probability (of parameters given simulation results) will learn to approximate the posterior distribution as given by Bayes rule (Papamakarios and Murray, 2016) (see Materials and methods for details, Figure 1). After training on simulated data with known model parameters, SNPE can perform Bayesian inference of unknown parameters for empirical data. This approach to Bayesian inference never requires evaluating likelihoods. SNPE’s efficiency can be further improved by using the running estimate of the posterior distribution to guide further simulations toward data-compatible regions of the parameter space (Papamakarios and Murray, 2016; Lueckmann et al., 2017; Greenberg et al., 2019). Below, we apply SNPE to a range of stochastic models in neuroscience.

Estimating stimulus-selectivity in linear-nonlinear encoding models

We first illustrate SNPE on linear-nonlinear (LN) encoding models, a special case of generalized linear models (GLMs). These are simple, commonly used phenomenological models for which likelihood-based parameter estimation is feasible (Brown et al., 1998; Paninski, 2004; Pillow, 2007; Gerwinn et al., 2010; Polson et al., 2013; Pillow and Scott, 2012), and which can be used to validate the accuracy of our approach, before applying SNPE to more complex models for which the likelihood is unavailable. We will show that SNPE returns the correct posterior distribution over parameters, that it can cope with high-dimensional observation data, that it can recover multiple solutions to parameter inference problems, and that it is substantially more simulation efficient than conventional rejection-based ABC methods.

An LN model describes how a neuron’s firing rate is modulated by a sensory stimulus through a linear filter 𝜽, often referred to as the receptive field (Pillow et al., 2005; Chichilnisky, 2001). We first considered a model of a retinal ganglion cell (RGC) driven by full-field flicker (Figure 2a). A statistic that is often used to characterize such a neuron is the spike-triggered average (STA) (Figure 2a, right). We therefore used the STA, as well as the firing rate of the neuron, as input 𝐱o to SNPE. (Note that, in the limit of infinite data, and for white noise stimuli, the STA will converge to the receptive field [Paninski, 2004]–for finite, and non-white data, the two will in general be different.) Starting with random receptive fields 𝜽, we generated synthetic spike trains and calculated STAs from them (Figure 2b). We then trained a neural conditional density estimator to recover the receptive fields from the STAs and firing rates (Figure 2c). This allowed us to estimate the posterior distribution over receptive fields, that is to estimate which receptive fields are consistent with the data (and prior) (Figure 2c). For LN models, likelihood-based inference is possible, allowing us to validate the SNPE posterior by comparing it to a reference posterior obtained via Markov Chain Monte Carlo (MCMC) sampling (Polson et al., 2013; Pillow and Scott, 2012). We found that SNPE accurately estimates the posterior distribution (Appendix 1—figure 1 and Appendix 1—figure 2), and substantially outperforms Sequential Monte Carlo (SMC) ABC methods (Sisson et al., 2007; Beaumont et al., 2009; Figure 2d). If SNPE works correctly, its posterior mean filter will match that of the reference posterior – however, it is not to be expected that either of them precisely matches the ground-truth filter (Figure 2c and Appendix 1—figure 1): In the presence of finite sampling and stochasticity, multiple different filters could have plausibly given rise to the observed data. A properly inferred posterior will reflect this uncertainty, and include the true filters as one of many plausible explanations of the data (but not necessarily as the ‘mean’ of all plausible explanations) (Appendix 1—figure 2). Increasing the number of Bernoulli samples in the observed data leads to progressively tighter posteriors, with posterior samples closer to the true filter (Appendix 1—figure 3). Furthermore, SNPE closely agrees with the MCMC reference solution in all these cases, further emphasizing the correctness of the posteriors inferred with SNPE.

Estimating receptive fields in linear-nonlinear models of single neurons with statistical inference.

(a) Schematic of a time-varying stimulus, associated observed spike train and resulting spike-triggered average (STA) (b) Sequential Neural Posterior Estimation (SNPE) proceeds by first randomly generating simulated receptive fields θ, and using the mechanistic model (here an LN model) to generate simulated spike trains and simulated STAs. (c) These simulated STAs and receptive fields are then used to train a deep neural density estimator to identify the distribution of receptive fields consistent with a given observed STA 𝐱o. (d) Relative error in posterior estimation of SNPE and alternative methods (mean and 95% CI; 0 corresponds to perfect estimation, one to prior-level, details in Materials and methods). (e) Example of spatial receptive field. We simulated responses and an STA of an LN-model with oriented receptive field. (f) We used SNPE to recover the distribution of receptive-field parameters. Univariate and pairwise marginals for four parameters of the spatial filter (MCMC, yellow histograms; SNPE, blue lines; ground truth, green; full posterior in Appendix 1—figure 4). Non-identifiabilities of the Gabor parameterization lead to multimodal posteriors. (g) Average correlation (± SD) between ground-truth receptive field and receptive field samples from posteriors inferred with SMC-ABC, SNPE, and MCMC (which provides an upper bound given the inherent stochasticity of the data). (h) Posterior samples from SNPE posterior (SNPE, blue) compared to ground-truth receptive field (green; see panel (e)), overlaid on STA. (i) Posterior samples for V1 data; full posterior in Appendix 1—figure 6.

As a more challenging problem, we inferred the receptive field of a neuron in primary visual cortex (V1) (Niell and Stryker, 2008; Dyballa et al., 2018). Using a model composed of a bias (related to the spontaneous firing rate) and a Gabor function with eight parameters (Jones and Palmer, 1987) describing the receptive field’s location, shape and strength, we simulated responses to 5 min random noise movies of 41 × 41 pixels, such that the STA is high-dimensional, with a total of 1681 dimensions (Figure 2e). This problem admits multiple solutions (as e.g. rotating the receptive field by 180°). As a result, the posterior distribution has multiple peaks (‘modes’). Starting from a simulation result 𝐱o with known parameters, we used SNPE to estimate the posterior distribution p(𝜽|𝐱o). To deal with the high-dimensional data 𝐱o in this problem, we used a convolutional neural network (CNN), as this architecture excels at learning relevant features from image data (Krizhevsky et al., 2012; Simonyan and Zisserman, 2015). To deal with the multiple peaks in the posterior, we fed the CNN’s output into a mixture density network (MDN) (Bishop, 1994), which can learn to assign probability distributions with multiple peaks as a function of its inputs (details in Materials and methods). Using this strategy, SNPE was able to infer a posterior distribution that tightly enclosed the ground truth simulation parameters which generated the original simulated data 𝐱o, and matched a reference MCMC posterior (Figure 2f, posterior over all parameters in Appendix 1—figure 4). For this challenging estimation problem with high-dimensional summary features, an SMC-ABC algorithm with the same simulation-budget failed to identify the correct receptive fields (Figure 2g) and posterior distributions (Appendix 1—figure 5). We also applied this approach to electrophysiological data from a V1 cell (Dyballa et al., 2018), identifying a sine-shaped Gabor receptive field consistent with the original spike-triggered average (Figure 2i; posterior distribution in Appendix 1—figure 6).

Functional diversity of ion channels: efficient high-throughput inference

We next show how SNPE can be efficiently applied to estimation problems in which we want to identify a large number of models for different observations in a database. We considered a flexible model of ion channels (Destexhe and Huguenard, 2000), which we here refer to as the Omnimodel. This model uses eight parameters to describe how the dynamics of currents through non-inactivating potassium channels depend on membrane voltage (Figure 3a). For various choices of its parameters 𝜽, it can capture 350 specific models in publications describing this channel type, cataloged in the IonChannelGenealogy (ICG) database (Podlaski et al., 2017). We aimed to identify these ion channel parameters 𝜽 for each ICG model, based on 11 features of the model’s response to a sequence of five noisy voltage clamp protocols, resulting in a total of 55 different characteristic features per model (Figure 3b, see Materials and methods for details).

Inference on a database of ion-channel models.

(a) We perform inference over the parameters of non-inactivating potassium channel models. Channel kinetics are described by steady-state activation curves, gate, and time-constant curves, τgate. (b) Observation generated from a channel model from ICG database: normalized current responses to three (out of five) voltage-clamp protocols (action potentials, activation, and ramping). Details in Podlaski et al., 2017. (c) Classical approach to parameter identification: inference is optimized on each datum separately, requiring new computations for each new datum. (d) Amortized inference: an inference network is learned which can be applied to multiple data, enabling rapid inference on new data. (e) Posterior distribution over eight model parameters, θ1 to θ8. Ground truth parameters in green, high-probability parameters in purple, low-probability parameters in magenta. (f) Traces obtained by sampling from the posterior in (e). Purple: traces sampled from posterior, that is, with high posterior probability. Magenta: trace from parameters with low probability. (g) Observations (green) and traces generated by posterior samples (purple) for four models from the database.

Because this model’s output is a typical format for functional characterization of ion channels both in simulations (Podlaski et al., 2017) and in high-throughput electrophysiological experiments (Dunlop et al., 2008; Suk et al., 2019; Ranjan et al., 2019), the ability to rapidly infer different parameters for many separate experiments is advantageous. Existing fitting approaches based on numerical optimization (Destexhe and Huguenard, 2000; Ranjan et al., 2019) must repeat all computations anew for a new experiment or data point (Figure 3c). However, for SNPE the only heavy computational tasks are carrying out simulations to generate training data, and training the neural network. We therefore reasoned that by training a network once using a large number of simulations, we could subsequently carry out rapid ‘amortized’ parameter inference on new data using a single pass through the network (Figure 3d; Speiser et al., 2017; Webb et al., 2018). To test this idea, we used SNPE to train a neural network to infer the posterior from any data 𝐱. To generate training data, we carried out 1 million Omnimodel simulations, with parameters randomly chosen across ranges large enough to capture the models in the ICG database (Podlaski et al., 2017). SNPE was run using a single round, that is, it learned to perform inference for all data from the prior (rather than a specific observed datum). Generating these simulations took around 1000 CPU-hours and training the network 150 CPU-hours, but afterwards a full posterior distribution could be inferred for new data in less than 10 ms.

As a first test, SNPE was run on simulation data, generated by a previously published model of a non-inactivating potassium channel (McTavish et al., 2012; Figure 3b). Simulations of the Omnimodel using parameter sets sampled from the obtained posterior distribution (Figure 3e) closely resembled the input data on which the SNPE-based inference had been carried out, while simulations using ‘outlier’ parameter sets with low probability under the posterior generated current responses that were markedly different from the data 𝐱o (Figure 3f). Taking advantage of SNPE’s capability for rapid amortized inference, we further evaluated its performance on all 350 non-inactivating potassium channel models in ICG. In each case, we carried out a simulation to generate initial data from the original ICG model, used SNPE to calculate the posterior given the Omnimodel, and then generated a new simulation 𝐱 using parameters sampled from the posterior (Figure 3f). This resulted in high correlation between the original ICG model response and the Omnimodel response, in every case (>0.98 for more than 90% of models, see Appendix 1—figure 7). However, this approach was not able to capture all traces perfectly, as for example it failed to capture the shape of the onset of the bottom right model in Figure 3g. Additional analysis of this example revealed that this example is not a failure of SNPE, but rather a limitation of the Omnimodel: in particular, directly fitting the steady-state activation and time-constant curves on this specific example yielded no further quantitative or qualitative improvement, suggesting that the limitation is in the model, not the fit. Thus, SNPE can be used to reveal limitations of candidate models and aid the development of more verisimilar mechanistic models.

Calculating the posterior for all 350 ICG models took only a few seconds, and was fully automated, that is, did not require user interactions. These results show how SNPE allows fast and accurate identification of biophysical model parameters on new data, and how SNPE can be deployed for applications requiring rapid automated inference, such as high-throughput screening-assays, closed-loop paradigms (e.g. for adaptive experimental manipulations or stimulus-selection [Kleinegesse and Gutmann, 2019]), or interactive software tools.

Hodgkin–Huxley model: stronger constraints from additional data features

The Hodgkin–Huxley (HH) model (Hodgkin and Huxley, 1952) of action potential generation through ion channel dynamics is a highly influential mechanistic model in neuroscience. A number of algorithms have been proposed for fitting HH models to electrophysiological data (Prinz et al., 2003; Huys et al., 2006; Pospischil et al., 2008; Rossant et al., 2011; Meliza et al., 2014; Van Geit et al., 2016; Ben-Shalom et al., 2019), but (with the exception of Daly et al., 2015) these approaches do not attempt to estimate the full posterior. Given the central importance of the HH model in neuroscience, we sought to test how SNPE would cope with this challenging non-linear model.

As previous approaches for HH models concentrated on reproducing specified features (e.g. the number of spikes, [Pospischil et al., 2008]), we also sought to determine how various features provide different constraints. We considered the problem of inferring eight biophysical parameters in a HH single-compartment model, describing voltage-dependent sodium and potassium conductances and other intrinsic membrane properties, including neural noise, making the model stochastic by nature (Figure 4a, left). We simulated the neuron’s voltage response to the injection of a square wave of depolarizing current, and defined the model output 𝐱 used for inference as the number of evoked action potentials along with six additional features of the voltage response (Figure 4a, right, details in Materials and methods). We first applied SNPE to observed data 𝐱o created by simulation from the model, calculating the posterior distribution using all seven features in the observed data (Figure 4b). The posterior contained the ground truth parameters in a high probability-region, as in previous applications, indicating the consistency of parameter identification. The variance of the posterior was narrower for some parameters than for others, indicating that the seven data features constrain some parameters strongly (such as the potassium conductance), but others only weakly (such as the adaptation time constant). Additional simulations with parameters sampled from the posterior closely resembled the observed data 𝐱o, in terms of both the raw membrane voltage over time and the seven data features (Figure 4c, purple and green). Parameters with low posterior probability (outliers) generated simulations that markedly differed from 𝐱o (Figure 4c, magenta).

Inference for single compartment Hodgkin–Huxley model.

(a) Circuit diagram describing the Hodgkin–Huxley model (left), and simulated voltage-trace given a current input (right). Three out of 7 voltage features are depicted: (1) number of spikes, (2) mean resting potential, and (3) standard deviation of the pre-stimulus resting potential. (b) Inferred posterior for 8 parameters given seven voltage features. Ground truth parameters in green, high-probability parameters in purple, low-probability parameters in magenta. (c) Traces (left) and associated features f (right) for the desired output (observation), the mode of the inferred posterior, and a sample with low posterior probability. The voltage features are: number of spikes sp, mean resting potential rpot, standard deviation of the resting potential σrpot, and the first four voltage moments, mean m1, standard deviation m2, skewness m3 and kurtosis m4. Each value plotted is the mean feature ± standard deviation across 100 simulations with the same parameter set. Each feature is normalized by σf PRIOR, the standard deviation of the respective feature of simulations sampled from the prior. (d) Partial view of the inferred posteriors (4 out of 8 parameters) given 1, 4 and 7 features (full posteriors over eight parameters in Appendix 1—figure 8). (e) Traces for posterior modes given 1, 4 and 7 features. Increasing the number of features leads to posterior traces that are closer to the observed data. (f) Observations from Allen Cell Types Database (green) and corresponding mode samples (purple). Posteriors in Appendix 1—figure 9.

Genetic algorithms are commonly used to fit parameters of deterministic biophysical models (Druckmann et al., 2007; Hay et al., 2011; Van Geit et al., 2016; Gouwens et al., 2018). While genetic algorithms can also return multiple data-compatible parameters, they do not perform inference (i.e. find the posterior distribution), and their outputs depend strongly on user-defined goodness-of-fit criteria. When comparing a state-of-the-art genetic algorithm (Indicator Based Evolutionary Algorithm, IBEA, [Bleuler et al., 2003; Zitzler and Künzli, 2004; Van Geit et al., 2016]) to SNPE, we found that the parameter-settings favored by IBEA produced simulations whose summary features were as similar to the observed data as those obtained by SNPE high-probability samples (Appendix 1—figure 10). However, high-scoring IBEA parameters were concentrated in small regions of the posterior, that is, IBEA did not identify the full space of data-compatible models.

To investigate how individual data features constrain parameters, we compared SNPE-estimated posteriors based (1) solely on the spike count, (2) on the spike count and three voltage-features, or (3) on all 7 features of 𝐱o. As more features were taken into account, the posterior became narrower and centered more closely on the ground truth parameters (Figure 4d, Appendix 1—figure 8). Posterior simulations matched the observed data only in those features that had been used for inference (e.g. applying SNPE to spike counts alone identified parameters that generated the correct number of spikes, but for which spike timing and subthreshold voltage time course were off, Figure 4e). For some parameters, such as the potassium conductance, providing more data features brought the peak of the posterior (the posterior mode) closer to the ground truth and also decreased uncertainty. For other parameters, such as VT, a parameter adjusting the spike threshold (Pospischil et al., 2008), the peak of the posterior was already close to the correct value with spike counts alone, but adding additional features reduced uncertainty. While SNPE can be used to study the effect of additional data features in reducing parameter uncertainty, this would not be the case for methods that only return a single best-guess estimate of parameters. These results show that SNPE can reveal how information from multiple data features imposes collective constraints on channel and membrane properties in the HH model.

We also inferred HH parameters for eight in vitro recordings from the Allen Cell Types database using the same current-clamp stimulation protocol as in our model (Allen Institute for Brain Science, 2016; Teeter et al., 2018; Figure 4f, Appendix 1—figure 9). In each case, simulations based on the SNPE-inferred posterior closely resembled the original data (Figure 4f). We note that while inferred parameters differed across recordings, some parameters (the spike threshold, the density of sodium channels, the membrane reversal potential and the density of potassium channels) were consistently more strongly constrained than others (the intrinsic neural noise, the adaptation time constant, the density of slow voltage-dependent channels and the leak conductance) (Appendix 1—figure 9). Overall, these results suggest that the electrophysiological responses measured by this current-clamp protocol can be approximated by a single-compartment HH model, and that SNPE can identify the admissible parameters.

Crustacean stomatogastric ganglion: sensitivity to perturbations

We next aimed to demonstrate how the full posterior distribution obtained with SNPE can lead to novel scientific insights. To do so, we used the pyloric network of the stomatogastric ganglion (STG) of the crab Cancer borealis, a well-characterized neural circuit producing rhythmic activity. In this circuit, similar network activity can arise from vastly different sets of membrane and synaptic conductances (Prinz et al., 2004). We first investigated whether data-consistent sets of membrane and synaptic conductances are connected in parameter space, as has been demonstrated for single neurons (Taylor et al., 2006), and, second, which compensation mechanisms between parameters of this circuit allow the neural system to maintain its activity despite parameter variations. While this model has been studied extensively, answering these questions requires characterizing higher dimensional parameter spaces than those accessed previously. We demonstrate how SNPE can be used to identify the posterior distribution over both membrane and synaptic conductances of the STG (31 parameters total) and how the full posterior distribution can be used to study the above questions at the circuit level.

For some biological systems, multiple parameter sets give rise to the same system behavior (Prinz et al., 2004; Marder and Goaillard, 2006; Gutierrez et al., 2013; Fisher et al., 2013; Marder et al., 2015; Alonso and Marder, 2019). In particular, neural systems can be robust to specific perturbations of parameters (O'Leary et al., 2014; Marder et al., 2015; O'Leary and Marder, 2016), yet highly sensitive to others, properties referred to as sloppiness and stiffness (Goldman et al., 2001; Gutenkunst et al., 2007; Machta et al., 2013; O'Leary et al., 2015). We studied how perturbations affect model output using a model (Prinz et al., 2004) and data (Haddad and Marder, 2018) of the pyloric rhythm in the crustacean stomatogastric ganglion (STG). This model describes a triphasic motor pattern generated by a well-characterized circuit (Figure 5a). The circuit consists of two electrically coupled pacemaker neurons (anterior burster and pyloric dilator, AB/PD), modeled as a single neuron, as well as two types of follower neurons (lateral pyloric (LP) and pyloric (PY)), all connected through inhibitory synapses (details in Materials and methods). Eight membrane conductances are included for each modeled neuron, along with seven synaptic conductances, for a total of 31 parameters. This model has been used to demonstrate that virtually indistinguishable activity can arise from vastly different membrane and synaptic conductances in the STG (Prinz et al., 2004; Alonso and Marder, 2019). Here, we build on these studies and extend the model to include intrinsic neural noise on each neuron (see Materials and methods).

Identifying network models underlying an experimentally observed pyloric rhythm in the crustacean stomatogastric ganglion.

(a) Simplified circuit diagram of the pyloric network from the stomatogastric ganglion. Thin connections are fast glutamatergic, thick connections are slow cholinergic. (b) Extracellular recordings from nerves of pyloric motor neurons of the crab Cancer borealis (Haddad and Marder, 2018). Numbers indicate some of the used summary features, namely cycle period (1), phase delays (2), phase gaps (3), and burst durations (4) (see Materials and methods for details). (c) Posterior over 24 membrane and seven synaptic conductances given the experimental observation shown in panel b (eight parameters shown, full posterior in Appendix 1—figure 11). Two high-probability parameter sets in purple. Inset: magnified marginal posterior for the synaptic strengths AB to LP neuron vs. PD to LP neuron. (d) Identifying directions of sloppiness and stiffness. Two samples from the posterior both show similar network activity as the experimental observation (top left and top right), but have very different parameters (purple dots in panel c). Along the high-probability path between these samples, network activity is preserved (trace 1). When perturbing the parameters orthogonally off the path, network activity changes abruptly and becomes non-pyloric (trace 2).

We applied SNPE to an extracellular recording from the STG of the crab Cancer borealis (Haddad and Marder, 2018) which exhibited pyloric activity (Figure 5b), and inferred the posterior distribution over all 31 parameters based on 18 salient features of the voltage traces, including cycle period, phase delays, phase gaps, and burst durations (features in Figure 5B, posterior in Figure 5c, posterior over all parameters in Appendix 1—figure 11, details in Materials and methods). Consistent with previous reports, the posterior distribution has high probability over extended value ranges for many membrane and synaptic conductances. To verify that parameter settings across these extended ranges are indeed capable of generating the experimentally observed network activity, we sampled two sets of membrane and synaptic conductances from the posterior distribution. These two samples have widely disparate parameters from each other (Figure 5c, purple dots, details in Materials and methods), but both exhibit activity highly similar to the experimental observation (Figure 5d, top left and top right).

We then investigated the geometry of the parameter space producing these rhythms (Achard and De Schutter, 2006; Alonso and Marder, 2019). First, we wanted to identify directions of sloppiness, and we were interested in whether parameter settings producing pyloric rhythms form a single connected region, as has been shown for single neurons (Taylor et al., 2006), or whether they lie on separate ‘islands’. Starting from the two parameter settings showing similar activity above, we examined whether they were connected by searching for a path through parameter space along which pyloric activity was maintained. To do this, we algorithmically identified a path lying only in regions of high posterior probability (Figure 5c, white, details in Materials and methods). Along the path, network output was tightly preserved, despite a substantial variation of the parameters (voltage trace 1 in Figure 5d, Appendix 1—figure 12). Second, we inspected directions of stiffness by perturbing parameters off the path. We applied perturbations that yield maximal drops in posterior probability (see Materials and methods for details), and found that the network quickly produced non-pyloric activity (voltage trace 2, Figure 5d; Goldman et al., 2001). Note that, while parameter set 2 seems to lie in regions of high probability when inspecting pairwise marginals, it in fact has low probability under the full posterior distribution (Appendix 1—figure 13). In identifying these paths and perturbations, we exploited the fact that SNPE provides a differentiable estimate of the posterior, as opposed to parameter search methods which provide only discrete samples.

Overall, these results show that the pyloric network can be robust to specific perturbations in parameter space, but sensitive to others, and that one can interpolate between disparate solutions while preserving network activity. This analysis demonstrates the flexibility of SNPE in capturing complex posterior distributions, and shows how the differentiable posterior can be used to study directions of sloppiness and stiffness.

Predicting compensation mechanisms from posterior distributions

Experimental and computational studies have shown that stable neural activity can be maintained despite variable circuit parameters (Prinz et al., 2004; Marder and Taylor, 2011; O’Leary, 2018). This behavior can emerge from two sources (Marder and Taylor, 2011): either, the variation of a certain parameter barely influences network activity at all, or alternatively, variations of several parameters influence network activity, but their effects compensate for one another. Here, we investigated these possibilities by using the posterior distribution over membrane and synaptic conductances of the STG.

We began by drawing samples from the posterior and inspecting their pairwise histograms (i.e. the pairwise marginals, Figure 6a, posterior over all parameters in Appendix 1—figure 11). Consistent with previously reported results (Taylor et al., 2009), many parameters seem only weakly constrained and only weakly correlated (Figure 6b). However, this observation does not imply that the parameters of the network do not have to be finely tuned: pairwise marginals are averages over many network configurations, where all other parameters may take on diverse values, which could disguise that each individual configuration is finely tuned. Indeed, when we sampled parameters independently from their posterior histograms, the resulting circuit configurations rarely produced pyloric activity, indicating that parameters have to be tuned relative to each other (Appendix 1—figure 14). This analysis also illustrates that the (common) approach of independently setting parameters can be problematic: although each parameter individually is in a realistic range, the network as a whole is not (Golowasch et al., 2002). Finally, it shows the importance of identifying the full posterior distribution, which is far more informative than just finding individual parameters and assigning error bars.

Predicting compensation mechanisms in the stomatogastric ganglion.

(a) Inferred posterior. We show a subset of parameters which are weakly constrained (full posterior in Appendix 1—figure 11). Pyloric activity can emerge from a wide range of maximal membrane conductances, as the 1D and 2D posterior marginals cover almost the entire extent of the prior. (b) Correlation matrix, based on the samples shown in panel (a). Almost all correlations are weak. Ordering of membrane and synaptic conductances as in Appendix 1—figure 11. (c) Conditional distributions given a particular circuit configuration: for the plots on the diagonal, we keep all but one parameter fixed. For plots above the diagonal, we keep all but two parameters fixed. The remaining parameter(s) are narrowly tuned; tuning across parameters is often highly correlated. When conditioning on a different parameter setting (right plot), the conditional posteriors change, but correlations are often maintained. (d) Conditional correlation matrix, averaged over 500 conditional distributions like the ones shown in panel (c). Black squares highlight parameter-pairs within the same model neuron. (e) Consistency with experimental observations. Top: maximal conductance of the fast transient potassium current and the maximal conductance of the hyperpolarization current are positively correlated for all three neurons. This has also been experimentally observed in the PD and the LP neuron (MacLean et al., 2005). Bottom: the maximal conductance of the hyperpolarization current of the postsynaptic neuron can compensate the strength of the synaptic input, as experimentally observed in the PD and the LP neuron (Grashow et al., 2010; Marder, 2011). The boxplots indicate the maximum, 75% quantile, median, 25% quantile, and minimum across 500 conditional correlations for different parameter pairs. Face color indicates mean correlation using the colorbar shown in panel (b).

In order to investigate the need for tuning between pairs of parameters, we held all but two parameters constant at a given consistent circuit configuration (sampled from the posterior), and observed the network activity across different values of the remaining pair of parameters. We can do so by calculating the conditional posterior distribution (details in Materials and methods), and do not have to generate additional simulations (as would be required by parameter search methods). Doing so has a simple interpretation: when all but two parameters are fixed, what values of the remaining two parameters can then lead to the desired network activity? We found that the desired pattern of pyloric activity can emerge only from narrowly tuned and often highly correlated combinations of the remaining two parameters, showing how these parameters can compensate for one another (Figure 6c). When repeating this analysis across multiple network configurations, we found that these ‘conditional correlations’ are often preserved (Figure 6c, left and right). This demonstrates that pairs of parameters can compensate for each other in a similar way, independently of the values taken by other parameters. This observation about compensation could be interpreted as an instance of modularity, a widespread underlying principle of biological robustness (Kitano, 2004).

We calculated conditional correlations for each parameter pair using 500 different circuit configurations sampled from the posterior (Figure 6d). Compared to correlations based on the pairwise marginals (Figure 6b), these conditional correlations were substantially stronger. They were particularly strong across membrane conductances of the same neuron, but primarily weak across different neurons (black boxes in Figure 6d).

Finally, we tested whether the conditional correlations were in line with experimental observations. For the PD and the LP neuron, it has been reported that overexpression of the fast transient potassium current (IA) leads to a compensating increase of the hyperpolarization current (IH), suggesting a positive correlation between these two currents (MacLean et al., 2003; MacLean et al., 2005). These results are qualitatively consistent with the positive conditional correlations between the maximal conductances of IA and IH for all three model neurons (Figure 6e top). In addition, using the dynamic clamp, it has been shown that diverse combinations of the synaptic input strength and the maximal conductance of IH lead to similar activity in the LP and the PD neuron (Grashow et al., 2010; Marder, 2011). Consistent with these findings, the non-zero conditional correlations reveal that there can indeed be compensation mechanisms between the synaptic strength and the maximal conductance of IH of the postsynaptic neuron (Figure 6e bottom).

Overall, we showed how SNPE can be used to study parameter dependencies, and how the posterior distribution can be used to efficiently explore potential compensation mechanisms. We found that our method can predict compensation mechanisms which are qualitatively consistent with experimental studies. We emphasize that these findings would not have been possible with a direct grid-search over all parameters: defining a grid in a 31-dimensional parameter space would require more than 231 > 2 billion simulations, even if one were to use the coarsest-possible grid with only two values per dimension.

Discussion

How can we build models which give insights into the causal mechanisms underlying neural or behavioral dynamics? The cycle of building mechanistic models, generating predictions, comparing them to empirical data, and rejecting or refining models has been of crucial importance in the empirical sciences. However, a key challenge has been the difficulty of identifying mechanistic models which can quantitatively capture observed phenomena. We suggest that a generally applicable tool to constrain mechanistic models by data would expedite progress in neuroscience. While many considerations should go into designing a model that is appropriate for a given question and level of description (Herz et al., 2006; Brette, 2015; Gerstner et al., 2012; O'Leary et al., 2015), the question of whether and how one can perform statistical inference should not compromise model design. In our tool, SNPE, the process of model building and parameter inference are entirely decoupled. SNPE can be applied to any simulation-based model (requiring neither model nor summary features to be differentiable) and gives full flexibility on defining a prior. We illustrated the power of our approach on a diverse set of applications, highlighting the potential of SNPE to rapidly identify data-compatible mechanistic models, to investigate which data-features effectively constrain parameters, and to reveal shortcomings of candidate-models.

Finally, we used a model of the stomatogastric ganglion to show how SNPE can identify complex, high-dimensional parameter landscapes of neural systems. We analyzed the geometrical structure of the parameter landscape and confirmed that circuit configurations need to be finely tuned, even if individual parameters can take on a broad range of values. We showed that different configurations are connected in parameter space, and provided hypotheses for compensation mechanisms. These analyses were made possible by SNPE’s ability to estimate full parameter posteriors, rather than just constraints on individual parameters, as is common in many statistical parameter-identification approaches.

Related work

SNPE builds on recent advances in machine learning and in particular in density-estimation approaches to likelihood-free inference (Papamakarios and Murray, 2016; Le et al., 2017a; Lueckmann et al., 2017; Chan et al., 2018; Greenberg et al., 2019, reviewed in Cranmer et al., 2020). We here scaled these approaches to canonical mechanistic models of neural dynamics and provided methods and software-tools for inference, visualization, and analysis of the resulting posteriors (e.g. the high-probability paths and conditional correlations presented here).

The idea of learning inference networks on simulated data can be traced back to regression-adjustment methods in ABC (Beaumont et al., 2002; Blum and François, 2010). Papamakarios and Murray, 2016 first proposed to use expressive conditional density estimators in the form of deep neural networks (Bishop, 1994; Papamakarios et al., 2017), and to optimize them sequentially over multiple rounds with cost-functions derived from Bayesian inference principles. Compared to commonly used rejection-based ABC methods (Rubin, 1984; Pritchard et al., 1999), such as MCMC-ABC (Marjoram et al., 2003), SMC-ABC (Sisson et al., 2007; Liepe et al., 2014), Bayesian-Optimization ABC (Gutmann and Corander, 2016), or ensemble methods (Britton et al., 2013; Lawson et al., 2018), SNPE approaches do not require one to define a distance function in data space. In addition, by leveraging the ability of neural networks to learn informative features, they enable scaling to problems with high-dimensional observations, as are common in neuroscience and other fields in biology. We have illustrated this capability in the context of receptive field estimation, where a convolutional neural network extracts summary features from a 1681 dimensional spike-triggered average. Alternative likelihood-free approaches include synthetic likelihood methods (Wood, 2010; Costa et al., 2013; Wilkinson, 2014; Meeds and Welling, 2014; Papamakarios et al., 2019a; Lueckmann et al., 2019; Durkan et al., 2018), moment-based approximations of the posterior (Barthelmé and Chopin, 2014; Schröder et al., 2019), inference compilation (Le et al., 2017b; Casado et al., 2017), and density-ratio estimation (Hermans et al., 2020). For some mechanistic models in neuroscience (e.g. for integrate-and-fire neurons), likelihoods can be computed via stochastic numerical approximations (Chen, 2003; Huys and Paninski, 2009; Meliza et al., 2014) or model-specific analytical approaches (Huys et al., 2006; Hertäg et al., 2012; Pozzorini et al., 2015; Ladenbauer et al., 2018; René et al., 2020).

How big is the advance brought by SNPE relative to ‘conventional’ brute-force approaches that aim to exhaustively explore parameter space? A fundamental difference from grid search approaches that have been applied to neuroscientific models (Prinz et al., 2003; Caplan et al., 2014; Stringer et al., 2016) is that SNPE can perform Bayesian inference for stochastic models, whereas previous approaches identified parameters whose deterministic model-outputs were heuristically ‘close’ to empirical data. Depending on the goal of the analysis, either approach might be preferable. SNPE, and Bayesian inference more generally, is derived for stochastic models. SNPE can, in principle, also be applied to deterministic models, but a rigorous mathematical interpretation or empirical evaluation in this regime is beyond the scope of this study. SNPE also differs conceptually and quantitatively from rejection-ABC, in which random parameters are accepted or rejected based on a distance-criterion. SNPE uses all simulations during training instead of rejecting some, learns to identify data features informative about model parameters rather than relying on the user to choose the correct data features and distance metric, and performs considerably better than rejection-ABC, in particular for problems with high-dimensional observations (Figure 2). Another advantage over grid search and rejection-ABC is that SNPE can ‘amortize’ inference of parameter posteriors, so that one can quickly perform inference on new data, or explore compensation mechanisms, without having to carry out new simulations, or repeatedly search a simulation database. We should still note that SNPE can require the generation of large sets of simulations, which can be viewed as a brute-force step, emphasising that one of the main strengths of SNPE over conventional brute-force approaches relies on the processing of these simulations via deep neural density estimators.

Our approach is already finding its first applications in neuroscience–for example, Oesterle et al., 2020 have used a variant of SNPE to constrain biophysical models of retinal neurons, with the goal of optimizing stimulation approaches for neuroprosthetics. Concurrently with our work, Bittner et al., 2019 developed an alternative approach to parameter identification for mechanistic models and showed how it can be used to characterize neural population models which exhibit specific emergent computational properties. Both studies differ in their methodology and domain of applicability (see descriptions of underlying algorithms in our prior work [Lueckmann et al., 2017; Greenberg et al., 2019] and theirs [Loaiza-Ganem et al., 2017]), as well in the focus of their neuroscientific contributions. Both approaches share the overall goal of using deep probabilistic inference tools to build more interpretable models of neural data. These complementary and concurrent advances will expedite the cycle of building, adjusting and selecting mechanistic models in neuroscience.

Finally, a complementary approach to mechanistic modeling is to pursue purely phenomenological models, which are designed to have favorable statistical and computational properties: these data-driven models can be efficiently fit to neural data (Brown et al., 1998; Truccolo et al., 2005; Pillow, 2007; Pillow et al., 2008; Schneidman et al., 2006; Macke et al., 2011; Yu et al., 2009; Pandarinath et al., 2018; Cunningham and Yu, 2014) or to implement desired computations (Sussillo and Abbott, 2009). Although tremendously useful for a quantitative characterization of neural dynamics, these models typically have a large number of parameters, which rarely correspond to physically measurable or mechanistically interpretable quantities, and thus it can be challenging to derive mechanistic insights or causal hypotheses from them (but see e.g. Mante et al., 2013; Sussillo and Barak, 2013; Maheswaranathan et al., 2019).

Use of summary features

When fitting mechanistic models to data, it is common to target summary features to isolate specific behaviors, rather than the full data. For example, the spike shape is known to constrain sodium and potassium conductances (Druckmann et al., 2007; Pospischil et al., 2008; Hay et al., 2011). When modeling population dynamics, it is often desirable to achieve realistic firing rates, rate-correlations and response nonlinearities (Rubin et al., 2015; Bittner et al., 2019), or specified oscillations (Prinz et al., 2004). In models of decision making, one is often interested in reproducing psychometric functions or reaction-time distributions (Ratcliff and McKoon, 2008). Choice of summary features might also be guided by known limitations of either the model or the measurement approach, or necessitated by the fact that published data are only available in summarized form. Several methods have been proposed to automatically construct informative summary features (Blum et al., 2013; Jiang et al., 2017; Izbicki et al., 2019). SNPE can be applied to, and might benefit from the use of summary features, but it also makes use of the ability of neural networks to automatically learn informative features in high-dimensional data. Thus, SNPE can also be applied directly to raw data (e.g. using recurrent neural networks [Lueckmann et al., 2017]), or to high-dimensional summary features which are challenging for ABC approaches (Figure 2). In all cases, care is needed when interpreting models fit to summary features, as choice of features can influence the results (Blum et al., 2013; Jiang et al., 2017; Izbicki et al., 2019).

Applicability and limitations

A key advantage of SNPE is its general applicability: it can be applied whenever one has a simulator that allows to stochastically generate model outputs from specific parameters. Furthermore, it can be applied in a fully ‘black-box manner’, that is, does not require access to the internal workings of the simulator, its model equations, likelihoods or gradients. It does not impose any other limitations on the model or the summary features, and in particular does not require them to be differentiable. However, it also has limitations which we enumerate below.

First, current implementations of SNPE scale well to high-dimensional observations (∼1000s of dimensions, also see Greenberg et al., 2019), but scaling SNPE to even higher-dimensional parameter spaces (above 30) is challenging (note that previous approaches were generally limited to less than 10 dimensions). Given that the difficulty of estimating full posteriors scales exponentially with dimensionality, this is an inherent challenge for all approaches that aim at full inference (in contrast to just identifying a single, or a few heuristically chosen parameter fits).

Second, while it is a long-term goal for these approaches to be made fully automatic, our current implementation still requires choices by the user: as described in Materials and methods, one needs to choose the type of the density estimation network, and specify settings related to network-optimization, and the number of simulations and inference rounds. These settings depend on the complexity of the relation between summary features and model parameters, and the number of simulations that can be afforded. In the documentation accompanying our code-package, we provide examples and guidance. For small-scale problems, we have found SNPE to be robust to these settings. However, for challenging, high-dimensional applications, SNPE might currently require substantial user interaction.

Third, the power of SNPE crucially rests on the ability of deep neural networks to perform density estimation. While deep nets have had ample empirical success, we still have an incomplete understanding of their limitations, in particular in cases where the mapping between data and parameters might not be smooth (e.g. near phase transitions).

Fourth, when applying SNPE (or any other model-identification approach), validation of the results is of crucial importance, both to assess the accuracy of the inference procedure, as well as to identify possible limitations of the mechanistic model itself. In the example applications, we used several procedures for assessing the quality of the inferred posteriors. One common ingredient of these approaches is to sample from the inferred model, and search for systematic differences between observed and simulated data, e.g. to perform posterior predictive checks (Cook et al., 2006; Talts et al., 2018; Liepe et al., 2014; Lueckmann et al., 2017; Greenberg et al., 2019; Figure 2g, Figure 3f,g, Figure 4c, and Figure 5d). These approaches allow one to detect ‘failures’ of SNPE, that is, cases in which samples from the posterior do not reproduce the data. However, when diagnosing any Bayesian inference approach, it is challenging to rigorously rule out the possibility that additional parameter-settings (e.g. in an isolated ‘island’) would also explain the data. Thus, it is good practice to use multiple initializations of SNPE, and/or a large number of simulations in the initial round. There are challenges and opportunities ahead in further scaling and automating simulation-based inference approaches. However, in its current form, SNPE will be a powerful tool for quantitatively evaluating mechanistic hypotheses on neural data, and for designing better models of neural dynamics.

Materials and methods

Code availability

Request a detailed protocol

Code implementing SNPE based on Theano, is available at http://www.mackelab.org/delfi/. An extended toolbox based on PyTorch is available at http://www.mackelab.org/sbi/ (Tejero-Cantero et al., 2020).

Simulation-based inference

Request a detailed protocol

To perform Bayesian parameter identification with SNPE, three types of input need to be specified:

  1. A mechanistic model. The model only needs to be specified through a simulator, that is that one can generate a simulation result 𝐱 for any parameters 𝜽. We do not assume access to the likelihood p(𝐱|𝜽) or the equations or internals of the code defining the model, nor do we require the model to be differentiable. This is in contrast to many alternative approaches (including Bittner et al., 2019), which require the model to be differentiable and to be implemented in a software code that is amenable to automatic differentiation packages. Finally, SNPE can both deal with inputs 𝐱 which resemble ‘raw’ outputs of the model, or summary features calculated from data.

  2. Observed data 𝐱o of the same form as the results 𝐱 produced by model simulations.

  3. A prior distribution p(𝜽) describing the range of possible parameters. p(𝜽) could consist of upper and lower bounds for each parameter, or a more complex distribution incorporating mechanistic first principles or knowledge gained from previous inference procedures on other data. In our applications, we chose priors deemed reasonable or informed by previous studies (see Materials and methods), although setting such priors is an open problem in itself, and outside of the scope of this study.

For each problem, our goal was to estimate the posterior distribution p(𝜽|𝐱o). To do this, we used SNPE (Papamakarios and Murray, 2016; Lueckmann et al., 2017; Greenberg et al., 2019). Setting up the inference procedure required three design choices:

  1. A network architecture, including number of layers, units per layer, layer type (feedforward or convolutional), activation function and skip connections.

  2. A parametric family of probability densities qψ(𝜽) to represent inferred posteriors, to be used as conditional density estimator. We used either a mixture of Gaussians (MoG) or a masked autoregressive flow (MAF) (Papamakarios et al., 2017). In the former case, the number of components K must be specified; in the latter the number of MADES (Masked Autoencoder for Distribution Estimation) nMADES. Both choices are able to represent richly structured, and multimodal posterior distributions (more details on neural density estimation below).

  3. A simulation budget, that is, number of rounds R and simulations per round Nr. The required number of simulations depends on both the dimensionality and complexity of the function between summary statistics and model parameters. While the number of parameters and summary-features can easily be determined, it can be hard to determine how ‘complex’ (or nonlinear) this mapping is. This makes it difficult to give general guidelines on how many simulations will be required. A practical approach is to choose a simulation-budget based on the computational cost of the simulation, inspect the results (e.g. with posterior predictive checks), and add more simulations when it seems necessary.

We emphasize that SNPE is highly modular, that is, that the the inputs (data, the prior over parameter, the mechanistic model), and algorithmic components (network architecture, probability density, optimization approach) can all be modified and chosen independently. This allows neuroscientists to work with models which are designed with mechanistic principles—and not convenience of inference—in mind. Furthermore, it allows SNPE to benefit from advances in more flexible density estimators, more powerful network architectures, or optimization strategies.

With the problem and inference settings specified, SNPE adjusts the network weights ϕ based on simulation results, so that p(𝜽|𝐱)qF(𝐱,ϕ)(𝜽) for any 𝐱. In the first round of SNPE, simulation parameters are drawn from the prior p(𝜽). If a single round of inference is not sufficient, SNPE can be run in multiple rounds, in which samples are drawn from the version of qF(𝐱o,ϕ)(𝜽) at the beginning of the round. After the last round, qF(𝐱o,ϕ) is returned as the inferred posterior on parameters 𝜽 given observed data 𝐱o. If SNPE is only run for a single round, then the generated samples only depend on the prior, but not on 𝐱o: in this case, the inference network is applicable to any data (covered by the prior ranges), and can be used for rapid amortized inference.

SNPE learns the correct network weights ϕ by minimizing the objective function j(𝜽j,𝐱j) where the simulation with parameters 𝜽j produced result 𝐱j. For the first round of SNPE (𝜽j,𝐱j)=-logqF(𝐱j,ϕ), while in subsequent rounds a different loss function accounts for the fact that simulation parameters were not sampled from the prior. Different choices of the loss function for later rounds result in SNPE-A (Papamakarios and Murray, 2016), SNPE-B (Lueckmann et al., 2017) or SNPE-C algorithm (Greenberg et al., 2019). To optimize the networks, we used ADAM with default settings (Kingma and Ba, 2014).

The details of the algorithm are below:

Algorithm 1: SNPE
  Input: simulator with (implicit) density p(𝐱|𝜽), observed data 𝐱o, prior p(𝜽), density family qψ,
  neural network F(𝐱,ϕ), number of rounds , simulation count for each round Nr
  randomly initialize ϕ
  p~1(𝜽):=p(𝜽)
  N:=0
  for r=1 to R do
    for i=1Nr do
      sample 𝜽N+ip~r(𝜽)
      simulate 𝐱N+ip(𝐱|𝜽N+i)
    NN+Nr
    train ϕargminϕj=1N(𝜽j,𝐱j)
    p~r(𝜽):=qF(𝐱o,ϕ)(𝜽)
  return qF(𝐱o,ϕ)(𝜽)

Bayesian inference without likelihood-evaluations with SNPE

Request a detailed protocol

In Papamakarios and Murray, 2016, it was shown that the procedure described above (i.e. sample from the prior, train a flexible density estimator by minimizing the log-loss (𝜽j,𝐱j)=-jlogqF(𝐱j,ϕ)(𝜽j)) can be used to perform Bayesian inference without likelihood evaluations.

For the multi-round case, in which samples are no longer drawn from the prior, but adaptively generated from a (generally more focused) proposal distribution, the loss function needs to be modified. Different variants of SNPE differ in how exactly this is done:

  • SNPE-A minimizes the same loss function as in the first round, but applies a post-hoc analytical correction (Papamakarios and Murray, 2016)

  • SNPE-B minimizes an importance-weighted loss function, directly approximating the posterior and therefore not requiring a post-hoc correction (Lueckmann et al., 2017)

  • SNPE-C avoids importance weights (which can have high variance), by either calculating normalization constants in closed-form or using a classifier-based loss (Greenberg et al., 2019)

Neural density estimation

Request a detailed protocol

As described above, SNPE approximates the posterior distribution with flexible neural density estimators: either a mixture density network (MDN) or a masked autoregressive flow (MAF). Below, we provide a few more details about these density estimators, how we chose their respective architectures, and when to choose one or the other.

The MDN outputs the parameters of a mixture of Gaussians (i.e. mixture weights, and for each component of the mixture, the mean vector and covariance entries). Thus, for an MDN composed of K components, we chose an architecture with at least as many units per layer as K(1+Nθ+Nθ(Nθ+1)/2)1, where Nθ is the number of parameters to infer, to ensure enough flexibility to approximate well the parameters of the mixture of Gaussians. For example, when inferring the parameters of the Hodgkin-Huxley model given in vitro recordings from mouse cortex (Allen Cell Types Database, https://celltypes.brain-map.org/data), we infer the posterior over eight parameters with a mixture of two Gaussians, and the MDN needs at least 89 units per layer. Across applications, we found two layers to be sufficient to appropriately approximate the posterior distribution.

MAF is a specific type of normalizing flow, which is a highly flexible density estimator (Rezende and Mohamed, 2015; Papamakarios et al., 2017; Papamakarios et al., 2019b). Normalizing flows consist of a stack of bijections which transform a simple distribution (usually a multivariate Gaussian distribution) into the target distribution. Each bijection is parameterized by a specific type of neural network (for MAF: a Masked Autoencoder for Distribution Estimation, or MADE). In our experiments, five stacked bijections are enough to approximate even complex posterior distributions. Depending on the size of the parameter and data space, each neural network had between [50,50] and [100,100,100] hidden units.

When using SNPE in a single-round, we generally found superior performance for MAFs as compared to MDNs. When running inference across multiple rounds, training MAFs leads to additional challenges which might impede the quality of inference (Greenberg et al., 2019; Durkan et al., 2020).

Linear-nonlinear encoding models

Request a detailed protocol

We used a Linear-Nonlinear (LN) encoding model (a special case of a generalized linear model, GLM, [Brown et al., 1998; Paninski, 2004; Truccolo et al., 2005; Pillow, 2007; Pillow et al., 2008; Gerwinn et al., 2010]) to simulate the activity of a neuron in response to a univariate time-varying stimulus. Neural activity zi was subdivided in T=100 bins and, within each bin i, spikes were generated according to a Bernoulli observation model,

ziBern(η(𝐯i𝒇+β)),

where 𝐯i is a vector of white noise inputs between time bins i-8 and i, 𝒇 a length-9 linear filter, β is the bias, and η()=exp()/(1+exp()) is the canonical inverse link function for a Bernoulli GLM. As summary features, we used the total number of spikes N and the spike-triggered average 1N𝐕𝐳, where 𝐕=[v1,v2,,vT] is the so-called design matrix of size 9×T. We note that the spike-triggered sum 𝐕𝐳 constitutes sufficient statistics for this GLM, that is that selecting the STA and N together as summary features does not lead to loss of model relevant information over the full input-output dataset {𝐕,𝐳}. We used a Gaussian prior with zero mean and covariance matrix Σ=σ2(FF)1, where 𝐅 encourages smoothness by penalizing the second-order differences in the vector of parameters (De Nicolao et al., 1997).

For inference, we used a single round of 10,000 simulations, and the posterior was approximated with a Gaussian distribution (𝜽10,𝐱10). We used a feedforward neural network with two hidden layers of 50 units each. We used a Polya Gamma Markov Chain Monte Carlo sampling scheme (Polson et al., 2013) to estimate a reference posterior.

In Figure 2d, we compare the performance of SNPE with two classical ABC algorithms, rejection ABC and Sequential Monte Carlo ABC as a function of the number of simulations. We report the relative error in Kullback-Leibler divergence, which is defined as:

(1) DKL(pMCMC(𝜽|𝐱)||p^(𝜽|𝐱))DKL(pMCMC(𝜽|𝐱)||p(𝜽)),

and which ranges between 0 (perfect recovery of the posterior) and 1 (estimated posterior no better than the prior). Here, pMCMC(𝜽|𝐱) is the ground-truth posterior estimated via Markov Chain Monte Carlo sampling, p^(𝜽|𝐱) is the estimated posterior via SNPE, rejection ABC or Sequential Monte Carlo ABC, and p(𝜽) is the prior.

For the spatial receptive field model of a cell in primary visual cortex, we simulated the activity of a neuron depending on an image-valued stimulus. Neural activity was subdivided in bins of length Δt=0.025s and within each bin i, spikes were generated according to a Poisson observation model, 

ziPoiss(η(𝐯i𝒉+β)),

where 𝐯i is the vectorized white noise stimulus at time bin i, 𝒉 a 41 × 41 linear filter, β is the bias, and η()=exp() is the canonical inverse link function for a Poisson GLM. The receptive field 𝒉 is constrained to be a Gabor filter:

h(gx,gy)=gexp(x2+r2y22σ2)cos(2πfxϕ)x=(gxx)cosψ(gyy)sinψy=(gxx)sinψ+(gyy)cosψσ=2log22πf2w+12w1,

where (gx,gy) is a regular grid of 41 × 41 positions spanning the 2D image-valued stimulus. The parameters of the Gabor are gain g, spatial frequency f, aspect-ratio r, width w, phase ϕ (between 0 and π), angle ψ (between 0 and 2π) and location x,y (assumed within the stimulated area, scaled to be between −1 and 1). Bounded parameters were transformed with a log-, or logit-transform, to yield unconstrained parameters. After applying SNPE, we back-transformed both the parameters and the estimated posteriors in closed form, as shown in Figure 2. We did not transform the bias β.

We used a factorizing Gaussian prior for the vector of transformed Gabor parameters

[logg,logf,logr,logw,l0,π(ϕ),l0,2π(ψ),l-1,1(x),l-1,1(y)],

where transforms l0,π(X)=log(X/(2π-X)), l0,2π(X)=log(X/(π-X)), l-1,1(X)=log((X+1)/(1-X)) ensured the assumed ranges for the Gabor parameters ϕ,ψ,x,y. Our Gaussian prior had zero mean and standard deviations [0.5,0.5,0.5,0.5,1.9,1.78,1.78,1.78]. We note that a Gaussian prior on a logit-transformed random variable logitX with zero mean and standard deviation around 1.78 is close to a uniform prior over the original variable X. For the bias β, we used a Gaussian prior with mean −0.57 and variance 1.63, which approximately corresponds to an exponential prior exp(β)Exp(λ) with rate λ=1 on the baseline firing rate exp(β) in absence of any stimulus.

The ground-truth parameters for the demonstration in Figure 2 were chosen to give an asymptotic firing rate of 1 Hz for 5 min stimulation, resulting in 299 spikes, and a signal-to-noise ratio of −12dB.

As summary features, we used the total number of spikes N and the spike-triggered average 1N𝐕𝐳, where 𝐕=[v1,v2,,vT] is the stimulation video of length T=300/Δt=12000. As for the GLM with a temporal filter, the spike-triggered sum 𝐕𝐳 constitutes sufficient statistics for this GLM.

For inference, we applied SNPE-A with in total two rounds: an initial round serves to first roughly identify the relevant region of parameter space. Here we used a Gaussian distribution to approximate the posterior from 100,000 simulations. A second round then used a mixture of eight Gaussian components to estimate the exact shape of the posterior from another 100,000 simulations (𝜽9,𝐱1682). We used a convolutional network with five convolutional layers with 16 to 32 convolutional filters followed by two fully connected layers with 50 units each. The total number of spikes N within a simulated experiment was passed as an additional input directly to the fully-connected layers of the network. Similar to the previous GLM, this model has a tractable likelihood, so we use MCMC to obtain a reference posterior.

We applied this approach to extracelullar recordings from primary visual cortex of alert mice obtained using silicon microelectrodes in response to colored-noise visual stimulation. Experimental methods are described in Dyballa et al., 2018.

Comparison with Sequential Monte Carlo (SMC) ABC

Request a detailed protocol

In order to illustrate the competitive performance of SNPE, we obtained a posterior estimate with a classical ABC method, Sequential Monte Carlo (SMC) ABC (Sisson et al., 2007; Beaumont et al., 2009). Likelihood-free inference methods from the ABC family require a distance function d(𝐱o,𝐱) between observed data 𝐱o and possible simulation outputs 𝐱 to characterize dissimilarity between simulations and data. A common choice is the (scaled) Euclidean distance d(𝐱o,𝐱)=||𝐱-𝐱o||2. The Euclidean distance here was computed over 1681 summary features given by the spike-triggered average (one per pixel) and a single summary feature given by the ‘spike count’. To ensure that the distance measure was sensitive to differences in both STA and spike count, we scaled the summary feature ‘spike count’ to account for about 20% of the average total distance (other values did not yield better results). The other 80% were computed from the remaining 1681 summary features given by spike-triggered averages.

To showcase how this situation is challenging for ABC approaches, we generated 10,000 input-output pairs (𝜽i,𝐱i)p(𝐱|𝜽)p(𝜽) with the prior and simulator used above, and illustrate the 10 STAs and spike counts with closest d(𝐱o,𝐱i) in Appendix 1—figure 5a. Spike counts were comparable to the observed data (299 spikes), but STAs were noise-dominated and the 10 ‘closest’ underlying receptive fields (orange contours) showed substantial variability in location and shape of the receptive field. If even the ‘closest’ samples do not show any visible receptive field, then there is little hope that even an appropriately chosen acceptance threshold will yield a good approximation to the posterior. These findings were also reflected in the results from SMC-ABC with a total simulation budget of 106 simulations (Appendix 1—figure 5b). The estimated posterior marginals for ‘bias’ and ‘gain’ parameters show that the parameters related to the firing rate were constrained by the data 𝐱o, but marginals of parameters related to shape and location of the receptive field did not differ from the prior, highlighting that SMC-ABC was not able to identify the posterior distribution. The low correlations between the ground-truth receptive field and receptive fields sampled from SMC-ABC posterior further highlight the failure of SMC-ABC to infer the ground-truth posterior (Appendix 1—figure 5c). Further comparisons of neural-density estimation approaches with ABC-methods can be found in the studies describing the underlying machine-learning methodologies (Papamakarios and Murray, 2016; Lueckmann et al., 2019; Greenberg et al., 2019).

Ion channel models

Request a detailed protocol

We simulated non-inactivating potassium channel currents subject to voltage-clamp protocols as:

IK=g¯Km(VEK),

where V is the membrane potential, g¯K is the density of potassium channels, EK is the reversal potential of potassium, and m is the gating variable for potassium channel activation. m is modeled according to the first-order kinetic equation

dmdt=m(V)mτm(V),

where m(V) is the steady-state activation, and τm(V) the respective time constant. We used a general formulation of m(V) and τm(V) (Destexhe and Huguenard, 2000), where the steady-state activation curve has two parameters (slope and offset) and the time constant curve has six parameters, amounting to a total of 8 parameters (θ1 to θ8): 

m(V)=11+eθ1V+θ2τm(V)=θ4e[θ5(Vθ3)+θ6(Vθ3)2]+e[θ7(Vθ3)+θ8(Vθ3)2].

Since this model can be used to describe the dynamics of a wide variety of channel models, we refer to it as Omnimodel.

We modeled responses of the Omnimodel to a set of five noisy voltage-clamp protocols (Podlaski et al., 2017): as described in Podlaski et al., 2017, the original voltage-clamp protocols correspond to standard protocols of activation, inactivation, deactivation, ramp and action potential, to which we added Gaussian noise with zero mean and standard deviation 0.5 mV. Current responses were reduced to 55 summary features (11 per protocol). Summary features were coefficients to basis functions derived via Principal Components Analysis (PCA) (10 per protocol) plus a linear offset (one per protocol) found via least-squares fitting. PCA basis functions were found by simulating responses of the non-inactivating potassium channel models to the five voltage-clamp protocols and reducing responses to each protocol to 10 dimensions (explaining 99.9% of the variance).

To amortize inference on the model, we specified a wide uniform prior over the parameters: θ1𝒰(0,1),θ2𝒰(10.,10.), θ3𝒰(120.,120.),θ4𝒰(0.,2000), θ5𝒰(0.,0.5),θ6𝒰(0,0.05), θ7𝒰(0.,0.5),θ8𝒰(0,0.05).

For inference, we trained a shared inference network in a single round of 106 simulations generated by sampling from the prior (𝜽8,𝐱55). The density estimator was a masked autoregressive flow (MAF) (Papamakarios et al., 2017) with five MADES with [250,250] hidden units each.

We evaluated performance on 350 non-inactivating potassium ion channels selected from IonChannelGenealogy (ICG) by calculating the correlation coefficient between traces generated by the original model and traces from the Omnimodel using the posterior mode (Appendix 1—figure 7).

Single-compartment Hodgkin–Huxley neurons

Request a detailed protocol

We simulated a single-compartment Hodgkin–Huxley type neuron with channel kinetics as in Pospischil et al., 2008,

CmdVdt=gl(ElV)+g¯Nam3h(ENaV)+g¯Kn4(EKV)+g¯Mp(EKV)+Iinj+ση(t)dqdt=q(V)qτq(V),q{m,h,n,p},

where V is the membrane potential, Cm is the membrane capacitance, gl is the leak conductance, El is the membrane reversal potential, g¯c is the density of channels of type c (Na+, K+, M), Ec is the reversal potential of c, (m, h, n, p) are the respective channel gating kinetic variables, and ση(t) is the intrinsic neural Gaussian noise. The right hand side of the voltage dynamics is composed of a leak current, a voltage-dependent Na+ current, a delayed-rectifier K+ current, a slow voltage-dependent K+ current responsible for spike-frequency adaptation, and an injected current Iinj. Channel gating variables q have dynamics fully characterized by the neuron membrane potential V, given the respective steady-state q(V) and time constant τq(V) (details in Pospischil et al., 2008). Two additional parameters are implicit in the functions q(V) and τq(V): VT adjusts the spike threshold through m, h, n, τm, τh and τn; τmax scales the time constant of adaptation through τp(V) (details in Pospischil et al., 2008). We set ENa=53 mV and EK=-107 mV, similar to the values used for simulations in Allen Cell Types Database (http://help.brain-map.org/download/attachments/8323525/BiophysModelPeri.pdf).

We applied SNPE to infer the posterior over eight parameters (g¯Na, g¯K, gl, g¯M, τmax, VT, σ, El), given seven voltage features (number of spikes, mean resting potential, standard deviation of the resting potential, and the first four voltage moments, mean, standard deviation, skewness and kurtosis).

The prior distribution over the parameters was uniform,

𝜽𝒰(plow,phigh),

where plow=[0.5,10-4,10-4,10-4,50,40,10-4,35] and phigh=[80,15,0.6,0.6,3000,90,0.15,100]. These ranges are similar to the ones obtained in Pospischil et al., 2008, when fitting the above model to a set of electrophysiological recordings.

For inference in simulated data, we used a single round of 100,000 simulations (θR8,xR7). The density estimator was a masked autoregressive flow (MAF) (Papamakarios et al., 2017) with five MADES with [50,50] hidden units each.

For the inference on in vitro recordings from mouse cortex (Allen Cell Types Database, https://celltypes.brain-map.org/data), we selected eight recordings corresponding to spiny neurons with at least 10 spikes during the current-clamp stimulation. The respective cell identities and sweeps are: (518290966,57), (509881736,39), (566517779,46), (567399060,38), (569469018,44), (532571720,42), (555060623,34), (534524026,29). For each recording, SNPE-B was run for two rounds with 125,000 Hodgkin–Huxley simulations each, and the posterior was approximated by a mixture of two Gaussians. In this case, the density estimator was composed of two fully connected layers of 100 units each.

Comparison with genetic algorithm

Request a detailed protocol

We compared SNPE posterior with a state-of-the-art genetic algorithm (Indicator Based Evolutionary Algorithm IBEA, [Bleuler et al., 2003; Zitzler and Künzli, 2004] from the BluePyOpt package [Van Geit et al., 2016]), in the context of the Hodgkin-Huxley model with 8 parameters and seven features (Appendix 1—figure 10). For each Hodgkin-Huxley model simulation i and summary feature j, we used the following objective score:

ϵij=|xij-xojσj|,j=1,,7,

where xij is the value of summary feature j for simulation i, xoj is the observed summary feature j, and σj is the standard deviation of the summary feature j computed across 1000 previously simulated datasets. IBEA outputs the hall-of-fame, which corresponds to the 10 parameter sets with the lowest sum of objectives j7ϵij. We ran IBEA with 100 generations and an offspring size of 1000 individuals, corresponding to a total of 100,000 simulations.

Circuit model of the crustacean stomatogastric ganglion

Request a detailed protocol

We used extracellular nerve recordings made from the stomatogastric motor neurons that principally comprise the triphasic pyloric rhythm in the crab Cancer borealis (Haddad and Marder, 2018). The preparations were decentralized, that is, the axons of the descending modulatory inputs were severed. The data was recorded at a temperature of 11°C. See Haddad and Marder, 2018 for full experimental details.

We simulated the circuit model of the crustacean stomatogastric ganglion by adapting a model described in Prinz et al., 2004. The model is composed of three single-compartment neurons, AB/PD, LP, and PD, where the electrically coupled AB and PD neurons are modeled as a single neuron. Each of the model neurons contains eight currents, a Na+ current INa, a fast and a slow transient Ca2+ current ICaT and ICaS, a transient K+ current IA, a Ca2+-dependent K+ current IKCa, a delayed rectifier K+ current IKd, a hyperpolarization-activated inward current IH, and a leak current Ileak. In addition, the model contains seven synapses. As in Prinz et al., 2004, these synapses were simulated using a standard model of synaptic dynamics (Abbott and Marder, 1998). The synaptic input current into the neurons is given by Is=gss(VpostEs), where gs is the maximal synapse conductance, Vpost the membrane potential of the postsynaptic neuron, and Es the reversal potential of the synapse. The evolution of the activation variable s is given by

dsdt=s¯(Vpre)-sτs

with

s¯(Vpre)=11+exp((Vth-Vpre)/δ)andτs=1-s¯(Vpre)k-.

Here, Vpre is the membrane potential of the presynaptic neuron, Vth is the half-activation voltage of the synapse, δ sets the slope of the activation curve, and k- is the rate constant for transmitter-receptor dissociation rate.

As in Prinz et al., 2004, two types of synapses were modeled since AB, LP, and PY are glutamatergic neurons whereas PD is cholinergic. We set Es=-70 mV and k-=1/40 ms for all glutamatergic synapses and Es=-80 mV and k-=1/100 ms for all cholinergic synapses. For both synapse types, we set Vth=-35 mV and δ=5 mV.

For each set of membrane and synaptic conductances, we numerically simulated the rhythm for 10 s with a step size of 0.025 ms. At each time step, each neuron received Gaussian noise with mean zero and standard deviation 0.001 mV.ms−0.5.

We applied SNPE to infer the posterior over 24 membrane parameters and 7 synaptic parameters, that is, 31 parameters in total. The seven synaptic parameters were the maximal conductances gs of all synapses in the circuit, each of which was varied uniformly in logarithmic domain from 0.01 nS to 1000 nS, with the exception of the synapse from AB to LP, which was varied uniformly in logarithmic domain from 0.01 nS to 10000 nS. The membrane parameters were the maximal membrane conductances for each of the neurons. The membrane conductances were varied over an extended range of previously reported values (Prinz et al., 2004), which led us to the uniform prior bounds plow=[0,0,0,0,0,25,0,0] mS cm-2 and phigh=[500,7.5,8,60,15,150,0.2,0.01] mS cm-2 for the maximal membrane conductances of the AB neuron, plow=[0,0,2,10,0,0,0,0.01] mS cm-2 and phigh=[200,2.5,12,60,10,125,0.06,0.04] mS cm-2 for the maximal membrane conductances of the LP neuron, and plow=[0,0,0,30,0,50,0,0] mS cm-2 and phigh=[600,12.5,4,60,5,150,0.06,0.04] mS cm-2 for the maximal membrane conductances of the PY neuron. The order of the membrane currents was: [Na, CaT, CaS, A, KCa, Kd, H, leak].

We used the 15 summary features proposed by Prinz et al., 2004, and extended them by three additional features. The features proposed by Prinz et al., 2004 are 15 salient features of the pyloric rhythm, namely: cycle period T (s), AB/PD burst duration dABb (s), LP burst duration dLPb (s), PY burst duration dPYb (s), gap AB/PD end to LP start ΔtAB-LPes (s), gap LP end to PY start ΔtLP-PYes (s), delay AB/PD start to LP start ΔtAB-LPss (s), delay LP start to PY start ΔtLP-PYss (s), AB/PD duty cycle dAB, LP duty cycle dLP, PY duty cycle dPY, phase gap AB/PD end to LP start ΔϕAB-LP, phase gap LP end to PY start ΔϕLP-PY, LP start phase ϕLP, and PY start phase ϕPY. Note that several of these values are only defined if each neuron produces rhythmic bursting behavior. In addition, for each of the three neurons, we used one feature that describes the maximal duration of its voltage being above −30 mV. We did this as we observed plateaus at around −10 mV during the onset of bursts, and wanted to distinguish such traces from others. If the maximal duration was below 5 ms, we set this feature to 5 ms. To extract the summary features from the observed experimental data, we first found spikes by searching for local maxima above a hand-picked voltage threshold, and then extracted the 15 above described features. We set the additional 3 features to 5 ms.

We used SNPE to infer the posterior distribution over the 18 summary features from experimental data. For inference, we used a single round with 18.5 million samples, out of which 174,000 samples contain bursts in all neurons. We therefore used these 174,000 samples with well defined summary features for training the inference network (𝜽31,𝐱18). The density estimator was a masked autoregressive flow (MAF) (Papamakarios et al., 2017) with five MADES with [100,100,100] hidden units each. The synaptic conductances were transformed into logarithmic space before training and for the entire analysis.

Previous approaches for fitting the STG circuit (Prinz et al., 2004) first fit individual neuron features and reduce the number of possible neuron models (Prinz et al., 2003), and then fit the whole circuit model. While powerful, this approach both requires the availability of single-neuron data, and cannot give access to potential compensation mechanisms between single-neuron and synaptic parameters. Unlike Prinz et al., 2004, we apply SNPE to directly identify the full 31 dimensional parameter space without requiring experimental measurements of each individual neuron in the circuit. Despite the high-dimensional parameter space, SNPE can identify the posterior distribution using 18 million samples, whereas a direct application of a full-grid method would require 4.651021 samples to fill the 31 dimensional parameter space on a grid with five values per dimension.

Finding paths in the posterior

Request a detailed protocol

In order to find directions of robust network output, we searched for a path of high posterior probability. First, as in Prinz et al., 2004, we aimed to find two similar model outputs with disparate parameters. To do so, we sampled from the posterior and searched for two parameter sets whose summary features were within 0.1 standard deviations of all 174,000 samples from the observed experimental data, but that had strongly disparate parameters from each other. In the following, we denote the obtained parameter sets by 𝜽s and 𝜽g.

Second, in order to identify whether network output can be maintained along a continuous path between these two samples, we searched for a connection in parameter space lying in regions of high posterior probability. To do so, we considered the connection between the samples as a path and minimized the following path integral:

(2) (γ)=01log(pθ|x(γ(s)|xo))γ˙(s)ds.

To minimize this term, we parameterized the path γ(s) using sinusoidal basis-functions with coefficients αn,k:

γ(s)=[k=1Kα1,ksin(πks)k=1KαN,ksin(πks)]+[k=K+12Kα1,ksin2(πks)k=K+12KαN,ksin2(πks)]+(1s)θs+sθg

These basis functions are defined such that, for any coefficients αn,k, the start and end points of the path are exactly the two parameter sets defined above:

γ(0)=𝜽s γ(1)=𝜽g

With this formulation, we have framed the problem of finding the path as an unconstrained optimization problem over the parameters αn,k. We can therefore minimize the path integral using gradient descent over αn,k. For numerical simulations, we approximated the integral in Equation 2 as a sum over 80 points along the path and use two basis functions for each of the 31 dimensions, that is, K=2.

In order to demonstrate the sensitivity of the pyloric network, we aimed to find a path along which the circuit output quickly breaks down. For this, we picked a starting point along the high-probability path and then minimized the posterior probability. In addition, we enforced that the orthogonal path lies within an orthogonal disk to the high-probability path, leading to the following constrained optimization problem:

min𝜽log(p(𝜽|𝐱)) s.t.nTΔ𝜽=0

where n is the tangent vector along the path of high probability. This optimization problem can be solved using the gradient projection method (Rosen, 1960):

Δ𝜽=-P(log(p(𝜽|𝐱)))(log(p(𝜽|𝐱)))TP(log(p(𝜽|𝐱)))

with projection matrix P=𝟙-1nTnnnT and 𝟙 indicating the identity matrix. Each gradient update is a step along the orthogonal path. We let the optimization run until the distance along the path is 1/27 of the distance along the high-probability path.

Identifying conditional correlations

Request a detailed protocol

In order to investigate compensation mechanisms in the STG, we compared marginal and conditional correlations. For the marginal correlation matrix in Figure 6b, we calculated the Pearson correlation coefficient based on 1.26 million samples from the posterior distribution p(𝜽|𝐱). To find the two-dimensional conditional distribution for any pair of parameters, we fixed all other parameters to values taken from an arbitrary posterior sample, and varied the remaining two on an evenly spaced grid with 50 points along each dimension, covering the entire prior space. We evaluated the posterior distribution at every value on this grid. We then calculated the conditional correlation as the Pearson correlation coefficient over this distribution. For the 1-dimensional conditional distribution, we varied only one parameter and kept all others fixed. Lastly, in Figure 6d, we sampled 500 parameter sets from the posterior, computed the respective conditional posteriors and conditional correlation matrices, and took the average over the conditional correlation matrices.

Appendix 1

Appendix 1—figure 1
Comparison between SNPE-estimated posterior and reference posterior (obtained via MCMC) on LN model.

(a) Posterior mean ± one standard deviation of temporal filter (receptive field) from SNPE posterior (SNPE, blue) and reference posterior (MCMC, yellow). (b) Full covariance matrices from SNPE posterior (left) and reference (MCMC, right).

Appendix 1—figure 2
Full posterior for LN model.

In green, ground-truth parameters. Marginals (blue lines) and 2D marginals for SNPE (contour lines correspond to 95% of the mass) and MCMC (yellow histograms).

Appendix 1—figure 3
LN model with additional data.

With additional data, posterior samples cluster more tightly around the true filter. From left to right and top to bottom, SNPE (blue) and MCMC (yellow, for reference) are applied to observations with more independent Bernoulli trials, leading to progressively tighter posteriors and posterior samples closer to the true filter (which is the same across panels). Mean ± one standard deviation is shown. Note that SNPE closely agrees with the MCMC reference solution in all cases (a-d).

Appendix 1—figure 4
Full posterior for Gabor GLM receptive field model.

SNPE posterior estimate (blue lines) compared to reference posterior (MCMC, histograms). Ground-truth parameters used to simulate the data in green. We depict the distributions over the original receptive field parameters, whereas we estimate the posterior as a Gaussian mixture over transformed parameters, see Materials and methods for details. We find that a (back-transformed) Gaussian mixture with four components approximates the posterior well in this case.

Appendix 1—figure 5
SMC-ABC posterior estimate for Gabor GLM receptive field model.

(a) Spike-triggered averages (STAs) and spike counts with closest distance to the observed data out of 10000 simulations with parameters sampled from the prior. Spike counts are comparable to the observed data (299 spikes), but receptive fields (contours) are not well constrained. (b) Results for SMC-ABC with a total of 106 simulations. Histograms of 1000 particles (orange) returned in the final iteration of SMC-ABC, compared to prior (red contour lines) and ground-truth parameters (green). Distributions over (log-/logit-)transformed parameters, axis limits scaled to mean ± 3 standard deviations of the prior. (c) Correlations between ground-truth receptive field and receptive fields sampled from SMC-ABC posterior (orange), SNPE posterior (blue), reference MCMC posterior (yellow) and prior (red). The SNPE-estimated receptive fields are almost as good as those of the reference posterior, the SMC-ABC estimated ones no better than the prior.

Appendix 1—figure 6
Full posterior for Gabor LN receptive field model on V1 recordings.

We depict the distributions over the receptive field parameters, derived from the Gaussian mixture over transformed-parameters (see Materials and methods for details).

Appendix 1—figure 7
Summary results on 350 ICG channel models, and comparison with direct fits.

We generate predictions either with the posterior mode (blue) or with parameters obtained by directly fitting steady-state activation and time-constant curves (yellow). We calculate the correlation coefficient (CC) between observation and prediction. The distribution of CCs is similar for both approaches.

Appendix 1—figure 8
Full posteriors for Hodgkin-Huxley model for 1, 4, and 7 features.

Images show the pairwise marginals for 7 features. Each contour line corresponds to 68% density mass for a different inferred posterior. Light blue corresponds to 1 feature and dark blue to 7 features. Ground truth parameters in green.

Appendix 1—figure 9
Full posteriors for Hodgkin-Huxley model on 8 different recordings from Allen Cell Type Database.

Images show the pairwise marginals for 7 features. Each contour line corresponds to 68% density mass for a different inferred posterior.

Appendix 1—figure 10
Comparison between SNPE posterior and IBEA samples for Hodgkin-Huxley model with 8 parameters and 7 features.

(a) Full SNPE posterior distribution. Ground truth parameters in green and IBEA 10 parameters with highest fitness (‘hall-of-fame’) in orange. (b) Blue contour line corresponds to 68% density mass for SNPE posterior. Light orange corresponds to IBEA sampled parameters with lowest IBEA fitness and dark orange to IBEA sampled parameters with highest IBEA fitness. This plot shows that, in general, SNPE and IBEA can return very different answers– this is not surprising, as both algorithms have different objectives, but this highlights that genetic algorithms do not in general perform statistical inference. (c) Traces for samples with high probability under SNPE posterior (purple), and for samples with high fitness under IBEA objective (hall-of-fame; orange traces). (d) Features for the desired output (observation), the mode of the inferred posterior (purple) and the best sample under IBEA objective (orange). Each voltage feature is normalized by σf \! PRIOR, the standard deviation of the respective feature of simulations sampled from the prior.

Appendix 1—figure 11
Full posterior for the stomatogastric ganglion over 24 membrane and 7 synaptic conductances.

The first 24 dimensions depict membrane conductances (top left), the last 7 depict synaptic conductances (bottom right). All synaptic conductances are logarithmically spaced. Between two samples from the posterior with high posterior probability (purple dots), there is a path of high posterior probability (white).

Appendix 1—figure 12
Identifying directions of sloppiness and stiffness in the pyloric network of the crustacean stomatogastric ganglion.

(a) Minimal and maximal values of all summary statistics along the path lying in regions of high posterior probability, sampled at 20 evenly spaced points. Summary statistics change only little. The summary statistics are z-scored with the mean and standard deviation of 170,000 bursting samples in the created dataset. (b) Summary statistics sampled at 20 evenly spaced points along the orthogonal path. The summary statistics show stronger changes than in panel a and, in particular, often could not be defined because neurons bursted irregularly, as indicated by an ‘x’ above barplots. (c) Minimal and maximal values of the circuit parameters along the path lying in regions of high posterior probability. Both membrane conductances (left) and synaptic conductances (right) vary over large ranges. Axes as in panel (d). (d) Circuit parameters along the orthogonal path. The difference between the minimal and maximal value is much smaller than in panel (c).

Appendix 1—figure 13
Posterior probability along high probability and orthogonal path.

Along the path that was optimized to lie in regions of high posterior probability (purple), the posterior probability remains relatively constant. Along the orthogonal path (pink), optimized to quickly reduce posterior probability, the probability quickly drops. The start and end points as well as the points labeled 1 and 2 correspond to the points shown in Figure 5c.

Appendix 1—figure 14
Evaluating circuit configurations in which parameters have been sampled independently.

(a) Factorized posterior, that is, posterior obtained by sampling each parameter independently from the associated marginals. Many of the pairwise marginals look similar to the full posterior shown in Appendix 1—figure 11, as the posterior correlations are low. (b) Samples from the factorized posterior– only a minority of these samples produce pyloric activity, highlighting the significance of the posterior correlations between parameters. (c) Left: summary features for 500 samples from the posterior. Boxplot for samples where all summary features are well-defined (80% of all samples). Right: summary features for 500 samples from the factorized posterior. Only 23% of these samples have well-defined summary features. The summary features from the factorized posterior have higher variation than the posterior ones. Summary features are z-scored using the mean and standard deviation of all samples in our training dataset obtained from prior samples. The boxplots indicate the maximum, 75% quantile, median, 25% quantile, and minimum. The green ‘x’ indicates the value of the experimental data (the observation, shown in Figure 5b).

References

  1. 1
    Modeling Small Networks
    1. L Abbott
    2. E Marder
    (1998)
    MIT Press.
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
    Approximate bayesian computation in population genetics
    1. MA Beaumont
    2. W Zhang
    3. DJ Balding
    (2002)
    Genetics 162:2025–2035.
  9. 9
  10. 10
  11. 11
    Mixture Density Networks, Technical Report
    1. CM Bishop
    (1994)
    Aston University.
  12. 12
  13. 13
    Pisa'a platform and programming language independent interface for search algorithms
    1. S Bleuler
    2. M Laumanns
    3. L Thiele
    4. E Zitzler
    (2003)
    International Conference on Evolutionary Multi-Criterion Optimization. pp. 494–508.
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
    Improvements to inference compilation for probabilistic programming in large-scale scientific simulators
    1. ML Casado
    2. AG Baydin
    3. DM Rubio
    4. TA Le
    5. F Wood
    6. L Heinrich
    7. G Louppe
    8. K Cranmer
    9. K Ng
    10. W Bhimji
    (2017)
    NeurIPS Workshop on Deep Learning for Physical Sciences.
  21. 21
    A likelihood-free inference framework for population genetic data using exchangeable neural networks
    1. J Chan
    2. V Perrone
    3. J Spence
    4. P Jenkins
    5. S Mathieson
    6. Y Song
    (2018)
    Advances in Neural Information Processing Systems. pp. 8594–8605.
  22. 22
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
    Sequential neural methods for likelihood-free inference
    1. C Durkan
    2. G Papamakarios
    3. I Murray
    (2018)
    NeurIPS Bayesian Deep Learning Workshop.
  34. 34
    On contrastive learning for likelihood-free inference
    1. C Durkan
    2. I Murray
    3. G Papamakarios
    (2020)
    International Conference on Machine Learning.
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
    Automatic posterior transformation for likelihood-free inference
    1. D Greenberg
    2. M Nonnenmacher
    3. J Macke
    (2019)
    International Conference on Machine Learning. pp. 2404–2414.
  46. 46
  47. 47
  48. 48
    Bayesian optimization for likelihood-free inference of simulator-based statistical models
    1. MU Gutmann
    2. J Corander
    (2016)
    The Journal of Machine Learning Research 17:4256–4302.
  49. 49
  50. 50
  51. 51
    Likelihood-free mcmc with approximate likelihood ratios
    1. J Hermans
    2. V Begy
    3. G Louppe
    (2020)
    International Conference on Machine Learning.
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
    Adam: a method for stochastic optimization
    1. DP Kingma
    2. J Ba
    (2014)
    International Conference on Learning Representations.
  61. 61
  62. 62
    Efficient bayesian experimental design for implicit models
    1. S Kleinegesse
    2. MU Gutmann
    (2019)
    The 22nd International Conference on Artificial Intelligence and Statistics. pp. 476–485.
  63. 63
    Imagenet classification with deep convolutional neural networks
    1. A Krizhevsky
    2. I Sutskever
    3. GE Hinton
    (2012)
    Advances in Neural Information Processing Systems. pp. 1097–1105.
  64. 64
  65. 65
  66. 66
  67. 67
    Inference compilation and universal probabilistic programming
    1. TA Le
    2. AG Baydin
    3. F Wood
    (2017b)
    Artificial Intelligence and Statistics. pp. 1338–1348.
  68. 68
  69. 69
  70. 70
    Maximum entropy flow networks
    1. G Loaiza-Ganem
    2. Y Gao
    3. JP Cunningham
    (2017)
    5th International Conference on Learning Representations, ICLR.
  71. 71
    Flexible statistical inference for mechanistic models of neural dynamics
    1. J-M Lueckmann
    2. PJ Goncalves
    3. G Bassetto
    4. K Öcal
    5. M Nonnenmacher
    6. JH Macke
    (2017)
    Advances in Neural Information Processing Systems. pp. 1289–1299.
  72. 72
    Likelihood-free inference with emulator networks
    1. J-M Lueckmann
    2. G Bassetto
    3. T Karaletsos
    4. JH Macke
    (2019)
    Proceedings of the 1st Symposium on Advances in Approximate Bayesian Inference, Volume 96 of Proceedings of Machine Learning Research. pp. 32–53.
  73. 73
  74. 74
    Empirical models of spiking in neural populations
    1. JH Macke
    2. L Buesing
    3. JP Cunningham
    4. BM Yu
    5. KV Shenoy
    6. M Sahani
    (2011)
    Advances in Neural Information Processing Systems. pp. 1350–1358.
  75. 75
  76. 76
  77. 77
    Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics
    1. N Maheswaranathan
    2. A Williams
    3. MD Golub
    4. S Ganguli
    5. D Sussillo
    (2019)
    Advances in Neural Information Processing Systems. pp. 15696–15705.
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
    Gps-abc: gaussian process surrogate approximate bayesian computation
    1. E Meeds
    2. M Welling
    (2014)
    Conference on Uncertainty in Artificial Intelligence.
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
    Masked autoregressive flow for density estimation
    1. G Papamakarios
    2. T Pavlakou
    3. I Murray
    (2017)
     Advances in Neural Information Processing Systems. pp. 2338–2347.
  96. 96
    Sequential neural likelihood: fast likelihood-free inference with autoregressive flows
    1. G Papamakarios
    2. D Sterratt
    3. I Murray
    (2019a)
    The 22nd International Conference on Artificial Intelligence and Statistics. pp. 837–848.
  97. 97
  98. 98
    Fast ε-free inference of simulation models with bayesian conditional density estimation
    1. G Papamakarios
    2. I Murray
    (2016)
    Advances in Neural Information Processing Systems. pp. 1028–1036.
  99. 99
  100. 100
  101. 101
  102. 102
    Fully bayesian inference for neural models with negative-binomial spiking
    1. JW Pillow
    2. J Scott
    (2012)
    Advances in Neural Information Processing Systems. pp. 1898–1906.
  103. 103
  104. 104
  105. 105
  106. 106
  107. 107
  108. 108
  109. 109
  110. 110
  111. 111
  112. 112
  113. 113
  114. 114
    Variational inference with normalizing flows
    1. DJ Rezende
    2. S Mohamed
    (2015)
    Proceedings of the 32nd International Conference on International Conference on Machine Learning. pp. 1530–1538.
  115. 115
  116. 116
  117. 117
  118. 118
  119. 119
  120. 120
  121. 121
    Very deep convolutional networks for large-scale image recognition
    1. K Simonyan
    2. A Zisserman
    (2015)
    International Conference on Learning Representations.
  122. 122
  123. 123
    Fast amortized inference of neural activity from calcium imaging data with variational autoencoders
    1. A Speiser
    2. J Yan
    3. EW Archer
    4. L Buesing
    5. SC Turaga
    6. JH Macke
    (2017)
    Advances in Neural Information Processing Systems. pp. 4024–4034.
  124. 124
  125. 125
  126. 126
  127. 127
  128. 128
  129. 129
  130. 130
  131. 131
  132. 132
  133. 133
  134. 134
  135. 135
  136. 136
  137. 137
  138. 138
  139. 139
  140. 140
    Faithful inversion of generative models for effective amortized inference
    1. S Webb
    2. A Golinski
    3. R Zinkov
    4. S Narayanaswamy
    5. T Rainforth
    6. YW Teh
    7. F Wood
    (2018)
    Advances in Neural Information Processing Systems. pp. 3070–3080.
  141. 141
    Accelerating abc methods using gaussian processes
    1. R Wilkinson
    (2014)
    AISTATS.
  142. 142
  143. 143
  144. 144
    Indicator-based selection in multiobjective search
    1. E Zitzler
    2. S Künzli
    (2004)
    International Conference on Parallel Problem Solving From Nature. pp. 832–842.

Decision letter

  1. John R Huguenard
    Senior Editor; Stanford University School of Medicine, United States
  2. Timothy O'Leary
    Reviewing Editor; University of Cambridge, United Kingdom
  3. Mark S Goldman
    Reviewer; University of California at Davis, United States

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

The problem of relating dynamical models to data pervades science and is particularly challenging in neuroscience due to high uncertainty in model parameters, model structure and inherent biological variability. This study presents an approach to performing model parameter inference based on large scale simulation and deep neural networks that permits efficient querying of model behaviour and underlying parameter distribution once simulations have been performed. The usefulness is demonstrated on a number of challenging and relevant models.

Decision letter after peer review:

Thank you for submitting your article "Training deep neural density estimators to identify mechanistic models of neural dynamics" for consideration by eLife. Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by a Reviewing Editor and John Huguenard as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Mark S. Goldman (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

This is a generally well written paper showcasing methodology for inferring model parameter distributions that are compatible with specified behaviours. The methodology uses recent advances in machine learning to approximate a likelihood function over model parameters that can be reused to perform inference. The method is applied to a number of informative examples. Benchmarks provide a favourable comparison against some existing methods (approximate bayesian computation/sampling) and the required computational resources are described.

Some of the reviewers found the methodology hard to follow in places; certain details were lacking. Although the authors pointed out limitations, there were also queries about potential failure modes of this methodology and whether there were methodological heuristics that are non-obvious but important, such as a means of assessing how many samples might be required for a particular model.

The reviewers agreed that a brief revision addressing specific points on methodological clarity (outlined in the reviewer's comments) and commenting on these other points would be worthwhile.

Reviewer #1:

General Assessment: This paper introduces a machine learning technique which generalizes Bayesian approaches to mechanistic models which are complex, multiparameter, and which do not necessarily have probability distributions as their key outputs. This is an interesting and likely productive direction to investigate. They claim their method is scalable and adaptable and the paper applies it to several problems of increasing complexity. However, I am unable to judge their method, because almost no details are given. This is the case even for very general questions such as what they use in place of the likelihood function for Bayes rule, and what they mean by a trained density estimator. The method in this paper may or may not be worthy of publication, but – since it is a method paper – I think the paper would need to be rewritten in a manner that readers can understand both in generality and in detail what their method accomplishes.

Review of Goncalves et al.:

The authors introduce a machine learning technique for finding ensembles of parameter values of mechanistic models which are consistent with experimental data.

The core of their method is the “Sequential Neural Posterior Estimation” (SNPE) which takes in a mechanistic model, a (thing which resembles a) prior and summary data, does some mysterious machine-learning processing and spits out something like a posterior probability distribution.

I agree with the authors that large unwieldy mechanistic models likely require machine learning to find parameter ensembles. I also agree that it should be possible to develop Bayesian like approaches for models which are not naturally probabilistic, a problem they pose nicely, and which I have come across but not quite realized was of such importance. I also find their results on the gastric circuit interesting- particularly the finding that the many disparate parameter values that can all fit data seem to lie on a single plateau, rather than on distinct minima

However, I don't really know where to begin in evaluating SNPE. It feels as if there is a section missing- the Introduction talks in generalities about large mechanistic models and the need to address their difficulties, and then it jumps to discussing the particular applications of SNPE.

I don't know what the authors mean by posterior and prior, nor what they mean when they say their method works without a likelihood.

There is muddling of what their output is and what they do to achieve it, and I don't find either sufficiently explained. For example, machine learning could be used to approximate a posterior in the usual Bayesian sense. Still I would want to know what prior they are using what their likelihood function is and what posterior they are approximating, even if there is a black box in the middle. I don't understand what their p(θ|x) is an approximation to.

Reviewer #2:

General assessment

This is a really clear and well written paper with practical solutions for an important problem in theoretical neuroscience. In addition, the authors provide high quality code and documentation making it fairly straightforward for other groups to make use of these tools. I would expect that these tools would become widely adopted in the community. Indeed, I would love to have had this for many previous modelling studies and will certainly use it in the future.

Major comments

1) My main issue is that applying this method depends on how well the deep network does its job of estimating the density (which depends partly on the user specifying the correct architecture and parameters). The authors do discuss this under limitations, but it might be interesting to see what a failure would look like here and what are the scientific consequences of such a failure. I suspect the answer might partly be that you should always verify the resulting posterior distributions, but how practical is this in a 31-dimensional setting, for example? When the authors were working on the examples for this manuscript, how much work did it take to make sure that there were no failures (assuming they verified this) and how did they detect them? (For example, in subsection “Functional diversity of ion channels: efficient high-throughput inference” it's implied that the only failures in this example were because the Omnimodel wasn't able to fit some data.)

2) Would it be possible to use SNPE to automatically suggest the optimal experiment that would give the narrowest posterior distribution over parameters? This might already be what was suggested when citing papers that "automatically construct informative summary features", I wasn't entirely sure.

Reviewer #3:

This paper addresses a major issue in fitting neuroscience models: how to identify the often degenerate, or nearly degenerate, set of parameters that can underlie a set of experimental observations. Whereas previous techniques often depended upon brute force explorations or special parametric forms or local linearizations to generate sets of parameters consistent with measured properties, the authors take advantage of advances in deep networks for use in estimating probability densities to address this problem. Overall, I think this paper and the complementary submission have the potential to be transformative contributions to model fitting efforts in neuroscience, and the ability of this work to handle cases that are not continuous is a plus. However, I think the paper needs to clarify/quantify how much of an advance this is over brute force methods (plus simple post-hoc analyses on the simulated database), given the heavy brute force aspect of generating the training set, and also whether there are biases in the method. Also, the writing could benefit from more focus on methodology (rather than on examples) to ease adoption of the method.

Substantive concerns:

1) In the example of Figure 1, the Appendix 1—figure 1A (which should appear in the main figure rather than the supplement) appears to show that the SNPE's mean temporal filter is considerably off from the true temporal value. Why is this the case? More generally, this suggests that more “ground truth” examples with direct side-by-side comparisons should be provided to assure that the estimates provided by this method are not significantly biased. For example, Figure 5 is wonderful in showing a sloppy/insensitive direction in parameter space that has sharp curvature. Could one do something similar to this and directly compare (for an example with smaller number of parameters, e.g. 4 or 5) the estimated probability distribution to a “ground truth” generated through a sufficiently dense grid search, for different numbers of training samples.

2) The authors need to clarify, quantitatively, how much of a benefit this is over simple post-hoc analysis of results of brute force methods that just randomly (uniformly) sample parameters, especially given that there is a lot of brute force work in this method as well in generating the samples for the deep network training. For such brute force methods (e.g. rather than a grid, some studies just uniformly randomly generate parameters to use in the simulations), one can also just post-hoc fit the results to a mixture of Gaussians or similar classical density estimator so that other values can be interpolated from the obtained data. How much is the deep network approach needed? My guess is that it does help, potentially a huge amount, but the authors need to show this in a carefully controlled comparison.

3) Related to #1, given that this paper is primarily a methodological advance, I think the authors could spend more time laying out the essence of the basic algorithm in the beginning of the Results. For example, it's not clear in Figure 1 exactly what is being input and output from the neural density estimator deep network. More generally, it's worth describing more about the algorithm, perhaps in less technical language, in the main results.

Within the Materials and methods, since the authors indicated some playing around with parameters was needed, it would help a lot for the adoption of this method by others to have more explanation of different features of the method and more intuition behind choices. For example, what is the loss function used after the initial round? Are there any basic constraints on what shape the deep network should have (e.g. should it fan out in the middle layers, fan in, or be approximately uniform in width)? What is the basic idea behind a masked autoregressive flow (MAF) and when should that be chosen versus the mixture of Gaussians model? The algorithm's description could also be expanded (or a short version could be in the main and expanded version in the Materials and methods).

4) Is there some way to ballpark estimate, or post-hoc characterize, how many samples may be needed? Seemingly, the fact that this works with less than one sample per dimension says something about the geometry of the solution space and its effective dimensionality. (If this seems too hard, the authors can punt on it).

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your article "Training deep neural density estimators to identify mechanistic models of neural dynamics" for consideration by eLife. Your revised article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and John Huguenard as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Mark S Goldman (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary

This is methodology is of potentially wide interest, and although the authors have made efforts to explain the interpretation, applicability and caveats of the methodology, one reviewer feels that the manuscript could benefit from further clarification on:

1) How one should interpret “posterior probability density” in cases where there is a deterministic relationship between parameters and behaviour, e.g. in extreme cases where there might be a single solution.

2) The fairness of comparisons with ABC/sampling – are these alternatives being compared in an even-handed way?

3) It may be helpful to have a very explicit example of the results SNPE returns on a problem where “ground truth” of feasible parameters are known. This could be a simple model dynamical system, used for illustration.

Full reviewer comments are included below, for reference.

Reviewer #2:

I'm very happy with the reviewers response to my previous review, and congratulate them on a great paper!

Reviewer #3:

The authors have clarified the methods in a way that I think should be helpful, and I remain optimistic that this will be a very valuable technique for the field. However, I think some of the other concerns raised about rigorously demonstrating the technique, and clarifying the method conceptually were not addressed. I will try to re-state these concerns in a broader scope that I hope strengthens the paper, clarifies potential issues, and gets at some key conceptual and rigor issues that need to be addressed

1) Conceptual question: The manuscript and reply heavily emphasize that SNPE performs Bayesian inference, as compared to approaches that only are applicable to deterministic modeling (like classic brute force approaches). However, outside of the very simple illustrative example of Figure 2, it appears that all of the remaining examples are done on deterministic models. This leads to a fundamental conceptual question: how should one interpret the resulting posteriors resulting from SNPE?

To be concrete, if one used the precise output of a simulation (rather than summary statistics), then the actual posterior probability over parameter values is a δ function (assuming the model if identifiable). If there are nearby solutions, this wouldn't change the true posterior probability, i.e. despite being nearby, they would have posterior probability of zero. Thus, while the authors criticize other methods for using “user defined” criteria of being close to a solution, this example illustrates that if one found the exact posterior probability (as the authors state in their reply that the method should ideally do if it works correctly), all information about the existence of nearby (i.e. almost, but not entirely, degenerate) solutions would be lost. I'm guessing the density estimator will somehow tradeoff “posterior probability large” for “close to the correct solution”, but this then requires some interpretation of how “closeness to the correct simulation output” is translated to “posterior probability”. Put in this light, at least a user defined criterion of closeness for a deterministic model, as done in traditional fitting methods, is explicit and easy to interpret. Can the authors justify or provide users with an explanation for how to interpret SNPE when applied to deterministic simulations-my guess is that, when SNPE is working well, it's some sort of smoothed version of the true, very spiky posterior, but is there a way for users to interpret the solution, e.g. how would a user know if there are nearby solutions if the SNPE did a great job of finding the true posterior and therefore didn't indicate these nearby solutions? And how should the user interpret “probabilities” in a deterministic model where there may only be 1 true solution?

Note: when summary statistics are used rather than exact simulation output, this will likely create some degeneracies. However, this still may end up creating a razor thin slice in parameter space that corresponds to truly, exactly matching the summary statistics, which would then be represented by the true posterior. This scenario then returns to the same fundamental questions of: a) how to interpret the posterior from SNPE, and b) how to interpret a simulation output that is very close to the summary statistics but not equal to them and therefore perhaps lost if SNPE is finding the true posterior of the deterministic model.

2) On the one example that is stochastic (Figure 2), I am still confused by the results shown. Specifically, in Figure 2C, the posterior samples consistently and systematically undershoot the true receptive field for intermediate parameter values. This is a simple LN model for which classic maximum likelihood models usually doing an excellent job finding the true receptive field, at least with enough data. If one ran standard maximum likelihood estimation on this, would it miss? If so, why? Is this reflecting that the authors are using too broad a prior and it is not converging to the true posterior or that too few samples are used (the number of samples is not indicated)-if the latter, can the authors run with more samples and show that their method then converges to the correct answer? Or is it something else (e.g. see next paragraph)?

In addition, the authors state that their method works because it gets the same answer as MCMC. However, I'm wondering if both might be making errors due to the use of the Polya-Γ sampling scheme since, at least in the Pillow and Scott article cited, this was used not for a standard LN model but rather to model more complex models in which the variance was greater than for a Poisson spike count. In any case, it seems worth explaining where the systematic error in the posterior samples is coming from, given that the LN model is such a simple classic model fit whose parameters have been successfully identified by simpler maximum likelihood methods.

3) Related to the Figure 2 example, I was struck by the huge errors and nearly zero correlation for the ABC methods. Is this fundamental to ABC or is this because the authors are hamstringing the ABC method by applying it in a very naïve manner that isn't how someone using ABC would apply it. Specifically, from the methods, it looks like the authors are considering every parameter to be independent, with no smoothing or regularization. I would think that someone using ABC methods would apply regularizations or the like that might dramatically reduce the effective dimensionality of the problem, so it's not clear if the comparison shown in the manuscript is fair. This isn't to say that SNPE won't still work much better than ABC methods, but it's worth checking to make sure the ABC methods are being applied in a realistic manner if the comparison is to be made.

4) Ground truthing. The authors suggest that they can't ground truth the technique due, in part, to the stochastic nature of it. While I appreciate there may be some challenges, a lot could be done, especially since most of the manuscript focuses on deterministic examples that are easier to check. For example, in Figures 3-5, parameter estimation results that may or may not be correct are shown and the reader is left to assume they are correct by looking at a single or very small number of examples. While I fully expect that it does work reliably, it would be nice to actually run a large set of simulations for posterior checks (even if, e.g., only the mode of the distribution was checked) and then report some statistics.

5) Related to ground truthing, in Figure 5D and the highlighted 2-D plot of Figure 5C, I am confused as to whether the example shows the success or partial failure of the method. Specifically, the point labeled "2" in the Figures 5C and 5D appears to be in a bright (and therefore presumably high probability?) region of the parameter space, both in an absolute sense and possibly (it's hard to tell) compared to the light purple circle simulation (and also relative to going further down below this light purple trace). However, the authors use point 2 to make the point that the simulation looks very far off from the reference trace. This suggests, possibly, that similar probabilities being output by SNPE is on the one hand (values close to the light purple circle) leading to a very good fit and on the other hand (red point 2) leading to a very bad fit – thus, this suggests that one perhaps can't trust “bright” regions in the SNPE posterior to correspond to “good” fits. Can the authors check this and address the more general concern. I'm guessing this may relate to my larger conceptual question of interpreting the “probabilities” for SNPE in my reviewer comment #1 above.

6) Difference from “brute force methods”. I thank the authors for trying to generate a comparison to brute force methods by showing a rejection method. However, I don't think this is necessary and I think it's confusing as I don't think “brute force” and “rejection” are the same (i.e. I'd remove this from the paper as being more confusing than helpful). To me at least, “brute force” methods simply generate many example simulations into a database. This is separable from the problem of “what does one then do with this database afterwards”. My point was simply that I think SNPE can be interpreted as being either within the class of brute force searches in generating a very large database of simulations (e.g. ~20 million for Figure 5), or if one focuses on the density estimation component, as being a method that is complementary to the brute force database generation step by addressing the “what does one then do with this database afterwards” component of the problem. For the “what does one do with this database” question, one can certainly imagine many things-one can simply visualize the parameter space through various slices, effectively marginalizing over certain variables and illustrating for example where bifurcations occur that might be difficult to identify mathematically, or can (as the authors highlight has often been done previously…but this should not be equated with parameter search methods, as it's just one possible post-search thing that can be done) implement some sort of selection/rejection criterion. One might also want to interpolate/extrapolate, asking for an estimate of what a new set of parameters would lead to, for which a density estimator such as SNPE would excel.

This is less a criticism of the approach and more of a clarification for placing the method within existing work. I agree with the authors that current use of brute force generation of databases has not taken advantage of better methods of post-database-generation analysis, as SNPE offers, leading to the likely (and potentially dramatically, as demonstrated in Figure 5) incorrect implicit assumption that one needs to generate many samples per dimension to well-characterize the space. I just wanted to separate the notion of generating a database of many examples (which doesn't need to be done on a grid), from the notion of how one post-hoc analyzes it. And overall, this focus on the power of post-database-generation analysis is why I still think SNPE, and the more general focus on use of density estimators, is a potentially transformative contribution for the practice of model building (in addition to SNPE's applicability to stochastic problems that is already well emphasized in the manuscript).

https://doi.org/10.7554/eLife.56261.sa1

Author response

Reviewer #1:

[…] I agree with the authors that large unwieldy mechanistic models likely require machine learning to find parameter ensembles. I also agree that it should be possible to develop Bayesian like approaches for models which are not naturally probabilistic, a problem they pose nicely, and which I have come across but not quite realized was of such importance. I also find their results on the gastric circuit interesting- particularly the finding that the many disparate parameter values that can all fit data seem to lie on a single plateau, rather than on distinct minima

However, I don't really know where to begin in evaluating SNPE. It feels as if there is a section missing- the Introduction talks in generalities about large mechanistic models and the need to address their difficulties, and then it jumps to discussing the particular applications of SNPE.

I don't know what the authors mean by posterior and prior, nor what they mean when they say their method works without a likelihood.

There is muddling of what their output is and what they do to achieve it, and I don't find either sufficiently explained. For example, machine learning could be used to approximate a posterior in the usual Bayesian sense. Still I would want to know what prior they are using what their likelihood function is and what posterior they are approximating, even if there is a black box in the middle. I don't understand what their p(θ|x) is an approximation to.

We thank reviewer 1 for the detailed feedback, which has led us to clarify the descriptions of our techniques in both the Results and Materials and methods sections, putting emphasis on (1) the methodological foundation and inner workings of the algorithm and (2) how prior distributions on model parameters are chosen.

To clarify: SNPE performs Bayesian inference “in the usual sense” of calculating the posterior distribution, that is, the probability distribution of model parameters given observed data. However, in contrast to other approaches based on explicit application of Bayes’ rule, SNPE calculates the posterior distribution without ever having to numerically evaluate or approximate the likelihood of parameters given data. Instead, it trains a neural network using an objective function and learning procedure that have been proved capable, in both theory and practice, of recovering the posterior from model simulations (see first section of Results, and Materials and methods).

Thus, our definitions and usage of the concepts of “prior” and “posterior” are the standard versions used in statistical data analysis, the only (but important!) difference is that we can approximate the posterior without having to evaluate the likelihood function.

In order to emphasize and clarify these points, we now start the Results section with a description of SNPE:

“SNPE performs Bayesian inference on mechanistic models using only model-simulations, without requiring likelihood evaluations. […] SNPE's efficiency can be further improved by using the running estimate of the posterior distribution to guide further simulations towards data-compatible regions of the parameter space (Papamakarios and Murray 2016, Lueckmann et al., 2017, Greenberg et al., 2019).”

In the Materials and methods, we now have a section (“Bayesian inference without likelihood-evaluations with SNPE”) describing in more detail the loss function and the diverse versions of SNPE algorithms.

Reviewer #2:

General assessment

This is a really clear and well written paper with practical solutions for an important problem in theoretical neuroscience. In addition, the authors provide high quality code and documentation making it fairly straightforward for other groups to make use of these tools. I would expect that these tools would become widely adopted in the community. Indeed, I would love to have had this for many previous modelling studies and will certainly use it in the future.

We thank reviewer 2 for evaluating our paper and are pleased to know that the reviewer considers the paper to be “well-written” and the tools to have the potential to be “widely adopted in the community.”

Major comments

1) My main issue is that applying this method depends on how well the deep network does its job of estimating the density (which depends partly on the user specifying the correct architecture and parameters). The authors do discuss this under limitations, but it might be interesting to see what a failure would look like here and what are the scientific consequences of such a failure. I suspect the answer might partly be that you should always verify the resulting posterior distributions, but how practical is this in a 31-dimensional setting, for example? When the authors were working on the examples for this manuscript, how much work did it take to make sure that there were no failures (assuming they verified this) and how did they detect them? (For example, in subsection “Functional diversity of ion channels: efficient high-throughput inference” it's implied that the only failures in this example were because the Omnimodel wasn't able to fit some data.)

The reviewer is correct to point out that this methodology has limitations, and this becomes more evident in high-dimensional parameter spaces. In the previous version of the manuscript, we included a Discussion section on limitations, and mentioned the use of posterior predictive checks to diagnose inference-failures:

“One common ingredient of these approaches is to sample from the inferred model, and search for systematic differences between observed and simulated data, e.g. to perform posterior predictive checks (Cook et al., 2006,Talts et al., 2018, Liepe et al., 2014, Lueckmann et al., 2017, Greenberg et al., 2019) (Figure 2G, Figure 3F,G, Figure 4C, and Figure 5D).”

In this approach, one samples from the inferred posterior given the observed data, simulates for these sampled parameters, and compares the simulations with the observed data. While “posterior predictive checks” are scalable to higher-dimensional parameter spaces, since one can sample efficiently with SNPE, this approach only allows one to detect whether a mode of the posterior has been wrongly identified, but fails at detecting when there is an “undiscovered” island of the posterior. Such issues are currently unsolved in the field of simulation-based inference and in Bayesian inference in general. For all applications, and especially in high-dimensional problems, it is good practice to resort to multiple initialisations of these algorithms in order to minimise the failure to detect parameter islands.

We have expanded the Discussion about limitations:

“These approaches allow one to detect “failures” of SNPE, i.e. cases in which samples from the posterior do not reproduce the data. However, when diagnosing any Bayesian inference approach, it is challenging to rigorously rule out the possibility that additional parameter-settings (e.g. in an isolated “island”) would also explain the data. Thus, it is good practice to use multiple initialisations of SNPE, and/or a large number of simulations in the initial round.”

We should emphasise that in the manuscript, we analyse a few test cases (Figure 2) where the posterior is known or calculable, cases that have similar dimensionality as other applications where the posterior is not known (the ion-channel model and the Hodgkin-Huxley model). This gives us confidence in the quality of the inferred posteriors.

2) Would it be possible to use SNPE to automatically suggest the optimal experiment that would give the narrowest posterior distribution over parameters? This might already be what was suggested when citing papers that "automatically construct informative summary features", I wasn't entirely sure.

We thank the reviewer for the excellent point. SNPE could indeed be extended for Bayesian experimental design (see e.g. Kleinegesse and Gutmann, 2019 for a related approach), and we regard this as an exciting avenue for future work. However, we do think that this would constitute a major new research project that would be substantially beyond the scope of this study. We now cite this study when describing the possibilities brought by amortised inference:

“These results show how SNPE allows fast and accurate identification of biophysical model parameters on new data, and how SNPE can be deployed for applications requiring rapid automated inference, such as high-throughput screening-assays, closed-loop paradigms (e.g. for adaptive experimental manipulations or stimulus-selection (Kleinegesse and Gutmann, 2019)), or interactive software tools.”

Reviewer #3:

This paper addresses a major issue in fitting neuroscience models: how to identify the often degenerate, or nearly degenerate, set of parameters that can underlie a set of experimental observations. Whereas previous techniques often depended upon brute force explorations or special parametric forms or local linearizations to generate sets of parameters consistent with measured properties, the authors take advantage of advances in deep networks for use in estimating probability densities to address this problem. Overall, I think this paper and the complementary submission have the potential to be transformative contributions to model fitting efforts in neuroscience, and the ability of this work to handle cases that are not continuous is a plus. However, I think the paper needs to clarify/quantify how much of an advance this is over brute force methods (plus simple post-hoc analyses on the simulated database), given the heavy brute force aspect of generating the training set, and also whether there are biases in the method. Also, the writing could benefit from more focus on methodology (rather than on examples) to ease adoption of the method.

We thank reviewer 3 for evaluating our paper and are pleased to know that the reviewer considers the methodology presented to have “the potential to be transformative”. We address the reviewer's specific comments below.

Substantive concerns:

1) In the example of Figure 1, the Appendix 1—figure 1A (which should appear in the main figure rather than the Appendix) appears to show that the SNPE's mean temporal filter is considerably off from the true temporal value. Why is this the case? More generally, this suggests that more “ground truth” examples with direct side-by-side comparisons should be provided to assure that the estimates provided by this method are not significantly biased. For example, Figure 5 is wonderful in showing a sloppy/insensitive direction in parameter space that has sharp curvature. Could one do something similar to this and directly compare (for an example with smaller number of parameters, e.g. 4 or 5) the estimated probability distribution to a “ground truth” generated through a sufficiently dense grid search, for different numbers of training samples.

The goal of SNPE is to perform Bayesian inference. Thus, if SNPE works correctly, the posterior distribution inferred by SNPE should be the same as the “true” posterior. In particular, the true SNPE-mean filter should be the same as the “true-posterior” mean filter (here computed with MCMC), which is the case in Appendix 1—figure 1A. So, Appendix 1—figure 1A shows that SNPE is working as intended.

For finite and noisy data, the posterior mean is not necessarily the same as the true parameters. In this setting, it is fundamentally impossible for any estimation procedure to recover the true parameters, and Bayesian inference therefore uses a compromise between the data and the prior. This has advantageous features on average, but for a single example, the posterior mean might deviate from the true parameters. These issues are well understood and widely appreciated in statistics, but we realize that we have not sufficiently made this clear in this figure.

(As a side note, this issue implies that verifying the correctness of SNPE is not as simple as “comparing true and recovered parameters” is the reason for why we have included extensive analyses on problems with known ground-truth posteriors and additional posterior predictive checks, see e.g. Figure 2C,D,F,G,H and Appendix 1—figures 1,2,4).

In order to clarify that perfect inference does not necessarily mean recovering the true underlying parameters, we now write in the respective Results sub-section:

“If SNPE works correctly, its posterior mean filter will match that of the reference posterior, however, it is not to be expected that either of them precisely matches the ground-truth filter (Figure 2c and Appendix 1—figure 1): In the presence of finite sampling and stochasticity, multiple different filters could have plausibly given rise to the observed data. A properly inferred posterior will reflect this uncertainty, and include the true filters as one of many plausible explanations of the data (but not necessarily as the “mean” of all plausible explanations) (Appendix 1—figure 2).”

Regarding moving Appendix 1—figure 1A to Figure 2, we think that the posterior samples plotted in Figure 2C alongside the true temporal filter, already allow one to appreciate the respective similarities and differences.

Regarding the comparison of our approach on a (simpler) STG model to a dense grid-search, we would like to point out that the grid search would require the probabilistic mapping of the parameters to the summary statistics, i.e. the likelihood function, which is intractable in the STG case and is therefore not possible to do at this point. Thus, in any problems with intractable likelihoods, it is intractable to compute the “true posterior” to “prove” correctness, but we can do a range of posterior predictive checks (as shown in the manuscript), e.g. showing that for high-probability under the STG posterior, the simulated traces are similar to the observed data (Figure 5C,D, and Appendix 1—figure 12), whereas orthogonal paths lead very quickly to simulated traces different from the observed data. This suggests that the inferred STG posterior is capturing correctly at least one region of high probability.

2) The authors need to clarify, quantitatively, how much of a benefit this is over simple post-hoc analysis of results of brute force methods that just randomly (uniformly) sample parameters, especially given that there is a lot of brute force work in this method as well in generating the samples for the deep network training. For such brute force methods (e.g. rather than a grid, some studies just uniformly randomly generate parameters to use in the simulations), one can also just post-hoc fit the results to a mixture of Gaussians or similar classical density estimator so that other values can be interpolated from the obtained data. How much is the deep network approach needed? My guess is that it does help, potentially a huge amount, but the authors need to show this in a carefully controlled comparison.

We thank the reviewer for the excellent comment. We would like to emphasise that SNPE performs Bayesian inference on stochastic models, whereas “brute-force” approaches are generally only applied to deterministic models, and based on heuristically chosen distance measures. If applied to stochastic models, such brute-force approaches can be interpreted as rejection-ABC, which rejects simulations according to the chosen distance measure, and which we show in Figure 2 to perform poorly compared to SNPE. This suggests that the big advantage of SNPE arises from the fact that (1) it uses all simulations independently of the similarity to the empirical data, and (2) it learns to identify the features most informative about the parameters.

We now have an extensive Discussion on these aspects:

“How big is the advance brought by SNPE relative to “conventional” brute-force approaches that aim to exhaustively explore parameter space? […] Finally, we showed above that, on the STG circuit, SNPE also yielded solutions more similar to experimental data than a brute-force approach (based on fitting a mixture-model to accepted simulations).”

Furthermore, we provide a comparison between SNPE and an approach where we combined rejection ABC with mixture-of-Gaussians density estimation. To do so, we used the dataset of simulations obtained for the crustacean stomatogastric ganglion by​ sampling from the prior and picked the 10000 parameter sets that produced the closest summary features to the observed data. As a distance metric, we used the Euclidean distance over summary features after having standardized them. We then fitted a mixture of Gaussians with 20 components to these 10000 samples using the Python machine learning library scikit-learn. We found that samples from the SNPE posterior have summary statistics that are substantially closer to the observed data than samples from the mixture-of-Gaussians approach. We report these new results as Appendix 1—figure 14 and a respective paragraph in the Results:

“How much does the conditional density estimation network contribute to posterior estimation? […] In the next section, we will furthermore show how such a powerful posterior representation can be used to explore the properties of neural dynamics.”

3) Related to #1, given that this paper is primarily a methodological advance, I think the authors could spend more time laying out the essence of the basic algorithm in the beginning of the Results. For example, it's not clear in Figure 1 exactly what is being input and output from the neural density estimator deep network. More generally, it's worth describing more about the algorithm, perhaps in less technical language, in the main results.

As replied to reviewer 1’s comments, we have clarified the methodology in Results. We now start the Results section with a description of SNPE:

“SNPE performs Bayesian inference on mechanistic models using only model-simulations, without requiring likelihood evaluations. […] SNPE's efficiency can be further improved by using the running estimate of the posterior distribution to guide further simulations towards data-compatible regions of the parameter space (Papamakarios and Murray, 2016, Lueckmann et al., 2017, Greenberg et al., 2019).”

Within the Materials and methods, since the authors indicated some playing around with parameters was needed, it would help a lot for the adoption of this method by others to have more explanation of different features of the method and more intuition behind choices. For example, what is the loss function used after the initial round? Are there any basic constraints on what shape the deep network should have (e.g. should it fan out in the middle layers, fan in, or be approximately uniform in width)? What is the basic idea behind a masked autoregressive flow (MAF) and when should that be chosen versus the mixture of Gaussians model? The algorithm's description could also be expanded (or a short version could be in the main and expanded version in the Materials and methods).

We have extended the Materials and methods section to address reviewer’s questions:

1) We now provide more details on the loss functions for SNPE-A/B/C, and point to previous literature showing how minimising the loss function can be used to perform Bayesian inference without likelihood evaluations;

2) We include a new section about “Neural density estimation”, where we discuss the constraints on the shape of the mixture density networks:

“The MDN outputs the parameters of a mixture of Gaussians (i.e. mixture weights, and for each component of the mixture, the mean vector and covariance entries). Thus, for an MDN composed of K components, we chose an architecture with at least as many units per layer as K( 1 + dim(θ) + dim(θ)(dim(θ)+1)/2 ) – 1, to ensure enough flexibility to approximate well the parameters of the mixture of Gaussians. For example, when inferring the parameters of the Hodgkin-Huxley model given in vitro recordings from mouse cortex (Allen Cell Types Database, https://celltypes.brain-map.org/data), we infer the posterior over 8 parameters with a mixture of two Gaussians, and the MDN needs at least 89 units per layer. Across applications, we found 2 layers to be sufficient to appropriately approximate the posterior distribution.”

3) We also describe the basic idea of MAFs and the respective constraints:

“MAF is a specific type of normalizing flow, which is a highly flexible density estimator (Rezende et al., 2015; Papamakarios et al., 2017, Papamakarios et al., 2019). Normalizing flows consist of a stack of bijections which transform a simple distribution (usually a multivariate Gaussian distribution) into the target distribution. Each bijection is parameterized by a specific type of neural network (for MAF: a Masked Autoencoder for Distribution Estimation, or MADE). In our experiments, 5 stacked bijections are enough to approximate even complex posterior distributions. Depending on the size of the parameter and data space, each neural network had between (50,50) and (200,400) hidden units.”

4) Regarding the choice between MDNs and MAFs, we write:

“When using SNPE in a single-round, we generally found superior performance for MAFs as compared to MDNs. When running inference across multiple rounds, training MAFs leads to additional challenges which might impede the quality of inference (Greenberg et al., 2019; Durkan et al., 2020)”

4) Is there some way to ballpark estimate, or post-hoc characterize, how many samples may be needed? Seemingly, the fact that this works with less than one sample per dimension says something about the geometry of the solution space and its effective dimensionality. (If this seems too hard, the authors can punt on it).

The reviewer is raising an important point. SNPE is using neural density estimators to approximate the function between summary statistics and a distribution over parameters. Thus, in general, the higher the dimensionality and complexity of this function, the more flexible the neural density estimator should be, and therefore the higher the required number of simulations to appropriately fit such a function. Given the unavailability of such a function a priori, it is in general hard to provide a definite answer. In our workflow, we have a few heuristics, which we now clarify in the manuscript, in the Materials and methods, section “Simulation-based inference”:

“The required number of simulations depends on both the dimensionality and complexity of the function between summary statistics and model parameters. While the number of parameters and summary-features can easily be determined, it can be hard to determine how “complex” (or nonlinear) this mapping is, this makes it difficult to give general guidelines on how many simulations will be required. A practical approach is to choose a simulation-budget based on the computational cost of the simulation, inspect the results (e.g. with posterior predictive checks), and add more simulations when it seems necessary.”

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Summary

This is methodology is of potentially wide interest, and although the authors have made efforts to explain the interpretation, applicability and caveats of the methodology, one reviewer feels that the manuscript could benefit from further clarification on:

We thank the editors and reviewers for their constructive comments. We are glad that reviewer 2 was “very happy” with our response and hope to address any remaining concerns by reviewer 3 both in this response and by making the corresponding changes in the manuscript.

Some main concerns seem to be based on the (incorrect) assumption that we deal with deterministic models—this is not the case, all of our models are stochastic. They are only “likelihood-free” in the sense that the likelihood cannot be evaluated (but mathematically it exists). This is addressed in detail in our response below. In addition, we made sure this is now even more clear from the manuscript. We hope to clear up the confusion about the example with ground truth, showing that SNPE indeed agrees with reference solutions and posterior uncertainty shrinks in light of more data, and we clarify fairness of comparisons with ABC/sampling.

1) How one should interpret “posterior probability density” in cases where there is a deterministic relationship between parameters and behaviour, e.g. in extreme cases where there might be a single solution.

SNPE is designed to perform Bayesian inference, i.e. to calculate the posterior distribution on stochastic models. While, in principle, one can also apply SNPE to deterministic models, we believe that a full investigation of how it behaves under these circumstances, and how to interpret its outputs, is outside of the scope of this paper. Furthermore, we would claim that having some model of variability is paramount for any data analysis, as there can never be a perfect match between model and data.

2) The fairness of comparisons with ABC/sampling – are these alternatives being compared in an even-handed way?

We believe they are—see detailed explanation below. We also remark that the superiority of SNPE-algorithms over rejection ABC, albeit in problems outside of neuroscience, has been demonstrated both by us (Lueckmann, Goncalves et al., 2017, Greenberg et al., 2019) and others (Papamakarios and Murray, 2016).

3) It may be helpful to have a very explicit example of the results SNPE returns on a problem where “ground truth” of feasible parameters are known. This could be a simple model dynamical system, used for illustration.

The manuscript contains two examples (a LN GLM and a Gabor GLM) where the likelihood is known, and as we explain below (see response to reviewer), we can obtain the respective ground-truth posteriors using standard inference algorithms (MCMC). We show that SNPE recovers the correct posterior in both cases. In our detailed response below, we also include additional analyses on the LN GLM model, and clear out concerns about the correctness of the returned posteriors (e.g., the STG).

We therefore respectfully disagree that the addition of yet another example would strengthen the manuscript. In case the editor still finds it necessary after our detailed response, we would like to clearly understand what precisely the additional value provided by adding one more example application would be.

Reviewer #3:

The authors have clarified the methods in a way that I think should be helpful, and I remain optimistic that this will be a very valuable technique for the field. However, I think some of the other concerns raised about rigorously demonstrating the technique, and clarifying the method conceptually were not addressed. I will try to re-state these concerns in a broader scope that I hope strengthens the paper, clarifies potential issues, and gets at some key conceptual and rigor issues that need to be addressed

1) Conceptual question: The manuscript and reply heavily emphasize that SNPE performs Bayesian inference, as compared to approaches that only are applicable to deterministic modeling (like classic brute force approaches). However, outside of the very simple illustrative example of Figure 2, it appears that all of the remaining examples are done on deterministic models. This leads to a fundamental conceptual question: how should one interpret the resulting posteriors resulting from SNPE?

To be concrete, if one used the precise output of a simulation (rather than summary statistics), then the actual posterior probability over parameter values is a δ function (assuming the model if identifiable). […] And how should the user interpret “probabilities” in a deterministic model where there may only be 1 true solution?

Note: when summary statistics are used rather than exact simulation output, this will likely create some degeneracies. However, this still may end up creating a razor thin slice in parameter space that corresponds to truly, exactly matching the summary statistics, which would then be represented by the true posterior. This scenario then returns to the same fundamental questions of: a) how to interpret the posterior from SNPE, and b) how to interpret a simulation output that is very close to the summary statistics but not equal to them and therefore perhaps lost if SNPE is finding the true posterior of the deterministic model.

All the applications in the manuscript are stochastic, as described in the original version of the manuscript: we write both in Results and Materials and methods that the Hodgkin-Huxley model includes "intrinsic neural noise"; regarding the STG circuit, we write in Materials and methods that "we added Gaussian noise with a standard deviation of 0.001 mV to the input of each neuron". We note that the information of the stochastic nature of the ion channel model was omitted by mistake in the Materials and methods, and this is now corrected.

We do, however, realize that this point was not sufficiently clear, so we emphasized it further:

1) In the ion channel application, we now explicitly write in the Results where the stochasticity comes in:

“We aimed to identify these ion channel parameters θ for each ICG model, based on 11 features of the model's response to a sequence of 5 noisy voltage clamp protocols, resulting in a total of 55 different characteristic features per model (Figure 3B, see Materials and methods for details).”

And in the Materials and methods:

“We modeled responses of the Omnimodel to a set of five noisy voltage-clamp protocols (Podlaski et al., 2017): as described in Podlaski et al., the original voltage-clamp protocols correspond to standard protocols of activation, inactivation, deactivation, ramp and action potential, to which we added Gaussian noise with zero mean and standard deviation 0.5 mV.”

2) In the Hodgkin-Huxley application:

“We considered the problem of inferring 8 biophysical parameters in a HH single-compartment model, describing voltage-dependent sodium and potassium conductances and other intrinsic membrane properties, including neural noise, making the model stochastic by nature (Figure 4A, left).”

3) In the STG application:

“This model has been used to demonstrate that virtually indistinguishable activity can arise from vastly different membrane and synaptic conductances in the STG (Alonso and Marder, 2019; Prinz et al., 2004). Here, we build on these studies and extend the model to include intrinsic neural noise on each neuron (see Materials and methods).”

More generally, we hold the view that modelling and explicitly treating some​ ​form of noise (either in the underlying system and/or the measurement process) is of critical importance for data analysis in neuroscience. This is (arguably) under-appreciated in computational neuroscience, which often concentrates on reproducing averaged neural activity without taking into account the associated variability or uncertainty. So, we maintain that the restriction of stochastic models is, in our view, not a serious restriction.

As the reviewer writes, SNPE could also be applied to deterministic models and will also return posteriors in this case. However, a rigorous interpretation of the posteriors in that case is mathematically subtle, and a mathematically rigorous treatment of this case is therefore, in our view, beyond the scope of the manuscript. Somewhat speculatively, we agree that the posterior would likely be a “smoothed out” version of the true (δ-shape) posterior, where the smoothing is dependent on the capacity of the neural network used for inference. We note that there is some recent work in machine learning to deal with such cases rigorously, e.g. by resorting to spread divergences (https://arxiv.org/abs/1811.08968;​ didactic blog post by David Barber at https://aiucl.github.io/spread-divergence/).​

We have added a small Discussion about deterministic models:

“SNPE, and Bayesian inference more generally, is derived for stochastic models. SNPE can, in principle, also be applied to deterministic models, but a rigorous mathematical interpretation or empirical evaluation in this regime is beyond the scope of this study.”

2) On the one example that is stochastic (Figure 2), I am still confused by the results shown. Specifically, in Figure 2C, the posterior samples consistently and systematically undershoot the true receptive field for intermediate parameter values. This is a simple LN model for which classic maximum likelihood models usually doing an excellent job finding the true receptive field, at least with enough data. If one ran standard maximum likelihood estimation on this, would it miss? If so, why? Is this reflecting that the authors are using too broad a prior and it is not converging to the true posterior or that too few samples are used (the number of samples is not indicated)-if the latter, can the authors run with more samples and show that their method then converges to the correct answer? Or is it something else (e.g. see next paragraph)?

In addition, the authors state that their method works because it gets the same answer as MCMC. However, I'm wondering if both might be making errors due to the use of the Polya-Γ sampling scheme since, at least in the Pillow and Scott article cited, this was used not for a standard LN model but rather to model more complex models in which the variance was greater than for a Poisson spike count. In any case, it seems worth explaining where the systematic error in the posterior samples is coming from, given that the LN model is such a simple classic model fit whose parameters have been successfully identified by simpler maximum likelihood methods.

This comment contains multiple sub-questions which we would like to answer separately:

1) Can Polya-Γ MCMC (PG-MCMC) be used to infer a reference posterior for the LN model?

For the LN model, we use PG-MCMC (a likelihood-based method) to obtain a reference posterior to compare SNPE against. As shown in Polson, Scott and Windle, 2012, (cited in the manuscript), PG-MCMC is applicable to Bernoulli GLMs, which the LN model is one instance of. It is not the case that PG-MCMC is only applicable to overdispersed models.

2) Is the result obtained by SNPE correct?

The goal of SNPE is to perform Bayesian inference. Thus, to evaluate whether SNPE “works”, one must not compare its posterior to the ground-truth parameters, but rather to the ground-truth posterior, or at least a proxy thereof. We here used the posterior obtained via PG-MCMC as a reference posterior, which SNPE correctly recovers. We do not have doubts about the correctness of PG-MCMC on this example, in particular as the posterior agrees with summary checks, and this model has a unimodal posterior for which MCMC sampling is well understood and works reliably. (If the reference posterior was wrong, it would also be a bit of a coincidence that SNPE gets the same, but wrong posterior…). The only reason why the reviewer seems to question the correctness of the posterior is that he is surprised by the large posterior variance, see below.

3) Does posterior uncertainty shrink with more data?

The posteriors inferred with the PG-MCMC scheme and SNPE are not tightly constrained around the true filters (see Appendix 1—figure 1, showing posteriors means +/- 1 standard deviation). This “systematic undershooting” does not indicate a failure of the PG-MCMC scheme, but is simply a consequence of the fact that we simulated a short experiment consisting of 100 Bernoulli samples (which we call “single-trial” simulation). For short experiments, posterior uncertainty will be large, and a mismatch between true parameters and posterior mean is expected. As expected, increasing the number of single-trials in one observation leads to progressively tighter posteriors, with posterior samples closer to the true filter. We now describe this result in the Results section and Appendix 1—figure 3:

“Increasing the number of Bernoulli samples in the observed data leads to progressively tighter posteriors, with posterior samples closer to the true filter (Appendix 1—figure 3). Furthermore, SNPE closely agrees with the MCMC reference solution in all these cases, further emphasizing the correctness of the posteriors inferred with SNPE.”

4) Why are we not as “successful as simple MLE methods”

We note that the results by Pillow et al., which the reviewer likely refers to, have been obtained in extremely long recordings, which lead to very tightly constrained posteriors: in this regime, both maximum likelihood estimation and Bayesian inference return the same answer, but this also makes it hard to assess whether the posterior distribution, i.e. the estimation uncertainty, is correctly recovered. The fact that our posterior mean, with more samples, converges to the true parameters shows that, as the data increases, all these methods will converge to the same answer, namely the true parameters. In neuroscience, however, it is very common that one is not in this data-regime, making it important that methods perform correct inference even in the small-data regime.

3) Related to the Figure 2 example, I was struck by the huge errors and nearly zero correlation for the ABC methods. Is this fundamental to ABC or is this because the authors are hamstringing the ABC method by applying it in a very naïve manner that isn't how someone using ABC would apply it. Specifically, from the Materials and methods, it looks like the authors are considering every parameter to be independent, with no smoothing or regularization. I would think that someone using ABC methods would apply regularizations or the like that might dramatically reduce the effective dimensionality of the problem, so it's not clear if the comparison shown in the manuscript is fair. This isn't to say that SNPE won't still work much better than ABC methods, but it's worth checking to make sure the ABC methods are being applied in a realistic manner if the comparison is to be made.

Note that we compare SNPE against SMC-ABC in Figure 2 on two different examples: We show a comparison on the LN model example (Figure 2D) as well as a comparison on the Gabor model (Figure 2G). SNPE outperforms rejection methods in both cases, a result which is consistent with previous work (e.g., Lueckmann, Goncalves et al., 2017; Greenberg et al., 2019). Benchmarking against other methods is not the primary focus of this manuscript.

The LN model comparison is performed on low dimensional summary statistics (10d) rather than raw data traces, so it does not put SMC-ABC to a disadvantage. In the Gabor example, SMC-ABC fails because it is not able to deal with the high-dimensionality of the data. This illustrates an important drawback of SMC-ABC, which would need to be combined with other methods/pre-processing steps, if one wanted to use it for this problem. SNPE allows automatic learning of summary statistics, so that inference on high-dimensional data is possible. We maintain that this is a good illustrative comparison to include.

We also note that the suggestion to smooth the problem would, in effect, just reduce this high-dimensional estimation to an (effectively) lower-dimensional, and substantially easier estimation problem. Accordingly, all estimation approaches would work better, and the differences between methods would be smaller.

4) Ground truthing. The authors suggest that they can't ground truth the technique due, in part, to the stochastic nature of it. While I appreciate there may be some challenges, a lot could be done, especially since most of the manuscript focuses on deterministic examples that are easier to check. For example, in Figures 3-5, parameter estimation results that may or may not be correct are shown and the reader is left to assume they are correct by looking at a single or very small number of examples. While I fully expect that it does work reliably, it would be nice to actually run a large set of simulations for posterior checks (even if, e.g., only the mode of the distribution was checked) and then report some statistics.

Regarding the LN and Gabor models, and as discussed above, we include comparisons of SNPE posteriors with reference posteriors. Furthermore, we now also have a new supplementary figure for the LN model showing consistency with increases in data (see above).

All the other applications covered in the manuscript have intractable likelihoods, so ground-truthing is challenging. Therefore, we opted for posterior predictive checks on the data, rather than ground-truthing on the parameters. In addition, we do point out that the manuscript does, in effect, contain multiple quantitative evaluations:

1) In the ion channel model application, Appendix 1—figure 7 shows the correlation between the observation and the mode of the respective posterior for 350 different observations. We believe this figure demonstrates the reliable quality of the posteriors for this application.

2) In the Hodgkin-Huxley model applications, Figure 4A-E shows the application of SNPE to a synthetic dataset, Figure 4F shows the application of SNPE to 8 different datasets, and Appendix 1—figure 10C,D show multiple samples from the posterior for the synthetic dataset. We believe this is strong evidence of the success of SNPE on this application.

3) In the STG application, Figure 5D in combination with Appendix 1—figure 12A, and Appendix 1—figure 14C show that the summary statistics of a large number of samples from the posterior vary little around the observed summary statistics. This is again strong evidence that SNPE performed well in this application.

5) Related to ground truthing, in Figure 5D and the highlighted 2-D plot of Figure 5C, I am confused as to whether the example shows the success or partial failure of the method. Specifically, the point labeled "2" in the Figures 5C and 5D appears to be in a bright (and therefore presumably high probability?) region of the parameter space, both in an absolute sense and possibly (it's hard to tell) compared to the light purple circle simulation (and also relative to going further down below this light purple trace). However, the authors use point 2 to make the point that the simulation looks very far off from the reference trace. This suggests, possibly, that similar probabilities being output by SNPE is on the one hand (values close to the light purple circle) leading to a very good fit and on the other hand (red point 2) leading to a very bad fit – thus, this suggests that one perhaps can't trust “bright” regions in the SNPE posterior to correspond to “good” fits. Can the authors check this and address the more general concern. I'm guessing this may relate to my larger conceptual question of interpreting the “probabilities” for SNPE in my reviewer comment #1 above.

We thank the reviewer for raising this potentially confusing point. As the reviewer points out, when inspecting the 2D-marginal probability distribution, the point labeled “2” seems to lie in regions of high posterior probability. However, this does not imply that the point actually lies in a region of high probability in the full 31-dimensional posterior space (its 2-d project lies in a high probability region, but the full high-d version does not), and thus the visual inspection of 2D-marginals can be misleading. To demonstrate this point, we have added a supplementary figure that shows the posterior probability along both paths, see Appendix 1—figure 13 and refer to this figure in the main text as follows:

“Note that, while parameter set 2 seems to lie in regions of high probability when inspecting pairwise marginals, it in fact has low probability under the full posterior distribution (see Appendix 1—figure 13).”

As can be seen, the posterior probability along the orthogonal path drops off very quickly, which matches the off-looking activity and further demonstrates the success of the method.

6) Difference from “brute force methods”. I thank the authors for trying to generate a comparison to brute force methods by showing a rejection method. However, I don't think this is necessary and I think it's confusing as I don't think “brute force” and “rejection” are the same (i.e. I'd remove this from the paper as being more confusing than helpful […] And overall, this focus on the power of post-database-generation analysis is why I still think SNPE, and the more general focus on use of density estimators, is a potentially transformative contribution for the practice of model building (in addition to SNPE's applicability to stochastic problems that is already well emphasized in the manuscript).

We thank the reviewer for the suggestion regarding the clarification of the method in the context of previous approaches. We agree that the process of generating a large set of simulations and analysing it can be decoupled, at least for the case of one round SNPE—we note that in multi-round SNPE, the sampling of parameters is adaptively guided by the posterior from the previous round, and thus this decoupling of sampling and analysis is somewhat debatable. We have added a clarification regarding the brute-force aspect of SNPE in the Discussion subsection “Related Work”:

“We should still note that SNPE can require the generation of large sets of simulations, which can be viewed as a brute-force step, emphasising that one of the main strengths of SNPE over conventional brute-force approaches relies on the processing of these simulations via deep neural density estimators.”

We have now removed the analysis we had made and included (at the reviewers request, we were under the impression we were following his instructions) in the previous round. If the reviewer or the editor wants it included again, we are happy to put it back in.

https://doi.org/10.7554/eLife.56261.sa2

Article and author information

Author details

  1. Pedro J Gonçalves

    1. Computational Neuroengineering, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany
    2. Max Planck Research Group Neural Systems Analysis, Center of Advanced European Studies and Research (caesar), Bonn, Germany
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    Contributed equally with
    Jan-Matthis Lueckmann and Michael Deistler
    For correspondence
    pedro.goncalves@caesar.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-6987-4836
  2. Jan-Matthis Lueckmann

    1. Computational Neuroengineering, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany
    2. Max Planck Research Group Neural Systems Analysis, Center of Advanced European Studies and Research (caesar), Bonn, Germany
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Pedro J Gonçalves and Michael Deistler
    For correspondence
    jan-matthis.lueckmann@tum.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4320-4663
  3. Michael Deistler

    1. Computational Neuroengineering, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany
    2. Machine Learning in Science, Excellence Cluster Machine Learning, Tübingen University, Tübingen, Germany
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Pedro J Gonçalves and Jan-Matthis Lueckmann
    For correspondence
    michael.deistler@tum.de
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-3573-0404
  4. Marcel Nonnenmacher

    1. Computational Neuroengineering, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany
    2. Max Planck Research Group Neural Systems Analysis, Center of Advanced European Studies and Research (caesar), Bonn, Germany
    3. Model-Driven Machine Learning, Institute of Coastal Research, Helmholtz Centre Geesthacht, Geesthacht, Germany
    Contribution
    Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  5. Kaan Öcal

    1. Max Planck Research Group Neural Systems Analysis, Center of Advanced European Studies and Research (caesar), Bonn, Germany
    2. Mathematical Institute, University of Bonn, Bonn, Germany
    Contribution
    Software, Investigation, Visualization, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-8528-6858
  6. Giacomo Bassetto

    1. Computational Neuroengineering, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany
    2. Max Planck Research Group Neural Systems Analysis, Center of Advanced European Studies and Research (caesar), Bonn, Germany
    Contribution
    Software, Visualization, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Chaitanya Chintaluri

    1. Centre for Neural Circuits and Behaviour, University of Oxford, Oxford, United Kingdom
    2. Institute of Science and Technology Austria, Klosterneuburg, Austria
    Contribution
    Resources, Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-4252-1608
  8. William F Podlaski

    Centre for Neural Circuits and Behaviour, University of Oxford, Oxford, United Kingdom
    Contribution
    Resources, Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-6619-7502
  9. Sara A Haddad

    Max Planck Institute for Brain Research, Frankfurt, Germany
    Contribution
    Resources, Data curation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0807-0823
  10. Tim P Vogels

    1. Centre for Neural Circuits and Behaviour, University of Oxford, Oxford, United Kingdom
    2. Institute of Science and Technology Austria, Klosterneuburg, Austria
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
  11. David S Greenberg

    1. Computational Neuroengineering, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany
    2. Model-Driven Machine Learning, Institute of Coastal Research, Helmholtz Centre Geesthacht, Geesthacht, Germany
    Contribution
    Data curation, Software, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing
    Competing interests
    No competing interests declared
  12. Jakob H Macke

    1. Computational Neuroengineering, Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany
    2. Max Planck Research Group Neural Systems Analysis, Center of Advanced European Studies and Research (caesar), Bonn, Germany
    3. Machine Learning in Science, Excellence Cluster Machine Learning, Tübingen University, Tübingen, Germany
    4. Max Planck Institute for Intelligent Systems, Tübingen, Germany
    Contribution
    Conceptualization, Supervision, Funding acquisition, Validation, Visualization, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    Jakob.Macke@gmail.com
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-5154-8912

Funding

Deutsche Forschungsgemeinschaft (SFB 1233)

  • Jan-Matthis Lueckmann
  • Marcel Nonnenmacher
  • Giacomo Bassetto
  • Jakob H Macke

Deutsche Forschungsgemeinschaft (SFB 1089)

  • Pedro J Gonçalves
  • Jakob H Macke

Deutsche Forschungsgemeinschaft (SPP 2041)

  • Jakob H Macke

Bundesministerium für Bildung und Forschung (01IS18052 A-D)

  • Michael Deistler
  • Marcel Nonnenmacher
  • Jakob H Macke

H2020 European Research Council (SYNAPSEEK)

  • Chaitanya Chintaluri
  • William F Podlaski
  • Tim P Vogels

Wellcome Trust Senior Research Fellowship (214316/Z/18/Z)

  • Tim P Vogels

UK Research and Innovation (UKRI-BBSRC BB/N019512/1)

  • Chaitanya Chintaluri

Deutsche Forschungsgemeinschaft (Germany's Excellence Strategy - EXC-Number 2064/1 610 - Project number 390727645)

  • Jakob H Macke

Wellcome Trust (Sir Henry Dale Fellowship WT100000)

  • William F Podlaski
  • Tim P Vogels

Royal Society (Sir Henry Dale Fellowship WT100000)

  • William F Podlaski
  • Tim P Vogels

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Mahmood S Hoseini and Michael Stryker for sharing their data for Figure 2, and Philipp Berens, Sean Bittner, Jan Boelts, John Cunningham, Richard Gao, Scott Linderman, Eve Marder, Iain Murray, George Papamakarios, Astrid Prinz, Auguste Schulz and Srinivas Turaga for discussions and/or comments on the manuscript. This work was supported by the German Research Foundation (DFG) through SFB 1233 ‘Robust Vision’, (276693517), SFB 1089 ‘Synaptic Microcircuits’, SPP 2041 ‘Computational Connectomics’ and Germany's Excellence Strategy – EXC-Number 2064/1 – Project number 390727645 and the German Federal Ministry of Education and Research (BMBF, project ‘ADIMEM’, FKZ 01IS18052 A-D) to JHM, a Sir Henry Dale Fellowship by the Wellcome Trust and the Royal Society (WT100000; WFP and TPV), a Wellcome Trust Senior Research Fellowship (214316/Z/18/Z; TPV), a ERC Consolidator Grant (SYNAPSEEK; WPF and CC), and a UK Research and Innovation, Biotechnology and Biological Sciences Research Council (CC, UKRI-BBSRC BB/N019512/1). We gratefully acknowledge the Leibniz Supercomputing Centre for funding this project by providing computing time on its Linux-Cluster.

Senior Editor

  1. John R Huguenard, Stanford University School of Medicine, United States

Reviewing Editor

  1. Timothy O'Leary, University of Cambridge, United Kingdom

Reviewer

  1. Mark S Goldman, University of California at Davis, United States

Publication history

  1. Received: February 21, 2020
  2. Accepted: September 16, 2020
  3. Accepted Manuscript published: September 17, 2020 (version 1)
  4. Accepted Manuscript updated: September 18, 2020 (version 2)
  5. Version of Record published: October 22, 2020 (version 3)

Copyright

© 2020, Gonçalves et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 2,067
    Page views
  • 325
    Downloads
  • 2
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)

Further reading

    1. Computational and Systems Biology
    2. Neuroscience
    Jonathan Oesterle et al.
    Research Article Updated

    While multicompartment models have long been used to study the biophysics of neurons, it is still challenging to infer the parameters of such models from data including uncertainty estimates. Here, we performed Bayesian inference for the parameters of detailed neuron models of a photoreceptor and an OFF- and an ON-cone bipolar cell from the mouse retina based on two-photon imaging data. We obtained multivariate posterior distributions specifying plausible parameter ranges consistent with the data and allowing to identify parameters poorly constrained by the data. To demonstrate the potential of such mechanistic data-driven neuron models, we created a simulation environment for external electrical stimulation of the retina and optimized stimulus waveforms to target OFF- and ON-cone bipolar cells, a current major problem of retinal neuroprosthetics.

    1. Computational and Systems Biology
    2. Genetics and Genomics
    Reza K Hammond et al.
    Research Article Updated

    To uncover novel significant association signals (p<5×10−8), genome-wide association studies (GWAS) requires increasingly larger sample sizes to overcome statistical correction for multiple testing. As an alternative, we aimed to identify associations among suggestive signals (5 × 10−8≤p<5×10−4) in increasingly powered GWAS efforts using chromatin accessibility and direct contact with gene promoters as biological constraints. We conducted retrospective analyses of three GIANT BMI GWAS efforts using ATAC-seq and promoter-focused Capture C data from human adipocytes and embryonic stem cell (ESC)-derived hypothalamic-like neurons. This approach, with its extremely low false-positive rate, identified 15 loci at p<5×10−5 in the 2010 GWAS, of which 13 achieved genome-wide significance by 2018, including at NAV1, MTIF3, and ADCY3. Eighty percent of constrained 2015 loci achieved genome-wide significance in 2018. We observed similar results in waist-to-hip ratio analyses. In conclusion, biological constraints on sub-significant GWAS signals can reveal potentially true-positive loci for further investigation in existing data sets without increasing sample size.