Dendritic nonlinearities are tuned for efficient spikebased computations in cortical circuits
 Cited
 2
 Views
 1,646
 Comments
 0
Abstract
Cortical neurons integrate thousands of synaptic inputs in their dendrites in highly nonlinear ways. It is unknown how these dendritic nonlinearities in individual cells contribute to computations at the level of neural circuits. Here, we show that dendritic nonlinearities are critical for the efficient integration of synaptic inputs in circuits performing analog computations with spiking neurons. We developed a theory that formalizes how a neuron's dendritic nonlinearity that is optimal for integrating synaptic inputs depends on the statistics of its presynaptic activity patterns. Based on their in vivo preynaptic population statistics (firing rates, membrane potential fluctuations, and correlations due to ensemble dynamics), our theory accurately predicted the responses of two different types of cortical pyramidal cells to patterned stimulation by twophoton glutamate uncaging. These results reveal a new computational principle underlying dendritic integration in cortical neurons by suggesting a functional link between cellular and systemslevel properties of cortical circuits.
https://doi.org/10.7554/eLife.10056.001eLife digest
Imagine that you are in the habit of checking three different weather forecasts each day, and then one day in early September the first forecast suddenly predicts snow. If you live in an area where it doesn’t normally snow in September, your initial reaction is likely to be surprise. However, you will not be quite so surprised to see a prediction of snow in the second forecast, and by the third forecast you will hardly be surprised at all.
In these three cases, you have responded to the same piece of information in a different way. In mathematics, this type of response is referred to as “nonlinear” because the output (varying degrees of surprise) is not directly proportional to the input (identical predictions of snow). In the case of the weather forecasts, the source of the nonlinearity was the fact that the three predictions were not truly independent. Instead, they corresponded with one another, or “correlated”, because all three depended on the weather itself.
In the brain, a single neuron can receive thousands of inputs from other cells. These are received via junctions called synapses that form between the cells. In many cases, the synapses form on the receiving neuron’s dendrites – the short branches that protrude from its cell body. Each dendrite can receive signals from hundreds of other neurons, and must combine these inputs to produce a single neuronal response. How dendrites do this is not clear.
Ujfalussy et al. have now developed a computational model that predicts the optimal response of dendrites to complex and realistic inputs from other neurons. The model shows that when dendrites receive inputs from neurons that independently respond to different stimuli, the optimal response is for the dendrites to average the inputs. This is a form of linear processing. By contrast, when the inputs are correlated – for example, because they come from neurons responding to the same stimulus – the optimal response is nonlinear processing. In this and other cases, the optimal response predicted by the model is similar to the response observed in real dendrites.
The model also makes a number of testable predictions; for example, that neurons with correlated activities will tend to form clusters of synapses close together on the dendrites of a target neuron, whereas neurons with unrelated activities will tend to form synapses that are further apart. Somewhat unexpectedly, Ujfalussy et al. show that compensating for input correlations accounts for almost all the nonlinearities that can be found in real neurons' dendrites – at least in response to relatively simple input patterns. Thus, it remains to be shown whether nonlinear dendritic responses to more complex input patterns can also be explained by this single principle. Further studies are also required to understand how different plasticity mechanisms enable neurons to achieve this close match between input correlations and dendritic processing.
https://doi.org/10.7554/eLife.10056.002Introduction
The dendritic tree of a cortical neuron performs a highly nonlinear transformation on the thousands of inputs it receives from other neurons, sometimes resulting in a markedly sublinear (Longordo et al., 2013) and often in strongly superlinear integration of synaptic inputs (Losonczy and Magee, 2006; Nevian et al., 2007; Branco and Häusser, 2011; Makara and Magee, 2013). These nonlinearities have been traditionally studied from the perspective of singleneuron computations, using a few wellcontrolled synaptic stimuli, revealing a remarkable repertoire of arithmetic operations that the dendrites of cortical neurons carry out (Poirazi and Mel, 2001; London and Häusser, 2005; Branco et al., 2010) including additive, multiplicative and divisive ways of combining individual synaptic inputs in the cell’s response (Silver, 2010). More recently, the role of nonlinear dendritic integration in actively shaping responses of single neurons under in vivo conditions has been demonstrated in several cortical areas including the hippocampus (Grienberger et al., 2014), as well as visual (Smith et al., 2013) and somatosensory cortices (Xu et al., 2012; Lavzin et al., 2012; Palmer et al., 2014).
However, while many of the basic biophysical mechanisms underlying these nonlinearities are well understood (Stuart et al., 2007), it has proven a daunting task to include all these mechanisms in larger scale network models to understand their interplay at the level of the circuit (Herz et al., 2006). Conversely, studies of cortical computation and dynamics have largely ignored the complex and highly nonlinear information processing capabilities of the dendritic tree and concentrated on circuitlevel computations emerging from interactions between pointlike neurons with single, somatic nonlinearities (Hopfield, 1984; Seung and Sompolinsky, 1993; Gerstner and Kistler, 2002; Vogels et al., 2011). Therefore, it is unknown how dendritic nonlinearities in individual cells contribute to computations at the level of a neural circuit.
A limitation of most theories of nonlinear dendritic integration is that they focus on highly simplified input regimes (Mel et al., 1998; Poirazi et al., 2003; Archie and Mel, 2000; Poirazi and Mel, 2001; Ujfalussy et al., 2009), essentially requiring both the inputs and the output of a cell to have stationary firing rates. This approach thus ignores the effects and consequences of temporal variations in neural activities at the time scale of interspike intervals characteristic of in vivo states in cortical populations (Crochet et al., 2011; Haider et al., 2013). In contrast, we propose an approach which is specifically centered on these naturally occurring statistical patterns – in analogy to the principle of ‘adaptation to natural input statistics’ which has been highly successful in accounting for the inputoutput relationships of cells in a number of sensory areas at the systems level (Simoncelli and Olshausen, 2001). We pursued this principle in understanding the integrative properties of individual cortical neurons, for which the relevant statistical input patterns are those characterising the spatiotemporal dynamics of their presynaptic spike trains. Thus, rather than modelling specific biophysical properties of a neuron directly, our goal was to predict the phenomenological input integration properties that result from those biophysical properties and are matched to the statistics of the presynaptic activities.
Our theory is based on the observation that cortical neurons mainly communicate by action potentials, which are temporally punctate allornone events. In contrast, the computations cortical circuits perform are commonly assumed to involve the transformations of analog activities varying continuously in time, such as firing rates or membrane potentials (Rumelhart et al., 1986; Hopfield, 1984; Dayan and Abbott, 2001; Archie and Mel, 2000; London et al., 2010). This implies a fundamental bottleneck in cortical computations: the discrete and stochastic firing of spikes by neurons conveys only a limited amount of information about their rapidly fluctuating activities (Pfister et al., 2010; Sengupta et al., 2014). Formalising the implications of this bottleneck mathematically reveals that the robust operation of a circuit requires its neurons to integrate their inputs in highly nonlinear ways that specifically depend on two complementary factors: the computation performed by the neuron and the longterm statistics of the inputs it receives from its presynaptic partners.
To critically evaluate our theory, we first illustrate qualitatively the nonlinearities that most efficiently overcome the spiking bottleneck for different classes of presynaptic correlation structures. Next, to provide biophysical insight, we demonstrate that the form of optimal input integration for these presynaptic correlations can be efficiently approximated by a canonical, biophysicallymotivated model of dendritic integration. Finally, we test the prediction that cortical dendrites are optimally tuned to their input statistics in in vitro experiments. For this, we use available in vivo data to characterize the presynaptic population activity of two different types of cortical pyramidal cells. Based on these input statistics, our theory accurately predicts the integrative properties of the postsynaptic dendrites measured in twophoton glutamate uncaging experiments. We also show that NMDA receptor activation is necessary for dendritic integration to approximate the optimal response. These results suggest a novel functional role for dendritic nonlinearities in allowing postsynaptic neurons to integrate their richly structured synaptic inputs nearoptimally, thus making a key contribution to dynamically unfolding cortical computations.
Results
Suppose that every day you check your three favorite websites for the weather forecast. On a September day, the first website forecasts snow which you find hard to believe as it is highly unusual in your area – so you dismiss it as the forecaster’s mistake. However, when you read a similar forecast on the second site, you become convinced that snow is coming, and by the time the third site brings you the same news you are hardly surprised at all. Thus, even though all three sources conveyed the same information (snow), they had different impact on you – in other words, their cumulative effect was nonlinear. This nonlinearity was due to the fact that the information you get from these sites tends to be correlated as they are all related to a common cause, the actual weather. Below we argue that the same fundamental statistical principle, that correlated information sources require nonlinear integration, accounts for the dendritic nonlinearities of cortical pyramidal neurons.
Overcoming the spiking bottleneck in circuit computations
To introduce our theory, we consider a postsynaptic neuron computing some function, $f$, of the activity of its presynaptic partners, $\mathbf{u}$ (Figure 1A, top):
where $\dot{v}$ is the resultant temporal change of the activity of the postsynaptic neuron. We chose $\mathbf{u}$ and $v$ to be analog variables, rather than for example digital spike trains, in line with the vast bulk of theories of network computations (Hopfield, 1984; Dayan and Abbott, 2001; Pouget et al., 2003) and experimental results suggesting analog coding in the cortex (London et al., 2010; Shadlen and Newsome, 1998). In particular, we considered these variables to correspond to the coarsegrained (lowpass filtered) somatic membrane potentials of neurons (in particular, excluding the action potentials themselves, as often reported in experimental data; Carandini and Ferster, 2000), although the theory can equally be formalized in terms of instantaneous firing rates (Materials and methods, Figure 1figure supplement 1).
The standard description of neural circuit dynamics in Equation 1 hides an important informational bottleneck intrinsic to the operation of cortical circuits. While according to Equation 1, the postsynaptic neuron’s analog activity, $v$, is required to depend directly on the analog activities of its presynaptic partners, $\mathbf{u}$, in reality it only accesses these presynaptic activities through the spikes the presynaptic population transmits, $\mathbf{s}$, incurring a substantial loss of information (Alenda et al., 2010; Sengupta et al., 2014). Therefore, the function $g$ a neuron actually implements on its inputs can only depend directly on the presynaptic spikes, not the underlying activities (Figure 1A, bottom):
Importantly, while $f\left(\mathbf{u}\right)$ is dictated by the computational function of the circuit, the actual transformation of the synaptic input to the postsynaptic response, expressed by $g\left(\mathbf{s}\right)$, is determined by the morphological and biophysical properties of the cell. (For these purposes, we regard the presynaptic side of synapses, transforming presynaptic spike trains to synaptic transmission events, as conceptually being part of the postsynaptic cell’s $g\left(\mathbf{s}\right)$ function.) How can then the neuron integrate the incoming presynaptic spikes, as formalized by $g\left(\mathbf{s}\right)$, such that the resulting postsynaptic response best matches the required computational function, $f\left(\mathbf{u}\right)$, thereby alleviating the fundamental informational bottleneck of spikingbased communication?
Determining the best $g\left(\mathbf{s}\right)$ is nontrivial because the same presynaptic spike train may be the result of many different underlying presynaptic activities (Paninski, 2006), each potentially implying a different output of the computational function. This ambiguity is formalized mathematically as a posterior probability distribution, $\text{P}\left(\mathbf{u}\right\mathbf{s})$, expressing the probability that the analog activities of the presynaptic cells might currently be $\mathbf{u}$ given their spike trains, $\mathbf{s}$ (Pfister et al., 2010; Ujfalussy et al., 2011). The optimal response, i.e. the $g\left(\mathbf{s}\right)$ that minimizes the average squared error relative to $f\left(\mathbf{u}\right)$, is the expectation of $f\left(\mathbf{u}\right)$ under the posterior:
Crucially, the expression for the posterior, given by Bayes’ rule, is:
Note that while Equations 3–4 do not reveal directly the specific biophysical properties the postsynaptic cell should have, they tell us phenomenologically what signal integration properties should result from its biophysical properties. In particular, they make it explicit that the optimal $g\left(\mathbf{s}\right)$ depends fundamentally on two factors (Figure 1B, top):
the function that needs to be computed, $f\left(\mathbf{u}\right)$, and
the statistics of presynaptic activities: $\text{P}\left(\mathbf{u}\right)$, the prior probability distribution characterizing the longrun statistics of multineural activity patterns in the presynaptic ensemble, and the likelihood $\text{P}\left(\mathbf{s}\right\mathbf{u})$, expressing the potentially probabilistic relationship between analog activities (e.g. somatic membrane potential trajectories) and emitted spike trains.
In the following, we show that the outcome of the integration of presynaptic spike trains in cortical neurons approximates very closely the optimal response, and that dendritic nonlinearities are crucial for achieving this nearoptimality. For this, 1) we make an assumption about the computational function of the postsynaptic cell, $f\left(\mathbf{u}\right)$ (Figure 1B, top right); 2) we constrain presynaptic statistics, $\text{P}\left(\mathbf{u}\right)$ and $\text{P}\left(\mathbf{s}\right\mathbf{u})$, by in vivo data about cortical population activity patterns (Figure 1B, top left); and with these 3) we compute the optimal response they jointly determine for various stimulation patterns (Figure 1B, bottom left and right).
Optimal input integration is nonlinear
To specify our model, we considered the case when $f\left(\mathbf{u}\right)$ itself is linear. Although networks with purely linear dynamics can perform nontrivial computations already (Dayan and Abbott, 2001; Hennequin et al., 2014), in the general case, we do expect $f\left(\mathbf{u}\right)$ to be nonlinear, e.g. sigmoidal (Hopfield, 1984). Nevertheless, in typical electrophysiological experiments only a small fraction of the full dynamic range of a neuron’s total input is stimulated (Figure 1—figure supplement 2), and so we approximate the computational function, $f\left(\mathbf{u}\right)$, as being linear on this limited input range without loss of generality. (See Figure 1—figure supplement 3 for the application of the theory to the case of nonlinear $f$.) Yet, as we show below, for physiologically realistic statistics of presynaptic activity patterns, the optimal response combines input spike trains in highly nonlinear ways even in the case of linear computation, predicting experimentally characterized nonlinearities in dendritic input integration. In particular, second and higherorder prior presynaptic correlations, represented by $\text{P}\left(\mathbf{u}\right)$, will have a major role in determining the form of the corresponding optimal response. The likelihood, $\text{P}\left(\mathbf{s}\right\mathbf{u})$, also influences the optimal response, but only in its quantitative details, as it does not involve correlations across neurons: each neuron’s firing is independent from the others’, given its own somatic membrane potential (Materials and methods).
Previous suggestions for how postsynaptic neurons achieve reliable computation despite the substantial ambiguity about the individual presynaptic activities relied on the linear averaging of inputs arriving from a sufficiently large pool of presynaptic neurons (Dayan and Abbott, 2001; Pfister et al., 2010). However, linear averaging is only guaranteed to produce the correct output, as dictated by Equations 34, if the activities of presynaptic neurons are statistically independent under the prior distribution, i.e. $\text{P}\left(\mathbf{u}\right)={\prod}_{i}\text{P}\left({u}_{i}\right)$ (Materials and methods). In contrast, the membrane potential (Crochet et al., 2011) and spiking (Cohen and Kohn, 2011) of cortical neural populations often show complex patterns of correlations, which include both ‘spatial’ (crosscorrelations between different neurons) and temporal components (autocorrelations, i.e. the correlation of the activity of the same cell with itself at different moments in time). Thus, in this more general case, we expect the optimal response to involve a nonlinear integration of spike trains. While temporal correlations alone do not require nonlinear dendritic integration across synapses, only local nonlinearities within each synapse, as brought about e.g. by short term synaptic plasticity (Pfister et al., 2010), spatial correlations require the nonlinear integration of spikes emitted by different presynaptic neurons.
To illustrate that presynaptic spatial correlations require nonlinear integration across synapses, we compared the best linear response to a given presynaptic spike pattern with the optimal response (Equation 3, as approximated by Equation 23) for two different input statistics that differed only in the correlations between the presynaptic cells but not in the activity dynamics or spiking of individual neurons (temporal correlations). To compute the postsynaptic response, we assumed that dendritic integration in the postsynaptic cell was linear but, in order to dissect the role of dendritic integration across synapses from the effects of nonlinearities in individual synapses, we allowed spikes from the same presynaptic neuron still to be integrated nonlinearly (Pfister et al., 2010). In the first case (Figure 1C), when the presynaptic neurons were independent, the best linear response was identical to the optimal response. However, if presynaptic neurons became correlated, the optimal response became nonlinear and the best linear response was unable to accurately follow the fluctuations in the input (Figure 1D).
Thus, inputs from presynaptic neurons whose activity tends to be correlated need to be nonlinearly integrated, while inputs from uncorrelated sources need to be integrated linearly. This could be naturally achieved in the same dendritic tree by clustering synapses of correlated inputs to efficiently engage dendritic nonlinearities, while distributing the synapses of uncorrelated inputs on different dendritic branches (Larkum and Nevian, 2008). Crucially, for correlated inputs it is also necessary that the dendritic nonlinearities have just the appropriate characteristics for the particular pattern of correlations in presynaptic activities.
The form of the optimal nonlinearity depends on the statistics of presynaptic inputs
In order to systematically study the nonlinearities in the optimal response in the face of naturalistic input patterns, we derived and analyzed its behavior for a flexible class of richly structured, correlated inputs. Our statistical model for presynaptic activities, specifying the parametric forms of $\text{P}\left(\mathbf{u}\right)$ and $\text{P}\left(\mathbf{s}\right\mathbf{u})$ (Materials and methods and Figure 2—figure supplement 1), was able to generate a variety of multineural activity patterns resembling the statistical properties described in in vitro and in vivo multielectrode recordings of neuronal population activities (Figure 2A and D show two representative examples). Once we have specified the statistical model of presynaptic activities, it uniquely determined the optimal response to any given input pattern (Equations 3–4). Thus, we used the same statistical model in two fundamentally different ways: first, to generate “naturalistic” in vivolike patterns of presynaptic membrane potential traces and spike trains; and second, to compute the optimal response pattern to any stimulation pattern, be it “naturalistic” or parametrically varying “artificial” as used in typical in vitro experiments.
The optimal response determined by this statistical model, for essentially any setting of parameters, was inherently nonlinear because the additional effect of a presynaptic spike depended on the pattern of spikes that had been previously received from the presynaptic population. Temporal correlations in the presynaptic population caused the optimal response to depend on the spiking history of the same cell (Pfister et al., 2010), while crucially, the additional presence of spatial correlations introduced a dependency on the past spikes of other cells. Thus, the integrated effect of multiple spikes could not be computed as a simple linear sum of their individual effects in isolation. Specifically, a spike that was consistent with the information already gained from recent presynaptic spikes had only a small effect on the response (Figure 2B). Conversely, a spike that was unexpected based on the recent spiking history caused a larger change (Figure 2E).
As could be anticipated based on Equations 34, whether a spike counted as expected or unexpected relative to recently received spikes, and hence whether it had a small or large postsynaptic effect, depended on the longrun prior distribution of presynaptic activities, $\text{P}\left(\mathbf{u}\right)$. As a result, the same pattern of presynaptic spikes led to qualitatively different responses under different prior distributions. In particular, sublinear integration was optimal when presynaptic activities exhibited Gaussian random walks and thus they did not contain statistical dependencies beyond second order correlations (Figure 2AC), as seen in the retina and cortical cultures (Schneidman et al., 2006). This was because with the activities of presynaptic neurons being positively correlated, successive spikes conveyed progressively less information about the presynaptic signal resulting in sublinear integration (Figure 2C) and the strength of the sublinearity depended on the magnitude of correlations (Ujfalussy and Lengyel, 2011). In contrast, supralinear integration was optimal when the presynaptic population exhibited coordinated switches between distinct states associated with large differences in the activity levels compared to activityfluctuations within each state (Figure 2D–F). These switches led to higher order statistical dependencies as seen in the cortex in vivo, either due to populationwide modulation by cortical state (Gentet et al., 2010; Crochet et al., 2011), or due to stimulusdriven activation of particular cell ensembles (Harris et al., 2003; Ohiorhenuan et al., 2010; Miller et al., 2014). In this case, while observing a few spikes was consistent with random membrane potential fluctuations within the quiescent state, thus only warranting a small response, further spikes suggested that the presynaptic population was in the active state now and thus the response should be larger, leading to supralinear integration (Figure 2F).
Note, that nonlinearities in the optimal postsynaptic response needed not simply compensate for the nonlinearities in the presynaptic spike generation process, as captured by $\text{P}\left(\mathbf{s}\right\mathbf{u})$, but they critically depended on the presynaptic correlations, as captured by $\text{P}\left(\mathbf{u}\right)$. Indeed, in Figures 1C, D and 2A–F, the same spiking nonlinearity was used and yet very different input integration was required depending on the form of the presynaptic statistics: linear integration for uncorrelated inputs (Figure 1B) and nonlinear integration for correlated inputs (Figure 1C), with sub or supralinear integration being optimal depending on whether only second order (Figure 2A–C) or also higher order correlations were present in the presynaptic population (Figure 2D–F). Moreover, optimal input integration remained nonlinear even if the postsynaptic neuron computed a function of the presynaptic firing rates (rather than membrane potentials) which were linearly related to spikes (Figure 1—figure supplement 1).
Nonlinear dendrites can approximate the optimal response
The nonlinear input integration seen in the optimal response strongly resembled dendritic nonlinearities. Indeed, the basic biophysical mechanisms present in dendrites naturally yield nonlinearities that are qualitatively similar to those of the optimal response: purely passive properties lead to sublinear integration (Koch, 1999), whereas locally generated dendritic spikes endow dendrites with strong supralinearities (Nevian et al., 2007; Branco and Häusser, 2011). However, the full mathematical implementation of the optimal response is excessively complex (Materials and methods) and thus, there is unlikely to be a onetoone mapping between the variables necessary for implementing it and the biophysical quantities available in dendrites. Therefore, we sought to establish a formal proof that dendriticlike dynamics can implement, even if approximately, the optimal response. For this, we considered two limiting cases of our statistical model of presynaptic activities, $\text{P}\left(\mathbf{u}\right)$ and $\text{P}\left(\mathbf{s}\text{}\mathbf{u}\right)$, and compared the properties of the corresponding optimal response to a wellestablished simplified model of nonlinear dendritic integration, using a combination of analytical and numerical techniques.
First, we considered a limiting case in which the statistics of a large presynaptic population were strongly dominated by the simultaneous switching of presynaptic neurons between a quiescent and an active state (as shown in Figure 2D). In this limiting case we were able to show mathematically (see Materials and methods) that a simple, biophysicallymotivated, canonical model of nonlinear dendritic integration (Poirazi and Mel, 2001) is able to produce responses that are nearidentical to the optimal response for any sequence of presynaptic spikes (Figure 3A, see also Figures 4C). In this simple dendritic model (Figure 3A, inset; Equations 24–25), inputs within a branch are integrated linearly and the local dendritic response is then obtained by transforming this linear combination through a sigmoidal nonlinearity, which is a hallmark of supralinear behavior in dendrites (Poirazi et al., 2003b).
Second, we considered another limiting case in which the statistics of the presynaptic population were fully characterized by secondorder correlations (as shown in Figure 2A). In this case, the same type of dendritic model, but with a sublinear inputoutput mapping, was able to approximate the optimal response very closely. Although a closedform solution for the optimal nonlinear mapping could not be obtained in this case, it could be shown to be sublinear (Appendix), and was well approximated by a sigmoidal nonlinearity parametrized to be dominantly saturating (Figures 3B and Figure 3—figure supplement 1).
We also noted that it was the same type of sigmoidal nonlinearity which could implement supralinear and sublinear integration depending on the input regime (low background, synchronous spikes: supralinear; high background, asynchronous spikes: sublinear integration, compare Figure 3A and B, inset). This suggests that dendritic integration may adapt to systematic changes in presynaptic statistics, such as those brought about by transitioning between the desynchronized and synchronized states of the neocortex, or sharp waves and theta activity in the hippocampus, without having to change the parameters of its nonlinearity (Borst et al., 2005) (Figure 3—figure supplement 2). Indeed, Gasparini and Magee (2006) demonstrated that dendritic integration in hippocampal pyramidal cells was supralinear when inputs were highly synchronized (as they are during sharp waves), while integration was linear if the input was asynchronous (such as during theta activity).
Nonlinear dendrites are computationally advantageous
While the foregoing analyzes proved that dendriticlike nonlinearities can closely approximate the optimal response in certain limiting cases, they do not address directly whether having such nonlinearities in input integration is crucial for attaining nearoptimal computational performance for more realistic input statistics, or simpler forms of input integration could achieve similar computational power. To study this, we considered a scenario in which the presynaptic population consisted of four ensembles, such that neurons belonging to each ensemble underwent synchronized switches in their activity levels which were independent across the four ensembles, while there were also independent fluctuations in the activity of individual presynaptic neurons which were comparable in magnitude to those caused by these synchronized activity switches (Figure 4A). We then assessed the performance of four different variants of a simple dendritic model relative to that of the optimal response (Figure 4B): a model with linear dendrites and soma; a model in which only the soma was nonlinear, and two models in which nonlinearities resided in the dendrites with either random or clustered connectivity between the presynaptic assemblies and the dendritic branches.
We quantified the performance of each of the models based on how closely their output approximated the linear average of the analog presynaptic activities giving rise to the spike trains they were integrating (Figure 4—figure supplement 1, Materials and methods). For a fair comparison, we tuned the parameters of each variant of the dendritic model to obtain the best possible performance with it (Figure 4C). The model with nonlinear dendrites and clustered connectivity had nearoptimal crossvalidated performance (Figure 4D) while all other models performed significantly worse (n = 20 runs, t = 51, t = 35, t = 20, and P$<$10${}^{15}$, P$<$10${}^{15}$, P$<$10${}^{13}$; respectively from left to right as shown in Figure 4D). This remained true when we varied the number and firing rate of presynaptic neurons over a wide range, and under a diverse set of qualitatively different populationlevel statistics, determining the dynamics of assembly switchings and withinassembly membrane potential correlations (Figure 4—figure supplement 2).
Taken together, these results demonstrate that the clustering of correlated inputs together with nonlinearities akin to those found in dendrites is necessary to achieve optimal estimation performance in the face of presynaptic correlations. However, in order to be tractable, our dendritic model was mathematically simplified and, as a result, only qualitatively reproduced the nonlinearities of real dendrites. Thus, we directly compared experimentally recorded responses in dendrites to the optimal response.
Nonlinear integration in cortical neurons is matched to their input statistics
A crucial prediction of our theory is that dendritic nonlinearities act to achieve nearoptimal responses in a way that the form of the nonlinearity is specifically matched to the longrun statistics of the presynaptic population. We tested this prediction in experiments in which two different types of cortical pyramidal neurons, from layer 2/3 of the neocortex (Figure 5A–E) and from area CA3 of the hippocampus (Figure 5F–J), received patterned dendritic stimulation using twophoton glutamate uncaging, and compared their subthreshold somatic responses with the optimal responses predicted by the theory.
For generating our predictions of the optimal response in these two cell types, we fitted the parameters describing presynaptic statistics in our model, $\text{P}\left(\mathbf{u}\right)$ and $\text{P}\left(\mathbf{s}\right\mathbf{u})$, to the statistical patterns in the activity of their respective presynaptic populations. For neocortical pyramidal cells, we fitted in vivo data available on the membrane potential fluctuations of layer 2/3 pyramidal cellpairs in the barrel cortex during quiet wakefulness (Gentet et al., 2010; Crochet et al., 2011) (NC, Figure 5A, Table 2). For hippocampal pyramidal cells, we fitted presynaptic statistics to membrane potential fluctuations (Ylinen et al., 1995; English et al., 2014) and to multineuron spiking patterns of hippocampal pyramidal cells during sharp wave activity (Csicsvari et al., 1999; 2000) (HP, Figure 5F, Table 3). Due to the limitations of available hippocampal data sets, extracellular rather than intracellular data was used for fitting correlations. The motivation for our choice of the particular neocortical and hippocampal states used for fitting presynaptic statistics was twofold. First, the general network state of the slice preparations in which we tested dendritic integration was likely most analogous to these states (A Gulyás, personal communication; see also Karlocai et al., 2014; Schlingloff et al., 2014), characterized by relatively suppressed neural excitability due to low levels of cholinergic modulation (Harris and Thiele, 2011; Eggermann et al., 2014). Second, the stimulation protocol used in our study (short bursts of synaptic stimuli following longer silent periods) was also most consistent with population activity during hippocampal sharp waves and quiet wakefulness in the cortex. In order to capture variability across the cells we recorded from, the parameters related to postsynaptic dendritic filtering (amplitude and decay of the response to a single stimulation, and the size of the dendritic subunit, Figure 5—figure supplement 1BC) were tuned for the individual dendrites. Importantly, the parameters describing presynaptic statistics were fitted without regard to our dendritic experimental data, thus allowing a strong test of our predictions about dendritic integration (see Materials and methods).
We found that the nonlinear integration of individual spike patterns in cortical neurons was remarkably well fit by the optimal response when it was tuned to the correct presynaptic statistics (Figure 5C,H). The systematic dependence of response amplitudes on the interstimulus interval (ISI) in individual cells (Figure 5D,I) was also well predicted by the optimal response. We quantified the quality of match between the predicted and experimentally recorded time course of responses across a population of n = 6 (neocortex) and n = 6 (hippocampus) dendrites under a range of conditions varying ISI or the number of stimuli, and found that the precision of our predictions was not statistically different from that expected from the inherent variability of responses in individual dendrites (Figure 5E,J; neocortex: t = 0.2, P = 0.85; hippocampus: t = 1.85, P = 0.12). In contrast, when the optimal response was tuned to unrealistic presynaptic statistics characterized purely by secondorder correlations (cor2), or by a lack of any correlations implying statistically independent presynaptic firing (ind), the quality of fits became significantly worse (Figure 5E,J; neocortex: t = −4.6, P = 0.006 for cor2, and t = −4.9, P = 0.004 for ind; hippocampus: t = −4, P = 0.01 for cor2, and t = −4.9, P = 0.004 for ind).
Moreover, using realistic presynaptic statistics, but matching hippocampal rather than neocortical activities, also resulted in significantly worse fits for neocortical responses (Figures 5E; t= −3.6, P = 0.02). The converse was not observed in the case of hippocampal neurons (Figures 5J; t= 0.43, P = 0.68). This might be because hippocampal neurons also receive neocortical inputs (albeit on their apical not basal dendrites) that show similar population activity patterns to the ones we matched here for the neocortical cells (Isomura et al., 2006), while the primary sensory cortical pyramidal cells we recorded from do not receive direct input from the hippocampus. Nevertheless, when we analyzed the quality of fit between our predictions and recorded responses in hippocampal and neocortical data together, we found a small, but significant interaction between the source of the input statistics (neocortex or hippocampus) and the location of the postsynaptic neurons (ANOVA F = 5, P $<$ 0.05). This suggests that dendritic nonlinearities in cortical pyramidal neurons are specifically tuned to the dynamics of their presynaptic cortical ensembles. Furthermore, the blockade of NMDA receptor activation by AP5 resulted in dendritic responses that afforded substantially poorer fits by the model, even after refitting the postsynaptic parameters (Figure 5E,J, AP5). This indicated that the fine tuning of dendritic nonlinearities to input statistics relied on the action of NMDA receptors.
As dendrites in both of our cortical cell types integrated inputs supralinearly, as a further control, we analyzed similar data available from cerebellar stellate cell dendrites, which are known to integrate their inputs sublinearly (Abrahamsson et al., 2012) (Figure 5—figure supplement 2). In this case, we fitted the statistics of individual presynaptic cells to those of cerebellar granule cells. The correlations between these cells are less known, but we found that assuming simple secondorder correlations made the optimal response a close match to dendritic responses. In contrast, the hippocampal or neocorticallike statistics that were crucial for matching responses in cortical dendrites (Figure 5D,H) resulted in a substantially poorer fit in this cerebellar cell type. This demonstrates a double dissociation in the matching of cortical and subcortical neuron types to cortical and noncortical input statistics.
Discussion
We established a functional link between the statistics of the synaptic inputs impinging on the dendritic tree of a neuron and the way those inputs are integrated within the dendritic tree. We first demonstrated that efficient computation in spiking circuits requires nonlinear input integration if the activities of the neurons are correlated and that the structure of the presynaptic correlations determines the form of the optimal input integration. Second, we showed that the optimal response can be efficiently approximated by a canonical biophysicallymotivated model of dendritic signal processing both for linearly correlated inputs and for cellassembly dynamics. Third, we found that nonlinear dendrites with synaptic clustering carry significant benefits for decoding richly structured presynaptic spike trains. Finally, in vitro measurements of dendritic integration in two different types of cortical pyramidal neurons yielded postsynaptic responses that closely matched those predicted to be optimal given the in vivo input statistics of those particular cell types. These results suggest that nonlinear dendrites are essential to decode complex spatiotemporal spike patterns and thus play an important role in networklevel computations in neural circuits.
Biophysical substrate
The central insight of our theory is the relationship between presynaptic statistics and postsynaptic response, formalized as the optimal response. The optimal response can be expressed as a set of nonlinear differential equations that requires storing and continuously updating $~{N}^{2}$ variables within the dendritic tree, where $N$ is the number of synapses (Materials and methods). Thus, it is unlikely to be implemented by the postsynaptic neuron as such. Consequently, to demonstrate the biophysical feasibility of our theory, we derived a simple approximation to the optimal response that performs about equally well using just a few postsynaptic variables and that corresponds to a canonical descriptive model of dendritic integration (Poirazi et al., 2003; Poirazi et al., 2003b).
We found that simple second order correlations between presynaptic neurons imply sublinear integration which can be implemented by the saturating nonlinearity characteristic of passive dendrites. Conversely, the biophysical substrate for the type of supralinear integration that was optimal for stateswitching dynamics likely involves NMDA receptors because the particular dendritic nonlinearites observed in the cortical cells in which we tested our theory are known to be mediated primarily through NMDA receptor activation (Branco et al., 2010; Makara and Magee, 2013; Major et al., 2013). Indeed, we found that pharmacological inactivation of NMDA receptors abolished the precise match between dendritic integration and presynaptic statistics in these neurons (Figure 5). Moreover, the local plateau potentials generated by NMDA currents have been shown to have graded response durations (Major et al., 2008), and the resulting nonlinearities could be continuously tuned between weaker and stronger forms (boosting and bistability, Major et al., 2013). These properties make NMDA receptor mediated dendritic nonlinearities ideally suited for being matched to presynaptic statistics, as the optimal response involves sustained dendritic depolarisations of varying duration (Figure 4) that depend parametrically on those statistics.
Input statistics and clustering
A central prediction of our theory that awaits confirmation is the existence of a tight relationship between the structure of correlations in the activity of presynaptic cells and the morphological clustering of their synapses on the postsynaptic dendrite. This is because our theory requires nonlinear integration of inputs from neurons with statistically dependent activity, while spikes from independent neurons need to be integrated linearly. Biophysical considerations suggest (Koch, 1999) and experimental data supports (Polsky et al., 2004; Losonczy and Magee, 2006) that, when synchronous, nearby inputs on the same dendritic branch are summed nonlinearly, whereas widely separated inputs are combined linearly. Consequently, our theory predicts that the correlation structure of the inputs will be mapped to the dendritic tree in a way that presynaptic neurons with strongly correlated activities target nearby locations while independent neurons innervate distinct dendritic subunits.
According to our theory, the kind of correlation relevant for determining synaptic clustering is the ‘marginal’ correlations between the membrane potentials of presynaptic neurons. Marginal correlations include both signal and noise correlations (Averbeck et al., 2006) and thus can reach substantial magnitudes even when noise correlations alone are small, as e.g. during desynchronized cortical states (Renart et al., 2010), especially for neurons with overlapping receptive fields (Froudarakis et al., 2014), and when measured between the membrane potentials of neurons rather than their spike counts (Dorn and Ringach, 2003; de la Rocha et al., 2007; Poulet and Petersen, 2008).
At the level of different dendritic regions, the segregation of different input pathways along the dendritic tree of hippocampal neurons supports this prediction (Witter et al., 1989; Druckmann et al., 2014). At the level of individual synapses, the degree and the existence of clustering among inputs showing correlated activity is currently debated. High resolution imaging revealed subcellular topography of sensory inputs in the tadpole visual system (Bollmann and Engert, 2009), clustered patterns of axonal activity in the parallel fibres that provide input to cerebellar Purkinje cells (Wilms and Häusser, 2015), and experiencedriven synaptic clustering in the barn owl auditory localization pathway (McBride et al., 2008). Furthermore, it has been demonstrated that neighboring synapses are more likely to be coactive than synapses that are further away from each other in developing hippocampal pyramidal cells (Kleindienst et al., 2011) as well as in hippocampal cultures and in vivo in the barrel cortex during spontaneous activity (Takahashi et al., 2012). These results thus suggest clustering of correlated inputs.
In contrast, an interspersion of differently tuned orientation, frequency or whiskerspecific synaptic inputs on the same dendritic segments was found in the mouse visual, auditory or somatosensory cortex, respectively, thus challenging the prevalence of synaptic clustering (Jia et al., 2010; Chen et al., 2011; Varga et al., 2011). However, in all these studies the stimuli used were nonnaturalistic and varied along a single stimulus dimension only (direction of drifting gratings, pitch of pure tones, or the identity of the single whisker being stimulated), which may account for the apparent lack of clustering. In particular, our theory predicts clustering based on the longterm statistical dependencies between the responses of the presynaptic neurons for naturalistic inputs, which can be quite poorly predicted from their tuning properties for single stimulus dimensions (Harris et al., 2003; Fiser et al., 2004). In contrast, the statistical dependencies relevant for our theory are well represented by those found during spontaneous activity (Berkes et al., 2011). Indeed, studies finding evidence in favor of synaptic clustering analyzed the structure of synaptic input to dendritic branches during spontaneous network activity (McBride et al., 2008; Kleindienst et al., 2011; Makino and Malinow, 2011; Takahashi et al., 2012). Thus, presynaptic correlations for naturalistic stimulus sets may be predictive of synaptic clustering and providing more direct evidence for or against such clustering will offer a crucial test of our theory.
Linear vs. nonlinear postsynaptic computations
Although, in general, we expect singleneuron computations to be nonlinear (Zador, 2000), and our theory indeed applies to nonlinear computations (Figure 1—figure supplement 3), we assumed the postsynaptic computation to be linear for matching experimental data. This choice was justified by two reasons. First, it is difficult to determine, without making strong prior assumptions, what kind of nonlinear function the neuron actually computes; and so the choice of any particular such function would have been arbitrary. Note that even in relatively wellcharacterized cortical areas (such as the visual cortex) it is unknown how much of the computationally relevant output of individual neurons (such as orientation or direction selectivity) is brought about by specific nonlinearities in the inputoutput transformations of these neurons, or by multiple steps of feedforward and recurrent processing carried out at various stages of the visual pathway between the retina and those neurons. Moreover, in some cases, even networks with linear dynamics can provide a remarkably close fit to experimentally observed cortical population dynamics (Hennequin et al., 2014). This issue may be best addressed in systems that are more specialized than the cortex so that there are wellsupported hypotheses about the particular nonlinear computations individual neurons need to perform, such as the fly visual system (Single and Borst, 1998) or the mammalian and avian auditory brain stem (AgmonSnir et al., 1998). In order to test our theory in these systems, in vivo multineural data will need to be collected from the afferent brain areas, preferably in the unanesthetized animal, for characterising the relevant statistical properties of the presynaptic population to which dendritic nonlinearities are adapted according to our prediction.
Second, any nonlinear function can be approximated to high precision by a linear function over a sufficiently limited input range. Currently available experimental techniques for systematically probing dendritic nonlinearities, including those used in our study, only provide data over such a very limited range ($\sim $0.1% of the number of excitatory inputs impinging a neuron, Megías et al., 2001). Inputs in this small range do not sufficiently engage global nonlinearities brought about by active somatic conductances or global events such as Ca${}^{\text{2+}}$ spikes. Thus, we could assume linear computation over this range without loss of generality (Figure 1—figure supplement 2). In fact, from this perspective, it is a nontrivial phenomenon to account for on its own right that stimulating such a small fraction of inputs already leads to observable nonlinearities in the postsynaptic dendrite. By defining the computation to be linear, we could demonstrate that such strong dendritic nonlinearities arise naturally in our theory, entirely due to the correlations in the prior input statistics, thus providing a functional account for this remarkable phenomenon.
Once patterned dendritic stimulation over a broader and more realistic range of inputs becomes feasible, our theory will provide a principled method for dissecting the roles of presynaptic correlations vs. genuine nonlinear computations in shaping dendritic nonlinearities. A sufficiently rich set of such data will allow the fitting of presynaptic parameters, as we did here, followed by fitting postsynaptic transfer functions to dendritic responses without having to make strong prior assumptions about their (linear) nature.
Analog communication, stochastic synaptic transmission and shortterm synaptic plasticity
Our formalism was based on the assumption that cortical neurons only influence each other’s membrane potentials via the action potentials they emit. While there exist other, more analog forms of communication, such as the modulation of the effects of action potentials by subthreshold potentials (Clark and Häusser, 2006), the propagation of voltage signals through gap junctions (Vervaeke et al., 2012), and ephaptic interactions between nearby cells (Anastassiou et al., 2011), these either require slow membrane potential dynamics, small distances between interacting cells, or large degrees of population synchrony, and are thus generally believed to have a supplementary role beside spikebased communication (Sengupta et al., 2014). Note that our theory is selfconsistent even though it considers spiking only in the presynaptic population and not in the postsynaptic neuron. This is because we assumed that the computationally relevant mapping is that between the membrane potentials of the presynaptic neurons and the postsynaptic cell (Figure 1A, Equation 1), and so, by induction, the spikes of the postsynaptic neuron will effect the mapping from its membrane potential to those of its postsynaptic partners.
We also assumed that presynaptic spikes deterministically and uniformly impact the postsynaptic response, and thus apparently neglected the stochasticity in synaptic transmission, and in particular systematic variations in synaptic efficacy brought about by shortterm synaptic plasticity. Nevertheless, these presynaptic features are compatible with our theory. The stochasticity of synaptic transmission, due to a baseline level of synaptic failures, is straightforward to incorporate in the model by reducing the effective presynaptic firing rate, which can thus be interpreted as a ‘transmission rate’ instead. In fact, we have already done this while matching hippocampal and neocortical presynaptic statistics (Table 1).
Shortterm synaptic plasticity, resulting in dynamical changes in synaptic efficacy as a function of the recent spiking history of the presynaptic neuron, is not only a constraint in our framework, but as we have shown in related work, it can act itself as an optimal estimator of the membrane potentials of individual presynaptic neurons (Pfister et al., 2010). Thus, the effects of shortterm plasticity can be regarded as a special case of what can be expected from our optimal response: when presynaptic neurons are statistically independent, spikes arriving at different synapses are integrated linearly, and local nonlinearities acting at individual synapses suffice (Figure 1C, see also Materials and methods). However, the importance of nonlinear interactions between inputs from different presynaptic neurons, brought about by dendritic nonlinearities, rapidly increases with the magnitude of presynaptic correlations, especially in large populations (Figure 1D, see also Ujfalussy et al., 2011).
These considerations suggest that shortterm synaptic plasticity and dendritic nonlinearities have complementary roles in tuning the postsynaptic response to the statistics of the presynaptic population along the orthogonal dimensions of time and space. The former is useful in the face of temporal correlations private to individual presynaptic neurons (autocorrelations, e.g., brought about by spike frequency adaptation, Pfister and Surace, 2014), while the latter is matched to spatiotemporal correlation patterns present across the presynaptic population.
Inhibitory neurons
We focused on the nonlinear integration of excitatory inputs in the dendritic tree of cortical neurons that have been extensively studied and described over the past decades, giving rise to a strong body of converging evidence as to their characteristics and mechanisms (Spruston, 2008). Recent work studying the nonlinear interaction between inhibitory and excitatory inputs in active dendrites (Gidon and Segev, 2012; Jadi et al., 2012; Müller et al., 2012; Wilson et al., 2012; LovettBarron et al., 2012) demonstrated that local inhibition has a powerful control over the excitability of the dendritic tree.
However, it is not yet clear whether these inhibitory inputs are directly involved in the computation performed by the circuit, just as excitatory neurons but with negative signs (Koch et al., 1982), or, alternatively, they may have a more ancillary role in supporting computations carried out primarily by excitatory neurons (Vogels et al., 2011).
Our theory can be extended to include both possibilities, by allowing inhibitory inputs to contribute to the computational function, $f\left(\mathbf{u}\right)$, with negative weights, or by considering them as providing auxiliary information about the common state of the excitatory presynaptic ensemble, especially when this state is in the more suppressed regime. Indeed, our preliminary results suggest that such an extension of our theory (Ujfalussy and Lengyel, 2013) successfully accounts for the interaction of (excitatory) Schaffer collateral inputs with the feedforward inhibitory effects of the temporoammonic pathway (Remondes and Schuman, 2002), likely mediated by interneurons delivering dendritic inhibition (DvorakCarbone and Schuman, 1999).
In the present paper we focused on dendritic integration in pyramidal neurons because dendritic nonlinearities have traditionally been more extensively characterized in this cell type, but our theory equally applies to synaptic integration in other types of neurons, including inhibitory interneurons. Therefore, our theory predicts a qualitative similarity of dendritic integration in different neuron types (i.e. interneurons versus principal cells) when they receive inputs from overlapping presynaptic populations. Indeed, it has been recently found that inhibitory interneurons can exhibit dendritic NMDA spikes under certain experimental circumstances (Katona et al., 2011; Chiovini et al., 2014) in addition to standard sublinear integration. The differences between dendritic integration in excitatory and inhibitory neurons could be attributed to their different computational function, $f\left(\mathbf{u}\right)$, or differences in the specific presynaptic populations innervating them.
Adaptation of dendritic nonlinearities to presynaptic statistics
According to our theory, the optimal response depends on prior information about the input statistics. Consequently, for dendritic processing to approximate the optimal response, this prior information needs to be implicitly encoded in the form of the particular nonlinearity a dendrite expresses. Therefore, our theory predicts an ongoing adaptation of dendritic nonlinearities to presynaptic firing statistics across several timescales.
First, there is a simple yet potent mechanism implicit in our theory that can ensure that a match of dendritic integration to presynaptic statistics is maintained as those statistics are changing over time. This is based on the observation that essentially instantaneous, albeit probably incomplete, adaptation can occur even without specific changes in the integrative properties of dendrites per se, simply due to the fact that a critical level of input synchrony is required to elicit dendritic spikes, and so the same sigmoidlooking dendritic transfer function can be used as superlinear, linear, or sublinear, depending on which part of its input range is being used (Figure 3—figure supplement 2).
Second, to match the more specific modulation of the statistics of presynaptic activities by global cortical states (Crochet et al., 2011; Mizuseki and Buzsaki, 2014), dendritic integration may also be modulated by these states. As different cortical states are typically accompanied by changes in the neuromodulatory milieu (Hasselmo, 1995; Harris and Thiele, 2011), neuromodulators may be the ideal substrates to ensure that dendritic integration also changes according to the current cortical activity mode. This may provide a functional account of changes in the excitability of the dendritic tree as dynamically regulated by acetylcholine and monoamines (Sjöström et al., 2008).
Third, experiencedependent synaptic plasticity can gradually change the statistics of the presynaptic population activity implying that the optimal form of input integration should also change as a function of experience. We propose that branchspecific forms of plasticity of dendritic excitability (Losonczy et al., 2008; Makara et al., 2009; Müller et al., 2012) may have a functional role in enabling dendrites to adjust the form of input integration to such slowly developing and longlasting changes in the statistics of their inputs.
Finally, whether inputs from two presynaptic cells are integrated linearly or nonlinearly in a dendrite depends critically on the distance between their synapses within the dendritic tree (Polsky et al., 2004; Losonczy and Magee, 2006). Our theory requires nonlinear integration of inputs from neurons with statistically dependent activity, predicting a mapping of presynaptic correlations on the postsynaptic dendritic tree. Local electrical and biochemical signals can drive synaptic plasticity (Larkum and Nevian, 2008; Govindarajan et al., 2011; Winnubst et al., 2015) and rewiring (DeBello, 2008) leading to synaptic clustering of correlated inputs along the dendritic tree (Kleindienst et al., 2011; Takahashi et al., 2012).
A combination of all these mechanisms may be crucial for achieving and dynamically maintaining, at the level of individual neurons, a detailed matching of dendritic nonlinearities to presynaptic statistics. Thus, our theory provides a novel framework for studying a range of phenomena regarding the dynamical regulation of dendritic nonlinearities from the perspective of circuitlevel computations.
Materials and methods
Source code for reproducing the analyses and simulations presented in the paper as well as the experimental data we used for testing our predictions are available online (https://bitbucket.org/bbu20/optimdendr).
Computing the optimal response
In order to study the optimal form of input integration with realistic input statistics, we need to make two important assumptions. First, we need to assume a particular algebraic form for the computation that a neuron performs. Second, we need to define what the relevant presynaptic statistics are, that is, the membrane potential and spiking dynamics of the presynaptic population under naturalistic in vivo circumstances. Given these two assumptions, the theory uniquely defines the optimal response of a neuron to any input pattern. The optimal response has qualitatively similar properties whether computations are defined as mappings between pre or postsynaptic membrane potentials or firing rates (Figure 1—figure supplement 1).
Throughout the paper the term input refers to the spatiotemporal spiking pattern impinging the neuron while the response of a neuron refers to its (subthreshold) somatic membrane potential (or firing rate, see below). All parameters used in the paper are given in Table 1 or in the caption of the corresponding Figure.
Postsynaptic computation
We assumed the postsynaptic computation to be linear, i.e. the dynamics of the postsynaptic membrane potential $v\left(t\right)$ evolves according to a weighted sum of the presynaptic membrane potential values, $\mathbf{u}\left(t\right)$ (cf. Equation 1):
where ${\tau}_{\text{post}}$ is the time constant of the postsynaptic neuron, $N$ is the number of presynaptic neurons, and ${w}_{i}$ is the computational weight assigned to presynaptic neuron $i$. As the postsynaptic neuron cannot access presynaptic membrane potentials, $\mathbf{u}$, directly only the spikes the presynaptic cells emit, $\mathbf{s}$, (Figure 1, Equation 2), the optimal response (that minimizes mean squared error) is the expectation of Equation 5 under the posterior distribution of the presynaptic membrane potential at time $t$, $\mathbf{u}\left(t\right)$ given the history of presynaptic spiking up to that time, $\mathbf{s}(0:t)$ (cf. Equation 3):
Throughout the paper we call the output of Equation 6 the optimal response and compare its behavior to input integration in the dendrites of cortical pyramidal cells.
Table 1 shows the values of the parameters in Equation 5 ($N$, ${\tau}_{\text{post}}$, and ${w}_{i}$) used in the simulations. In short, to illustrate the contributions of inference to Equation 6 (the term including the integral), we used ${\tau}_{\text{post}}=0$ in Figures 1, 3 and 4 as well as in all Supplemental Figures, unless otherwise stated. We used ${\tau}_{\text{post}}=10$ ms in Figure 2 to aid comparison with experimental data and fitted ${\tau}_{\text{post}}$ to data for Figure 5. Throughout the paper we used ${w}_{i}=1/N$, except in Figure 5 where we fit $N$ and ${w}_{i}=w$ jointly to the data.
Presynaptic statistics
Computing the posterior, $\text{P}\left(\mathbf{u}\left(t\right)\right\mathbf{s}(0:t))$ in Equation 6 requires a model for the joint membrane potential and spiking statistics of the presynaptic population, $\text{P}(\mathbf{u},\mathbf{s})$ (see also Equation 4). For mathematical convenience, we present some of our results below in discrete time with time step size $\delta t$, which we will eventually take to zero to derive timecontinuous equations. We distinguish discrete and continuous time results by using time as an index versus as an argument of the corresponding timedependent quantities, e.g. ${\mathbf{u}}_{t}$ vs. $\mathbf{u}\left(t\right)$.
We describe the joint statistics of presynaptic membrane potentials and spikes by a hierarchical generative model that has three layers of variables (Figure 2—figure supplement 1A,B). The global state of the system is described by a single binary variable, $z$ that switches between a quiescent ($$) and an active ($+$) state following firstorder Markovian dynamics (see Appendix for the extension to an arbitrary number of states). The transition rates to the active and quiescent states are given by ${\mathrm{\Omega}}_{+}$ and ${\mathrm{\Omega}}_{}$, respectively.
The dynamics of (subthreshold) membrane potentials $\mathbf{u}$ are modeled as a multivariate OrnsteinUhlenbeck (mOU) process, which yields random walklike behavior that (unlike simple Brownian motion) decays exponentially towards a baseline defined by the resting potential $\overline{\mathbf{u}}$, which in turn depends on the momentary global state of the system, ${z}_{t}$:
where $\tau $ is the presynaptic time constant of the exponential decay, and $\mathbf{Q}$ is the ‘process noise’ covariance matrix determining the variance of individual membrane potentials (together with $\tau $) and, importantly, also the correlations between presynaptic neurons. It is straightforward to extend the model by also making these parameters state (or in fact, neuron) dependent.
Note that both the state switching and mOU components of this model introduce both spatial and temporal statistical dependencies in the membrane potentials and spike trains of presynaptic cells. In the rest of this paper, we informally refer to any statistical dependency (second or higher order) as ‘correlation’, and we write ‘autocorrelation’ when we refer to the correlations between the membrane potential (firing rate) values of the same neuron at different times, and ‘crosscorrelation’ when referring to the correlation between the activities of two different cells (at the same time, or at different times). Also note that temporal and spatial correlations can not be studied in complete isolation in the case of smoothly varying signals, such as membrane potentials, as the crosscorrelation between the activity of two presynaptic neurons always has a characteristic temporal profile. While it is possible to consider a presynaptic neuronal population completely lacking spatial correlations (i.e. independent presynaptic neurons, as in Figure 1C), having a population with only spatial but not temporal correlations would require the membrane potentials of the individual neurons to be temporally white noise – which is so far removed from reality that we did not consider this case worth pursuing.
More specifically, the timescale of temporal correlations (autocorrelations) in the model depend on the transition rates of the switching component, ${\mathrm{\Omega}}_{+}$ and ${\mathrm{\Omega}}_{}$, and the presynaptic time constant of the mOU component, $\tau $, such that cells are autocorrelated as long as $\tau $, ${\mathrm{\Omega}}_{+}^{1}$, and ${\mathrm{\Omega}}_{}^{1}>0$. Spatial correlation (crosscorrelations between different presynaptic neurons) also emerge from both components. First, the pairs of presynaptic neurons corresponding to the nonzero offdiagonal elements of $\mathbf{Q}$ matrix of the mOU component become correlated. Second, synchronous state transitions during state switching in the presynaptic ensemble introduce positive correlations. Importantly, in both the temporal and spatial domains, while the mOU process can only introduce secondorder correlations (i.e. it makes membrane potentials be distributed according to a multivariate normal), the switching process introduces higher order correlations (such that membrane potentials are not normally distributed any more). These higher order correlations are stronger when the effect of stateswitching is large relative to the membrane potential fluctuations within a single state.
Finally, instead of modeling the detailed dynamics of action potential generation, we model spiking phenomenologically by introducing a single discrete variable, ${s}_{i,t}$, for each presynaptic neuron that represents the number of spikes neuron $i$ fires in time step $t$. (Note that in the limit $\delta t\to 0$ this variable becomes binary, i.e. there can never be more than one spike fired in a $\delta t$ time window.) Spiking in each cell only depends on the membrane potential of that cell, and follows an inhomogeneous Poisson process with the firing rate, $r$, being an exponential function of the membrane potential (Gerstner and Kistler, 2002):
where $\beta $ describes how deterministically the cell switches to firing at threshold ($u=0$) and $g$ is the firing rate at that threshold. We modeled the absolute refractory period by not allowing the generation of spikes (i.e. setting ${r}_{i,t}=0$) within a time window of length ${\tau}_{\text{refr}}$ following each spike in a cell, regardless of its membrane potential.
The parameters of the presynaptic statistics used in the paper are given in Table 1. Examples of neural dynamics generated by the model are shown in Figures 1, 2, 4, and 5.
Inference and the optimal response
Our goal was to infer the posterior distribution of the membrane potential based on the spiking pattern observed up to time $t$, $\text{P}\left({\mathbf{u}}_{t}\right{\mathbf{s}}_{0:t})$.
We first show that linear dendritic integration is sufficient when presynaptic neurons are statistically independent. We start by noting that by marginalising out the past membrane potential history of the presynaptic cells and using Bayes’ rule, the posterior can always be written as
and as the spikes of each neuron are independent from all other neurons conditioned on its own membrane potential history (Equation 8), this can be rewritten as
In the special case when we assume that presynaptic neurons are statistically independent, i.e. their prior factorizes $\text{P}\left({\mathbf{u}}_{0:t}\right)={\prod}_{i}\text{P}\left({u}_{i,0:t}\right)$, the posterior also becomes factorised
which in continuous time reads simply as
Thus, taking our usual assumption that the postsynaptic computation is linear (Equation 5), the optimal response in Equation 6 can be written as
indicating that integration of inputs from different neurons is linear in this case (it is a weighted sum of terms each depending on just a single presynaptic neuron). However, even in this case, note that integration of input spikes from the same presynaptic neuron, i.e. the result of the integral over each ${u}_{i}\left(t\right)$ as a function of ${s}_{i}(0:t)$, is still nonlinear in general (Pfister et al., 2010). Indeed, Equation 14 including these local nonlinearties was used to compute the linear response in Figure 1C–D.
In the general case inference can be performed using filtering such that in each step we update the inferred state of the hidden variables, ${z}_{t}$ and ${\mathbf{u}}_{t}$, using information from two different sources: the likelihood of emitting a particular spiking pattern (observation) and the dynamics of the hidden variables combined with the previous estimate (innovation):
where the likelihood $\text{P}\left({\mathbf{s}}_{t}\right{\mathbf{u}}_{t})$ is defined by Equation 8, the dynamics of the global state variable $\text{P}\left({z}_{t}\right{z}_{t\delta t})$ is first order, Markovian (see above) and the statedependent membrane potential dynamics $\text{P}\left({\mathbf{u}}_{t}\right{\mathbf{u}}_{t\delta t},{z}_{t})$ is given by Equation 7. Equation 15 thus defines a mapping between the posterior distribution of the hidden variables in the previous time step (last term on RHS) and their current distribution (LHS). The posterior over membrane potentials can then be obtained by simply marginalising out the state variable:
For the following, it is useful to represent the posterior as a product of two terms:
As the state variable is binary, its posterior is a Bernoulli distribution which we parametrize by $\zeta $, without loss of generality:
However, in general, the posterior of the membrane potentials conditioned on the current state, ${z}_{t}$, can be arbitrarily complex. To allow an analytical reduction of the inference process, we adopted an assumed density filtering approach in which this distribution is momentmatched in each time step by a multivariate normal distribution which is thus described by two (sets of) parameters, its mean, ${\mathbf{\mu}}_{t}^{\left(z\right)}$, and covariance, ${\mathbf{\text{\Sigma}}}_{\mathbf{t}}^{\mathbf{\left(}\mathbf{z}\mathbf{\right)}}$:
with the analogous equation for the posterior of ${\mathbf{u}}_{t}$ conditioned on ${z}_{t}$ being in the $$ state.
One advantage of this parametric approach is that inference (filtering) can be implemented by updating only the parameters describing the (approximate) posterior distribution (Equations 18–19): ${\zeta}_{t}$, ${\mathbf{\mu}}_{t}^{\left(z\right)}$, and ${\mathbf{\text{\Sigma}}}_{t}^{\left(z\right)}$. In the Appendix we derive an analytical form for these parameter updates resulting in the following set of differential equations:
where ${\mathbf{\gamma}}^{\left(z\right)}$ (${\mathbf{\Gamma}}^{\left(z\right)}$) is a statedependent vector (diagonal matrix) of which the elements ${\gamma}_{i}^{\left(z\right)}={\mathrm{\Gamma}}_{ii}^{\left(z\right)}=g{e}^{\beta {\mu}_{i}^{\left(z\right)}+\frac{1}{2}{\beta}^{2}{\mathrm{\Sigma}}_{ii}^{\left(z\right)}}$ are the expected firing rates of the neurons in a given state, ${\overline{\gamma}}^{\left(z\right)}={\sum}_{i}{\gamma}_{i}^{\left(z\right)}$ is the expected total population firing rate in state $z$, and $\u27e8\mathbf{\Gamma}\u27e9=\zeta {\mathbf{\Gamma}}^{+}+(1\zeta ){\mathbf{\Gamma}}^{}$ is the expected firing rate of the cells averaged across states. In these equations, the spike trains of the presynaptic neurons are represented by the sum of Diracdelta functions in continuous time and are denoted by $\mathbf{s}\left(t\right)$, to be distinguished from its discrete time analog, ${\mathbf{s}}_{t}$, such that $\mathbf{s}\left(t\right)=\underset{\delta t\to 0}{\mathrm{lim}}{\mathbf{s}}_{t}/\delta t$ (see Equation A62 in the Appendix). The differential equations for the conditional mean and variance in the $$ state, ${\dot{\mathbf{\mu}}}^{}$ and ${\dot{\mathbf{\Sigma}}}^{}$, are analogous to Equations 21–22. The absolute refractory period is taken into account by setting ${\gamma}_{i}={\mathrm{\Gamma}}_{ii}=0$ after each observed spike for the duration of the refractory period, ${\tau}_{\text{refr}}$, thus omitting the effect of the likelihood (terms containing ${\mathbf{\gamma}}^{\left(z\right)}$ or ${\mathbf{\Gamma}}^{\left(z\right)}$) from Equations 20–22.
The first term in Equation 20 captures the decay in ${\zeta}_{t}$ that is proportional to the difference in the state conditional firing rates in the absence of presynaptic spikes; the second term expresses the instantaneous change in ${\zeta}_{t}$ after observing a spike, proportional to both the state estimation uncertainty, $\zeta \left(1\zeta \right)$, and the differences in the conditional firing rates (${\mathbf{\gamma}}^{+}{\mathbf{\gamma}}^{}$); and the last term captures the decay of ${\zeta}_{t}$ to its steady state in the absence of observations. The filtering equations for the conditional mean and covariance (Equations 21–22) are each composed of three terms: the first term expresses the decay of the variable towards its baseline in the absence of observations; the second term captures the effect of the current observation (i.e. the presence or absence of a spike) on the variable; and the third term describes the changes in the variable caused by potential state transitions. This can be viewed an extension and generalization of earlier work deriving the equivalents of Equations 21–22 for the special case of a single neuron without stateswitching dynamics (Pfister et al., 2010).
Another advantage of the parametrization of the posterior we chose is that computing the optimal response, i.e. the posterior expectation of the simple linear functions that we consider in this paper, becomes straightforward (cf. Equation 6):
In order to verify the assumed density filtering approximations used above we numerically integrated the system of differential equations (Equations 20–22) using the software package R (R Core Team, 2012; Soetaert et al., 2010) and compared the results with those obtained using standard particle filters (Doucet et al., 2001). In these simulations we used 500,000 particles to evaluate Equation 15 with $1$ neuron and 2 states. Figure 2—figure supplement 1C shows that the results of assumed density filtering are essentially identical to those of particle filtering, confirming that the approximations we used were valid.
Dendritic approximation of the optimal response
Here we first describe a simple canonical model of dendritic integration following Poirazi and Mel (Poirazi and Mel, 2001), and then show that it provides an approximation to the optimal response (Equations 20–23) in the limiting case in which presynaptic dynamics are dominated by simultaneous switching between a quiescent and an active state. In this simple dendritic model, inputs within a branch are integrated linearly:
where ${v}_{\mathrm{lin}}$ is the variable linearly integrating inputs with weight $\mathcal{\mathcal{B}}$, dendritic time constant $1/\mathcal{\mathcal{A}}$ and steady state value $\mathcal{\mathcal{C}}/\mathcal{\mathcal{A}}$ (in the absence of spikes), and $s\left(t\right)={\sum}_{i}{s}_{i}\left(t\right)$ denotes the spike train of presynaptic neurons, collecting all spikes from the presynaptic population. The actual dendritic response, ${v}_{\mathrm{den}}$, is then given by mapping this linear response through a sigmoidal nonlinearity, scaled to be between ${v}_{\mathrm{min}}$ and ${v}_{\mathrm{max}}$ (Figure 3A, inset):
To demonstrate that this reduced model of dendritic integration closely approximates the optimal response, we first note that under appropriate conditions (${\tau}_{\text{post}}$ is small, $N$ is large, $\mathbf{Q}$ is diagonal, and $\beta $ is small relative to the diagonal elements of $\mathbf{Q}$) the dynamics of the optimal response are dominated by the state switching process (Equation 20; see Appendix). Thus, the optimal response essentially follows the inference about the global state variable, $\zeta $, up to linear rescaling and filtering:
with ${\overline{u}}^{+}$ and ${\overline{u}}^{}$ respectively denoting the resting membrane potential in the active and quiescent states. As Equation 26 is linear, all nonlinear interactions, corresponding to dendritic nonlinearities, must be contained in the temporal dynamics of the posterior probability of this global state variable being in the active state, $\zeta \left(t\right)$, which can be expressed as
where constants $B$ and $C$ depend on the parameters of the presynaptic statistics (see Appendix). Note that the fact that $\zeta \left(1\zeta \right)$ multiplies Equation 27 expresses the simple intuition that the size of the update to $\zeta $ (the posterior probability of $z=+$) in response to incoming information (presence, $B$ term, or absence of a spike, $C$ term) should be proportional to our current (posterior) uncertainty about $z$; and since the posterior is a Bernoulli distribution, the uncertainty associated with it is simply $\zeta \left(1\zeta \right)$.
The solution of Equation 27 can be expressed in a form that is similar, albeit not identical (see below), to the canonical model for dendritic integration (Equations 24–25). This form requires the linear integration of incoming spikes
and the temporal evolution of $\zeta $ is expressed as a sigmoidal function of the linearly integrated inputs $\nu $:
Thus, dendrites with sigmoidal nonlinearity are nearoptimal when their synaptic inputs switch between a quiescent and an active state.
The main difference between the dendritic integrator and the optimal response is that the dynamics of spike integration imply exponential decay towards a finite baseline in the former (there is a negative term in Equation 24 which is scaled by ${v}_{\mathrm{lin}}$) and steady decrease towards negative infinity in the latter (the only negative term in Equation 28 is a constant, independent of $\nu $). This is because the approximations we used for deriving Equation 27 were accurate only in the quasistatic case, when the state switching dynamics are infinitely slow. In this case, remote and more recent observations should have identical effects on the current value of $\zeta $ as they all correspond to the same underlying state. In the more general case, when state switching occurs with nonzero probability, more remote observations likely correspond to a state which has changed in the meantime, and should thus count for less, such that their effect on the current value of $\zeta $ should decay with time – leading to leaky integration of incoming spikes, similar to that in Equation 24.
Simplified neuron models
To compare quantitatively the response of the linear and nonlinear dendrites to the optimal response in a computational task using realistic input statistics, we divided the presynaptic population into four groups (cell assemblies), where neurons within each group were statistically dependent (either through simple secondorder correlations, Figure 4—figure supplement 2, or through sharing a common state variable, Figure 4 and Figure 4—figure supplement 2) while neurons from different groups were independent (Figure 4A). In this task we used 4 different versions of the simplified neuron model (Figure 4B):
The linear model responded to all incoming spikes with an identical postsynaptic potential (PSPs) characterized by its amplitude (${w}_{\mathrm{\ell}}$), decay time constant (${\tau}_{\mathrm{\ell}}$) and the resting potential (${\overline{v}}_{\mathrm{\ell}}$):
where $s\left(t\right)$ is the total incoming spike train (as before). This model had three parameters.
The model with a somatic nonlinearity had linear dendrites but a nonlinear soma. Motivated by our analytical calculations we used a sigmoidal nonlinearity:
where we computed ${\stackrel{\mathit{~}}{\mathit{v}}}_{\mathrm{\ell}}\left(t\right)$ as defined in Equation 30 above with ${w}_{\mathrm{\ell}}=1$ and ${\overline{v}}_{\mathrm{\ell}}=0$ (as these parameters were interchangeable with ${\beta}_{\mathrm{g}}$ and ${\theta}_{\mathrm{g}}$). This model had five free parameters.
The random dendrites model had four nonlinear dendritic subunits, each receiving inputs from a unique set of 10 neurons randomly selected from the four presynaptic assemblies, and integrating these inputs using a sigmoidal nonlinearity (Equations 3031). Each subunit had its own set of five parameters, and the outputs of the subunits were simply averaged in the soma (without loss of generality), resulting in 20 parameters in total.
The clustered dendrites model was similar to the random dendrites model, with the important difference that neurons in each cell assembly selectively targeted a single nonlinear dendritic branch. Since the presynaptic statistics was the same for all four branches, we constrained the parameters of the four dendritic subunits to be identical and this model had only five parameters (the same performance was achieved when we relaxed this constraint and optimised all 20 parameters).
To fit the models we generated 240 slong samples of presynaptic activity and optimized the parameters of the models to minimize the squared error between the signal ($v$, the true average of the stimulated presynaptic potentials, cf. Equation 5 with ${w}_{i}=\frac{1}{N}$) and their estimates ($\widehat{v}$, the outputs of the models), averaged over the duration of the sample:
After training, we tested the different models in crossvalidation, on a novel 120 slong input sequence, and quantified their performance by the fraction of variance unexplained, i.e. the temporally averaged squared error, ${\u03f5}_{\mathrm{estimation}}$, normalized by the variance of the signal (as a sensible upper limit on the error – achievable by an estimator that predicts the prior mean, ignoring incoming spikes altogether):
where $\mathrm{Var}\left[v\right]=\frac{1}{T1}{\sum}_{t}{({v}_{t}\text{E}\left[v\right])}^{2}$ and $\text{E}\left[v\right]=\frac{1}{T}{\sum}_{t}{v}_{t}$. (Normalization was unnecessary during training because the parameters of the models that we were optimizing obviously did not influence the variance of the signal.) Figure 4 shows $1{\overline{\u03f5}}_{\mathrm{estimation}}$, i.e. the fraction of variance explained as ‘performance’, and Figure 1—figure supplement 3, Figure 3—figure supplement 2, and Figure 4—figure supplement 2 show ${\overline{\u03f5}}_{\mathrm{estimation}}$ as ‘estimation error’.
Fitting and predicting experimental data
To predict dendritic integration in hippocampal and neocortical neurons we fitted the parameters describing presynaptic statistics in our model, $\text{P}\left(\mathbf{u}\right)$ and $\text{P}\left(\mathbf{s}\right\mathbf{u})$, to the statistical patterns in the activity of their respective presynaptic populations.
The basal dendrites of neocortical layer 2/3 pyramidal cells are targeted by neighbouring pyramidal neurons as well as by neurons from layer 4 (Douglas and Martin, 2004). We used in vivo intracellular paired recordings from layer 2/3 pyramidal neurons in the barrel cortex (Poulet and Petersen, 2008; Gentet et al., 2010; Crochet et al., 2011) to set the parameters of our model to reproduce the salient features of the presynaptic population dynamics during quiet wakefulness (Tables 1–2).
In the hippocampal experiments we stimulated synapses on the proximal dendrites of CA3 neurons targeted by recurrent collaterals of neighbouring pyramidal cells (Andersen et al., 2007). We fitted the presynaptic statistics to in vivo population activity patterns recorded from the hippocampus during quiet wakefulness, characterised by sharp wave (SPW) activity (Csicsvari et al., 2000). As intracellular recordings from CA3 pyramidal neurons during SPW activity in the awake animal are not available, we fitted the presynaptic statistics to awake extracellular data from CA3 (Csicsvari et al., 2000; Grosmark et al., 2012) and intracellular (Ylinen et al., 1995; English et al., 2014) data from CA1 pyramidal neurons (Tables 1 and 3).
Comparing the optimal response to dendritic integration
We used four different parameter sets (models) to describe the activity of the presynaptic population (Figure 1). The parameters of the HP and NC models were fitted to in vivo recordings from the corresponding presynaptic populations as described above. As a control, we used two simpler models with no state switching dynamics. The cor2 model had correlated membrane potential fluctuations with all crosscorrelations between presynaptic neurons being the same, $\frac{1}{N1}\le \rho \le 1$. Neurons in the last model, ind, had independent membrane potential fluctuations. The HP, NC, and ind models had no free parameters, while parameter $\rho $ of the cor2 model was left free and later tuned to fit dendritic data. Note that to fit supralinear cortical responses, $\rho $ had to be tuned to unnaturally large negative values in this model (Figure 5—figure supplement 1A) – and it still produced significantly poorer fits than the HP and NC models (Figure 5).
After setting the parameters of the presynaptic population, we computed the optimal response by numerically integrating Equations 20–22 in the software package R (R Core Team, 2012; Soetaert et al., 2010). When comparing the optimal response to experimental data, we assumed that each uncaging event corresponded to a single spike at the presynaptic axon terminal, and spines not showing measurable gluEPSP were considered to be nonstimulated. For all four presynaptic parameter sets, we varied two postsynaptic parameters to fit the responses of the optimal estimator to dendritic integration data, i.e. the somatic membrane potential traces recorded in our in vitro experiments. These two parameters were the weight $w$ of the presynaptic neurons and the time constant of the postsynaptic filtering, ${\tau}_{\text{post}}$ (Equation 5). To avoid overfitting, we assumed that all presynaptic neurons had equal weight, i.e. $\text{\u2200}i{w}_{i}=w$. A final free parameter that we had to consider was the number of synapses that were in the same functional cluster as the synapses we stimulated in our experiments – where the term ‘functional cluster’ refers to a set of synapses for which the presynaptic cells are correlated. This parameter was irrelevant for the ind model (by definition), it was fixed at 20 for the cor2 model (because its effects on the optimal response were largely indistinguishable from that of varying $\rho $, see above), and it was tuned to fit dendritic integration for the stateswitching models (NC and HP). In sum, the number of free parameters used to fit dendritic integration data was 2 for the ind model and 3 for the cor2, NC, and HP models. We confirmed that the higher number of free parameters in the latter models did not result in an unfair advantage in fitting performance by using Bayesian Information Criterion (BIC), rather than squared error (see below), as our measure of performance. BIC includes an explicit term penalizing the number of parameters, and our results were not qualitatively affected by it: fitting the model using the relevant in vivo statistics resulted in 3500$\pm $940 (NC, mean$\pm $s.d.) and 2000$\pm $1100 (HP) higher BIC scores than when using independent statistics (where each unit of BIC difference corresponds to a likelihood that is higher by a factor of $e\simeq 2.71$).
We fitted each recorded neuron independently using these parameters by minimizing the mean squared error between the predicted, $\stackrel{~}{v}$, and the recorded postsynaptic membrane potential (averaged across repetitions of the same stimulation in the same cell), ${\overline{v}}^{*}$:
To be able to compare results across different neurons and different stimulation protocols, we normalized the error by the total variance of the data:
where $\mathrm{Var}\left[{\overline{v}}^{*}\right]=\frac{1}{T1}{\sum}_{t}{({\overline{v}}_{t}^{*}\text{E}\left[{\overline{v}}^{*}\right])}^{2}$ and $\text{E}\left[{\overline{v}}^{*}\right]=\frac{1}{T}{\sum}_{t}{\overline{v}}_{t}^{*}$. A natural lower bound of our fitting error was the intrinsic variability of the data, so we computed the mean of the variance of the experimental data across repetitions, normalized by the total variance of the data:
where ${v}_{t,l}^{*}$ is the raw data before averaging across repetitions, $L$ is the number of repetitions using the same stimulation protocol in the same cell and ${\overline{v}}_{t}^{*}=\frac{1}{L}{\sum}_{L}{v}_{t,l}^{*}$. Figures 5D,H and Figure 5—figure supplement 1G show ${\overline{\u03f5}}_{\mathrm{fitting}}$ as the ‘fitting error’ and ${\overline{\u03f5}}_{\mathrm{min}}$ as ‘var’. The best fitting parameter values for the postsynaptic time constant, ${\tau}_{\text{post}}$, and total number of neurons in a functional cluster, $N$, are shown in Figure 5—figure supplement 1B,C.
Experimental methods
Neocortex
Slice preparation and electrophysiology
Acute sagittal brain slices (300 μm) incorporating both visual and somatosensory cortex were prepared from 3–6 weekold SpragueDawley rats as previously described (Sjöström and Häusser, 2006) and in accordance with institutional and national guidelines. Experiments were carried out at 32°C–35°C in artificial cerebrospinal fluid (ACSF) containing (in mM): NaCl 125, KCl 2.5, glucose 25, NaH${}_{2}$PO${}_{4}$ 1.25, NaHCO${}_{3}$ 25, MgCl${}_{2}$ 1, CaCl${}_{2}$ 2 (pH 7.3 when bubbled with 95% O${}_{2}$ and 5% CO${}_{2}$). Somatic wholecell recordings were obtained with a Multiclamp 700B amplifier (Molecular Devices, Sunnyvale, CA), and data was acquired at 50 kHz using customwritten software in Matlab 7.2 (Mathworks, Natick, MA) interfacing with an ITC18 A/D board (Instrutech, Holliston, MA). Patch pipettes had a resistance of 3–6 M$\mathrm{\Omega}$ when filled with a solution containing (in mM): KMeSO${}_{4}$ 130, HEPES 10, KCl 7, MgATP 2, Na2ATP 2, Na${}_{2}$GTP 0.3, EGTA 0.05 (pH 7.2) and R${}_{\text{series}}$ was $<$30 M$\mathrm{\Omega}$. For visualization of cell morphology Alexa Fluor 594 (100 μM; Invitrogen, Carlsbad, CA) was added to the internal solution.
TwoPhoton imaging and uncaging
Simultaneous 2photon imaging and uncaging was performed using a dual galvanometerbased scanning system (Prairie Technologies, Middleton, WI) using two Ti:sapphire pulsed lasers (MaiTai, SpectraPhysics, Santa Clara, CA), one tuned to 840 nm for imaging cell morphology, and another tuned to 720 nm for photolysis of MNIcagedLglutamate. Neurons were visualized using an Olympus BX51WI objective (60x, 0.9 NA; Olympus, Melville, NY). Twophoton glutamate uncaging was carried out based on previously published methods (Gasparini and Magee, 2006; Losonczy and Magee, 2006; Branco and Häusser, 2011). MNIcagedLglutamate (12 mM, Tocris Cookson, UK) was dissolved in (in mM): NaCl 125, KCl 2.5, HEPES 10, CaCl${}_{2}$ 2, MgCl${}_{2}$ 1, glucose 25, and puffed locally. To block NMDA receptors (Figure 5E,J, blue squares), 500 μM DAP5 was included in the glutamate puffing pipette.
Stimulation and data analysis
A short burst of presynaptic uncaging events equally spaced in time with inter stimulus interval (ISI) between 1 and 20 ms was applied on 1–7 visually identified dendritic spines. For each dendritic branch either the number of stimuli or the ISI was varied. Uncaging exposure time was 100–500 μs and the intertrial interval was 10 s. All data was acquired using custom written software in Matlab 7.2 (Mathworks). The original data recorded at 50 kHz were averaged across identical trials, filtered with a Gaussian kernel with $\sigma =0.2$ ms and subsampled at 2 kHz for analyses. All spines were responsive in the neocortical experiments.
Hippocampus
Slice preparation and electrophysiology
Adult male SpragueDawley rats (8–12 weekold) were used to prepare transverse slices (400 $\mu $m) from the hippocampus similarly to that described previously (Losonczy and Magee, 2006), according to methods approved by the Janelia Farm Institutional Animal Care and Use Committee and the Animal Care and Use Committee (ACUC) of the Institute of Experimental Medicine, Hungarian Academy of Sciences, and in accordance with 86/609/EEC/2 and DIRECTIVE 2010/63/EU Directives of the EU. Experiments were carried out at 33°C–35°C in artificial cerebrospinal fluid (ACSF) containing (in mM): NaCl 125, KCl 3, glucose 25, NaH${}_{2}$PO${}_{4}$ 1.25, NaHCO${}_{3}$ 25, MgCl${}_{2}$ 1, CaCl${}_{2}$ 1.3, Napyruvate 3, and ascorbic acid 1 saturated with 95% O${}_{2}$ and 5% CO${}_{2}$. Somatic wholecell recordings were obtained with BVC700 amplifier (Dagan, Minneapolis, MN) in the active ‘bridge’ mode, filtered at 3 kHz and data was acquired at 50 kHz. Patch pipettes had a resistance of 2–6 M$\mathrm{\Omega}$ when filled with a solution containing (in mM): Kgluconate 120, KCl 20, HEPES 10, NaCl 4, Mg${}_{2}$ATP 4, Tris${}_{2}$GTP 0.3, phosphocreatine 14, complemented with 100 $\mu $M Alexa Fluor 488 (InvitrogenMolecular Probes, Eugene, OR) and $\sim $0.1–0.3% biocytin (Sigma), pH=7.25 and R${}_{\text{series}}$ was $<$30 M$\mathrm{\Omega}$. Alexa Fluor 488 fluorescence or biocytin labeling with immunoperoxidase reaction was used for post hoc verification of the localization of neurons along the proximodistal axis of CA3. All CA3 neurons included in this study had resting membrane potentials between −62 and −72 mV. Cells were hyperpolarized when necessary to avoid action potential firing during synaptic stimulation.
TwoPhoton imaging and uncaging
A dual galvanometer based two photon scanning system (Prairie Technologies, Middleton, WI) equipped with an Olympus BX61 microscope (60X, 0.9 NA objective) was used to image Alexa 488loaded neurons and to uncage glutamate at individual dendritic spines as described (Losonczy and Magee, 2006; Makara and Magee, 2013). Two ultrafast pulsed laser beams (Chameleon Ultra II; Coherent, Auburn, CA, USA) were used, one at 920 nm for imaging Alexa 488 and the other at 720 nm to photolyze MNIcagedLglutamate (Tocris Cookson, Ballwin, MO, USA; 10 mM applied through a pipette above the slice). Laser beam intensity was independently controlled with electrooptical modulators (Model 350–50, Conoptics, Danbury, CT, USA). Unitary gluEPSP amplitude and rise time was close to that of mEPSPs as measured by sucrose application at dendritic segments 70–168 $\mu $m from the soma as described before (Magee and Cook, 2000; Makara and Magee, 2013). To standardize these experiments, results were included in the analysis only if 1) at least $65\%$ of the selected spines were responsive (see below), 2) the average amplitude of the successful unitary gluEPSPs was 0.2–0.6 mV and maximum unitary gluEPSP amplitude was $\le 1.2$ mV, 3) at least 5 mV expected amplitude was achieved, and 4) unitary responses were stabile with repeated stimulation. To block NMDA receptors 50 $\mu $M AP5 was added to both bath solution and the puffing pipette solution.
Stimulation and data analysis
Dendritic branches on basal dendritic segments 100–160 $\mu $m from the soma were stimulated by synchronous uncaging of MNIglutamate at a spatially clustered set (1–32) of visually identified spines using 0.2 ms uncaging duration with different intervals (in the range of 0.1–5 ms) between synapses. For each dendritic branch either the number of stimuli (control) or the ISI (NMDA block) was varied. To detect nonresponsive spines, we fitted the individual responses with a doubleexponential function:
Spines were classified as nonresponsive if the rise time of the gluEPSP (${\tau}_{2}$) was slower than 15 ms or its start time (${t}_{0}$) was more than 5 ms (relative to the stimulation time). The original data recorded at 50 kHz were averaged across identical trials, filtered with a Gaussian kernel with $\sigma =0.2$ ms and subsampled at 2 kHz for analyses.
Appendix
A Derivation of the optimal response
Here we derive the filtering equations for the switching multivariate OU (mOU) process introduced in the main text, such that we generalise the twostate model presented there to an arbitrary number of hidden states.
A.1 The generative model of presynaptic activities
The generative model of presynaptic activities has two hidden and one visible layers (Figure 2—figure supplement 1). The state of the system is described by a global discrete state variable, $z$, and its dynamics is characterised by the state transition matrix, $\mathbf{\Omega}$:
where ${\mathrm{\Omega}}_{\mathrm{z}\mathrm{z}\text{'}}$ is the rate of transitioning from state $z\text{'}$ to $z$, and
and ${\mathrm{\delta}}_{\mathrm{z}{\mathrm{z}}^{\text{'}}}$ is the Kronecker delta function. This description becomes identical to the state switching dynamics presented in the main text in the case of a binary state variable, $z\in \{,+\}$.
The dynamics of the membrane potentials $\mathbf{u}$ are modeled as a mOU process where the resting potential, ${\overline{\mathbf{u}}}^{\left({z}_{t}\right)}$, the time constant of the neurons, ${\tau}^{\left({z}_{t}\right)}$, and the covariance of the process noise, ${\mathbf{Q}}^{\left({z}_{t}\right)}$, may depend on the current state, ${z}_{t}$:
For simplicity, in the main text we assumed that only the resting potential depends on the state variable.
The observations are spike counts, $\mathbf{s}$, which are given by an inhomogeneous Poisson process for each neuron independently from the other neurons given its own membrane potential:
where $g$ is the baseline firing rate and $\beta $ is a parameter controlling the nonlinearity of the spiking.
A.2 The inference problem
Our goal is to infer the posterior distribution of the current membrane potentials based on the spiking pattern observed up to time $t$:
Inference of the value of the hidden variables requires information from two different sources: the likelihood of emitting a particular spiking pattern (observation) and the dynamics of the hidden variables combined with the previous estimate (innovation).
where the likelihood $\text{P}\left({\mathbf{s}}_{t}\right{\mathbf{u}}_{t})$ is defined by Equation A3, the dynamics of the global state variable $\mathrm{P}\left({z}_{t}\right{z}_{t\delta t})$ is defined by Equation A1 and the statedependent membrane potential dynamics $\mathrm{P}({\mathbf{u}}_{t}=\mathbf{u}{\mathbf{u}}_{t\delta t}={\mathbf{u}}^{\text{'}},{z}_{t}=z)$ is given by Equation A2. Equation A6 is the filtering equation, analogous to Equation 15 in the main text. Note that both the observation ${f}_{\mathbf{s}}(\delta t,\mathbf{u})$ and the innovation $h(\delta t,\mathbf{u},z)$ terms depend on $\delta t$, the time step used in the inference.
In general, computing Equation A6 is intractable so we take an assumed density filtering approach in which we approximate the posterior in each time step with a particular parametric form and derive how the parameters of this approximation change due to Equation A6. Specifically, we represent the posterior using the following factorisation (without approximation):
The first term of Equation A7 is the posterior over the current value of the discrete hidden state variable, ${z}_{t}$, which is a discrete distribution that can be parametrised by a single (vector) parameter, ${\zeta}_{t}$ (still without approximation):
The second term of Equation A7 is the distribution of the membrane potentials conditioned on the current state, which we approximate by a multivariate Gaussian with parameters ${\mathbf{\mu}}_{t}^{\left(z\right)}$ and ${\mathbf{\Sigma}}_{t}^{\left(z\right)}$ respectively describing the conditional mean and covariance of the presynaptic membrane potentials:
In sum, this approximate posterior has three sets of parameters that fully characterise it: ${\zeta}_{t}$, ${\mathbf{\mu}}_{t}^{\left(z\right)}$, and ${\mathbf{\Sigma}}_{t}^{\left(z\right)}$. In Sections A.3A.6 we derive the update dynamics for these parameters when observing arbitrary spiking patterns.
For notational convenience, we also introduce a further quantity, which can be derived from these parameters: the expected firing rate of the cells in each hidden state, either written as a vector ${\mathbf{\gamma}}_{t}^{\left(z\right)}$ or as a diagonal matrix ${\mathbf{\Gamma}}_{t}^{\left(z\right)}$, whose elements are
A.3 Parameter updates
The advantage of describing the elements of the posterior in a parametric form is that we only need to update the parameters of Equations A8A9. As the parameter updates are derived using Equation A6, in which the innovation and the observation terms depended on $\delta t$, the parameter updates will also depend on $\delta t$. Eventually, to derive continuoustime dynamics, we will take the limit of $\delta t\to 0$. Incidentally, the Gaussian approximation in Equation A9 is also more accurate with small $\delta t$.
We rewrite the parameters of the approximate posterior by using auxiliary variables (this also makes it explicit how the update rule depends on $\delta t$):
where the auxiliaries are defined as:
where ${f}_{\mathbf{s}}(\delta t,\mathbf{u})$ and $h(\delta t,\mathbf{u},z)$ were defined in Equation A6.
A.4 Firstorder approximations to parameter updates
Our goal is to derive differential equations for the parameter updates. Therefore we will assume that $\delta t$ is sufficiently small (i.e. $\delta t\to 0$) to use a first order approximation of Equations A11A13 around $\delta t=0$:
where
In particular, in order to compute Equations A19A21 we will need the value of $h(\delta t,\mathbf{u},z)$ and ${h}^{\prime}(\delta t,\mathbf{u},z)$ with $\delta t=0$ (see Section A.8.1):
Having derived the general form of the parameter updates we need to apply it to the possible observations, i.e. derive how observing a particular presynaptic spiking pattern at time $t$ changes the posterior distribution over the state variable, $\text{P}({z}_{t}=z{\mathbf{s}}_{0:t})$ and the presynaptic membrane potentials, $\text{P}({\mathbf{u}}_{t}=\mathbf{u}{z}_{t}=z,{\mathbf{s}}_{0:t})$. In the limit of $\delta t\to 0$ we consider only two alternative outcomes of the observation process: the case of observing zero or one spike in a single time bin.
A.5 Observation: No spikes
We first calculate the likelihood (sec. A.5.1) and the auxiliaries (sec. A.5.2) with ${\mathbf{s}}_{t}=0$ and then derive the corresponding parameter updates in sec. A.5.3.
A.5.1 Likelihood
A.5.2 Auxiliaries
Here we describe the values of the auxiliaries required to compute Equations A19A21 with the likelihood function given above:
The following quantities have been derived in Ujfalussy et al., 2011:
where ${\overline{\gamma}}_{t}^{\left(z\right)}={\sum}_{i}{\gamma}_{t,i}^{\left(z\right)}$ is the expected population firing rate in state $z$.
The derivation of the following auxiliaries is given in Section A.8.1:
A.5.3 Parameter updates
Now we are ready to substitute Equation A34A43 into Equation A19A21 to derive the parameter updates.
State probabilities
However, since ${\overline{\mathrm{\Omega}}}_{\mathrm{z}\text{'}}={\sum}_{\mathrm{z}\text{'}\text{'}}{\mathrm{\Omega}}_{\mathrm{z}\text{'}\text{'}\mathrm{z}\text{'}}$ the last terms in Equation A44 cancel each other and we can write that
where $\u27e8\cdot \u27e9$ denotes averaging over $z\text{'}$, and so $\u27e8{\overline{\gamma}}_{t}\u27e9={\sum}_{z\text{'}}{\zeta}_{t\delta t}^{(z\text{'})}{\overline{\gamma}}_{t}^{(z\text{'})}$ is the average expected population firing rate. In Equation A45 two different processes are changing the posterior state probabilities. First, changes in ${\zeta}^{\left(z\right)}$ are proportional to the difference between the total firing rate in state $z$, ${\overline{\gamma}}^{\left(z\right)}$, and the average firing rate, $\u27e8\overline{\gamma}\u27e9$. This process causes a decay in the posterior probability of those states in which the expected firing rate is high and thus incompatible with the current observation (no spikes). Second, posterior state probabilities are changing according to the prior state transition dynamics captured by $\mathrm{\Omega}$.
Posterior mean
The evolution of the conditional mean is governed by three factors. The first term in the bracket describes the decay of the posterior mean towards the prior mean with time constant ${\tau}^{\left(z\right)}$. The second term captures the effect of the observation (absence of spikes) where the decrease in the posterior mean is proportional to the posterior covariance ${\mathbf{\Sigma}}^{\left(z\right)}$ (because when more uncertain a priori, the observation should have a larger impact) and the expected firing rate ${\mathbf{\gamma}}^{\left(z\right)}$ (because a higher expected firing rate is more incompatible with the absence of spikes and thus requires a larger correction). This term is omitted in the absence of observations, i.e. during the absolute refractory period. Finally, the third term expresses the effect of a possible transition from each of the other states $z\text{'}$ to state $z$ which is proportional to the product of the prior transition rate ${\mathrm{\Omega}}_{zz\text{'}}$ and the relative probabilities of the two states (i.e. the relative proportion of the probability mass coming from state ${z}^{\prime}$ against that already existing in state $z$), and the difference between the respective posterior means (as the more different these are, the larger the bias contributed by the probability mass coming from state ${z}^{\prime}$ is).
Posterior covariance
And finally, in order to compute the covariance, we will need:
Now we substitute Equation A47A48 into Equation A14 to obtain the linear form for the change of the covariance matrix:
The evolution of the posterior covariance is governed by the same three factors as the evolution of the conditional mean. The first is the decay of the posterior covariance towards the prior covariance $\frac{{\tau}^{\left(z\right)}}{2}{\mathbf{Q}}^{\left(z\right)}$ with time constant $\frac{{\tau}^{\left(z\right)}}{2}$. (Note that this decay is twice faster than that of the mean.) The second term captures the effect of the observation (absence of spikes) such that the decrease in the posterior covariance is proportional to the square of the covariance ${\mathbf{\Sigma}}^{\left(z\right)}$ and the expected firing rate ${\mathbf{\Gamma}}^{\left(z\right)}$ (because this observation is more informative, i.e. it causes a larger surprise, when the expected firing rate is higher). Finally, the last term expresses the effect of possible hidden transitions between state $z$ and state $z\text{'}$ on the posterior covariance (which is again proportional to the relative proportion of the probability masses and the differences between the two states).
A.6 Observation: cell $i$ emits a spike
Now we assume that cell $i$ emits a spike while all other neurons remain silent in the current time step. Note that as $\delta t\to 0$ the probability of having multiple presynaptic neurons spiking in the same time bin, or the same cell firing twice, converges to zero.
A.6.1 Likelihood
In this case the current spiking pattern is denoted by ${\mathbf{s}}_{t}={\widehat{\mathbf{s}}}^{\left(i\right)}$, that is ${\hat{\text{s}}}_{j}^{\left(i\right)}\stackrel{\mathrm{\u25b3}}{=}{\delta}_{ij}$ and the likelihood is given by
The derivative of the likelihood is
In the following, we follow the same steps as in the previous section, but replacing ${f}_{0}(0,\mathbf{u})$ and ${f}_{0}^{\prime}(0,\mathbf{u})$ with ${f}_{{\widehat{\mathbf{s}}}^{\left(i\right)}}(0,\mathbf{u})$ and ${f}_{{\widehat{\mathbf{s}}}^{\left(i\right)}}^{\prime}(0,\mathbf{u})$, respectively.
A.6.2 Auxiliaries
Again, the goal is to derive differential equations for the parameter updates. As the number of time bins with spikes does not grow with $\delta t\to 0$ we only need to compute terms up to zeroth order (cf. Equations A19A21). However, as the likelihood (Equation A51) is 0 when $\delta t=0$, the values of the auxiliaries in Equations A54A56 are also 0 (Equations A15A18), and so we shall use l’Hôpital’s rule to deal with the $\delta t\to 0$ limit:
Substituting Equation A53 into Equations A22A25 gives the following values for the corresponding derivatives (for details see Ujfalussy et al., 2011):
A.6.3 Parameter updates
Finally, by substituting Equations A57A60 back into Equations A54A56, we can derive the parameter updates in the case of observing a spike.
State probabilities
where $\u27e8{\gamma}_{t,i}\u27e9={\sum}_{z\text{'}}{\zeta}_{t\delta t}^{(z\text{'})}{\gamma}_{t,i}^{(z\text{'})}$ is the average (over states) expected firing rate of cell $i$. Equation A61 shows that the posterior state probabilities change instantaneously after observing a spike, and the change is proportional to the normalised difference between the expected firing rate of cell $i$ in state $z$ and its stateaveraged expected firing rate. Therefore if state $z$ is more compatible with the observed spiking pattern than state $z\text{'}$, i.e. ${\gamma}_{i}^{\left(z\right)}>{\gamma}_{i}^{(z\text{'})}$, then the posterior probability of being in state $z$, ${\zeta}^{\left(z\right)}$, will increase while ${\zeta}^{(z\text{'})}$ will decrease.
Equation A61 can be rewritten into vector form as follows:
which in turn can be rewritten in differential equation form by taking the $\delta t\to 0$ limit:
where $\mathbf{s}\left(t\right)=\underset{\delta t\mathit{\to}\mathit{0}}{\mathrm{lim}}{\mathbf{s}}_{\mathrm{t}}/\delta t$ is the sum of Diracdelta functions representing the presynaptic spike trains.
Posterior mean
Thus, the posterior conditional means of the presynaptic membrane potentials also change instantaneously after observing a spike in the presynaptic population. The change is proportional to the posterior variance in the case of the cell that emitted the spike while for other presynaptic neurons the change is proportional to the posterior covariance between the given cell and the neuron that emitted the spike. Note that, similar to the change in the posterior state probabilities, the change of the mean is instantaneous and it does not depend on the time step.
The differential equation form of Equation A63 reads as
where $\mathbf{s}\left(t\right)$ is, again, the sum of Diracdelta functions representing the presynaptic spike trains.
Posterior covariance
Thus, Equation A66 indicates that the posterior covariance does not change directly after observing a spike – only indirectly, through the increase in the posterior mean (Equation A63) and thus $\mathbf{\Gamma}$ (Equation A10), which in turn decreases the covariance in the silent period following the spike (Equation A49).
A.7 Differential equations
Here we summarize the results of Equations A45,A46,A49, and Equation A61A66 as differential equations in continuous time:
Once again note that in the continuous time limit the presynaptic spike trains are represented by the sum of Diracdelta functions and we denote it with $\mathbf{s}\left(t\right)$ instead of ${\mathbf{s}}_{t}$.
In the case of two states ($+$ and $$), we can write Equation A67 as
where we emphasise that the observation of either a spike or the absence of a spike causes changes in the state probabilities that are proportional to their uncertainty, ${\zeta}^{+}\left(1{\zeta}^{+}\right)$.
A.8 Derivations of auxiliaries
A.8.1 Derivation of Equations A26A27
After substituting Equations A1A2 and Equation A9 into Equation A6 we can derive the $\delta t$ dependence of the innovation term:
Substituting $\delta t=0$ into Equation A71 yields Equation A26.
Next, we take the derivative of Equation A71 with respect to $\delta t$:
where the derivative of the last term is provided below in general form.
Substituting $\delta t=0$ into Equation A72 after using Equation A74 to express its last term yields Equation A27.
A.8.2 Derivation for Equations A41A43
The above result can be used to compute terms appearing in the integrals of Equations A22A25 which are involved in computing Equations A41A43
B The optimality of the sigmoidal nonlinearity
Neurons need to combine incoming spikes nonlinearly to perform efficient analog computations based on digital spikes. In the main text, and the previous sections of this Appendix, we derived the optimal way of combining incoming spikes (Equations 2022), but the exact implementation of this optimal nonlinear mapping within the dendritic tree is unrealistic. A plausible alternative would be if each dendrite branch acted as a simple linearnonlinear unit, computing a nonlinear function of the linear combination of its linearly filtered input spike trains (Poirazi et al., 2003), but it is unclear whether this could efficiently approximate the optimal response, and if so then what is the specific functional form of the nonlinearity that is able to make this approximation tight. In the main paper, we demonstrated that a sigmoidal nonlinearity approximates the optimal response remarkably well if the dynamics of the presynaptic population is dominated by simultaneous switching between a quiescent and an active state. In this section we provide additional details for this derivation.
B.1 Simplified presynaptic dynamics
In the following derivations, for convenience, we assume that the weights of the linear computation are roughly uniform and scale inversely with $N$, ${w}_{i}\stackrel{~}{\propto}1/N$, although our derivations also remain valid with other scalings of ${w}_{i}$ as long as the number of weights significantly greater than $0$ scales with $N$. We start by noting that the optimal response can always be trivially rewritten as requiring the inference of the state, $z\left(t\right)$, and conditioned on the state, the average presynaptic membrane potential directly, $m\left(t\right)={\sum}_{i}{w}_{i}{u}_{i}\left(t\right)=1/N{\sum}_{i}{u}_{i}\left(t\right)$ (instead of the individual membrane potentials, as in Equations 3 and 6):
where, for simplicity, we took ${\tau}_{\text{post}}=0$, although it is not essential for our argument – which instead applies to the total input when ${\tau}_{\text{post}}\ne 0$. Note that we do not generally use this seemingly simpler form because the likelihood describing the spike generation process (Equation 8) becomes more complex, and notably nonfactorised, when conditioned on $m\left(t\right)$ instead of $\mathbf{u}\left(t\right)$. Nevertheless, it is instructive to write down the prior over $m$ (cf. Equation 7):
where ${\overline{m}}^{\left(z\right)}=\frac{1}{N}{\sum}_{i}^{N}{\overline{u}}_{i}^{\left(z\right)}$ is the mean and ${\overline{q}}^{2}=\frac{1}{{N}^{2}}{\sum}_{ij}{Q}_{ij}$ is the process noise variance defined in terms of the parameters of the original mOU process. In the following sections, we derive simplified forms of inference, that can be readily related to dendritic processing, by taking different limits to the original model, and in particular Equation A86.
B.2 Sigmoidal nonlinearity with state switching dynamics
We start by assuming that the presynaptic neurons are equally variable and conditionally independent given the population state (${Q}_{i,j}={q}^{2}{\delta}_{ij}$). It is easy to see that in this case ${\overline{q}}^{2}={q}^{2}/N$ and thus the prior variance of $m$ (which is ${\overline{q}}^{2}\tau /2$, as can be seen from Equation A86) also scales as $1/N$. Thus, for a large presynaptic population, $N\to \mathrm{\infty}$, the prior variance of $m$ diminishes which means that the posterior inference is also greatly simplified because there is no uncertainty any more about $m$ given the population state, i.e. $\text{P}\left(m\left(t\right)\rightz\left(t\right)=z,\mathbf{s}(0:t))=\delta (m\left(t\right){\overline{m}}^{\left(z\right)})$, and so the optimal response (Equations A85) for two population states simplifies to
Although it seems that the only quantity that needs updating in Equation A87 is $\zeta \left(t\right)$, in general this still requires keeping track of the membrane potentials of individual presynaptic neurons because the dynamics of $\zeta \left(t\right)$ (Equation 20) depends on the individual expected firing rates (conditioned on each population state). Therefore, to allow further simplification, we assume that all expected firing rates within a state are identical and constant, equal to their prior value, i.e. ${\gamma}_{i}^{+}\approx {\gamma}^{+}=g{e}^{\beta {\overline{u}}^{+}+\frac{1}{2}{\beta}^{2}{q}^{2}\tau /2}$, and so the population rate is simply ${\overline{\gamma}}^{+}\approx N{\gamma}^{+}$ (with corresponding definitions for ${\gamma}^{}$ and ${\overline{\gamma}}^{}$). With these assumptions the dynamics of the posterior estimate of population state reduces to
As we do not track the membrane potential fluctuations of the individual neurons in this reduced model, we also replaced the $\mathbf{s}\left(t\right)$ vector by the total spike train of the population, $s\left(t\right)={\sum}_{i}{s}_{i}\left(t\right)$.
Amongst the parameters of Equation A88 only $C$ scales with $N$, therefore the last term dominates Equation A88 during silent periods. Conversely, the second term (involving $B$) will dominate during spikes. Thus, the first term (involving $A$) can be ignored. Finally, we note that
and therefore $B\left(\zeta \right)$ is nearly constant if the difference between ${\gamma}^{+}$ and ${\gamma}^{}$ is small (i.e. $\beta \ll \frac{\text{In}2}{{\overline{u}}^{+}{\overline{u}}^{}}$), in which case
These considerations allow us to write the approximate form of Equation A88 as (cf. Equation 27):
Solving Equation A91 for $\zeta \left(t\right)$ is possible by noting that it is separable, and so the solution can be written as
where solving the integral on the left yields
In general, the value of $\chi $ would need to be determined by solving the initial value problem, but for simplicity and without loss of generality here we take $\chi =0$ and assume that $\nu \left(0\right)$ has the appropriate value to map to the required $\zeta \left(0\right)$.
From Equation A93, $\zeta $ can be expressed as
which in turn can be rewritten as
where we introduced the new variable $\nu =\int [Bs\left(t\right)C]\text{d}t$ whose definition can be equally written as a differential equation:
The system defined by Equations A96A97 has a form that is closely analogous to that of the standard linearnonlinear model of dendritic processing with a logistic sigmoid nonlinearity, and is presented in the main text (Methods) as Equations 2829.
B.3 Sublinear integration with second order correlations
We also considered another limiting case when the dynamics of the presynaptic population was fully characterized by purely secondorder correlations between the neurons. This means that there is no state switching dynamics, so inference is simplified by taking $\zeta =1$, but we also cannot assume (as we did in the previous section) that the posterior correlations and thus the posterior variance of $m$ vanishes in the large $N$ limit. Instead, we seek a formalism in which the population activity of $N$ cells, represented by the variables $\{\mathbf{u},\mathbf{s}\}$, is captured by the lower dimensional dynamics of $\{m,s\}$ (where $s\left(t\right)={\sum}_{i}{s}_{i}\left(t\right)$ is still the ‘total’ input spike train, as before). Under this simplified model, the prior dynamics of $m$ is as described by Equation A86, and the likelihood is analogous to the one we had for individual neurons (Equation 8):
As the likelihoods of the two models can not be exactly matched, we chose the parameters $\overline{\beta}$ and $\overline{g}$ to match the mean and the variance of the population firing rate. Thus, there is now a single variable that characterizes the whole population, $m$, whose dynamics (both the prior and likelihood) are analogous to those we had for a single membrane potential in the previous model. Consequently, inference in this collapsed model requires the manipulation of only two scalar variables – instead of the original $\mathcal{\mathcal{O}}\left({N}^{2}\right)$ –, the posterior mean and variance of $m$, $\overline{\mu}$ and ${\overline{\sigma}}^{2}$ respectively, whose dynamics can be derived analogously to those obtained before for the full population (Equations 2122):
where, in line with previous notation, $\overline{\gamma}=\overline{g}{e}^{\overline{\beta}\overline{\mu}+\frac{1}{2}{\overline{\beta}}^{2}{\overline{\sigma}}^{2}}$ is the posterior mean estimate of the average firing rate of the presynaptic population. (These equations only apply to the special case when ${w}_{i}=1/N$ and ${Q}_{ij}={\delta}_{ij}q+\left(1{\delta}_{ij}\right)\rho q$, but an extension to the more general case is relatively straightforward.) We used this reduced model to infer the mean membrane potential of the presynaptic population – which is relevant in the special case when ${w}_{i}=1/N$ and ${\tau}_{\mathrm{post}}=0$ (see above) so that $\stackrel{~}{v}=\overline{\mu}$. Figure 3—figure supplement 1A–C demonstrates that our reduced model approximates the full model remarkably well provided that there are substantial correlations in the presynaptic population.
Next, we noted that during inference the dynamics of the posterior variance is faster than the dynamics of the posterior mean (Equations A99A100) and therefore the posterior variance can be approximated by its (meandependent) steady state value, ${\overline{\sigma}}_{\mathrm{\infty}}^{2}\left(\overline{\mu}\right)$. If we replace the posterior variance in Equation A100 with its steady state value we obtain the following approximate onedimensional dynamics:
As the optimal response to a single spike is proportional to the posterior variance, the functional form of the steady state is highly informative about the nonlinearity of the integration. Figure 3—figure supplement 1D shows that ${\overline{\sigma}}_{\mathrm{\infty}}^{2}\left(\overline{\mu}\right)$ is a decreasing sigmoidal function, thus responses to consecutive spikes will add sublinearly in this case, as each spike increases $\overline{\mu}$, which in turn reduces ${\overline{\sigma}}_{\mathrm{\infty}}^{2}$, which results in a smaller response for the next spike.
Expressing Equation A101 directly in the form of linearnonlinear dynamics, as in the case of switching dynamics, did not seem feasible, so we focussed instead on establishing a correspondence only at the times of presynaptic spikes, when $s\left(t\right)>0$. This required that there exists an instantaneous mapping $h\left({v}_{\mathrm{lin}}\right)$ such that
where ${v}_{\mathrm{lin}}$ is the linearly filtered and integrated inputs, whose dynamics $\dot{{v}_{lin}}$ is given by Equation 24. Differentiating both sides of Equation A102 wrt. time, and substituting Equation A101 and Equation 24, respectively, to each side yields
after only keeping the dominating terms including $s\left(t\right)$ on both sides and subsequently simplifying by $s\left(t\right)$.
Although there was no closedform analytical solution for $h\left({v}_{\mathrm{lin}}\right)$ from Equation A103, it could be obtained by numerical integration (Figure 3—figure supplement 1E, blue line). It is clear that the optimal mapping $h$ is indeed sublinear in this case and it can be reasonably well approximated using the concave half of the logistic sigmoid function derived above, for the switching dynamics, even during silent periods (Figure 3—figure supplement 1E, red line versus black dots; Figure 3). There is a simple intuition for the sublinearity implied by $h$ (following on from the intuition we gave for the sublinearity of the response after Equation A101): according to Equation A103, its derivative is proportional to the steady state posterior variance, which in turn is a nonnegative monotonically decreasing function (Figure 3—figure supplement 1D), and hence the final nonlinearity must be monotonically increasing and, importantly, concave – i.e. sublinear.
References
 1
 2

3
Sensory input drives multiple intracellular information streams in somatosensory cortexJournal of Neuroscience 30:10872–10884.https://doi.org/10.1523/JNEUROSCI.617409.2010

4
Ephaptic coupling of cortical neuronsNature Neuroscience 14:217–223.https://doi.org/10.1038/nn.2727
 5

6
A model for intradendritic computation of binocular disparityNature Neuroscience 3:54–63.https://doi.org/10.1038/71125

7
Neural correlations, population coding and computationNature Reviews. Neuroscience 7:358–366.https://doi.org/10.1038/nrn1888
 8
 9

10
Adaptation without parameter change: dynamic gain control in motion detectionProceedings of the National Academy of Sciences of the United States of America 102:6172–6176.https://doi.org/10.1073/pnas.0500491102
 11
 12

13
The probability of neurotransmitter release: variability and feedback control at single synapsesNature Reviews Neuroscience 10:373–383.https://doi.org/10.1038/nrn2634

14
Membrane potential and firing rate in cat primary visual cortexJournal of Neuroscience 20:470–484.
 15
 16

17
Neural coding: hybrid analog and digital signalling in axonsCurrent Biology 16:R585–R588.https://doi.org/10.1016/j.cub.2006.07.007

18
Measuring and interpreting neuronal correlationsNature Neuroscience 14:811–819.https://doi.org/10.1038/nn.2842
 19

20
Fast network oscillations in the hippocampal CA1 region of the behaving ratJournal of Neuroscience 19:RC20.
 21
 22
 23

24
Microrewiring as a substrate for learningTrends in Neurosciences 31:577–584.https://doi.org/10.1016/j.tins.2008.08.006

25
Estimating membrane voltage correlations from extracellular spike trainsJournal of Neurophysiology 89:2271–2278.https://doi.org/10.1152/jn.000889.2002

26
Sequential Monte Carlo Methods in PracticeNew York, NY: Springer New York.https://doi.org/10.1007/9781475734379

27
Neuronal circuits of the neocortexAnnual Review of Neuroscience 27:419–451.https://doi.org/10.1146/annurev.neuro.27.070203.144152
 28

29
Tonic inhibition enhances fidelity of sensory information transmission in the cerebellar cortexJournal of Neuroscience 32:11132–11143.https://doi.org/10.1523/JNEUROSCI.046012.2012

30
Patterned activity in stratum lacunosum moleculare inhibits CA1 pyramidal neuron firingJournal of Neurophysiology 82:3213–.3222.
 31
 32
 33
 34
 35

36
Statedependent dendritic computation in hippocampal CA1 pyramidal neuronsJournal of Neuroscience 26:2088–2100.https://doi.org/10.1523/JNEUROSCI.442805.2006
 37

38
Spiking Neuron Models: Single Neurons, Populations, PlasticityCambridge: Cambridge University Press.https://doi.org/10.1017/CBO9780511815706
 39
 40
 41
 42
 43
 44

45
Cortical state and attentionNature Reviews Neuroscience 12:509–523.https://doi.org/10.1038/nrn3084

46
Neuromodulation and cortical function: modeling the physiological basis of behaviorBehavioural Brain Research 67:1–27.https://doi.org/10.1016/01664328(94)00113T
 47
 48
 49

50
Neurons with graded response have collective computational properties like those of twostate neuronsProceedings of the National Academy of Sciences of the United States of America 81:3088–3092.https://doi.org/10.1073/pnas.81.10.3088
 51

52
Locationdependent effects of inhibition on local spiking in pyramidal neuron dendritesPLoS Computational Biology 8:e1002550.https://doi.org/10.1371/journal.pcbi.1002550

53
Coordinated memory replay in the visual cortex and hippocampus during sleepNature Neuroscience 10:100–107.https://doi.org/10.1038/nn1825
 54
 55

56
Roller coaster scanning reveals spontaneous triggering of dendritic spikes in CA1 interneuronsProceedings of the National Academy of Sciences of the United States of America 108:2148–2153.https://doi.org/10.1073/pnas.1009270108
 57

58
Retinal ganglion cells: a functional interpretation of dendritic morphologyPhilosophical Transactions of the Royal Society B: Biological Sciences 298:227–263.https://doi.org/10.1098/rstb.1982.0084
 59

60
Synaptic clustering by dendritic signalling mechanismsCurrent Opinion in Neurobiology 18:321–331.https://doi.org/10.1016/j.conb.2008.08.013
 61

62
Dendritic computationAnnual Review of Neuroscience 28:503–532.https://doi.org/10.1146/annurev.neuro.28.061604.135703
 63

64
Sublinear integration underlies binocular processing in primary visual cortexNature Neuroscience 16:714–723.https://doi.org/10.1038/nn.3394
 65
 66

67
Regulation of neuronal input transformations by tunable dendritic inhibitionNature Neuroscience 15:423–430.https://doi.org/10.1038/nn.3024
 68

69
Active properties of neocortical pyramidal neuron dendritesAnnual Review of Neuroscience 36:1–24.https://doi.org/10.1146/annurevneuro062111150343

70
Spatiotemporally graded NMDA Spike/Plateau potentials in basal dendrites of neocortical pyramidal neuronsJournal of Neurophysiology 99:2584–2601.https://doi.org/10.1152/jn.00011.2008
 71
 72
 73

74
Learning drives differential clustering of axodendritic contacts in the barn owl auditory systemJournal of Neuroscience 28:6960–6973.https://doi.org/10.1523/JNEUROSCI.135208.2008
 75

76
Translationinvariant orientation tuning in visual "complex" cells could derive from intradendritic computationsJournal of Neuroscience 18:4325–.4334.

77
Visual stimuli recruit intrinsically generated cortical ensemblesProceedings of the National Academy of Sciences of the United States of America 111:E4053.https://doi.org/10.1073/pnas.1406077111

78
Theta oscillations decrease spike synchrony in the hippocampus and entorhinal cortexPhilosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 369, 10.1098/rstb.2012.0530.
 79
 80
 81
 82

83
NMDA spikes enhance action potential generation during sensory inputNature Neuroscience 17:383–390.https://doi.org/10.1038/nn.3646

84
The most likely voltage path and large deviations approximations for integrateandfire neuronsJournal of Computational Neuroscience 21:71–87.https://doi.org/10.1007/s1082700672004
 85

86
Shortterm facilitation as a normative consequence of presynaptic spikerate adaptationIn Cosyne Abstracts.

87
Synapses with shortterm plasticity are optimal estimators of presynaptic membrane potentialsNature Neuroscience 13:1271–1275.https://doi.org/10.1038/nn.2640
 88
 89
 90

91
Computational subunits in thin dendrites of pyramidal cellsNature Neuroscience 7:621–627.https://doi.org/10.1038/nn1253

92
Inference and computation with population codesAnnual Review of Neuroscience 26:381–410.https://doi.org/10.1146/annurev.neuro.26.041002.131112
 93

94
R: A Language and Environment for Statistical ComputingAustria: R Foundation for Statistical Computing, Vienna.
 95
 96

97
Parallel distributed processing, volume 1: explorations in the microstructure of cognition: foundations, MIT PressParallel distributed processing, volume 1: explorations in the microstructure of cognition: foundations, MIT Press.

98
Mechanisms of sharp wave initiation and ripple generationJournal of Neuroscience 34:11385–11398.https://doi.org/10.1523/JNEUROSCI.086714.2014
 99

100
Consequences of converting graded to action potentials upon neural information coding and energy efficiencyPLoS Computational Biology 10:e1003439.https://doi.org/10.1371/journal.pcbi.1003439

101
Simple models for reading neuronal population codesProceedings of the National Academy of Sciences of the United States of America 90:10749–10753.https://doi.org/10.1073/pnas.90.22.10749

102
The variable discharge of cortical neurons: implications for connectivity, computation, and information codingJournal of Neuroscience 18:3870–.3896.

103
The Synaptic Organization of the BrainOxford University Press.https://doi.org/10.1093/acprof:oso/9780195159561.001.1
 104

105
Natural image statistics and neural representationAnnual Review of Neuroscience 24:1193–1216.https://doi.org/10.1146/annurev.neuro.24.1.1193
 106
 107

108
Dendritic excitability and synaptic plasticityPhysiological Reviews 88:769–840.https://doi.org/10.1152/physrev.00016.2007
 109

110
Solving differential equations in r : package deSolveJournal of Statistical Software 33:pp. 1–pp. 25.https://doi.org/10.18637/jss.v033.i09

111
Pyramidal neurons: dendritic structure and synaptic integrationNature Reviews Neuroscience 9:206–221.https://doi.org/10.1038/nrn2286
 112
 113

114
Parallel computational subunits in dentate granule cells generate multiple place fieldsPLoS Computational Biology 5:e1000500.https://doi.org/10.1371/journal.pcbi.1000500

115
Advances in Neural Information Processing Systems 241188–1196, Active dendrites: adaptation to spikebased communication, Advances in Neural Information Processing Systems 24.

116
Feedforward inhibition in hippocampal microcircuits: adaptation to spikebased computation.In Cosyne Abstracts.

117
Dendritic coding of multiple sensory inputs in single cortical neurons in vivoProceedings of the National Academy of Sciences of the United States of America 108:15420–15425.https://doi.org/10.1073/pnas.1112355108
 118
 119
 120
 121
 122

123
Functional organization of the extrinsic and intrinsic circuitry of the parahippocampal regionProgress in Neurobiology 33:161–253.https://doi.org/10.1016/03010082(89)900099
 124

125
Sharp waveassociated highfrequency oscillation (200 hz) in the intact hippocampus: network and intracellular mechanismsJournal of Neuroscience 15:30–46.
 126
Decision letter

Frances K SkinnerReviewing Editor; University Health Network, Canada
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]
Thank you for choosing to send your work entitled "Dendritic nonlinearities are tuned for efficient spikebased computations in cortical circuits" for consideration at eLife. Your full submission has been evaluated by Eve Marder (Senior editor) and three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the decision was reached after very extensive discussions between the reviewers. Based on our discussions and the individual reviews below, we have decided not to consider the present manuscript further at this time. However, we would be willing to consider a significantly revised resubmission (below).
While the reviewers all felt that the work was quite interesting, concern was raised in the ensuing discussions as to the unclear presentation of the essence of the work. In fact, the reviewers spent a considerable time discussing what they thought was the structure of the model, and what it means. The fact that the reviewers went back and forth over these issues indicates a lack of clarity in the Results section. Therefore, it was decided that the work needed significant revisions and significant effort would be required to make the work accessible to the general reader, not only in making the take home message clear, but also in explaining how and why detailed mathematical derivations are needed and used.
Overall, two major aspects require consideration, as summarized below with some suggestions. Additional detailed comments from the reviewers are appended below.
1) A clearer and simpler version of the paper is needed.
A clear line through the reasoning that will allow everyone to see the core of what has been done, and why, needs to be presented. Specifically, the paper needs to be transparent and math that obfuscates should be removed or simplified or explained. The authors are encouraged to present their essential results in a much more straightforward fashion to help elucidate their mathematical derivations.
A suggestion is to use a flow chart of the process/protocol (capturing presynaptic firing statistics, fitting, optimizing etc., and lay bare the limitations/assumptions – e.g., the output computation f(u) is assumed to be a linear function), describe the essential results in words, and point to the mathematical parts in methods/or a mathematical appendix of how and why. We understand and appreciate that you wish to publish the full mathematical treatment, but most eLife readers won't benefit from the math in its present form. The mathematical results could be made complete and readable in an Appendix, while freeing the Results to show the logic of the model and how it works. It looks complicated because of using the most general formulations (multistate state transitions, and multivariate normal with arbitrary statedependent covariance), but the main results are obtained via somewhat simpler formulations. More detailed mathematical derivations could go into a mathematical appendix if necessary.
In this way, people (of diverse backgrounds) would be able to sink their teeth into the contribution, build on it, test it and so on.
2) Takehome message(s) and biological/biophysical intuition needs to be presented.
No biophysical model seems to be included, even simply. Everything seems to be folded into the optimal g(s), so that it is not obvious how best to make a biological/biophysical connection.
The derived 'optimal traces' do, in principle, bring a more stringent test to the theory, but in Figure 4 they have access to the whole presynaptic dynamics so the optimal trace can be replaced by the averaged presynaptic potential, while in Figure 5 the experimental pattern of stimulation is not sufficiently complex to really probe the detailed temporal structure predicted by the theory. The other figures support qualitative statements. This, for example, could be explicitly stated and rationalized for the reader.
Some aspects came through but because of the first point above, there was not always a clear consensus of what the authors intended as the main message(s) from the work. Could these intents (if appropriate) be better explained?
For example:i) Presynaptic neurons with correlated inputs should target neighboring regions of the dendritic tree so that the inputs can sum nonlinearly while presynaptic neurons with uncorrelated inputs should map onto different portions of the tree and the inputs sum linearly?ii) An optimizing aspect/principle that could ultimately predict the 'location structure' of synapses as a crucial test of their theory.iii) Optimally decoding the information present in the spikes to do the desired computation depends upon the correlationsiv) Addressing the question of why there are nonlinearities in the first place.v) An optimal way to recover the averaged presynaptic potential only from the presynaptic spike trains? Can the biophysics approximate this optimal 'decoding' operation?
Reviewer #1:
In this most interesting study, the authors propose a principle that dendritic nonlinearity is optimized to integrate synaptic inputs from the statistics of the presynaptic cells. This principle assumes that communication is mainly via spikes whereas computation is not 'punctate', thus identifying the bottleneck of dendritic processing. They go on to use models (clustered connectivity prediction) and experiment to show that this principle is followed.
The authors may have uncovered a fundamental principle, and as they state in the Discussion, "Once patterned dendritic stimulation over a broader and more realistic range of inputs becomes feasible, our theory will provide a principled method for dissecting the roles of presynaptic correlations vs. genuine nonlinear computations in shaping dendritic nonlinearities." Nice I think.
I have a few comments.
1) The authors could state that graded synaptic transmission exists (many examples in invertebrates), so that their principle may not apply there, I assume. Is it definitively known that graded synaptic transmission is not present in cortical circuits?
2) It would be helpful if the authors could expand on the discussion regarding biophysical substrate. That is, to fill in the blanks on how and why "NMDA spikes may provide a general solution". Was some modeling (not shown) done to be able to state this? Please provide some additional discussion/rationale/assumptions for making these statements.
3) Regarding inhibition, the authors discuss potential addition from the perspective of excitatory cells. However, perspectives from inhibitory cells are not expressed. Do the authors expect their principle to hold from the perspective of inhibitory cells (receiving excitatory and/or inhibitory inputs)? The authors should provide some discussion/rationale/assumptions of why this is not included/considered. I do not think it is a given that computation is only from the perspective of excitatory cells? For example, consider the reviews by Klausberger and Somogyi (Science 2008) and Chamberland and Topolnik (Frontiers in Neuroscience 2012).
Reviewer #2:
Nonlinear dendritic integration is often seen as a thorn in the foot. Following in vitro work by Larkum, Magee, Stuart, Hausser and others, it is now clear that dendrites introduce various types of nonlinearity to input integration. Further observations showed physiological patterns that appeared to tune specifically the dendritic nonlinearities. Computations that are carried out in the dendrites (as in Taylor et al. 2000) were then discovered. Thus, it has become increasingly difficult to ignore their computational role. The search for a theoretical role (mainly by Mel and coworkers), on the other hand, was mainly focused on ratebased computations. Yet, rate and spiking descriptions are often qualitatively different (as in plasticity or synfire chains). Understanding the key mechanisms for spikebased dendritic computations remains a fundamental problem, but a very difficult one.
In this article, Ujfalussy and coworkers address the hypothesis that nonlinear dendritic integration serves to decode the correlated presynaptic spike trains. They describe mainly three observations that back up this hypothesis. First, they show mathematically that the canonical model of nonlinear integration of Poirazi and Mel (P&M) approximates an optimal decoding of the switching between active and quiescent states. Second, they compared how different biophysically detailed models approximated an optimal decoding strategy. They found that synaptic input clustered on nonlinear dendrites performed better at this task than when the same input was lumped in the soma, or synapses randomly assigned to dendrites. Last, they show that in vitro measurements of dendritic integration in a specific cell type matched the optimal integration strategy expected to decode the input statistics of that particular cell in vivo. These are the three main observations, but a particular strength of this article is that it draws a clear topdown rhetoric and provides general estimates of (bayes) optimal spiking communication.
Although I don't subscribe to all the assumptions described here, they are sensible and worth a careful examination. Ujfalussy and coworkers offer a thorough and elegant treatment of the question relying substantially on most of the stateoftheart experimental recordings relevant to this problem. The conclusions are well tested, often with very stringent criteria. The relevant literature is properly cited. The article is generally well written.
The main contributions will appeal to researchers coming from different approaches. The mathematical neuroscientist will respect the relation between the P&M model and optimal decoding. Those considering decoding spike train from a bayesian perspective will check the derivations in the supplementary material. Others coming from the literature on noisecorrelation will appreciate that, with dendrites, some of the reasonings should be revised. Finally, the new theoretical role for dendritic nonlinearities can clearly be tested with present experimental techniques. For these reasons, I believe that even if the supposed role is later shown to be false or incomplete, the article will have had a strong impact on the community.
Reviewer #2 (Minor Comments):
Generally, confusion arises in the number of models considered. I think that, when a comparison between models is the main result, these different models should be mentioned at the beginning of the section. For instance in the last paragraph of the section “The form of the optimal nonlinearity depends on the statistics of presynaptic inputs” one suddenly learns of the actual comparisons being made.
On a first reading, I could not understand what was done in the first paragraph of the section” Nonlinear integration in cortical neurons is matched to their input statistics”. After reading the supplementary, it is clear, but I think a few sentences should be added to describe in a little more details what was actually compared. For instance, one can read from the first sentence that to test the theory, we need to measure presynaptic statistics and derive the optimal nonlinearity, then measure the nonlinearity in vitro and compare the nonlinearities. That is not what was done, I know now, but I could not rule that out from the paragraph as is. It is also difficult to understand the end of second paragraph of the same section on the first go.
What is the experimental stimulus protocol used in the uncaging experiments? How broad is the distribution of ISI? Evenly distributed on Figure 5c and g? Are each burst made of evenly spaced presynaptic encaging? Can you really say overfitting is small?
Why exactly the multistate description in Eq. 11 and 19 if it is never used? Personally, it is only when I saw Eq. S67 that I finally reached the proper mathematical intuitions. Other than the need for simplicity, I am concerned that some of the conclusions would not hold in the multistate scenario suggested by Eq. 11.
Figure 1: 1B is argued to offer a better fit to optimal than 1C. But there are two differences, correlation and scale. Could it be that in the uncorrelated case, the fit to optimal is not as good when the potential fluctuates on greater amplitudes? In fact, I don't understand why the uncorrelated case should not have a saturating dendritic nonlinearity to counteract the exponential spiking nonlinearity."
Reviewer #3:
The authors present a detailed mathematical analysis, computations and experimental results aimed at supporting the conclusion that dendritic nonlinearities used to integrate synaptic inputs in a neuron are optimized based upon the statistics of that neuron's presynaptic connections. In particular, the suggested result is that if a neuron's presynaptic connections are uncorrelated, sublinear responses are optimal. If, however, the firing of presynaptic neurons are correlated, supralinear responses are optimal for that neuron. If correct, I would find this conclusion to be interesting.
One of the main difficulties with the manuscript is that the details are not easily approachable. There are many pages of complicated mathematics that one must work through, and in many places there hasn't been much attempt to try to explain things in simpler terms to the general reader – I suspect that one will not be able to follow the details here unless one is already an expert.
While the major focus on presynaptic activity is in terms of spikes (and I will grant that a good explanation for this is given), I think it's a concern is that the focus on postsynaptic activity is in terms of subthreshold responses. Granted, the authors say that they can also perform their calculation if the firing rate is substituted for the subthreshold response. This seems like an inconsistency, however – if two or more neurons share synaptic inputs from overlapping sets of presynaptic neurons, and the correlations in the spiking patterns of the presynaptic neurons matter, why is it that correlations in the output spikes of those two or more neurons don't matter? This would seem to be a basic internal inconsistency in the manuscript.
In addition, while there is a huge amount of detail associated with the mathematical calculations, I found the details associated with the experiments to be a bit too sparse. It would have been useful, for example, to see an image of the neocortical pyramidal cell and the specific synapses which are being stimulated.
[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]
Thank you for submitting your work entitled "Dendritic nonlinearities are tuned for efficient spikebased computations in cortical circuits" for peer review at eLife. Your submission has been favorably evaluated by Eve Marder (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.
The reviewers have discussed the reviews with one another and the Reviewing editor has drafted this decision to help you prepare a revised submission.
All of the reviewers are strongly positive about this work but they also want to ensure that the work is accessible, understandable and widely appreciated. They would like all aspects be made as clear as possible. Thus, pending the following revisions, we expect to accept the paper.
Please respond directly to the revisions on a pointbypoint basis, revising the paper where needed.
1) Double check all equations in the mathematical appendix – there was some concern given potential sign errors in other equations (see below). Additional explanatory guidance through the mathematical steps should be provided for those readers who may be interested in following these details. The math is still very complicated.
2) Check and correct potential inconsistencies related to the description of the function s(t). In the subsection “Presynaptic statistics” it is still stated that in the limit in which the time step goes to zero the discrete version becomes binary, i.e., zero or 1. In the subsection “Inference and the optimal response”, however it's stated that s(t) is a sum of Diracdelta functions. The two descriptions can't be consistent, of course, unless there is a division by a time step somewhere. There does not appear to be a careful definition in the manuscript, however.
3) Check the solution of equation (22) given in (23) and (24) for potential sign errors.
4) The additional details about the experiments were appreciated. However, a question arose about the blocking of NMDA channels. It's stated that substantially poorer fits were obtained for this case. Please confirm/clarify whether the postsynaptic filtering parameters were refit in this case?
5) It seems like a bit of an inconsistency that while a big point is made that presynaptic neurons must communicate with spikes, no spiking is considered in the postsynaptic neurons. If spiking is the only way that neurons can communicate (the authors' point), why should one care about the subthreshold response of the postsynaptic neuron? Since so much of the manuscript covers the subthreshold response, something explicit (explanationwise) should be said about this in the manuscript (and we realize that they consider the case of firing rates).
6) Include a 'toy model' explanation (without math, or if possible, with straightforward enough math equations that most can follow) that summarizes the link between pairwise correlation (temporal) and dendritic integration (spatial) before any specific statistical models are discussed. This would help the reader follow the more detailed case, and avoid confusion in consideration of later statements in the paper. A good place for this might be at the beginning of the Results section.
Specifically, the way the text reads is confusing about when spatial correlations (for lack of a better term) between different presynaptic neurons are being discussed, and when temporal correlations between inputs are being considered. Part of the issue may be that the mathematical formulation in the main text completely hides issues of temporal correlation in the details. Thus one can be left wondering if one of the two types of correlations might be more important and if the theory could shed light on this.
For example, consider these comments from the manuscript:
"However, if presynaptic neurons became correlated the optimal response became nonlinear and the best linear response was unable to accurately follow the fluctuations in the input (Figure 1D)."
"Same as (AC) but for presynaptic neurons exhibiting synchronized switches between a quiescent and an active state (D, bottom), introducing higher order correlations between the neurons." (Figure 2 caption)
"In particular, sublinear integration was optimal when presynaptic activities exhibited simple Gaussian random walks and thus they could not contain statistical dependencies beyond second order correlations (Figure 2AC)."
The authors here are trying to provide an intuitive explanation as to why input correlations lead to the observed behavior and this, of course, is good and to be encouraged. In each case, however, what type of correlation is being thought of – spatial or temporal? Further elaboration might provide additional insight.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for submitting your work entitled "Dendritic nonlinearities are tuned for efficient spikebased computations in cortical circuits" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Eve Marder as the Senior Editor.
The reviewers have discussed the reviews with one another and the Reviewing editor has drafted this decision to help you prepare a revised submission.
The reviewers appreciated the revisions done by the authors, but it was felt that the intuition as provided was incomplete and the mathematical appendix was still not clearly presented. Specifically, it was noted that is unclear how the authors distinguish autocorrelation from crosscorrelation (spatial) in the formalism and the text, and as this is a main point of the paper, it should be up front early on in the Discussion (and not just in caption of Figure 1 and later in Methods). Also, eq A91 has an exact solution (separable). In essence, the math appendix should only include what is needed to explain the methods – anything that is more extensive than needed should be either removed or simplified.
In summary, the authors should carefully go through the entire ms to ensure that these aspects are addressed in general. Please also note specific additional comments from the reviewers to address:
1) The manuscript still has trouble delineating spatial vs temporal correlations when it tries to explain different points. For example:
Introduction section: "We pursued this principle in understanding the integrative properties of individual cortical neurons, for which the relevant statistical input patterns are those characterising the dynamically evolving spike trains of their presynaptic partners." This sounds like the manuscript is concerned with temporal issues.
"The optimal response determined by this statistical model, for essentially any setting of parameters, was inherently nonlinear because the additional effect of a presynaptic spike depended on the pattern of spikes that had been previously received from the presynaptic population and thus the integrated effect of multiple spikes could not be computed as a simple linear sum of their individual effects in isolation." This also sounds to me as if a claim that the temporal pattern of inputs is what's leading to nonlinearity.
2) Regarding the errors in the mathematics, we thought that when the authors repaired the solution of Equation 27 they would also fix other instances of this equation in the manuscript. Equation 27 is recapitulated in the mathematical appendix as Equation A91, but the discussion around Equation A91 is clearly wrong. Somehow a different solution from the solution of Equation 27 is obtained; the answer should be equation A92 with Α = 0,Β = B and C = −A. Here’s how the solution goes: we start with
This equation is separable:
The lefthand size can be simplified by partial fractions:
so that
Both sides of the equation can now be integrated. Assuming 0 < $\text{\zeta}$ < 1, we get
or equivalently,
where
Then solving
We don’t know what the authors are trying to do with the rest of Section B.2.2, but it appears to be wrong. We also don’t know to what extent fixing this will affect the rest of the results.
https://doi.org/10.7554/eLife.10056.022Author response
[Editors’ note: the author responses to the first round of peer review follow.]
We included a thorough answer to all points raised by the reviewers. In particular, we substantially simplified the presentation of the paper by rewriting the introduction and the first part of the results section; adding a flowchart to Figure 1 giving a concise summary of the logic of the manuscript; and rewriting several parts of the Methods section to streamline it. Moreover, we highlighted the takehome messages at the beginning of the Discussion. In addition, we also included experiments using NMDA receptor blockade to provide more mechanistic insights; added a supplementary figure demonstrating that our stimulation protocol is sufficiently complex to probe dendritic integration in the relevant input range and added a supplementary figure where we compared the responses of different models to the averaged presynaptic potential instead of the optimal response.
[…] Overall, two major aspects require consideration, as summarized below with some suggestions. Additional detailed comments from the reviewers are appended below.
1) A clearer and simpler version of the paper is needed. A clear line through the reasoning that will allow everyone to see the core of what has been done, and why, needs to be presented. Specifically, the paper needs to be transparent and math that obfuscates should be removed or simplified or explained.
We substantially simplified the presentation of the math at the beginning of the paper and removed all math that is not essential to follow the logic of the presentation. We believe that the equations that remained in the main text (4 out of 7 in the previous version) are essential to introduce key concepts of the paper. These equations are now explained and illustrated in the text and presented in a highly simplified way.
The authors are encouraged to present their essential results in a much more straightforward fashion to help elucidate their mathematical derivations. A suggestion is to use a flow chart of the process/protocol (capturing presynaptic firing statistics, fitting, optimizing etc., and lay bare the limitations/assumptions – e.g., the output computation f(u) is assumed to be a linear function),
As suggested, we added a flowchart presenting the logic of the paper to the first figure. This new panel also serves as a roadmap to the rest of the paper. In the flowchart, we highlighted our assumptions, illustrated the relationships between different parts of the paper and emphasized how specific figures in the paper support our conclusions. Moreover, we added a new paragraph describing the structure of the paper at the end of the Introduction and another paragraph in the Results that uses the flowchart to explain the logic of the Results section after introducing the key concepts at the beginning. We believe that these changes will substantially help the reader to follow the presentation of the paper.
[…] Describe the essential results in words, and point to the mathematical parts in methods/or a mathematical appendix of how and why.
We removed a substantial fraction of the equations from the main text (see above), and clarified the presentation in both the Results and Methods sections. Our main goal was to keep only those mathematical formulae in the main text that were absolutely essential to follow the logic of the paper. Those details that are only necessary to reproduce our results are included in the Methods section while further details of the derivations have been moved to the mathematical appendix.
We understand and appreciate that you wish to publish the full mathematical treatment, but most eLife readers won't benefit from the math in its present form. The mathematical results could be made complete and readable in an Appendix, while freeing the Results to show the logic of the model and how it works. It looks complicated because of using the most general formulations (multistate state transitions, and multivariate normal with arbitrary statedependent covariance), but the main results are obtained via somewhat simpler formulations.
We changed the formula describing the state transition dynamics in the model as suggested by the Reviewers, and this, indeed, simplified substantially the presentation of both the model for the presynaptic statistics and the optimal response (Eq. 1517). We also added an intuitive explanation of the equations describing the optimal response. We also greatly simplified the derivation of the dendritic approximation of the optimal response which provided a more intuitive justification for the sigmoidal nonlinearity (see subsection “Dendritic approximation of the optimal response”). In the representation of the optimal response, the multivariate (rather than univariate) normal and the statedependent posterior covariance matrix are necessary consequences of having more than 1 presynaptic neuron and 1 state, respectively, which are both key to our results, and so we could not simplify further this part of the Methods section.
More detailed mathematical derivations could go into a mathematical appendix if necessary. In this way, people (of diverse backgrounds) would be able to sink their teeth into the contribution, build on it, test it and so on.
We have restructured the paper following this suggestion. We believe that the paper is now presented in a form that is suitable for readers with diverse backgrounds: researchers with experimental interest can follow the logic of the paper by reading the main text, while readers interested in the mathematical details can consult the Methods section and the Mathematical Appendix. We believe that the minimal amount of mathematical details that we left in the main text is necessary to clearly define the concepts used in the paper and to avoid misunderstandings due to the novelty of our approach, and that these formulae and mathematical concepts are sufficiently lightweight (with the corresponding explanations in the text) even for the nonspecialist reader.
2) Takehome message(s) and biological/biophysical intuition needs to be presented. No biophysical model seems to be included, even simply. Everything seems to be folded into the optimal g(s), so that it is not obvious how best to make a biological/biophysical connection.
We note that the first paragraph of the section “Nonlinear dendrites closely approximate the optimal response” in the Results starts by considering potential biophysical substrates of the optimal g(s), and indeed that whole section is devoted to show how the optimal response can be (approximately) implemented by a simplified mechanistic, biophysically motivated model of dendritic integration that has enjoyed prominent success and wide popularity in the field (Poirazi & Mel, 2001). Crucially, this mechanistic model has been extensively linked to biophysical processes by previous studies so we felt that by way of linking our theory to this model we do provide biophysical insights. We also devote the first subsection of the Discussion (now expanded, see below) to discussing biophysical substrates.
Nevertheless, to better manage readers’ expectations, we included “disclaimers” upfront, already in the Introduction, and the first section of the Results, explaining how the inherently topdown nature of our approach does not allow us to make specific biophysical predictions, but how we are still able to provide some biophysical insights. In fact, to provide tighter links with biophysical factors, we also included, as an initial attempt, a novel set of experimental data directly probing the contribution of NMDA receptors to optimal input integration (Figure 5E and J). Specifically, we found that after blocking the NMDA currents dendritic integration became incompatible with the known presynaptic statistics, both in the neocortex and in the hippocampus, supporting the crucial role of NMDA currents in matching dendritic integration to the input statistics.
The derived 'optimal traces' do, in principle, bring a more stringent test to the theory, but in Figure 4 they have access to the whole presynaptic dynamics so the optimal trace can be replaced by the averaged presynaptic potential,
Indeed, in Figure 4 we could show the averaged presynaptic potential together with the optimal trace. However, even in this test the computation is spike based and the postsynaptic neuron does not have access to the presynaptic membrane potentials. In this case the optimal response is the theoretically best possible approximation to the presynaptic potentials that is achievable to the postsynaptic cell given the limited information conveyed by the irregular firing of the presynaptic neurons. Therefore, in Figure 4 we compared the models’ responses to the optimal response and included an additional supplementary figure (Figure 4—figure supplement 1) where we compare the responses to the averaged presynaptic potential.
[…] While in Figure 5 the experimental pattern of stimulation is not sufficiently complex to really probe the detailed temporal structure predicted by the theory.
We agree with the reviewers that the currently used experimental protocols are relatively simplistic compared to the highly structured spatiotemporal stimuli experienced by the dendrites under in vivo conditions. However, we found that despite their relative simplicity, our in vitro stimuli activated the dendritic branches in a similar range (in terms of the integrated postsynaptic effect of synaptic stimulations) as could be expected during in vivolike situations. To illustrate this point, we prepared a supplementary figure (Figure 1—figure supplement 2) where we compare the total input to a given branch during an in vitro experiment with the total input when the stimuli is sampled from the model matched to in vivo data. We found an excellent match between these two distributions suggesting that our stimulation protocols probed integration in a single dendritic branch in the physiologically relevant input range.
Crucially, our approach will also allow us in the future to modify the in vitro stimulation protocols to make them similar also in their structural details to the patterns typical in vivo. Designing such novel, richer but experimentally still feasible protocols possibly optimised for gaining information about dendritic nonlinearities is a highly nontrivial endeavour on its own right (where collaboration between theory and experiments can indeed bring significant advances) and was thus beyond the scope of the current paper.
The other figures support qualitative statements. This, for example, could be explicitly stated and rationalized for the reader.
We feel that Figure 3 is also quantitative in that it shows a direct numerical comparison of the optimal response and the response of the canonical biophysicallymotivated model. Figure 2 is indeed qualitative and we now make this clear upfront in our summary flowchart (Figure 1B). More generally, we included a roadmap to the paper at the end of the Introduction which also states explicitly that this first step is qualitative.
Some aspects came through but because of the first point above, there was not always a clear consensus of what the authors intended as the main message(s) from the work. Could these intents (if appropriate) be better explained? For example:i) Presynaptic neurons with correlated inputs should target neighboring regions of the dendritic tree so that the inputs can sum nonlinearly while presynaptic neurons with uncorrelated inputs should map onto different portions of the tree and the inputs sum linearly?ii) An optimizing aspect/principle that could ultimately predict the 'location structure' of synapses as a crucial test of their theory.iii) Optimally decoding the information present in the spikes to do the desired computation depends upon the correlationsiv) Addressing the question of why there are nonlinearities in the first place.v) An optimal way to recover the averaged presynaptic potential only from the presynaptic spike trains? Can the biophysics approximate this optimal 'decoding' operation?
We rewrote the first paragraph of the Discussion where we first highlight the main messages of the paper and then close the paragraph by a simple takehome message. We hope that these changes, together with the restructured Introduction of the paper will help the reader to identify the major contributions and the main messages of our work.
Reviewer #1:
1) The authors could state that graded synaptic transmission exists (many examples in invertebrates), so that their principle may not apply there, I assume. Is it definitively known that graded synaptic transmission is not present in cortical circuits?
Synaptic transmission can be modulated by graded signals, such as the somatic depolarisation of the presynaptic neuron (Shu et al., 2006), and indeed, this is why we wrote in the manuscript that “cortical neurons mainly communicate by action potentials”, but this effect is believed to have a minor importance relative to the coding of information by action potentials. We added a sentence about graded synaptic transmission in the Discussion.
2) It would be helpful if the authors could expand on the discussion regarding biophysical substrate. That is, to fill in the blanks on how and why "NMDA spikes may provide a general solution". Was some modeling (not shown) done to be able to state this? Please provide some additional discussion/rationale/assumptions for making these statements.
To thoroughly answer this question of the reviewer we made two distinct changes in the manuscript: First, in order to more directly demonstrate the role of NMDA receptors in tuning dendritic nonlinearities to the statistics of the input we tested how well our theory can predict experimental data recorded under their pharmacological blockade. We found that the prediction of our theory became substantially poorer in both the hippocampus and the neocortex after blocking the NMDA receptors supporting their crucial role in matching dendritic integration to the input statistics. Second, we expanded the Discussion regarding the possible role of NMDA receptors in tuning dendritic nonlinearities to state switching dynamics.
3) Regarding inhibition, the authors discuss potential addition from the perspective of excitatory cells. However, perspectives from inhibitory cells are not expressed. Do the authors expect their principle to hold from the perspective of inhibitory cells (receiving excitatory and/or inhibitory inputs)? The authors should provide some discussion/rationale/assumptions of why this is not included/considered.
We added a paragraph to the section “Inhibitory Neurons” discussing the possible application of the theory to inhibitory neurons. In short, our theory applies equally well to dendritic integration in inhibitory cells. We focused on pyramidal neurons because their nonlinearities have been much more extensively characterised.
I do not think it is a given that computation is only from the perspective of excitatory cells? For example, consider the reviews by Klausberger and Somogyi (Science 2008) and Chamberland and Topolnik (Frontiers in Neuroscience 2012).
We agree with the reviewer that computation is not necessarily exclusive to excitatory neurons. In fact we discuss that they might have a similar role in single neuron computation as excitatory neurons.
Reviewer #2:
Nonlinear dendritic integration is often seen as a thorn in the foot. Following in vitro work by Larkum, Magee, Stuart, Hausser and others, it is now clear that dendrites introduce various types of nonlinearity to input integration.
[…]
For these reasons, I believe that even if the supposed role is later shown to be false or incomplete, the article will have had a strong impact on the community.
We are grateful for reviewer 2’s constructive comments. The wording of several of his sentences motivated us to rewrite our paper in a clearer way potentially more understandable for people with diverse background.
Reviewer #2 (Minor Comments):
Generally, confusion arises in the number of models considered. I think that, when a comparison between models is the main result, these different models should be mentioned at the beginning of the section. For instance in the last paragraph of the section “The form of the optimal nonlinearity depends on the statistics of presynaptic inputs” one suddenly learns of the actual comparisons being made.
We agree with the reviewer that it is crucial to clearly introduce the models used in the manuscript when they are first mentioned. Therefore we carefully checked that not only models but also the important concepts are introduced and defined when first presented in the main text. As for the specific section to which the reviewer is referring, we specified the details of the models the first time they are introduced (sections “Optimal input integration is nonlinear” and” The form of the optimal nonlinearity depends on the statistics of presynaptic inputs”) such that when we refer to them later they do not come as a surprise.
On a first reading, I could not understand what was done in the first paragraph of the section” Nonlinear integration in cortical neurons is matched to their input statistics”. After reading the supplementary, it is clear, but I think a few sentences should be added to describe in a little more details what was actually compared. For instance, one can read from the first sentence that to test the theory, we need to measure presynaptic statistics and derive the optimal nonlinearity, then measure the nonlinearity in vitro and compare the nonlinearities. That is not what was done, I know now, but I could not rule that out from the paragraph as is. It is also difficult to understand end of the second paragraph of the same section on the first go.
We added a clause to the referred paragraph and clarified that when testing the theory we compared the actual and the predicted responses instead of the form of the nonlinearities themselves. We also clarified what we mean by the “parameters related to the postsynaptic dendritic filtering”.
What is the experimental stimulus protocol used in the uncaging experiments? How broad is the distribution of ISI? Evenly distributed on Figure 5c and g? Are each burst made of evenly spaced presynaptic encaging?
We added a separate paragraph about the stimulation protocols we used to the Methods section where we describe that in the uncaging experiments we targeted visually identified dendritic spines (17 in the cortex, 130 in the hippocampus) with a burst of presynaptic uncaging stimuli evenly spaced in time. We varied either the number of stimuli in the burst (17 in the neocortex, 130 in the hippocampus) or the delay between the uncaging events within the burst (120 ms in the neocortex and 0.15 ms in the hippocampus) for each dendrite.
Can you really say overfitting is small?
In order to demonstrate that the precise match between the predicted and the actual dendritic nonlinearities is not due to overfitting we performed model selection using Bayesian information criterion that penalises more complex models (see also Methods). We found that the extra flexibility of the neocortical and hippocampal models offered by the one additional parameter compared to the independent model was highly unlikely to account for the superior performance of the more complex models. Moreover, the model with second order correlations had the same number of free parameters as the neocortical and hippocampal models, yet it was unable to fit the data even remotely as well. These results indicate that the good performance of the cortical models is not simply due to the larger number of parameters.
To further demonstrate that overfitting is small, we now also conducted a set of validation simulations. In these simulations we first generated synthetic data matching the data recorded in our hippocampal and neocortical experiments (see below), and fitted this synthetic dataset following the same procedure that we used for fitting real experimental data in the paper, using hippocampal and neocortical input statistics. We expected that if the risk of overfitting were high, then we would be able to fit our simulated hippocampal dataset with neocortical parameters just as well as with hippocampal parameters and vice versa.
Specifically, to generate synthetic data matching the data recorded in our hippocampal experiments, we computed the optimal response for a stimulus (i.e. 30 input spikes with various delays) assuming general presynaptic statistics similar to our hippocampal model (Table 1, HP). We added 20% variability to the presynaptic firing rate and assembly activation rate to simulate a possible mismatch between the true population activity statistics and the statistics used for estimating the optimal response. This added variability within the hippocampal statistics that was small compared to the differences between hippocampal and neocortical population activity statistics. The postsynaptic parameters were sampled randomly from a uniform distribution in the realistic range (Figure 5—figure supplement 1B, C). This protocol provided us with a set of traces very similar to the original experimental data recorded from hippocampal CA3 cells. We followed the same protocol, using neocortical statistics (Table 1, NC), to generate synthetic data similar to our neocortical recordings.
Using these datasets we investigated the accuracy of our fitting procedure with both hippocampal and neocortical presynaptic statistics. We found that when the presynaptic statistics used for fitting matched the presynaptic statistics used to generate the data the fitting error was almost zero (Author response image 1A) and the estimated postsynaptic parameters were similar to the original postsynaptic parameters used to generate the data (Author response image 1B). (Note that fitting error was much smaller than with actual experimental data because synthetic data lacked intrinsic variability between responses to the same stimulus.) The nonzero error between the original and the estimated postsynaptic parameters is due to the variability of the presynaptic parameters within a given dataset that could be compensated by altering the postsynaptic parameters. However, the postsynaptic parameters could not compensate for differences in the presynaptic parameters as large as between hippocampal and neocortical statistics, resulting in substantially larger fitting errors (Author response image 1A) and greater differences between the original and fitted parameters (Author response image 1B). Taken together, these simulations suggest that the stimulation protocol used in our experiments is sufficiently complex to identify the postsynaptic parameters without the danger of overfitting.
Why exactly the multistate description in Eq. 11 and 19 if it is never used? Personally, it is only when I saw Eq. S67 that I finally reached the proper mathematical intuitions. Other than the need for simplicity, I am concerned that some of the conclusions would not hold in the multistate scenario suggested by Eq. 11.
We simplified the equations in the main text and only use the two state description in the main text and moved the more general, multistate equations into the mathematical appendix.
Figure 1: 1B is argued to offer a better fit to optimal than 1C. But there are two differences, correlation and scale. Could it be that in the uncorrelated case, the fit to optimal is not as good when the potential fluctuates on greater amplitudes?
In this figure we illustrated a case in which the nonlinearity of the optimal postsynaptic response depends only on the correlations between the presynaptic neurons and not their individual activity dynamics or spiking. Therefore we chose the same parameters for the individual presynaptic neurons in the two panels  in fact, the only difference between C and D (note, that the numbering of the panels has changed in the new version of the manuscript) is that all neurons in D are switching to the active state simultaneously while they are independent in C. To illustrate this point we added the membrane potential of one example presynaptic neuron to each panel, on the same scale.
The large difference between the scale of the postsynaptic response is due to the fact that individual fluctuations average out when neurons in the population are independent, while they do not if they are correlated. To control for the differences in the scale of the fluctuations we normalise the error between the actual and predicted responses by the variance of the true signal, whenever we quantify errors throughout the paper. Moreover, as we show in the Methods section, when presynaptic neurons are independent and the computation is linear (as we assume in panel B), then the optimal response is literally the linear combination of the responses to the spike trains of the individual neurons.
In fact, I don't understand why the uncorrelated case should not have a saturating dendritic nonlinearity to counteract the exponential spiking nonlinearity."
We argue here that optimal dendritic integration of spikes from different presynaptic neurons is linear if the presynaptic neurons are independent. We are not saying that the optimal response to the spikes from the same presynaptic neuron would be linear in this case. In fact, as the reviewer suggests, spikes from the same neuron should still be integrated in a nonlinear way even if presynaptic neurons are independent (see Pfister et al., 2010). In the Discussion we suggest that the nonlinearities that are private to the individual inputs should be implemented by nonlinear synapses (short term plasticity) instead of nonlinear dendrites. We have now made this point clear in the caption of Figure 1.
Reviewer #3:
The authors present a detailed mathematical analysis, computations and experimental results aimed at supporting the conclusion that dendritic nonlinearities used to integrate synaptic inputs in a neuron are optimized based upon the statistics of that neuron's presynaptic connections. In particular, the suggested result is that if a neuron's presynaptic connections are uncorrelated, sublinear responses are optimal. If, however, the firing of presynaptic neurons are correlated, supralinear responses are optimal for that neuron. If correct, I would find this conclusion to be interesting.
This is probably also what the reviewer meant, but just to be precise: presynaptic correlations imply nonlinear responses, which can be sublinear or supralinear depending on the nature of those correlations, see e.g. Figure 2.
One of the main difficulties with the manuscript is that the details are not easily approachable. There are many pages of complicated mathematics that one must work through, and in many places there hasn't been much attempt to try to explain things in simpler terms to the general reader – I suspect that one will not be able to follow the details here unless one is already an expert.
We simplified the math presented in the main text and provided an overview of the main concepts in Figure 1B. Our goal was to present the most important concepts of the paper in the main text with only the minimal amount of mathematical details necessary and by illustrating the concepts graphically. Therefore we moved the mathematical details necessary for reproducing the results to the Methods section. Additional details about the mathematical derivations are included in the mathematical appendix.
While the major focus on presynaptic activity is in terms of spikes (and I will grant that a good explanation for this is given), I think it's a concern is that the focus on postsynaptic activity is in terms of subthreshold responses. Granted, the authors say that they can also perform their calculation if the firing rate is substituted for the subthreshold response. This seems like an inconsistency, however – if two or more neurons share synaptic inputs from overlapping sets of presynaptic neurons, and the correlations in the spiking patterns of the presynaptic neurons matter, why is it that correlations in the output spikes of those two or more neurons don't matter? This would seem to be a basic internal inconsistency in the manuscript.
We agree with the reviewer that mathematical consistency is an important requirement for any theory. According to our theory, correlations between the activities of any groups of cells presynaptic to a given neuron must be handled by the postsynaptic neuron. In the example brought up by the reviewer, correlations between the neurons in the input neurons determine input integration in the output neurons, whereas correlations in the two output neurons will influence integration in neurons reading them. Thus, correlations in the output spikes of the two neurons DO matter, but only for cells that receive inputs from those particular two neurons. More generally, note that our formalism rests on the very objective of making a selfconsistent (purely membrane potentialbased, or, equivalently, purely firing ratebased) description of network dynamics work when neurons communicate with spikes.
In addition, while there is a huge amount of detail associated with the mathematical calculations, I found the details associated with the experiments to be a bit too sparse. It would have been useful, for example, to see an image of the neocortical pyramidal cell and the specific synapses which are being stimulated.
We added the image of the neurons we recorded from in Figure 5 and provided details about the experimental protocols used in this study in a new paragraph in the Methods section.
[Editors' note: the author responses to the rereview follow.]
All of the reviewers are strongly positive about this work but they also want to ensure that the work is accessible, understandable and widely appreciated. They would like all aspects be made as clear as possible. Thus, pending the following revisions, we expect to accept the paper. Please respond directly to the revisions on a pointbypoint basis, revising the paper where needed.
1) Double check all equations in the mathematical appendix – there was some concern given potential sign errors in other equations (see below). Additional explanatory guidance through the mathematical steps should be provided for those readers who may be interested in followi