1. Neuroscience
Download icon

Dendritic nonlinearities are tuned for efficient spike-based computations in cortical circuits

  1. Balázs B Ujfalussy Is a corresponding author
  2. Judit K Makara
  3. Tiago Branco
  4. Máté Lengyel
  1. University of Cambridge, United Kingdom
  2. Hungarian Academy of Sciences, Hungary
  3. MRC Laboratory of Molecular Biology, United Kingdom
  4. Janelia Farm Research Campus, Howard Hughes Medical Institute, United States
  5. University College London, United Kingdom
  6. Central European University, Hungary
Research Article
Cite as: eLife 2015;4:e10056 doi: 10.7554/eLife.10056


Cortical neurons integrate thousands of synaptic inputs in their dendrites in highly nonlinear ways. It is unknown how these dendritic nonlinearities in individual cells contribute to computations at the level of neural circuits. Here, we show that dendritic nonlinearities are critical for the efficient integration of synaptic inputs in circuits performing analog computations with spiking neurons. We developed a theory that formalizes how a neuron's dendritic nonlinearity that is optimal for integrating synaptic inputs depends on the statistics of its presynaptic activity patterns. Based on their in vivo preynaptic population statistics (firing rates, membrane potential fluctuations, and correlations due to ensemble dynamics), our theory accurately predicted the responses of two different types of cortical pyramidal cells to patterned stimulation by two-photon glutamate uncaging. These results reveal a new computational principle underlying dendritic integration in cortical neurons by suggesting a functional link between cellular and systems--level properties of cortical circuits.


eLife digest

Imagine that you are in the habit of checking three different weather forecasts each day, and then one day in early September the first forecast suddenly predicts snow. If you live in an area where it doesn’t normally snow in September, your initial reaction is likely to be surprise. However, you will not be quite so surprised to see a prediction of snow in the second forecast, and by the third forecast you will hardly be surprised at all.

In these three cases, you have responded to the same piece of information in a different way. In mathematics, this type of response is referred to as “nonlinear” because the output (varying degrees of surprise) is not directly proportional to the input (identical predictions of snow). In the case of the weather forecasts, the source of the nonlinearity was the fact that the three predictions were not truly independent. Instead, they corresponded with one another, or “correlated”, because all three depended on the weather itself.

In the brain, a single neuron can receive thousands of inputs from other cells. These are received via junctions called synapses that form between the cells. In many cases, the synapses form on the receiving neuron’s dendrites – the short branches that protrude from its cell body. Each dendrite can receive signals from hundreds of other neurons, and must combine these inputs to produce a single neuronal response. How dendrites do this is not clear.

Ujfalussy et al. have now developed a computational model that predicts the optimal response of dendrites to complex and realistic inputs from other neurons. The model shows that when dendrites receive inputs from neurons that independently respond to different stimuli, the optimal response is for the dendrites to average the inputs. This is a form of linear processing. By contrast, when the inputs are correlated – for example, because they come from neurons responding to the same stimulus – the optimal response is nonlinear processing. In this and other cases, the optimal response predicted by the model is similar to the response observed in real dendrites.

The model also makes a number of testable predictions; for example, that neurons with correlated activities will tend to form clusters of synapses close together on the dendrites of a target neuron, whereas neurons with unrelated activities will tend to form synapses that are further apart. Somewhat unexpectedly, Ujfalussy et al. show that compensating for input correlations accounts for almost all the nonlinearities that can be found in real neurons' dendrites – at least in response to relatively simple input patterns. Thus, it remains to be shown whether nonlinear dendritic responses to more complex input patterns can also be explained by this single principle. Further studies are also required to understand how different plasticity mechanisms enable neurons to achieve this close match between input correlations and dendritic processing.



The dendritic tree of a cortical neuron performs a highly nonlinear transformation on the thousands of inputs it receives from other neurons, sometimes resulting in a markedly sublinear (Longordo et al., 2013) and often in strongly superlinear integration of synaptic inputs (Losonczy and Magee, 2006; Nevian et al., 2007; Branco and Häusser, 2011; Makara and Magee, 2013). These nonlinearities have been traditionally studied from the perspective of single-neuron computations, using a few well-controlled synaptic stimuli, revealing a remarkable repertoire of arithmetic operations that the dendrites of cortical neurons carry out (Poirazi and Mel, 2001; London and Häusser, 2005; Branco et al., 2010) including additive, multiplicative and divisive ways of combining individual synaptic inputs in the cell’s response (Silver, 2010). More recently, the role of nonlinear dendritic integration in actively shaping responses of single neurons under in vivo conditions has been demonstrated in several cortical areas including the hippocampus (Grienberger et al., 2014), as well as visual (Smith et al., 2013) and somatosensory cortices (Xu et al., 2012; Lavzin et al., 2012Palmer et al., 2014).

However, while many of the basic biophysical mechanisms underlying these nonlinearities are well understood (Stuart et al., 2007), it has proven a daunting task to include all these mechanisms in larger scale network models to understand their interplay at the level of the circuit (Herz et al., 2006). Conversely, studies of cortical computation and dynamics have largely ignored the complex and highly nonlinear information processing capabilities of the dendritic tree and concentrated on circuit-level computations emerging from interactions between point-like neurons with single, somatic nonlinearities (Hopfield, 1984; Seung and Sompolinsky, 1993; Gerstner and Kistler, 2002; Vogels et al., 2011). Therefore, it is unknown how dendritic nonlinearities in individual cells contribute to computations at the level of a neural circuit.

A limitation of most theories of nonlinear dendritic integration is that they focus on highly simplified input regimes (Mel et al., 1998; Poirazi et al., 2003; Archie and Mel, 2000; Poirazi and Mel, 2001; Ujfalussy et al., 2009), essentially requiring both the inputs and the output of a cell to have stationary firing rates. This approach thus ignores the effects and consequences of temporal variations in neural activities at the time scale of inter-spike intervals characteristic of in vivo states in cortical populations (Crochet et al., 2011; Haider et al., 2013). In contrast, we propose an approach which is specifically centered on these naturally occurring statistical patterns – in analogy to the principle of ‘adaptation to natural input statistics’ which has been highly successful in accounting for the input-output relationships of cells in a number of sensory areas at the systems level (Simoncelli and Olshausen, 2001). We pursued this principle in understanding the integrative properties of individual cortical neurons, for which the relevant statistical input patterns are those characterising the spatio-temporal dynamics of their presynaptic spike trains. Thus, rather than modelling specific biophysical properties of a neuron directly, our goal was to predict the phenomenological input integration properties that result from those biophysical properties and are matched to the statistics of the presynaptic activities.

Our theory is based on the observation that cortical neurons mainly communicate by action potentials, which are temporally punctate all-or-none events. In contrast, the computations cortical circuits perform are commonly assumed to involve the transformations of analog activities varying continuously in time, such as firing rates or membrane potentials (Rumelhart et al., 1986; Hopfield, 1984; Dayan and Abbott, 2001; Archie and Mel, 2000; London et al., 2010). This implies a fundamental bottleneck in cortical computations: the discrete and stochastic firing of spikes by neurons conveys only a limited amount of information about their rapidly fluctuating activities (Pfister et al., 2010; Sengupta et al., 2014). Formalising the implications of this bottleneck mathematically reveals that the robust operation of a circuit requires its neurons to integrate their inputs in highly nonlinear ways that specifically depend on two complementary factors: the computation performed by the neuron and the long-term statistics of the inputs it receives from its presynaptic partners.

To critically evaluate our theory, we first illustrate qualitatively the nonlinearities that most efficiently overcome the spiking bottleneck for different classes of presynaptic correlation structures. Next, to provide biophysical insight, we demonstrate that the form of optimal input integration for these presynaptic correlations can be efficiently approximated by a canonical, biophysically-motivated model of dendritic integration. Finally, we test the prediction that cortical dendrites are optimally tuned to their input statistics in in vitro experiments. For this, we use available in vivo data to characterize the presynaptic population activity of two different types of cortical pyramidal cells. Based on these input statistics, our theory accurately predicts the integrative properties of the postsynaptic dendrites measured in two-photon glutamate uncaging experiments. We also show that NMDA receptor activation is necessary for dendritic integration to approximate the optimal response. These results suggest a novel functional role for dendritic nonlinearities in allowing postsynaptic neurons to integrate their richly structured synaptic inputs near-optimally, thus making a key contribution to dynamically unfolding cortical computations.


Suppose that every day you check your three favorite websites for the weather forecast. On a September day, the first website forecasts snow which you find hard to believe as it is highly unusual in your area – so you dismiss it as the forecaster’s mistake. However, when you read a similar forecast on the second site, you become convinced that snow is coming, and by the time the third site brings you the same news you are hardly surprised at all. Thus, even though all three sources conveyed the same information (snow), they had different impact on you – in other words, their cumulative effect was nonlinear. This nonlinearity was due to the fact that the information you get from these sites tends to be correlated as they are all related to a common cause, the actual weather. Below we argue that the same fundamental statistical principle, that correlated information sources require nonlinear integration, accounts for the dendritic nonlinearities of cortical pyramidal neurons.

Overcoming the spiking bottleneck in circuit computations

To introduce our theory, we consider a postsynaptic neuron computing some function, f, of the activity of its presynaptic partners, u (Figure 1A, top):

(1) v˙=f(u)

where v˙ is the resultant temporal change of the activity of the postsynaptic neuron. We chose u and v to be analog variables, rather than for example digital spike trains, in line with the vast bulk of theories of network computations (Hopfield, 1984; Dayan and Abbott, 2001; Pouget et al., 2003) and experimental results suggesting analog coding in the cortex (London et al., 2010; Shadlen and Newsome, 1998). In particular, we considered these variables to correspond to the coarse-grained (low-pass filtered) somatic membrane potentials of neurons (in particular, excluding the action potentials themselves, as often reported in experimental data; Carandini and Ferster, 2000), although the theory can equally be formalized in terms of instantaneous firing rates (Materials and methods, Figure 1-figure supplement 1).

Figure 1 with 3 supplements see all
Spike-based implementation of analogue computations in neural circuits.

(A) Computation (top) is formalized as a mapping, f, from presynaptic activities, u1,uN (left), to the postsynaptic activity, v (right). As neurons communicate with spikes, the implementation (bottom) of any computation must be based on the spikes the presynaptic neurons emit, s1,sN (middle). Optimal input integration in the postsynaptic cells requires that the output of g is close to that of f. (B) The logic and plan of the paper. Grey box in the center shows theoretical framework, blue boxes around it show steps necessary to apply the framework to neural data. To compute the transformation from stimulation patterns (bottom left) to the optimal response (bottom right) we assumed linear computation (top right) and specified the presynaptic statistics based on cortical population activity patterns observed in vivo (top left). To demonstrate the validity of the approach, we studied the fundamental qualitative properties of the optimal response (Figure 2), compared it to biophysical models (Figures 34) and tested it in in vitro experiments (Figure 5). (C) The optimal postsynaptic response (purple line, bottom) linearly integrates spikes from different presynaptic neurons (top: rasters in shades of green; middle: membrane potential of one presynaptic cell) if their activities are statistically independent. (D) Optimal input integration becomes nonlinear (purple line, bottom) if the activities of the presynaptic neurons are correlated (rasters in shades of green, top), even though the long-term statistics and spiking nonlinearity of individual neurons remains the same as in (C). In this case, the best linear response (black line, bottom) is unable to follow the fluctuations in the signal.


The standard description of neural circuit dynamics in Equation 1 hides an important informational bottleneck intrinsic to the operation of cortical circuits. While according to Equation 1, the postsynaptic neuron’s analog activity, v, is required to depend directly on the analog activities of its presynaptic partners, u, in reality it only accesses these presynaptic activities through the spikes the presynaptic population transmits, s, incurring a substantial loss of information (Alenda et al., 2010; Sengupta et al., 2014). Therefore, the function g a neuron actually implements on its inputs can only depend directly on the presynaptic spikes, not the underlying activities (Figure 1A, bottom):

(2) v˙=g(s)

Importantly, while f(u) is dictated by the computational function of the circuit, the actual transformation of the synaptic input to the postsynaptic response, expressed by g(s), is determined by the morphological and biophysical properties of the cell. (For these purposes, we regard the presynaptic side of synapses, transforming presynaptic spike trains to synaptic transmission events, as conceptually being part of the postsynaptic cell’s g(s) function.) How can then the neuron integrate the incoming presynaptic spikes, as formalized by g(s), such that the resulting postsynaptic response best matches the required computational function, f(u), thereby alleviating the fundamental informational bottleneck of spiking-based communication?

Determining the best g(s) is nontrivial because the same presynaptic spike train may be the result of many different underlying presynaptic activities (Paninski, 2006), each potentially implying a different output of the computational function. This ambiguity is formalized mathematically as a posterior probability distribution, P(u|s), expressing the probability that the analog activities of the presynaptic cells might currently be u given their spike trains, s (Pfister et al., 2010; Ujfalussy et al., 2011). The optimal response, i.e. the g(s) that minimizes the average squared error relative to f(u), is the expectation of f(u) under the posterior:

(3) g(s)=f(u)P(u|s) du

Crucially, the expression for the posterior, given by Bayes’ rule, is:

(4) P(u|s)P(s|u)P(u)

Note that while Equations 3–4 do not reveal directly the specific biophysical properties the postsynaptic cell should have, they tell us phenomenologically what signal integration properties should result from its biophysical properties. In particular, they make it explicit that the optimal g(s) depends fundamentally on two factors (Figure 1B, top):

  1. the function that needs to be computed, f(u), and

  2. the statistics of presynaptic activities: P(u), the prior probability distribution characterizing the long-run statistics of multi-neural activity patterns in the presynaptic ensemble, and the likelihood P(s|u), expressing the potentially probabilistic relationship between analog activities (e.g. somatic membrane potential trajectories) and emitted spike trains.

In the following, we show that the outcome of the integration of presynaptic spike trains in cortical neurons approximates very closely the optimal response, and that dendritic nonlinearities are crucial for achieving this near-optimality. For this, 1) we make an assumption about the computational function of the postsynaptic cell, f(u) (Figure 1B, top right); 2) we constrain presynaptic statistics, P(u) and P(s|u), by in vivo data about cortical population activity patterns (Figure 1B, top left); and with these 3) we compute the optimal response they jointly determine for various stimulation patterns (Figure 1B, bottom left and right).

Optimal input integration is nonlinear

To specify our model, we considered the case when f(u) itself is linear. Although networks with purely linear dynamics can perform non-trivial computations already (Dayan and Abbott, 2001; Hennequin et al., 2014), in the general case, we do expect f(u) to be nonlinear, e.g. sigmoidal (Hopfield, 1984). Nevertheless, in typical electrophysiological experiments only a small fraction of the full dynamic range of a neuron’s total input is stimulated (Figure 1—figure supplement 2), and so we approximate the computational function, f(u), as being linear on this limited input range without loss of generality. (See Figure 1—figure supplement 3 for the application of the theory to the case of nonlinear f.) Yet, as we show below, for physiologically realistic statistics of presynaptic activity patterns, the optimal response combines input spike trains in highly nonlinear ways even in the case of linear computation, predicting experimentally characterized nonlinearities in dendritic input integration. In particular, second- and higher-order prior presynaptic correlations, represented by P(u), will have a major role in determining the form of the corresponding optimal response. The likelihood, P(s|u), also influences the optimal response, but only in its quantitative details, as it does not involve correlations across neurons: each neuron’s firing is independent from the others’, given its own somatic membrane potential (Materials and methods).

Previous suggestions for how postsynaptic neurons achieve reliable computation despite the substantial ambiguity about the individual presynaptic activities relied on the linear averaging of inputs arriving from a sufficiently large pool of presynaptic neurons (Dayan and Abbott, 2001; Pfister et al., 2010). However, linear averaging is only guaranteed to produce the correct output, as dictated by Equations 3-4, if the activities of presynaptic neurons are statistically independent under the prior distribution, i.e. P(u)=iP(ui) (Materials and methods). In contrast, the membrane potential (Crochet et al., 2011) and spiking (Cohen and Kohn, 2011) of cortical neural populations often show complex patterns of correlations, which include both ‘spatial’ (cross-correlations between different neurons) and temporal components (auto-correlations, i.e. the correlation of the activity of the same cell with itself at different moments in time). Thus, in this more general case, we expect the optimal response to involve a nonlinear integration of spike trains. While temporal correlations alone do not require nonlinear dendritic integration across synapses, only local nonlinearities within each synapse, as brought about e.g. by short term synaptic plasticity (Pfister et al., 2010), spatial correlations require the non-linear integration of spikes emitted by different presynaptic neurons.

To illustrate that presynaptic spatial correlations require nonlinear integration across synapses, we compared the best linear response to a given presynaptic spike pattern with the optimal response (Equation 3, as approximated by Equation 23) for two different input statistics that differed only in the correlations between the presynaptic cells but not in the activity dynamics or spiking of individual neurons (temporal correlations). To compute the postsynaptic response, we assumed that dendritic integration in the postsynaptic cell was linear but, in order to dissect the role of dendritic integration across synapses from the effects of nonlinearities in individual synapses, we allowed spikes from the same presynaptic neuron still to be integrated nonlinearly (Pfister et al., 2010). In the first case (Figure 1C), when the presynaptic neurons were independent, the best linear response was identical to the optimal response. However, if presynaptic neurons became correlated, the optimal response became nonlinear and the best linear response was unable to accurately follow the fluctuations in the input (Figure 1D).

Table 1

Parameters used in Figures 15 of the main paper (see also Tables 23). Ω- (Ω+) is the rate of switching from the active to the quiescent (from the quiescent to the active) state. The resting potential corresponding to the active and quiescent states is u¯ and -u¯, respectively. Σ¯ii (Σ¯ij) is the posterior variance (covariance) of the presynaptic membrane potential fluctuations in a given state where Σ¯=Qτ2. τrefr is the length of the refractory period and prel is the baseline transmission probability in these synapses (13, 49).

Figure 1Figure 2Figure 3Figure 4Figure 5
Σ¯ij ijmV200.50.500.500*50.5
  1.  *These parameters were fitted to experimentally recorded dendritic responses, see Figure 5—figure supplement 1. The numbers 0 and 20 indicated here are in addition to the number of stimulated synaptic sites in the experiment. For the ind model, this number does not affect the predictions, for the cor2 model its effects could phenomenologically be incorporated into which we chose to fit instead.

Table 2

Features of neocortical population activity during quiet wakefulness. Parameters of the model are given in column NC of Table 1.

DataModel (NC)Reference
duration of active states130 ms100 msGentet et al. (2010)
duration of quiescent states200 ms250 msGentet et al. (2010)
r+, firing rate during active states2.5 Hz2.86 HzGentet et al. (2010)
r-, firing rate during quiescent states1/3 Hz0.39 HzGentet et al. (2010)
2u¯, depolarisation during active states20 mV20 mVGentet et al. (2010)
time constant20 ms20 msPoulet and Petersen (2008)
Table 3

Features of hippocampal population activity during sharp wave-ripple states. Parameters of the model are given in column HP of Table 1A recent intracellular study (English et al., 2014) recording from CA1 neurons in awake mice found parameters similar to our previous estimates. Using the parameters found in that study – r+=12.8 Hz, r-=2.85 Hz (Table 1 of English et al., 2014), 2u¯=5 mV and Σ¯ii=4 mV2 (Figure 3A of English et al., 2014) yielding g=5 Hz and β=0.3 mV-1 – did not influence our results (not shown).

DataModel (HP)Reference
activation rate of an ensemble 0.25 Hz0.027 HzGrosmark et al. (2012); Pfeiffer and Foster (2013)
duration of SPWs105 ms100 msCsicsvari et al. (2000)
r+, firing rate during SPW10 Hz9.5 HzCsicsvari et al. (2000)
r-, firing rate between SPWs0.5 Hz0.6 HzGrosmark et al. (2012); Csicsvari et al. (2000)
2u¯, depolarisation during SPWs0–10 mV4.6 mVYlinen et al. (1995)
time constant8–22 ms20 msEpsztein et al. (2011)

Thus, inputs from presynaptic neurons whose activity tends to be correlated need to be nonlinearly integrated, while inputs from uncorrelated sources need to be integrated linearly. This could be naturally achieved in the same dendritic tree by clustering synapses of correlated inputs to efficiently engage dendritic nonlinearities, while distributing the synapses of uncorrelated inputs on different dendritic branches (Larkum and Nevian, 2008). Crucially, for correlated inputs it is also necessary that the dendritic nonlinearities have just the appropriate characteristics for the particular pattern of correlations in presynaptic activities.

The form of the optimal nonlinearity depends on the statistics of presynaptic inputs

In order to systematically study the nonlinearities in the optimal response in the face of naturalistic input patterns, we derived and analyzed its behavior for a flexible class of richly structured, correlated inputs. Our statistical model for presynaptic activities, specifying the parametric forms of P(u) and P(s|u) (Materials and methods and Figure 2—figure supplement 1), was able to generate a variety of multi-neural activity patterns resembling the statistical properties described in in vitro and in vivo multielectrode recordings of neuronal population activities (Figure 2A and D show two representative examples). Once we have specified the statistical model of presynaptic activities, it uniquely determined the optimal response to any given input pattern (Equations 3–4). Thus, we used the same statistical model in two fundamentally different ways: first, to generate “naturalistic” in vivo-like patterns of presynaptic membrane potential traces and spike trains; and second, to compute the optimal response pattern to any stimulation pattern, be it “naturalistic” or parametrically varying “artificial” as used in typical in vitro experiments.

Figure 2 with 1 supplement see all
Nonlinearities in the optimal response.

(AC) Second order correlations between presynaptic neurons (A) imply sublinear integration (BC). (A) Membrane potentials and spikes of two presynaptic neurons with correlated membrane potential fluctuations. (B) The optimal response (solid line) to a single spike (left) and to a train of six presynaptic spikes (right, green colors correspond to different presynaptic cells, two of which are shown in A) when the long-run statistics of presynaptic neurons are like those shown in (A). Shaded areas highlight how response magnitudes to a single spike from the same presynaptic neuron differ in the two cases: the response to the sixth spike in the train (right, light blue shading) is smaller than the response to a solitary spike (left, gray shading) implying sublinear integration. Dashed line shows linear response. (C) Response amplitudes for 1–12 input spikes versus linear expectations. (DF) Same as (AC) but for presynaptic neurons exhibiting synchronized switches between a quiescent and an active state, introducing higher order correlations between the neurons (D, bottom). In this case, the optimal response shows supralinear integration (EF).


The optimal response determined by this statistical model, for essentially any setting of parameters, was inherently nonlinear because the additional effect of a presynaptic spike depended on the pattern of spikes that had been previously received from the presynaptic population. Temporal correlations in the presynaptic population caused the optimal response to depend on the spiking history of the same cell (Pfister et al., 2010), while crucially, the additional presence of spatial correlations introduced a dependency on the past spikes of other cells. Thus, the integrated effect of multiple spikes could not be computed as a simple linear sum of their individual effects in isolation. Specifically, a spike that was consistent with the information already gained from recent presynaptic spikes had only a small effect on the response (Figure 2B). Conversely, a spike that was unexpected based on the recent spiking history caused a larger change (Figure 2E).

As could be anticipated based on Equations 3-4, whether a spike counted as expected or unexpected relative to recently received spikes, and hence whether it had a small or large postsynaptic effect, depended on the long-run prior distribution of presynaptic activities, P(u). As a result, the same pattern of presynaptic spikes led to qualitatively different responses under different prior distributions. In particular, sublinear integration was optimal when presynaptic activities exhibited Gaussian random walks and thus they did not contain statistical dependencies beyond second order correlations (Figure 2A-C), as seen in the retina and cortical cultures (Schneidman et al., 2006). This was because with the activities of presynaptic neurons being positively correlated, successive spikes conveyed progressively less information about the presynaptic signal resulting in sublinear integration (Figure 2C) and the strength of the sublinearity depended on the magnitude of correlations (Ujfalussy and Lengyel, 2011). In contrast, supralinear integration was optimal when the presynaptic population exhibited coordinated switches between distinct states associated with large differences in the activity levels compared to activity-fluctuations within each state (Figure 2D–F). These switches led to higher order statistical dependencies as seen in the cortex in vivo, either due to population-wide modulation by cortical state (Gentet et al., 2010; Crochet et al., 2011), or due to stimulus-driven activation of particular cell ensembles (Harris et al., 2003; Ohiorhenuan et al., 2010; Miller et al., 2014). In this case, while observing a few spikes was consistent with random membrane potential fluctuations within the quiescent state, thus only warranting a small response, further spikes suggested that the presynaptic population was in the active state now and thus the response should be larger, leading to supralinear integration (Figure 2F).

Note, that nonlinearities in the optimal postsynaptic response needed not simply compensate for the nonlinearities in the presynaptic spike generation process, as captured by P(s|u), but they critically depended on the presynaptic correlations, as captured by P(u). Indeed, in Figures 1C, D and 2A–F, the same spiking nonlinearity was used and yet very different input integration was required depending on the form of the presynaptic statistics: linear integration for uncorrelated inputs (Figure 1B) and nonlinear integration for correlated inputs (Figure 1C), with sub- or supralinear integration being optimal depending on whether only second order (Figure 2A–C) or also higher order correlations were present in the presynaptic population (Figure 2D–F). Moreover, optimal input integration remained nonlinear even if the postsynaptic neuron computed a function of the presynaptic firing rates (rather than membrane potentials) which were linearly related to spikes (Figure 1—figure supplement 1).

Nonlinear dendrites can approximate the optimal response

The nonlinear input integration seen in the optimal response strongly resembled dendritic nonlinearities. Indeed, the basic biophysical mechanisms present in dendrites naturally yield nonlinearities that are qualitatively similar to those of the optimal response: purely passive properties lead to sublinear integration (Koch, 1999), whereas locally generated dendritic spikes endow dendrites with strong supralinearities (Nevian et al., 2007; Branco and Häusser, 2011). However, the full mathematical implementation of the optimal response is excessively complex (Materials and methods) and thus, there is unlikely to be a one-to-one mapping between the variables necessary for implementing it and the biophysical quantities available in dendrites. Therefore, we sought to establish a formal proof that dendritic-like dynamics can implement, even if approximately, the optimal response. For this, we considered two limiting cases of our statistical model of presynaptic activities, P(u) and P(s|u), and compared the properties of the corresponding optimal response to a well-established simplified model of nonlinear dendritic integration, using a combination of analytical and numerical techniques.

First, we considered a limiting case in which the statistics of a large presynaptic population were strongly dominated by the simultaneous switching of presynaptic neurons between a quiescent and an active state (as shown in Figure 2D). In this limiting case we were able to show mathematically (see Materials and methods) that a simple, biophysically-motivated, canonical model of nonlinear dendritic integration (Poirazi and Mel, 2001) is able to produce responses that are near-identical to the optimal response for any sequence of presynaptic spikes (Figure 3A, see also Figures 4C). In this simple dendritic model (Figure 3A, inset; Equations 24–25), inputs within a branch are integrated linearly and the local dendritic response is then obtained by transforming this linear combination through a sigmoidal nonlinearity, which is a hallmark of supralinear behavior in dendrites (Poirazi et al., 2003b).

Figure 3 with 2 supplements see all
A canonical model of dendritic integration approximates the optimal response.

(A) The optimal response (black) and the response of a canonical model of a dendritic branch, v (inset), with a sigmoidal nonlinearity (red, Equation 25) as functions of the linearly integrated input, vlin (inset, Equation 24), when the presynaptic population exhibits synchronized switches between a quiescent and an active state, as in Figure 2D. Black dots show optimal vs. linear postsynaptic response sampled at regular 2.5 ms intervals during a 3 s-long simulation of the presynaptic spike trains. (B) Optimal response (black) approximated by the saturating part of the sigmoidal nonlinearity (blue) when the presynaptic population is fully characterized by second-order correlations, as in Figure 2A. Inset shows the same data on a larger scale to reveal the sigmoidal nature of the underlying nonlinearity (gray box indicates area shown in the main plot).

Figure 4 with 2 supplements see all
A simple nonlinear dendritic model closely approximates the optimal response for realistic input patterns.

(A) Presynaptic spiking activity matching the statistics observed during hippocampal sharp waves. Spike trains (rows) belonging to four different assemblies are shown (colors), gray shading indicates assembly activations. (B) Different variants of the dendritic model, parts colored in yellow, orange, and red highlight the differences between successive variants (see text for details). (C) Estimating the mean of the presynaptic membrane potentials based on the observed spiking pattern (shown in A) by the optimal response (black) compared to the linear (dotted), somatic (yellow), random (orange) and clustered (red) models. (D) Performance of the four model variants compared to that of the optimal response. Gray lines show individual runs, squares show mean ± s.d. Performance is normalized such that 0 is obtained by predicting only the time-average of the signal, and 1 means perfect prediction attainable only with infinitely high presynaptic rates (Materials and methods).


Second, we considered another limiting case in which the statistics of the presynaptic population were fully characterized by second-order correlations (as shown in Figure 2A). In this case, the same type of dendritic model, but with a sublinear input-output mapping, was able to approximate the optimal response very closely. Although a closed-form solution for the optimal nonlinear mapping could not be obtained in this case, it could be shown to be sublinear (Appendix), and was well approximated by a sigmoidal nonlinearity parametrized to be dominantly saturating (Figures 3B and Figure 3—figure supplement 1).

We also noted that it was the same type of sigmoidal nonlinearity which could implement supralinear and sublinear integration depending on the input regime (low background, synchronous spikes: supralinear; high background, asynchronous spikes: sublinear integration, compare Figure 3A and B, inset). This suggests that dendritic integration may adapt to systematic changes in presynaptic statistics, such as those brought about by transitioning between the desynchronized and synchronized states of the neocortex, or sharp waves and theta activity in the hippocampus, without having to change the parameters of its nonlinearity (Borst et al., 2005) (Figure 3—figure supplement 2). Indeed, Gasparini and Magee (2006) demonstrated that dendritic integration in hippocampal pyramidal cells was supralinear when inputs were highly synchronized (as they are during sharp waves), while integration was linear if the input was asynchronous (such as during theta activity).

Nonlinear dendrites are computationally advantageous

While the foregoing analyzes proved that dendritic-like nonlinearities can closely approximate the optimal response in certain limiting cases, they do not address directly whether having such nonlinearities in input integration is crucial for attaining near-optimal computational performance for more realistic input statistics, or simpler forms of input integration could achieve similar computational power. To study this, we considered a scenario in which the presynaptic population consisted of four ensembles, such that neurons belonging to each ensemble underwent synchronized switches in their activity levels which were independent across the four ensembles, while there were also independent fluctuations in the activity of individual presynaptic neurons which were comparable in magnitude to those caused by these synchronized activity switches (Figure 4A). We then assessed the performance of four different variants of a simple dendritic model relative to that of the optimal response (Figure 4B): a model with linear dendrites and soma; a model in which only the soma was nonlinear, and two models in which nonlinearities resided in the dendrites with either random or clustered connectivity between the presynaptic assemblies and the dendritic branches.

We quantified the performance of each of the models based on how closely their output approximated the linear average of the analog presynaptic activities giving rise to the spike trains they were integrating (Figure 4—figure supplement 1, Materials and methods). For a fair comparison, we tuned the parameters of each variant of the dendritic model to obtain the best possible performance with it (Figure 4C). The model with nonlinear dendrites and clustered connectivity had near-optimal cross-validated performance (Figure 4D) while all other models performed significantly worse (n = 20 runs, t = 51, t = 35, t = 20, and P<10-15, P<10-15, P<10-13; respectively from left to right as shown in Figure 4D). This remained true when we varied the number and firing rate of presynaptic neurons over a wide range, and under a diverse set of qualitatively different population-level statistics, determining the dynamics of assembly switchings and within-assembly membrane potential correlations (Figure 4—figure supplement 2).

Taken together, these results demonstrate that the clustering of correlated inputs together with nonlinearities akin to those found in dendrites is necessary to achieve optimal estimation performance in the face of presynaptic correlations. However, in order to be tractable, our dendritic model was mathematically simplified and, as a result, only qualitatively reproduced the nonlinearities of real dendrites. Thus, we directly compared experimentally recorded responses in dendrites to the optimal response.

Nonlinear integration in cortical neurons is matched to their input statistics

A crucial prediction of our theory is that dendritic nonlinearities act to achieve near-optimal responses in a way that the form of the nonlinearity is specifically matched to the long-run statistics of the presynaptic population. We tested this prediction in experiments in which two different types of cortical pyramidal neurons, from layer 2/3 of the neocortex (Figure 5A–E) and from area CA3 of the hippocampus (Figure 5F–J), received patterned dendritic stimulation using two-photon glutamate uncaging, and compared their subthreshold somatic responses with the optimal responses predicted by the theory.

Figure 5 with 2 supplements see all
Nonlinear dendritic integration is matched to presynaptic input statistics.

(A) Sample membrane potential fluctuations (left, adapted from Gentet et al., 2010) and multineuron spiking patterns (right, adapted from Ji and Wilson, 2007) recorded from the neocortex (top), and matched in the model (bottom, see also Tables 13). (B) Two-photon image of a neocortical layer 2/3 pyramidal cell, numbers indicate individual dendritic spines stimulated in the experiment. (C) Responses to trains of seven stimuli using different inter-stimulus intervals (ISI, shown below traces) recorded in the cell shown in (B) (black; mean ± s.d.) and predicted by the optimal response tuned to the presynaptic statistics shown in A (red). Parameters related to postsynaptic dendritic filtering were tuned for the specific dendrite ( Figure 5—figure supplement 1B–C). (D) Dependence of response amplitudes on ISI in the same dendrite shown in B-C (squares), and as predicted by the optimal response (filled circles) or linear integration (empty circles). (E) Average error of fitting dendritic recordings across all dendrites and conditions using the optimal response tuned to different presynaptic statistics (NC, HP, cor2, ind; see text for details) compared to within-data variability (var). Gray lines show individual dendrites. Rightmost bar (NC-AP5) shows fit using NC presynaptic statistics to responses obtained after pharmacological blockade of NMDA receptor activation. (FJ) Same as (AE) for presynaptic patterns characterized by hippocampal sharp waves (F) and recordings from hippocampal CA3 pyramidal cells (HJ) when stimulating synapses on its basal dendrites (G). In vivo data in (F) was adapted from (Ylinen et al., 1995) (left, membrane potential traces, not simultaneously recorded) and (O'Neill et al., 2006) (right, multineuron spike trains). Error bars show s.e.m.


For generating our predictions of the optimal response in these two cell types, we fitted the parameters describing presynaptic statistics in our model, P(u) and P(s|u), to the statistical patterns in the activity of their respective presynaptic populations. For neocortical pyramidal cells, we fitted in vivo data available on the membrane potential fluctuations of layer 2/3 pyramidal cell-pairs in the barrel cortex during quiet wakefulness (Gentet et al., 2010; Crochet et al., 2011) (NC, Figure 5A, Table 2). For hippocampal pyramidal cells, we fitted presynaptic statistics to membrane potential fluctuations (Ylinen et al., 1995; English et al., 2014) and to multineuron spiking patterns of hippocampal pyramidal cells during sharp wave activity (Csicsvari et al., 1999; 2000) (HP, Figure 5F, Table 3). Due to the limitations of available hippocampal data sets, extracellular rather than intracellular data was used for fitting correlations. The motivation for our choice of the particular neocortical and hippocampal states used for fitting presynaptic statistics was two-fold. First, the general network state of the slice preparations in which we tested dendritic integration was likely most analogous to these states (A Gulyás, personal communication; see also Karlocai et al., 2014; Schlingloff et al., 2014), characterized by relatively suppressed neural excitability due to low levels of cholinergic modulation (Harris and Thiele, 2011; Eggermann et al., 2014). Second, the stimulation protocol used in our study (short bursts of synaptic stimuli following longer silent periods) was also most consistent with population activity during hippocampal sharp waves and quiet wakefulness in the cortex. In order to capture variability across the cells we recorded from, the parameters related to postsynaptic dendritic filtering (amplitude and decay of the response to a single stimulation, and the size of the dendritic subunit, Figure 5—figure supplement 1B-C) were tuned for the individual dendrites. Importantly, the parameters describing presynaptic statistics were fitted without regard to our dendritic experimental data, thus allowing a strong test of our predictions about dendritic integration (see Materials and methods).

We found that the non-linear integration of individual spike patterns in cortical neurons was remarkably well fit by the optimal response when it was tuned to the correct presynaptic statistics (Figure 5C,H). The systematic dependence of response amplitudes on the inter-stimulus interval (ISI) in individual cells (Figure 5D,I) was also well predicted by the optimal response. We quantified the quality of match between the predicted and experimentally recorded time course of responses across a population of n = 6 (neocortex) and n = 6 (hippocampus) dendrites under a range of conditions varying ISI or the number of stimuli, and found that the precision of our predictions was not statistically different from that expected from the inherent variability of responses in individual dendrites (Figure 5E,J; neocortex: t = 0.2, P = 0.85; hippocampus: t = 1.85, P = 0.12). In contrast, when the optimal response was tuned to unrealistic presynaptic statistics characterized purely by second-order correlations (cor2), or by a lack of any correlations implying statistically independent presynaptic firing (ind), the quality of fits became significantly worse (Figure 5E,J; neocortex: t = −4.6, P = 0.006 for cor2, and t = −4.9, P = 0.004 for ind; hippocampus: t = −4, P = 0.01 for cor2, and t = −4.9, P = 0.004 for ind).

Moreover, using realistic presynaptic statistics, but matching hippocampal rather than neocortical activities, also resulted in significantly worse fits for neocortical responses (Figures 5E; t= −3.6, P = 0.02). The converse was not observed in the case of hippocampal neurons (Figures 5J; t= 0.43, P = 0.68). This might be because hippocampal neurons also receive neocortical inputs (albeit on their apical not basal dendrites) that show similar population activity patterns to the ones we matched here for the neocortical cells (Isomura et al., 2006), while the primary sensory cortical pyramidal cells we recorded from do not receive direct input from the hippocampus. Nevertheless, when we analyzed the quality of fit between our predictions and recorded responses in hippocampal and neocortical data together, we found a small, but significant interaction between the source of the input statistics (neocortex or hippocampus) and the location of the postsynaptic neurons (ANOVA F = 5, P < 0.05). This suggests that dendritic nonlinearities in cortical pyramidal neurons are specifically tuned to the dynamics of their presynaptic cortical ensembles. Furthermore, the blockade of NMDA receptor activation by AP5 resulted in dendritic responses that afforded substantially poorer fits by the model, even after refitting the postsynaptic parameters (Figure 5E,J, AP5). This indicated that the fine tuning of dendritic nonlinearities to input statistics relied on the action of NMDA receptors.

As dendrites in both of our cortical cell types integrated inputs supralinearly, as a further control, we analyzed similar data available from cerebellar stellate cell dendrites, which are known to integrate their inputs sublinearly (Abrahamsson et al., 2012) (Figure 5—figure supplement 2). In this case, we fitted the statistics of individual presynaptic cells to those of cerebellar granule cells. The correlations between these cells are less known, but we found that assuming simple second-order correlations made the optimal response a close match to dendritic responses. In contrast, the hippocampal- or neocortical-like statistics that were crucial for matching responses in cortical dendrites (Figure 5D,H) resulted in a substantially poorer fit in this cerebellar cell type. This demonstrates a double dissociation in the matching of cortical and subcortical neuron types to cortical and non-cortical input statistics.


We established a functional link between the statistics of the synaptic inputs impinging on the dendritic tree of a neuron and the way those inputs are integrated within the dendritic tree. We first demonstrated that efficient computation in spiking circuits requires nonlinear input integration if the activities of the neurons are correlated and that the structure of the presynaptic correlations determines the form of the optimal input integration. Second, we showed that the optimal response can be efficiently approximated by a canonical biophysically-motivated model of dendritic signal processing both for linearly correlated inputs and for cell-assembly dynamics. Third, we found that nonlinear dendrites with synaptic clustering carry significant benefits for decoding richly structured presynaptic spike trains. Finally, in vitro measurements of dendritic integration in two different types of cortical pyramidal neurons yielded postsynaptic responses that closely matched those predicted to be optimal given the in vivo input statistics of those particular cell types. These results suggest that nonlinear dendrites are essential to decode complex spatio-temporal spike patterns and thus play an important role in network-level computations in neural circuits.

Biophysical substrate

The central insight of our theory is the relationship between presynaptic statistics and postsynaptic response, formalized as the optimal response. The optimal response can be expressed as a set of nonlinear differential equations that requires storing and continuously updating ~N2 variables within the dendritic tree, where N is the number of synapses (Materials and methods). Thus, it is unlikely to be implemented by the postsynaptic neuron as such. Consequently, to demonstrate the biophysical feasibility of our theory, we derived a simple approximation to the optimal response that performs about equally well using just a few postsynaptic variables and that corresponds to a canonical descriptive model of dendritic integration (Poirazi et al., 2003; Poirazi et al., 2003b).

We found that simple second order correlations between presynaptic neurons imply sublinear integration which can be implemented by the saturating nonlinearity characteristic of passive dendrites. Conversely, the biophysical substrate for the type of supralinear integration that was optimal for state-switching dynamics likely involves NMDA receptors because the particular dendritic nonlinearites observed in the cortical cells in which we tested our theory are known to be mediated primarily through NMDA receptor activation (Branco et al., 2010; Makara and Magee, 2013; Major et al., 2013). Indeed, we found that pharmacological inactivation of NMDA receptors abolished the precise match between dendritic integration and presynaptic statistics in these neurons (Figure 5). Moreover, the local plateau potentials generated by NMDA currents have been shown to have graded response durations (Major et al., 2008), and the resulting nonlinearities could be continuously tuned between weaker and stronger forms (boosting and bistability, Major et al., 2013). These properties make NMDA receptor mediated dendritic nonlinearities ideally suited for being matched to presynaptic statistics, as the optimal response involves sustained dendritic depolarisations of varying duration (Figure 4) that depend parametrically on those statistics.

Input statistics and clustering

A central prediction of our theory that awaits confirmation is the existence of a tight relationship between the structure of correlations in the activity of presynaptic cells and the morphological clustering of their synapses on the postsynaptic dendrite. This is because our theory requires nonlinear integration of inputs from neurons with statistically dependent activity, while spikes from independent neurons need to be integrated linearly. Biophysical considerations suggest (Koch, 1999) and experimental data supports (Polsky et al., 2004; Losonczy and Magee, 2006) that, when synchronous, nearby inputs on the same dendritic branch are summed nonlinearly, whereas widely separated inputs are combined linearly. Consequently, our theory predicts that the correlation structure of the inputs will be mapped to the dendritic tree in a way that presynaptic neurons with strongly correlated activities target nearby locations while independent neurons innervate distinct dendritic subunits.

According to our theory, the kind of correlation relevant for determining synaptic clustering is the ‘marginal’ correlations between the membrane potentials of presynaptic neurons. Marginal correlations include both signal and noise correlations (Averbeck et al., 2006) and thus can reach substantial magnitudes even when noise correlations alone are small, as e.g. during desynchronized cortical states (Renart et al., 2010), especially for neurons with overlapping receptive fields (Froudarakis et al., 2014), and when measured between the membrane potentials of neurons rather than their spike counts (Dorn and Ringach, 2003; de la Rocha et al., 2007; Poulet and Petersen, 2008).

At the level of different dendritic regions, the segregation of different input pathways along the dendritic tree of hippocampal neurons supports this prediction (Witter et al., 1989; Druckmann et al., 2014). At the level of individual synapses, the degree and the existence of clustering among inputs showing correlated activity is currently debated. High resolution imaging revealed subcellular topography of sensory inputs in the tadpole visual system (Bollmann and Engert, 2009), clustered patterns of axonal activity in the parallel fibres that provide input to cerebellar Purkinje cells (Wilms and Häusser, 2015), and experience-driven synaptic clustering in the barn owl auditory localization pathway (McBride et al., 2008). Furthermore, it has been demonstrated that neighboring synapses are more likely to be coactive than synapses that are further away from each other in developing hippocampal pyramidal cells (Kleindienst et al., 2011) as well as in hippocampal cultures and in vivo in the barrel cortex during spontaneous activity (Takahashi et al., 2012). These results thus suggest clustering of correlated inputs.

In contrast, an interspersion of differently tuned orientation-, frequency- or whisker-specific synaptic inputs on the same dendritic segments was found in the mouse visual, auditory or somatosensory cortex, respectively, thus challenging the prevalence of synaptic clustering (Jia et al., 2010; Chen et al., 2011; Varga et al., 2011). However, in all these studies the stimuli used were non-naturalistic and varied along a single stimulus dimension only (direction of drifting gratings, pitch of pure tones, or the identity of the single whisker being stimulated), which may account for the apparent lack of clustering. In particular, our theory predicts clustering based on the long-term statistical dependencies between the responses of the presynaptic neurons for naturalistic inputs, which can be quite poorly predicted from their tuning properties for single stimulus dimensions (Harris et al., 2003; Fiser et al., 2004). In contrast, the statistical dependencies relevant for our theory are well represented by those found during spontaneous activity (Berkes et al., 2011). Indeed, studies finding evidence in favor of synaptic clustering analyzed the structure of synaptic input to dendritic branches during spontaneous network activity (McBride et al., 2008; Kleindienst et al., 2011; Makino and Malinow, 2011; Takahashi et al., 2012). Thus, presynaptic correlations for naturalistic stimulus sets may be predictive of synaptic clustering and providing more direct evidence for or against such clustering will offer a crucial test of our theory.

Linear vs. nonlinear postsynaptic computations

Although, in general, we expect single-neuron computations to be nonlinear (Zador, 2000), and our theory indeed applies to nonlinear computations (Figure 1—figure supplement 3), we assumed the postsynaptic computation to be linear for matching experimental data. This choice was justified by two reasons. First, it is difficult to determine, without making strong prior assumptions, what kind of nonlinear function the neuron actually computes; and so the choice of any particular such function would have been arbitrary. Note that even in relatively well-characterized cortical areas (such as the visual cortex) it is unknown how much of the computationally relevant output of individual neurons (such as orientation or direction selectivity) is brought about by specific nonlinearities in the input-output transformations of these neurons, or by multiple steps of feed-forward and recurrent processing carried out at various stages of the visual pathway between the retina and those neurons. Moreover, in some cases, even networks with linear dynamics can provide a remarkably close fit to experimentally observed cortical population dynamics (Hennequin et al., 2014). This issue may be best addressed in systems that are more specialized than the cortex so that there are well-supported hypotheses about the particular nonlinear computations individual neurons need to perform, such as the fly visual system (Single and Borst, 1998) or the mammalian and avian auditory brain stem (Agmon-Snir et al., 1998). In order to test our theory in these systems, in vivo multineural data will need to be collected from the afferent brain areas, preferably in the unanesthetized animal, for characterising the relevant statistical properties of the presynaptic population to which dendritic nonlinearities are adapted according to our prediction.

Second, any nonlinear function can be approximated to high precision by a linear function over a sufficiently limited input range. Currently available experimental techniques for systematically probing dendritic nonlinearities, including those used in our study, only provide data over such a very limited range (0.1% of the number of excitatory inputs impinging a neuron, Megías et al., 2001). Inputs in this small range do not sufficiently engage global nonlinearities brought about by active somatic conductances or global events such as Ca2+ spikes. Thus, we could assume linear computation over this range without loss of generality (Figure 1—figure supplement 2). In fact, from this perspective, it is a non-trivial phenomenon to account for on its own right that stimulating such a small fraction of inputs already leads to observable nonlinearities in the postsynaptic dendrite. By defining the computation to be linear, we could demonstrate that such strong dendritic nonlinearities arise naturally in our theory, entirely due to the correlations in the prior input statistics, thus providing a functional account for this remarkable phenomenon.

Once patterned dendritic stimulation over a broader and more realistic range of inputs becomes feasible, our theory will provide a principled method for dissecting the roles of presynaptic correlations vs. genuine nonlinear computations in shaping dendritic nonlinearities. A sufficiently rich set of such data will allow the fitting of presynaptic parameters, as we did here, followed by fitting postsynaptic transfer functions to dendritic responses without having to make strong prior assumptions about their (linear) nature.

Analog communication, stochastic synaptic transmission and short-term synaptic plasticity

Our formalism was based on the assumption that cortical neurons only influence each other’s membrane potentials via the action potentials they emit. While there exist other, more analog forms of communication, such as the modulation of the effects of action potentials by subthreshold potentials (Clark and Häusser, 2006), the propagation of voltage signals through gap junctions (Vervaeke et al., 2012), and ephaptic interactions between nearby cells (Anastassiou et al., 2011), these either require slow membrane potential dynamics, small distances between interacting cells, or large degrees of population synchrony, and are thus generally believed to have a supplementary role beside spike-based communication (Sengupta et al., 2014). Note that our theory is self-consistent even though it considers spiking only in the presynaptic population and not in the postsynaptic neuron. This is because we assumed that the computationally relevant mapping is that between the membrane potentials of the presynaptic neurons and the postsynaptic cell (Figure 1A, Equation 1), and so, by induction, the spikes of the postsynaptic neuron will effect the mapping from its membrane potential to those of its postsynaptic partners.

We also assumed that presynaptic spikes deterministically and uniformly impact the postsynaptic response, and thus apparently neglected the stochasticity in synaptic transmission, and in particular systematic variations in synaptic efficacy brought about by short-term synaptic plasticity. Nevertheless, these presynaptic features are compatible with our theory. The stochasticity of synaptic transmission, due to a baseline level of synaptic failures, is straight-forward to incorporate in the model by reducing the effective presynaptic firing rate, which can thus be interpreted as a ‘transmission rate’ instead. In fact, we have already done this while matching hippocampal and neocortical presynaptic statistics (Table 1).

Short-term synaptic plasticity, resulting in dynamical changes in synaptic efficacy as a function of the recent spiking history of the presynaptic neuron, is not only a constraint in our framework, but as we have shown in related work, it can act itself as an optimal estimator of the membrane potentials of individual presynaptic neurons (Pfister et al., 2010). Thus, the effects of short-term plasticity can be regarded as a special case of what can be expected from our optimal response: when presynaptic neurons are statistically independent, spikes arriving at different synapses are integrated linearly, and local nonlinearities acting at individual synapses suffice (Figure 1C, see also Materials and methods). However, the importance of nonlinear interactions between inputs from different presynaptic neurons, brought about by dendritic nonlinearities, rapidly increases with the magnitude of presynaptic correlations, especially in large populations (Figure 1D, see also Ujfalussy et al., 2011).

These considerations suggest that short-term synaptic plasticity and dendritic nonlinearities have complementary roles in tuning the postsynaptic response to the statistics of the presynaptic population along the orthogonal dimensions of time and space. The former is useful in the face of temporal correlations private to individual presynaptic neurons (auto-correlations, e.g., brought about by spike frequency adaptation, Pfister and Surace, 2014), while the latter is matched to spatio-temporal correlation patterns present across the presynaptic population.

Inhibitory neurons

We focused on the nonlinear integration of excitatory inputs in the dendritic tree of cortical neurons that have been extensively studied and described over the past decades, giving rise to a strong body of converging evidence as to their characteristics and mechanisms (Spruston, 2008). Recent work studying the nonlinear interaction between inhibitory and excitatory inputs in active dendrites (Gidon and Segev, 2012; Jadi et al., 2012; Müller et al., 2012; Wilson et al., 2012; Lovett-Barron et al., 2012) demonstrated that local inhibition has a powerful control over the excitability of the dendritic tree.

However, it is not yet clear whether these inhibitory inputs are directly involved in the computation performed by the circuit, just as excitatory neurons but with negative signs (Koch et al., 1982), or, alternatively, they may have a more ancillary role in supporting computations carried out primarily by excitatory neurons (Vogels et al., 2011).

Our theory can be extended to include both possibilities, by allowing inhibitory inputs to contribute to the computational function, f(u), with negative weights, or by considering them as providing auxiliary information about the common state of the excitatory presynaptic ensemble, especially when this state is in the more suppressed regime. Indeed, our preliminary results suggest that such an extension of our theory (Ujfalussy and Lengyel, 2013) successfully accounts for the interaction of (excitatory) Schaffer collateral inputs with the feedforward inhibitory effects of the temporo-ammonic pathway (Remondes and Schuman, 2002), likely mediated by interneurons delivering dendritic inhibition (Dvorak-Carbone and Schuman, 1999).

In the present paper we focused on dendritic integration in pyramidal neurons because dendritic nonlinearities have traditionally been more extensively characterized in this cell type, but our theory equally applies to synaptic integration in other types of neurons, including inhibitory interneurons. Therefore, our theory predicts a qualitative similarity of dendritic integration in different neuron types (i.e. interneurons versus principal cells) when they receive inputs from overlapping presynaptic populations. Indeed, it has been recently found that inhibitory interneurons can exhibit dendritic NMDA spikes under certain experimental circumstances (Katona et al., 2011; Chiovini et al., 2014) in addition to standard sublinear integration. The differences between dendritic integration in excitatory and inhibitory neurons could be attributed to their different computational function, f(u), or differences in the specific presynaptic populations innervating them.

Adaptation of dendritic nonlinearities to presynaptic statistics

According to our theory, the optimal response depends on prior information about the input statistics. Consequently, for dendritic processing to approximate the optimal response, this prior information needs to be implicitly encoded in the form of the particular nonlinearity a dendrite expresses. Therefore, our theory predicts an ongoing adaptation of dendritic nonlinearities to presynaptic firing statistics across several time-scales.

First, there is a simple yet potent mechanism implicit in our theory that can ensure that a match of dendritic integration to presynaptic statistics is maintained as those statistics are changing over time. This is based on the observation that essentially instantaneous, albeit probably incomplete, adaptation can occur even without specific changes in the integrative properties of dendrites per se, simply due to the fact that a critical level of input synchrony is required to elicit dendritic spikes, and so the same sigmoid-looking dendritic transfer function can be used as superlinear, linear, or sublinear, depending on which part of its input range is being used (Figure 3—figure supplement 2).

Second, to match the more specific modulation of the statistics of presynaptic activities by global cortical states (Crochet et al., 2011; Mizuseki and Buzsaki, 2014), dendritic integration may also be modulated by these states. As different cortical states are typically accompanied by changes in the neuromodulatory milieu (Hasselmo, 1995; Harris and Thiele, 2011), neuromodulators may be the ideal substrates to ensure that dendritic integration also changes according to the current cortical activity mode. This may provide a functional account of changes in the excitability of the dendritic tree as dynamically regulated by acetylcholine and monoamines (Sjöström et al., 2008).

Third, experience-dependent synaptic plasticity can gradually change the statistics of the presynaptic population activity implying that the optimal form of input integration should also change as a function of experience. We propose that branch-specific forms of plasticity of dendritic excitability (Losonczy et al., 2008; Makara et al., 2009; Müller et al., 2012) may have a functional role in enabling dendrites to adjust the form of input integration to such slowly developing and long-lasting changes in the statistics of their inputs.

Finally, whether inputs from two presynaptic cells are integrated linearly or nonlinearly in a dendrite depends critically on the distance between their synapses within the dendritic tree (Polsky et al., 2004; Losonczy and Magee, 2006). Our theory requires nonlinear integration of inputs from neurons with statistically dependent activity, predicting a mapping of presynaptic correlations on the postsynaptic dendritic tree. Local electrical and biochemical signals can drive synaptic plasticity (Larkum and Nevian, 2008; Govindarajan et al., 2011; Winnubst et al., 2015) and rewiring (DeBello, 2008) leading to synaptic clustering of correlated inputs along the dendritic tree (Kleindienst et al., 2011; Takahashi et al., 2012).

A combination of all these mechanisms may be crucial for achieving and dynamically maintaining, at the level of individual neurons, a detailed matching of dendritic nonlinearities to presynaptic statistics. Thus, our theory provides a novel framework for studying a range of phenomena regarding the dynamical regulation of dendritic nonlinearities from the perspective of circuit-level computations.

Materials and methods

Source code for reproducing the analyses and simulations presented in the paper as well as the experimental data we used for testing our predictions are available online (https://bitbucket.org/bbu20/optimdendr).

Computing the optimal response

In order to study the optimal form of input integration with realistic input statistics, we need to make two important assumptions. First, we need to assume a particular algebraic form for the computation that a neuron performs. Second, we need to define what the relevant presynaptic statistics are, that is, the membrane potential and spiking dynamics of the presynaptic population under naturalistic in vivo circumstances. Given these two assumptions, the theory uniquely defines the optimal response of a neuron to any input pattern. The optimal response has qualitatively similar properties whether computations are defined as mappings between pre- or postsynaptic membrane potentials or firing rates (Figure 1—figure supplement 1).

Throughout the paper the term input refers to the spatio-temporal spiking pattern impinging the neuron while the response of a neuron refers to its (subthreshold) somatic membrane potential (or firing rate, see below). All parameters used in the paper are given in Table 1 or in the caption of the corresponding Figure.

Postsynaptic computation

We assumed the postsynaptic computation to be linear, i.e. the dynamics of the postsynaptic membrane potential v(t) evolves according to a weighted sum of the presynaptic membrane potential values, u(t) (cf. Equation 1):

(5) τpostv˙(t)=-v(t)+i=1Nwiui(t)

where τpost is the time constant of the postsynaptic neuron, N is the number of presynaptic neurons, and wi is the computational weight assigned to presynaptic neuron i. As the postsynaptic neuron cannot access presynaptic membrane potentials, u, directly only the spikes the presynaptic cells emit, s, (Figure 1, Equation 2), the optimal response (that minimizes mean squared error) is the expectation of Equation 5 under the posterior distribution of the presynaptic membrane potential at time t, u(t) given the history of presynaptic spiking up to that time, s(0:t) (cf. Equation 3):

(6) τpostv~˙(t)=-v~(t)+P(u(t)|s(0:t))i=1Nwiui(t) du(t)

Throughout the paper we call the output of Equation 6 the optimal response and compare its behavior to input integration in the dendrites of cortical pyramidal cells.

Table 1 shows the values of the parameters in Equation 5 (N, τpost, and wi) used in the simulations. In short, to illustrate the contributions of inference to Equation 6 (the term including the integral), we used τpost=0 in Figures 1, 3 and 4 as well as in all Supplemental Figures, unless otherwise stated. We used τpost=10 ms in Figure 2 to aid comparison with experimental data and fitted τpost to data for Figure 5. Throughout the paper we used wi=1/N, except in Figure 5 where we fit N and wi=w jointly to the data.

Presynaptic statistics

Computing the posterior, P(u(t)|s(0:t)) in Equation 6 requires a model for the joint membrane potential and spiking statistics of the presynaptic population, P(u,s) (see also Equation 4). For mathematical convenience, we present some of our results below in discrete time with time step size δt, which we will eventually take to zero to derive time-continuous equations. We distinguish discrete and continuous time results by using time as an index versus as an argument of the corresponding time-dependent quantities, e.g. ut vs. u(t).

We describe the joint statistics of presynaptic membrane potentials and spikes by a hierarchical generative model that has three layers of variables (Figure 2—figure supplement 1A,B). The global state of the system is described by a single binary variable, z that switches between a quiescent (-) and an active (+) state following first-order Markovian dynamics (see Appendix for the extension to an arbitrary number of states). The transition rates to the active and quiescent states are given by Ω+ and Ω-, respectively.

The dynamics of (subthreshold) membrane potentials u are modeled as a multivariate Ornstein-Uhlenbeck (mOU) process, which yields random walk-like behavior that (unlike simple Brownian motion) decays exponentially towards a baseline defined by the resting potential u¯, which in turn depends on the momentary global state of the system, zt:

(7) P(ut|ut-δt,zt)=𝒩(ut;(1-δtτ)ut-δt+δtτ u¯(zt),δt Q)

where τ is the presynaptic time constant of the exponential decay, and Q is the ‘process noise’ covariance matrix determining the variance of individual membrane potentials (together with τ) and, importantly, also the correlations between presynaptic neurons. It is straightforward to extend the model by also making these parameters state- (or in fact, neuron-) dependent.

Note that both the state switching and mOU components of this model introduce both spatial and temporal statistical dependencies in the membrane potentials and spike trains of presynaptic cells. In the rest of this paper, we informally refer to any statistical dependency (second or higher order) as ‘correlation’, and we write ‘auto-correlation’ when we refer to the correlations between the membrane potential (firing rate) values of the same neuron at different times, and ‘cross-correlation’ when referring to the correlation between the activities of two different cells (at the same time, or at different times). Also note that temporal and spatial correlations can not be studied in complete isolation in the case of smoothly varying signals, such as membrane potentials, as the cross-correlation between the activity of two presynaptic neurons always has a characteristic temporal profile. While it is possible to consider a presynaptic neuronal population completely lacking spatial correlations (i.e. independent presynaptic neurons, as in Figure 1C), having a population with only spatial but not temporal correlations would require the membrane potentials of the individual neurons to be temporally white noise – which is so far removed from reality that we did not consider this case worth pursuing.

More specifically, the timescale of temporal correlations (auto-correlations) in the model depend on the transition rates of the switching component, Ω+ and Ω-, and the presynaptic time constant of the mOU component, τ, such that cells are auto-correlated as long as τ, Ω+-1, and Ω--1>0. Spatial correlation (cross-correlations between different presynaptic neurons) also emerge from both components. First, the pairs of presynaptic neurons corresponding to the non-zero off-diagonal elements of Q matrix of the mOU component become correlated. Second, synchronous state transitions during state switching in the presynaptic ensemble introduce positive correlations. Importantly, in both the temporal and spatial domains, while the mOU process can only introduce second-order correlations (i.e. it makes membrane potentials be distributed according to a multivariate normal), the switching process introduces higher order correlations (such that membrane potentials are not normally distributed any more). These higher order correlations are stronger when the effect of state-switching is large relative to the membrane potential fluctuations within a single state.

Finally, instead of modeling the detailed dynamics of action potential generation, we model spiking phenomenologically by introducing a single discrete variable, si,t, for each presynaptic neuron that represents the number of spikes neuron i fires in time step t. (Note that in the limit δt0 this variable becomes binary, i.e. there can never be more than one spike fired in a δt time window.) Spiking in each cell only depends on the membrane potential of that cell, and follows an inhomogeneous Poisson process with the firing rate, r, being an exponential function of the membrane potential (Gerstner and Kistler, 2002):

(8) P(st|ut)=i Poisson(si,t;δt ri,t),with ri,t=geβui,t

where β describes how deterministically the cell switches to firing at threshold (u=0) and g is the firing rate at that threshold. We modeled the absolute refractory period by not allowing the generation of spikes (i.e. setting ri,t=0) within a time window of length τrefr following each spike in a cell, regardless of its membrane potential.

The parameters of the presynaptic statistics used in the paper are given in Table 1. Examples of neural dynamics generated by the model are shown in Figures 1, 2, 4, and 5.

Inference and the optimal response

Our goal was to infer the posterior distribution of the membrane potential based on the spiking pattern observed up to time t, P(ut|s0:t).

We first show that linear dendritic integration is sufficient when presynaptic neurons are statistically independent. We start by noting that by marginalising out the past membrane potential history of the presynaptic cells and using Bayes’ rule, the posterior can always be written as

(9) P(ut|s0:t)=P(u0:t|s0:t)du0:t-δt P(s0:t|u0:t)P(u0:t)du0:t-δt

and as the spikes of each neuron are independent from all other neurons conditioned on its own membrane potential history (Equation 8), this can be rewritten as

(10) P(ut|s0:t)iP(si,0:t|ui,0:t)P(u0:t)du0:tδt

In the special case when we assume that presynaptic neurons are statistically independent, i.e. their prior factorizes P(u0:t)=iP(ui,0:t), the posterior also becomes factorised


which in continuous time reads simply as

(13) P(u(t)|s(0:t))= iP(ui(t)|si(0:t))

Thus, taking our usual assumption that the postsynaptic computation is linear (Equation 5), the optimal response in Equation 6 can be written as

(14) τpostv~˙(t)=-v~(t)+i=1Nwiui(t) P(ui(t)|si(0:t))dui(t)

indicating that integration of inputs from different neurons is linear in this case (it is a weighted sum of terms each depending on just a single presynaptic neuron). However, even in this case, note that integration of input spikes from the same presynaptic neuron, i.e. the result of the integral over each ui(t) as a function of si(0:t), is still nonlinear in general (Pfister et al., 2010). Indeed, Equation 14 including these local nonlinearties was used to compute the linear response in Figure 1C–D.

In the general case inference can be performed using filtering such that in each step we update the inferred state of the hidden variables, zt and ut, using information from two different sources: the likelihood of emitting a particular spiking pattern (observation) and the dynamics of the hidden variables combined with the previous estimate (innovation):

(15) P(ut=u,zt=z|s0:t)P(st=s|ut=u)==zP(zt=z|ztδt=z)duP(ut=u|utδt=u,zt=z) P(utδt=u,ztδt=z|s0:tδt)

where the likelihood P(st|ut) is defined by Equation 8, the dynamics of the global state variable P(zt|zt-δt) is first order, Markovian (see above) and the state-dependent membrane potential dynamics P(ut|ut-δt,zt) is given by Equation 7. Equation 15 thus defines a mapping between the posterior distribution of the hidden variables in the previous time step (last term on RHS) and their current distribution (LHS). The posterior over membrane potentials can then be obtained by simply marginalising out the state variable:

(16) P(ut|s0:t)=z P(ut,zt=z|s0:t) 

For the following, it is useful to represent the posterior as a product of two terms:

(17) P(ut=u,zt=z|s0:t)=P(zt=z|s0:t)P(ut=u|zt=z,s0:t)

As the state variable is binary, its posterior is a Bernoulli distribution which we parametrize by ζ, without loss of generality:

(18) P(zt=+|s0:t)=ζt

However, in general, the posterior of the membrane potentials conditioned on the current state, zt, can be arbitrarily complex. To allow an analytical reduction of the inference process, we adopted an assumed density filtering approach in which this distribution is moment-matched in each time step by a multivariate normal distribution which is thus described by two (sets of) parameters, its mean, μt(z), and covariance, Σt(z):

(19) P(ut=u|zt=+,s0:t) 𝒩(u;μt+,Σt+)

with the analogous equation for the posterior of ut conditioned on zt being in the - state.

One advantage of this parametric approach is that inference (filtering) can be implemented by updating only the parameters describing the (approximate) posterior distribution (Equations 18–19): ζt, μt(z), and Σt(z). In the Appendix we derive an analytical form for these parameter updates resulting in the following set of differential equations:


where γ(z) (Γ(z)) is a state-dependent vector (diagonal matrix) of which the elements γi(z)=Γii(z)=geβμi(z)+12β2Σii(z) are the expected firing rates of the neurons in a given state, γ¯(z)=iγi(z) is the expected total population firing rate in state z, and Γ=ζ Γ++(1-ζ)Γ- is the expected firing rate of the cells averaged across states. In these equations, the spike trains of the presynaptic neurons are represented by the sum of Dirac-delta functions in continuous time and are denoted by s(t), to be distinguished from its discrete time analog, st, such that s(t)=limδt0st/δt (see Equation A62 in the Appendix). The differential equations for the conditional mean and variance in the - state, μ˙- and Σ˙-, are analogous to Equations 21–22. The absolute refractory period is taken into account by setting γi=Γii=0 after each observed spike for the duration of the refractory period, τrefr, thus omitting the effect of the likelihood (terms containing γ(z) or Γ(z)) from Equations 20–22.

The first term in Equation 20 captures the decay in ζt that is proportional to the difference in the state conditional firing rates in the absence of presynaptic spikes; the second term expresses the instantaneous change in ζt after observing a spike, proportional to both the state estimation uncertainty, ζ(1-ζ), and the differences in the conditional firing rates (γ+-γ-); and the last term captures the decay of ζt to its steady state in the absence of observations. The filtering equations for the conditional mean and covariance (Equations 21–22) are each composed of three terms: the first term expresses the decay of the variable towards its baseline in the absence of observations; the second term captures the effect of the current observation (i.e. the presence or absence of a spike) on the variable; and the third term describes the changes in the variable caused by potential state transitions. This can be viewed an extension and generalization of earlier work deriving the equivalents of Equations 21–22 for the special case of a single neuron without state-switching dynamics (Pfister et al., 2010).

Another advantage of the parametrization of the posterior we chose is that computing the optimal response, i.e. the posterior expectation of the simple linear functions that we consider in this paper, becomes straightforward (cf. Equation 6):

(23) τpostv~˙(t)=v~(t)+iwi(ζ(t)μi+(t)+(1ζ(t))μi(t))

In order to verify the assumed density filtering approximations used above we numerically integrated the system of differential equations (Equations 20–22) using the software package R (R Core Team, 2012; Soetaert et al., 2010) and compared the results with those obtained using standard particle filters (Doucet et al., 2001). In these simulations we used 500,000 particles to evaluate Equation 15 with 1 neuron and 2 states. Figure 2—figure supplement 1C shows that the results of assumed density filtering are essentially identical to those of particle filtering, confirming that the approximations we used were valid.

Dendritic approximation of the optimal response

Here we first describe a simple canonical model of dendritic integration following Poirazi and Mel (Poirazi and Mel, 2001), and then show that it provides an approximation to the optimal response (Equations 20–23) in the limiting case in which presynaptic dynamics are dominated by simultaneous switching between a quiescent and an active state. In this simple dendritic model, inputs within a branch are integrated linearly:

(24) v˙lin =-𝒜vlin+s(t)-𝒞

where vlin is the variable linearly integrating inputs with weight , dendritic time constant 1/𝒜 and steady state value -𝒞/𝒜 (in the absence of spikes), and s(t)=isi(t) denotes the spike train of presynaptic neurons, collecting all spikes from the presynaptic population. The actual dendritic response, vden, is then given by mapping this linear response through a sigmoidal nonlinearity, scaled to be between vmin and vmax (Figure 3A, inset):

(25) vden(t)=vmin+(vmax-vmin)11+e-vlin(t)

To demonstrate that this reduced model of dendritic integration closely approximates the optimal response, we first note that under appropriate conditions (τpost is small, N is large, Q is diagonal, and β is small relative to the diagonal elements of Q) the dynamics of the optimal response are dominated by the state switching process (Equation 20; see Appendix). Thus, the optimal response essentially follows the inference about the global state variable, ζ, up to linear rescaling and filtering:

(26) v~(t)  u¯-+(u¯+-u¯-)ζ(t)

with u¯+ and u¯- respectively denoting the resting membrane potential in the active and quiescent states. As Equation 26 is linear, all nonlinear interactions, corresponding to dendritic nonlinearities, must be contained in the temporal dynamics of the posterior probability of this global state variable being in the active state, ζ(t), which can be expressed as

(27) ζ ˙ ζ(1-ζ)[B s(t)-C]

where constants B and C depend on the parameters of the presynaptic statistics (see Appendix). Note that the fact that ζ(1-ζ) multiplies Equation 27 expresses the simple intuition that the size of the update to ζ (the posterior probability of z=+) in response to incoming information (presence, B term, or absence of a spike, C term) should be proportional to our current (posterior) uncertainty about z; and since the posterior is a Bernoulli distribution, the uncertainty associated with it is simply ζ(1-ζ).

The solution of Equation 27 can be expressed in a form that is similar, albeit not identical (see below), to the canonical model for dendritic integration (Equations 24–25). This form requires the linear integration of incoming spikes

(28) ν˙=B s(t)-C

and the temporal evolution of ζ is expressed as a sigmoidal function of the linearly integrated inputs ν:

(29) ζ(t)=11+e-ν(t)

Thus, dendrites with sigmoidal nonlinearity are near-optimal when their synaptic inputs switch between a quiescent and an active state.

The main difference between the dendritic integrator and the optimal response is that the dynamics of spike integration imply exponential decay towards a finite baseline in the former (there is a negative term in Equation 24 which is scaled by vlin) and steady decrease towards negative infinity in the latter (the only negative term in Equation 28 is a constant, independent of ν). This is because the approximations we used for deriving Equation 27 were accurate only in the quasi-static case, when the state switching dynamics are infinitely slow. In this case, remote and more recent observations should have identical effects on the current value of ζ as they all correspond to the same underlying state. In the more general case, when state switching occurs with non-zero probability, more remote observations likely correspond to a state which has changed in the meantime, and should thus count for less, such that their effect on the current value of ζ should decay with time – leading to leaky integration of incoming spikes, similar to that in Equation 24.

Simplified neuron models

To compare quantitatively the response of the linear and nonlinear dendrites to the optimal response in a computational task using realistic input statistics, we divided the presynaptic population into four groups (cell assemblies), where neurons within each group were statistically dependent (either through simple second-order correlations, Figure 4—figure supplement 2, or through sharing a common state variable, Figure 4 and Figure 4—figure supplement 2) while neurons from different groups were independent (Figure 4A). In this task we used 4 different versions of the simplified neuron model (Figure 4B):

  • The linear model responded to all incoming spikes with an identical postsynaptic potential (PSPs) characterized by its amplitude (w), decay time constant (τ) and the resting potential (v¯): 

(30) v~˙(t)=v¯-v~τ+w s(t)

where s(t) is the total incoming spike train (as before). This model had three parameters.

  • The model with a somatic nonlinearity had linear dendrites but a nonlinear soma. Motivated by our analytical calculations we used a sigmoidal nonlinearity: 

(31) v~g(t)=ag1+e-βg(v~(t)-θg)-v¯g

where we computed v~(t) as defined in Equation 30 above with w=1 and v¯=0 (as these parameters were interchangeable with βg and θg). This model had five free parameters.

  • The random dendrites model had four nonlinear dendritic subunits, each receiving inputs from a unique set of 10 neurons randomly selected from the four presynaptic assemblies, and integrating these inputs using a sigmoidal nonlinearity (Equations 30-31). Each subunit had its own set of five parameters, and the outputs of the subunits were simply averaged in the soma (without loss of generality), resulting in 20 parameters in total.

  • The clustered dendrites model was similar to the random dendrites model, with the important difference that neurons in each cell assembly selectively targeted a single nonlinear dendritic branch. Since the presynaptic statistics was the same for all four branches, we constrained the parameters of the four dendritic subunits to be identical and this model had only five parameters (the same performance was achieved when we relaxed this constraint and optimised all 20 parameters).

To fit the models we generated 240 s-long samples of presynaptic activity and optimized the parameters of the models to minimize the squared error between the signal (v, the true average of the stimulated presynaptic potentials, cf. Equation 5 with wi=1N) and their estimates (v^, the outputs of the models), averaged over the duration of the sample:

(32) ϵestimation=1Tt=1T(v^t-vt)2

After training, we tested the different models in cross-validation, on a novel 120 s-long input sequence, and quantified their performance by the fraction of variance unexplained, i.e. the temporally averaged squared error, ϵestimation, normalized by the variance of the signal (as a sensible upper limit on the error – achievable by an estimator that predicts the prior mean, ignoring incoming spikes altogether):

(33) ϵ¯estimation=ϵestimationVar[v]

where Var[v]=1T-1t(vt-E[v])2 and E[v]=1Ttvt. (Normalization was unnecessary during training because the parameters of the models that we were optimizing obviously did not influence the variance of the signal.) Figure 4 shows 1-ϵ¯estimation, i.e. the fraction of variance explained as ‘performance’, and Figure 1—figure supplement 3, Figure 3—figure supplement 2, and Figure 4—figure supplement 2 show ϵ¯estimation as ‘estimation error’.

Fitting and predicting experimental data

To predict dendritic integration in hippocampal and neocortical neurons we fitted the parameters describing presynaptic statistics in our model, P(u) and P(s|u), to the statistical patterns in the activity of their respective presynaptic populations.

The basal dendrites of neocortical layer 2/3 pyramidal cells are targeted by neighbouring pyramidal neurons as well as by neurons from layer 4 (Douglas and Martin, 2004). We used in vivo intracellular paired recordings from layer 2/3 pyramidal neurons in the barrel cortex (Poulet and Petersen, 2008; Gentet et al., 2010; Crochet et al., 2011) to set the parameters of our model to reproduce the salient features of the presynaptic population dynamics during quiet wakefulness (Tables 12).

In the hippocampal experiments we stimulated synapses on the proximal dendrites of CA3 neurons targeted by recurrent collaterals of neighbouring pyramidal cells (Andersen et al., 2007). We fitted the presynaptic statistics to in vivo population activity patterns recorded from the hippocampus during quiet wakefulness, characterised by sharp wave (SPW) activity (Csicsvari et al., 2000). As intracellular recordings from CA3 pyramidal neurons during SPW activity in the awake animal are not available, we fitted the presynaptic statistics to awake extracellular data from CA3 (Csicsvari et al., 2000; Grosmark et al., 2012) and intracellular (Ylinen et al., 1995; English et al., 2014) data from CA1 pyramidal neurons (Tables 1 and 3).

Comparing the optimal response to dendritic integration

We used four different parameter sets (models) to describe the activity of the presynaptic population (Figure 1). The parameters of the HP and NC models were fitted to in vivo recordings from the corresponding presynaptic populations as described above. As a control, we used two simpler models with no state switching dynamics. The cor2 model had correlated membrane potential fluctuations with all cross-correlations between presynaptic neurons being the same, -1N-1ρ1. Neurons in the last model, ind, had independent membrane potential fluctuations. The HP, NC, and ind models had no free parameters, while parameter ρ of the cor2 model was left free and later tuned to fit dendritic data. Note that to fit supralinear cortical responses, ρ had to be tuned to unnaturally large negative values in this model (Figure 5—figure supplement 1A) – and it still produced significantly poorer fits than the HP and NC models (Figure 5).

After setting the parameters of the presynaptic population, we computed the optimal response by numerically integrating Equations 20–22 in the software package R (R Core Team, 2012; Soetaert et al., 2010). When comparing the optimal response to experimental data, we assumed that each uncaging event corresponded to a single spike at the presynaptic axon terminal, and spines not showing measurable gluEPSP were considered to be non-stimulated. For all four presynaptic parameter sets, we varied two postsynaptic parameters to fit the responses of the optimal estimator to dendritic integration data, i.e. the somatic membrane potential traces recorded in our in vitro experiments. These two parameters were the weight w of the presynaptic neurons and the time constant of the postsynaptic filtering, τpost (Equation 5). To avoid overfitting, we assumed that all presynaptic neurons had equal weight, i.e. i wi=w. A final free parameter that we had to consider was the number of synapses that were in the same functional cluster as the synapses we stimulated in our experiments – where the term ‘functional cluster’ refers to a set of synapses for which the presynaptic cells are correlated. This parameter was irrelevant for the ind model (by definition), it was fixed at 20 for the cor2 model (because its effects on the optimal response were largely indistinguishable from that of varying ρ, see above), and it was tuned to fit dendritic integration for the state-switching models (NC and HP). In sum, the number of free parameters used to fit dendritic integration data was 2 for the ind model and 3 for the cor2, NC, and HP models. We confirmed that the higher number of free parameters in the latter models did not result in an unfair advantage in fitting performance by using Bayesian Information Criterion (BIC), rather than squared error (see below), as our measure of performance. BIC includes an explicit term penalizing the number of parameters, and our results were not qualitatively affected by it: fitting the model using the relevant in vivo statistics resulted in 3500 ± 940 (NC, mean ± s.d.) and 2000 ± 1100 (HP) higher BIC scores than when using independent statistics (where each unit of BIC difference corresponds to a likelihood that is higher by a factor of e2.71).

We fitted each recorded neuron independently using these parameters by minimizing the mean squared error between the predicted, v~, and the recorded postsynaptic membrane potential (averaged across repetitions of the same stimulation in the same cell), v¯*:

(34) ϵfitting=1Tt=1T(v~t-v¯t*)2

To be able to compare results across different neurons and different stimulation protocols, we normalized the error by the total variance of the data:

(35) ϵ¯fitting=ϵfittingVar[v¯*]

where Var[v¯*]=1T-1t(v¯t*-E[v¯*])2 and E[v¯*]=1Ttv¯t*. A natural lower bound of our fitting error was the intrinsic variability of the data, so we computed the mean of the variance of the experimental data across repetitions, normalized by the total variance of the data:

(36) ϵ¯min=1L-1l=1L1Tt(vt,l*-v¯t*)2Var[v¯*]

where vt,l* is the raw data before averaging across repetitions, L is the number of repetitions using the same stimulation protocol in the same cell and v¯t*=1LLvt,l*. Figures 5D,H and Figure 5—figure supplement 1G show ϵ¯fitting as the ‘fitting error’ and ϵ¯min as ‘var’. The best fitting parameter values for the postsynaptic time constant, τpost, and total number of neurons in a functional cluster, N, are shown in Figure 5—figure supplement 1B,C.

Experimental methods


Slice preparation and electrophysiology

Acute sagittal brain slices (300 μm) incorporating both visual and somatosensory cortex were prepared from 3–6 week-old Sprague-Dawley rats as previously described (Sjöström and Häusser, 2006) and in accordance with institutional and national guidelines. Experiments were carried out at 32°C–35°C in artificial cerebrospinal fluid (ACSF) containing (in mM): NaCl 125, KCl 2.5, glucose 25, NaH2PO4 1.25, NaHCO3 25, MgCl2 1, CaCl2 2 (pH 7.3 when bubbled with 95% O2 and 5% CO2). Somatic whole-cell recordings were obtained with a Multiclamp 700B amplifier (Molecular Devices, Sunnyvale, CA), and data was acquired at 50 kHz using custom-written software in Matlab 7.2 (Mathworks, Natick, MA) interfacing with an ITC-18 A/D board (Instrutech, Holliston, MA). Patch pipettes had a resistance of 3–6 MΩ when filled with a solution containing (in mM): KMeSO4 130, HEPES 10, KCl 7, MgATP 2, Na2ATP 2, Na2GTP 0.3, EGTA 0.05 (pH 7.2) and Rseries was <30 MΩ. For visualization of cell morphology Alexa Fluor 594 (100 μM; Invitrogen, Carlsbad, CA) was added to the internal solution.

Two-Photon imaging and uncaging

Simultaneous 2-photon imaging and uncaging was performed using a dual galvanometer-based scanning system (Prairie Technologies, Middleton, WI) using two Ti:sapphire pulsed lasers (MaiTai, Spectra-Physics, Santa Clara, CA), one tuned to 840 nm for imaging cell morphology, and another tuned to 720 nm for photolysis of MNI-caged-L-glutamate. Neurons were visualized using an Olympus BX51WI objective (60x, 0.9 NA; Olympus, Melville, NY). Two-photon glutamate uncaging was carried out based on previously published methods (Gasparini and Magee, 2006; Losonczy and Magee, 2006; Branco and Häusser, 2011). MNI-caged-L-glutamate (12 mM, Tocris Cookson, UK) was dissolved in (in mM): NaCl 125, KCl 2.5, HEPES 10, CaCl2 2, MgCl2 1, glucose 25, and puffed locally. To block NMDA receptors (Figure 5E,J, blue squares), 500 μM D-AP5 was included in the glutamate puffing pipette.

Stimulation and data analysis

A short burst of presynaptic uncaging events equally spaced in time with inter stimulus interval (ISI) between 1 and 20 ms was applied on 1–7 visually identified dendritic spines. For each dendritic branch either the number of stimuli or the ISI was varied. Uncaging exposure time was 100–500 μs and the inter-trial interval was 10 s. All data was acquired using custom written software in Matlab 7.2 (Mathworks). The original data recorded at 50 kHz were averaged across identical trials, filtered with a Gaussian kernel with σ=0.2 ms and subsampled at 2 kHz for analyses. All spines were responsive in the neocortical experiments.


Slice preparation and electrophysiology

Adult male Sprague-Dawley rats (8–12 week-old) were used to prepare transverse slices (400 μm) from the hippocampus similarly to that described previously (Losonczy and Magee, 2006), according to methods approved by the Janelia Farm Institutional Animal Care and Use Committee and the Animal Care and Use Committee (ACUC) of the Institute of Experimental Medicine, Hungarian Academy of Sciences, and in accordance with 86/609/EEC/2 and DIRECTIVE 2010/63/EU Directives of the EU. Experiments were carried out at 33°C–35°C in artificial cerebrospinal fluid (ACSF) containing (in mM): NaCl 125, KCl 3, glucose 25, NaH2PO4 1.25, NaHCO3 25, MgCl2 1, CaCl2 1.3, Na-pyruvate 3, and ascorbic acid 1 saturated with 95% O2 and 5% CO2. Somatic whole-cell recordings were obtained with BVC-700 amplifier (Dagan, Minneapolis, MN) in the active ‘bridge’ mode, filtered at 3 kHz and data was acquired at 50 kHz. Patch pipettes had a resistance of 2–6 MΩ when filled with a solution containing (in mM): K-gluconate 120, KCl 20, HEPES 10, NaCl 4, Mg2ATP 4, Tris2GTP 0.3, phosphocreatine 14, complemented with 100 μM Alexa Fluor 488 (Invitrogen-Molecular Probes, Eugene, OR) and 0.1–0.3% biocytin (Sigma), pH=7.25 and Rseries was <30 MΩ. Alexa Fluor 488 fluorescence or biocytin labeling with immunoperoxidase reaction was used for post hoc verification of the localization of neurons along the proximodistal axis of CA3. All CA3 neurons included in this study had resting membrane potentials between −62 and −72 mV. Cells were hyperpolarized when necessary to avoid action potential firing during synaptic stimulation.

Two-Photon imaging and uncaging

A dual galvanometer based two photon scanning system (Prairie Technologies, Middleton, WI) equipped with an Olympus BX-61 microscope (60X, 0.9 NA objective) was used to image Alexa 488-loaded neurons and to uncage glutamate at individual dendritic spines as described (Losonczy and Magee, 2006; Makara and Magee, 2013). Two ultrafast pulsed laser beams (Chameleon Ultra II; Coherent, Auburn, CA, USA) were used, one at 920 nm for imaging Alexa 488 and the other at 720 nm to photolyze MNI-caged-L-glutamate (Tocris Cookson, Ballwin, MO, USA; 10 mM applied through a pipette above the slice). Laser beam intensity was independently controlled with electro-optical modulators (Model 350–50, Conoptics, Danbury, CT, USA). Unitary gluEPSP amplitude and rise time was close to that of mEPSPs as measured by sucrose application at dendritic segments 70–168 μm from the soma as described before (Magee and Cook, 2000; Makara and Magee, 2013). To standardize these experiments, results were included in the analysis only if 1) at least 65% of the selected spines were responsive (see below), 2) the average amplitude of the successful unitary gluEPSPs was 0.2–0.6 mV and maximum unitary gluEPSP amplitude was 1.2 mV, 3) at least 5 mV expected amplitude was achieved, and 4) unitary responses were stabile with repeated stimulation. To block NMDA receptors 50 μM AP5 was added to both bath solution and the puffing pipette solution.

Stimulation and data analysis

Dendritic branches on basal dendritic segments 100–160 μm from the soma were stimulated by synchronous uncaging of MNI-glutamate at a spatially clustered set (1–32) of visually identified spines using 0.2 ms uncaging duration with different intervals (in the range of 0.1–5 ms) between synapses. For each dendritic branch either the number of stimuli (control) or the ISI (NMDA block) was varied. To detect non-responsive spines, we fitted the individual responses with a double-exponential function:

(37) ϕ(t)=ϕ0(e-(t-t0)/τ1-e-(t-t0)/τ2)

Spines were classified as non-responsive if the rise time of the gluEPSP (τ2) was slower than 15 ms or its start time (t0) was more than 5 ms (relative to the stimulation time). The original data recorded at 50 kHz were averaged across identical trials, filtered with a Gaussian kernel with σ=0.2 ms and subsampled at 2 kHz for analyses.


A Derivation of the optimal response

Here we derive the filtering equations for the switching multivariate OU (mOU) process introduced in the main text, such that we generalise the two-state model presented there to an arbitrary number of hidden states.

A.1 The generative model of presynaptic activities

The generative model of presynaptic activities has two hidden and one visible layers (Figure 2—figure supplement 1). The state of the system is described by a global discrete state variable, z, and its dynamics is characterised by the state transition matrix, Ω:

(A1) P(zt=z|zt-δt=z') =δzz'(1-δtΩ¯z')+(1-δzz')δtΩzz'

where Ωzz' is the rate of transitioning from state z' to z, and

Ω¯z'=zz'Ωzz', Ωzz'0,δt1maxz' Ω¯z'

and δzz' is the Kronecker delta function. This description becomes identical to the state switching dynamics presented in the main text in the case of a binary state variable, z{-,+}.

The dynamics of the membrane potentials u are modeled as a mOU process where the resting potential, u¯(zt), the time constant of the neurons, τ(zt), and the covariance of the process noise, Q(zt), may depend on the current state, zt:

(A2) P(ut|ut-δt,zt)=𝒩(ut;(1-δtτ(zt)) ut-δt+δtτ(zt) u¯(zt),δt Q(zt))

For simplicity, in the main text we assumed that only the resting potential depends on the state variable.

The observations are spike counts, s, which are given by an inhomogeneous Poisson process for each neuron independently from the other neurons given its own membrane potential:

(A3) P(st|ut)=i Poisson (st,i; δt g eβ ut,i)

where g is the baseline firing rate and β is a parameter controlling the nonlinearity of the spiking.

A.2 The inference problem

Our goal is to infer the posterior distribution of the current membrane potentials based on the spiking pattern observed up to time t:

(A4) P(ut|s0:t)=ztP(ut|zt,s0:t)P(zt|s0:t)
(A5) =ztP(ut,zt|s0:t)

Inference of the value of the hidden variables requires information from two different sources: the likelihood of emitting a particular spiking pattern (observation) and the dynamics of the hidden variables combined with the previous estimate (innovation).

(A6) P(ut=u,zt=z|s0:t)P(st=s|ut=u)fs(δt,u): observationzP(zt=z|ztδt=z)P(ztδt=z|s0:tδt)duP(ut=u|utδt=u,zt=z)P(utδt=u|ztδt=z,s0:tδt)h(δt,u,z)=P(ut=u,zt=z|s0:tδt): innovation

where the likelihood P(st|ut) is defined by Equation A3, the dynamics of the global state variable P(zt|zt-δt) is defined by Equation A1 and the state-dependent membrane potential dynamics P(ut=u|ut-δt=u',zt=z) is given by Equation A2. Equation A6 is the filtering equation, analogous to Equation 15 in the main text. Note that both the observation fs(δt,u) and the innovation h(δt,u,z) terms depend on δt, the time step used in the inference.

In general, computing Equation A6 is intractable so we take an assumed density filtering approach in which we approximate the posterior in each time step with a particular parametric form and derive how the parameters of this approximation change due to Equation A6. Specifically, we represent the posterior using the following factorisation (without approximation):

(A7) P(ut=u,zt=z|s0:t)=P(zt=z|s0:t)P(ut=u|zt=z,s0:t)

The first term of Equation A7 is the posterior over the current value of the discrete hidden state variable, zt, which is a discrete distribution that can be parametrised by a single (vector) parameter, ζt (still without approximation):

(A8) P(zt=z|s0:t)=ζt(z)

The second term of Equation A7 is the distribution of the membrane potentials conditioned on the current state, which we approximate by a multivariate Gaussian with parameters μt(z) and Σt(z) respectively describing the conditional mean and covariance of the presynaptic membrane potentials:

(A9) P(ut=u|zt=z,s0:t)𝒩(u;μt(z),Σt(z))

In sum, this approximate posterior has three sets of parameters that fully characterise it: ζt, μt(z), and Σt(z). In Sections A.3-A.6 we derive the update dynamics for these parameters when observing arbitrary spiking patterns.

For notational convenience, we also introduce a further quantity, which can be derived from these parameters: the expected firing rate of the cells in each hidden state, either written as a vector γt(z) or as a diagonal matrix Γt(z), whose elements are

(A10) γt,i(z)=Γt,ii(z)=P(ut,i=u|zt=z,s0:t)geβudu=geβμt,i(z)+12 β2Σt,ii(z)

A.3 Parameter updates

The advantage of describing the elements of the posterior in a parametric form is that we only need to update the parameters of Equations A8-A9. As the parameter updates are derived using Equation A6, in which the innovation and the observation terms depended on δt, the parameter updates will also depend on δt. Eventually, to derive continuous-time dynamics, we will take the limit of δt0. Incidentally, the Gaussian approximation in Equation A9 is also more accurate with small δt.

We rewrite the parameters of the approximate posterior by using auxiliary variables (this also makes it explicit how the update rule depends on δt):


where the auxiliaries are defined as:


where fs(δt,u) and h(δt,u,z) were defined in Equation A6.

A.4 First-order approximations to parameter updates

Our goal is to derive differential equations for the parameter updates. Therefore we will assume that δt is sufficiently small (i.e. δt0) to use a first order approximation of Equations A11-A13 around δt=0:




In particular, in order to compute Equations A19-A21 we will need the value of h(δt,u,z) and h(δt,u,z) with δt=0 (see Section A.8.1):


Having derived the general form of the parameter updates we need to apply it to the possible observations, i.e. derive how observing a particular presynaptic spiking pattern at time t changes the posterior distribution over the state variable, P(zt=z|s0:t) and the presynaptic membrane potentials, P(ut=u|zt=z,s0:t). In the limit of δt0 we consider only two alternative outcomes of the observation process: the case of observing zero or one spike in a single time bin.

A.5 Observation: No spikes

We first calculate the likelihood (sec. A.5.1) and the auxiliaries (sec. A.5.2) with st=0 and then derive the corresponding parameter updates in sec. A.5.3.

A.5.1 Likelihood

(A28)f0(δt,u)=P(st=0|ut=u)(A29)=iPoisson(0;δtgeβui)(A30)=eδtgieβui(A31) f0(δt=0,u)=1(A32)f0(δt,u)=(gieβui)f0(δt,u)(A33)f0(δt=0,u)=gieβui

A.5.2 Auxiliaries

Here we describe the values of the auxiliaries required to compute Equations A19-A21 with the likelihood function given above:


The following quantities have been derived in Ujfalussy et al., 2011:


where γ¯t(z)=iγt,i(z) is the expected population firing rate in state z.

The derivation of the following auxiliaries is given in Section A.8.1:


A.5.3 Parameter updates

Now we are ready to substitute Equation A34-A43 into Equation A19-A21 to derive the parameter updates.

State probabilities

(A44) ζt(z)(δt)=a0(0,z)b0(0)+δta0'(0,z)b0(0)-a0(0,z)b0'(0)b02(0)=ζt-δt(z)+δt[-ζt-δt(z)γ¯t-δt(z)+z'ζt-δt(z') [(1-δzz')Ωzz'-δzz'Ω¯z']--ζt-δt(z)(z''-ζt-δt(z'')γ¯t-δt(z'')+z',z''ζt-δt(z')[(1-δz''z')Ωz''z'-δz''z'Ω¯z'])]=ζt-δt(z)+δt[-ζt-δt(z)(γ¯t-δt(z)-z'ζt-δt(z')γ¯t-δt(z'))++z'ζt-δt(z')[(1-δzz')Ωzz'-δzz'Ω¯z']-ζt-δt(z)z',z''ζt-δt(z')[(1-δz''z') Ωz''z'-δz''z'Ω¯z']]

However, since  Ω¯z'=z''Ωz''z' the last terms in Equation A44 cancel each other and we can write that

(A45) ζt(z)(δt)=ζtδt(z)+δt[ζtδt(z)(γ¯tδt(z)γ¯tδt)+zζtδt(z)[(1δzz)ΩzzδzzΩ¯z]]

where denotes averaging over z', and so γ¯t= z'ζt-δt(z') γ¯t(z') is the average expected population firing rate. In Equation A45 two different processes are changing the posterior state probabilities. First, changes in ζ(z) are proportional to the difference between the total firing rate in state z, γ¯(z), and the average firing rate, γ¯. This process causes a decay in the posterior probability of those states in which the expected firing rate is high and thus incompatible with the current observation (no spikes). Second, posterior state probabilities are changing according to the prior state transition dynamics captured by Ω.

Posterior mean

(A46) μt(z)(δt)m0(0,z)a0(0,z)+δtm0(0,z)a0(0,z)m0(0,z)a0(0,z)a02(0,z)=ζtδt(z)μtδt(z)ζtδt(z)+δt1ζtδt(z)2[(ζtδt(z)[γ¯tδt(z)μtδt(z)+βΣtδt(z)γtδt(z)]+======+zζtδt(z)[(1δzz)ΩzzδzzΩ¯z]μtδt(z)+ζtδt(z)u¯(z)μtδt(z)τ(z))ζtδt(z)====ζtδt(z)μtδt(z)(ζtδt(z)γ¯tδt(z)+zζtδt(z)[(1δzz)ΩzzδzzΩ¯z])]=μtδt(z)+δt[u¯(z)μtδt(z)τ(z)βΣtδt(z)γtδt(z)+zζtδt(z)ζtδt(z)(μtδt(z)μtδt(z))[(1δzz)ΩzzδzzΩ¯z]]=μtδt(z)+δt[u¯(z)μtδt(z)τ(z)βΣtδt(z)γtδt(z)+zzΩzzζtδt(z)ζtδt(z)(μtδt(z)μtδt(z))]

The evolution of the conditional mean is governed by three factors. The first term in the bracket describes the decay of the posterior mean towards the prior mean with time constant τ(z). The second term captures the effect of the observation (absence of spikes) where the decrease in the posterior mean is proportional to the posterior covariance Σ(z) (because when more uncertain a priori, the observation should have a larger impact) and the expected firing rate γ(z) (because a higher expected firing rate is more incompatible with the absence of spikes and thus requires a larger correction). This term is omitted in the absence of observations, i.e. during the absolute refractory period. Finally, the third term expresses the effect of a possible transition from each of the other states z' to state z which is proportional to the product of the prior transition rate Ωzz' and the relative probabilities of the two states (i.e. the relative proportion of the probability mass coming from state z against that already existing in state z), and the difference between the respective posterior means (as the more different these are, the larger the bias contributed by the probability mass coming from state z is).

Posterior covariance

(A47) Ξt(z)(δt)=Ξt(z)(0)+δtΞt(z)(0)=C0(0,z)a0(0,z)+δtC0(0,z)a0(0,z)C0(0,z)a0(0,z)a02(0,z)=ζtδt(z)(Σtδt(z)+μtδt(z)(μtδt(z))T)ζtδt(z)+==+δt1ζtδt(z)2[ζtδt(z)2(γ¯tδt(z)[μtδt(z)(μtδt(z))T+Σtδt(z)]+2βμtδt(z)(γtδt(z))TΣtδt(z)+β2Σtδt(z)Γtδt(z)Σtδt(z))+====+ζtδt(z)zζtδt(z)[(1δzz)ΩzzδzzΩ¯z](μtδt(z)(μtδt(z))T+Σtδt(z))====+ζtδt(z)2[(Q(z)2τ(z)Σtδt(z))+1τ(z)μtδt(z)(u¯(z)μtδt(z))T+1τ(z)(u¯(z)μtδt(z))(μtδt(z))T]====ζtδt(z)(Σtδt(z)+μtδt(z)(μtδt(z))T)(ζtδt(z)γ¯tδt(z)+zζtδt(z)[(1δzz)ΩzzδzzΩ¯z])]=Σtδt(z)+μtδt(z)(μtδt(z))T+δt[2βμtδt(z)(γtδt(z))TΣtδt(z)β2Σtδt(z)Γtδt(z)Σtδt(z)+====+zζtδt(z)ζtδt(z)[(1δzz)ΩzzδzzΩ¯z](μtδt(z)(μtδt(z))T+Σtδt(z))+====+(Q(z)2τ(z)Σtδt(z))+1τ(z)μtδt(z)(u¯(z)μtδt(z))T+1τ(z)(u¯(z)μtδt(z))(μtδt(z))T====(Σtδt(z)+μtδt(z)(μtδt(z))T)(zζtδt(z)ζtδt(z)[(1δzz)ΩzzδzzΩ¯z])]=Σtδt(z)+μtδt(z)(μtδt(z))T+δt[Q(z)2τ(z)Σtδt(z)β2Σtδt(z)Γtδt(z)Σtδt(z)+====+zζtδt(z)ζtδt(z)[(1δzz)ΩzzδzzΩ¯z](μtδt(z)(μtδt(z))Tμtδt(z)(μtδt(z))T+Σtδt(z)Σtδt(z))+====+1τ(z)μtδt(z)(u¯(z)μtδt(z))T+1τ(z)(u¯(z)μtδt(z))(μtδt(z))T2βμtδt(z)(γtδt(z))TΣtδt(z)]

And finally, in order to compute the covariance, we will need:

(A48) μt(z)(μt(z))T=μtδt(z)(μtδt(z))T+δt[u¯(z)μtδt(z)τ(z)(μtδt(z))T+μtδt(z)(u¯(z)μtδt(z)τ(z))T2βΣtδt(z)γtδt(z)(μtδt(z))T+====zζtδt(z)ζtδt(z)(μtδt(z)μtδt(z))(μtδt(z))T[(1δzz)ΩzzδzzΩ¯z]+====zζtδt(z)ζtδt(z)μtδt(z)(μtδt(z)μtδt(z))T[(1δzz)ΩzzδzzΩ¯z]]+O(δt2)

Now we substitute Equation A47-A48 into Equation A14 to obtain the linear form for the change of the covariance matrix:

(A49) Σt(z)=Σtδt(z)+δt[Q(z)2τ(z)Σtδt(z)β2Σtδt(z)Γtδt(z)Σtδt(z)+====+zζtδt(z)ζtδt(z)[(1δzz)ΩzzδzzΩ¯z](Σtδt(z)Σtδt(z))====+zζtδt(z)ζtδt(z)[(1δzz)ΩzzδzzΩ¯z](μtδt(z)μtδt(z))(μtδt(z)μtδt(z))T]=Σtδt(z)+δt[Q(z)2τ(z)Σtδt(z)β2Σtδt(z)Γtδt(z)Σtδt(z)+====+zzΩzzζtδt(z)ζtδt(z)(Σtδt(z)Σtδt(z))+zzΩzzζtδt(z)ζtδt(z)(μtδt(z)μtδt(z))(μtδt(z)μtδt(z))T]

The evolution of the posterior covariance is governed by the same three factors as the evolution of the conditional mean. The first is the decay of the posterior covariance towards the prior covariance τ(z)2Q(z) with time constant τ(z)2. (Note that this decay is twice faster than that of the mean.) The second term captures the effect of the observation (absence of spikes) such that the decrease in the posterior covariance is proportional to the square of the covariance Σ(z) and the expected firing rate Γ(z) (because this observation is more informative, i.e. it causes a larger surprise, when the expected firing rate is higher). Finally, the last term expresses the effect of possible hidden transitions between state z and state z' on the posterior covariance (which is again proportional to the relative proportion of the probability masses and the differences between the two states).

A.6 Observation: cell i emits a spike

Now we assume that cell i emits a spike while all other neurons remain silent in the current time step. Note that as δt0 the probability of having multiple presynaptic neurons spiking in the same time bin, or the same cell firing twice, converges to zero.

A.6.1 Likelihood

In this case the current spiking pattern is denoted by st=s^(i), that is s^j(i)=δij and the likelihood is given by


The derivative of the likelihood is


In the following, we follow the same steps as in the previous section, but replacing f0(0,u) and f0(0,u) with fs^(i)(0,u) and fs^(i)(0,u), respectively.

A.6.2 Auxiliaries

Again, the goal is to derive differential equations for the parameter updates. As the number of time bins with spikes does not grow with δt0 we only need to compute terms up to zeroth order (cf. Equations A19-A21). However, as the likelihood (Equation A51) is 0 when δt=0, the values of the auxiliaries in Equations A54-A56 are also 0 (Equations A15-A18), and so we shall use l’Hôpital’s rule to deal with the δt0 limit:

(A54) ζt(z)(δt)ζt(z)(0)=ast(0,z)bst(0)=ast'(0,z)bst'(0)
(A55) μt(z)(δt)μt(z)(0)=mst(0,z)ast(0,z)=mst'(0,z)ast'(0,z)
(A56) Ξt(z)(δt)=Ξt(z)(0)=Cst(0,z)ast(0,z)=Cst'(0,z)ast'(0,z)

Substituting Equation A53 into Equations A22-A25 gives the following values for the corresponding derivatives (for details see Ujfalussy et al., 2011):


A.6.3 Parameter updates

Finally, by substituting Equations A57-A60 back into Equations A54-A56, we can derive the parameter updates in the case of observing a spike.

State probabilities

(A61) ζt(z)(δt=0)=as^(i)'(0,z)z'as^(i)'(0,z')=ζt-δt(z)γt-δt,i(z)z'ζt-δt(z')γt-δt,i(z')= ζt-δt(z)+ζt-δt(z)(γt-δt,i(z)-γt-δt,i)γt-δt,i

where γt,i=z'ζt-δt(z')γt,i(z') is the average (over states) expected firing rate of cell i. Equation A61 shows that the posterior state probabilities change instantaneously after observing a spike, and the change is proportional to the normalised difference between the expected firing rate of cell i in state z and its state-averaged expected firing rate. Therefore if state z is more compatible with the observed spiking pattern than state z', i.e. γi(z)>γi(z'), then the posterior probability of being in state z, ζ(z), will increase while ζ(z') will decrease.

Equation A61 can be rewritten into vector form as follows:


which in turn can be rewritten in differential equation form by taking the δt0 limit:

(A62) ζ(z).=ζ(z)s(t)Γ-1(γ(z)-γ)

where s(t)=limδt0st/δt is the sum of Dirac-delta functions representing the presynaptic spike trains.

Posterior mean

(A63) μt(z)(δt=0)=ms^(i)'(0,z)as^(i)'(0,z)=μt-δt(z)+βΣt-δt(z)st

Thus, the posterior conditional means of the presynaptic membrane potentials also change instantaneously after observing a spike in the presynaptic population. The change is proportional to the posterior variance in the case of the cell that emitted the spike while for other presynaptic neurons the change is proportional to the posterior covariance between the given cell and the neuron that emitted the spike. Note that, similar to the change in the posterior state probabilities, the change of the mean is instantaneous and it does not depend on the time step.

The differential equation form of Equation A63 reads as

(A64) μt(z)μtδt(z)δt=βΣtδt(z)stδtμ˙(z)=βΣtδt(z)s(t)

where s(t) is, again, the sum of Dirac-delta functions representing the presynaptic spike trains.

Posterior covariance


Thus, Equation A66 indicates that the posterior covariance does not change directly after observing a spike – only indirectly, through the increase in the posterior mean (Equation A63) and thus Γ (Equation A10), which in turn decreases the covariance in the silent period following the spike (Equation A49).

A.7 Differential equations

Here we summarize the results of Equations A45,A46,A49, and Equation A61-A66 as differential equations in continuous time:


Once again note that in the continuous time limit the presynaptic spike trains are represented by the sum of Dirac-delta functions and we denote it with s(t) instead of st.

In the case of two states (+ and -), we can write Equation A67 as

(A70) ζ+˙=ζ+(1ζ+)(γ¯+γ¯)+ζ+(1ζ+)s(t)TΓ1(γ+γ)+(1ζ+)Ω+ζ+Ω

where we emphasise that the observation of either a spike or the absence of a spike causes changes in the state probabilities that are proportional to their uncertainty, ζ+(1-ζ+).

A.8 Derivations of auxiliaries

A.8.1 Derivation of Equations A26-A27

After substituting Equations A1-A2 and Equation A9 into Equation A6 we can derive the δt dependence of the innovation term:

(A71) h(δt,u,z)=P(ut=u,zt=z|s0:t-δt)=z'P(zt=z|zt-δt=z')P(zt-δt=z'|s0:t-δt)du'P(ut=u|ut-δt=u',zt=z)P(ut-δt=u'|zt-δt=z',s0:t-δt)=z'ζt-δt(z')[δzz'(1-δtΩ¯z')+(1-δzz')δtΩzz']du'𝒩(u;(1-δtτ(z))u'+δtτ(z)u¯(z),δtQ(z))𝒩(u';μt-δt(z'),Σt-δt(z'))=z'ζt-δt(z')[δzz'(1-δtΩ¯z')+(1-δzz')δtΩzz']11-δtτ(z)du'𝒩(u';u-δtτ(z)u¯(z)1-δtτ(z),δt(1-δtτ(z))2Q(z))𝒩(u';μt-δt(z'),Σt-δt(z'))=z'ζt-δt(z')[δzz'(1-δtΩ¯z')+(1-δzz')δtΩzz']11-δtτ(z)·𝒩(u-δtτ(z)u¯(z)1-δtτ(z);μt-δt(z'),δt(1-δtτ(z))2Q(z)+Σt-δt(z'))=z'ζt-δt(z')[δzz'(1-δtΩ¯z')+(1-δzz')δtΩzz']·𝒩(u;(1-δtτ(z))μt-δt(z')+δtτ(z)u¯(z),δt Q(z)+(1-δtτ(z))2Σt-δt(z'))

Substituting δt=0 into Equation A71 yields Equation A26.

Next, we take the derivative of Equation A71 with respect to δt:

(A72) h(δt,u,z)=zζtδt(z)[(1δzz)ΩzzδzzΩ¯z]==N(u;(1δtτ(z))μtδt(z)+δtτ(z)u¯(z),δtQ(z)+(1δtτ(z))2Σtδt(z))+==+ζtδt(z)[δzz(1δtΩ¯z)+(1δzz)δtΩzz]==[N(u;μtδt(z)+δtu¯(z)μtδt(z )τ(z),Σtδt(z)+δt(Q(z)2τ(z)Σtδt(z))+δt2Σtδt(z)τ(z)2)]

where the derivative of the last term is provided below in general form.


Substituting δt=0 into Equation A72 after using Equation A74 to express its last term yields Equation A27.

A.8.2 Derivation for Equations A41-A43

The above result can be used to compute terms appearing in the integrals of Equations A22-A25 which are involved in computing Equations A41-A43

[N(u;a0+δta1,B0+δtB1+δt2B2)]δt=0du==12N(u;a0,B0)[Tr(B01B1)2a1TB01(ua0)(ua0)TB01B1B01(ua0)]du=(A75)=12[Tr(B01B1)Tr(B01B1B01B0)]=0u[N(u;a0+δta1,B0+δtB1+δt2B2)]δt=0du=(A76)=12N(u;a0,B0)u[Tr(B01B1)2a1TB01(ua0)(ua0)TB01B1B01(ua0)]du==12[a0Tr(B01B1)2(N(u;a0,B0)ua1TB01uua1TB01a0du)(A77)===N(u;a0,B0)(uuTB01B1B01u2ua0TB01B1B01u+ua0TB01B1B01a0)du]=12[a0Tr(B01B1)2(B01a1(B0+a0a0T)a0a1TB01a0) =====a0Tr(B01B1)B0B01B1B01a0(B0+a0a0T)B01B1B01a0+(A78)=====+2(B0+a0a0T)B01B1B01a0a0a0TB01B1B01a0](A79)=a1u[N(u;a0+δta1,B0+δtB1+δt2B2)]|δt=0uTdu(A80)=12N(u;a0,B0)u[Tr(B01B1)2a1TB01(ua0)(ua0)TB01B1B01(ua0)]uTdu =12[(B0+a0a0T)Tr(B01B1) ===2N(u;a0,B0)(ua1TB01uuTua1TB01a0uT)du(A81) ===N(u;a0,B0)(uuTB01B1B01uuT2ua0TB01B1B01uuT+ua0TB01B1B01a0uT)]=12[(B0+a0a0T)Tr(B01B1)===2(a0a1TB01(B0+a0a0T)+(B0+a0a0T)B01a1a0T+a1TB01a0(B0a0a0T)a1TB01a0(B0+a0a0T))===(2(B0+a0a0T)B01B1B01(B0+a0a0T)+a0TB01B1B01a0(B0a0a0T)+(B0+a0a0T)Tr(B01B1))+===+2(a0a0TB01B1B01(B0+a0a0T)+(B0+a0a0T)B01B1B01a0a0T+a0TB01B1B01a0(B0a0a0T))(A82)===a0TB01B1B01a0(B0+a0a0T)](A83)=12[2(a0a1T+a1a0T)2B1]=B1+a0a1T+a1a0T

B The optimality of the sigmoidal nonlinearity

Neurons need to combine incoming spikes nonlinearly to perform efficient analog computations based on digital spikes. In the main text, and the previous sections of this Appendix, we derived the optimal way of combining incoming spikes (Equations 20-22), but the exact implementation of this optimal nonlinear mapping within the dendritic tree is unrealistic. A plausible alternative would be if each dendrite branch acted as a simple linear-nonlinear unit, computing a nonlinear function of the linear combination of its linearly filtered input spike trains (Poirazi et al., 2003), but it is unclear whether this could efficiently approximate the optimal response, and if so then what is the specific functional form of the nonlinearity that is able to make this approximation tight. In the main paper, we demonstrated that a sigmoidal nonlinearity approximates the optimal response remarkably well if the dynamics of the presynaptic population is dominated by simultaneous switching between a quiescent and an active state. In this section we provide additional details for this derivation.

B.1 Simplified presynaptic dynamics

In the following derivations, for convenience, we assume that the weights of the linear computation are roughly uniform and scale inversely with N, wi~1/N, although our derivations also remain valid with other scalings of wi as long as the number of weights significantly greater than 0 scales with N. We start by noting that the optimal response can always be trivially rewritten as requiring the inference of the state, z(t), and conditioned on the state, the average presynaptic membrane potential directly, m(t)=iwiui(t)=1/N iui(t) (instead of the individual membrane potentials, as in Equations 3 and 6):


where, for simplicity, we took τpost=0, although it is not essential for our argument – which instead applies to the total input when τpost0. Note that we do not generally use this seemingly simpler form because the likelihood describing the spike generation process (Equation 8) becomes more complex, and notably non-factorised, when conditioned on m(t) instead of u(t). Nevertheless, it is instructive to write down the prior over m (cf. Equation 7):

(A86) P(mt|mt-δt,zt=z)=𝒩(mt;(1-δtτ) mt-δt+δtτm¯(z),δtq¯2)

where m¯(z)=1NiNu¯i(z) is the mean and q¯2=1N2ijQij is the process noise variance defined in terms of the parameters of the original mOU process. In the following sections, we derive simplified forms of inference, that can be readily related to dendritic processing, by taking different limits to the original model, and in particular Equation A86.

B.2 Sigmoidal nonlinearity with state switching dynamics

We start by assuming that the presynaptic neurons are equally variable and conditionally independent given the population state (Qi,j=q2δij). It is easy to see that in this case q¯2=q2/N and thus the prior variance of m (which is q¯2τ/2, as can be seen from Equation A86) also scales as 1/N. Thus, for a large presynaptic population, N, the prior variance of m diminishes which means that the posterior inference is also greatly simplified because there is no uncertainty any more about m given the population state, i.e. P(m(t)|z(t)=z,s(0:t))=δ(m(t)-m¯(z)), and so the optimal response (Equations A85) for two population states simplifies to

(A87) v~(t)=ζ(t)m¯++1-ζ(t)m¯-

Although it seems that the only quantity that needs updating in Equation A87 is ζ(t), in general this still requires keeping track of the membrane potentials of individual presynaptic neurons because the dynamics of ζ(t) (Equation 20) depends on the individual expected firing rates (conditioned on each population state). Therefore, to allow further simplification, we assume that all expected firing rates within a state are identical and constant, equal to their prior value, i.e. γi+γ+=geβu¯++12β2q2τ/2, and so the population rate is simply γ¯+Nγ+ (with corresponding definitions for γ- and γ¯-). With these assumptions the dynamics of the posterior estimate of population state reduces to

(A88) ζ˙=Ω+-(Ω++Ω-)ζA+γ+-γ-ζγ++(1-ζ)γ-B(ζ)ζ(1-ζ)s(t)-(γ¯+-γ¯-)Cζ(1-ζ)

As we do not track the membrane potential fluctuations of the individual neurons in this reduced model, we also replaced the s(t) vector by the total spike train of the population, s(t)=isi(t).

Amongst the parameters of Equation A88 only C scales with N, therefore the last term dominates Equation A88 during silent periods. Conversely, the second term (involving B) will dominate during spikes. Thus, the first term (involving A) can be ignored. Finally, we note that

(A89) B(ζ)=1ζ+γ-γ+-γ-=1ζ+1eβ(u¯+-u¯-)-1

and therefore B(ζ) is nearly constant if the difference between γ+ and γ- is small (i.e. βIn2u¯+-u¯-), in which case

(A90) Beβ(u¯+-u¯-)1

These considerations allow us to write the approximate form of Equation A88 as (cf. Equation 27):

(A91) ζ˙=ζ(1-ζ)[Bs(t)-C]

Solving Equation A91 for ζ(t) is possible by noting that it is separable, and so the solution can be written as

(A92) 1ζ(1-ζ)dζ=[Bs(t)-C]dt

where solving the integral on the left yields

(A93) In (ζ1-ζ)+χ=[Bs(t)-C]dt

In general, the value of χ would need to be determined by solving the initial value problem, but for simplicity and without loss of generality here we take χ=0 and assume that ν(0) has the appropriate value to map to the required ζ(0).

From Equation A93, ζ can be expressed as

(A95) ζ=11+e-[Bs(t)-C]dt

which in turn can be rewritten as

(A96) ζ=11+e-ν

where we introduced the new variable ν=[Bs(t)-C]dt whose definition can be equally written as a differential equation:

(A97) ν˙=Bs(t)-C

The system defined by Equations A96-A97 has a form that is closely analogous to that of the standard linear-nonlinear model of dendritic processing with a logistic sigmoid nonlinearity, and is presented in the main text (Methods) as Equations 28-29.

B.3 Sublinear integration with second order correlations

We also considered another limiting case when the dynamics of the presynaptic population was fully characterized by purely second-order correlations between the neurons. This means that there is no state switching dynamics, so inference is simplified by taking ζ=1, but we also cannot assume (as we did in the previous section) that the posterior correlations and thus the posterior variance of m vanishes in the large N limit. Instead, we seek a formalism in which the population activity of N cells, represented by the variables {u,s}, is captured by the lower dimensional dynamics of {m,s} (where s(t)=isi(t) is still the ‘total’ input spike train, as before). Under this simplified model, the prior dynamics of m is as described by Equation A86, and the likelihood is analogous to the one we had for individual neurons (Equation 8):

(A98) P(st|mt)=Poisson(st;δt rt),  with rt=g¯ eβ¯ ut

As the likelihoods of the two models can not be exactly matched, we chose the parameters β¯ and g¯ to match the mean and the variance of the population firing rate. Thus, there is now a single variable that characterizes the whole population, m, whose dynamics (both the prior and likelihood) are analogous to those we had for a single membrane potential in the previous model. Consequently, inference in this collapsed model requires the manipulation of only two scalar variables – instead of the original 𝒪(N2) –, the posterior mean and variance of m, μ¯ and σ¯2 respectively, whose dynamics can be derived analogously to those obtained before for the full population (Equations 21-22):


where, in line with previous notation, γ¯=g¯eβ¯μ¯+12β¯2σ¯2 is the posterior mean estimate of the average firing rate of the presynaptic population. (These equations only apply to the special case when wi=1/N and Qij=δijq+(1-δij)ρq, but an extension to the more general case is relatively straightforward.) We used this reduced model to infer the mean membrane potential of the presynaptic population – which is relevant in the special case when wi=1/N and τpost=0 (see above) so that v~=μ¯. Figure 3—figure supplement 1A–C demonstrates that our reduced model approximates the full model remarkably well provided that there are substantial correlations in the presynaptic population.

Next, we noted that during inference the dynamics of the posterior variance is faster than the dynamics of the posterior mean (Equations A99-A100) and therefore the posterior variance can be approximated by its (mean-dependent) steady state value, σ¯2(μ¯). If we replace the posterior variance in Equation A100 with its steady state value we obtain the following approximate one-dimensional dynamics:

(A101) μ¯˙m¯-μ¯τ+β¯σ¯2(μ¯) (s(t)-g¯eβ¯μ¯+12β¯2σ¯2(μ¯))

As the optimal response to a single spike is proportional to the posterior variance, the functional form of the steady state is highly informative about the nonlinearity of the integration. Figure 3—figure supplement 1D shows that σ¯2(μ¯) is a decreasing sigmoidal function, thus responses to consecutive spikes will add sublinearly in this case, as each spike increases μ¯, which in turn reduces σ¯2, which results in a smaller response for the next spike.

Expressing Equation A101 directly in the form of linear-nonlinear dynamics, as in the case of switching dynamics, did not seem feasible, so we focussed instead on establishing a correspondence only at the times of presynaptic spikes, when s(t)>0. This required that there exists an instantaneous mapping h(vlin) such that

(A102) μ¯=h(vlin)

where vlin is the linearly filtered and integrated inputs, whose dynamics vlin˙ is given by Equation 24. Differentiating both sides of Equation A102 wrt. time, and substituting Equation A101 and Equation 24, respectively, to each side yields

(A103) β¯σ¯2(h(vlin))=h'(vlin)

after only keeping the dominating terms including s(t) on both sides and subsequently simplifying by s(t).

Although there was no closed-form analytical solution for h(vlin) from Equation A103, it could be obtained by numerical integration (Figure 3—figure supplement 1E, blue line). It is clear that the optimal mapping h is indeed sublinear in this case and it can be reasonably well approximated using the concave half of the logistic sigmoid function derived above, for the switching dynamics, even during silent periods (Figure 3—figure supplement 1E, red line versus black dots; Figure 3). There is a simple intuition for the sublinearity implied by h (following on from the intuition we gave for the sublinearity of the response after Equation A101): according to Equation A103, its derivative is proportional to the steady state posterior variance, which in turn is a non-negative monotonically decreasing function (Figure 3—figure supplement 1D), and hence the final non-linearity must be monotonically increasing and, importantly, concave – i.e. sublinear.


  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
    The Hippocampus Book
    1. P Andersen
    2. R Morris
    3. D Amaral
    4. T Bliss
    5. J O’Keefe
    Oxford University Press.
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
    Adaptation without parameter change: dynamic gain control in motion detection
    1. A Borst
    2. VL Flanagin
    3. H Sompolinsky
    Proceedings of the National Academy of Sciences of the United States of America 102:6172–6176.
  11. 11
  12. 12
  13. 13
  14. 14
    Membrane potential and firing rate in cat primary visual cortex
    1. M Carandini
    2. D Ferster
    Journal of Neuroscience 20:470–484.
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
    Fast network oscillations in the hippocampal CA1 region of the behaving rat
    1. J Csicsvari
    2. H Hirase
    3. A Czurkó
    4. A Mamiya
    5. G Buzsáki
    Journal of Neuroscience 19:RC20.
  21. 21
  22. 22
    Theoretical Neuroscience
    1. P Dayan
    2. LF Abbott
    The MIT press.
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
    Patterned activity in stratum lacunosum moleculare inhibits CA1 pyramidal neuron firing
    1. H Dvorak-Carbone
    2. EM Schuman
    Journal of Neurophysiology 82:3213–.3222.
  31. 31
  32. 32
  33. 33
  34. 34
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
    Retinal ganglion cells: a functional interpretation of dendritic morphology
    1. C Koch
    2. T Poggio
    3. V Torres
    Philosophical Transactions of the Royal Society B: Biological Sciences 298:227–263.
  59. 59
    Biophysics of Computation
    1. C Koch
    Oxford University Press.
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
    Translation-invariant orientation tuning in visual "complex" cells could derive from intradendritic computations
    1. BW Mel
    2. DL Ruderman
    3. KA Archie
    Journal of Neuroscience 18:4325–.4334.
  77. 77
  78. 78
    Theta oscillations decrease spike synchrony in the hippocampus and entorhinal cortex
    1. K Mizuseki
    2. G Buzsaki
    Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 369, 10.1098/rstb.2012.0530.
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
  84. 84
  85. 85
  86. 86
    Short-term facilitation as a normative consequence of presynaptic spike-rate adaptation
    1. J-P Pfister
    2. SC Surace
    In Cosyne Abstracts.
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
    R: A Language and Environment for Statistical Computing
    1. R Core Team
    Austria: R Foundation for Statistical Computing, Vienna.
  95. 95
  96. 96
  97. 97
    Parallel distributed processing, volume 1: explorations in the microstructure of cognition: foundations, MIT Press
    1. DE Rumelhart
    2. JL McClelland
    3. PDP Research Group
    Parallel distributed processing, volume 1: explorations in the microstructure of cognition: foundations, MIT Press.
  98. 98
  99. 99
  100. 100
  101. 101
    Simple models for reading neuronal population codes
    1. HS Seung
    2. H Sompolinsky
    Proceedings of the National Academy of Sciences of the United States of America 90:10749–10753.
  102. 102
    The variable discharge of cortical neurons: implications for connectivity, computation, and information coding
    1. MN Shadlen
    2. WT Newsome
    Journal of Neuroscience 18:3870–.3896.
  103. 103
  104. 104
    Neuronal arithmetic
    1. RA Silver
    Nature Reviews. Neuroscience 11:474–489.
  105. 105
  106. 106
  107. 107
  108. 108
  109. 109
  110. 110
  111. 111
  112. 112
  113. 113
  114. 114
  115. 115
    Advances in Neural Information Processing Systems 24
    1. BB Ujfalussy
    2. M Lengyel
    1188–1196, Active dendrites: adaptation to spike-based communication, Advances in Neural Information Processing Systems 24.
  116. 116
    Feed-forward inhibition in hippocampal microcircuits: adaptation to spike-based computation.
    1. BB Ujfalussy
    2. M Lengyel
    In Cosyne Abstracts.
  117. 117
    Dendritic coding of multiple sensory inputs in single cortical neurons in vivo
    1. Z Varga
    2. H Jia
    3. B Sakmann
    4. A Konnerth
    Proceedings of the National Academy of Sciences of the United States of America 108:15420–15425.
  118. 118
  119. 119
  120. 120
  121. 121
  122. 122
  123. 123
  124. 124
  125. 125
    Sharp wave-associated high-frequency oscillation (200 hz) in the intact hippocampus: network and intracellular mechanisms
    1. A Ylinen
    2. A Bragin
    3. Z Nádasdy
    4. G Jandó
    5. I Szabó
    6. A Sik
    7. G Buzsáki
    Journal of Neuroscience 15:30–46.
  126. 126

Decision letter

  1. Frances K Skinner
    Reviewing Editor; University Health Network, Canada

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

[Editors’ note: a previous version of this study was rejected after peer review, but the authors submitted for reconsideration. The first decision letter after peer review is shown below.]

Thank you for choosing to send your work entitled "Dendritic nonlinearities are tuned for efficient spike-based computations in cortical circuits" for consideration at eLife. Your full submission has been evaluated by Eve Marder (Senior editor) and three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the decision was reached after very extensive discussions between the reviewers. Based on our discussions and the individual reviews below, we have decided not to consider the present manuscript further at this time. However, we would be willing to consider a significantly revised resubmission (below).

While the reviewers all felt that the work was quite interesting, concern was raised in the ensuing discussions as to the unclear presentation of the essence of the work. In fact, the reviewers spent a considerable time discussing what they thought was the structure of the model, and what it means. The fact that the reviewers went back and forth over these issues indicates a lack of clarity in the Results section. Therefore, it was decided that the work needed significant revisions and significant effort would be required to make the work accessible to the general reader, not only in making the take home message clear, but also in explaining how and why detailed mathematical derivations are needed and used.

Overall, two major aspects require consideration, as summarized below with some suggestions. Additional detailed comments from the reviewers are appended below.

1) A clearer and simpler version of the paper is needed.

A clear line through the reasoning that will allow everyone to see the core of what has been done, and why, needs to be presented. Specifically, the paper needs to be transparent and math that obfuscates should be removed or simplified or explained. The authors are encouraged to present their essential results in a much more straightforward fashion to help elucidate their mathematical derivations.

A suggestion is to use a flow chart of the process/protocol (capturing presynaptic firing statistics, fitting, optimizing etc., and lay bare the limitations/assumptions – e.g., the output computation f(u) is assumed to be a linear function), describe the essential results in words, and point to the mathematical parts in methods/or a mathematical appendix of how and why. We understand and appreciate that you wish to publish the full mathematical treatment, but most eLife readers won't benefit from the math in its present form. The mathematical results could be made complete and readable in an Appendix, while freeing the Results to show the logic of the model and how it works. It looks complicated because of using the most general formulations (multi-state state transitions, and multi-variate normal with arbitrary state-dependent covariance), but the main results are obtained via somewhat simpler formulations. More detailed mathematical derivations could go into a mathematical appendix if necessary.

In this way, people (of diverse backgrounds) would be able to sink their teeth into the contribution, build on it, test it and so on.

2) Take-home message(s) and biological/biophysical intuition needs to be presented.

No biophysical model seems to be included, even simply. Everything seems to be folded into the optimal g(s), so that it is not obvious how best to make a biological/biophysical connection.

The derived 'optimal traces' do, in principle, bring a more stringent test to the theory, but in Figure 4 they have access to the whole pre-synaptic dynamics so the optimal trace can be replaced by the averaged pre-synaptic potential, while in Figure 5 the experimental pattern of stimulation is not sufficiently complex to really probe the detailed temporal structure predicted by the theory. The other figures support qualitative statements. This, for example, could be explicitly stated and rationalized for the reader.

Some aspects came through but because of the first point above, there was not always a clear consensus of what the authors intended as the main message(s) from the work. Could these intents (if appropriate) be better explained?

For example:i) Presynaptic neurons with correlated inputs should target neighboring regions of the dendritic tree so that the inputs can sum nonlinearly while presynaptic neurons with uncorrelated inputs should map onto different portions of the tree and the inputs sum linearly?ii) An optimizing aspect/principle that could ultimately predict the 'location structure' of synapses as a crucial test of their theory.iii) Optimally decoding the information present in the spikes to do the desired computation depends upon the correlationsiv) Addressing the question of why there are nonlinearities in the first place.v) An optimal way to recover the averaged presynaptic potential only from the pre-synaptic spike trains? Can the biophysics approximate this optimal 'decoding' operation?

Reviewer #1:

In this most interesting study, the authors propose a principle that dendritic nonlinearity is optimized to integrate synaptic inputs from the statistics of the presynaptic cells. This principle assumes that communication is mainly via spikes whereas computation is not 'punctate', thus identifying the bottleneck of dendritic processing. They go on to use models (clustered connectivity prediction) and experiment to show that this principle is followed.

The authors may have uncovered a fundamental principle, and as they state in the Discussion, "Once patterned dendritic stimulation over a broader and more realistic range of inputs becomes feasible, our theory will provide a principled method for dissecting the roles of presynaptic correlations vs. genuine nonlinear computations in shaping dendritic nonlinearities." Nice I think.

I have a few comments.

1) The authors could state that graded synaptic transmission exists (many examples in invertebrates), so that their principle may not apply there, I assume. Is it definitively known that graded synaptic transmission is not present in cortical circuits?

2) It would be helpful if the authors could expand on the discussion regarding biophysical substrate. That is, to fill in the blanks on how and why "NMDA spikes may provide a general solution". Was some modeling (not shown) done to be able to state this? Please provide some additional discussion/rationale/assumptions for making these statements.

3) Regarding inhibition, the authors discuss potential addition from the perspective of excitatory cells. However, perspectives from inhibitory cells are not expressed. Do the authors expect their principle to hold from the perspective of inhibitory cells (receiving excitatory and/or inhibitory inputs)? The authors should provide some discussion/rationale/assumptions of why this is not included/considered. I do not think it is a given that computation is only from the perspective of excitatory cells? For example, consider the reviews by Klausberger and Somogyi (Science 2008) and Chamberland and Topolnik (Frontiers in Neuroscience 2012).

Reviewer #2:

Nonlinear dendritic integration is often seen as a thorn in the foot. Following in vitro work by Larkum, Magee, Stuart, Hausser and others, it is now clear that dendrites introduce various types of nonlinearity to input integration. Further observations showed physiological patterns that appeared to tune specifically the dendritic nonlinearities. Computations that are carried out in the dendrites (as in Taylor et al. 2000) were then discovered. Thus, it has become increasingly difficult to ignore their computational role. The search for a theoretical role (mainly by Mel and coworkers), on the other hand, was mainly focused on rate-based computations. Yet, rate and spiking descriptions are often qualitatively different (as in plasticity or synfire chains). Understanding the key mechanisms for spike-based dendritic computations remains a fundamental problem, but a very difficult one.

In this article, Ujfalussy and coworkers address the hypothesis that nonlinear dendritic integration serves to decode the correlated pre-synaptic spike trains. They describe mainly three observations that back up this hypothesis. First, they show mathematically that the canonical model of nonlinear integration of Poirazi and Mel (P&M) approximates an optimal decoding of the switching between active and quiescent states. Second, they compared how different biophysically detailed models approximated an optimal decoding strategy. They found that synaptic input clustered on nonlinear dendrites performed better at this task than when the same input was lumped in the soma, or synapses randomly assigned to dendrites. Last, they show that in vitro measurements of dendritic integration in a specific cell type matched the optimal integration strategy expected to decode the input statistics of that particular cell in vivo. These are the three main observations, but a particular strength of this article is that it draws a clear top-down rhetoric and provides general estimates of (bayes) optimal spiking communication.

Although I don't subscribe to all the assumptions described here, they are sensible and worth a careful examination. Ujfalussy and coworkers offer a thorough and elegant treatment of the question relying substantially on most of the state-of-the-art experimental recordings relevant to this problem. The conclusions are well tested, often with very stringent criteria. The relevant literature is properly cited. The article is generally well written.

The main contributions will appeal to researchers coming from different approaches. The mathematical neuroscientist will respect the relation between the P&M model and optimal decoding. Those considering decoding spike train from a bayesian perspective will check the derivations in the supplementary material. Others coming from the literature on noise-correlation will appreciate that, with dendrites, some of the reasonings should be revised. Finally, the new theoretical role for dendritic nonlinearities can clearly be tested with present experimental techniques. For these reasons, I believe that even if the supposed role is later shown to be false or incomplete, the article will have had a strong impact on the community.

Reviewer #2 (Minor Comments):

Generally, confusion arises in the number of models considered. I think that, when a comparison between models is the main result, these different models should be mentioned at the beginning of the section. For instance in the last paragraph of the section “The form of the optimal nonlinearity depends on the statistics of presynaptic inputs” one suddenly learns of the actual comparisons being made.

On a first reading, I could not understand what was done in the first paragraph of the section” Nonlinear integration in cortical neurons is matched to their input statistics”. After reading the supplementary, it is clear, but I think a few sentences should be added to describe in a little more details what was actually compared. For instance, one can read from the first sentence that to test the theory, we need to measure presynaptic statistics and derive the optimal nonlinearity, then measure the nonlinearity in vitro and compare the nonlinearities. That is not what was done, I know now, but I could not rule that out from the paragraph as is. It is also difficult to understand the end of second paragraph of the same section on the first go.

What is the experimental stimulus protocol used in the uncaging experiments? How broad is the distribution of ISI? Evenly distributed on Figure 5c and g? Are each burst made of evenly spaced pre-synaptic encaging? Can you really say overfitting is small?

Why exactly the multi-state description in Eq. 11 and 19 if it is never used? Personally, it is only when I saw Eq. S67 that I finally reached the proper mathematical intuitions. Other than the need for simplicity, I am concerned that some of the conclusions would not hold in the multi-state scenario suggested by Eq. 11.

Figure 1: 1B is argued to offer a better fit to optimal than 1C. But there are two differences, correlation and scale. Could it be that in the uncorrelated case, the fit to optimal is not as good when the potential fluctuates on greater amplitudes? In fact, I don't understand why the uncorrelated case should not have a saturating dendritic nonlinearity to counteract the exponential spiking nonlinearity."

Reviewer #3:

The authors present a detailed mathematical analysis, computations and experimental results aimed at supporting the conclusion that dendritic nonlinearities used to integrate synaptic inputs in a neuron are optimized based upon the statistics of that neuron's presynaptic connections. In particular, the suggested result is that if a neuron's presynaptic connections are uncorrelated, sublinear responses are optimal. If, however, the firing of presynaptic neurons are correlated, supralinear responses are optimal for that neuron. If correct, I would find this conclusion to be interesting.

One of the main difficulties with the manuscript is that the details are not easily approachable. There are many pages of complicated mathematics that one must work through, and in many places there hasn't been much attempt to try to explain things in simpler terms to the general reader – I suspect that one will not be able to follow the details here unless one is already an expert.

While the major focus on presynaptic activity is in terms of spikes (and I will grant that a good explanation for this is given), I think it's a concern is that the focus on postsynaptic activity is in terms of subthreshold responses. Granted, the authors say that they can also perform their calculation if the firing rate is substituted for the subthreshold response. This seems like an inconsistency, however – if two or more neurons share synaptic inputs from overlapping sets of presynaptic neurons, and the correlations in the spiking patterns of the presynaptic neurons matter, why is it that correlations in the output spikes of those two or more neurons don't matter? This would seem to be a basic internal inconsistency in the manuscript.

In addition, while there is a huge amount of detail associated with the mathematical calculations, I found the details associated with the experiments to be a bit too sparse. It would have been useful, for example, to see an image of the neocortical pyramidal cell and the specific synapses which are being stimulated.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for submitting your work entitled "Dendritic nonlinearities are tuned for efficient spike-based computations in cortical circuits" for peer review at eLife. Your submission has been favorably evaluated by Eve Marder (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.

The reviewers have discussed the reviews with one another and the Reviewing editor has drafted this decision to help you prepare a revised submission.

All of the reviewers are strongly positive about this work but they also want to ensure that the work is accessible, understandable and widely appreciated. They would like all aspects be made as clear as possible. Thus, pending the following revisions, we expect to accept the paper.

Please respond directly to the revisions on a point-by-point basis, revising the paper where needed.

1) Double check all equations in the mathematical appendix – there was some concern given potential sign errors in other equations (see below). Additional explanatory guidance through the mathematical steps should be provided for those readers who may be interested in following these details. The math is still very complicated.

2) Check and correct potential inconsistencies related to the description of the function s(t). In the subsection “Presynaptic statistics” it is still stated that in the limit in which the time step goes to zero the discrete version becomes binary, i.e., zero or 1. In the subsection “Inference and the optimal response”, however it's stated that s(t) is a sum of Dirac-delta functions. The two descriptions can't be consistent, of course, unless there is a division by a time step somewhere. There does not appear to be a careful definition in the manuscript, however.

3) Check the solution of equation (22) given in (23) and (24) for potential sign errors.

4) The additional details about the experiments were appreciated. However, a question arose about the blocking of NMDA channels. It's stated that substantially poorer fits were obtained for this case. Please confirm/clarify whether the postsynaptic filtering parameters were re-fit in this case?

5) It seems like a bit of an inconsistency that while a big point is made that presynaptic neurons must communicate with spikes, no spiking is considered in the postsynaptic neurons. If spiking is the only way that neurons can communicate (the authors' point), why should one care about the subthreshold response of the postsynaptic neuron? Since so much of the manuscript covers the subthreshold response, something explicit (explanation-wise) should be said about this in the manuscript (and we realize that they consider the case of firing rates).

6) Include a 'toy model' explanation (without math, or if possible, with straightforward enough math equations that most can follow) that summarizes the link between pairwise correlation (temporal) and dendritic integration (spatial) before any specific statistical models are discussed. This would help the reader follow the more detailed case, and avoid confusion in consideration of later statements in the paper. A good place for this might be at the beginning of the Results section.

Specifically, the way the text reads is confusing about when spatial correlations (for lack of a better term) between different presynaptic neurons are being discussed, and when temporal correlations between inputs are being considered. Part of the issue may be that the mathematical formulation in the main text completely hides issues of temporal correlation in the details. Thus one can be left wondering if one of the two types of correlations might be more important and if the theory could shed light on this.

For example, consider these comments from the manuscript:

"However, if presynaptic neurons became correlated the optimal response became nonlinear and the best linear response was unable to accurately follow the fluctuations in the input (Figure 1D)."

"Same as (A-C) but for presynaptic neurons exhibiting synchronized switches between a quiescent and an active state (D, bottom), introducing higher order correlations between the neurons." (Figure 2 caption)

"In particular, sublinear integration was optimal when presynaptic activities exhibited simple Gaussian random walks and thus they could not contain statistical dependencies beyond second order correlations (Figure 2A-C)."

The authors here are trying to provide an intuitive explanation as to why input correlations lead to the observed behavior and this, of course, is good and to be encouraged. In each case, however, what type of correlation is being thought of – spatial or temporal? Further elaboration might provide additional insight.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for submitting your work entitled "Dendritic nonlinearities are tuned for efficient spike-based computations in cortical circuits" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Eve Marder as the Senior Editor.

The reviewers have discussed the reviews with one another and the Reviewing editor has drafted this decision to help you prepare a revised submission.

The reviewers appreciated the revisions done by the authors, but it was felt that the intuition as provided was incomplete and the mathematical appendix was still not clearly presented. Specifically, it was noted that is unclear how the authors distinguish autocorrelation from cross-correlation (spatial) in the formalism and the text, and as this is a main point of the paper, it should be up front early on in the Discussion (and not just in caption of Figure 1 and later in Methods). Also, eq A91 has an exact solution (separable). In essence, the math appendix should only include what is needed to explain the methods – anything that is more extensive than needed should be either removed or simplified.

In summary, the authors should carefully go through the entire ms to ensure that these aspects are addressed in general. Please also note specific additional comments from the reviewers to address:

1) The manuscript still has trouble delineating spatial vs temporal correlations when it tries to explain different points. For example:

Introduction section: "We pursued this principle in understanding the integrative properties of individual cortical neurons, for which the relevant statistical input patterns are those characterising the dynamically evolving spike trains of their presynaptic partners." This sounds like the manuscript is concerned with temporal issues.

"The optimal response determined by this statistical model, for essentially any setting of parameters, was inherently nonlinear because the additional effect of a presynaptic spike depended on the pattern of spikes that had been previously received from the presynaptic population and thus the integrated effect of multiple spikes could not be computed as a simple linear sum of their individual effects in isolation." This also sounds to me as if a claim that the temporal pattern of inputs is what's leading to nonlinearity.

2) Regarding the errors in the mathematics, we thought that when the authors repaired the solution of Equation 27 they would also fix other instances of this equation in the manuscript. Equation 27 is recapitulated in the mathematical appendix as Equation A91, but the discussion around Equation A91 is clearly wrong. Somehow a different solution from the solution of Equation 27 is obtained; the answer should be equation A92 with Α = 0,Β = B and C = −A. Here’s how the solution goes: we start with


This equation is separable:


The left-hand size can be simplified by partial fractions:


so that


Both sides of the equation can now be integrated. Assuming 0 < ζ < 1, we get


or equivalently,




Then solving


We don’t know what the authors are trying to do with the rest of Section B.2.2, but it appears to be wrong. We also don’t know to what extent fixing this will affect the rest of the results.


Author response

[Editors’ note: the author responses to the first round of peer review follow.]

We included a thorough answer to all points raised by the reviewers. In particular, we substantially simplified the presentation of the paper by rewriting the introduction and the first part of the results section; adding a flowchart to Figure 1 giving a concise summary of the logic of the manuscript; and rewriting several parts of the Methods section to streamline it. Moreover, we highlighted the take-home messages at the beginning of the Discussion. In addition, we also included experiments using NMDA receptor blockade to provide more mechanistic insights; added a supplementary figure demonstrating that our stimulation protocol is sufficiently complex to probe dendritic integration in the relevant input range and added a supplementary figure where we compared the responses of different models to the averaged presynaptic potential instead of the optimal response.

[…] Overall, two major aspects require consideration, as summarized below with some suggestions. Additional detailed comments from the reviewers are appended below.

1) A clearer and simpler version of the paper is needed. A clear line through the reasoning that will allow everyone to see the core of what has been done, and why, needs to be presented. Specifically, the paper needs to be transparent and math that obfuscates should be removed or simplified or explained.

We substantially simplified the presentation of the math at the beginning of the paper and removed all math that is not essential to follow the logic of the presentation. We believe that the equations that remained in the main text (4 out of 7 in the previous version) are essential to introduce key concepts of the paper. These equations are now explained and illustrated in the text and presented in a highly simplified way.

The authors are encouraged to present their essential results in a much more straightforward fashion to help elucidate their mathematical derivations. A suggestion is to use a flow chart of the process/protocol (capturing presynaptic firing statistics, fitting, optimizing etc., and lay bare the limitations/assumptionse.g., the output computation f(u) is assumed to be a linear function),

As suggested, we added a flowchart presenting the logic of the paper to the first figure. This new panel also serves as a roadmap to the rest of the paper. In the flowchart, we highlighted our assumptions, illustrated the relationships between different parts of the paper and emphasized how specific figures in the paper support our conclusions. Moreover, we added a new paragraph describing the structure of the paper at the end of the Introduction and another paragraph in the Results that uses the flowchart to explain the logic of the Results section after introducing the key concepts at the beginning. We believe that these changes will substantially help the reader to follow the presentation of the paper.

[…] Describe the essential results in words, and point to the mathematical parts in methods/or a mathematical appendix of how and why.

We removed a substantial fraction of the equations from the main text (see above), and clarified the presentation in both the Results and Methods sections. Our main goal was to keep only those mathematical formulae in the main text that were absolutely essential to follow the logic of the paper. Those details that are only necessary to reproduce our results are included in the Methods section while further details of the derivations have been moved to the mathematical appendix.

We understand and appreciate that you wish to publish the full mathematical treatment, but most eLife readers won't benefit from the math in its present form. The mathematical results could be made complete and readable in an Appendix, while freeing the Results to show the logic of the model and how i