Optimal compensation for neuron loss
 Cited
 2
 Views
 1,229
 Comments
 0
Abstract
The brain has an impressive ability to withstand neural damage. Diseases that kill neurons can go unnoticed for years, and incomplete brain lesions or silencing of neurons often fail to produce any behavioral effect. How does the brain compensate for such damage, and what are the limits of this compensation? We propose that neural circuits instantly compensate for neuron loss, thereby preserving their function as much as possible. We show that this compensation can explain changes in tuning curves induced by neuron silencing across a variety of systems, including the primary visual cortex. We find that compensatory mechanisms can be implemented through the dynamics of networks with a tight balance of excitation and inhibition, without requiring synaptic plasticity. The limits of this compensatory mechanism are reached when excitation and inhibition become unbalanced, thereby demarcating a recovery boundary, where signal representation fails and where diseases may become symptomatic.
https://doi.org/10.7554/eLife.12454.001Introduction
The impact of neuron loss on information processing is poorly understood (Palop et al., 2006; Montague et al., 2012). Chronic diseases such as Alzheimer’s disease cause a considerable decline in neuron numbers (Morrison and Hof, 1997), yet can go unnoticed for years. Acute events such as a stroke and traumatic brain injury can kill large numbers of cells rapidly, yet the vast majority of strokes are ‘silent’ (Leary and Saver, 2003). Similarly, the partial lesion or silencing of a targeted brain area may ‘fail’ to produce any measurable, behavioral effect (Li et al., 2016). This resilience of neural systems to damage is especially impressive when compared to manmade computer systems that typically lose all function following only minor destruction of their circuits. A thorough understanding of the interplay between neural damage and information processing is therefore crucial for our understanding of the nervous system, and may also help in the interpretation of various experimental manipulations such as pharmacological silencing (Aksay et al., 2007; Crook and Eysel, 1992), lesion studies (Keck et al., 2008), and optogenetic perturbations (Fenno et al., 2011).
In contrast to the study of information representation in damaged brains, there has been substantial progress in our understanding of information representation in healthy brains (Abbott, 2008). In particular, the theory of efficient coding, which states that neural circuits represent sensory signals optimally given various constraints (Barlow, 1961; Atick, 1992; Olshausen and Field, 1996; Rieke et al., 1997; Simoncelli and Olshausen, 2001; Salinas, 2006), has successfully accounted for a broad range of observations in a variety of sensory systems, in both vertebrates (Simoncelli and Olshausen, 2001; Atick, 1992; Olshausen and Field, 1996; Smith and Lewicki, 2006; Greene et al., 2009) and invertebrates (Rieke et al., 1997; Fairhall et al., 2001; Machens et al., 2005). However, an efficient representation of information is of little use if it cannot withstand some perturbations, such as normal cell loss. Plausible mechanistic models of neural computation should be able to withstand the type of perturbations that the brain can withstand.
In this work, we propose that neural systems maintain stable signal representations by optimally compensating for the loss of neurons. We show that this compensation does not require synaptic plasticity, but can be implemented instantaneously by a tightly balanced network whose dynamics and connectivity are tuned to implement efficient coding (Boerlin et al., 2013; Denève and Machens, 2016). When too many cells disappear, the balance between excitation and inhibition is disrupted, and the signal representation is lost. We predict how much cell loss can be tolerated by a neural system and how tuning curves change shape following optimal compensation. We illustrate these predictions using three specific neural systems for which experimental data before and after silencing are available – the oculomotor integrator in the hindbrain (Aksay et al., 2000, 2007), the cricket cercal system (Theunissen and Miller, 1991; Mizrahi and Libersat, 1997), and the primary visual cortex (Hubel and Wiesel, 1962; Crook and Eysel, 1992; Crook et al., 1996, 1997, 1998). In addition, we show that many input/output nonlinearities in the tuning of neurons can be reinterpreted to be the result of compensation mechanisms within neural circuits. Therefore, beyond dealing with neuronal loss, the proposed optimal compensation principle expands the theory of efficient coding and provides important insights and constraints for neural population codes.
Results
The impact of neuron loss on neural representations
We begin by studying how neuron loss can influence neural representations. We assume that information is represented within neural systems such that it can be read out or decoded by a downstream area through a linear combination of neural firing rates. More specifically, we imagine a system of $N$ neurons that represents a set of signals $\mathbf{x}\left(t\right)=\left({x}_{1}\left(t\right),{x}_{2}\left(t\right),\dots ,{x}_{M}\left(t\right)\right)$. These signals may be timedependent sensory signals such as visual images or sounds, for instance, or, more generally, they may be the result of some computation from within a neural circuit. We assume that these signals can be read out from a weighted summation of the neurons’ instantaneous firing rates, ${r}_{i}(t)$, such that
where $\hat{\mathbf{x}}\left(t\right)$ is the signal estimate and $\mathbf{D}}_{i$ is a $M$dimensional vector of decoding weights that determines how the firing of the $i$th neuron contributes to the readout. Such linear readouts are used in many contexts and broadly capture the integrative nature of dendritic summation.
What happens to such a readout if one of the neurons is suddenly taken out of the circuit? A specific example is illustrated in Figure 1A. Here the neural system consists of only two neurons that represent a single, scalar signal, $x\left(t\right)$, in the sum of their instantaneous firing rates, $\hat{x}\left(t\right)={r}_{1}\left(t\right)+{r}_{2}\left(t\right)$. If initially both neurons fired at 150 Hz, yielding a signal estimate $\hat{x}=300$, then the sudden loss of one neuron will immediately degrade the signal estimate to $\hat{x}=150$. More generally, in the absence of any corrective action, the signal estimate will be half the size it used to be.
There are two solutions to this problem. First, a downstream or readout area could recover the correct signal estimate by doubling its decoder weight for the remaining neuron. In this scenario, the twoneuron circuit would remain oblivious to the loss of a neuron, and the burden of adjustment would be borne by downstream areas. Second, the firing rate of the remaining neuron could double, so that a downstream area would still obtain the correct signal estimate. In this scenario, the twoneuron circuit would correct the problem itself, and downstream areas would remain unaffected. While actual neural systems may make use of both possibilities, here we will study the second solution, and show that it has crucial advantages.
The principle of optimal compensation
To quantify the ability of a system to compensate for neuron loss, we will assume that the representation performance can be measured with a simple costbenefit tradeoff. We formalize this tradeoff in a loss function,
Here, the first term quantifies the signal representation error—the smaller the difference between the readout, $\hat{\mathbf{x}}$, and the actual signals, $\mathbf{x}$, the smaller the representation error. The second term quantifies the cost of the representation and is problemspecific and systemspecific. Generally, we assume that the costs are small if the signal representation (1) is shared amongst all neurons and (2) uses as few spikes as possible. The parameter $\beta $ determines the tradeoff between this cost and the representation error. Quadratic loss functions such as Equation 2 have been a mainstay of efficient coding theories for many years, for example, in stimulus reconstruction (Rieke et al., 1997) and in sparse coding (Olshausen and Field, 1996).
The minimum of our loss function indicates the set of firing rates that represent a given signal optimally. In the twoneuron example in Figure 1A, the initial minimum is reached if both neurons fire equally strong (assuming a cost of $C\left(\mathbf{r}\right)={r}_{1}^{2}+{r}_{2}^{2}$). If we kill neuron 1, then we can minimize the loss function under the constraint that ${r}_{1}=0$, and the minimum is now reached if neuron 2 doubles its firing rates (assuming a small tradeoff parameter $\beta $). The induced change in the firing rate of neuron 2 preserves the signal representation, and therefore compensates for the loss of neuron 1. More generally, if the firing rates of a neural system represent a given signal optimally with respect to some loss function, then we define 'optimal compensation' as the change in firing rates necessary to remain at the minimum of that loss function.
Clearly, a system’s ability to compensate for a lost neuron relies on some redundancy in the representation. If there are more neurons than signals, then different combinations of firing rates may represent the same signal (see Figure 1A, dashed line, for the twoneuron example). Up to a certain point (which we will study below), a neural system can always restore signal representation by adjusting its firing rates.
Optimal compensation through instantaneous restoration of balance
Next, we investigate how this compensation can be implemented in a neural circuit. One possibility is that a circuit rewires through internal plasticity mechanisms in order to correct for the effect of the lost neurons. However, we find that plasticity is not necessary. Rather, neural networks can be wired up such that the dynamics of their firing rates rapidly evolves into the minimum of the loss function, Equation 2 (Hopfield, 1984; Dayan and Abbott, 2001; Rozell et al., 2008). Such dynamics can be formulated not just for rate networks, but also for networks of integrateandfire neurons (Boerlin et al., 2013). The step towards spiking neurons will lead us to two crucial insights. First, we will link the optimal compensation principle to the balance of excitation and inhibition. Second, unlike networks of idealized rate units, networks of spiking neurons can only generate positive firing rates. We will show that this constraint alters the nature of compensation in nontrivial ways.
In order to move towards spiking networks, we assume that the instantaneous firing rates, ${r}_{i}\left(t\right)$, are equivalent to filtered spike trains, similar to the postsynaptic filtering of actual spike trains (see Materials and methods for details). Specifically, a spike fired by neuron $i$ contributes a discrete unit to its instantaneous firing rate, ${r}_{i}\to {r}_{i}+1$, followed by an exponential decay. Next, we assume that each neuron will only fire a spike if the resulting change in the instantaneous firing rate reduces the loss function (Equation 2). From these two assumptions, the dynamics and connectivity of the network can be derived (Boerlin et al., 2013; Bourdoukan et al., 2012; Barrett et al., 2013). Even though neurons in these networks are ordinary integrateandfire neurons, the voltage of each neuron acquires an important functional interpretation, as it represents a transformation of the signal representation error, $\mathbf{x}\hat{\mathbf{x}}$ (see Materials and methods).
If we apply this formalism to the twoneuron example in Figure 1, we obtain a network with two integrateandfire neurons driven by an excitatory input signal and coupled by mutual inhibition (Figure 1B). If we neglect the cost term, then the neuron’s (subthreshold) voltages directly reflect the difference between the scalar input signal, $x\left(t\right)$, and its linear readout, $\hat{x}\left(t\right)$, so that ${V}_{i}=D\left(x\hat{x}\right)$ for $i=\{1,2\}$ (Figure 1C). Initially, the excitatory input signal depolarizes the voltages of both neurons, which reflects an increase in the signal representation error. The neuron that reaches threshold first produces a spike and corrects the signal readout, $\hat{x}\left(t\right)$. In turn, the errors or voltages in both neurons are reset, due to the selfreset of the spiking neuron, and due to fast inhibition of the other neuron. This procedure repeats, so that both neurons work together, and take turns at producing spikes (Figure 1C). Now, when one neuron dies (Figure 1C, red arrow), the remaining neuron no longer receives inhibition from its partner neuron and it becomes much more strongly excited, spiking twice as often.
These ideas can be scaled up to larger networks of excitatory and inhibitory neurons, as illustrated in Figure 2. Here a homogeneous population of excitatory neurons, again modeled as leaky integrateandfire neurons (Figure 2A), is driven by a timevarying input signal, $x\left(t\right)$. Just as in the simplified twoneuron example, the excitatory neurons take turns at generating spikes, such that the linear readout of their spike trains, Equation 1, yields an estimate $\hat{x}\left(t\right)$ that minimizes the loss function from Equation 2 (Figure 2D).
Mechanistically, the excitatory population receives an excitatory input signal, $x\left(t\right)$, that is matched by an equally strong inhibitory input. This inhibitory input is generated by a separate population of inhibitory interneurons, and is effectively equivalent to the (rerouted) signal estimate, $\hat{x}\left(t\right)$. As a result, the (subthreshold) voltage of each excitatory neuron is once again determined by the signal representation error, $x\left(t\right)\hat{x}\left(t\right)$, and a neuron will only fire when this error exceeds its (voltage) threshold. By design, each spike improves the signal estimate, and decreases the voltages. The spike rasters of both populations are shown in Figure 2E.
Since the signals are excitatory and the signal estimates are inhibitory, the tight tracking of the signal by the signal estimate corresponds to a tight tracking of excitation by inhibition. The network is therefore an example of a balanced network in which inhibition and excitation track each other spike by spike (Denève and Machens, 2016). Importantly, the excitatoryinhibitory (EI) balance is a direct correlate of the signal representation accuracy (see Materials and methods for mathematical details).
Now, when some portion of the excitatory neural population is eliminated (Figure 2B), the inhibitory neurons receive less excitation, and in turn convey less inhibition to the remaining excitatory neurons. Consequently, the excitatory neurons increase their firing rates, and the levels of inhibition (or excitation) readjust automatically such that EI balance is restored (see Figure 2G for an example neuron). The restoration of balance guarantees that the signals continue to be represented accurately (Figure 2D). This compensation happens because each excitatory neuron seeks to keep the signal representation error in check, with or without the help of the other excitatory neurons. When one neuron dies, the remaining excitatory neurons automatically and rapidly assume the full burden of signal representation.
Similar principles hold when part of the inhibitory population is knocked out (Figure 2C). The inhibitory population here is constructed to faithfully represent its ‘input signals’, which are given by the postsynaptically filtered spike trains of the excitatory neurons. Consequently, the inhibitory population acts similarly to the excitatory population, and its response to neuron loss is almost identical. When inhibitory neurons are knocked out, the remaining inhibitory neurons increase their firing rates in order to restore the EI balance. Again, the compensation happens because each inhibitory neuron seeks to keep its signal representation error in check, independent of the other inhibitory neurons.
The recovery boundary
Naturally, the recovery from neural loss is not unlimited. As we will show now, the resultant recovery boundary is marked by a breakdown in the balance of excitation and inhibition. In Figure 2, this recovery boundary coincides with a complete knockout of either the excitatory or the inhibitory population. The nature of the recovery boundary becomes more complex, however, if a network tracks more than one signal.
In Figure 3 we show an example network that represents two sinusoidal signals, $\mathbf{x}\left(t\right)=\left({x}_{1}\left(t\right),{x}_{2}\left(t\right)\right)$. The spikes of each neuron contribute to two readouts, and the decoding weights $\mathbf{D}}_{i$ of the neurons correspond to twodimensional vectors (Figure 3D). The two signals and the corresponding signal estimates are shown in Figure 3E, and the spike trains of the excitatory population are shown in Figure 3F. (Here and in the following, we will focus on neuron loss in the excitatory population only, and replace the inhibitory population by direct inhibitory connections between excitatory neurons, see Figure 3A–C. This simplification does not affect the compensatory properties of the network, see Materials and methods.)
When part of the excitatory population is lost (Figure 3B,D), the remaining neurons compensate by adjusting their firing rates, and the signal representation remains intact (Figure 3E,F, first k.o.). However, when too many cells are eliminated from the network (Figure 3C,D), some portions of the signal can no longer be represented, no matter how the remaining neurons change their firing rates (Figure 3E,F, second k.o.). In this example, the recovery boundary occurs when all the neurons with negativevalued readout weights along the ${x}_{1}$axis have been lost, so that the network can no longer represent the negative component of the first signal (Figure 3E, second k.o.). The representation of signal ${x}_{2}\left(t\right)$ is largely unaffected, so the deficits are only partial.
The recovery boundary is characterized by the network’s inability to correct signal representation errors. Since the neurons’ voltages correspond to transformations of these signal representation errors, the recovery boundary occurs when some of these voltages run out of bounds (Figure 3G, arrows). Such aberrations of the voltages are caused by an excess of either positive or negative currents. Indeed, the ratio of positive to negative currents becomes disrupted after the second k.o., precisely when ${x}_{1}\left(t\right)$ becomes negative (Figure 3H). Whereas a neuron can always ‘eliminate’ a positive, depolarizing current by spiking, it cannot eliminate any negative, hyperpolarizing currents, and these latter currents therefore reflect signal representation errors that are not eliminated within the network.
Here, positive currents are generated by excitatory inputs, whereas negative currents are generated by inhibitory inputs as well as the selfreset currents after an action potential. The selfreset currents become negligible in networks in which the number of neurons, $N$, is much larger than the number of input signals, $M$ (see Figure 2; see also Materials and methods). Even in Figure 3 this is approximately the case, so that the ratio of positive to negative currents is similar to the ratio of excitatory to inhibitory synaptic inputs (Figure 3I). The recovery boundary is then signified by a dramatic breakdown in the balance of excitatory and inhibitory currents.
In this simple example, the recovery boundary emerged because neurons were knocked out in a systematic and ordered fashion (according to the features that they represented). Such a scenario may correspond to lesions or knockouts in systems in which neural tuning is topologically organized. If the representational features of neurons are randomly interspersed, we find a different scenario. Indeed, when neurons are knocked out at random in our model, the network can withstand the loss of many more neurons. We find that signals are well represented until only a few neurons remain (Figure 3J).
At the recovery boundary, the few remaining neurons will shoulder the full representational load. This can easily become an unwieldy situation because the firing rates of the remaining neurons must become unrealistically large to compensate for all the lost neurons. In reality, neural activity will saturate at some point, in which case the recovery boundary occurs much earlier. For instance, we find that the example network tolerates up to $70$–$80\%$ of random neuron loss if the maximum firing rate of a neuron is unbounded (Figure 3J), whereas it will only tolerate $40$–$50\%$ of random neuron loss if the maximum firing rate is 80 Hz (Figure 3K).
The influence of idle neurons on tuning curve shapes
So far we have considered what happens when neurons are lost permanently in a neural circuit, either because of their death or because of experimental interventions. We have shown that networks can compensate for this loss up to a recovery boundary. Independent of whether compensation is successful or not, it is always marked by changes in firing rates. We will now show how to quantify and predict these changes.
To do so, we first notice that neurons can also become inactive temporarily in a quite natural manner. In sensory systems, for instance, certain stimuli may inhibit neurons and silence them. This natural, temporary inactivity of neurons leads to similar, ‘compensatory’ effects on the population level in our networks. These effects will provide important clues to understanding how neurons’ firing rates need to change whenever neurons become inactive, whether in a natural or nonnatural context.
For simplicity, let us focus on constant input signals, $\mathbf{x}\left(t\right)=\mathbf{x}$, and study the average firing rates of the neurons, $\overline{\mathrm{r}}$. For constant inputs, the average firing rates must approach the minimum of the same loss function as in Equation 2 so that,
where the signal estimate, $\hat{\mathbf{x}}={\sum}_{i}{\mathbf{D}}_{i}{\overline{\mathrm{r}}}_{i}$, is formed using the average firing rates. This minimization is performed under the constraint that firing rates must be positive, since the firing rates of spiking neurons are positive valued quantities, by definition. Mathematically, the corresponding optimization problem is known as ‘quadratic programming’ (see Materials and methods for more details). Traditionally, studies of population coding or efficient coding assume both positive and negative firing rates (Dayan and Abbott, 2001; Olshausen and Field, 1996; Simoncelli and Olshausen, 2001), and the restriction to positive firing rates is generally considered an implementational problem (Rozell et al., 2008). Surprisingly, however, we find that the constraint changes the nature of the solutions and provides fundamental insights into the shapes of tuning curves in our networks, as clarified below.
An example is shown in Figure 4 where we focus on networks of increasing complexity that represent two signals. For simplicity, we keep the second signal constant (‘fixed background’ signal, $x}_{2}={r}_{B$) and study the firing rates of all neurons as a function of the first signal, ${x}_{1}=x$. Solving Equation 3 for a range of values, we obtain $N$ neural tuning curves, $\overline{\mathbf{r}}\left({x}_{1}\right)=\left({r}_{1}\left({x}_{1}\right),{r}_{2}\left({x}_{1}\right),\dots \right)$ (Figure 4, second column). These tuning curves closely match those measured in simulations of the corresponding spiking networks (Figure 4, third and fourth column; see supplementary materials for mathematical details).
We observe that the positivity constraint produces nonlinearities in neural tuning curves. We illustrate this using a twoneuron system with two oppositevalued readout weights for the first signal (Figure 4A, first column). At signal value $x=0$, both neurons fire at equal rates so that the readout $\hat{x}$, Equation 1, correctly becomes $\hat{x}\propto \left({\overline{r}}_{1}{\overline{r}}_{2}\right)=0$. When we move to higher values of $x$, the firing rate of the first neuron, $\overline{r}}_{1$, increases linearly (Figure 4A, second column, orange line), and the firing rate of the second neuron, $\overline{r}}_{2$, decreases linearly (Figure 4A, second column, blue line), so that in each case, $\hat{x}\approx x$. Eventually, around $x=0.4$, the firing rate of neuron two hits zero (Figure 4A, black arrow). At this point, neuron two has effectively disappeared from the network. Since it cannot decrease below zero, and because the estimated value $\hat{x}$ must keep growing with $x$, the firing rate of neuron one must grow at a faster rate. This causes a kink in its tuning curve (Figure 4A, black arrow). This kink in the tuning curve slope is an indirect form of optimal compensation, where the network is compensating for the temporary silencing of a neuron.
More generally, the firing rates of neurons are piecewise linear functions of the input signals. In Figure 4B,C, every time one of the neurons hits the zero firing rate lower bound, the tuning curve slopes of all the other neurons change. We furthermore note that the tuning curves also depend on the form of the cost terms, $C\left(\overline{\mathbf{r}}\right)$. If there is no cost term, there are many equally favorable firing rate combinations that produce identical readouts (Figure 1A). The precise choice of a cost term determines which of these solutions is found by the network. We provide further geometric insights into the shape of the tuning curves obtained in our networks in Figure 4—figure supplements 1–3 and in the Materials and methods.
Tuning curves before and after neuronal silencing
We can now calculate how tuning curves change shape in our spiking network following neuronal loss. When a set of neurons are killed or silenced within a network, their firing rates are effectively set to zero. We can include this silencing of neurons in our loss function by simply clamping the respective neurons’ firing rates to zero:
where $X$ denotes the set of healthy neurons and $Y$ the set of dead (or silenced) neurons. This additional clamping constraint is the mathematical equivalent of killing neurons. In turn, we can study the tuning curves of neurons in networks with knockedout neurons without having to simulate the corresponding spiking networks every single time.
As a first example, we revisit the network shown in Figure 4C, which represents a signal ${x}_{1}=x$, and a constant ‘background’ signal $x}_{2}={r}_{B$. This network exhibits a complex mixture of tuning curves, i.e., firing rates as a function of the primary signal ${x}_{1}=x$, with positive and negative slopes, and diverse threshold crossings (Figure 5B). When we knock out a subset of neurons (Figure 5A, neurons labeled ‘C’), we find that neurons with similar tuning to the knocked out neurons increase their firing rates and dissimilarlytuned neurons decrease their firing rates, especially for negative values of the signal $x$ (Figure 5C, left). In this way, signal representation is preserved as much as possible (Figure 5C, right, black line). In comparison, a network that does not change its tuning curves after the cell loss has drastically worse representation error (Figure 5C, right, dashed line). Once all the neurons with positive tuning curve slopes are killed (Figure 5A, neurons labeled ‘D’), we cross the recovery boundary, and even optimal compensation can no longer preserve the representation of negativelyvalued signals (Figure 5D).
Next, we investigate these compensatory mechanisms in neural systems that have bellshaped tuning curves (BenYishai et al., 1995). Our framework captures these cases if we assume that a network represents circular signals embedded in a twodimensional space, $\mathbf{x}=\left(\mathrm{cos}\left(\theta \right),\mathrm{sin}\left(\theta \right)\right)$, where $\theta $ is the direction of our signal. To do so, we construct a network in which the decoding weights of the neurons are spaced around a circle, with some irregularity added (Figure 5E). As a result, we obtain an irregular combination of bellshaped tuning curves (Figure 5F), similar to those found in the primary visual or primary motor cortex, with each neuron having a different preferred direction and maximum firing rate as determined by the value of the decoding weights. We note that, apart from the smaller number of neurons and the greater irregularity of the decoding weights, this network is similar to the spiking network investigated in Figure 3.
If we now kill neurons within a specific range of preferred directions (Figure 5E), then the remaining neurons with the most similar directions increase their firing rates and shift their tuning curves towards the preferred directions of the missing neurons (Figure 5G, left). Furthermore, neurons further away from the knockedout region will slightly skew their tuning curves by decreasing their firing rates for directions around zero (Figure 5F,G, arrows). Overall, the portion of space that is underrepresented following cell loss becomes populated by neighboring neurons, which thereby counteract the loss. In turn, the signal representation performance of the population is dramatically improved following optimal compensation compared to a system without compensatory mechanisms (Figure 5G, right). Once all neurons with positive readout weights along the first axis are killed, the network can no longer compensate, and signal representation fails, despite the strong changes in firing rates of the neurons (Figure 5H). This latter scenario is equivalent to the one we observed in Figure 3 after the second knockout.
The ability of the system to compensate for the loss of neurons depends on the redundancy of its representation (although redundancy alone is not sufficient for compensation). If our network consists of only four neurons with equally spaced readout weights (Figure 5I), then it cannot compensate for the loss of neurons. Rather, we find that this model system crosses the recovery boundary following the loss of a single neuron, and so, optimal compensation does not produce any changes in the remaining neurons (Figure 5K, left). It occurs because the four neurons exactly partition the four quadrants of the signal space, and the remaining neurons cannot make any changes that would improve the signal representation.
Comparison to experimental data
Does optimal compensation happen in real neural systems? Our framework makes strong predictions for how firing rates should change following neuronal knockouts. Unfortunately, there is currently little data on how neural firing rates change directly after the loss of neurons in a given system. However, we found data from three systems—the oculomotor integrator in the hindbrain, the cricket cercal system, and the primary visual cortex—which allows us to make a first comparison of our predictions to reality. We emphasize that a strict test of our theory is left for future work.
The first system is the horizontal velocitytoposition integrator of the vertebrate oculomotor system, which is responsible for horizontal eye fixations, and which is comparable to the network model in Figure 5A–D. This system is usually considered to represent the signals that need to be sent to the two muscles (lateral and medial rectus) that control horizontal eye movements. In this case, the signal ${x}_{1}$ corresponds to the eye position, controlled by the difference in muscular activity, with zero representing the central eye position, positive values representing rightside eye positions and negative values representing leftside eye positions. The fixed background signal corresponds to the net sum of the two muscle activities, which we assume to remain constant for simplicity.
We find that the tuning curves of neurons measured in the right half of the oculomotor system (Aksay et al., 2000) (Figure 6A) are similar to the positivelysloped tuning curves in our network model calculated using Equation 3 (Figure 6B). In both cases, neurons that encode rightside eye positions have positive slopes, and these follow a recruitment order, where neurons with shallow tuning curve slopes become active before neurons with steep tuning curve slopes (Fuchs et al., 1988; Aksay et al., 2000; Pastor and GonzalezForero, 2003) (Figure 6A,B inset). Similar observations hold for the left half of the oculomotor system, which has negativelysloped tuning curves (data not shown).
Now, when the left half of the oculomotor system is inactivated using lidocaine and muscimol injections (Aksay et al., 2007), the system can still represent rightside eye positions, but is unable to represent leftside eye positions (Figure 6C). In our network model, this inactivation corresponds to eliminating all negativelysloped tuning curves (Figure 5D). The optimal compensation ensures that the right part of the signal is still correctly represented, but the left part can no longer be represented due to the lack of negatively tuned neurons (Figure 5D and Figure 6D). Hence, these measurements are consistent with our optimal compensation and recovery boundary predictions.
The second system is a small sensory system called the cricket (or cockroach) cercal system, which represents wind velocity (Theunissen and Miller, 1991), and which is comparable to the network model in Figure 5I–K. In the case of the cercal system, $\theta $ is the winddirection. The tuning curves we obtain are similar to those measured, with each neuron having a different, preferred wind direction (Theunissen and Miller, 1991), see Figure 6E. The similarity between measured tuning curves (Figure 6E) and predicted tuning curves (Figure 5J) suggests that we can interpret cercal system neurons to be optimal for representing wind direction using equally spaced readout weights.
When one neuron in the cercal system is killed, the remaining neurons do not change their firing rates (Libersat and Mizrahi, 1996; Mizrahi and Libersat, 1997), see Figure 6F. This is analogous to our predictions (Figure 5K). Indeed, our model of the cercal system has no redundancy (see Materials and methods for more details). The cercal system in all its simplicity is therefore a nice example of a system that exists on the ‘edge’ of the recovery boundary.
A highdimensional example: optimal compensation in V1
The relatively simple models we have described so far are useful for understanding the principle of optimal compensation, but they are unlikely to capture the complexity of large neural populations in the brain. Even though the network model in Figure 5E–G resembles common network models for the primary visual or primary motor cortex, it does not capture the highdimensional nature of representations found in these areas. To investigate the compensatory mechanisms in more complex networks, we make a final generalization to systems that can represent highdimensional signals, and for concreteness we focus on the (primary) visual cortex, which is our third test system.
The visual cortex is generally thought to construct representations of images consisting of many pixel values (Figure 7A). The tuning of V1 simple cells, for instance, can largely be accounted for by assuming that neural firing rates provide a sparse code of natural images (Olshausen and Field, 1996; Simoncelli and Olshausen, 2001). In accordance with this theory, we choose a model that represents image patches (size $12\times 12$, corresponding to $M=144$ dimensions), and we use decoding weights that are optimized for natural images (see Materials and methods). Neurons in this model are tuned to both the orientation and polarity of edgelike images (Figure 7D), where the polarity is either a bright edge with dark flanks, or the opposite polarity – a dark edge with bright flanks. Orientation tuning emerges because natural images typically contain edges at many different orientations, and a sparse code captures these natural statistics (Olshausen and Field, 1996; Simoncelli and Olshausen, 2001). Polarity tuning emerges as a natural consequence of the positivity constraint, because a neuron with a positive firing rate cannot represent edges at two opposing polarities. Similar polarity tuning has been obtained before, but with an additional constraint that decoding weights be strictly positive (Hoyer, 2003, 2004).
As before, we compute the firing rates of our neural population by solving Equation 3, which provides a convenient approximation of the underlying spiking network. We then silence all neurons with a vertical orientation preference (Figure 7E) and calculate the resulting changes in firing rates in the network. Without optimal compensation, (i.e., without any changes in firing rates), we find that the silencing of an orientation column damages image representation, especially for image components that contain edges parallel to the preferred orientations of the dead neurons (Figure 7B, orange arrow). When the population implements optimal compensation, the firing rates of many neurons change, and the image representation is recovered (Figure 7C).
To illustrate the nature of the network compensation, we study how the tuning of neurons changes when part of the population is lost. This focus will allow us to compare our predictions to experimental data. The tuning curves of all cells in response to the directions of various gratings are shown in Figure 8A. If we knock out a (small) subset of the cells, e.g., 50% of all neurons with preferred directions around $\theta ={0}^{\circ}$, then the firing rates of several of the remaining neurons increase at the preferred direction of the silenced cells (Figure 8B,D). As a consequence, the preferred directions of many cells shift toward the preferred direction of the silenced cells (Figure 8C). Unlike the simple example in Figure 5G, however, these shifts occur in neurons throughout the population, and not just in neurons with neighboring directions. For the affected stimuli, the shifts decrease the representation errors compared to a system without compensation (Figure 8E).
The more complicated pattern of firing rate changes occurs for two reasons. First, neurons represent more than the direction of a grating—they represent a 12 × 12 image patch, i.e., a vector in a highdimensional space (Figure 8F). Such a vector will generally be represented by the firing of many different neurons, each of which contributes with its decoding vector, ${\mathbf{\mathbf{D}}}_{i}$, weighted by its firing rate, ${r}_{i}$. However, these neurons will rarely point exactly in the right (stimulus) direction, and only their combined effort allows them to represent the stimulus. When some of the neurons contributing to this stimulus representation are knocked out, the network will recruit from the pool of all neurons whose decoding vectors are nonorthogonal to the stimulus vector. In turn, each of these remaining neurons gets assigned a new firing rate such that the cost term in Equation 2 is minimized. Second, highdimensional spaces literally provide a lot of space. Whereas stacking 20 neurons in a twodimensional space (as in Figure 5E,F) immediately causes crowding, with neighboring neurons sharing a substantial overlap of information, stacking e.g. 1000 neurons in a 100dimensional space causes no such thing. Rather, neurons can remain almost independent, while sharing only small amounts of information. As a consequence, not a single pair of the naturally trained decoding weights in our model will point in similar directions. In other words, when we knock out a set of neurons, there are no similarly tuned neurons that could change their firing rates in order to restore the representation. Rather, the compensation for a single lost neuron will require the concerted effort of several other neurons, and many of the required changes can be small and subtle. However, all of these changes will seek to restore the lost information, and cells will shift their tuning towards that of the lost neurons.
As a consequence of this additional complexity, we find that firing rates at the antipreferred direction also increase (Figure 8B,D). A key reason for this behavior is that many neurons are strongly tuned to orientation, but only weakly tuned to direction, so that eliminating neurons with ${0}^{\circ}$ degree direction preference also eliminates some of the representational power at the antipreferred direction.
Finally, although our model is simply a (positive) sparse coding model, we note that most of its predictions are consistent with experiments in the cat visual cortex, in which sites with specific orientation or direction tuning are silenced using GABA (Crook et al., 1996, 1997, 1998) (Figure 9 and Figure 9—figure supplement 1), while recording the firing rates of nearby neurons. Neurons that escape silencing in the cat visual cortex also shift their tuning curves towards the preferred directions of the silenced neurons (Figure 9A). More specifically, three types of changes were found: (1) Neurons whose preferred direction was different to that of the silenced neurons (PD$>{22.5}^{\circ}$) will increase their firing rates at the knockedout direction and broaden their tuning (see Figure 9A, top row for an example). (2) Neurons whose preferred direction was opposite to the silenced neurons will increase their firing rates towards the knocked out direction (see Figure 9A, middle row for an example). (3) Neurons whose preferred direction was similar to the silenced neurons (PD$<{22.5}^{\circ}$) will either increase their firing to the knocked out direction (see Figure 9A, bottom row) or decrease their firing to the knocked out direction (data not shown).
In our sparse coding model, we find increases in firing rates to the knockedout directions in all three types of neurons (with different direction preference, opposite direction preference, or similar direction preference), see Figure 9B. We do not observe a decrease in firing to the knockedout directions. While this mismatch likely points towards the limits of our simplified model (which models a relatively small set of simple cells) in comparison with the data (which examines both simple and complex cells in both V1 and V2), it does not violate the optimal compensation principle. Due to the complexity of interactions in a highdimensional space, a decrease in firing of a few neurons to the missing directions is not a priori ruled out in our model, as long as there is a net increase to the missing directions across the whole population. Indeed, when increasing the redundancy in our V1 model (e.g. by moving from $N=288$ neurons to $N=500$ neurons while keeping the dimensionality of the stimulus fixed), we find that several neurons start decreasing their firing rates in response to the knockedout directions, even though the average firing rate changes in the population remain positive (data not shown).
We emphasize that there are only a few free parameters in our model. Since the decoding weights are learnt from natural images using the (positive) sparse coding model, we are left with the dimensionality of the input space, the number of neurons chosen, and the number of neurons knocked out. The qualitative results are largely independent of these parameters. For instance, the precise proportion of neurons knocked out in the experimental studies (Crook and Eysel, 1992) is not known. However, for a range of reasonable parameter choices (50–75% knock out of all neurons with a preferred direction), our predicted tuning curve changes are consistent with the recorded population response to neuron silencing (Crook and Eysel, 1992) (Figure 9—figure supplement 1B). Furthermore, if we increase the redundancy of the system by scaling up the number of neurons, but not the number of stimulus dimensions, we shift the recovery boundary to a larger fraction of knockedout neurons, but we do not change the nature of the compensatory response (Figure 9—figure supplement 2A,B). At high degrees of redundancy (or socalled ‘overcompleteness’), quantitive fluctuations in tuning curve responses are averaged out, indicating that optimal compensation becomes invariant to overcompleteness (Figure 9—figure supplement 2C,D). Given the few assumptions that enter our model, we believe that the broad qualitative agreement between our theory and the data is quite compelling.
Discussion
To our knowledge, optimal compensation for neuron loss has not been proposed before. Usually, cell loss or neuron silencing is assumed to be a wholly destructive action, and the immediate neural response is assumed to be pathological, rather than corrective. Synaptic plasticity is typically given credit for the recovery of neural function. For example, synaptic compensation has been proposed as a mechanism for memory recovery following synaptic deletion (Horn et al., 1996), and optimal adaptation following perturbations of sensory stimuli (Wainwright, 1999) and motor targets (Shadmehr et al., 2010; Braun et al., 2009; Kording et al., 2007) has also been observed, but on a slow time scale consistent with synaptic plasticity. In this work, we have explored the properties of an instantaneous, optimal compensation, that can be implemented without synaptic plasticity, on a much faster time scale, through balanced network dynamics.
Our work is built upon a connection between two separate theories: the theory of balanced networks, which is widely regarded to be the standard model of cortical dynamics (van Vreeswijk and Sompolinsky, 1996, 1998; Amit and Brunel, 1997; Renart et al., 2010; Denève and Machens, 2016) and the theory of efficient coding, which is arguably our most influential theory of neural computation (Barlow, 1961; Salinas, 2006; Olshausen and Field, 1996; Bell and Sejnowski, 1997; Greene et al., 2009). This connection relies on the following two derivations: (1) We derive a tightly balanced spiking network from a quadratic loss function, following the recent work of (Boerlin et al., 2013; Boerlin and Denève, 2011) by focusing on the part of their networks that generate a spikebased representation of the information. (2) We show that the firing rates in these spiking networks also obey a quadratic loss function, albeit with a positivity constraint on the firing rate (Barrett et al., 2013). This constrained minimization problem, called quadratic programming, provides a novel link between spiking networks and firing rate calculations. While the importance of positivity constraints has been noted in other contexts (Lee and Seung, 1999; Hoyer, 2003, 2004; Salinas, 2006), here we show its dramatic consequences in shaping tuning curves, which had not been appreciated previously. In turn, we obtain a single normative explanation for the polarity tuning of simple cells (Figure 7D), tuning curves in the oculomotor system (Figure 6A,B), and tuning curves in the cricket cercal system (Figure 6E), as well as the mechanisms underlying the generation of these tuning curves, and the response of these systems to neuron loss.
Several alternative network models have been proposed that minimize similar loss functions as in our work (Dayan and Abbott, 2001; Rozell et al., 2008; Hu et al., 2012; Charles et al., 2012; Druckmann and Chklovskii, 2012). However, in all these models, neurons produce positive and negative firing rates (Dayan and Abbott, 2001; Rozell et al., 2008; Charles et al., 2012; Druckmann and Chklovskii, 2012), or positive and negative valued spikes (Hu et al., 2012). The compensatory response of these systems will be radically different, because oppositely tuned neurons can compensate for each other by increasing their firing rates and changing sign, which is impossible in our networks. Similar reasoning holds for any efficient coding theories that assume positive and negative firing rates.
Whether general spiking networks support some type of compensation will depend on the specifics of their connectivity (see Materials and methods). For instance, spiking networks that learn to efficiently represent natural images could potentially compensate for neuron loss (Zylberberg et al., 2011; King et al., 2013). Furthermore, networks designed to generate EI balance through strong recurrent dynamics will automatically restore balance when neurons are lost (van Vreeswijk and Sompolinsky, 1996, 1998; Amit and Brunel, 1997; Renart et al., 2010). In turn, networks whose representations and computations are based on such randomly balanced networks may be able to compensate for neuron loss as well (Vogels et al., 2011; Hennequin et al., 2014), as has been shown for neural integrators (Lim and Goldman, 2013). However, the optimality or the speed of the compensation is likely to increase with the tightness of the balancing. In turn, the tightness of EI balance is essentially determined by synaptic connectivity, whether through experiencedependent learning of the synaptic connections (Haas et al., 2006; Vogels et al., 2011; Bourdoukan et al., 2012; Luz and Shamir, 2012), through scaling laws for random connectivities (van Vreeswijk and Sompolinsky, 1996, 1998; Amit and Brunel, 1997; Renart et al., 2010), or by design, via the decoding weights of linear readouts, as in our case (Boerlin et al., 2013; Barrett et al., 2013). Future work may resolve the commonalities and differences of these approaches.
The principle of optimal compensation is obviously an idealization, and any putative compensatory mechanism of an actual neural system may be more limited. That said, we have catalogued a series of experiments in which pharmacologically induced changes in neural activities and in behavior can be explained in terms of optimal compensation. These experiments were not originally conducted to test any specific compensatory mechanisms, and so, the results of each individual experiment were explained by separate, alternative mechanisms (Aksay et al., 2007; Gonçalves et al., 2014; Libersat and Mizrahi, 1996; Mizrahi and Libersat, 1997; Crook et al., 1996, 1997, 1998). The advantage of our optimal compensation theory is that it provides a simple, unifying explanation for all these experiments. Whether it is the correct explanation can only be answered through further experimental research.
To guide such research, we can use our theory to make a number of important predictions about the impact of neural damage on neural circuits. First, we can predict how tuning curves change shape to compensate for neuron loss. Specifically, neurons throughout the network will, on average, increase their firing rates to the signals that specifically activated the dead neurons. This happens because the remaining neurons automatically seek to carry the informational load of the knocked out neurons (which is equivalent to maintaining a balance of excitation and inhibition). This is a strong prediction of our theory, and as such, an observation inconsistent with this prediction would invalidate our theory. There have been very few experiments that measure neural tuning before and after neuron silencing, but in the visual cortex, where this has been done, our predictions are consistent with experimental observations (Figure 9).
Our second prediction is that optimal compensation is extremely fast—faster than the time scale of neural spiking. This speed is possible because optimal compensation is supported by the tight balance of excitation and inhibition, which responds rapidly to neuron loss—just as balanced networks can respond rapidly to changes in inputs (Renart et al., 2010; van Vreeswijk and Sompolinsky, 1998, 1996). We are not aware of any experiments that have explicitly tested the speed of firing rate changes following neuron silencing. In the pharmacological silencing of directionselective cells in the visual cortex, the time scales of the reagents are too slow to outrule the possibility that there is some synaptic plasticity (Crook et al., 1996, 1997, 1998). Nonetheless, these experiments are consistent with our prediction, because the changes observed in tuning curve shape are at least as fast, if not faster than the speed of pharmacological silencing. Ideally, these predictions could be tested using neuronal ablations or optogenetic silencing.
Finally, we predict that all neural systems have a cell loss recovery boundary, beyond which neural function disintegrates. Existing measurements from the oculomotor system (Aksay et al., 2007) seem to be consistent with this prediction (Figure 6). We predict that this recovery boundary coincides with a disruption in the balance of excitation and inhibition. This has not been explored experimentally, although the disruption of balance has recently been implicated in a range of neural disorders such as epilepsy (Bromfield, 2006) and schizophrenia (Yizhar et al., 2011). Anecdotally, there have been many unreported experiments where neural ablation has failed to cause a measurable behavioral effect. Our theory suggests that such ‘failed’ lesion experiments may be far more interesting than previously thought, and that the boundary between a measurable and unnoticeable behavioral effect deserves specific attention. Indeed, the properties of a recovery boundary may also shed light on the progression of neurodegenerative diseases—especially those that are characterized by a period of asymptomatic cell loss, followed by a transition to a disabled symptomatic state, as in Alzheimer’s disease and stroke (Leary and Saver, 2003). We speculate that these transitions occur at the recovery boundary of the diseased system. If this were the case, then an accumulation of asymptomatic damage, through aging for example, or through acute conditions such as silent stroke, will increase the susceptibility of the brain to symptomatic damage, by moving it closer to the recovery boundary.
These predictions, and more broadly, the principle of optimal compensation that we have developed here, promise to be useful across a number of areas. First, as a neuroscience tool, our work provides a framework for the interpretation of experimental manipulations such as pharmacological silencing (Aksay et al., 2007), lesion studies (Keck et al., 2008) and optogenetic perturbations (Fenno et al., 2011; Li et al., 2016). Second, in the study of neural computation, optimal compensation may be a useful guiding principle, because plausible models of neural computation should be designed specifically to withstand the type of damage that the brain can withstand. Finally, our work may provide new perspectives on how neurodegenerative diseases impact behavior through neural networks, by generalizing the theory of efficient coding from the intact brain state to the damaged brain state (Morrison and Hof, 1997; Bredesen et al., 2006).
Materials and methods
We have described the properties of optimal compensation, and given a variety of examples of optimal compensation across a range of systems. Here, we present further technical details. First, we describe how we tune spiking networks to represent signals optimally, including both networks that obey Dale’s law and those that do not. Next, we explain quadratic programming with an analytically tractable example. We then specify our choice of parameters for each figure. For the highdimensional sparse coding example, we describe how we calculate sparse coding receptive fields and direction tuning curves. Finally, we provide additional details on the knockout calculations, and we prove that our spiking model is tightly balanced and performs optimal compensation. The Matlab code that we used to generate all the figures in this paper is published online (http://github.com/machenslab/spikes).
Derivation of network model using spikes
In this section, we derive the connectivity and dynamics of a network that can optimally compensate for neuron loss. For simplicity, we will for now ignore the constraint that neurons need to be either excitatory or inhibitory, which will be revisited in the next section. We consider a network of $N$ leaky integrateandfire neurons receiving timevarying inputs $\mathbf{\mathbf{x}}\left(t\right)=({x}_{1}\left(t\right),\mathrm{\dots},{x}_{j}\left(t\right),\mathrm{\dots},{x}_{M}\left(t\right))$, where $M$ is the dimension of the input and ${x}_{j}\left(t\right)$ is the ${j}^{th}$ input signal. In response to this input, the network produces spike trains $\mathbf{\mathbf{s}}\left(t\right)=({s}_{1}\left(t\right),\mathrm{\dots},{s}_{i}\left(t\right),\mathrm{\dots},{s}_{N}\left(t\right))$, where ${s}_{i}\left(t\right)={\sum}_{l}\delta \left(t{t}_{l}^{i}\right)$ is the spike train of the ${i}^{th}$ neuron and $\left\{{t}_{l}^{i}\right\}$ are the spike times of that neuron. Here, we describe the general formulation of our framework for arbitrary networks with $N\ge M$.
A neuron fires a spike whenever its membrane potential exceeds a spiking threshold. We can write this as ${V}_{i}>{T}_{i}$, where ${V}_{i}$ is the membrane potential of neuron $i$ and ${T}_{i}$ is the spiking threshold. The dynamics of the membrane potentials are given by:
where ${\mathrm{\Omega}}_{ik}$ is the connection strength from neuron $k$ to neuron $i$, ${F}_{ij}$ is the connection strength from input $j$ to neuron $i$, $g\left({x}_{j}\right)$ is a function (or functional) applied to the input ${x}_{j}$, $\lambda $ is the neuron leak and ${\sigma}_{V}$ is the standard deviation of intrinsic neural noise, represented by a white noise term ${\eta}_{i}$ (Knight, 1972; Dayan and Abbott, 2001). For brevity, we will not explicitly indicate the timedependency of variables. That said, we note that the input signals, ${x}_{j}\left(t\right)$, the voltages, ${V}_{i}\left(t\right)$, and the spike trains, ${s}_{k}\left(t\right)$, are all timedependent quantities, whereas the thresholds, ${T}_{i}$, the leak, $\lambda $, and the connection strengths, ${\mathrm{\Omega}}_{ik}$ and ${F}_{ij}$, are all constants. The feedforward term, $g\left({x}_{j}\right)$, is a placeholder for any feedforward input into the networks, but also for slower recurrent synaptic inputs that are not explicitly modeled here, such as the slow synapses in Boerlin et al. (2013). (Indeed, including slower recurrent dynamics does not affect the fast compensatory response of the network.) When a neuron spikes, its membrane potential is reset to the value $V={R}_{i}$. For ease of presentation, we include this reset in the selfconnections ${\mathrm{\Omega}}_{ii}$, i.e., we assume that the selfconnections are negative, ${\mathrm{\Omega}}_{ii}<0$, so that the reset is given by ${R}_{i}={T}_{i}+{\mathrm{\Omega}}_{ii}$.
We assume that this network provides a representation of the input signal $\mathbf{\mathbf{x}}$ using a simple linear decoder, rewritten from Equation 1,
where ${\mathbf{\mathbf{D}}}_{k}$ is the fixed contribution of neuron $k$ to the signal, and ${r}_{k}$ is the instantaneous firing rate of neuron $k$. We generally refer to ${\mathbf{\mathbf{D}}}_{k}$ as the vector of readout or decoding weights. The instantaneous firing rate, ${r}_{k}$, is a timedependent quantity that we obtain by filtering the spike train with an exponential filter:
where $\tau $ is the timescale of the filtering. This firing rate definition is particularly informative because it has the form of a simple model of a postsynaptic potential, which is a biologically important quantity. Note that the units of this firing rate are given by $\tau $, so we must multiply by ${\tau}^{1}$Hz to obtain units of Hz.
Our goal is to tune all the parameters of this network so that it produces appropriate spike trains at appropriate times to provide an accurate representation of the input $\mathbf{\mathbf{x}}$, both before and after cell loss (Boerlin et al., 2013; Bourdoukan et al., 2012; Barrett et al., 2013). We measure the accuracy of the representation with the loss function, rewritten from Equation 2,
with ${(\cdot )}^{2}$ denoting an inner product, and $C\left(\mathbf{\mathbf{r}}\right)={\sum}_{k}{r}_{k}^{2}$. We then require our network to obey the following rule: at a given time point $t$, each neuron only fires a spike whenever a spike reduces the loss function,
Then, since a spike changes the rate by ${r}_{i}\to {r}_{i}+1$, and hence the signal estimate by $\hat{\mathrm{x}}\to \hat{\mathrm{x}}+\mathbf{D}}_{i$, we can restate this spiking rule for the $i$th neuron as:
where ${\delta}_{ik}$ is the Kronecker delta. By expanding the righthandside, and canceling equal terms, this spiking rule can be rewritten as
This equation describes a rule under which neurons fire to produce spikes that reduce the loss function. Since a neuron $i$ spikes whenever its voltage ${V}_{i}$ exceeds its threshold ${T}_{i}$, we can interpret the lefthandside of this spiking condition (Equation 11) as the membrane potential of the ${i}^{th}$ neuron:
and the righthandside as the spiking threshold for that neuron:
We can identify the connectivity and the parameters that produce optimal coding spike trains by calculating the derivative of the membrane potential (as interpreted in Equation 12) and matching the result to the dynamical equations of our integrateandfire network (Equation 5):
From Equation 6, we obtain $d\hat{\mathbf{x}}/dt=\sum _{k}{\mathbf{D}}_{k}d{r}_{k}/dt$ and from Equation 7 we obtain $d{r}_{k}/dt={r}_{k}/\tau +{s}_{k}$, which yields a simple differential equation for the readout:
By inserting these expressions into Equation 14 we obtain:
Finally, using the voltage definition from Equation 12 to write $\mathbf{D}}_{i}^{\mathrm{\top}}\hat{\mathbf{x}}={V}_{i}+{\mathbf{D}}_{i}^{\mathrm{\top}}\mathbf{x}\beta {r}_{i$ we can replace the second term on the right hand side and obtain:
This equation describes the voltage dynamics of a neuron that produces spikes to represent signal $\mathbf{\mathbf{x}}$. If we now compare this equation with our original integrateandfire network, Equation 5, we see that these are equivalent if
Here, the notation ${\left[\mathbf{\mathbf{z}}\right]}_{j}$ refers to the $j$th element of the vector $\mathbf{\mathbf{z}}$. A network of integrateandfire neurons with these parameters and connection strengths can produce spike trains that represent the signal $\mathbf{\mathbf{x}}$ to a high degree of accuracy. Elements of this calculation have been produced before (Boerlin et al., 2013), but are reproduced here for the sake of completeness. Also, it has been shown that this connectivity can be learned using simple spike timingdependent plasticity rules (W. Brendel, R. Bourdoukan, P. Vertechi, et al, unpublished observations; Bourdoukan et al. (2012)), so extensive finetuning is not required to obtain these spiking networks. We note that the input into the network consists of a combination of the original signal, ${x}_{j}$, and its derivative, $d{x}_{j}/dt$. This can be interpreted in several ways. First, we can think of the original signal, ${x}_{j}\left(t\right)$, as a filtered version of the actual input, so that $d{x}_{j}/dt=\lambda {x}_{j}+{c}_{j}\left(t\right)$, where ${c}_{j}\left(t\right)$ is the actual input (and note that ${c}_{j}=\lambda {x}_{j}+d{x}_{j}/dt$). Second, we can think of this as a biophysical computation, in the sense that the neuron receives two types of input, the original signal input, ${x}_{j}\left(t\right)$, and its derivative, $d{x}_{j}/dt$. The latter could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition).
Our derivation of network dynamics directly from our loss function allows us to interpret the properties of this network in terms of neural function. Each spike can be interpreted as a greedy error reduction mechanism—it moves $\hat{\mathbf{x}}$ closer to the signal $\mathbf{x}$. This error reduction is communicated back to the network through recurrent connectivity, thereby reducing the membrane potential of the other neurons. The membrane potential, in turn, can be interpreted as a representation error—it is determined by a linear projection of $\mathbf{x}\hat{\mathbf{x}}$ onto the neuron’s decoding weights, ${\mathbf{\mathbf{D}}}_{i}$ (Equation 12). Following this interpretation, whenever the error becomes too big, the voltage becomes too big and it reaches threshold. This produces a spike, which reduces the error, and so on.
We can also understand these network dynamics in terms of attractor dynamics. This network implements a point attractor—firing rates evolve towards a stable fixed point in Ndimensional firing rate space. The location of this point attractor depends on neural input and network connectivity. When a neuron dies, the point attractor is projected into a subspace given by ${r}_{k}=0$, where neuron $k$ is the neuron that has died.
Note that in this derivation, we used a quadratic cost. This cost increases the value of the spiking threshold (Equation 13) and the spiking reset (Equation 18). We can also derive network parameters for alternative cost term choices. For example, if we use a linear cost, we simply need to drop the second term ($\beta {\delta}_{ik}$) in Equation 18, while keeping all other parameters the same. In other words, we can implement a quadratic cost by increasing the spiking threshold and the spiking reset, and we can implement a linear cost by increasing the spiking threshold without increasing the spiking reset. In this way, the spiking threshold, and the reset determine the cost function. It is conceivable that these variables may be learned, just as network connectivity may be learned. Alternatively, these values may be predetermined for various brain areas, depending on the computational target of each brain area.
Derivation of network model with separate excitatory and inhibitory populations
When these ideas are applied to large and heterogeneous networks, we run into one problem concerning their biological interpretation. Individual neurons in these networks can sometimes target postsynaptic neurons with both excitatory and inhibitory synapses, so that they violate Dale’s law. To avoid this problem, we consider a network that consists of separate pools of excitatory and inhibitory neurons. The external signals feed only into the excitatory population, and the inhibitory population remains purely local. This scenario is captured by the equations,
The four connectivity matrices, ${\mathrm{\Omega}}_{ik}^{EE}$, ${\mathrm{\Omega}}_{ik}^{EI}$, ${\mathrm{\Omega}}_{ik}^{IE}$, ${\mathrm{\Omega}}_{ik}^{II}$ are all assumed to have only positive entries, which enforces Dale’s law in this network. The neuron’s selfresets are now explicitly captured in the terms ${R}_{i}^{E}{s}_{i}^{E}$ and ${R}_{i}^{I}{s}_{i}^{I}$, and are no longer part of the connectivity matrices themselves. We will now show how to design the four connectivity matrices such that the resulting network becomes functionally equivalent to the networks in the previous section. To do so, we apply the basic ideas outlined in Boerlin et al. (2013), and simplified due to W. Brendel, R. Bourdoukan, P. Vertechi, and the authors (personal communication).
We start with the network in the previous section, as described in Equation 5, and split its (nonDalian) recurrent connectivity matrix, ${\mathrm{\Omega}}_{ik}$, into selfresets, excitatory connections, and inhibitory connections. The selfresets are simply the diagonal entries of ${\mathrm{\Omega}}_{ik}$, see Equation 18,
where the choice of signs is dictated by the sign conventions in Equation 23 and Equation 24. The excitatory connections are the positive entries of ${\mathrm{\Omega}}_{ik}$, for which we define
where the notation ${\left[x\right]}_{+}$ denotes a thresholdlinear function, so that ${\left[x\right]}_{+}=x$ if $x>0$ and ${\left[x\right]}_{+}=0$ otherwise. The inhibitory connections are the negative, offdiagonal entries of ${\mathrm{\Omega}}_{ik}$, for which we write
With these three definitions, we can reexpress the recurrent connectivity as ${\mathrm{\Omega}}_{ik}={\mathrm{\Omega}}_{ik}^{EE}{W}_{ik}{\delta}_{ik}{R}_{k}$.
Using this split of the nonDalian connectivity matrix, we can trivially rewrite Equation 5 as an equation for an excitatory population,
which is identical to Equation 23 above, except for the term with the inhibitory synapses, ${W}_{ik}$, on the righthandside. Here, the inhibitory synapses are multiplied with ${s}_{k}^{E}$ rather than ${s}_{k}^{I}$, and their number is identical to the number of excitatory synapses. Accordingly, this term violates Dale’s law, and we need to replace it with a genuine input from the inhibitory population. Since this nonDalian term consists of a series of deltafunctions, we can only replace it approximately, which will suffice for our purposes. We simply require that the genuine inhibitory input approximates the nonDalian term on the level of the postsynaptic potentials, i.e., the level of filtered spike trains,
Here the lefthandside is the given nonDalian input, and the righthandside is the genuine inhibitory input, which we are free to manipulate and design.
The solution to the remaining design question is simple. In the last section, we showed how to design arbitrary spiking networks such that they track a given set of signals. Following these recipes, we assume that the inhibitory population output optimally tracks the (filtered) spike trains of the excitatory population. More specifically, we assume that we can reconstruct the instantaneous firing rates of the excitatory neurons, ${r}_{i}^{E}$, by applying a linear readout to the instantaneous firing rates of the inhibitory neurons, ${r}_{j}^{I}$, so that
Note that all entries of ${\mathbf{\mathbf{D}}}_{j}^{I}$, the inhibitory decoding weights, must be positive. If the estimate of the excitatory firing rates closely tracks the real firing rates, then we can fulfil Equation 29.
To obtain this relation, we will assume that the inhibitory population minimizes its own loss function,
where the first term is the error incurred in reconstructing the excitatory firing rates, and the second term is a cost associated with the firing of the inhibitory neurons. Apart from the replacement of the old input signal, $\mathbf{\mathbf{x}}$, by the new ‘input signal’, ${\mathbf{\mathbf{r}}}^{E}$, this is exactly the same loss function as in Equation 8. An important difference is that here the number of input signals, which corresponds to the number of excitatory neurons, may be larger than the number of output spike trains, which corresponds to the number of inhibitory neurons. Our spiking networks will in this case still minimize the meansquareerror above, though the representation could be lossy. Since the effective dimensionality explored by the instantaneous firing rates of the excitatory population is limited by the dimensionality of the input signal space, however, these losses are generally small for our purposes.
Given the above loss function, we can redo all derivations from the previous section, to obtain the connections, thresholds, and resets for the inhibitory population. Defining the shorthand
we find that
Here the first equation is equivalent to Equation 18, except for the different sign convention (${\mathrm{\Omega}}_{ik}^{II}$ enters negatively in Equation 24), and for the subtraction of the selfreset term, which is not part of the recurrent connectivity. The second equation is equivalent to Equation 19. The third equation is equivalent to Equation 22, and the fourth equation is the selfreset term, containing the diagonal elements of Equation 18. Note that, due to the positivity of the inhibitory decoding weights, the two connectivity matrices have only positive entries, as presupposed above in Equation 24. Consequently, the inhibitory subnetwork obeys Dale’s law.
In a last step, we replace the excitatory spike trains in Equation 28 with their inhibitory approximation. Specifically, we rewrite Equation 29 as
where we used Equation 30 in the approximation step. This approximation generally works very well for larger networks; in smaller networks, finite size effects come into play.
Finally, when we take into account the results from the previous section, write ${\mathbf{\mathbf{D}}}_{i}^{E}$ for the excitatory decoder weight of the $i$th neuron, and introduce the shorthand
then we find that the excitatory subnetwork has the connections, thresholds, and resets (compare Equation 25–Equation 27),
Importantly, the EI network therefore consists of two populations, one excitatory, one inhibitory, both of which minimize a loss function. As a consequence, the excitatory subnetwork will compensate optimally as long as the inhibitory subnetwork remains fully functional. In this limit, the excitatory subnetwork is essentially identical to the networks discussed in the previous section. The inhibitory subnetwork will also compensate optimally until its recovery boundary is reached. In the following, we will focus on one subnetwork (and one loss function again). For notational simplicity, we will leave out the superscript references ‘$E$’.
Tuning curves and quadratic programming
For constant input signals, the instantaneous firing rates of the neurons will fluctuate around some mean value. Our goal is to determine this mean value. We will start with the most general case, and rewrite the instantaneous firing rates as ${r}_{i}(t)={\overline{r}}_{i}+{\eta}_{i}(t,{\overline{r}}_{i})$, were ${\eta}_{i}(t,{\overline{r}}_{i})$ is a zeromean ’noise’ term that captures the fluctuations of the instantaneous firing rates around its mean value. Note that these fluctuations may depend on the neuron’s mean rate. In turn, neglecting the costs for a moment, we can average the objective function, Equation 8, over time to obtain
For larger networks, we can assume that the spike trains of neurons are only weakly correlated, so that $\u27e8{\eta}_{i}({\overline{r}}_{i}){\eta}_{j}({\overline{r}}_{j}){\u27e9}_{t}=\text{Var}({\eta}_{i}({\overline{r}}_{i})){\delta}_{ij}$. Here, $\text{Var}({\eta}_{i}({\overline{r}}_{i}))$ is the variance of the noise term. We obtain
We furthermore notice that the spike train statistics are often Poisson (Boerlin et al., 2013), in which case we can make the replacement $\text{Var}({\eta}_{i}({\overline{r}}_{i}))={\overline{r}}_{i}$. The loss function then becomes a quadratic function of the mean firing rate, which needs to be minimized under a positivity constraint on the mean firing rate. This type of problem is known as ‘quadratic programming’ in the literature. In this study, we focused on networks for which the contributions of the firing rate fluctuations can be neglected, which is generally the case for sufficiently small readout weights and membrane voltage noise (see Figure 4—figure supplement 3). In this case, we obtain the loss function used in Equation 3,
with the cost term included again.
In general, quadratic programming is mathematically intractable, so the objective function must be minimized numerically. However, in networks with a small number of neurons, we can solve the problem analytically and gain some insight into the nature of quadratic programming. Here, we do this for the two neuron example (Figure 4A).
In this example, we assumed that the first neuron contributes with ${\mathbf{\mathbf{D}}}_{1}=(w,c)$ to the readout, and the second neuron with ${\mathbf{\mathbf{D}}}_{2}=(w,c)$. Accordingly, the firing rates $\overline{r}}_{1$ and $\overline{r}}_{2$ are given by the solution to the following equation:
where $x$ is the variable signal, and ${r}_{B}$ is the fixed background signal. The positivity constraint partitions the solution of this equation into three regions, determined by the value of $x$: region ${\mathcal{R}}_{1}$ where ${\overline{r}}_{1}=0$ and ${\overline{r}}_{2}\ge 0$, region ${\mathcal{R}}_{2}$ where ${\overline{r}}_{1}\ge 0$ and ${\overline{r}}_{2}\ge 0$, and region ${\mathcal{R}}_{3}$ where ${\overline{r}}_{2}=0$ and ${\overline{r}}_{1}\ge 0$ (Figure 4—figure supplement 1A). In region 2, we can then easily solve Equation 50 by setting the derivative of our loss function to zero. This gives $\overline{\mathbf{r}}=({\mathbf{w}}^{\mathrm{\top}}\mathbf{w}+{\mathbf{c}}^{\mathrm{\top}}\mathbf{c}+\beta \mathbf{1}{)}^{1}(\mathbf{w}x+\mathbf{c}{r}_{B})$ where $\mathbf{\mathbf{w}}=(w,w)$, $\mathbf{\mathbf{c}}=(c,c)$, and $\mathbf{1}$ is the identity matrix. Looking at this equation, we see that ${\overline{r}}_{1}=0$ when $x=c{r}_{B}/w$ and ${\overline{r}}_{2}=0$ when $x=c{r}_{B}/w$. Therefore, the firing rate solution for region two is valid when $c{r}_{B}/w\le x\le c{r}_{B}/w$. For $x\ge c{r}_{B}/w$, we have ${\overline{r}}_{2}=0$ because of the positivity constraint in Equation 50. This is region 3. We can calculate $\overline{r}}_{1$ by setting ${\overline{r}}_{2}=0$ and then minimizing the loss function. This gives ${\overline{r}}_{1}=({w}^{2}+{c}^{2}+\beta {)}^{1}(wx+c{r}_{B})$. Similarly, for $x\le c{r}_{b}/w$ we have ${\overline{r}}_{1}=0$ because of the positivity constraint. This is region one and we obtain ${\overline{r}}_{2}=({w}^{2}+{c}^{2}+\beta {)}^{1}(wx+c{r}_{B})$. The firing rates within each region are given by a simple linear projection of ${\mathbf{\mathbf{x}}}^{\prime}=(x,{r}_{B})$, although the size and direction of this projection is different in each region. As such, the solution to this quadratic programming problem is a piecewise linear function of $x$.
In networks with larger numbers of neurons, the solution will still be a piecewise linear function of $x$, although there will be more regions and the firing rate solutions are more complicated because more neurons are simultaneously active. In contrast, the transformation from firing rates to ${\mathbf{\mathbf{x}}}^{\prime}$ is very simple (Equation 6). It is given by a simple linear transformation, and is region independent (Figure 4—figure supplement 1B).
Readout weights and cost terms: 1d and 2d example
There are a number of free parameters in our model, such as the cost terms and the readout weights. The choice of these values determine the precise shape of tuning curves. In general, however, the precise values of these terms have little influence on the coding capability of our system, once certain properties have been satisfied, which we will outline here.
The cost term that we used for the examples in Figures 1–6 is a quadratic cost, $C\left(\mathbf{\mathbf{r}}\right)={\sum}_{k}{r}_{k}^{2}$. This cost term encourages the system to find a solution in which all neurons share in the signal representation. (Here, and in the following, we will no longer distinguish ${r}_{i}$, the instantaneous firing rate used in the spiking network, and $\overline{r}}_{i$, the mean firing rates used in quadratic programming. The appropriate notation will be determined by the context.) The cost term for Figures 7–9, the V1 model, is a linear cost, $C\left(\mathbf{\mathbf{r}}\right)={\sum}_{k}{r}_{k}$. This cost term limits the overall number of spikes, and generally encourages sparse solutions, in which most neurons contribute very little to the signal representation. Regardless of this choice, our general predictions about optimal compensation are qualitatively similar, as long as the cost term does not dominate the loss function.
The other important parameters that we must choose are the decoding or readout weights. In Figure 2, the decoding weights of the excitatory population were drawn from a Gaussian distribution with mean ${\mu}_{E}=2$, and standard deviation ${\sigma}_{E}=0.2$. For the inhibitory population, we used a mean ${\mu}_{I}=0.3$ and a standard deviation ${\sigma}_{I}=0.03$. For the networks in Figures 3–6, we have used regularly spaced decoding weights, with the addition of some random noise. The parameter values that we used are plotted in Figure 3D, Figure 4A–C, left column, and Figure 5A,E,I.
All of our quadratic programming and optimal compensation predictions for tuning curve shapes still hold for alternative choices of readout weights, once the following properties are satisfied. First, it is important that the readout vectors span the space of the signal that we want our network to represent. Otherwise, the system will not be able to represent signals along certain directions, or compensate for neuron loss. Second, it is important to set the scale of the readout, so that the cost does not dominate. There is a natural scaling that we can use to avoid such problems. We require the size of firing rates and the readout to be independent of network size. From ${\hat{x}}_{j}=\sum _{k}^{N}{D}_{jk}{r}_{k}\sim \mathcal{\mathcal{O}}(1)$, it follows that ${D}_{jk}\sim \mathcal{O}\left(1/N\right)$. As a consequence, the offdiagonal elements of the recurrent connectivity are small ${\mathrm{\Omega}}_{ik}\sim \mathcal{O}\left(1/{N}^{2}\right)$, for $i\ne k$ (Equation 18), and if we assume that the diagonal elements are on the same order of magnitude, then $\beta \sim \mathcal{O}\left(1/{N}^{2}\right)$. This scaling provides a principled basis for parameter choices in our model.
We may also want to scale our decoder weights without changing the shape of our tuning curve prediction (Figure 4—figure supplement 3B). To do this, the cost parameter $\beta $ and the membrane potential leak $\lambda $ must also be scaled together. Specifically, if the readout weights are given by $\left\{\alpha \times {D}_{jk}\right\}$, where $\alpha $ is a scaling parameter that characterizes the size of the decoder weights and $\left\{{D}_{jk}\right\}$ are fixed decoder weights, then the spiking cost parameter must be set to ${\alpha}^{2}\times \beta $ and the membrane potential leak must be set to $\alpha \times \lambda $. We can see that this preserves the shape of tuning curves by looking at the resulting structure of our loss function (Equation 2):
As before, the minimum of this loss function gives firing rates in units of the inverse membrane potential leak $\left(\alpha \lambda \right)$. Therefore, we must divide $\mathbf{\mathbf{r}}$ by $\left(\alpha \lambda \right)$ to obtain firing rates in units of Hz. Our loss function then becomes:
This loss function is independent of $\alpha $, and so, using this scaling the optimal tuning curves will have the same shape for all values of $\alpha $.
Readout weights and cost terms: V1 example
There are many possible choices of decoder weights $\left\{{\mathbf{\mathbf{D}}}_{i}\right\}$ that provide a faithful representation of a signal. In positive sparse coding, we choose the decoder weights that provide the most efficient signal representation, for a sparse cost term ($C\left(\mathbf{\mathbf{r}}\right)={\sum}_{k}{r}_{k}$), under the constraint that firing rates must be positive. Here, we describe how we calculate these positive sparse coding weights, which we will use in several of our figures (Figures 7–9 and Figure 9—figure supplement 1–2).
We use the signal vector $\mathbf{\mathbf{x}}=({x}_{1},\mathrm{\dots},{x}_{j},\mathrm{\dots},{x}_{M})$ to denote an image patch, where each element ${x}_{j}$ represents a pixel from the image patch (Olshausen and Field, 1996). We quantify the efficiency of a sparse representation using the following loss function:
where $\u27e8...\u27e9$ denotes an average across image patches. This is simply Equation 2 with a sparse cost term, averaged over image patches. The first term in this loss function is the image representation error. The second term quantifies the sparsity of the representation. The decoding filters that minimize this loss function will be the positive sparse coding filters for natural images.
We assume that the decoding filters are optimized to represent natural images, such as forest scenes, flowers, sky, water and other images from nature. Natural images are chosen because they are representative of the images that surround animals throughout evolution. We randomly select 2000 image patches of size $12\times 12$ from eight natural images taken from Van Hateren’s Natural Image Database (van Hateren and van der Schaaf, 1998). These images are preprocessed by removing loworder statistics, so that our sparse coding algorithm can more easily learn the higherorder statistical structure of natural images. First of all, images are centered so as to remove first order statistics:
Next, images are whitened, so as to remove secondorder statistics:
where $\mathbf{M}=\u27e8\mathbf{x}{\mathbf{x}}^{T}\u27e9$ is a decorrelating matrix.
We calculate sparse coding filters by minimizing the loss function (Equation 54) using a two step procedure. First, for each image patch in a subset of 50 image patches, we calculate the firing rates that minimize the loss function under the constraint that firing rates must be positive:
The positivity constraint reduces the representational power of the neural population, so, to counteract this, we use a large population containing twice as many neurons as signal dimensions, $N=2M$ and we initialize our decoder matrix to ${D}_{jk}={\delta}_{j,k}{\delta}_{j,k+M}$ to ensure that our neural population can easily span the signal space throughout the sparse coding calculation.
Next, we update our decoding vectors by stochastic gradient descent on the loss function, using the optimal firing rates calculated in the previous step:
Decoding weights are normalized for each neuron, so that they do not become arbitrarily large. This calculation is similar to sparse coding calculations performed before (Olshausen and Field, 1996), except that we enforce the additional requirement that firing rates must be positive. This constraint was used before for sparse nonnegative matrix factorization (Hoyer, 2003, 2004). However, this method requires an additional constraint that decoding weights be strictly positive, so that the nonnegative matrix factorization method can be applied. This additional constraint will reduce the efficiency of the representation, because it precludes solutions that may be more efficient, but violate the additional constraint.
To compare the properties of optimal compensation in this positive sparse coding model to experiment, we calculate the direction tuning of neurons in our model using a protocol similar to that used in the experimental work (Crook and Eysel, 1992). Specifically, we drive our model using a directed, edgelike stimulus. These stimuli are Gabor filters, with a profile similar to the sparse coding filters. We calculate the firing rate response of each neuron using Equation 57 for 16 different stimulus directions. For each direction, the Gabor filter is positioned at regularly spaced locations along a line perpendicular to the Gabor filter orientation and the maximum firing rate of each neuron along this line is taken as the firing rate response of that neuron at that direction. These firing rate responses form the direction tuning curves that we can compare to experimental recordings.
The impact of neuron loss
We investigate the impact of neuron loss in our spiking model by simulating the network both before and after neuron loss (Figure 2 and Figure 3). Specifically, we use the Euler method to iterate the dynamics described by Equation 17 or Equation 23, Equation 24, and we measure the readout $\hat{\mathbf{x}}$, the spike trains $\mathbf{\mathbf{s}}$, and the membrane potentials $\mathbf{\mathbf{V}}$. We simulate the loss of a neuron by setting all the connections to and from that neuron to zero. This is equivalent to silencing a neuron. We continue to measure the network activity and the readout after neuron loss.
We also calculate the impact of neuron loss on tuning curves with and without optimal compensation (Figure 5). To calculate tuning curves without compensation we solve Equation 3 in the intact state. We then constrain the firing rates of neurons selected for loss to zero and calculate the impact of this on our signal representation. To calculate the impact of neuron loss with optimal compensation, we solve Equation 4.
The integrateandfire model that we have described can produce spikes at arbitrarily large firing rates. This can be a problem, especially when neurons die and the remaining neurons compensate with extremely high, unrealistic firing rates. To avoid this, we can include a form of adaptation in our spiking model. Specifically, we can extend our spiking rule so that a neuron $i$ will only spike if its firing rate ${f}_{i}$ is lower than a maximum ${f}_{max}$. Here, the firing rate $\mathbf{\mathbf{f}}$ is given by the following differential equation:
and ${\tau}_{A}$ is the time scale of adaptation. This time scale is much slower than the time scale of the spiking dynamics $\tau $. We use this extended neural model to calculate the recovery boundary in Figure 3J,K, where we kill neurons in random order.
In turn, we can compute the firing rates and recovery boundary of this extended integrateandfire model using quadratic programming (data not shown). Specifically, the firing rates of this network are given by the solution of the following optimization problem:
where $R$ denotes the set of constraints that the firing rates must satisfy:
and where $X$ is the set of all dead (or silenced) neurons.
To study the effects of redundancy (or overcompleteness) in the V1 model (Figure 9—figure supplement 2A), we define an overcompleteness factor, $K$, given by $K=N/M$, where $N$ is the number of neurons in the representation and $M$ is the signal dimension. In the sparse coding calculation above, we had $K=2$. To increase the redundancy or overcompleteness of the representation, we generated a new set of readout vectors by two methods. In method one, we randomly drew $N=KM$ decoding vectors ${\mathbf{\mathbf{D}}}_{i}^{\left(K\right)}$ from the V1 model and then added multiplicative noise, in order to avoid parallel readout weights. Specifically, the new readout vectors ${\mathbf{\mathbf{D}}}_{i}^{\left(K\right)}$ were obtained from the old readout vectors, ${\mathbf{\mathbf{D}}}_{i}^{\left(2\right)}$, using ${\mathbf{\mathbf{D}}}_{i+a}^{\left(K\right)}={\mathbf{\mathbf{D}}}_{i}^{\left(2\right)}\cdot \left(1+{\mathit{\bm{\xi}}}_{i+a}\right)/\left(K/2\right)$, where $i\in [1,\mathrm{\dots},2M]$, ${\xi}_{ij}\sim \mathcal{N}(0,0.1)$, $a=2Mm$ and $m\in [0,\mathrm{\dots},K/21]$. In method two, we also randomly drew decoding vectors from the V1 model, but then shifted the corresponding $12\times 12$ image patch randomly in space. Both methods yielded similar results. We used these methods to calculate readout weights for overcomplete representations because the sparse coding calculation becomes computationally prohibitive for large values of $M$.
Tight balance of excitation and inhibition
In simulations of our spiking model, we see that optimal coding generally coincides with a balance of excitatory and inhibitory currents into each neuron (Figure 2G and Figure 3H,I). To characterize this balance, we define three currents for each excitatory neuron. First, we define the excitation ${E}_{i}$ as the total input current received through excitatory synapses. Second, we define the inhibition ${I}_{i}$ as the total input current received through inhibitory synapses. Third, we define the reset current $\overline{R}}_{i$ as the total current caused by a neuron’s selfreset after an action potential. Since our idealized neuron model uses deltafunctions to capture the synaptic currents evoked by single spikes, we smooth the deltafunctions in order to obtain more realistic synaptic currents. For simplicity, we use the same filter as in the computation of the instantaneous firing rates, although other filters could be used as well. In the EI network, Equation 23 and Equation 24, the three currents into the $i$th excitatory neuron are then given by
where ${F}_{ij}^{E}\equiv {F}_{ij}$ if ${F}_{ij}{x}_{j}>0$ (and ${F}_{ij}^{E}\equiv 0$ otherwise), and ${F}_{ij}^{I}\equiv {F}_{ij}$ if ${F}_{ij}{x}_{j}<0$ (and ${F}_{ij}^{I}\equiv 0$ otherwise). When the inhibitory subpopulation is not explicitly simulated, we simply set ${\mathrm{\Omega}}_{ik}^{EE}{r}_{k}^{E}\equiv {\mathrm{\Omega}}_{ik}{r}_{k}$ unless ${\mathrm{\Omega}}_{ik}<0$ and ${\mathrm{\Omega}}_{ik}^{EI}{r}_{k}^{I}\equiv {\mathrm{\Omega}}_{ik}{r}_{k}$ unless ${\mathrm{\Omega}}_{ik}>0$.
Using the above definitions of currents, we can integrate Equation 23 to reexpress the voltage of an excitatory neuron as $V}_{i}^{E}={E}_{i}{I}_{i}{\overline{R}}_{i$. This voltage is trivially bounded from above by a neuron’s threshold, ${V}_{i}^{E}<{T}_{i}$. Due to the design properties of the network, the voltage is also bounded from below: since the voltage reflects a representation error, neurons with decoding vectors opposite to those of neuron $i$ will keep negative errors in check. If we neglect cost terms, we obtain ${V}_{i}^{E}>{R}_{i}^{E}$. (Otherwise the bound is lower). Reexpressed for the EI ratio, we obtain the following bounds
The EI ratio will therefore approach one (perfect balance) if the inhibitory synaptic currents are much larger than the selfreset currents and voltage thresholds, or, if excitatory currents are cancelled by inhibitory currents rather than by a neuron’s spikes and subsequent selfresets.
The network does not reach this balanced regime under two conditions. First, if a network has a small redundancy, i.e., the number of neurons, $N$, is not much larger than the number of input signals, $M$, then each neuron carries a large load of the representation, and the reset currents become similar in magnitude to the synaptic currents. This is the case for the cricket cercal system, for instance, for which $N=4$ and $M=2$. Second, if a system approaches the recovery boundary, the representational burden of some neurons also increases a lot, leading to a similar effect. This is the case for the green neuron in Figure 3I, even after the first k.o, and leads to a more regular firing pattern. In other words, neurons that compensate for neuron loss can sometimes experience excessive excitatory currents, even if the compensation is successful.
To capture the mechanistic difference between compensation and recovery boundary more clearly, we therefore also defined the ratio of positive to negative currents, ${E}_{i}/({I}_{i}+{\overline{R}}_{i})$ (Figure 3H). This ratio is bounded by
and remains close to one as long as there are sufficiently large negative currents, or, if excitatory currents are cancelled by either recurrent inhibition or by spiking. As long as the network is fully functional, the latter will always be the case. Once the recovery boundary is hit, however, the lower bound on the voltages of some neurons disappears, and so will the lower bound on the above ratio of positive to negative currents (as well as the EI ratio), which is the effect we observe in Figure 3HI.
Traditionally, balance is often understood in the limit of very large networks (van Vreeswijk and Sompolinsky, 1996, 1998), and these classical scaling arguments can also be applied to our networks (Boerlin et al., 2013). Neglecting Dale’s law once more for simplicity, we can increase the network size, while keeping the relative size and number of the input signals and the neuron’s firing rates constant. Since ${\hat{x}}_{j}=\sum _{k}{D}_{jk}{r}_{k}\sim \mathcal{\mathcal{O}}(1)$ and ${r}_{k}\sim \mathcal{O}\left(1\right)$, we find that ${D}_{jk}\sim \mathcal{O}\left(1/N\right)$ and $\beta \sim \mathcal{O}\left(1/{N}^{2}\right)$. As a consequence of Equation 18 and Equation 13, we observe that both the recurrent connection strengths and the spiking thresholds are small, so that ${\mathrm{\Omega}}_{ik}\sim \mathcal{O}\left(1/{N}^{2}\right)$ and ${T}_{i}\sim \mathcal{O}\left(1/{N}^{2}\right)$. Now, because recurrent input is summed across the entire population of $N$ neurons, we find that ${V}_{i}\sim \mathcal{O}\left(N/{N}^{2}\right)\sim \mathcal{O}\left(1/N\right)$, and similar observations hold for ${I}_{i}$, ${E}_{i}$, and $\overline{R}}_{i$. Hence, these currents are $N$ times larger than the spiking threshold, ${T}_{i}$, or the reset, ${R}_{i}$. The EI ratio in Equation 65 then approaches one, and excitatory and inhibitory currents cancel precisely. We note that this balance of excitation and inhibition is much tighter than in balanced networks with random connectivity (van Vreeswijk and Sompolinsky, 1996, 1998). In randomly connected networks the balance between excitation and inhibition is of order $\mathcal{O}\left(1/\sqrt{N}\right)$ (van Vreeswijk and Sompolinsky, 1998), whereas here these fluctuations are $\mathcal{O}\left(1/N\right)$. We note that these observations only hold when the number of input signals, $M$, is kept constant; if that number grows with $N$, then the scaling is different. The conceptual differences between loose and tight balance were recently highlighted in a review (Denève and Machens, 2016).
Optimal compensation proof
The spiking network that we use is capable of rapidly implementing optimal compensation, without requiring any synaptic plasticity mechanisms. We can prove this by showing that an optimal network of $N1$ neurons is equivalent to an optimal network of $N$ neurons after the loss of one neuron. For the EI network, the argument applies separately to the excitatory and inhibitory subnetworks.
For the sake of argument, we suppose that the $N}^{th$ neuron dies. At a mechanistic level, the loss of this neuron is equivalent to cutting all the connections to and from the dead neuron and from the dead neuron to the readout $\hat{\mathbf{x}}$. Therefore, a network where the ${N}^{th}$ neuron has died is equivalent to a network with $N1$ neurons with readout matrix ${D}_{ij}^{*}={D}_{ij}\forall j\in [1,N1]$ and $\forall i\in [1,M]$, with feedforward connectivity ${F}_{ij}^{*}={F}_{ij}\forall i\in [1,N1]$ and $\forall j\in [1,M]$, with recurrent connectivity ${\mathrm{\Omega}}_{ik}^{*}={\mathrm{\Omega}}_{ik}\forall i,k\in [1,N1]$ and with spiking thresholds ${T}_{i}^{*}={\mathrm{\Omega}}_{ii}/2\forall i\in [1,N1]$.
Now, we compare this damaged network to an optimal network consisting of $N1$ neurons. To make a fair comparison, we assume that this network has the same readout matrix ${\mathbf{\mathbf{D}}}^{\prime}\equiv {\mathbf{\mathbf{D}}}^{*}$ as the reduced damaged network. Then, the recurrent connectivity for this network is given by ${\mathbf{\mathbf{\Omega}}}^{\prime}={\mathbf{\mathbf{\Omega}}}^{*}$, the feedforward connectivity is given by ${\mathbf{\mathbf{F}}}^{\prime}\equiv {\mathbf{\mathbf{F}}}^{*}$ and the spiking thresholds are given by ${\mathbf{\mathbf{T}}}^{\prime}\equiv {\mathbf{\mathbf{T}}}^{*}$. This configuration is equivalent to the reduced damaged network. Therefore, a spiking neural network whose neurons are individually tuned to represent a signal optimally before cell loss will perform optimal compensation and provide an optimal signal representation after cell loss.
References
 1

2
Anatomy and discharge properties of premotor neurons in the goldfish medulla that have eyeposition signals during fixationsJournal of Neurophysiology 84:1035–1049.

3
Functional dissection of circuitry in a neural integratorNature Neuroscience 10:494–504.https://doi.org/10.1038/nn1877
 4

5
Could information theory provide an ecological theory of sensory processing?Network: Computation in Neural Systems 3:213–251.https://doi.org/10.1088/0954898X_3_2_009

6
Sensory Communication217–234, Possible principles underlying the transformation of sensory messages, Sensory Communication.

7
Firing rate predictions in optimal balanced networksAdvances in Neural Information Processing Systems 26. pp. 1538–1546.

8
The "independent components" of natural scenes are edge filtersVision Research 37:3327–3338.https://doi.org/10.1016/S00426989(97)001211
 9

10
Spikebased population coding and working memoryPLoS Computational Biology 7:e1001080.https://doi.org/10.1371/journal.pcbi.1001080

11
Predictive coding of dynamical variables in balanced spiking networksPLoS Computational Biology 9:e1003258.https://doi.org/10.1371/journal.pcbi.1003258

12
Learning optimal spikebased representationsAdvances in Neural Information Processing 25. pp. 2294–2302.

13
Learning optimal adaptation strategies in unpredictable motor tasksJournal of Neuroscience 29:6472–6478.https://doi.org/10.1523/JNEUROSCI.307508.2009
 14
 15
 16

17
GABAinduced inactivation of functionally characterized sites in cat visual cortex (area 18): effects on orientation tuningJournal of Neuroscience 12:1816–1825.

18
GABAinduced inactivation of functionally characterized sites in cat visual cortex (area 18): effects on direction selectivityJournal of Neurophysiology 75:2071–2088.
 19

20
Evidence for a contribution of lateral inhibition to orientation tuning and direction selectivity in cat visual cortex: reversible inactivation of functionally characterized sites combined with neuroanatomical tracing techniquesEuropean Journal of Neuroscience 10:2056–2075.https://doi.org/10.1046/j.14609568.1998.00218.x

21
Theoretical Neuroscience: Computational and Mathematical Modeling of Neural SystemsCambridge, Massachusetts: MIT Press.

22
Efficient codes and balanced networksNature Neuroscience 19:375–382.https://doi.org/10.1038/nn.4243
 23
 24

25
The development and application of optogeneticsAnnual Review of Neuroscience 34:389–412.https://doi.org/10.1146/annurevneuro061010113817

26
Discharge patterns and recruitment order of identified motoneurons and internuclear neurons in the monkey abducens nucleusJournal of Neurophysiology 60:1874–1895.

27
Optogenetic perturbations reveal the dynamics of an oculomotor integratorFrontiers in Neural Circuits 8:10.https://doi.org/10.3389/fncir.2014.00010

28
Sparse coding of birdsong and receptive field structure in songbirdsNetwork: Computation in Neural Systems 20:162–177.https://doi.org/10.1080/09548980903108267

29
Spiketimingdependent plasticity of inhibitory synapses in the entorhinal cortexJournal of Neurophysiology 96:3305–3313.https://doi.org/10.1152/jn.00551.2006
 30
 31
 32

33
Nonnegative matrix factorization with sparseness constraintsThe Journal of Machine Learning Research 5:1457–1469.

34
Modeling receptive fields with nonnegative sparse codingNeurocomputing 5254:547–552.https://doi.org/10.1016/S09252312(02)007828
 35

36
Receptive fields, binocular interaction and functional architecture in the cat's visual cortexThe Journal of Physiology 160:106–154.https://doi.org/10.1113/jphysiol.1962.sp006837
 37
 38

39
Dynamics of encoding in a population of neuronsThe Journal of General Physiology 59:734–766.https://doi.org/10.1085/jgp.59.6.734

40
The dynamics of memory as a consequence of optimal adaptation to a changing bodyNature Neuroscience 10:779–786.https://doi.org/10.1038/nn1901

41
Annual incidence of first silent stroke in the united States: a preliminary estimateCerebrovascular Diseases 16:280–285.https://doi.org/10.1159/000071128
 42
 43

44
In situ visualization and photoablation of individual neurons using a low cost fiber optic based systemJournal of Neuroscience Methods 67:157–162.https://doi.org/10.1016/01650270(96)00043X

45
Balanced cortical microcircuitry for maintaining information in working memoryNature Neuroscience 16:1306–1314.https://doi.org/10.1038/nn.3492

46
Balancing feedforward excitation and inhibition via hebbian inhibitory synaptic plasticityPLoS Computational Biology 8:e1002334.https://doi.org/10.1371/journal.pcbi.1002334
 47

48
Independent coding of wind direction in cockroach giant interneuronsJournal of Neurophysiology 78:2655–2661.

49
Computational psychiatryTrends in Cognitive Sciences 16:72–80.https://doi.org/10.1016/j.tics.2011.11.018
 50
 51
 52

53
Recruitment order of cat abducens motoneurons and internuclear neuronsJournal of Neurophysiology 90:2240–2252.https://doi.org/10.1152/jn.00402.2003
 54
 55

56
Sparse coding via thresholding and local competition in neural circuitsNeural Computation 20:2526–2563.https://doi.org/10.1162/neco.2008.0307486
 57

58
Error correction, sensory prediction, and adaptation in motor controlAnnual Review of Neuroscience 33:89–108.https://doi.org/10.1146/annurevneuro060909153135

59
Natural image statistics and neural representationAnnual review of neuroscience 24:1193–1216.https://doi.org/10.1146/annurev.neuro.24.1.1193
 60

61
Representation of sensory information in the cricket cercal sensory system. II. information theoretic calculation of system accuracy and optimal tuningcurve widths of four primary interneuronsJournal of neurophysiology 66:1690–1703.

62
Independent component filters of natural images compared with simple cells in primary visual cortexProceedings of the Royal Society B: Biological Sciences 265:359–366.https://doi.org/10.1098/rspb.1998.0303
 63

64
Chaotic balanced state in a model of cortical circuitsNeural Computation 10:1321–1371.https://doi.org/10.1162/089976698300017214
 65

66
Visual adaptation as optimal information transmissionVision Research 39:3960–3974.https://doi.org/10.1016/S00426989(99)001017
 67
 68
Decision letter

Frances K SkinnerReviewing Editor; University Health Network, Canada
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your work entitled "Optimal compensation for neuron death" for consideration by eLife. Your article has been favorably evaluated by Eve Marder (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
All of the reviewers felt that the submitted theoretical work on compensating for cell death was novel and interesting. However, it was also felt that many specifics were unclear or buried in other places, making biological interpretation difficult (e.g., excitatory, inhibitory cell types). The reviewers encourage the authors to consider the five following points, and at least the first four should be addressed in a revised submission. Additional points raised by the reviewers are detailed below, and should also be addressed.
1) State clearly that the network is nonDalian and explain the reasoning for this choice.
2) State clearly what the recovery boundaries are, and plot figures that show which variables affect the recovery boundary for a given network with a given task. In general, spend a bit more time on the general mechanism before going into the additional models, or alternatively, explain in detail the additional models, like e.g. Figure 7, which is devoid of explanatory power.
3) An example of a Dalian network, independently of whether it works (same boundary?) or not (why does it not work?) should be added.
4) Show the compensatory mechanisms at play more carefully in figures, i.e. show how balance is changing and finally breaking at the recovery boundary.
5) Consider building a reservoir network model as in Sussillo & Abbott trained to perform a task. Delete neurons, and check whether it is able to selfbalance. If it doesn't, address whether this is related with the complexity of the task.
We are including the original reviews as they provide context for the revision points shown above.
Reviewer #1:
In this clearly written model theory paper, the authors propose that neural circuits compensate for neuronal death immediately, and show this using firing rate network models and several experiments. Although overall I found the work compelling, in general, I felt that the authors need to better describe 'biological correlates' for their models and to 'fill in the blanks' in various places to help the reader relate and better appreciate the work.
Specifically:
1) 'quadratic programming algorithm' and 'loss function' – could the authors provide suggestions of biological correlates of this. Presumably it would relate to excitatoryinhibitory balances in some way?
2) While spiking rate models are used, differences between excitatory and inhibitory cells are not apparent to me. Given this, then the suggestion in the Discussion that 'these predictions could be tested using neuronal ablations or optogenetic silencing' would seem to suggest that cell type (at least as far as excitatory or inhibitory) does not matter, but this is clearly not true (one of many examples would be epilepsy work by Soltesz group – KrookMagnuson et al. 2013 Nature Communications). For the reader to better appreciate what is intended by the various discussion suggestions, the authors should 'fill in the blanks' between their theoretical model and experimental suggestions being made.
In the Methods it is stated that "To be biologically more plausible, however, the derivative could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition)."
So, could the authors provide description/interpretation of excitatory/inhibitory cells/circuits in terms of their results/predictions? (i.e., biological plausibility).
3) The authors use only positive firing rates (unlike other theoretical studies with positive and negative etc.), and this is a critical difference in what they achieve. While it is clear that negative firing rates may not make sense, the biological correlate of the authors' model is not clear – see 2.
Reviewer #2:
Barret et al. investigate a question rarely addressed in theoretical neuroscience: how do neural circuits compensate the death of neurons? They provide an elegant model showing that neurons in the balanced networks can dynamically adjust their rates so as to minimize the impact neuron death. They show how to theoretically compute the optimal tuning curves of neurons in a network encoding a multidimensional variable x as a quadratic minimization problem with only constraint that firing rates must be positive. They continue to show that the method presented in their previous work (Boerling et al. 2013) in which they build an integrateandfire network optimized to minimize a cost function quantifying the estimation error of an input signal can in fact implement these optimal tuning curves almost exactly. They investigate how the tuning curves change when a fraction of the neurons are silenced and validate some of the predictions derived from the model with three data sets: the goldfish oculomotor system, the cercal system of crickets/cockroaches, V1 orientation tuned neurons. The paper nicely connects two distinct approaches in theoretical neuroscience, network dynamics and efficient coding, and obtains some very general results showing how constraints such as neuronal silencing or more simply, the rectification of the rate, affect the efficient encoding of external variables in a network of spiking neurons. The methodology and the findings presented can potentially have a substantial impact in our understanding of the properties of balanced networks and population encoding. There are however some aspects that the authors need to address in order to present the limitations of the model and the comparison with the data more clearly (see Comments below).
Comments:
Dale's law and the explanation of the EI balance breakdown. In Figure 2, the authors illustrate how when a fraction of the neurons below the recovery boundary is inactivated, the network dynamically compensates and keeps encoding x(t): the remaining neurons change their firing rate and the EI balance is maintained. Then they show that when all neurons with positive readout weight are removed, the balance can no longer be maintained because "there are no longer enough inhibitory cells to balance excitation (or enough excitatory cells to balance inhibition)". This is a very misleading statement given that in this network there are no excitatory or inhibitory neurons (i.e. does not respect Dale's principle). I was very confused when I read this example thinking that neurons in the Figure should have been organized according to whether they are E or I, and not according to the sign of their readout weight. I had to go to the Methods section to later find out that there are no E and I cells in this network. This is in my opinion a main drawback given that (1) Dale's law seems to be true in cortical circuits and (2) the authors are presenting their work as an extension of classical work describing EI networks in which E and I populations are distinct. In their previous manuscript (Boerling et al. 2013) they address this question and propose potential solutions. All of it should be explicitly mentioned and discussed. Moreover, when describing Figure 2 it should be explained why cells with positive readout weights (w+) can naively be viewed as excitatory cells when they provide both excitation and inhibition? Without this most readers will be misled to interpret that w+ neurons are Excitatory and w neurons are inhibitory.
Biased cost. The authors use a biased quadratic cost function containing a second term which tries to maintain a particular readout of the rates (coefs c_{k}) equal to a constant that they call the background rate. The authors describe in much detailed the implication of this bias in the intercept of the tuning curves (Figure 3C and Figure 3—figure supplement 2BC). The motivation for this choice is however very succinct. Is there any benefit on having a nonzero intercept regarding the estimation error? I ask this because it is a fairly common observation that cortical circuits seem to maintain the overall average population firing rate constant when dynamically encoding different variables (see e.g. Figure 2B Fujisawa et al., Nat Neuroscience 11 (7), 2008). This observation seems to be obtained to the network in which there is a nonzero r_{B} and nonzero uniform c_{k}'s which seems interesting. However, the implications of this additional bias regarding the encoding ability of the circuit (estimation error of x(t)), its plausibility regarding the available data, etc. and not discussed.
The prediction on the changes in tuning for V1 neurons is only shown in three units (Figure 8B). In Figure 8—figure supplement 1, the full population of neurons is examined but the changes are only reported in response to the "preferred stimulus orientations". Thus, evaluating the prediction that there should be an increase of firing rate on the nonsilenced neurons in response to the "preferred orientation of the silenced cells" is not possible. Either a different population analysis is performed or the statement about the outcome of the prediction at the population level (Results, last paragraph) should be modified.
Results on the comparison with data should be brought forward. Since the paper is quite long as it is now, and the most novel and most interesting part is the comparison of the model predictions with data from three systems (Figures 5–8), I suggest moving the mechanistic part of how to implement the optimal solution using a spiking network (e.g. Figures 1–2) to an Appendix box or some sort of "side note figure". This mechanistic part is an important aspect but, once it has been shown that it works, which has somehow already been covered in Boerling et al. 2013, it is no longer required to make the comparison with data (Figures 5–8, which are all derived by numerically solving Eqs. 45). Doing so, the reader will be able to quickly reach the most novel aspects without the need to go over the network implementation details.
In the Discussion, the authors state that the strongest prediction of the model is that "neurons with similar tuning to the dead neurons increase their firing rates and neurons with dissimilar tuning decrease their firing rates (unless they are already silent)." The second part of this statement seems at odds to what is shown in Figures 6I, Figure 8A (top) and Figure 8C where dissimilar neurons either don't change much (e.g. Df is always >=0 in Figure 8C) or tend to increase their rate in response to the preferred orientation of the dead cells (Figure 8A top). This rate decrease in dissimilar neurons seems to occur in the network shown in Figure 4B with monotonically increasing tuning curves. The details about the changes predicted and the generality of these changes in different systems (bellshaped vs. monotonic tuning curves) need to be clarified.
At the end of the Methods, the authors provide an explanation of the Tight balance between excitation and inhibition and a scaling argument that concludes that in the network consider here, the cancellation between the mean excitatory and inhibitory drives occurs with precision 1/N. They compare this cancellation with the classical balanced network presented in van Vreeswijk and Sompolinsky (1998) where the cancellation occurs up to order 1/sqrt(N). This comparison is interesting but in order to make it the authors should provide some more information about the behavior of the network in the large N limit: if x is constant and independent of N, do the firing rates converge to a constant value as N increases? This seems hard to visualize as, even when counting on the cancellation of the mean E and I inputs, the magnitude of the total input current fluctuations σ will grow with respect to the threshold (σ~1/N^2/3 whereas threshold is 1/N^2). This is a reflection of the fact that the rate of incoming spikes grows with N whereas the size of each PSP is always of the order of the threshold (i.e. 1/N^2). I don't see how this network can asymptotically converge to nonzero rates in this large N limit.
A final comment: the authors present as an advantage the speed of the compensation mechanism which they say is "faster than the timescale of neural spiking". To me this comment remarks the mismatch of timescales between the two processes: (1) neural death, an infrequent event, relative to the time scale of neurons, and (2) balanced dynamics, almost instantaneous. More than a bonus I see it as an indication that there are probably other slower processes at work when compensating for such an irreversible damage that happens in such long timescales. Presented as a prediction for manipulation experiments, the compensation can be viewed as the signature mechanism of balanced dynamics, as done here in the comparison with the experimental data. Additionally, there might be circumstances in which groups of neurons become transiently and frequently inactive (e.g. local DOWN state). The fast compensation described here would seem more tuned to address those fast inactivation events instead of neural death.
Reviewer #3:
The manuscript at hand describes a phenomenon that is a routine occurrence for any living nervous system, namely the loss of some of its neurons through cell death. Even in the most severe cases of diseases like presumably Alzheimer's, such neuronal losses often go unnoticed for prolonged periods of time, presumably because they are A) tolerable in a system that relies on massively redundant population code, or B) the system can quickly compensate for the loss of neurons as they fall silent. Due to the lack of cortical plausible, mechanistically implementable computations, the modelling community has not approached this interesting research topic with much vigour.
Here, Barret et al. pick up from the welldescribed recurrently balanced neural network literature and study how balanced networks of various sizes compensate for neuron loss. They set out with two simple assumptions, i.e. that (1) outside stimuli can be described by combinations of quasisteadystate firing rates of all neurons in the network and that (2) these firing rates are bounded by a cost function that prevents combinatorial codes in which the majority of neurons are silent in favour of a few active ones. With these two simple ingredients the authors show that the total number of spikes remains constant in the face of dramatic cell losses, as long as a tight balance of excitation and inhibition in the remaining neurons can be maintained. In fact, it is this balance that, when disturbed by cell loss, increases or decreases the firing rates in the remaining neurons to maintain faithful cumulative signal representation. The authors go on to show that this simple feature of balanced networks can account for compensatory behaviour in more complicated models such as Ben Yishai and Sompolinsky's (1995) bump attractor model (that is unfortunately not cited) and even more laborious models of sparse coding in V1. The results evoke vacillating responses in this reviewer of alleging outrageous triviality (anyone modelling balanced systems intuitively knows that this compensation happens) and respecting the originality of the thought process behind the sequence of presented results. After some deliberation, I agree with the authors fully in that optimal compensation by means of shifting, detailed balance has not been proposed before and is a worthwhile idea to entertain.
To show the link of shifting balanced dynamics in response to cell death, I would like to ask the authors to include better explanations and figures panels for all models that exhaustively show the mechanistic origins of the phenomenon they showcase, and not just its effects, i.e. the final tuning curves before and after. Additionally, I would wish for a better graphical way to discern between the two cases of cell loss on either side of the recovery boundary. In my opinion, Figures 3, 4, 5 could be integrated into Figure 2 or outsourced to supplementary materials and give way for the interesting cases of Figure 6–8 without much loss of detail. Lastly, I would like to ask the authors to discuss clearly what experimental results could discriminate the initially mentioned scenarios A and B (redundancy and compensation, respectively), and what they would expect in the case of temporally varying neuronal representations such as the en vogue neuronal dynamics of Churchland and Shennoy (2012).
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Optimal compensation for neuron loss" for further consideration at eLife. Your revised article has been favorably evaluated by Eve Marder (Senior Editor) and two reviewers, one of whom is a member of our Board of Reviewing Editors.
The authors have addressed four of the five suggestions we made in the first round of reviews, and the paper has improved. However, the fundamental logic of the paper is still not quite clear and needs to be made transparent. Specifically, the Methods section is hard to follow, and there seem to be some discrepancies in updating (e.g., it refers to a Figure 2I that does not exist, and Figure 2E is not referred to in the text). The derivation and translation into Dalian spiking networks is far from clear. Further, please note the following four points:
1) They create what they call "tightly balanced" networks, i.e. networks with very fast, precise / detailed inhibition by means of mathematical derivation and go on to say that the networks they built are much more balanced than vanVreesvijk models. We cannot assess this, because they don't show any reference simulations, or measures of balance. It is surprising that their technique works so well, as there are publications that claim the opposite, i.e. not easy to create such balance in networks.
2) What's more, they present "tight balance" as if it exists in a vacuum, ("Also, it has been shown that this connectivity can be learned using simple spike timingdependent plasticity rules (W. Brendel, R. Bourdoukan, P. Vertechi, et al., unpublished observations; Bourdoukan et al. (2012)), so extensive finetuning is not required to obtain these spiking networks.") but there are at least four other published methods of creating "tight" / detailed / precise" balance in recurrent networks: Haas et al., JNeurophys. 2006, Vogels et al., Science, 2011, Luz & Shamir, PlosCB, 2012 and Hennequin et al., Neuron, 2015. It's pretty powerful that the authors can just derive their architecture as they should be. To make this more interesting (and clear), the authors need to discuss and compare their result to these other studies. What are the crucial differences of "tight balance" to these other balances?
3) Another issue is the seemingly carefully calibrated "spike reset threshold", that makes sure that the inhibitory activity is equal to x^? How does that work, and why is this a plausible assumption, or is this just a trick to make sure the inhibitory input is equal to the desired output? Please explain.
4) Finally, they write "To be biologically more plausible, however, the derivative could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition)." It is not clear that FF inhibition would do this, because it crucially misses the translation of the dynamics to be ==x^. In previous work, it has been shown that deleting cells out of a FF network has immediate effects on the dynamics of the read out (Vogels, NN09, Figure 4). This may be a different scenario, but it is far from clear in the Methods, and if this constraint is crucial, it needs to be carefully discussed to avoid the impression that the paper at hand is inherently circular.
https://doi.org/10.7554/eLife.12454.016Author response
All of the reviewers felt that the submitted theoretical work on compensating for cell death was novel and interesting. However, it was also felt that many specifics were unclear or buried in other places, making biological interpretation difficult (e.g., excitatory, inhibitory cell types). The reviewers encourage the authors to consider the five following points, and at least the first four should be addressed in a revised submission. Additional points raised by the reviewers are detailed below, and should also be addressed.
1) State clearly that the network is nonDalian and explain the reasoning for this choice.
The reviewers point out a fundamental omission in our original work: the difference between excitatory and inhibitory neurons, and their (potentially) contrasting effects on a network’s response to neuron loss. We have addressed this problem in the following way:
First, we have added a Dalian network to our work, following the recipe of Boerlin et al. (2013), and use this model now as a starting point. Second, we now clearly state when we use nonDalian networks. Third, we explain the reason for using nonDalian networks. In a nutshell, using nonDalian networks does not change our predictions, but makes life simpler. More specifically, whether we take the Dalian (E/I) network proposed in Boerlin et al. 2013, or we take a nonDalian network, the essential predictions about optimal compensation and recovery boundary remain the same. The key reason is that both networks in effect minimize the same objective function, and, since compensation effects are guided by the objective function, both networks show the same compensatory responses. This is now clearly explained in the Results and Methods sections – see also point 3 below. Given this equivalence, we use the simpler nonDalian network because it removes one layer of complexity, which we hope aids the reader in understanding (or simulating) the networks.
2) State clearly what the recovery boundaries are, and plot figures that show which variables affect the recovery boundary for a given network with a given task. In general, spend a bit more time on the general mechanism before going into the additional models, or alternatively, explain in detail the additional models, like e.g. Figure 7, which is devoid of explanatory power.
We now explain the recovery boundary in greater detail. Specifically, we changed these items:
A) We now explain both the general compensatory mechanisms and the recovery boundary in greater detail by splitting them into two separate figures (Figure 2 and 3). Figure 2, which now contains a Dalian network, explains compensation in the simplest possible case (one input signal in a network of 100 neurons). Figure 3 explains compensation and the recovery boundary in greater detail, and specifically shows (1) the interplay of neural tuning, neuron loss, and compensation, and (2) the breakdown in E/I balance in part of the population as a function of input signals and neural tuning. The latter part highlights that at the recovery boundary, balance will generally break down in only part of a population, and for only part of the input signals.
B) We clearly explain the recovery boundary for both the network with monotonic tuning curves and the networks with bellshaped tuning curves. Both networks have been condensed to a single figure (now Figure 5).
We also included a mechanistic subtlety we did not address in the previous manuscript, namely the difference between two definitions of E/I balance: a standard definition in terms of synaptic inputs, and a nonstandard definition in which a neuron’s selfreset is considered part of the inhibitory inputs. In large and massively redundant networks, the difference can be neglected. In smaller networks, or large networks with limited redundancy (defined as number of input signals over number of neurons), the difference is important, and the nonclassical definition more clearly captures what we call rebalancing or a breakdown in balance.
3) An example of a Dalian network, independently of whether it works (same boundary?) or not (why does it not work?) should be added.
As stated above, using a Dalian network does not change the results. In a nutshell, the Dalian network has the same recovery boundary. To show that, and to be clearer about our use of nonDalian networks, we now changed the order of the text, and we start with a Dalian network (new Figure 2). We furthermore added a section in the methods that explicitly explains how to construct Dalian networks that minimize an objective function, and that explains the equivalence, in terms of compensation effects and recovery boundaries, of Dalian and nonDalian networks.
In short, here’s how we solve the problem of Dale’s law. First, we consider that all neurons simulated in the nonDalian networks are excitatory, except that they have direct inhibitory connections with each other. In the Dalian network, these direct inhibitory connections are then rerouted through a separate population of inhibitory neurons. This population simply seeks to represent the firing of the excitatory neurons as faithfully as possible, using the same principle of efficient coding used in the excitatory population. As long as the inhibitory population is fully functional, it acts essentially identical to the direct inhibitory connections, and the compensatory properties of the excitatory neurons in the Dalian and nonDalian networks are therefore the same. (‘Essentially’, because there can be minor quantitative differences due to approximation errors in smaller networks; as the network size grows, they become exactly the same.) The inhibitory population can similarly compensate for the loss of its neurons (up to a recovery boundary), and its compensatory properties are similar.
4) Show the compensatory mechanisms at play more carefully in figures, i.e. show how balance is changing and finally breaking at the recovery boundary.
As stated above, we have expanded the mechanistic explanations in the first figures (Figures 2 and 3). We note that EI balance – when defined to include a neuron’s self reset – is not changing as long as the system can compensate for the loss of neurons. Once it is no longer able to compensate for certain signal dimensions, balance in all neurons coding for those signal dimensions breaks down. Importantly, the only measurable signature of successful compensation are therefore changes in firing rates. That is also one of the main reasons why the figures (now Figures 5–8) that discuss various implementations of the framework for different neural systems focus mostly on firing rate changes.
5) Consider building a reservoir network model as in Sussillo & Abbott trained to perform a task. Delete neurons, and check whether it is able to selfbalance. If it doesn't, address whether this is related with the complexity of the task.
This is an interesting suggestion, and we believe it is indeed important to evaluate how different types of network models respond to neural damage. We have opted not to include this in this paper for two reasons. First, we feel that our paper, as it stands, is already quite full with materials, and we did not want to overburden the reader any further. Second, we think this point really deserves a wider investigation, i.e., should include models beyond the Susillo & Abbott model, to study their ability to compensate for neuron loss. A graduate student is currently actively working on this larger problem, and we intend to publish these results elsewhere.
Additional changes:Besides these major changes, we point out a few other changes in the main manuscript:
A) Based on a comment of reviewer #2, we found that our overemphasis of ‘neuron death’ may mislead readers, so we now prefer to opt for the slightly more neutral term ‘neuron loss’. Accordingly, we changed the title from ‘Optimal compensation for neuron death’ to ‘Optimal compensation for neuron loss’.
B) The manuscript changed substantially, so we opted not to provide a version in which changes are tracked because that version looked too messy.
We are including the original reviews as they provide context for the revision points shown above.
Reviewer #1:
In this clearly written model theory paper, the authors propose that neural circuits compensate for neuronal death immediately, and show this using firing rate network models and several experiments. Although overall I found the work compelling, in general, I felt that the authors need to better describe 'biological correlates' for their models and to 'fill in the blanks' in various places to help the reader relate and better appreciate the work.
Specifically:
1) 'quadratic programming algorithm' and 'loss function' – could the authors provide suggestions of biological correlates of this. Presumably it would relate to excitatoryinhibitory balances in some way?
Throughout the manuscript, we have tried to deemphasize mathematical concepts (e.g. quadratic programming and loss function) in favor of biological concepts. We have also expanded and rewritten the beginning of the Results section to explain these concepts more carefully.
In a nutshell, the key idea is the following: the patterns of spikes emitted by our networks ‘selforganize’ such that downstream readouts can extract the input signals from a highly efficient code. This self organization is coordinated by the networks’ own internal dynamics and corresponds, mathematically speaking, to the minimization of an objective or loss function (which measures the quality of the population code). This idea of minimizing an objective with a network is quite similar in spirit to e.g. the Hopfield model. However, in our network we can actually identify a simple biological correlate of the loss function since each neuron’s membrane voltage measures a part of the coding error. Hence, information about the loss function is hidden away in the (subthreshold) membrane voltages, excitatory inputs cause increases in coding errors, and inhibitory inputs cause decreases in coding errors. If excitation and inhibition are balanced in every single neuron, then all coding errors are kept in check.
2) While spiking rate models are used, differences between excitatory and inhibitory cells are not apparent to me. Given this, then the suggestion in the Discussion that 'these predictions could be tested using neuronal ablations or optogenetic silencing' would seem to suggest that cell type (at least as far as excitatory or inhibitory) does not matter, but this is clearly not true (one of many examples would be epilepsy work by Soltesz group – KrookMagnuson et al. 2013 Nature Communications). For the reader to better appreciate what is intended by the various discussion suggestions, the authors should 'fill in the blanks' between their theoretical model and experimental suggestions being made.
In the Methods it is stated that "To be biologically more plausible, however, the derivative could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition)."
So, could the authors provide description/interpretation of excitatory/inhibitory cells/circuits in terms of their results/predictions? (i.e., biological plausibility).
Thanks for pointing out this fundamental omission. It was certainly not our intention to suggest that cell types are irrelevant. To address this point, we now start with a network that explicitly distinguishes excitatory and inhibitory neurons. We show that for each of these populations similar compensatory effects apply. We explain the resulting compensatory boundaries, and then move to networks that leave out the inhibitory subpopulations for simplicity. We discuss the recovery boundary for the excitatory population, but we do not explicitly discuss what happens in our networks when the inhibitory population can no longer compensate neuron loss (the result is an explosion of activity, whose exact nature we did not investigate).
The work by the Soltesz group is really interesting, but we are not sure that it directly applies to our case. Rather than moving from an intact to a pathological case, KrookMagnuson et al. move from the pathological to the intact state (through inhibiting a subset of cells) Our framework makes clear predictions for what should happen when a system compensates for neuron loss; once a system moves beyond the recovery boundary, we essentially only know that the representation has to partially break down, and we can predict the immediate nature of the errors. However, it is much harder to predict (and likely problemspecific) how these immediate errors will affect the subsequent dynamics of the system.
3) The authors use only positive firing rates (unlike other theoretical studies with positive and negative etc.), and this is a critical difference in what they achieve. While it is clear that negative firing rates may not make sense, the biological correlate of the authors' model is not clear – see 2.
The biological correlate is always a balanced, excitatoryinhibitory network, and we now explain this in greater detail at the beginning of the Results. To understand what is happening in these quite complex networks, we make two simplification steps. We first eliminate the inhibitory population and replace it by direct inhibitory connections between the excitatory neurons. We then show how one can understand the firing rates of the resulting networks via a mathematical optimization procedure (quadratic programming) that provides additional insights. Importantly, these are only mathematical simplifications, but not biological simplifications. In other words, all the results we obtain for the simplified networks or the firing rate calculations still apply to the general excitatoryinhibitory network. We now explain this latter bit in the Methods section.
Reviewer #2:
[…] There are however some aspects that the authors need to address in order to present the limitations of the model and the comparison with the data more clearly (see Comments below).
Comments:
Dale's law and the explanation of the EI balance breakdown. In Figure 2, the authors illustrate how when a fraction of the neurons below the recovery boundary is inactivated, the network dynamically compensates and keeps encoding x(t): the remaining neurons change their firing rate and the EI balance is maintained. Then they show that when all neurons with positive readout weight are removed, the balance can no longer be maintained because "there are no longer enough inhibitory cells to balance excitation (or enough excitatory cells to balance inhibition)". This is a very misleading statement given that in this network there are no excitatory or inhibitory neurons (i.e. does not respect Dale's principle). I was very confused when I read this example thinking that neurons in the Figure should have been organized according to whether they are E or I, and not according to the sign of their readout weight. I had to go to the Methods section to later find out that there are no E and I cells in this network. This is in my opinion a main drawback given that (1) Dale's law seems to be true in cortical circuits and (2) the authors are presenting their work as an extension of classical work describing EI networks in which E and I populations are distinct. In their previous manuscript (Boerling et al. 2013) they address this question and propose potential solutions. All of it should be explicitly mentioned and discussed.
The reviewer raises a valid and important point which we did not address in the original manuscript. We have now corrected this omission. Specifically, we now start the Results with a general EI network, following the core ideas of Boerlin et al., 2013, and we separately consider the compensatory properties of the excitatory and inhibitory populations. We then move on towards the nonDalian networks, a step that is done purely for mathematical convenience, since it removes one layer of complexity (separate inhibitory neurons) without changing the results. Importantly, the compensatory mechanisms are the same for these EI networks compared to the simpler nonDalian networks. We explain the equivalence of the Dalian and nonDalian networks in the Methods section.
Concerning the unfortunate confusion resulting from the sign of the readout weights in the old Figure 2: we have now opted to eliminate this additional complexity from the figure, so that readers will not confuse the sign of the readout weights with excitation vs. inhibition.
Moreover, when describing Figure 2 it should be explained why cells with positive readout weights (w+) can naively be viewed as excitatory cells when they provide both excitation and inhibition? Without this most readers will be misled to interpret that w+ neurons are Excitatory and w neurons are inhibitory.
Our description of these cells was indeed misleading, and we apologize for the resulting confusion. We have now replaced Figure 2 with a simulation of an EI network, and we no longer consider +/ neurons in order to simplify the explanations.
Biased cost. The authors use a biased quadratic cost function containing a second term which tries to maintain a particular readout of the rates (coefs c_{k}) equal to a constant that they call the background rate. The authors describe in much detailed the implication of this bias in the intercept of the tuning curves (Figure 3C and Figure 3—figure supplement 2BC). The motivation for this choice is however very succinct. Is there any benefit on having a nonzero intercept regarding the estimation error?
The introduction of the biased cost is indeed unfortunate. We now eliminated the term ‘biased cost’ and provide a different explanation for it. Briefly, we assume that the respective systems represent two signals, one that is variable, and one that is constant, which we now call ‘background signal.’
Of course, these changes are cosmetic and do not explain why we include such a background signal in the first place. Our explanation is twofold. The first reason is pedagogical. Without the background signal, the tuning curves of onedimensional systems are not particularly interesting, so we need a second signal to obtain more interesting tuning curves. In some sense, this is the simplest system that has interesting (piecewise linear) tuning curves. The second reason is the oculomotor integrator. We reinterpret this system in the following way. Rather than assuming that the system represents eye position per se, we now assume that it seeks to represent the two muscle activities that steer (horizontal) eye movements. These two muscles are antagonists, meaning that they contract and relax in opposition to each other. Assuming that we call the muscle activations m_{L} and m_{R}, then the eye position is, to first approximation, given by the difference in muscle activation, x = m_{R} − m_{L}. At the same time, the sum of the two muscle activations, y = m_{R} + m_{L} will remain somewhat constant, again to first approximation. This is obviously a simplification, but we think a mild one. This second variable, y, is then mathematically identical to the constant, ‘biased cost’ from the previous manuscript.
I ask this because it is a fairly common observation that cortical circuits seem to maintain the overall average population firing rate constant when dynamically encoding different variables (see e.g. Figure 2B Fujisawa et al., Nat Neuroscience 11 (7), 2008). This observation seems to be obtained to the network in which there is a nonzero r_{B} and nonzero uniform c_{k}'s which seems interesting. However, the implications of this additional bias regarding the encoding ability of the circuit (estimation error of x(t)), its plausibility regarding the available data, etc. and not discussed.
The link to cortex is indeed quite interesting. However, our rewording of the oculomotor system in terms of muscle activities (as opposed to a biased cost) weakens that link a bit. There may still be a common underlying principle, but we don’t have a clearcut, solid answer for what that would be.
The prediction on the changes in tuning for V1 neurons is only shown in three units (Figure 8B). In Figure 8—figure supplement 1, the full population of neurons is examined but the changes are only reported in response to the "preferred stimulus orientations". Thus, evaluating the prediction that there should be an increase of firing rate on the nonsilenced neurons in response to the "preferred orientation of the silenced cells" is not possible. Either a different population analysis is performed or the statement about the outcome of the prediction at the population level (Results, last paragraph) should be modified.
We now plot the tuning curves of the full population in the new Figure 8A, and we show all changes in firing rates in the new Figure 8B. The plots illustrate that many (but not all) neurons change their firing rates in response to the preferred orientations of the silenced cells.
Our initial choice of showing a few examples was largely motivated by comparing our results with the original experimental papers (by Crook and colleagues). This series of papers is based on describing the firing rate changes in single neurons, and has few summary statistics. We therefore kept the single neuron examples that we compare to the data, and moved them into the new Figure 9.
Results on the comparison with data should be brought forward. Since the paper is quite long as it is now, and the most novel and most interesting part is the comparison of the model predictions with data from three systems (Figures 5–8), I suggest moving the mechanistic part of how to implement the optimal solution using a spiking network (e.g. Figures 1–2) to an Appendix box or some sort of "side note figure". This mechanistic part is an important aspect but, once it has been shown that it works, which has somehow already been covered in Boerling et al. 2013, it is no longer required to make the comparison with data (Figures 5–8, which are all derived by numerically solving Eqs. 45). Doing so, the reader will be able to quickly reach the most novel aspects without the need to go over the network implementation details.
We would ideally like the paper to be selfcontained and oriented towards biologists. While we agree that readers that have read and digested the paper by Boerlin et al. will find the mechanistic explanations potentially repetitive, we do not want to redirect the more biological readership to the Boerlin paper. We therefore chose to keep the mechanistic explanations, but we sought to focus better on the compensation vs recovery boundary predictions of the model, and we also now explain an EI network. However, we did shorten the midsection of the paper which concerned the computations of firing rates, and we provided additional simulations for the V1 model.
In terms of comparison with the data, we provide a more detailed account of the data in the main text. We emphasize that the original series of studies by Crook and Eysel is largely based on analyzing single neurons one by one, so we did not change our plots. Our V1 model captures the core qualitative aspects of the data (namely, that firing rates across the population change in the direction of the silenced neurons), but not every single aspect of it, which we now also explain in the Results section.
In the Discussion, the authors state that the strongest prediction of the model is that "neurons with similar tuning to the dead neurons increase their firing rates and neurons with dissimilar tuning decrease their firing rates (unless they are already silent)." The second part of this statement seems at odds to what is shown in Figures 6I, Figure 8A (top) and Figure 8C where dissimilar neurons either don't change much (e.g. Df is always >=0 in Figure 8C) or tend to increase their rate in response to the preferred orientation of the dead cells (Figure 8A top). This rate decrease in dissimilar neurons seems to occur in the network shown in Figure 4B with monotonically increasing tuning curves. The details about the changes predicted and the generality of these changes in different systems (bellshaped vs. monotonic tuning curves) need to be clarified.
This is an excellent point, and has allowed us to uncover some mistakes and conceptual imprecisions in our previous version.
First a mistake: in the previous version, we used the word ‘orientation tuning’ (which goes from 0−180^{◦}), where we should have used the word ‘direction tuning’ (which goes from 0 − 360^{◦}). Indeed, the experiments by Crook and Eysel knock out sets of neurons with specific direction tuning (not orientation tuning). While we had done exactly that, it was not clear from the main text. This is now corrected.
Second, the conceptual problems: We had indeed written similar and dissimilar tuning, in order to emphasize a straight line of thought from the simple 1d and 2d models. However, this was simply not the right way to think about compensation in a highdimensional system. The key problem is that ‘similar’ or ‘dissimilar’ tuning will usually be assessed with respect to a few parameters (such as the direction of a moving grating, e.g.). However, to understand compensation one has to assess tuning in the original, highdimensional space, and compare the relative directions of the decoding weights of the knockedout neurons with the direction of the stimulus, and the direction of the remaining neurons.
The core reason why neural firing rates increase to both the preferred and antipreferred direction in the V1 model can therefore only be understood in this highdimensional space. Briefly, most neurons are only weakly directiontuned. Consequently, when we knock out a set of neurons with a given preferred direction, the network model experiences a strong representational deficit for stimuli of this preferred direction, and a weaker representational deficit for stimuli of the antipreferred direction. Since none of the neurons are likely to directly point in the correct directions in the highdimensional space, the representational deficit is most easily fixed by increasing the firing rates of a combination of the other neurons. While decreases in firing rates are also possible, they generally tend to have less of an impact.
We have reworded the complete V1 section in order to clarify these points. We have also added a schematic (new Figure 8F) to illustrate graphically how compensation works in highdimensional spaces.
At the end of the Methods, the authors provide an explanation of the Tight balance between excitation and inhibition and a scaling argument that concludes that in the network consider here, the cancellation between the mean excitatory and inhibitory drives occurs with precision 1/N. They compare this cancellation with the classical balanced network presented in van Vreeswijk and Sompolinsky (1998) where the cancellation occurs up to order 1/sqrt(N). This comparison is interesting but in order to make it the authors should provide some more information about the behavior of the network in the large N limit: if x is constant and independent of N, do the firing rates converge to a constant value as N increases? This seems hard to visualize as, even when counting on the cancellation of the mean E and I inputs, the magnitude of the total input current fluctuations σ will grow with respect to the threshold (σ~1/N^2/3 whereas threshold is 1/N^2). This is a reflection of the fact that the rate of incoming spikes grows with N whereas the size of each PSP is always of the order of the threshold (i.e. 1/N^2). I don't see how this network can asymptotically converge to nonzero rates in this large N limit.
The section on scaling at the end of the Methods was indeed quite brief, and we have now added more details. Concerning the reviewer’s question, we note that a crucial point in scaling up net works is which variables change and which remain constant. When we scale up our networks, we keep two quantities fixed. First, we assume that the readout (xˆ) stays the same. Second, we assume that the average firing rates of the neurons (ri) stay the same. In other words, the firing rates converge to constant values more or less by definition, or by our choice of scaling. Given that the signal and the firing rates remain constant, everything else has to scale. So it follows that the decoders Di have to scale with 1/N
(since xˆ = ∑_{i} D_{i}r_{i}), and from that follows the scaling of the connectivities and currents etc. This is now more clearly explained in the Methods.
A final comment: the authors present as an advantage the speed of the compensation mechanism which they say is "faster than the timescale of neural spiking". To me this comment remarks the mismatch of timescales between the two processes: (1) neural death, an infrequent event, relative to the time scale of neurons, and (2) balanced dynamics, almost instantaneous. More than a bonus I see it as an indication that there are probably other slower processes at work when compensating for such an irreversible damage that happens in such long timescales. Presented as a prediction for manipulation experiments, the compensation can be viewed as the signature mechanism of balanced dynamics, as done here in the comparison with the experimental data. Additionally, there might be circumstances in which groups of neurons become transiently and frequently inactive (e.g. local DOWN state). The fast compensation described here would seem more tuned to address those fast inactivation events instead of neural death.
That’s an excellent observation and we totally agree. Indeed, there are likely many compensation mechanisms, and the slower ones that have to do with adapation, learning, plasticity etc. are certainly much better investigated. We briefly mention this at the beginning of the Discussion. However, the reviewer’s observation has also helped us to note more clearly the mismatch between the title (‘neuron death’), the inactivation experiments we describe, and the instantaneous compensation we advocate. With hindsight, we believe that the emphasis on ‘neuron death’ may in fact be misleading for potential readers. Hence, we decided to change the title and wording to a slightly more neutral term: ‘neuron loss’.
Reviewer #3: […] Here, Barret et al. pick up from the welldescribed recurrently balanced neural network literature and study how balanced networks of various sizes compensate for neuron loss. They set out with two simple assumptions, i.e. that (1) outside stimuli can be described by combinations of quasisteadystate firing rates of all neurons in the network and that (2) these firing rates are bounded by a cost function that prevents combinatorial codes in which the majority of neurons are silent in favour of a few active ones. With these two simple ingredients the authors show that the total number of spikes remains constant in the face of dramatic cell losses, as long as a tight balance of excitation and inhibition in the remaining neurons can be maintained. In fact, it is this balance that, when disturbed by cell loss, increases or decreases the firing rates in the remaining neurons to maintain faithful cumulative signal representation. The authors go on to show that this simple feature of balanced networks can account for compensatory behaviour in more complicated models such as Ben Yishai and Sompolinsky's (1995) bump attractor model (that is unfortunately not cited) and even more laborious models of sparse coding in V1. The results evoke vacillating responses in this reviewer of alleging outrageous triviality (anyone modelling balanced systems intuitively knows that this compensation happens) and respecting the originality of the thought process behind the sequence of presented results. After some deliberation, I agree with the authors fully in that optimal compensation by means of shifting, detailed balance has not been proposed before and is a worthwhile idea to entertain.
We thank the reviewer for the helpful and honest comments. We totally agree that the rebalancing of balanced networks is, in many respects, a trivial property, and we now emphasize this a bit more in the Discussion, and seek to contrast this purely dynamical property of balanced networks with the notion of compensation, which implies a specific function. We also added a citation to the bump attractor model – thanks for pointing out this omission.
To show the link of shifting balanced dynamics in response to cell death, I would like to ask the authors to include better explanations and figures panels for all models that exhaustively show the mechanistic origins of the phenomenon they showcase, and not just its effects, i.e. the final tuning curves before and after. Additionally, I would wish for a better graphical way to discern between the two cases of cell loss on either side of the recovery boundary.
We agree with the reviewer’s suggestion. We have now expanded the Results section in order to explain the compensatory mechanism and its breakdown better. Specifically, we now included an EI model (new Figure 2), and we included a bellshaped tuning curve model (Figure 3) in which we explain in detail the compensatory mechanisms before and after the recovery boundary has been hit. We emphasize that the main experimental signature of successful compensation is a change in firing rate with EI balance remaining intact and essentially unchanged, whereas the main experimental signature of a failure to compensate is a breakdown in balance. We now explain that this breakdown in balance is signaldependent and only occurs in part of the population.
In my opinion, Figures 3, 4, 5 could be integrated into Figure 2 or outsourced to supplementary materials and give way for the interesting cases of Figure 6 – 8 without much loss of detail.
We have streamlined the presentation around Figures 3–5. Specifically, we have condensed and integrated the old Figures 3 and 6 into the new Figure 5. In turn, we have expanded the beginning in order to emphasize the mechanistic origin of the compensatory effects, and we have expanded the section around the experimental results, and specifically the V1 model, which is now explained in much more depth.
Lastly, I would like to ask the authors to discuss clearly what experimental results could discriminate the initially mentioned scenarios A and B (redundancy and compensation, respectively), and what they would expect in the case of temporally varying neuronal representations such as the en vogue neuronal dynamics of Churchland and Shennoy (2012).
We emphasize that some redundancy is always necessary to compensate for neuron loss – without any redundancy, there can be no compensation. However, redundancy by itself is not sufficient.
Scenario A is redundancy without compensation. In this case, there would be degradation of performance, essentially proportional to the number of neurons lost. In real systems, one may expect that plasticity mechanisms in turn correct for that degradation, albeit on a slower time scale than the instantaneous correction proposed in our work.
Scenario B is (at least some) redundancy with compensation, meaning that firing rates in the remaining neurons actively adjust in order to compensate for the lost neurons. In this case, there should be no degradation whatsoever – until the recovery boundary is hit. Figure 9F now provides additional graphical intuition about what optimal compensation does.
In short, the best test would be an instantaneous (e.g. optogenetic) knockout of a significant fraction of a network’s population. If the firing rates of the neurons adjust immediately so as to restore a (previously determined) readout, then some form of optimal compensation is at work. If the firing rates do not adjust, then the readout will be compromised to some extent, in which case we are just witnessing redundancy. Of course, the firing rates could also change in a way that is detrimental to a given readout, which could be a scenario C, a system that is not at all prepared to deal with neuron loss. Classical (nonrobust) models of line attractors would probably fall in this latter category.
Concerning temporally varying representations, such as those of Churchland and Shenoy, our predictions would remain the same. A smaller fraction of knockedout neurons could be compensated for, and a timevarying representation would remain intact. (In fact, this is exactly what happens when we take the slow connectivities from the Boerlin 2013 paper back into account.) Once the recovery boundary is hit, the representation would fail. In a dynamical system with timevarying representations, the resulting errors could then potentially build up, but studying the precise buildup of those initial representation errors would depend on the specifics of the underlying network architecture. (We are currently investigating some of these scenarios in detail, in order to better account for forthcoming experimental data; we chose not to include these results here, given that the current study is already quite lengthy.)
We discuss several possible experimental tests of our theory in the Results section. However, the key test is the one explained above: measure the instantaneous firing rate changes after partial knockdown of a neural population, and verify that these changes restore a (previously established) linear readout.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
The authors have addressed four of the five suggestions we made in the first round of reviews, and the paper has improved. However, the fundamental logic of the paper is still not quite clear and needs to be made transparent. Specifically, the Methods section is hard to follow, and there seem to be some discrepancies in updating (e.g., it refers to a Figure 2I that does not exist, and Figure 2E is not referred to in the text). The derivation and translation into Dalian spiking networks is far from clear.
We thank the reviewers for their comments. Here is what we did to address them:
Methods: We now explain the derivation of the Dalian spiking network in much more detail, and we have eliminated some inconsistencies in the notation that may have caused confusion. Furthermore, we rewrote the section about our definition of EI balance.
Results: We have similarly rewritten the parts concerned with the Dalian networks and the explanation of EI balance, i.e., the text around Figures 2 and 3.
Figure References: We have systematically checked all figure references, and eliminated a few typos and inconsistencies, including those mentioned.
Code: We have included the MATLAB code for all figures in this revision, which will allow both the referees and future readers to replicate and modify the simulations that we show.
Further, please note the following four points:
1) They create what they call "tightly balanced" networks, i.e. networks with very fast, precise / detailed inhibition by means of mathematical derivation and go on to say that the networks they built are much more balanced than vanVreesvijk models. We cannot assess this, because they don't show any reference simulations, or measures of balance. It is surprising that their technique works so well, as there are publications that claim the opposite, i.e. not easy to create such balance in networks.
We agree that we did not give a detailed comparison of balance in our networks compared with those of van Vreeswijk & Sompolinsky. Briefly, when increasing network size, balance scales with
1/N in our networks, rather than $1\sqrt{N}$ as in the networks of van Vreeswijk & Sompolinsky. Hence, for larger networks, our networks are much more tightly balanced.
We did not provide a detailed comparison for two reasons. First, we had previously published the scaling arguments (Boerlin, Machens, Deneve, 2013, PLOS CB), and we believe that the brief recap in the Methods section is therefore sufficient. Second, we recently wrote a review (Deneve & Machens, 2016, Nature Neuroscience) that explains many of those differences on a more conceptual level. However, we agree that those articles were not properly referenced, and we now explicitly included references to tight/loose balance in the method section (‘Tight balance of excitation and inhibition’).
While we did not explicitly study or simulate other balanced networks, we do show measures of EI balance in our networks in Figures 2G and 3I. To better explain these measures, we have expanded and rewritten the section ‘Tight balance of excitation and inhibition’ in the Methods.
We are not entirely sure who argued that it should be difficult to create such a tight balance in networks. To show that doing so is rather straightforward and does indeed work well, and to allow readers to easily replicate our simulations and manipulate them, we included the complete simulation code (in MATLAB) for all figures.
2) What's more, they present "tight balance" as if it exists in a vacuum, ("Also, it has been shown that this connectivity can be learned using simple spike timingdependent plasticity rules (W. Brendel, R. Bourdoukan, P. Vertechi, et al., unpublished observations; Bourdoukan et al. (2012)), so extensive finetuning is not required to obtain these spiking networks.") but there are at least four other published methods of creating "tight" / detailed / precise" balance in recurrent networks: Haas et al., JNeurophys. 2006, Vogels et al., Science, 2011, Luz & Shamir, PlosCB, 2012 and Hennequin et al., Neuron, 2015. It's pretty powerful that the authors can just derive their architecture as they should be. To make this more interesting (and clear), the authors need to discuss and compare their result to these other studies. What are the crucial differences of "tight balance" to these other balances?
It was certainly not our intention to slight the literature here, and pretend that we are the first to generate tightly balanced networks. We added a paragraph in the Discussion about ‘tightly balanced networks’ and included the mentioned references.
To address the specific point raised, we note that not all tightly balanced networks are the same. Apart from different assumptions about the underlying neuron models, and different definitions of balance, the network models also have quite different connectivities. We included a reference to an (as of yet) unpublished paper only because we wanted to address a reader’s potential worry that the connectivity in our networks is essentially designed (and not random as in most balanced networks).
More generally speaking, and referring back to point (1) as well, we absolutely agree that the commonalities and differences between our balanced networks and other balanced networks need to be worked out in greater detail. In fact, we are currently working on this (together in collaboration with Alfonso Renart – who has developed yet another tightly balanced networks). However, working out precisely which differences are important and which are unimportant is essentially a full scientific study in itself. While we appreciate and share the reviewer’s concern about understanding better how other types of networks relate to our framework, or how they may respond to neuron loss, we think that these topics are beyond the boundaries of the current paper, whose main focus is compensation for neuron loss, and its effect on neurons’ tuning.
3) Another issue is the seemingly carefully calibrated "spike reset threshold", that makes sure that the inhibitory activity is equal to x^? How does that work, and why is this a plausible assumption, or is this just a trick to make sure the inhibitory input is equal to the desired output? Please explain.
At first sight, it may seem that each neuron has a precisely tuned threshold and reset, which are coupled to the connectivity via the decoder weights. However, there is no fine tuning or careful calibration necessary. To see that, we first note that we can set all thresholds to the same value by redefining the voltages of the neurons (see e.g. Boerlin et al. 2013, PLOS CB, where this exercise is carried out in the subsection ‘Scaling and Physical Units’, Eq. (12)(14)). After rescaling, the voltage reset is simply given by the equation
reset = −threshold − cost.
Accordingly, the reset does not need to match the (negative) threshold. Rather, any mismatch between the two can be interpreted as a cost term in the loss function. Such cost terms – as long as they are not excessive – will change the population spike patterns, but not the function of the system. Accordingly, we do not need to calibrate threshold and reset, rather, any mismatch will simply change the cost. Changing the cost, in turn, does change the precise spiking pattern of the system, but not its functionality and not its robustness to neuron loss.
4) Finally, they write "To be biologically more plausible, however, the derivative could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition)." It is not clear that FF inhibition would do this, because it crucially misses the translation of the dynamics to be ==x^. In previous work, it has been shown that deleting cells out of a FF network has immediate effects on the dynamics of the read out (Vogels, NN09, Figure 4). This may be a different scenario, but it is far from clear in the Methods, and if this constraint is crucial, it needs to be carefully discussed to avoid the impression that the paper at hand is inherently circular.
We believe that this is a misunderstanding. This section is simply about the biophysical interpretation of the external inputs into the recurrent network. The section does not contain any additional constraints or assumptions about the network model.
To clarify, we note that each neuron receives external inputs of the general form ‘x + dx/dt’, i.e., a combination of the original signal, x _{j}, and its derivative, dx_{j}/dt. There are two ways in which we can interpret this external input biophysically. First, we can simply think of this as a single external current input c(t) with c = x + dx/dt. In this case, the signal x(t), which we use throughout the paper, is simply a filtered version of an actual external input signal c(t). Second, we can think of this as a biophysical computation that operates on the signal x(t). In this case, each neuron receives two types of external inputs, the original signal input, x _{j} (t), and its derivative, dx_{j}/dt. The latter could be computed through a combination of direct excitatory signal inputs with delayed inhibitory signal inputs – and this is what we were referring to.
We have rewritten this section to explain the logic better and avoid any confusion.
https://doi.org/10.7554/eLife.12454.017Article and author information
Author details
Funding
Deutsche Forschungsgemeinschaft (EmmyNoether)
 Christian K Machens
Agence Nationale de Recherche (Chaire d'Excellence)
 Christian K Machens
James S. McDonnell Foundation
 Sophie Denève
European Research Council (ERC FP7PREDSPIKE)
 Sophie Denève
European Research Council (BIND MECTCT20095024831)
 Sophie Denève
European Research Council (BACS 796 FP6IST027140)
 Sophie Denève
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank Tony Movshon, Pedro Gonçalves, and Wieland Brendel for stimulating discussions and Alfonso Renart, Claudia Feierstein, Joe Paton, and Michael Orger for helpful comments on the manuscript. S.D. acknowledges the James McDonnell Foundation Award and EU grants BACS FP6IST027140, BIND MECTCT20095–024831, and ERC FP7PREDSPIKE. C.K.M. acknowledges an EmmyNoether grant of the Deutsche Forschungsgemeinschaft and a Chaire d’excellence of the Agence National de la Recherche.
Reviewing Editor
 Frances K Skinner, Reviewing Editor, University Health Network, Canada
Publication history
 Received: October 20, 2015
 Accepted: December 8, 2016
 Accepted Manuscript published: December 9, 2016 (version 1)
 Version of Record published: January 31, 2017 (version 2)
Copyright
© 2016, Barrett et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 1,229
 Page views

 427
 Downloads

 2
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.