## Abstract

The brain has an impressive ability to withstand neural damage. Diseases that kill neurons can go unnoticed for years, and incomplete brain lesions or silencing of neurons often fail to produce any behavioral effect. How does the brain compensate for such damage, and what are the limits of this compensation? We propose that neural circuits instantly compensate for neuron loss, thereby preserving their function as much as possible. We show that this compensation can explain changes in tuning curves induced by neuron silencing across a variety of systems, including the primary visual cortex. We find that compensatory mechanisms can be implemented through the dynamics of networks with a tight balance of excitation and inhibition, without requiring synaptic plasticity. The limits of this compensatory mechanism are reached when excitation and inhibition become unbalanced, thereby demarcating a recovery boundary, where signal representation fails and where diseases may become symptomatic.

## Main text

### Introduction

The impact of neuron loss on information processing is poorly understood (Palop et al., 2006; Montague et al., 2012). Chronic diseases such as Alzheimer’s disease cause a considerable decline in neuron numbers (Morrison and Hof, 1997), yet can go unnoticed for years. Acute events such as a stroke and traumatic brain injury can kill large numbers of cells rapidly, yet the vast majority of strokes are ‘silent’ (Leary and Saver, 2003). Similarly, the partial lesion or silencing of a targeted brain area may ‘fail’ to produce any measurable, behavioral effect (Li et al., 2016). This resilience of neural systems to damage is especially impressive when compared to man-made computer systems that typically lose all function following only minor destruction of their circuits. A thorough understanding of the interplay between neural damage and information processing is therefore crucial for our understanding of the nervous system, and may also help in the interpretation of various experimental manipulations such as pharmacological silencing (Aksay et al., 2007; Crook and Eysel, 1992), lesion studies (Keck et al., 2008), and optogenetic perturbations (Fenno et al., 2011).

In contrast to the study of information representation in damaged brains, there has been substantial progress in our understanding of information representation in healthy brains (Abbott, 2008). In particular, the theory of efficient coding, which states that neural circuits represent sensory signals optimally given various constraints (Barlow, 1961; Atick, 1992; Olshausen and Field, 1996; Rieke et al., 1997; Simoncelli and Olshausen, 2001; Salinas, 2006), has successfully accounted for a broad range of observations in a variety of sensory systems, in both vertebrates (Simoncelli and Olshausen, 2001; Atick, 1992; Olshausen and Field, 1996; Smith and Lewicki, 2006; Greene et al., 2009) and invertebrates (Rieke et al., 1997; Fairhall et al., 2001; Machens et al., 2005). However, an efficient representation of information is of little use if it cannot withstand some perturbations, such as normal cell loss. Plausible mechanistic models of neural computation should be able to withstand the type of perturbations that the brain can withstand.

In this work, we propose that neural systems maintain stable signal representations by optimally compensating for the loss of neurons. We show that this compensation does not require synaptic plasticity, but can be implemented instantaneously by a tightly balanced network whose dynamics and connectivity are tuned to implement efficient coding (Boerlin et al., 2013; Denève and Machens, 2016). When too many cells disappear, the balance between excitation and inhibition is disrupted, and the signal representation is lost. We predict how much cell loss can be tolerated by a neural system and how tuning curves change shape following optimal compensation. We illustrate these predictions using three specific neural systems for which experimental data before and after silencing are available – the oculomotor integrator in the hindbrain (Aksay et al., 2000, 2007), the cricket cercal system (Theunissen and Miller, 1991; Mizrahi and Libersat, 1997), and the primary visual cortex (Hubel and Wiesel, 1962; Crook and Eysel, 1992; Crook et al., 1996, 1997, 1998). In addition, we show that many input/output non-linearities in the tuning of neurons can be re-interpreted to be the result of compensation mechanisms within neural circuits. Therefore, beyond dealing with neuronal loss, the proposed optimal compensation principle expands the theory of efficient coding and provides important insights and constraints for neural population codes.

### Results

#### The impact of neuron loss on neural representations

We begin by studying how neuron loss can influence neural representations. We assume that information is represented within neural systems such that it can be read out or decoded by a downstream area through a linear combination of neural firing rates. More specifically, we imagine a system of $N$ neurons that represents a set of signals $\mathbf{x}\left(t\right)=\left({x}_{1}\left(t\right),{x}_{2}\left(t\right),\dots ,{x}_{M}\left(t\right)\right)$. These signals may be time-dependent sensory signals such as visual images or sounds, for instance, or, more generally, they may be the result of some computation from within a neural circuit. We assume that these signals can be read out from a weighted summation of the neurons’ instantaneous firing rates, ${r}_{i}(t)$, such that$\hat{\mathbf{x}}\left(t\right)=\sum _{i=1}^{N}{\mathbf{D}}_{i}{r}_{i}\left(t\right),$(1)

where $\hat{\mathbf{x}}\left(t\right)$ is the signal estimate and $\mathbf{D}}_{i$ is a $M$-dimensional vector of decoding weights that determines how the firing of the $i$-th neuron contributes to the readout. Such linear readouts are used in many contexts and broadly capture the integrative nature of dendritic summation.

What happens to such a readout if one of the neurons is suddenly taken out of the circuit? A specific example is illustrated in Figure 1A. Here the neural system consists of only two neurons that represent a single, scalar signal, $x\left(t\right)$, in the sum of their instantaneous firing rates, $\hat{x}\left(t\right)={r}_{1}\left(t\right)+{r}_{2}\left(t\right)$. If initially both neurons fired at 150 Hz, yielding a signal estimate $\hat{x}=300$, then the sudden loss of one neuron will immediately degrade the signal estimate to $\hat{x}=150$. More generally, in the absence of any corrective action, the signal estimate will be half the size it used to be.

There are two solutions to this problem. First, a downstream or read-out area could recover the correct signal estimate by doubling its decoder weight for the remaining neuron. In this scenario, the two-neuron circuit would remain oblivious to the loss of a neuron, and the burden of adjustment would be borne by downstream areas. Second, the firing rate of the remaining neuron could double, so that a downstream area would still obtain the correct signal estimate. In this scenario, the two-neuron circuit would correct the problem itself, and downstream areas would remain unaffected. While actual neural systems may make use of both possibilities, here we will study the second solution, and show that it has crucial advantages.

#### The principle of optimal compensation

To quantify the ability of a system to compensate for neuron loss, we will assume that the representation performance can be measured with a simple cost-benefit trade-off. We formalize this trade-off in a loss function,$E={\left(\mathbf{x}\mathbf{-}\hat{\mathbf{x}}\right)}^{2}+\beta C\left(\mathbf{r}\right).$(2)

Here, the first term quantifies the signal representation error—the smaller the difference between the readout, $\hat{\mathbf{x}}$, and the actual signals, $\mathbf{x}$, the smaller the representation error. The second term quantifies the cost of the representation and is problem-specific and system-specific. Generally, we assume that the costs are small if the signal representation (1) is shared amongst all neurons and (2) uses as few spikes as possible. The parameter $\beta $ determines the trade-off between this cost and the representation error. Quadratic loss functions such as Equation 2 have been a mainstay of efficient coding theories for many years, for example, in stimulus reconstruction (Rieke et al., 1997) and in sparse coding (Olshausen and Field, 1996).

The minimum of our loss function indicates the set of firing rates that represent a given signal optimally. In the two-neuron example in Figure 1A, the initial minimum is reached if both neurons fire equally strong (assuming a cost of $C\left(\mathbf{r}\right)={r}_{1}^{2}+{r}_{2}^{2}$). If we kill neuron 1, then we can minimize the loss function under the constraint that ${r}_{1}=0$, and the minimum is now reached if neuron 2 doubles its firing rates (assuming a small trade-off parameter $\beta $). The induced change in the firing rate of neuron 2 preserves the signal representation, and therefore compensates for the loss of neuron 1. More generally, if the firing rates of a neural system represent a given signal optimally with respect to some loss function, then we define 'optimal compensation' as the change in firing rates necessary to remain at the minimum of that loss function.

Clearly, a system’s ability to compensate for a lost neuron relies on some redundancy in the representation. If there are more neurons than signals, then different combinations of firing rates may represent the same signal (see Figure 1A, dashed line, for the two-neuron example). Up to a certain point (which we will study below), a neural system can always restore signal representation by adjusting its firing rates.

#### Optimal compensation through instantaneous restoration of balance

Next, we investigate how this compensation can be implemented in a neural circuit. One possibility is that a circuit rewires through internal plasticity mechanisms in order to correct for the effect of the lost neurons. However, we find that plasticity is not necessary. Rather, neural networks can be wired up such that the dynamics of their firing rates rapidly evolves into the minimum of the loss function, Equation 2 (Hopfield, 1984; Dayan and Abbott, 2001; Rozell et al., 2008). Such dynamics can be formulated not just for rate networks, but also for networks of integrate-and-fire neurons (Boerlin et al., 2013). The step towards spiking neurons will lead us to two crucial insights. First, we will link the optimal compensation principle to the balance of excitation and inhibition. Second, unlike networks of idealized rate units, networks of spiking neurons can only generate positive firing rates. We will show that this constraint alters the nature of compensation in non-trivial ways.

In order to move towards spiking networks, we assume that the instantaneous firing rates, ${r}_{i}\left(t\right)$, are equivalent to filtered spike trains, similar to the postsynaptic filtering of actual spike trains (see Materials and methods for details). Specifically, a spike fired by neuron $i$ contributes a discrete unit to its instantaneous firing rate, ${r}_{i}\to {r}_{i}+1$, followed by an exponential decay. Next, we assume that each neuron will only fire a spike if the resulting change in the instantaneous firing rate reduces the loss function (Equation 2). From these two assumptions, the dynamics and connectivity of the network can be derived (Boerlin et al., 2013; Bourdoukan et al., 2012; Barrett et al., 2013). Even though neurons in these networks are ordinary integrate-and-fire neurons, the voltage of each neuron acquires an important functional interpretation, as it represents a transformation of the signal representation error, $\mathbf{x}-\hat{\mathbf{x}}$ (see Materials and methods).

If we apply this formalism to the two-neuron example in Figure 1, we obtain a network with two integrate-and-fire neurons driven by an excitatory input signal and coupled by mutual inhibition (Figure 1B). If we neglect the cost term, then the neuron’s (subthreshold) voltages directly reflect the difference between the scalar input signal, $x\left(t\right)$, and its linear readout, $\hat{x}\left(t\right)$, so that ${V}_{i}=D\left(x-\hat{x}\right)$ for $i=\{1,2\}$ (Figure 1C). Initially, the excitatory input signal depolarizes the voltages of both neurons, which reflects an increase in the signal representation error. The neuron that reaches threshold first produces a spike and corrects the signal readout, $\hat{x}\left(t\right)$. In turn, the errors or voltages in both neurons are reset, due to the self-reset of the spiking neuron, and due to fast inhibition of the other neuron. This procedure repeats, so that both neurons work together, and take turns at producing spikes (Figure 1C). Now, when one neuron dies (Figure 1C, red arrow), the remaining neuron no longer receives inhibition from its partner neuron and it becomes much more strongly excited, spiking twice as often.

These ideas can be scaled up to larger networks of excitatory and inhibitory neurons, as illustrated in Figure 2. Here a homogeneous population of excitatory neurons, again modeled as leaky integrate-and-fire neurons (Figure 2A), is driven by a time-varying input signal, $x\left(t\right)$. Just as in the simplified two-neuron example, the excitatory neurons take turns at generating spikes, such that the linear readout of their spike trains, Equation 1, yields an estimate $\hat{x}\left(t\right)$ that minimizes the loss function from Equation 2 (Figure 2D).

Mechanistically, the excitatory population receives an excitatory input signal, $x\left(t\right)$, that is matched by an equally strong inhibitory input. This inhibitory input is generated by a separate population of inhibitory interneurons, and is effectively equivalent to the (re-routed) signal estimate, $\hat{x}\left(t\right)$. As a result, the (subthreshold) voltage of each excitatory neuron is once again determined by the signal representation error, $x\left(t\right)-\hat{x}\left(t\right)$, and a neuron will only fire when this error exceeds its (voltage) threshold. By design, each spike improves the signal estimate, and decreases the voltages. The spike rasters of both populations are shown in Figure 2E.

Since the signals are excitatory and the signal estimates are inhibitory, the tight tracking of the signal by the signal estimate corresponds to a tight tracking of excitation by inhibition. The network is therefore an example of a balanced network in which inhibition and excitation track each other spike by spike (Denève and Machens, 2016). Importantly, the excitatory-inhibitory (EI) balance is a direct correlate of the signal representation accuracy (see Materials and methods for mathematical details).

Now, when some portion of the excitatory neural population is eliminated (Figure 2B), the inhibitory neurons receive less excitation, and in turn convey less inhibition to the remaining excitatory neurons. Consequently, the excitatory neurons increase their firing rates, and the levels of inhibition (or excitation) readjust automatically such that EI balance is restored (see Figure 2G for an example neuron). The restoration of balance guarantees that the signals continue to be represented accurately (Figure 2D). This compensation happens because each excitatory neuron seeks to keep the signal representation error in check, with or without the help of the other excitatory neurons. When one neuron dies, the remaining excitatory neurons automatically and rapidly assume the full burden of signal representation.

Similar principles hold when part of the inhibitory population is knocked out (Figure 2C). The inhibitory population here is constructed to faithfully represent its ‘input signals’, which are given by the postsynaptically filtered spike trains of the excitatory neurons. Consequently, the inhibitory population acts similarly to the excitatory population, and its response to neuron loss is almost identical. When inhibitory neurons are knocked out, the remaining inhibitory neurons increase their firing rates in order to restore the EI balance. Again, the compensation happens because each inhibitory neuron seeks to keep its signal representation error in check, independent of the other inhibitory neurons.

#### The recovery boundary

Naturally, the recovery from neural loss is not unlimited. As we will show now, the resultant *recovery boundary* is marked by a breakdown in the balance of excitation and inhibition. In Figure 2, this recovery boundary coincides with a complete knock-out of either the excitatory or the inhibitory population. The nature of the recovery boundary becomes more complex, however, if a network tracks more than one signal.

In Figure 3 we show an example network that represents two sinusoidal signals, $\mathbf{x}\left(t\right)=\left({x}_{1}\left(t\right),{x}_{2}\left(t\right)\right)$. The spikes of each neuron contribute to two readouts, and the decoding weights $\mathbf{D}}_{i$ of the neurons correspond to two-dimensional vectors (Figure 3D). The two signals and the corresponding signal estimates are shown in Figure 3E, and the spike trains of the excitatory population are shown in Figure 3F. (Here and in the following, we will focus on neuron loss in the excitatory population only, and replace the inhibitory population by direct inhibitory connections between excitatory neurons, see Figure 3A–C. This simplification does not affect the compensatory properties of the network, see Materials and methods.)

When part of the excitatory population is lost (Figure 3B,D), the remaining neurons compensate by adjusting their firing rates, and the signal representation remains intact (Figure 3E,F, first k.o.). However, when too many cells are eliminated from the network (Figure 3C,D), some portions of the signal can no longer be represented, no matter how the remaining neurons change their firing rates (Figure 3E,F, second k.o.). In this example, the recovery boundary occurs when all the neurons with negative-valued readout weights along the ${x}_{1}$-axis have been lost, so that the network can no longer represent the negative component of the first signal (Figure 3E, second k.o.). The representation of signal ${x}_{2}\left(t\right)$ is largely unaffected, so the deficits are only partial.

The recovery boundary is characterized by the network’s inability to correct signal representation errors. Since the neurons’ voltages correspond to transformations of these signal representation errors, the recovery boundary occurs when some of these voltages run out of bounds (Figure 3G, arrows). Such aberrations of the voltages are caused by an excess of either positive or negative currents. Indeed, the ratio of positive to negative currents becomes disrupted after the second k.o., precisely when ${x}_{1}\left(t\right)$ becomes negative (Figure 3H). Whereas a neuron can always ‘eliminate’ a positive, depolarizing current by spiking, it cannot eliminate any negative, hyperpolarizing currents, and these latter currents therefore reflect signal representation errors that are not eliminated within the network.

Here, positive currents are generated by excitatory inputs, whereas negative currents are generated by inhibitory inputs as well as the self-reset currents after an action potential. The self-reset currents become negligible in networks in which the number of neurons, $N$, is much larger than the number of input signals, $M$ (see Figure 2; see also Materials and methods). Even in Figure 3 this is approximately the case, so that the ratio of positive to negative currents is similar to the ratio of excitatory to inhibitory synaptic inputs (Figure 3I). The recovery boundary is then signified by a dramatic breakdown in the balance of excitatory and inhibitory currents.

In this simple example, the recovery boundary emerged because neurons were knocked out in a systematic and ordered fashion (according to the features that they represented). Such a scenario may correspond to lesions or knock-outs in systems in which neural tuning is topologically organized. If the representational features of neurons are randomly interspersed, we find a different scenario. Indeed, when neurons are knocked out at random in our model, the network can withstand the loss of many more neurons. We find that signals are well represented until only a few neurons remain (Figure 3J).

At the recovery boundary, the few remaining neurons will shoulder the full representational load. This can easily become an unwieldy situation because the firing rates of the remaining neurons must become unrealistically large to compensate for all the lost neurons. In reality, neural activity will saturate at some point, in which case the recovery boundary occurs much earlier. For instance, we find that the example network tolerates up to $70$–$80\%$ of random neuron loss if the maximum firing rate of a neuron is unbounded (Figure 3J), whereas it will only tolerate $40$–$50\%$ of random neuron loss if the maximum firing rate is 80 Hz (Figure 3K).

#### The influence of idle neurons on tuning curve shapes

So far we have considered what happens when neurons are lost permanently in a neural circuit, either because of their death or because of experimental interventions. We have shown that networks can compensate for this loss up to a recovery boundary. Independent of whether compensation is successful or not, it is always marked by changes in firing rates. We will now show how to quantify and predict these changes.

To do so, we first notice that neurons can also become inactive temporarily in a quite natural manner. In sensory systems, for instance, certain stimuli may inhibit neurons and silence them. This natural, temporary inactivity of neurons leads to similar, ‘compensatory’ effects on the population level in our networks. These effects will provide important clues to understanding how neurons’ firing rates need to change whenever neurons become inactive, whether in a natural or non-natural context.

For simplicity, let us focus on constant input signals, $\mathbf{x}\left(t\right)=\mathbf{x}$, and study the *average* firing rates of the neurons, $\overline{\mathrm{r}}$. For constant inputs, the average firing rates must approach the minimum of the same loss function as in Equation 2 so that,$\overline{\mathbf{r}}(\mathbf{x})=\mathrm{arg}\underset{{\overline{r}}_{i}\ge 0,\phantom{\rule{thinmathspace}{0ex}}\mathrm{\forall}i}{min}\left[{\left(\mathbf{x}-\hat{\mathbf{x}}\right)}^{2}+\beta C(\overline{\mathbf{r}})\right]$(3)

where the signal estimate, $\hat{\mathbf{x}}={\sum}_{i}{\mathbf{D}}_{i}{\overline{\mathrm{r}}}_{i}$, is formed using the average firing rates. This minimization is performed under the constraint that firing rates must be positive, since the firing rates of spiking neurons are positive valued quantities, by definition. Mathematically, the corresponding optimization problem is known as ‘quadratic programming’ (see Materials and methods for more details). Traditionally, studies of population coding or efficient coding assume both positive and negative firing rates (Dayan and Abbott, 2001; Olshausen and Field, 1996; Simoncelli and Olshausen, 2001), and the restriction to positive firing rates is generally considered an implementational problem (Rozell et al., 2008). Surprisingly, however, we find that the constraint changes the nature of the solutions and provides fundamental insights into the shapes of tuning curves in our networks, as clarified below.

An example is shown in Figure 4 where we focus on networks of increasing complexity that represent two signals. For simplicity, we keep the second signal constant (‘fixed background’ signal, $x}_{2}={r}_{B$) and study the firing rates of all neurons as a function of the first signal, ${x}_{1}=x$. Solving Equation 3 for a range of values, we obtain $N$ neural tuning curves, $\overline{\mathbf{r}}\left({x}_{1}\right)=\left({r}_{1}\left({x}_{1}\right),{r}_{2}\left({x}_{1}\right),\dots \right)$ (Figure 4, second column). These tuning curves closely match those measured in simulations of the corresponding spiking networks (Figure 4, third and fourth column; see supplementary materials for mathematical details).

We observe that the positivity constraint produces non-linearities in neural tuning curves. We illustrate this using a two-neuron system with two opposite-valued readout weights for the first signal (Figure 4A, first column). At signal value $x=0$, both neurons fire at equal rates so that the readout $\hat{x}$, Equation 1, correctly becomes $\hat{x}\propto \left({\overline{r}}_{1}-{\overline{r}}_{2}\right)=0$. When we move to higher values of $x$, the firing rate of the first neuron, $\overline{r}}_{1$, increases linearly (Figure 4A, second column, orange line), and the firing rate of the second neuron, $\overline{r}}_{2$, decreases linearly (Figure 4A, second column, blue line), so that in each case, $\hat{x}\approx x$. Eventually, around $x=0.4$, the firing rate of neuron two hits zero (Figure 4A, black arrow). At this point, neuron two has effectively disappeared from the network. Since it cannot decrease below zero, and because the estimated value $\hat{x}$ must keep growing with $x$, the firing rate of neuron one must grow at a faster rate. This causes a kink in its tuning curve (Figure 4A, black arrow). This kink in the tuning curve slope is an indirect form of optimal compensation, where the network is compensating for the temporary silencing of a neuron.

More generally, the firing rates of neurons are piecewise linear functions of the input signals. In Figure 4B,C, every time one of the neurons hits the zero firing rate lower bound, the tuning curve slopes of all the other neurons change. We furthermore note that the tuning curves also depend on the form of the cost terms, $C\left(\overline{\mathbf{r}}\right)$. If there is no cost term, there are many equally favorable firing rate combinations that produce identical readouts (Figure 1A). The precise choice of a cost term determines which of these solutions is found by the network. We provide further geometric insights into the shape of the tuning curves obtained in our networks in Figure 4—figure supplements 1–3 and in the Materials and methods.

#### Tuning curves before and after neuronal silencing

We can now calculate how tuning curves change shape in our spiking network following neuronal loss. When a set of neurons are killed or silenced within a network, their firing rates are effectively set to zero. We can include this silencing of neurons in our loss function by simply clamping the respective neurons’ firing rates to zero:$\overline{\mathbf{r}}\left(\mathbf{x}\right)=\mathrm{a}\mathrm{r}\mathrm{g}\underset{\genfrac{}{}{0ex}{}{{\overline{r}}_{i}\phantom{\rule{thinmathspace}{0ex}}\ge \phantom{\rule{thinmathspace}{0ex}}0\phantom{\rule{thinmathspace}{0ex}}if\phantom{\rule{thinmathspace}{0ex}}i\phantom{\rule{thinmathspace}{0ex}}\in \phantom{\rule{thinmathspace}{0ex}}X}{{\overline{r}}_{j}\phantom{\rule{thinmathspace}{0ex}}=\phantom{\rule{thinmathspace}{0ex}}0\phantom{\rule{thinmathspace}{0ex}}if\phantom{\rule{thinmathspace}{0ex}}j\phantom{\rule{thinmathspace}{0ex}}\in \phantom{\rule{thinmathspace}{0ex}}Y}}{min}\left[{\left(\mathbf{x}-\hat{\mathbf{x}}\right)}^{2}+\beta C\left(\overline{\mathbf{r}}\right)\right]$(4)

where $X$ denotes the set of healthy neurons and $Y$ the set of dead (or silenced) neurons. This additional clamping constraint is the mathematical equivalent of killing neurons. In turn, we can study the tuning curves of neurons in networks with knocked-out neurons without having to simulate the corresponding spiking networks every single time.

As a first example, we revisit the network shown in Figure 4C, which represents a signal ${x}_{1}=x$, and a constant ‘background’ signal $x}_{2}={r}_{B$. This network exhibits a complex mixture of tuning curves, i.e., firing rates as a function of the primary signal ${x}_{1}=x$, with positive and negative slopes, and diverse threshold crossings (Figure 5B). When we knock out a subset of neurons (Figure 5A, neurons labeled ‘C’), we find that neurons with similar tuning to the knocked out neurons increase their firing rates and dissimilarly-tuned neurons decrease their firing rates, especially for negative values of the signal $x$ (Figure 5C, left). In this way, signal representation is preserved as much as possible (Figure 5C, right, black line). In comparison, a network that does not change its tuning curves after the cell loss has drastically worse representation error (Figure 5C, right, dashed line). Once all the neurons with positive tuning curve slopes are killed (Figure 5A, neurons labeled ‘D’), we cross the recovery boundary, and even optimal compensation can no longer preserve the representation of negatively-valued signals (Figure 5D).

Next, we investigate these compensatory mechanisms in neural systems that have bell-shaped tuning curves (Ben-Yishai et al., 1995). Our framework captures these cases if we assume that a network represents circular signals embedded in a two-dimensional space, $\mathbf{x}=\left(\mathrm{cos}\left(\theta \right),\mathrm{sin}\left(\theta \right)\right)$, where $\theta $ is the direction of our signal. To do so, we construct a network in which the decoding weights of the neurons are spaced around a circle, with some irregularity added (Figure 5E). As a result, we obtain an irregular combination of bell-shaped tuning curves (Figure 5F), similar to those found in the primary visual or primary motor cortex, with each neuron having a different preferred direction and maximum firing rate as determined by the value of the decoding weights. We note that, apart from the smaller number of neurons and the greater irregularity of the decoding weights, this network is similar to the spiking network investigated in Figure 3.

If we now kill neurons within a specific range of preferred directions (Figure 5E), then the remaining neurons with the most similar directions increase their firing rates and shift their tuning curves towards the preferred directions of the missing neurons (Figure 5G, left). Furthermore, neurons further away from the knocked-out region will slightly skew their tuning curves by decreasing their firing rates for directions around zero (Figure 5F,G, arrows). Overall, the portion of space that is under-represented following cell loss becomes populated by neighboring neurons, which thereby counteract the loss. In turn, the signal representation performance of the population is dramatically improved following optimal compensation compared to a system without compensatory mechanisms (Figure 5G, right). Once all neurons with positive read-out weights along the first axis are killed, the network can no longer compensate, and signal representation fails, despite the strong changes in firing rates of the neurons (Figure 5H). This latter scenario is equivalent to the one we observed in Figure 3 after the second knock-out.

The ability of the system to compensate for the loss of neurons depends on the redundancy of its representation (although redundancy alone is not sufficient for compensation). If our network consists of only four neurons with equally spaced readout weights (Figure 5I), then it cannot compensate for the loss of neurons. Rather, we find that this model system crosses the recovery boundary following the loss of a single neuron, and so, optimal compensation does not produce any changes in the remaining neurons (Figure 5K, left). It occurs because the four neurons exactly partition the four quadrants of the signal space, and the remaining neurons cannot make any changes that would improve the signal representation.

#### Comparison to experimental data

Does optimal compensation happen in real neural systems? Our framework makes strong predictions for how firing rates should change following neuronal knock-outs. Unfortunately, there is currently little data on how neural firing rates change directly after the loss of neurons in a given system. However, we found data from three systems—the oculomotor integrator in the hindbrain, the cricket cercal system, and the primary visual cortex—which allows us to make a first comparison of our predictions to reality. We emphasize that a strict test of our theory is left for future work.

The first system is the horizontal velocity-to-position integrator of the vertebrate oculomotor system, which is responsible for horizontal eye fixations, and which is comparable to the network model in Figure 5A–D. This system is usually considered to represent the signals that need to be sent to the two muscles (lateral and medial rectus) that control horizontal eye movements. In this case, the signal ${x}_{1}$ corresponds to the eye position, controlled by the difference in muscular activity, with zero representing the central eye position, positive values representing right-side eye positions and negative values representing left-side eye positions. The fixed background signal corresponds to the net sum of the two muscle activities, which we assume to remain constant for simplicity.

We find that the tuning curves of neurons measured in the right half of the oculomotor system (Aksay et al., 2000) (Figure 6A) are similar to the positively-sloped tuning curves in our network model calculated using Equation 3 (Figure 6B). In both cases, neurons that encode right-side eye positions have positive slopes, and these follow a recruitment order, where neurons with shallow tuning curve slopes become active before neurons with steep tuning curve slopes (Fuchs et al., 1988; Aksay et al., 2000; Pastor and Gonzalez-Forero, 2003) (Figure 6A,B inset). Similar observations hold for the left half of the oculomotor system, which has negatively-sloped tuning curves (data not shown).

Now, when the left half of the oculomotor system is inactivated using lidocaine and muscimol injections (Aksay et al., 2007), the system can still represent right-side eye positions, but is unable to represent left-side eye positions (Figure 6C). In our network model, this inactivation corresponds to eliminating all negatively-sloped tuning curves (Figure 5D). The optimal compensation ensures that the right part of the signal is still correctly represented, but the left part can no longer be represented due to the lack of negatively tuned neurons (Figure 5D and Figure 6D). Hence, these measurements are consistent with our optimal compensation and recovery boundary predictions.

The second system is a small sensory system called the cricket (or cockroach) cercal system, which represents wind velocity (Theunissen and Miller, 1991), and which is comparable to the network model in Figure 5I–K. In the case of the cercal system, $\theta $ is the wind-direction. The tuning curves we obtain are similar to those measured, with each neuron having a different, preferred wind direction (Theunissen and Miller, 1991), see Figure 6E. The similarity between measured tuning curves (Figure 6E) and predicted tuning curves (Figure 5J) suggests that we can interpret cercal system neurons to be optimal for representing wind direction using equally spaced readout weights.

When one neuron in the cercal system is killed, the remaining neurons do not change their firing rates (Libersat and Mizrahi, 1996; Mizrahi and Libersat, 1997), see Figure 6F. This is analogous to our predictions (Figure 5K). Indeed, our model of the cercal system has no redundancy (see Materials and methods for more details). The cercal system in all its simplicity is therefore a nice example of a system that exists on the ‘edge’ of the recovery boundary.

#### A high-dimensional example: optimal compensation in V1

The relatively simple models we have described so far are useful for understanding the principle of optimal compensation, but they are unlikely to capture the complexity of large neural populations in the brain. Even though the network model in Figure 5E–G resembles common network models for the primary visual or primary motor cortex, it does not capture the high-dimensional nature of representations found in these areas. To investigate the compensatory mechanisms in more complex networks, we make a final generalization to systems that can represent high-dimensional signals, and for concreteness we focus on the (primary) visual cortex, which is our third test system.

The visual cortex is generally thought to construct representations of images consisting of many pixel values (Figure 7A). The tuning of V1 simple cells, for instance, can largely be accounted for by assuming that neural firing rates provide a sparse code of natural images (Olshausen and Field, 1996; Simoncelli and Olshausen, 2001). In accordance with this theory, we choose a model that represents image patches (size $12\times 12$, corresponding to $M=144$ dimensions), and we use decoding weights that are optimized for natural images (see Materials and methods). Neurons in this model are tuned to both the orientation and polarity of edge-like images (Figure 7D), where the polarity is either a bright edge with dark flanks, or the opposite polarity – a dark edge with bright flanks. Orientation tuning emerges because natural images typically contain edges at many different orientations, and a sparse code captures these natural statistics (Olshausen and Field, 1996; Simoncelli and Olshausen, 2001). Polarity tuning emerges as a natural consequence of the positivity constraint, because a neuron with a positive firing rate cannot represent edges at two opposing polarities. Similar polarity tuning has been obtained before, but with an additional constraint that decoding weights be strictly positive (Hoyer, 2003, 2004).

As before, we compute the firing rates of our neural population by solving Equation 3, which provides a convenient approximation of the underlying spiking network. We then silence all neurons with a vertical orientation preference (Figure 7E) and calculate the resulting changes in firing rates in the network. Without optimal compensation, (i.e., without any changes in firing rates), we find that the silencing of an orientation column damages image representation, especially for image components that contain edges parallel to the preferred orientations of the dead neurons (Figure 7B, orange arrow). When the population implements optimal compensation, the firing rates of many neurons change, and the image representation is recovered (Figure 7C).

To illustrate the nature of the network compensation, we study how the tuning of neurons changes when part of the population is lost. This focus will allow us to compare our predictions to experimental data. The tuning curves of all cells in response to the directions of various gratings are shown in Figure 8A. If we knock out a (small) subset of the cells, e.g., 50% of all neurons with preferred directions around $\theta ={0}^{\circ}$, then the firing rates of several of the remaining neurons increase at the preferred direction of the silenced cells (Figure 8B,D). As a consequence, the preferred directions of many cells shift toward the preferred direction of the silenced cells (Figure 8C). Unlike the simple example in Figure 5G, however, these shifts occur in neurons throughout the population, and not just in neurons with neighboring directions. For the affected stimuli, the shifts decrease the representation errors compared to a system without compensation (Figure 8E).

The more complicated pattern of firing rate changes occurs for two reasons. First, neurons represent more than the direction of a grating—they represent a 12 × 12 image patch, i.e., a vector in a high-dimensional space (Figure 8F). Such a vector will generally be represented by the firing of many different neurons, each of which contributes with its decoding vector, ${\mathbf{\mathbf{D}}}_{i}$, weighted by its firing rate, ${r}_{i}$. However, these neurons will rarely point exactly in the right (stimulus) direction, and only their combined effort allows them to represent the stimulus. When some of the neurons contributing to this stimulus representation are knocked out, the network will recruit from the pool of all neurons whose decoding vectors are non-orthogonal to the stimulus vector. In turn, each of these remaining neurons gets assigned a new firing rate such that the cost term in Equation 2 is minimized. Second, high-dimensional spaces literally provide a lot of space. Whereas stacking 20 neurons in a two-dimensional space (as in Figure 5E,F) immediately causes crowding, with neighboring neurons sharing a substantial overlap of information, stacking e.g. 1000 neurons in a 100-dimensional space causes no such thing. Rather, neurons can remain almost independent, while sharing only small amounts of information. As a consequence, not a single pair of the naturally trained decoding weights in our model will point in similar directions. In other words, when we knock out a set of neurons, there are no similarly tuned neurons that could change their firing rates in order to restore the representation. Rather, the compensation for a single lost neuron will require the concerted effort of several other neurons, and many of the required changes can be small and subtle. However, all of these changes will seek to restore the lost information, and cells will shift their tuning towards that of the lost neurons.

As a consequence of this additional complexity, we find that firing rates at the anti-preferred direction also increase (Figure 8B,D). A key reason for this behavior is that many neurons are strongly tuned to orientation, but only weakly tuned to direction, so that eliminating neurons with ${0}^{\circ}$ degree direction preference also eliminates some of the representational power at the anti-preferred direction.

Finally, although our model is simply a (positive) sparse coding model, we note that most of its predictions are consistent with experiments in the cat visual cortex, in which sites with specific orientation or direction tuning are silenced using GABA (Crook et al., 1996, 1997, 1998) (Figure 9 and Figure 9—figure supplement 1), while recording the firing rates of nearby neurons. Neurons that escape silencing in the cat visual cortex also shift their tuning curves towards the preferred directions of the silenced neurons (Figure 9A). More specifically, three types of changes were found: (1) Neurons whose preferred direction was different to that of the silenced neurons (PD$>{22.5}^{\circ}$) will increase their firing rates at the knocked-out direction and broaden their tuning (see Figure 9A, top row for an example). (2) Neurons whose preferred direction was opposite to the silenced neurons will increase their firing rates towards the knocked out direction (see Figure 9A, middle row for an example). (3) Neurons whose preferred direction was similar to the silenced neurons (PD$<{22.5}^{\circ}$) will either increase their firing to the knocked out direction (see Figure 9A, bottom row) or decrease their firing to the knocked out direction (data not shown).

In our sparse coding model, we find increases in firing rates to the knocked-out directions in all three types of neurons (with different direction preference, opposite direction preference, or similar direction preference), see Figure 9B. We do not observe a decrease in firing to the knocked-out directions. While this mismatch likely points towards the limits of our simplified model (which models a relatively small set of simple cells) in comparison with the data (which examines both simple and complex cells in both V1 and V2), it does not violate the optimal compensation principle. Due to the complexity of interactions in a high-dimensional space, a decrease in firing of a few neurons to the missing directions is not a priori ruled out in our model, as long as there is a net increase to the missing directions across the whole population. Indeed, when increasing the redundancy in our V1 model (e.g. by moving from $N=288$ neurons to $N=500$ neurons while keeping the dimensionality of the stimulus fixed), we find that several neurons start decreasing their firing rates in response to the knocked-out directions, even though the average firing rate changes in the population remain positive (data not shown).

We emphasize that there are only a few free parameters in our model. Since the decoding weights are learnt from natural images using the (positive) sparse coding model, we are left with the dimensionality of the input space, the number of neurons chosen, and the number of neurons knocked out. The qualitative results are largely independent of these parameters. For instance, the precise proportion of neurons knocked out in the experimental studies (Crook and Eysel, 1992) is not known. However, for a range of reasonable parameter choices (50–75% knock out of all neurons with a preferred direction), our predicted tuning curve changes are consistent with the recorded population response to neuron silencing (Crook and Eysel, 1992) (Figure 9—figure supplement 1B). Furthermore, if we increase the redundancy of the system by scaling up the number of neurons, but not the number of stimulus dimensions, we shift the recovery boundary to a larger fraction of knocked-out neurons, but we do not change the nature of the compensatory response (Figure 9—figure supplement 2A,B). At high degrees of redundancy (or so-called ‘over-completeness’), quantitive fluctuations in tuning curve responses are averaged out, indicating that optimal compensation becomes invariant to over-completeness (Figure 9—figure supplement 2C,D). Given the few assumptions that enter our model, we believe that the broad qualitative agreement between our theory and the data is quite compelling.

### Discussion

To our knowledge, optimal compensation for neuron loss has not been proposed before. Usually, cell loss or neuron silencing is assumed to be a wholly destructive action, and the immediate neural response is assumed to be pathological, rather than corrective. Synaptic plasticity is typically given credit for the recovery of neural function. For example, synaptic compensation has been proposed as a mechanism for memory recovery following synaptic deletion (Horn et al., 1996), and optimal adaptation following perturbations of sensory stimuli (Wainwright, 1999) and motor targets (Shadmehr et al., 2010; Braun et al., 2009; Kording et al., 2007) has also been observed, but on a slow time scale consistent with synaptic plasticity. In this work, we have explored the properties of an instantaneous, optimal compensation, that can be implemented without synaptic plasticity, on a much faster time scale, through balanced network dynamics.

Our work is built upon a connection between two separate theories: the theory of balanced networks, which is widely regarded to be the standard model of cortical dynamics (van Vreeswijk and Sompolinsky, 1996, 1998; Amit and Brunel, 1997; Renart et al., 2010; Denève and Machens, 2016) and the theory of efficient coding, which is arguably our most influential theory of neural computation (Barlow, 1961; Salinas, 2006; Olshausen and Field, 1996; Bell and Sejnowski, 1997; Greene et al., 2009). This connection relies on the following two derivations: (1) We derive a tightly balanced spiking network from a quadratic loss function, following the recent work of (Boerlin et al., 2013; Boerlin and Denève, 2011) by focusing on the part of their networks that generate a spike-based representation of the information. (2) We show that the firing rates in these spiking networks also obey a quadratic loss function, albeit with a positivity constraint on the firing rate (Barrett et al., 2013). This constrained minimization problem, called quadratic programming, provides a novel link between spiking networks and firing rate calculations. While the importance of positivity constraints has been noted in other contexts (Lee and Seung, 1999; Hoyer, 2003, 2004; Salinas, 2006), here we show its dramatic consequences in shaping tuning curves, which had not been appreciated previously. In turn, we obtain a single normative explanation for the polarity tuning of simple cells (Figure 7D), tuning curves in the oculomotor system (Figure 6A,B), and tuning curves in the cricket cercal system (Figure 6E), as well as the mechanisms underlying the generation of these tuning curves, and the response of these systems to neuron loss.

Several alternative network models have been proposed that minimize similar loss functions as in our work (Dayan and Abbott, 2001; Rozell et al., 2008; Hu et al., 2012; Charles et al., 2012; Druckmann and Chklovskii, 2012). However, in all these models, neurons produce positive and negative firing rates (Dayan and Abbott, 2001; Rozell et al., 2008; Charles et al., 2012; Druckmann and Chklovskii, 2012), or positive and negative valued spikes (Hu et al., 2012). The compensatory response of these systems will be radically different, because oppositely tuned neurons can compensate for each other by increasing their firing rates and changing sign, which is impossible in our networks. Similar reasoning holds for any efficient coding theories that assume positive and negative firing rates.

Whether general spiking networks support some type of compensation will depend on the specifics of their connectivity (see Materials and methods). For instance, spiking networks that learn to efficiently represent natural images could potentially compensate for neuron loss (Zylberberg et al., 2011; King et al., 2013). Furthermore, networks designed to generate EI balance through strong recurrent dynamics will automatically restore balance when neurons are lost (van Vreeswijk and Sompolinsky, 1996, 1998; Amit and Brunel, 1997; Renart et al., 2010). In turn, networks whose representations and computations are based on such randomly balanced networks may be able to compensate for neuron loss as well (Vogels et al., 2011; Hennequin et al., 2014), as has been shown for neural integrators (Lim and Goldman, 2013). However, the optimality or the speed of the compensation is likely to increase with the tightness of the balancing. In turn, the tightness of EI balance is essentially determined by synaptic connectivity, whether through experience-dependent learning of the synaptic connections (Haas et al., 2006; Vogels et al., 2011; Bourdoukan et al., 2012; Luz and Shamir, 2012), through scaling laws for random connectivities (van Vreeswijk and Sompolinsky, 1996, 1998; Amit and Brunel, 1997; Renart et al., 2010), or by design, via the decoding weights of linear readouts, as in our case (Boerlin et al., 2013; Barrett et al., 2013). Future work may resolve the commonalities and differences of these approaches.

The principle of optimal compensation is obviously an idealization, and any putative compensatory mechanism of an actual neural system may be more limited. That said, we have catalogued a series of experiments in which pharmacologically induced changes in neural activities and in behavior can be explained in terms of optimal compensation. These experiments were not originally conducted to test any specific compensatory mechanisms, and so, the results of each individual experiment were explained by separate, alternative mechanisms (Aksay et al., 2007; Gonçalves et al., 2014; Libersat and Mizrahi, 1996; Mizrahi and Libersat, 1997; Crook et al., 1996, 1997, 1998). The advantage of our optimal compensation theory is that it provides a simple, unifying explanation for all these experiments. Whether it is the correct explanation can only be answered through further experimental research.

To guide such research, we can use our theory to make a number of important predictions about the impact of neural damage on neural circuits. First, we can predict how tuning curves change shape to compensate for neuron loss. Specifically, neurons throughout the network will, on average, increase their firing rates to the signals that specifically activated the dead neurons. This happens because the remaining neurons automatically seek to carry the informational load of the knocked out neurons (which is equivalent to maintaining a balance of excitation and inhibition). This is a strong prediction of our theory, and as such, an observation inconsistent with this prediction would invalidate our theory. There have been very few experiments that measure neural tuning before and after neuron silencing, but in the visual cortex, where this has been done, our predictions are consistent with experimental observations (Figure 9).

Our second prediction is that optimal compensation is extremely fast—faster than the time scale of neural spiking. This speed is possible because optimal compensation is supported by the tight balance of excitation and inhibition, which responds rapidly to neuron loss—just as balanced networks can respond rapidly to changes in inputs (Renart et al., 2010; van Vreeswijk and Sompolinsky, 1998, 1996). We are not aware of any experiments that have explicitly tested the speed of firing rate changes following neuron silencing. In the pharmacological silencing of direction-selective cells in the visual cortex, the time scales of the reagents are too slow to out-rule the possibility that there is some synaptic plasticity (Crook et al., 1996, 1997, 1998). Nonetheless, these experiments are consistent with our prediction, because the changes observed in tuning curve shape are at least as fast, if not faster than the speed of pharmacological silencing. Ideally, these predictions could be tested using neuronal ablations or optogenetic silencing.

Finally, we predict that all neural systems have a cell loss recovery boundary, beyond which neural function disintegrates. Existing measurements from the oculomotor system (Aksay et al., 2007) seem to be consistent with this prediction (Figure 6). We predict that this recovery boundary coincides with a disruption in the balance of excitation and inhibition. This has not been explored experimentally, although the disruption of balance has recently been implicated in a range of neural disorders such as epilepsy (Bromfield, 2006) and schizophrenia (Yizhar et al., 2011). Anecdotally, there have been many unreported experiments where neural ablation has failed to cause a measurable behavioral effect. Our theory suggests that such ‘failed’ lesion experiments may be far more interesting than previously thought, and that the boundary between a measurable and unnoticeable behavioral effect deserves specific attention. Indeed, the properties of a recovery boundary may also shed light on the progression of neurodegenerative diseases—especially those that are characterized by a period of asymptomatic cell loss, followed by a transition to a disabled symptomatic state, as in Alzheimer’s disease and stroke (Leary and Saver, 2003). We speculate that these transitions occur at the recovery boundary of the diseased system. If this were the case, then an accumulation of asymptomatic damage, through aging for example, or through acute conditions such as silent stroke, will increase the susceptibility of the brain to symptomatic damage, by moving it closer to the recovery boundary.

These predictions, and more broadly, the principle of optimal compensation that we have developed here, promise to be useful across a number of areas. First, as a neuroscience tool, our work provides a framework for the interpretation of experimental manipulations such as pharmacological silencing (Aksay et al., 2007), lesion studies (Keck et al., 2008) and optogenetic perturbations (Fenno et al., 2011; Li et al., 2016). Second, in the study of neural computation, optimal compensation may be a useful guiding principle, because plausible models of neural computation should be designed specifically to withstand the type of damage that the brain can withstand. Finally, our work may provide new perspectives on how neurodegenerative diseases impact behavior through neural networks, by generalizing the theory of efficient coding from the intact brain state to the damaged brain state (Morrison and Hof, 1997; Bredesen et al., 2006).

### Materials and methods

We have described the properties of optimal compensation, and given a variety of examples of optimal compensation across a range of systems. Here, we present further technical details. First, we describe how we tune spiking networks to represent signals optimally, including both networks that obey Dale’s law and those that do not. Next, we explain quadratic programming with an analytically tractable example. We then specify our choice of parameters for each figure. For the high-dimensional sparse coding example, we describe how we calculate sparse coding receptive fields and direction tuning curves. Finally, we provide additional details on the knock-out calculations, and we prove that our spiking model is tightly balanced and performs optimal compensation. The Matlab code that we used to generate all the figures in this paper is published online (http://github.com/machenslab/spikes).

#### Derivation of network model using spikes

In this section, we derive the connectivity and dynamics of a network that can optimally compensate for neuron loss. For simplicity, we will for now ignore the constraint that neurons need to be either excitatory or inhibitory, which will be revisited in the next section. We consider a network of $N$ leaky integrate-and-fire neurons receiving time-varying inputs $\mathbf{\mathbf{x}}\left(t\right)=({x}_{1}\left(t\right),\mathrm{\dots},{x}_{j}\left(t\right),\mathrm{\dots},{x}_{M}\left(t\right))$, where $M$ is the dimension of the input and ${x}_{j}\left(t\right)$ is the ${j}^{th}$ input signal. In response to this input, the network produces spike trains $\mathbf{\mathbf{s}}\left(t\right)=({s}_{1}\left(t\right),\mathrm{\dots},{s}_{i}\left(t\right),\mathrm{\dots},{s}_{N}\left(t\right))$, where ${s}_{i}\left(t\right)={\sum}_{l}\delta \left(t-{t}_{l}^{i}\right)$ is the spike train of the ${i}^{th}$ neuron and $\left\{{t}_{l}^{i}\right\}$ are the spike times of that neuron. Here, we describe the general formulation of our framework for arbitrary networks with $N\ge M$.

A neuron fires a spike whenever its membrane potential exceeds a spiking threshold. We can write this as ${V}_{i}>{T}_{i}$, where ${V}_{i}$ is the membrane potential of neuron $i$ and ${T}_{i}$ is the spiking threshold. The dynamics of the membrane potentials are given by:$\frac{d{V}_{i}}{dt}=-\lambda {V}_{i}+\sum _{j=1}^{M}{F}_{ij}g\left({x}_{j}\right)+\sum _{k=1}^{N}{\mathrm{\Omega}}_{ik}{s}_{k}+{\sigma}_{V}{\eta}_{i},$(5)

where ${\mathrm{\Omega}}_{ik}$ is the connection strength from neuron $k$ to neuron $i$, ${F}_{ij}$ is the connection strength from input $j$ to neuron $i$, $g\left({x}_{j}\right)$ is a function (or functional) applied to the input ${x}_{j}$, $\lambda $ is the neuron leak and ${\sigma}_{V}$ is the standard deviation of intrinsic neural noise, represented by a white noise term ${\eta}_{i}$ (Knight, 1972; Dayan and Abbott, 2001). For brevity, we will not explicitly indicate the time-dependency of variables. That said, we note that the input signals, ${x}_{j}\left(t\right)$, the voltages, ${V}_{i}\left(t\right)$, and the spike trains, ${s}_{k}\left(t\right)$, are all time-dependent quantities, whereas the thresholds, ${T}_{i}$, the leak, $\lambda $, and the connection strengths, ${\mathrm{\Omega}}_{ik}$ and ${F}_{ij}$, are all constants. The feedforward term, $g\left({x}_{j}\right)$, is a placeholder for any feedforward input into the networks, but also for slower recurrent synaptic inputs that are not explicitly modeled here, such as the slow synapses in Boerlin et al. (2013). (Indeed, including slower recurrent dynamics does not affect the fast compensatory response of the network.) When a neuron spikes, its membrane potential is reset to the value $V={R}_{i}$. For ease of presentation, we include this reset in the self-connections ${\mathrm{\Omega}}_{ii}$, i.e., we assume that the self-connections are negative, ${\mathrm{\Omega}}_{ii}<0$, so that the reset is given by ${R}_{i}={T}_{i}+{\mathrm{\Omega}}_{ii}$.

We assume that this network provides a representation of the input signal $\mathbf{\mathbf{x}}$ using a simple linear decoder, rewritten from Equation 1,$\hat{\mathbf{x}}=\sum _{k=1}^{N}{\mathbf{D}}_{k}{r}_{k},$(6)

where ${\mathbf{\mathbf{D}}}_{k}$ is the fixed contribution of neuron $k$ to the signal, and ${r}_{k}$ is the instantaneous firing rate of neuron $k$. We generally refer to ${\mathbf{\mathbf{D}}}_{k}$ as the vector of readout or decoding weights. The instantaneous firing rate, ${r}_{k}$, is a time-dependent quantity that we obtain by filtering the spike train with an exponential filter:${r}_{k}\left(t\right)={\int}_{0}^{\mathrm{\infty}}{e}^{-{t}^{\mathrm{\prime}}/\tau}{s}_{k}\left(t-{t}^{\mathrm{\prime}}\right)d{t}^{\mathrm{\prime}},$(7)

where $\tau $ is the time-scale of the filtering. This firing rate definition is particularly informative because it has the form of a simple model of a postsynaptic potential, which is a biologically important quantity. Note that the units of this firing rate are given by $\tau $, so we must multiply by ${\tau}^{-1}$Hz to obtain units of Hz.

Our goal is to tune all the parameters of this network so that it produces appropriate spike trains at appropriate times to provide an accurate representation of the input $\mathbf{\mathbf{x}}$, both before and after cell loss (Boerlin et al., 2013; Bourdoukan et al., 2012; Barrett et al., 2013). We measure the accuracy of the representation with the loss function, rewritten from Equation 2,$E={\left(\mathbf{x}-\hat{\mathbf{x}}\right)}^{2}+\beta C\left(\mathbf{r}\right),$(8)

with ${(\cdot )}^{2}$ denoting an inner product, and $C\left(\mathbf{\mathbf{r}}\right)={\sum}_{k}{r}_{k}^{2}$. We then require our network to obey the following rule: at a given time point $t$, each neuron only fires a spike whenever a spike reduces the loss function,$E(\text{no}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\text{spike})\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}>\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thickmathspace}{0ex}}E(\text{with}\phantom{\rule{thickmathspace}{0ex}}\phantom{\rule{thinmathspace}{0ex}}\text{spike})$(9)

Then, since a spike changes the rate by ${r}_{i}\to {r}_{i}+1$, and hence the signal estimate by $\hat{\mathrm{x}}\to \hat{\mathrm{x}}+\mathbf{D}}_{i$, we can restate this spiking rule for the $i$-th neuron as:${\left(\mathbf{x}-\hat{\mathbf{x}}\right)}^{2}+\beta \sum _{k=1}^{N}{r}_{k}^{2}\phantom{\rule{1em}{0ex}}>\phantom{\rule{1em}{0ex}}{\left(\mathbf{x}-\left(\stackrel{}{\hat{\mathbf{x}}}+{\mathbf{D}}_{i}\right)\right)}^{2}+\beta \sum _{k=1}^{N}{\left({r}_{k}+{\delta}_{ik}\right)}^{2},$(10)

where ${\delta}_{ik}$ is the Kronecker delta. By expanding the right-hand-side, and canceling equal terms, this spiking rule can be rewritten as${\mathbf{D}}_{i}^{\mathrm{\top}}\left(\mathbf{x}-\hat{\mathbf{x}}\right)-\beta {r}_{i}\text{}\text{}\frac{{\mathbf{D}}_{i}^{\mathrm{\top}}{\mathbf{D}}_{i}+\beta}{2}.$(11)

This equation describes a rule under which neurons fire to produce spikes that reduce the loss function. Since a neuron $i$ spikes whenever its voltage ${V}_{i}$ exceeds its threshold ${T}_{i}$, we can interpret the left-hand-side of this spiking condition (Equation 11) as the membrane potential of the ${i}^{th}$ neuron:${V}_{i}\equiv {\mathbf{D}}_{i}^{\mathrm{\top}}\left(\mathbf{x}-\hat{\mathbf{x}}\right)-\beta {r}_{i},$(12)

and the right-hand-side as the spiking threshold for that neuron:${T}_{i}\equiv \frac{{\mathbf{\mathbf{D}}}_{i}^{\top}{\mathbf{\mathbf{D}}}_{i}+\beta}{2}.$(13)

We can identify the connectivity and the parameters that produce optimal coding spike trains by calculating the derivative of the membrane potential (as interpreted in Equation 12) and matching the result to the dynamical equations of our integrate-and-fire network (Equation 5):$\frac{d{V}_{i}}{dt}={\mathbf{D}}_{i}^{\mathrm{\top}}\left(\frac{d\mathbf{x}}{dt}-\frac{d\hat{\mathbf{x}}}{dt}\right)-\beta \frac{d{r}_{i}}{dt}.$(14)

From Equation 6, we obtain $d\hat{\mathbf{x}}/dt=\sum _{k}{\mathbf{D}}_{k}d{r}_{k}/dt$ and from Equation 7 we obtain $d{r}_{k}/dt=-{r}_{k}/\tau +{s}_{k}$, which yields a simple differential equation for the readout:$\frac{d\hat{\mathbf{x}}}{dt}=-\frac{1}{\tau}\hat{\mathbf{x}}+\sum _{k=1}^{N}{\mathbf{D}}_{k}{s}_{k}.$(15)

By inserting these expressions into Equation 14 we obtain:$\frac{d{V}_{i}}{dt}={\mathbf{D}}_{i}^{\mathrm{\top}}\frac{d\mathbf{x}}{dt}+\frac{1}{\tau}{\mathbf{D}}_{i}^{\mathrm{\top}}\hat{\mathbf{x}}-\sum _{k=1}^{N}\left({\mathbf{D}}_{i}^{\mathrm{\top}}{\mathbf{D}}_{k}{s}_{k}\right)+\frac{1}{\tau}\beta {r}_{i}-\beta {s}_{i}$(16)

Finally, using the voltage definition from Equation 12 to write $\mathbf{D}}_{i}^{\mathrm{\top}}\hat{\mathbf{x}}=-{V}_{i}+{\mathbf{D}}_{i}^{\mathrm{\top}}\mathbf{x}-\beta {r}_{i$ we can replace the second term on the right hand side and obtain:$\frac{d{V}_{i}}{dt}=-\frac{1}{\tau}{V}_{i}+{\mathbf{D}}_{i}^{\mathrm{\top}}\left(\frac{d\mathbf{x}}{dt}+\frac{\mathbf{x}}{\tau}\right)-\sum _{k=1}^{N}\left({\mathbf{D}}_{i}^{\mathrm{\top}}{\mathbf{D}}_{k}+\beta {\delta}_{ik}\right){s}_{k}\phantom{\rule{thinmathspace}{0ex}}.$(17)

This equation describes the voltage dynamics of a neuron that produces spikes to represent signal $\mathbf{\mathbf{x}}$. If we now compare this equation with our original integrate-and-fire network, Equation 5, we see that these are equivalent if$\begin{array}{}\text{(18)}& {\displaystyle {\mathrm{\Omega}}_{ik}}& {\displaystyle \equiv -\left({\mathbf{D}}_{i}^{\mathrm{\top}}{\mathbf{D}}_{k}+\beta {\delta}_{ik}\right)\phantom{\rule{thinmathspace}{0ex}},}& \text{(19)}& {\displaystyle {F}_{ij}}& {\displaystyle \equiv [{\mathbf{D}}_{i}{]}_{j}\phantom{\rule{thinmathspace}{0ex}},}& \text{(20)}& {\displaystyle g({x}_{j})}& {\displaystyle \equiv \frac{d{x}_{j}}{dt}+\frac{{x}_{j}}{\tau}\phantom{\rule{thinmathspace}{0ex}},}& \text{(21)}& {\displaystyle \lambda}& {\displaystyle \equiv \frac{1}{\tau}\phantom{\rule{thinmathspace}{0ex}},}& \text{(22)}& {\displaystyle {T}_{i}}& {\displaystyle \equiv \frac{{\mathbf{D}}_{i}^{\mathrm{\top}}{\mathbf{D}}_{i}+\beta}{2}\phantom{\rule{thinmathspace}{0ex}}.}& \end{array}$

Here, the notation ${\left[\mathbf{\mathbf{z}}\right]}_{j}$ refers to the $j$-th element of the vector $\mathbf{\mathbf{z}}$. A network of integrate-and-fire neurons with these parameters and connection strengths can produce spike trains that represent the signal $\mathbf{\mathbf{x}}$ to a high degree of accuracy. Elements of this calculation have been produced before (Boerlin et al., 2013), but are reproduced here for the sake of completeness. Also, it has been shown that this connectivity can be learned using simple spike timing-dependent plasticity rules (W. Brendel, R. Bourdoukan, P. Vertechi, et al, unpublished observations; Bourdoukan et al. (2012)), so extensive fine-tuning is not required to obtain these spiking networks. We note that the input into the network consists of a combination of the original signal, ${x}_{j}$, and its derivative, $d{x}_{j}/dt$. This can be interpreted in several ways. First, we can think of the original signal, ${x}_{j}\left(t\right)$, as a filtered version of the actual input, so that $d{x}_{j}/dt=-\lambda {x}_{j}+{c}_{j}\left(t\right)$, where ${c}_{j}\left(t\right)$ is the actual input (and note that ${c}_{j}=\lambda {x}_{j}+d{x}_{j}/dt$). Second, we can think of this as a biophysical computation, in the sense that the neuron receives two types of input, the original signal input, ${x}_{j}\left(t\right)$, and its derivative, $d{x}_{j}/dt$. The latter could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition).

Our derivation of network dynamics directly from our loss function allows us to interpret the properties of this network in terms of neural function. Each spike can be interpreted as a greedy error reduction mechanism—it moves $\hat{\mathbf{x}}$ closer to the signal $\mathbf{x}$. This error reduction is communicated back to the network through recurrent connectivity, thereby reducing the membrane potential of the other neurons. The membrane potential, in turn, can be interpreted as a representation error—it is determined by a linear projection of $\mathbf{x}-\hat{\mathbf{x}}$ onto the neuron’s decoding weights, ${\mathbf{\mathbf{D}}}_{i}$ (Equation 12). Following this interpretation, whenever the error becomes too big, the voltage becomes too big and it reaches threshold. This produces a spike, which reduces the error, and so on.

We can also understand these network dynamics in terms of attractor dynamics. This network implements a point attractor—firing rates evolve towards a stable fixed point in N-dimensional firing rate space. The location of this point attractor depends on neural input and network connectivity. When a neuron dies, the point attractor is projected into a subspace given by ${r}_{k}=0$, where neuron $k$ is the neuron that has died.

Note that in this derivation, we used a quadratic cost. This cost increases the value of the spiking threshold (Equation 13) and the spiking reset (Equation 18). We can also derive network parameters for alternative cost term choices. For example, if we use a linear cost, we simply need to drop the second term ($\beta {\delta}_{ik}$) in Equation 18, while keeping all other parameters the same. In other words, we can implement a quadratic cost by increasing the spiking threshold and the spiking reset, and we can implement a linear cost by increasing the spiking threshold without increasing the spiking reset. In this way, the spiking threshold, and the reset determine the cost function. It is conceivable that these variables may be learned, just as network connectivity may be learned. Alternatively, these values may be predetermined for various brain areas, depending on the computational target of each brain area.

#### Derivation of network model with separate excitatory and inhibitory populations

When these ideas are applied to large and heterogeneous networks, we run into one problem concerning their biological interpretation. Individual neurons in these networks can sometimes target postsynaptic neurons with both excitatory and inhibitory synapses, so that they violate Dale’s law. To avoid this problem, we consider a network that consists of separate pools of excitatory and inhibitory neurons. The external signals feed only into the excitatory population, and the inhibitory population remains purely local. This scenario is captured by the equations,$\begin{array}{}\text{(23)}& {\displaystyle \frac{d{V}_{i}^{E}}{dt}}& {\displaystyle =-\lambda {V}_{i}^{E}+\sum _{j=1}^{M}{F}_{ij}g({x}_{j})+\sum _{k=1}^{{N}^{E}}{\mathrm{\Omega}}_{ik}^{EE}{s}_{k}^{E}-\sum _{k=1}^{{N}^{I}}{\mathrm{\Omega}}_{ik}^{EI}{s}_{k}^{I}-{R}_{i}^{E}{s}_{i}^{E}+{\sigma}_{V}{\eta}_{i}\phantom{\rule{thinmathspace}{0ex}},}& \text{(24)}& {\displaystyle \frac{d{V}_{i}^{I}}{dt}}& {\displaystyle =-\lambda {V}_{i}^{I}+\sum _{k=1}^{{N}^{E}}{\mathrm{\Omega}}_{ik}^{IE}{s}_{k}^{E}-\sum _{k=1}^{{N}^{I}}{\mathrm{\Omega}}_{ik}^{II}{s}_{k}^{I}-{R}_{i}^{I}{s}_{i}^{I}+{\sigma}_{V}{\eta}_{i}\phantom{\rule{thinmathspace}{0ex}}.}& \end{array}$

The four connectivity matrices, ${\mathrm{\Omega}}_{ik}^{EE}$, ${\mathrm{\Omega}}_{ik}^{EI}$, ${\mathrm{\Omega}}_{ik}^{IE}$, ${\mathrm{\Omega}}_{ik}^{II}$ are all assumed to have only positive entries, which enforces Dale’s law in this network. The neuron’s self-resets are now explicitly captured in the terms ${R}_{i}^{E}{s}_{i}^{E}$ and ${R}_{i}^{I}{s}_{i}^{I}$, and are no longer part of the connectivity matrices themselves. We will now show how to design the four connectivity matrices such that the resulting network becomes functionally equivalent to the networks in the previous section. To do so, we apply the basic ideas outlined in Boerlin et al. (2013), and simplified due to W. Brendel, R. Bourdoukan, P. Vertechi, and the authors (personal communication).

We start with the network in the previous section, as described in Equation 5, and split its (non-Dalian) recurrent connectivity matrix, ${\mathrm{\Omega}}_{ik}$, into self-resets, excitatory connections, and inhibitory connections. The self-resets are simply the diagonal entries of $-{\mathrm{\Omega}}_{ik}$, see Equation 18,${R}_{i}={\mathbf{\mathbf{D}}}_{i}^{\top}{\mathbf{\mathbf{D}}}_{i}+\beta ,$(25)

where the choice of signs is dictated by the sign conventions in Equation 23 and Equation 24. The excitatory connections are the positive entries of ${\mathrm{\Omega}}_{ik}$, for which we define${\mathrm{\Omega}}_{ik}^{EE}={\left[-{\mathbf{\mathbf{D}}}_{i}^{\top}{\mathbf{\mathbf{D}}}_{k}-\beta {\delta}_{ik}\right]}_{+},$(26)

where the notation ${\left[x\right]}_{+}$ denotes a threshold-linear function, so that ${\left[x\right]}_{+}=x$ if $x>0$ and ${\left[x\right]}_{+}=0$ otherwise. The inhibitory connections are the negative, off-diagonal entries of ${\mathrm{\Omega}}_{ik}$, for which we write${W}_{ik}={\left[{\mathbf{\mathbf{D}}}_{i}^{\top}{\mathbf{\mathbf{D}}}_{k}+\beta {\delta}_{ik}\right]}_{+}-{\delta}_{ik}{R}_{k}.$(27)

With these three definitions, we can re-express the recurrent connectivity as ${\mathrm{\Omega}}_{ik}={\mathrm{\Omega}}_{ik}^{EE}-{W}_{ik}-{\delta}_{ik}{R}_{k}$.

Using this split of the non-Dalian connectivity matrix, we can trivially rewrite Equation 5 as an equation for an excitatory population,$\frac{d{V}_{i}^{E}}{dt}=-\lambda {V}_{i}^{E}+\sum _{j=1}^{M}{F}_{ij}g({x}_{j})+\sum _{k=1}^{{N}^{E}}{\mathrm{\Omega}}_{ik}^{EE}{s}_{k}^{E}-\sum _{k=1}^{{N}^{E}}{W}_{ik}{s}_{k}^{E}-{R}_{i}^{E}{s}_{i}^{E}+{\sigma}_{V}{\eta}_{i}\phantom{\rule{thinmathspace}{0ex}},$(28)

which is identical to Equation 23 above, except for the term with the inhibitory synapses, ${W}_{ik}$, on the right-hand-side. Here, the inhibitory synapses are multiplied with ${s}_{k}^{E}$ rather than ${s}_{k}^{I}$, and their number is identical to the number of excitatory synapses. Accordingly, this term violates Dale’s law, and we need to replace it with a genuine input from the inhibitory population. Since this non-Dalian term consists of a series of delta-functions, we can only replace it approximately, which will suffice for our purposes. We simply require that the genuine inhibitory input approximates the non-Dalian term on the level of the postsynaptic potentials, i.e., the level of filtered spike trains,$\sum _{k=1}^{{N}^{E}}{W}_{ik}{r}_{k}^{E}\approx \sum _{k=1}^{{N}^{I}}{\mathrm{\Omega}}_{ik}^{EI}{r}_{k}^{I}.$(29)

Here the left-hand-side is the given non-Dalian input, and the right-hand-side is the genuine inhibitory input, which we are free to manipulate and design.

The solution to the remaining design question is simple. In the last section, we showed how to design arbitrary spiking networks such that they track a given set of signals. Following these recipes, we assume that the inhibitory population output optimally tracks the (filtered) spike trains of the excitatory population. More specifically, we assume that we can reconstruct the instantaneous firing rates of the excitatory neurons, ${r}_{i}^{E}$, by applying a linear readout to the instantaneous firing rates of the inhibitory neurons, ${r}_{j}^{I}$, so that${\hat{\mathbf{r}}}^{E}=\sum _{j=1}^{{N}^{I}}{\mathbf{D}}_{j}^{I}{r}_{j}^{I}\phantom{\rule{thinmathspace}{0ex}}.$(30)

Note that all entries of ${\mathbf{\mathbf{D}}}_{j}^{I}$, the inhibitory decoding weights, must be positive. If the estimate of the excitatory firing rates closely tracks the real firing rates, then we can fulfil Equation 29.

To obtain this relation, we will assume that the inhibitory population minimizes its own loss function,${E}_{I}=({\mathbf{r}}^{E}-{\hat{\mathbf{r}}}^{E}{)}^{2}+{\beta}^{I}C({\mathbf{r}}^{I})\phantom{\rule{thinmathspace}{0ex}}.$(31)

where the first term is the error incurred in reconstructing the excitatory firing rates, and the second term is a cost associated with the firing of the inhibitory neurons. Apart from the replacement of the old input signal, $\mathbf{\mathbf{x}}$, by the new ‘input signal’, ${\mathbf{\mathbf{r}}}^{E}$, this is exactly the same loss function as in Equation 8. An important difference is that here the number of input signals, which corresponds to the number of excitatory neurons, may be larger than the number of output spike trains, which corresponds to the number of inhibitory neurons. Our spiking networks will in this case still minimize the mean-square-error above, though the representation could be lossy. Since the effective dimensionality explored by the instantaneous firing rates of the excitatory population is limited by the dimensionality of the input signal space, however, these losses are generally small for our purposes.

Given the above loss function, we can redo all derivations from the previous section, to obtain the connections, thresholds, and resets for the inhibitory population. Defining the short-hand${H}_{ik}^{I}={\left({\mathbf{\mathbf{D}}}_{i}^{I}\right)}^{\top}{\mathbf{\mathbf{D}}}_{k}^{I}+{\beta}^{I}{\delta}_{ik},$(32)

we find that$\begin{array}{}\text{(33)}& {\displaystyle {\mathrm{\Omega}}_{ik}^{II}}& {\displaystyle \equiv {H}_{ik}^{I}-{R}_{k}^{I}{\delta}_{ik}\phantom{\rule{thinmathspace}{0ex}},}& \text{(34)}& {\displaystyle {\mathrm{\Omega}}_{ik}^{IE}}& {\displaystyle \equiv [{\mathbf{D}}_{i}^{I}{]}_{k}\phantom{\rule{thinmathspace}{0ex}},}& \text{(35)}& {\displaystyle {T}_{k}}& {\displaystyle \equiv \frac{{H}_{kk}^{I}}{2}\phantom{\rule{thinmathspace}{0ex}},}& \text{(36)}& {\displaystyle {R}_{k}^{I}}& {\displaystyle \equiv {H}_{kk}^{I}\phantom{\rule{thinmathspace}{0ex}}.}& \end{array}$

Here the first equation is equivalent to Equation 18, except for the different sign convention (${\mathrm{\Omega}}_{ik}^{II}$ enters negatively in Equation 24), and for the subtraction of the self-reset term, which is not part of the recurrent connectivity. The second equation is equivalent to Equation 19. The third equation is equivalent to Equation 22, and the fourth equation is the self-reset term, containing the diagonal elements of Equation 18. Note that, due to the positivity of the inhibitory decoding weights, the two connectivity matrices have only positive entries, as presupposed above in Equation 24. Consequently, the inhibitory subnetwork obeys Dale’s law.

In a last step, we replace the excitatory spike trains in Equation 28 with their inhibitory approximation. Specifically, we rewrite Equation 29 as$\begin{array}{}\text{(37)}& \sum _{k=1}^{{N}^{E}}{W}_{ik}{r}_{k}^{E}& \approx \sum _{k=1}^{{N}^{E}}{W}_{ik}{\hat{r}}_{k}^{E}& \text{(38)}& & =\sum _{j=1}^{{N}^{E}}\sum _{k=1}^{{N}^{I}}{W}_{ij}{D}_{jk}^{I}{r}_{k}^{I}& \text{(39)}& & \equiv \sum _{k=1}^{{N}^{I}}{\mathrm{\Omega}}_{ik}^{EI}{r}_{k}^{I}\phantom{\rule{thinmathspace}{0ex}},& \end{array}$

where we used Equation 30 in the approximation step. This approximation generally works very well for larger networks; in smaller networks, finite size effects come into play.

Finally, when we take into account the results from the previous section, write ${\mathbf{\mathbf{D}}}_{i}^{E}$ for the excitatory decoder weight of the $i$-th neuron, and introduce the short-hand${H}_{ik}^{E}={\left({\mathbf{\mathbf{D}}}_{i}^{E}\right)}^{\top}{\mathbf{\mathbf{D}}}_{k}^{E}+{\beta}^{E}{\delta}_{ik},$(40)

then we find that the excitatory subnetwork has the connections, thresholds, and resets (compare Equation 25–Equation 27),$\begin{array}{}\text{(41)}& {\displaystyle {F}_{ij}}& {\displaystyle =[{\mathbf{D}}_{i}^{E}{]}_{j}\phantom{\rule{thinmathspace}{0ex}},}& \text{(42)}& {\displaystyle {\mathbf{\Omega}}_{ik}^{EE}}& {\displaystyle ={\left[-{H}_{ik}^{E}\right]}_{+}\phantom{\rule{thinmathspace}{0ex}},}& \text{(43)}& {\displaystyle {\mathbf{\Omega}}_{ik}^{EI}}& {\displaystyle =\sum _{j=1}^{{N}_{E}}({\left[{H}_{ij}^{E}\right]}_{+}-{\delta}_{ij}{R}_{j}^{E}){\mathbf{D}}_{jk}^{I}\phantom{\rule{thinmathspace}{0ex}},}& \text{(44)}& {\displaystyle {T}_{k}^{E}}& {\displaystyle \equiv \frac{{H}_{kk}^{E}}{2}\phantom{\rule{thinmathspace}{0ex}},}& \text{(45)}& {\displaystyle {R}_{k}^{E}}& {\displaystyle ={H}_{kk}^{E}\phantom{\rule{thinmathspace}{0ex}}.}& \end{array}$

Importantly, the EI network therefore consists of two populations, one excitatory, one inhibitory, both of which minimize a loss function. As a consequence, the excitatory subnetwork will compensate optimally as long as the inhibitory subnetwork remains fully functional. In this limit, the excitatory subnetwork is essentially identical to the networks discussed in the previous section. The inhibitory subnetwork will also compensate optimally until its recovery boundary is reached. In the following, we will focus on one subnetwork (and one loss function again). For notational simplicity, we will leave out the superscript references ‘$E$’.

#### Tuning curves and quadratic programming

For constant input signals, the instantaneous firing rates of the neurons will fluctuate around some mean value. Our goal is to determine this mean value. We will start with the most general case, and rewrite the instantaneous firing rates as ${r}_{i}(t)={\overline{r}}_{i}+{\eta}_{i}(t,{\overline{r}}_{i})$, were ${\eta}_{i}(t,{\overline{r}}_{i})$ is a zero-mean ’noise’ term that captures the fluctuations of the instantaneous firing rates around its mean value. Note that these fluctuations may depend on the neuron’s mean rate. In turn, neglecting the costs for a moment, we can average the objective function, Equation 8, over time to obtain$\begin{array}{}\text{(46)}& {\displaystyle \u27e8E{\u27e9}_{t}}& {\displaystyle =\u27e8(\mathbf{x}-\sum _{i}{\mathbf{D}}_{i}{r}_{i}(t){)}^{2}{\u27e9}_{t}}& \text{(47)}& & {\displaystyle =(\mathbf{x}-\sum _{i}{\mathbf{D}}_{i}{\overline{r}}_{i}{)}^{2}+\sum _{ij}{\mathbf{D}}_{i}^{\mathrm{\top}}{\mathbf{D}}_{j}\u27e8{\eta}_{i}({\overline{r}}_{i}){\eta}_{j}({\overline{r}}_{j}){\u27e9}_{t}.}& \end{array}$

For larger networks, we can assume that the spike trains of neurons are only weakly correlated, so that $\u27e8{\eta}_{i}({\overline{r}}_{i}){\eta}_{j}({\overline{r}}_{j}){\u27e9}_{t}=\text{Var}({\eta}_{i}({\overline{r}}_{i})){\delta}_{ij}$. Here, $\text{Var}({\eta}_{i}({\overline{r}}_{i}))$ is the variance of the noise term. We obtain$\u27e8E{\u27e9}_{t}=(\mathbf{x}-\sum _{i}{\mathbf{D}}_{i}{\overline{r}}_{i}{)}^{2}+\sum _{i}\Vert {\mathbf{D}}_{i}{\Vert}^{2}\text{Var}({\eta}_{i}({\overline{r}}_{i})).$(48)

We furthermore notice that the spike train statistics are often Poisson (Boerlin et al., 2013), in which case we can make the replacement $\text{Var}({\eta}_{i}({\overline{r}}_{i}))={\overline{r}}_{i}$. The loss function then becomes a quadratic function of the mean firing rate, which needs to be minimized under a positivity constraint on the mean firing rate. This type of problem is known as ‘quadratic programming’ in the literature. In this study, we focused on networks for which the contributions of the firing rate fluctuations can be neglected, which is generally the case for sufficiently small readout weights and membrane voltage noise (see Figure 4—figure supplement 3). In this case, we obtain the loss function used in Equation 3,$\u27e8E{\u27e9}_{t}=(\mathbf{x}-\sum _{i}{\mathbf{D}}_{i}{\overline{r}}_{i}{)}^{2}+\beta C(\overline{\mathbf{r}})$(49)

with the cost term included again.

In general, quadratic programming is mathematically intractable, so the objective function must be minimized numerically. However, in networks with a small number of neurons, we can solve the problem analytically and gain some insight into the nature of quadratic programming. Here, we do this for the two neuron example (Figure 4A).

In this example, we assumed that the first neuron contributes with ${\mathbf{\mathbf{D}}}_{1}=(w,c)$ to the readout, and the second neuron with ${\mathbf{\mathbf{D}}}_{2}=(-w,c)$. Accordingly, the firing rates $\overline{r}}_{1$ and $\overline{r}}_{2$ are given by the solution to the following equation:$\left({\overline{r}}_{1},{\overline{r}}_{2}\right)=\mathrm{arg}\underset{{\overline{r}}_{1}\ge 0,\phantom{\rule{thinmathspace}{0ex}}{\overline{r}}_{2}\ge 0}{min}\left[{\left(x-(w{\overline{r}}_{1}-w{\overline{r}}_{2})\right)}^{2}+({r}_{B}-(c{\overline{r}}_{1}+c{\overline{r}}_{2}){)}^{2}+\beta ({\overline{r}}_{1}^{2}+{\overline{r}}_{2}^{2})\right]\phantom{\rule{thinmathspace}{0ex}},$(50)

where $x$ is the variable signal, and ${r}_{B}$ is the fixed background signal. The positivity constraint partitions the solution of this equation into three regions, determined by the value of $x$: region ${\mathcal{R}}_{1}$ where ${\overline{r}}_{1}=0$ and ${\overline{r}}_{2}\ge 0$, region ${\mathcal{R}}_{2}$ where ${\overline{r}}_{1}\ge 0$ and ${\overline{r}}_{2}\ge 0$, and region ${\mathcal{R}}_{3}$ where ${\overline{r}}_{2}=0$ and ${\overline{r}}_{1}\ge 0$ (Figure 4—figure supplement 1A). In region 2, we can then easily solve Equation 50 by setting the derivative of our loss function to zero. This gives $\overline{\mathbf{r}}=({\mathbf{w}}^{\mathrm{\top}}\mathbf{w}+{\mathbf{c}}^{\mathrm{\top}}\mathbf{c}+\beta \mathbf{1}{)}^{-1}(\mathbf{w}x+\mathbf{c}{r}_{B})$ where $\mathbf{\mathbf{w}}=(w,-w)$, $\mathbf{\mathbf{c}}=(c,c)$, and $\mathbf{1}$ is the identity matrix. Looking at this equation, we see that ${\overline{r}}_{1}=0$ when $x=-c{r}_{B}/w$ and ${\overline{r}}_{2}=0$ when $x=c{r}_{B}/w$. Therefore, the firing rate solution for region two is valid when $-c{r}_{B}/w\le x\le c{r}_{B}/w$. For $x\ge c{r}_{B}/w$, we have ${\overline{r}}_{2}=0$ because of the positivity constraint in Equation 50. This is region 3. We can calculate $\overline{r}}_{1$ by setting ${\overline{r}}_{2}=0$ and then minimizing the loss function. This gives ${\overline{r}}_{1}=({w}^{2}+{c}^{2}+\beta {)}^{-1}(wx+c{r}_{B})$. Similarly, for $x\le -c{r}_{b}/w$ we have ${\overline{r}}_{1}=0$ because of the positivity constraint. This is region one and we obtain ${\overline{r}}_{2}=({w}^{2}+{c}^{2}+\beta {)}^{-1}(-wx+c{r}_{B})$. The firing rates within each region are given by a simple linear projection of ${\mathbf{\mathbf{x}}}^{\prime}=(x,{r}_{B})$, although the size and direction of this projection is different in each region. As such, the solution to this quadratic programming problem is a piece-wise linear function of $x$.

In networks with larger numbers of neurons, the solution will still be a piece-wise linear function of $x$, although there will be more regions and the firing rate solutions are more complicated because more neurons are simultaneously active. In contrast, the transformation from firing rates to ${\mathbf{\mathbf{x}}}^{\prime}$ is very simple (Equation 6). It is given by a simple linear transformation, and is region independent (Figure 4—figure supplement 1B).

#### Readout weights and cost terms: 1-d and 2-d example

There are a number of free parameters in our model, such as the cost terms and the readout weights. The choice of these values determine the precise shape of tuning curves. In general, however, the precise values of these terms have little influence on the coding capability of our system, once certain properties have been satisfied, which we will outline here.

The cost term that we used for the examples in Figures 1–6 is a quadratic cost, $C\left(\mathbf{\mathbf{r}}\right)={\sum}_{k}{r}_{k}^{2}$. This cost term encourages the system to find a solution in which all neurons share in the signal representation. (Here, and in the following, we will no longer distinguish ${r}_{i}$, the instantaneous firing rate used in the spiking network, and $\overline{r}}_{i$, the mean firing rates used in quadratic programming. The appropriate notation will be determined by the context.) The cost term for Figures 7–9, the V1 model, is a linear cost, $C\left(\mathbf{\mathbf{r}}\right)={\sum}_{k}{r}_{k}$. This cost term limits the overall number of spikes, and generally encourages sparse solutions, in which most neurons contribute very little to the signal representation. Regardless of this choice, our general predictions about optimal compensation are qualitatively similar, as long as the cost term does not dominate the loss function.

The other important parameters that we must choose are the decoding or readout weights. In Figure 2, the decoding weights of the excitatory population were drawn from a Gaussian distribution with mean ${\mu}_{E}=2$, and standard deviation ${\sigma}_{E}=0.2$. For the inhibitory population, we used a mean ${\mu}_{I}=0.3$ and a standard deviation ${\sigma}_{I}=0.03$. For the networks in Figures 3–6, we have used regularly spaced decoding weights, with the addition of some random noise. The parameter values that we used are plotted in Figure 3D, Figure 4A–C, left column, and Figure 5A,E,I.

All of our quadratic programming and optimal compensation predictions for tuning curve shapes still hold for alternative choices of readout weights, once the following properties are satisfied. First, it is important that the readout vectors span the space of the signal that we want our network to represent. Otherwise, the system will not be able to represent signals along certain directions, or compensate for neuron loss. Second, it is important to set the scale of the readout, so that the cost does not dominate. There is a natural scaling that we can use to avoid such problems. We require the size of firing rates and the readout to be independent of network size. From ${\hat{x}}_{j}=\sum _{k}^{N}{D}_{jk}{r}_{k}\sim \mathcal{\mathcal{O}}(1)$, it follows that ${D}_{jk}\sim \mathcal{O}\left(1/N\right)$. As a consequence, the off-diagonal elements of the recurrent connectivity are small ${\mathrm{\Omega}}_{ik}\sim \mathcal{O}\left(1/{N}^{2}\right)$, for $i\ne k$ (Equation 18), and if we assume that the diagonal elements are on the same order of magnitude, then $\beta \sim \mathcal{O}\left(1/{N}^{2}\right)$. This scaling provides a principled basis for parameter choices in our model.

We may also want to scale our decoder weights without changing the shape of our tuning curve prediction (Figure 4—figure supplement 3B). To do this, the cost parameter $\beta $ and the membrane potential leak $\lambda $ must also be scaled together. Specifically, if the readout weights are given by $\left\{\alpha \times {D}_{jk}\right\}$, where $\alpha $ is a scaling parameter that characterizes the size of the decoder weights and $\left\{{D}_{jk}\right\}$ are fixed decoder weights, then the spiking cost parameter must be set to ${\alpha}^{2}\times \beta $ and the membrane potential leak must be set to $\alpha \times \lambda $. We can see that this preserves the shape of tuning curves by looking at the resulting structure of our loss function (Equation 2):$\begin{array}{ll}E(\mathbf{r})& =(\mathbf{x}-(\alpha \mathbf{D})\cdot \mathbf{r}{)}^{2}+{\alpha}^{2}\beta \sum _{i=1}^{N}{{r}_{i}}^{2}\phantom{\rule{thinmathspace}{0ex}}.\end{array}$(51)

As before, the minimum of this loss function gives firing rates in units of the inverse membrane potential leak $\left(\alpha \lambda \right)$. Therefore, we must divide $\mathbf{\mathbf{r}}$ by $\left(\alpha \lambda \right)$ to obtain firing rates in units of Hz. Our loss function then becomes:$\begin{array}{}\text{(52)}& {\displaystyle E(\mathbf{r}/(\alpha \lambda ))}& {\displaystyle =(\mathbf{x}-(\alpha \mathbf{D})\cdot \mathbf{r}/(\alpha \lambda ){)}^{2}+{\alpha}^{2}\beta \sum _{i}{\left({r}_{i}/(\alpha \lambda )\right)}^{2}}& \text{(53)}& & {\displaystyle =(\mathbf{x}-\mathbf{D}\cdot \mathbf{r}/\lambda {)}^{2}+\beta \sum _{i}{\left({r}_{i}/\lambda \right)}^{2}\phantom{\rule{thinmathspace}{0ex}}.}& \end{array}$

This loss function is independent of $\alpha $, and so, using this scaling the optimal tuning curves will have the same shape for all values of $\alpha $.

#### Readout weights and cost terms: V1 example

There are many possible choices of decoder weights $\left\{{\mathbf{\mathbf{D}}}_{i}\right\}$ that provide a faithful representation of a signal. In positive sparse coding, we choose the decoder weights that provide the most efficient signal representation, for a sparse cost term ($C\left(\mathbf{\mathbf{r}}\right)={\sum}_{k}{r}_{k}$), under the constraint that firing rates must be positive. Here, we describe how we calculate these positive sparse coding weights, which we will use in several of our figures (Figures 7–9 and Figure 9—figure supplement 1–2).

We use the signal vector $\mathbf{\mathbf{x}}=({x}_{1},\mathrm{\dots},{x}_{j},\mathrm{\dots},{x}_{M})$ to denote an image patch, where each element ${x}_{j}$ represents a pixel from the image patch (Olshausen and Field, 1996). We quantify the efficiency of a sparse representation using the following loss function:$E=\u27e8(\mathbf{x}-\hat{\mathbf{x}}{)}^{2}\u27e9+\beta \u27e8\sum _{i}{r}_{i}\u27e9,$(54)

where $\u27e8...\u27e9$ denotes an average across image patches. This is simply Equation 2 with a sparse cost term, averaged over image patches. The first term in this loss function is the image representation error. The second term quantifies the sparsity of the representation. The decoding filters that minimize this loss function will be the positive sparse coding filters for natural images.

We assume that the decoding filters are optimized to represent natural images, such as forest scenes, flowers, sky, water and other images from nature. Natural images are chosen because they are representative of the images that surround animals throughout evolution. We randomly select 2000 image patches of size $12\times 12$ from eight natural images taken from Van Hateren’s Natural Image Database (van Hateren and van der Schaaf, 1998). These images are preprocessed by removing low-order statistics, so that our sparse coding algorithm can more easily learn the higher-order statistical structure of natural images. First of all, images are centered so as to remove first order statistics:$\mathbf{x}\to \mathbf{x}-\u27e8\mathbf{x}\u27e9.$(55)

Next, images are whitened, so as to remove second-order statistics:$\mathbf{\mathbf{x}}\to {\mathbf{\mathbf{M}}}^{-1/2}\mathbf{\mathbf{x}},$(56)

where $\mathbf{M}=\u27e8\mathbf{x}{\mathbf{x}}^{T}\u27e9$ is a decorrelating matrix.

We calculate sparse coding filters by minimizing the loss function (Equation 54) using a two step procedure. First, for each image patch in a subset of 50 image patches, we calculate the firing rates that minimize the loss function under the constraint that firing rates must be positive:$\mathbf{r}=\mathrm{arg}\underset{{r}_{i}\ge 0\phantom{\rule{thickmathspace}{0ex}}\mathrm{\forall}i}{min}E\phantom{\rule{thinmathspace}{0ex}}.$(57)

The positivity constraint reduces the representational power of the neural population, so, to counteract this, we use a large population containing twice as many neurons as signal dimensions, $N=2M$ and we initialize our decoder matrix to ${D}_{jk}={\delta}_{j,k}-{\delta}_{j,k+M}$ to ensure that our neural population can easily span the signal space throughout the sparse coding calculation.

Next, we update our decoding vectors by stochastic gradient descent on the loss function, using the optimal firing rates calculated in the previous step:${\mathbf{D}}_{i}\to {\mathbf{D}}_{i}-\u03f5\frac{dE}{d{\mathbf{D}}_{i}}={\mathbf{D}}_{i}+\u03f52\u27e8(\mathbf{x}-\hat{\mathbf{x}}){r}_{i}\u27e9\phantom{\rule{thinmathspace}{0ex}}.$(58)

Decoding weights are normalized for each neuron, so that they do not become arbitrarily large. This calculation is similar to sparse coding calculations performed before (Olshausen and Field, 1996), except that we enforce the additional requirement that firing rates must be positive. This constraint was used before for sparse non-negative matrix factorization (Hoyer, 2003, 2004). However, this method requires an additional constraint that decoding weights be strictly positive, so that the non-negative matrix factorization method can be applied. This additional constraint will reduce the efficiency of the representation, because it precludes solutions that may be more efficient, but violate the additional constraint.

To compare the properties of optimal compensation in this positive sparse coding model to experiment, we calculate the direction tuning of neurons in our model using a protocol similar to that used in the experimental work (Crook and Eysel, 1992). Specifically, we drive our model using a directed, edge-like stimulus. These stimuli are Gabor filters, with a profile similar to the sparse coding filters. We calculate the firing rate response of each neuron using Equation 57 for 16 different stimulus directions. For each direction, the Gabor filter is positioned at regularly spaced locations along a line perpendicular to the Gabor filter orientation and the maximum firing rate of each neuron along this line is taken as the firing rate response of that neuron at that direction. These firing rate responses form the direction tuning curves that we can compare to experimental recordings.

#### The impact of neuron loss

We investigate the impact of neuron loss in our spiking model by simulating the network both before and after neuron loss (Figure 2 and Figure 3). Specifically, we use the Euler method to iterate the dynamics described by Equation 17 or Equation 23, Equation 24, and we measure the readout $\hat{\mathbf{x}}$, the spike trains $\mathbf{\mathbf{s}}$, and the membrane potentials $\mathbf{\mathbf{V}}$. We simulate the loss of a neuron by setting all the connections to and from that neuron to zero. This is equivalent to silencing a neuron. We continue to measure the network activity and the readout after neuron loss.

We also calculate the impact of neuron loss on tuning curves with and without optimal compensation (Figure 5). To calculate tuning curves without compensation we solve Equation 3 in the intact state. We then constrain the firing rates of neurons selected for loss to zero and calculate the impact of this on our signal representation. To calculate the impact of neuron loss with optimal compensation, we solve Equation 4.

The integrate-and-fire model that we have described can produce spikes at arbitrarily large firing rates. This can be a problem, especially when neurons die and the remaining neurons compensate with extremely high, unrealistic firing rates. To avoid this, we can include a form of adaptation in our spiking model. Specifically, we can extend our spiking rule so that a neuron $i$ will only spike if its firing rate ${f}_{i}$ is lower than a maximum ${f}_{max}$. Here, the firing rate $\mathbf{\mathbf{f}}$ is given by the following differential equation:$\frac{d\mathbf{\mathbf{f}}}{dt}=-\mathbf{\mathbf{f}}/{\tau}_{A}+\mathbf{\mathbf{s}},$(59)

and ${\tau}_{A}$ is the time scale of adaptation. This time scale is much slower than the time scale of the spiking dynamics $\tau $. We use this extended neural model to calculate the recovery boundary in Figure 3J,K, where we kill neurons in random order.

In turn, we can compute the firing rates and recovery boundary of this extended integrate-and-fire model using quadratic programming (data not shown). Specifically, the firing rates of this network are given by the solution of the following optimization problem:$\mathbf{\mathbf{r}}\left(\mathbf{\mathbf{x}}\right)=\mathrm{arg}\underset{R}{\mathrm{min}}E\left(\mathbf{\mathbf{x}}\right),$(60)

where $R$ denotes the set of constraints that the firing rates must satisfy:$R=\{{r}_{max}\ge {r}_{i}\ge 0\forall i;{r}_{j}=0\forall j\in X\}$(61)

and where $X$ is the set of all dead (or silenced) neurons.

To study the effects of redundancy (or overcompleteness) in the V1 model (Figure 9—figure supplement 2A), we define an over-completeness factor, $K$, given by $K=N/M$, where $N$ is the number of neurons in the representation and $M$ is the signal dimension. In the sparse coding calculation above, we had $K=2$. To increase the redundancy or overcompleteness of the representation, we generated a new set of readout vectors by two methods. In method one, we randomly drew $N=KM$ decoding vectors ${\mathbf{\mathbf{D}}}_{i}^{\left(K\right)}$ from the V1 model and then added multiplicative noise, in order to avoid parallel readout weights. Specifically, the new readout vectors ${\mathbf{\mathbf{D}}}_{i}^{\left(K\right)}$ were obtained from the old readout vectors, ${\mathbf{\mathbf{D}}}_{i}^{\left(2\right)}$, using ${\mathbf{\mathbf{D}}}_{i+a}^{\left(K\right)}={\mathbf{\mathbf{D}}}_{i}^{\left(2\right)}\cdot \left(1+{\mathit{\bm{\xi}}}_{i+a}\right)/\left(K/2\right)$, where $i\in [1,\mathrm{\dots},2M]$, ${\xi}_{ij}\sim \mathcal{N}(0,0.1)$, $a=2Mm$ and $m\in [0,\mathrm{\dots},K/2-1]$. In method two, we also randomly drew decoding vectors from the V1 model, but then shifted the corresponding $12\times 12$ image patch randomly in space. Both methods yielded similar results. We used these methods to calculate readout weights for over-complete representations because the sparse coding calculation becomes computationally prohibitive for large values of $M$.

#### Tight balance of excitation and inhibition

In simulations of our spiking model, we see that optimal coding generally coincides with a balance of excitatory and inhibitory currents into each neuron (Figure 2G and Figure 3H,I). To characterize this balance, we define three currents for each excitatory neuron. First, we define the excitation ${E}_{i}$ as the total input current received through excitatory synapses. Second, we define the inhibition ${I}_{i}$ as the total input current received through inhibitory synapses. Third, we define the reset current $\overline{R}}_{i$ as the total current caused by a neuron’s self-reset after an action potential. Since our idealized neuron model uses delta-functions to capture the synaptic currents evoked by single spikes, we smooth the delta-functions in order to obtain more realistic synaptic currents. For simplicity, we use the same filter as in the computation of the instantaneous firing rates, although other filters could be used as well. In the EI network, Equation 23 and Equation 24, the three currents into the $i$-th excitatory neuron are then given by$\begin{array}{}\text{(62)}& {\displaystyle {E}_{i}}& {\displaystyle =\sum _{k=1}^{N}{\mathrm{\Omega}}_{ik}^{EE}{r}_{k}^{E}+\sum _{j=1}^{M}{F}_{ij}^{E}{x}_{j}\phantom{\rule{thinmathspace}{0ex}},}& \text{(63)}& {\displaystyle {I}_{i}}& {\displaystyle =\sum _{k=1}^{N}{\mathrm{\Omega}}_{ik}^{EI}{r}_{k}^{I}+\sum _{j=1}^{M}{F}_{ij}^{I}{x}_{j}\phantom{\rule{thinmathspace}{0ex}},}& \text{(64)}& {\displaystyle {\overline{R}}_{i}}& {\displaystyle ={R}_{i}^{E}{r}_{i}^{E}\phantom{\rule{thinmathspace}{0ex}},}& \end{array}$

where ${F}_{ij}^{E}\equiv {F}_{ij}$ if ${F}_{ij}{x}_{j}>0$ (and ${F}_{ij}^{E}\equiv 0$ otherwise), and ${F}_{ij}^{I}\equiv {F}_{ij}$ if ${F}_{ij}{x}_{j}<0$ (and ${F}_{ij}^{I}\equiv 0$ otherwise). When the inhibitory subpopulation is not explicitly simulated, we simply set ${\mathrm{\Omega}}_{ik}^{EE}{r}_{k}^{E}\equiv {\mathrm{\Omega}}_{ik}{r}_{k}$ unless ${\mathrm{\Omega}}_{ik}<0$ and ${\mathrm{\Omega}}_{ik}^{EI}{r}_{k}^{I}\equiv {\mathrm{\Omega}}_{ik}{r}_{k}$ unless ${\mathrm{\Omega}}_{ik}>0$.

Using the above definitions of currents, we can integrate Equation 23 to re-express the voltage of an excitatory neuron as $V}_{i}^{E}={E}_{i}-{I}_{i}-{\overline{R}}_{i$. This voltage is trivially bounded from above by a neuron’s threshold, ${V}_{i}^{E}<{T}_{i}$. Due to the design properties of the network, the voltage is also bounded from below: since the voltage reflects a representation error, neurons with decoding vectors opposite to those of neuron $i$ will keep negative errors in check. If we neglect cost terms, we obtain ${V}_{i}^{E}>{R}_{i}^{E}$. (Otherwise the bound is lower). Re-expressed for the EI ratio, we obtain the following bounds$1+\frac{{\overline{R}}_{i}-{R}_{i}^{E}}{{I}_{i}}\text{}\text{}\frac{{E}_{i}}{{I}_{i}}\text{}\text{}1+\frac{{\overline{R}}_{i}+{T}_{i}}{{I}_{i}}\phantom{\rule{thinmathspace}{0ex}}.$(65)

The EI ratio will therefore approach one (perfect balance) if the inhibitory synaptic currents are much larger than the self-reset currents and voltage thresholds, or, if excitatory currents are cancelled by inhibitory currents rather than by a neuron’s spikes and subsequent self-resets.

The network does not reach this balanced regime under two conditions. First, if a network has a small redundancy, i.e., the number of neurons, $N$, is not much larger than the number of input signals, $M$, then each neuron carries a large load of the representation, and the reset currents become similar in magnitude to the synaptic currents. This is the case for the cricket cercal system, for instance, for which $N=4$ and $M=2$. Second, if a system approaches the recovery boundary, the representational burden of some neurons also increases a lot, leading to a similar effect. This is the case for the green neuron in Figure 3I, even after the first k.o, and leads to a more regular firing pattern. In other words, neurons that compensate for neuron loss can sometimes experience excessive excitatory currents, even if the compensation is successful.

To capture the mechanistic difference between compensation and recovery boundary more clearly, we therefore also defined the ratio of positive to negative currents, ${E}_{i}/({I}_{i}+{\overline{R}}_{i})$ (Figure 3H). This ratio is bounded by$1-\frac{{R}_{i}^{E}}{{I}_{i}+{\overline{R}}_{i}}\phantom{\rule{thinmathspace}{0ex}}<\phantom{\rule{thinmathspace}{0ex}}\frac{{E}_{i}}{{I}_{i}+{\overline{R}}_{i}}\phantom{\rule{thinmathspace}{0ex}}<\phantom{\rule{thinmathspace}{0ex}}1+\frac{{T}_{i}}{{I}_{i}+{\overline{R}}_{i}}\phantom{\rule{thinmathspace}{0ex}}.$(66)

and remains close to one as long as there are sufficiently large negative currents, or, if excitatory currents are cancelled by either recurrent inhibition or by spiking. As long as the network is fully functional, the latter will always be the case. Once the recovery boundary is hit, however, the lower bound on the voltages of some neurons disappears, and so will the lower bound on the above ratio of positive to negative currents (as well as the EI ratio), which is the effect we observe in Figure 3HI.

Traditionally, balance is often understood in the limit of very large networks (van Vreeswijk and Sompolinsky, 1996, 1998), and these classical scaling arguments can also be applied to our networks (Boerlin et al., 2013). Neglecting Dale’s law once more for simplicity, we can increase the network size, while keeping the relative size and number of the input signals and the neuron’s firing rates constant. Since ${\hat{x}}_{j}=\sum _{k}{D}_{jk}{r}_{k}\sim \mathcal{\mathcal{O}}(1)$ and ${r}_{k}\sim \mathcal{O}\left(1\right)$, we find that ${D}_{jk}\sim \mathcal{O}\left(1/N\right)$ and $\beta \sim \mathcal{O}\left(1/{N}^{2}\right)$. As a consequence of Equation 18 and Equation 13, we observe that both the recurrent connection strengths and the spiking thresholds are small, so that ${\mathrm{\Omega}}_{ik}\sim \mathcal{O}\left(1/{N}^{2}\right)$ and ${T}_{i}\sim \mathcal{O}\left(1/{N}^{2}\right)$. Now, because recurrent input is summed across the entire population of $N$ neurons, we find that ${V}_{i}\sim \mathcal{O}\left(N/{N}^{2}\right)\sim \mathcal{O}\left(1/N\right)$, and similar observations hold for ${I}_{i}$, ${E}_{i}$, and $\overline{R}}_{i$. Hence, these currents are $N$ times larger than the spiking threshold, ${T}_{i}$, or the reset, ${R}_{i}$. The EI ratio in Equation 65 then approaches one, and excitatory and inhibitory currents cancel precisely. We note that this balance of excitation and inhibition is much tighter than in balanced networks with random connectivity (van Vreeswijk and Sompolinsky, 1996, 1998). In randomly connected networks the balance between excitation and inhibition is of order $\mathcal{O}\left(1/\sqrt{N}\right)$ (van Vreeswijk and Sompolinsky, 1998), whereas here these fluctuations are $\mathcal{O}\left(1/N\right)$. We note that these observations only hold when the number of input signals, $M$, is kept constant; if that number grows with $N$, then the scaling is different. The conceptual differences between loose and tight balance were recently highlighted in a review (Denève and Machens, 2016).

#### Optimal compensation proof

The spiking network that we use is capable of rapidly implementing optimal compensation, without requiring any synaptic plasticity mechanisms. We can prove this by showing that an optimal network of $N-1$ neurons is equivalent to an optimal network of $N$ neurons after the loss of one neuron. For the EI network, the argument applies separately to the excitatory and inhibitory subnetworks.

For the sake of argument, we suppose that the $N}^{th$ neuron dies. At a mechanistic level, the loss of this neuron is equivalent to cutting all the connections to and from the dead neuron and from the dead neuron to the readout $\hat{\mathbf{x}}$. Therefore, a network where the ${N}^{th}$ neuron has died is equivalent to a network with $N-1$ neurons with readout matrix ${D}_{ij}^{*}={D}_{ij}\forall j\in [1,N-1]$ and $\forall i\in [1,M]$, with feedforward connectivity ${F}_{ij}^{*}={F}_{ij}\forall i\in [1,N-1]$ and $\forall j\in [1,M]$, with recurrent connectivity ${\mathrm{\Omega}}_{ik}^{*}={\mathrm{\Omega}}_{ik}\forall i,k\in [1,N-1]$ and with spiking thresholds ${T}_{i}^{*}=-{\mathrm{\Omega}}_{ii}/2\forall i\in [1,N-1]$.

Now, we compare this damaged network to an optimal network consisting of $N-1$ neurons. To make a fair comparison, we assume that this network has the same readout matrix ${\mathbf{\mathbf{D}}}^{\prime}\equiv {\mathbf{\mathbf{D}}}^{*}$ as the reduced damaged network. Then, the recurrent connectivity for this network is given by ${\mathbf{\mathbf{\Omega}}}^{\prime}={\mathbf{\mathbf{\Omega}}}^{*}$, the feedforward connectivity is given by ${\mathbf{\mathbf{F}}}^{\prime}\equiv {\mathbf{\mathbf{F}}}^{*}$ and the spiking thresholds are given by ${\mathbf{\mathbf{T}}}^{\prime}\equiv {\mathbf{\mathbf{T}}}^{*}$. This configuration is equivalent to the reduced damaged network. Therefore, a spiking neural network whose neurons are individually tuned to represent a signal optimally before cell loss will perform optimal compensation and provide an optimal signal representation after cell loss.

## References

- Sensory Communication217-234, 1961
Possible principles underlying the transformation of sensory messages - Firing rate predictions in optimal balanced networks1538-1546, 2013
*Advances in Neural Information Processing Systems 26* - Learning optimal spike-based representations2294-2302, 2012
*Advances in Neural Information Processing 25* - An introduction to epilepsyAmerican Epilepsy Society, 2006
- Theoretical Neuroscience: Computational and Mathematical Modeling of Neural SystemsMIT Press, Cambridge, Massachusetts, 2001
- The development and application of optogeneticsAnnual Review of Neuroscience, 34, 389-412, 2011
- Sparse coding of birdsong and receptive field structure in songbirdsNetwork: Computation in Neural Systems, 20, 162-177, 2009
- Non-negative matrix factorization with sparseness constraintsThe Journal of Machine Learning Research, 5, 1457-1469, 2004
- Spikes - Exploring the Neural CodeMIT Press, Cambridge, Massachusetts, 1997
- Error correction, sensory prediction, and adaptation in motor controlAnnual Review of Neuroscience, 33, 89-108, 2010
- Independent component filters of natural images compared with simple cells in primary visual cortexProceedings of the Royal Society B: Biological Sciences, 265, 359-366, 1998

## Acknowledgements

We thank Tony Movshon, Pedro Gonçalves, and Wieland Brendel for stimulating discussions and Alfonso Renart, Claudia Feierstein, Joe Paton, and Michael Orger for helpful comments on the manuscript. S.D. acknowledges the James McDonnell Foundation Award and EU grants BACS FP6-IST-027140, BIND MECT-CT-20095–024831, and ERC FP7-PREDSPIKE. C.K.M. acknowledges an Emmy-Noether grant of the Deutsche Forschungsgemeinschaft and a Chaire d’excellence of the Agence National de la Recherche.

## Decision letter

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your work entitled "Optimal compensation for neuron death" for consideration by *eLife*. Your article has been favorably evaluated by Eve Marder (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

All of the reviewers felt that the submitted theoretical work on compensating for cell death was novel and interesting. However, it was also felt that many specifics were unclear or buried in other places, making biological interpretation difficult (e.g., excitatory, inhibitory cell types). The reviewers encourage the authors to consider the five following points, and at least the first four should be addressed in a revised submission. Additional points raised by the reviewers are detailed below, and should also be addressed.

1) State clearly that the network is non-Dalian and explain the reasoning for this choice.

2) State clearly what the recovery boundaries are, and plot figures that show which variables affect the recovery boundary for a given network with a given task. In general, spend a bit more time on the general mechanism before going into the additional models, or alternatively, explain in detail the additional models, like e.g. Figure 7, which is devoid of explanatory power.

3) An example of a Dalian network, independently of whether it works (same boundary?) or not (why does it not work?) should be added.

4) Show the compensatory mechanisms at play more carefully in figures, i.e. show how balance is changing and finally breaking at the recovery boundary.

5) Consider building a reservoir network model as in Sussillo & Abbott trained to perform a task. Delete neurons, and check whether it is able to self-balance. If it doesn't, address whether this is related with the complexity of the task.

We are including the original reviews as they provide context for the revision points shown above.

*Reviewer #1:*

In this clearly written model theory paper, the authors propose that neural circuits compensate for neuronal death immediately, and show this using firing rate network models and several experiments. Although overall I found the work compelling, in general, I felt that the authors need to better describe 'biological correlates' for their models and to 'fill in the blanks' in various places to help the reader relate and better appreciate the work.

Specifically:

1) 'quadratic programming algorithm' and 'loss function' – could the authors provide suggestions of biological correlates of this. Presumably it would relate to excitatory-inhibitory balances in some way?

2) While spiking rate models are used, differences between excitatory and inhibitory cells are not apparent to me. Given this, then the suggestion in the Discussion that 'these predictions could be tested using neuronal ablations or optogenetic silencing' would seem to suggest that cell type (at least as far as excitatory or inhibitory) does not matter, but this is clearly not true (one of many examples would be epilepsy work by Soltesz group – Krook-Magnuson et al. 2013 Nature Communications). For the reader to better appreciate what is intended by the various discussion suggestions, the authors should 'fill in the blanks' between their theoretical model and experimental suggestions being made.

In the Methods it is stated that "To be biologically more plausible, however, the derivative could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition)."

So, could the authors provide description/interpretation of excitatory/inhibitory cells/circuits in terms of their results/predictions? (i.e., biological plausibility).

3) The authors use only positive firing rates (unlike other theoretical studies with positive and negative etc.), and this is a critical difference in what they achieve. While it is clear that negative firing rates may not make sense, the biological correlate of the authors' model is not clear – see 2.

*Reviewer #2:*

Barret et al. investigate a question rarely addressed in theoretical neuroscience: how do neural circuits compensate the death of neurons? They provide an elegant model showing that neurons in the balanced networks can dynamically adjust their rates so as to minimize the impact neuron death. They show how to theoretically compute the optimal tuning curves of neurons in a network encoding a multi-dimensional variable x as a quadratic minimization problem with only constraint that firing rates must be positive. They continue to show that the method presented in their previous work (Boerling et al. 2013) in which they build an integrate-and-fire network optimized to minimize a cost function quantifying the estimation error of an input signal can in fact implement these optimal tuning curves almost exactly. They investigate how the tuning curves change when a fraction of the neurons are silenced and validate some of the predictions derived from the model with three data sets: the goldfish oculomotor system, the cercal system of crickets/cockroaches, V1 orientation tuned neurons. The paper nicely connects two distinct approaches in theoretical neuroscience, network dynamics and efficient coding, and obtains some very general results showing how constraints such as neuronal silencing or more simply, the rectification of the rate, affect the efficient encoding of external variables in a network of spiking neurons. The methodology and the findings presented can potentially have a substantial impact in our understanding of the properties of balanced networks and population encoding. There are however some aspects that the authors need to address in order to present the limitations of the model and the comparison with the data more clearly (see Comments below).

Comments:

Dale's law and the explanation of the EI balance breakdown. In Figure 2, the authors illustrate how when a fraction of the neurons below the recovery boundary is inactivated, the network dynamically compensates and keeps encoding x(t): the remaining neurons change their firing rate and the EI balance is maintained. Then they show that when all neurons with positive readout weight are removed, the balance can no longer be maintained because "there are no longer enough inhibitory cells to balance excitation (or enough excitatory cells to balance inhibition)". This is a very misleading statement given that in this network there are no excitatory or inhibitory neurons (i.e. does not respect Dale's principle). I was very confused when I read this example thinking that neurons in the Figure should have been organized according to whether they are E or I, and not according to the sign of their readout weight. I had to go to the Methods section to later find out that there are no E and I cells in this network. This is in my opinion a main drawback given that (1) Dale's law seems to be true in cortical circuits and (2) the authors are presenting their work as an extension of classical work describing EI networks in which E and I populations are distinct. In their previous manuscript (Boerling et al. 2013) they address this question and propose potential solutions. All of it should be explicitly mentioned and discussed. Moreover, when describing Figure 2 it should be explained why cells with positive readout weights (w+) can naively be viewed as excitatory cells when they provide both excitation and inhibition? Without this most readers will be misled to interpret that w+ neurons are Excitatory and w- neurons are inhibitory.

Biased cost. The authors use a biased quadratic cost function containing a second term which tries to maintain a particular read-out of the rates (coefs c_{k}) equal to a constant that they call the background rate. The authors describe in much detailed the implication of this bias in the intercept of the tuning curves (Figure 3C and Figure 3—figure supplement 2B-C). The motivation for this choice is however very succinct. Is there any benefit on having a non-zero intercept regarding the estimation error? I ask this because it is a fairly common observation that cortical circuits seem to maintain the overall average population firing rate constant when dynamically encoding different variables (see e.g. Figure 2B Fujisawa et al., Nat Neuroscience 11 (7), 2008). This observation seems to be obtained to the network in which there is a non-zero r_{B} and non-zero uniform c_{k}'s which seems interesting. However, the implications of this additional bias regarding the encoding ability of the circuit (estimation error of x(t)), its plausibility regarding the available data, etc. and not discussed.

The prediction on the changes in tuning for V1 neurons is only shown in three units (Figure 8B). In Figure 8—figure supplement 1, the full population of neurons is examined but the changes are only reported in response to the "preferred stimulus orientations". Thus, evaluating the prediction that there should be an increase of firing rate on the non-silenced neurons in response to the "preferred orientation of the silenced cells" is not possible. Either a different population analysis is performed or the statement about the outcome of the prediction at the population level (Results, last paragraph) should be modified.

Results on the comparison with data should be brought forward. Since the paper is quite long as it is now, and the most novel and most interesting part is the comparison of the model predictions with data from three systems (Figures 5–8), I suggest moving the mechanistic part of how to implement the optimal solution using a spiking network (e.g. Figures 1–2) to an Appendix box or some sort of "side note figure". This mechanistic part is an important aspect but, once it has been shown that it works, which has somehow already been covered in Boerling et al. 2013, it is no longer required to make the comparison with data (Figures 5–8, which are all derived by numerically solving Eqs. 4-5). Doing so, the reader will be able to quickly reach the most novel aspects without the need to go over the network implementation details.

In the Discussion, the authors state that the strongest prediction of the model is that "neurons with similar tuning to the dead neurons increase their firing rates and neurons with dissimilar tuning decrease their firing rates (unless they are already silent)." The second part of this statement seems at odds to what is shown in Figures 6I, Figure 8A (top) and Figure 8C where dissimilar neurons either don't change much (e.g. Df is always >=0 in Figure 8C) or tend to increase their rate in response to the preferred orientation of the dead cells (Figure 8A top). This rate decrease in dissimilar neurons seems to occur in the network shown in Figure 4B with monotonically increasing tuning curves. The details about the changes predicted and the generality of these changes in different systems (bell-shaped vs. monotonic tuning curves) need to be clarified.

At the end of the Methods, the authors provide an explanation of the Tight balance between excitation and inhibition and a scaling argument that concludes that in the network consider here, the cancellation between the mean excitatory and inhibitory drives occurs with precision 1/N. They compare this cancellation with the classical balanced network presented in van Vreeswijk and Sompolinsky (1998) where the cancellation occurs up to order 1/sqrt(N). This comparison is interesting but in order to make it the authors should provide some more information about the behavior of the network in the large N limit: if x is constant and independent of N, do the firing rates converge to a constant value as N increases? This seems hard to visualize as, even when counting on the cancellation of the mean E and I inputs, the magnitude of the total input current fluctuations σ will grow with respect to the threshold (σ~1/N^2/3 whereas threshold is 1/N^2). This is a reflection of the fact that the rate of incoming spikes grows with N whereas the size of each PSP is always of the order of the threshold (i.e. 1/N^2). I don't see how this network can asymptotically converge to non-zero rates in this large N limit.

A final comment: the authors present as an advantage the speed of the compensation mechanism which they say is "faster than the timescale of neural spiking". To me this comment remarks the mismatch of time-scales between the two processes: (1) neural death, an infrequent event, relative to the time scale of neurons, and (2) balanced dynamics, almost instantaneous. More than a bonus I see it as an indication that there are probably other slower processes at work when compensating for such an irreversible damage that happens in such long time-scales. Presented as a prediction for manipulation experiments, the compensation can be viewed as the signature mechanism of balanced dynamics, as done here in the comparison with the experimental data. Additionally, there might be circumstances in which groups of neurons become transiently and frequently inactive (e.g. local DOWN state). The fast compensation described here would seem more tuned to address those fast inactivation events instead of neural death.

*Reviewer #3:*

The manuscript at hand describes a phenomenon that is a routine occurrence for any living nervous system, namely the loss of some of its neurons through cell death. Even in the most severe cases of diseases like presumably Alzheimer's, such neuronal losses often go unnoticed for prolonged periods of time, presumably because they are A) tolerable in a system that relies on massively redundant population code, or B) the system can quickly compensate for the loss of neurons as they fall silent. Due to the lack of cortical plausible, mechanistically implementable computations, the modelling community has not approached this interesting research topic with much vigour.

Here, Barret et al. pick up from the well-described recurrently balanced neural network literature and study how balanced networks of various sizes compensate for neuron loss. They set out with two simple assumptions, i.e. that (1) outside stimuli can be described by combinations of quasi-steady-state firing rates of all neurons in the network and that (2) these firing rates are bounded by a cost function that prevents combinatorial codes in which the majority of neurons are silent in favour of a few active ones. With these two simple ingredients the authors show that the total number of spikes remains constant in the face of dramatic cell losses, as long as a tight balance of excitation and inhibition in the remaining neurons can be maintained. In fact, it is this balance that, when disturbed by cell loss, increases or decreases the firing rates in the remaining neurons to maintain faithful cumulative signal representation. The authors go on to show that this simple feature of balanced networks can account for compensatory behaviour in more complicated models such as Ben Yishai and Sompolinsky's (1995) bump attractor model (that is unfortunately not cited) and even more laborious models of sparse coding in V1. The results evoke vacillating responses in this reviewer of alleging outrageous triviality (anyone modelling balanced systems intuitively knows that this compensation happens) and respecting the originality of the thought process behind the sequence of presented results. After some deliberation, I agree with the authors fully in that optimal compensation by means of shifting, detailed balance has not been proposed before and is a worthwhile idea to entertain.

To show the link of shifting balanced dynamics in response to cell death, I would like to ask the authors to include better explanations and figures panels for all models that exhaustively show the mechanistic origins of the phenomenon they showcase, and not just its effects, i.e. the final tuning curves before and after. Additionally, I would wish for a better graphical way to discern between the two cases of cell loss on either side of the recovery boundary. In my opinion, Figures 3, 4, 5 could be integrated into Figure 2 or outsourced to supplementary materials and give way for the interesting cases of Figure 6–8 without much loss of detail. Lastly, I would like to ask the authors to discuss clearly what experimental results could discriminate the initially mentioned scenarios A and B (redundancy and compensation, respectively), and what they would expect in the case of temporally varying neuronal representations such as the en vogue neuronal dynamics of Churchland and Shennoy (2012).

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Optimal compensation for neuron loss" for further consideration at *eLife*. Your revised article has been favorably evaluated by Eve Marder (Senior Editor) and two reviewers, one of whom is a member of our Board of Reviewing Editors.

The authors have addressed four of the five suggestions we made in the first round of reviews, and the paper has improved. However, the fundamental logic of the paper is still not quite clear and needs to be made transparent. Specifically, the Methods section is hard to follow, and there seem to be some discrepancies in updating (e.g., it refers to a Figure 2I that does not exist, and Figure 2E is not referred to in the text). The derivation and translation into Dalian spiking networks is far from clear. Further, please note the following four points:

1) They create what they call "tightly balanced" networks, i.e. networks with very fast, precise / detailed inhibition by means of mathematical derivation and go on to say that the networks they built are much more balanced than vanVreesvijk models. We cannot assess this, because they don't show any reference simulations, or measures of balance. It is surprising that their technique works so well, as there are publications that claim the opposite, i.e. not easy to create such balance in networks.

2) What's more, they present "tight balance" as if it exists in a vacuum, ("Also, it has been shown that this connectivity can be learned using simple spike timing-dependent plasticity rules (W. Brendel, R. Bourdoukan, P. Vertechi, et al., unpublished observations; Bourdoukan et al. (2012)), so extensive fine-tuning is not required to obtain these spiking networks.") but there are at least four other published methods of creating "tight" / detailed / precise" balance in recurrent networks: Haas et al., JNeurophys. 2006, Vogels et al., Science, 2011, Luz & Shamir, PlosCB, 2012 and Hennequin et al., Neuron, 2015. It's pretty powerful that the authors can just derive their architecture as they should be. To make this more interesting (and clear), the authors need to discuss and compare their result to these other studies. What are the crucial differences of "tight balance" to these other balances?

3) Another issue is the seemingly carefully calibrated "spike reset threshold", that makes sure that the inhibitory activity is equal to x^? How does that work, and why is this a plausible assumption, or is this just a trick to make sure the inhibitory input is equal to the desired output? Please explain.

4) Finally, they write "To be biologically more plausible, however, the derivative could be computed through a simple circuit that combines direct excitatory signal inputs with delayed inhibitory signal inputs (e.g. through feedforward inhibition)." It is not clear that FF inhibition would do this, because it crucially misses the translation of the dynamics to be ==x^. In previous work, it has been shown that deleting cells out of a FF network has immediate effects on the dynamics of the read out (Vogels, NN09, Figure 4). This may be a different scenario, but it is far from clear in the Methods, and if this constraint is crucial, it needs to be carefully discussed to avoid the impression that the paper at hand is inherently circular.