Are singlepeaked tuning curves tuned for speed rather than accuracy?
Abstract
According to the efficient coding hypothesis, sensory neurons are adapted to provide maximal information about the environment, given some biophysical constraints. In early visual areas, stimulusinduced modulations of neural activity (or tunings) are predominantly singlepeaked. However, periodic tuning, as exhibited by grid cells, has been linked to a significant increase in decoding performance. Does this imply that the tuning curves in early visual areas are suboptimal? We argue that the time scale at which neurons encode information is imperative to understand the advantages of singlepeaked and periodic tuning curves, respectively. Here, we show that the possibility of catastrophic (large) errors creates a tradeoff between decoding time and decoding ability. We investigate how decoding time and stimulus dimensionality affect the optimal shape of tuning curves for removing catastrophic errors. In particular, we focus on the spatial periods of the tuning curves for a class of circular tuning curves. We show an overall trend for minimal decoding time to increase with increasing Fisher information, implying a tradeoff between accuracy and speed. This tradeoff is reinforced whenever the stimulus dimensionality is high, or there is ongoing activity. Thus, given constraints on processing speed, we present normative arguments for the existence of the singlepeaked tuning organization observed in early visual areas.
Editor's evaluation
This fundamental study provides important insight into coding strategies in sensory areas. The study was well done, and the analysis and simulations were highly convincing. This study should be of particular interest to anybody who cares about efficient coding.
https://doi.org/10.7554/eLife.84531.sa0Introduction
One of the fundamental problems in systems neuroscience is understanding how sensory information can be represented in the spiking activity of an ensemble of neurons. The problem is exacerbated by the fact that individual neurons are highly noisy and variable in their responses, even to identical stimuli (Arieli et al., 1996). A common feature of early sensory representation is that the neocortical neurons in primary sensory areas change their average responses only to a small range of features of the sensory stimulus. For instance, some neurons in the primary visual cortex respond to moving bars oriented at specific angles (Hubel and Wiesel, 1962). This observation has led to the notion of tuning curves. Together, a collection of tuning curves provides a possible basis for a neural code.
A considerable emphasis has been put on understanding how the structure of noise and correlations affect stimulus representation given a set of tuning curves (Shamir and Sompolinsky, 2004; Averbeck and Lee, 2006; Franke et al., 2016; Zylberberg et al., 2016; MorenoBote et al., 2014; Kohn et al., 2016). More recently, the issue of local and catastrophic errors, dating back to the work of Shannon (Shannon, 1949), has been raised in the context of neuroscience (e.g. Xie, 2002; Sreenivasan and Fiete, 2011). Intuitively, local errors are small estimation errors that depend on the trialbytrial variability of the neural responses and the local shapes of the tuning curves surrounding the true stimulus condition (Figure 1a bottom plot, see s_{1}). On the other hand, catastrophic errors are very large estimation errors that depend on the trialbytrial variability and the global shape of the tuning curves (Figure 1a bottom plot, see s_{2}). While a significant effort has been put into studying how stimulus tuning and different noise structures affect local errors, less is known about the interactions with catastrophic errors. For example, Fisher information is a common measure of the accuracy of a neural code (Brunel and Nadal, 1998; Abbott and Dayan, 1999; Guigon, 2003; MorenoBote et al., 2014; Benichoux et al., 2017). The CramérRao bound states that a lower limit of the minimal mean squared error (MSE) for any unbiased estimator is given by the inverse of Fisher information (Lehmann and Casella, 1998). Thus, increasing Fisher information reduces the lower bound on MSE. However, because Fisher information can only capture local errors, the true MSE might be considerably larger in the presence of catastrophic errors (Xie, 2002; Kostal et al., 2015; Malerba et al., 2022), especially if the available decoding time is short (Bethge et al., 2002; Finkelstein et al., 2018).
A curious observation is that the tuning curves in early visual areas predominately use singlepeaked firing fields, whereas grid cells in the entorhinal cortex are known for their periodically distributed firing fields (Hafting et al., 2005). It has been shown that the multiple firing locations of grid cells increase the precision of the neural code compared to singlepeaked tuning curves (Sreenivasan and Fiete, 2011; Mathis et al., 2012; Wei et al., 2015). This raises the question of why periodic firing fields are not a prominent organization of early visual processing too?
The theoretical arguments in favor of periodic tuning curves have mostly focused on local errors under the assumption that catastrophic errors are negligible (Sreenivasan and Fiete, 2011). However, given the response variability, it takes a finite amount of time to accumulate a sufficient number of spikes to decode the stimulus. Given that fast processing speed is a common feature of visual processing (Thorpe et al., 1996; FabreThorpe et al., 2001; Rolls and Tovee, 1994; Resulaj et al., 2018), it is crucial that each neural population in the processing chain can quickly produce a reliable stimulusevoked signal. Therefore, the time required to produce signals without catastrophic errors will likely put fundamental constraints on any neural code, especially in early visual areas.
Here, we contrast Fisher information with the minimal decoding time required to remove catastrophic errors (i.e. the time until Fisher information becomes a reasonable descriptor of the MSE). We base the results on the maximum likelihood estimator for uniformly distributed stimuli (i.e., the maximum a posteriori estimator) using populations of tuning curves with different numbers of peaks. We show that the minimal decoding time tends to increase with increasing Fisher information in the case of independent Poissonian noise to each neuron. This suggests a tradeoff between the decoding accuracy of a neural population and the speed by which it can produce a reliable signal. Furthermore, we show that the difference in minimal decoding time grows with the number of jointly encoded stimulus features (stimulus dimensionality) and in the presence of ongoing (nonspecific) activity. Thus, singlepeaked tuning curves require shorter decoding times and are more robust to ongoing activity than periodic tuning curves. Finally, we illustrate the issue of large estimation errors and periodic tuning in simple spiking neural network model tracking either a steplike stimulus change or a continuously timevarying stimulus.
Results
Shapes of tuning curves, Fisher information, and catastrophic errors
To enable a comparison between singlepeaked and periodic (multipeaked) tuning curves, we consider circular tuning curves responding to a Ddimensional stimulus, $\mathbf{s}$, according to
where a_{i} is the peak amplitude of the stimulusrelated tuning curve $i$, $w$ is a width scaling parameter, ${\lambda}_{i}$ defines the spatial period of the tuning curve, ${s}_{i,j}^{\prime}$ determines the location of the peak(s) in the $j$:th stimulus dimension, and $b$ determines the amount of ongoing activity (see Figure 1a–b, top panels). The parameters are kept fixed for each neuron, thus ignoring any effect of learning or plasticity. In the following, the stimulus domain is set to $\mathbf{s}\in {[0,1)}^{D}$ for simplicity. To avoid boundary effects, we assume that the stimulus has periodic boundaries (i.e. ${s}_{j}=0$ and ${s}_{j}=1$ are the same stimulus condition) and adjust any decoded value to lie within the stimulus domain, for example,
see Materials and methods  ’Implementation of maximum likelihood estimator’ for details.
We assume that the stimulus is uniformly distributed across its domain and that its dimensions are independent. This can be seen as a worstcase scenario as it maximizes the entropy of the stimulus. In a single trial, we assume that the number of emitted spikes for each neuron is conditionally independent, and follows a Poisson distribution, given some stimulusdependent rate ${f}_{i}(s)$. Thus, the probability of observing a particular activity pattern, $\mathbf{r}$, in a population of $N$ neurons given the stimulusdependent rates and decoding time, $T$, is
Given a model of neural responses, the CramérRao bound provides a lower bound on the accuracy by which the population can communicate a signal as the inverse of the Fisher information. For sufficiently large populations, using the population and spike count models in Equation 1 and Equation 3, Fisher information is given by (for ${a}_{i}=a$ and $b=0$ for all neurons, see Sreenivasan and Fiete, 2011 or Appendix 2  'Fisher information and the CramérRao bound' for details)
where $\overline{{\lambda}^{2}}$ denotes the sample average of the squared inverse of the (relative) spatial periods across the population, and ${B}_{\alpha}(\cdot )$ denotes the modified Bessel functions of the first kind. Equation 4 (and similar expressions) suggests that populations consisting of periodic tuning curves, for which $\overline{{\lambda}^{2}}\gg 1$, are superior at communicating a stimulus signal than a population using tuning curves with only single peaks, where $\overline{{\lambda}^{2}}=1$. However, (inverse) Fisher information only predicts the amount of local errors for an efficient estimator. Hence, the presence of catastrophic errors (Figure 1a, bottom) can be identified by large deviations from the predicted MSE for an asymptotically efficient estimator (Figure 1c–d). Therefore, we define minimal decoding time as the shortest time required to approach the CramérRao bound (Figure 1c and e).
Periodic tuning curves and stimulus ambiguity
To understand why the amount of catastrophic error can differ with different spatial periods, consider first the problem of stimulus ambiguity that can arise with periodic tuning curves. If all tuning curves in the population share the same relative spatial period, $\lambda $, then the stimulusevoked responses can only provide unambiguous information about the stimulus in the range $[0,\lambda )$. Beyond this range, the response distributions are no longer unique. Thus, singlepeaked tuning curves ($\lambda =1$) provide unambiguous information about the stimulus. Periodic tuning curves ($\lambda <1$), on the other hand, require the use of tuning curves with two or more distinct spatial periods to resolve the stimulus ambiguity (Fiete et al., 2008; Mathis et al., 2012; Wei et al., 2015). In the following, we assume the tuning curves are organized into discrete modules, where all tuning curves within a module share spatial period (Figure 1b) mimicking the organization of grid cells (Stensola et al., 2012). For convenience, assume that $\lambda}_{1}>{\lambda}_{2}>...>{\lambda}_{L$ where $L$ is the number of modules. Thus, the first module provides the most coarsegrained resolution of the stimulus interval, and each successive module provides an increasingly finegrained resolution. It has been suggested that a geometric progression of spatial periods, such that ${\lambda}_{i}=c{\lambda}_{i1}$ for some spatial factor $0<c\le 1$, may be optimal for maximizing the resolution of the stimulus while reducing the required number of neurons (Mathis et al., 2012; Wei et al., 2015). However, trialbytrial variability can still cause stimulus ambiguity and catastrophic errors  at least for short decoding times, as we show later, even when using multiple modules with different spatial periods.
(Very) Short decoding times  when both Fisher information and MSE fails
While it is known that Fisher information is not an accurate predictor of the MSE when the decoding time is short (Bethge et al., 2002), less has been discussed about the issue of MSE. Although MSE is often interpreted as a measure of accuracy, its insensitivity to rare outliers makes it a poor measure of reliability. Therefore, comparing MSE directly between populations can be a misleading measure of reliability if the distributions of errors are qualitatively different. If the amounts of local errors differ, lower MSE does not necessarily imply fewer catastrophic errors. This is exemplified in Figure 2, comparing a singlepeaked and a periodic population encoding a twodimensional stimulus using the suggested optimal scale factor, $c\approx 1/1.44$ (Wei et al., 2015). During the first $\approx 30$ ms, the singlepeaked population has the lowest MSE of the two populations despite having lower Fisher information (Figure 2a). Furthermore, comparing the error distribution after the periodic population achieves a lower MSE (the black circle in Figure 2a) shows that the periodic population still suffers from rare errors that span the entire stimulus range (Figure 2b–c, insets). As we will show, a comparison of MSE, as a measure of reliability, only becomes valid once catastrophic errors are removed. Here we assume that catastrophic errors should strongly affect the usability of a neural code. Therefore, we argue that the first criterion for any ratebased neural code should be to satisfy its constraint on decoding time to avoid catastrophic errors.
Minimal decoding times in populations with two modules
How does the choice of spatial periods impact the required decoding time to remove catastrophic errors? To get some intuition, we first consider the case of populations encoding a onedimensional stimulus using only two different spatial scales, $\lambda}_{1$ and $\lambda}_{2$. From the perspective of a probabilistic decoder (Seung and Sompolinsky, 1993; Deneve et al., 1999; Ma et al., 2006), assuming that the stimulus is uniformly distributed, the maximum likelihood (ML) estimator is Bayesian optimal (and asymptotically efficient). The maximum likelihood estimator aims at finding the stimulus condition which is the most likely cause of the observed activity, $\mathbf{r}$, or
where $p(\mathbf{r}s)$ is called the likelihood function. The likelihood function equals the probability of observing the observed neural activity, $\mathbf{r}$, assuming that the stimulus condition was $s$. In the case of independent Poisson spike counts (or at least independence across modules), each module contributes to the joint likelihood function $p(\mathbf{r}s)$ with individual likelihood functions, Q_{1} and Q_{2} (Wei et al., 2015). Thus, the joint likelihood function can be seen as the product of the two individual likelihood functions, where each likelihood is ${\lambda}_{i}$periodic
In this sense, each module provides its own MLestimate of the stimulus, ${s}_{ML}^{(1)}={\mathrm{arg}\mathrm{max}}_{s}{Q}_{1}(\mathbf{r}s)$ and ${s}_{ML}^{(2)}={\mathrm{arg}\mathrm{max}}_{s}{Q}_{2}(\mathbf{r}s)$. Because of the periodicity of the tuning curves, there can be multiple modes for each of the likelihoods (e.g. Figure 3a and b, top panels). For the largest mode of the joint likelihood function to also be centered close to the true stimulus condition, the distance $\delta $ between ${s}_{ML}^{(1)}$ and ${s}_{ML}^{(2)}$ must be smaller than between any other pair of modes of Q_{1} and Q_{2}. Thus, to avoid catastrophic errors, $\delta $ must be smaller than some largest allowed distance ${\delta}^{*}$ which guarantees this relation (see Equations 25–30 for calculation of ${\delta}^{*}$ assuming the stimulus is in the middle of the domain). As $\delta $ varies from trial to trial, we limit the probability of the decoder experiencing catastrophic errors to some small error probability, ${p}_{error}$, by imposing that
Assuming that the estimation of each module becomes efficient before the joint estimation, Equation 7 can be reinterpreted as a lower bound on the required decoding time before the estimation based on the joint likelihood function becomes efficient
where $\text{erfinv}(\cdot )$ is the inverse of the error function and ${J}_{k,norm}$ refers to the timenormalized Fisher information of module $k$ (see Materials and methods for derivation). Thus, the spatial periods of the modules influence the minimal decoding time by determining: (1) the largest allowed distance ${\delta}^{*}$ between the estimates of the modules and (2) the variances of the estimations given by the inverse of their respective Fisher information.
To give some intuition of the approximation, if the spatial periods of the modules are very different, ${\lambda}_{2}\ll {\lambda}_{1}$, then there exist many peaks of Q_{2} around the peak of Q_{1} (Figure 3a). Additionally, there can be modes of Q_{1} and Q_{2} far away from the true stimulus close together. Thus, ${\lambda}_{2}\ll {\lambda}_{1}$ can create a highly multimodal joint likelihood function where small deviations in ${s}_{ML}^{(1)}$ and ${s}_{ML}^{(2)}$ can cause a shift, or a change, of the maximal mode of the joint likelihood. To avoid this, ${\delta}^{*}$ must be small, leading to longer decoding times by Equation 8. Furthermore, suppose the two modules have similar spatial periods ${\lambda}_{2}\sim {\lambda}_{1}$, or ${\lambda}_{1}$ is close to a multiple of ${\lambda}_{2}$. In that case, the distance between the peaks a few periods away is also small, again leading to longer decoding times (Figure 3b). In other words, periodic tuning suffers from the dilemma that small shifts in the individual stimulus estimates can cause catastrophic shifts in the joint likelihood function. Although these might be rare events, the possibility of such errors increases the probability of catastrophic errors. Thus, assuming ${\lambda}_{1}<1$, both small and large scale factors $c$ can lead to long decoding times. When ${\lambda}_{1}=1$, however, only smallscale factors $c$ pose such problems, at least unless the stimulus is close to the periodic edge (i.e. $s\approx 0$ or $s\approx 1$, see Figure 3—figure supplement 1). On the other hand, compared to singlepeaked tuning curves, periodic tuning generally leads to sharper likelihood functions, increasing the accuracy of the estimates once catastrophic errors are removed (e.g., compare the widths of the joint likelihood functions in Figure 3a–c).
To test the approximation in Equation 8, we simulated a set of populations ($N=600$ neurons) with different spatial periods. The populations were created using identical tuning parameters except for the spatial periods, whose distribution varied across the populations, and the amplitudes, which were adjusted to ensure an equal average firing rate (across all stimulus conditions) for all neurons (see Materials and methods for details on simulations). As described above, the spatial periods were related by a scale factor $c$. Different values of $c$ were tested for the largest period being either ${\lambda}_{1}=1$ or ${\lambda}_{1}=1/2$. Furthermore, only populations with unambiguous codes over the stimulus interval were included (i.e. $c\ne 1/2,1/3,1/4,\mathrm{\dots}$ for ${\lambda}_{1}=1/2$; Mathis et al., 2012). Note, however, that there is no restriction on the periodicity of the tuning curves to align with the periodicity of the stimulus (i.e. $1/{\lambda}_{i}$ does not need to be an integer). For each population, the minimal decoding time was found by gradually increasing the decoding time until the empirical MSE was lower than twice the predicted lower bound (i.e. $\alpha =2$, see Equation 10 and Materials and methods for details). Limiting the probability of catastrophic errors to ${p}_{error}={10}^{4}$, Equation 8 is a good predictor of the minimal decoding time (Figure 3d–e, bottom panels, coefficient of determination ${R}^{2}\approx 0.92$ and ${R}^{2}\approx 0.95$ for ${\lambda}_{1}=1$ and ${\lambda}_{1}=1/2$, respectively). For both ${\lambda}_{1}=1$ and ${\lambda}_{1}=1/2$, the minimal decoding time increases overall with decreasing scale factor, $c$ (see Figure 3d–e). However, especially for ${\lambda}_{1}=1/2$, the trend is interrupted by large peaks (Figure 3e). For ${\lambda}_{1}=1$, there are deviations from the predicted minimal decoding time for small scale factors, $c$. They occur whenever ${\lambda}_{2}$ is slightly below a multiple of ${\lambda}_{1}=1$, and get more pronounced when increasing the sensitivity to the threshold factor $\alpha =1.2$ (see Figure 3—figure supplement 2). We believe one cause of these deviations is the additional shifts across the periodic boundary (as in Figure 3—figure supplement 1) that can occur when $c$ is just below $1/2,1/3,1/4,\mathrm{\dots}$, etc.
To confirm that the estimated minimal decoding times have some predictive power on the error distributions, we resimulated a subset of the populations for various decoding times, $T$, using 15,000 randomly sampled stimulus conditions (Figure 3f). Both the RMSE and outlier errors (99.8th percentile and the maximal error, that is, 100th percentile) agree with the shape of minimal decoding times, suggesting that a singlepeaked population is good at removing large errors at very short time scales.
Minimal decoding times for populations with more than two modules
From the twomodule case above, it is clear that the choice of scale factor influences the minimal decoding time. However, Equation 8 is difficult to interpret and is only valid for twomodule systems ($L=2$). To approximate how the minimal decoding time scales with the distribution of spatial periods in populations with more than two modules, we extended the approximation method first introduced by Xie (Xie, 2002). The method was originally used to assess the number of neurons required to reach the CramérRao bound for singlepeaked tuning curves with additive Gaussian noise for the ML estimator. In addition, it only considered encoding a onedimensional stimulus variable. We adapted this method to approximate the required decoding time for stimuli with arbitrary dimensions, Poissondistributed spike counts, and tuning curves with arbitrary spatial periods. In this setting, the scaling of minimum decoding time with the spatial periods, ${\lambda}_{1},\mathrm{\dots},{\lambda}_{L}$, can be approximated as (see Materials and methods for derivation)
where $\overline{{\lambda}^{2}}$ and $\overline{{\lambda}^{3}}$ indicate the sample average across the inverse spatial periods (squared or cubed, respectively) in the population, $\overline{{f}_{stim}(D)}$ is the average evoked firing rate across the stimulus domain, and $A(w)$ (or ${A}^{\ast}(w)$) is a function of $w$ (see Materials and methods for detailed expression). The last approximation holds with equality whenever all tuning curves have an integer number of peaks. The derivation was carried out assuming the absence of ongoing activity and that the amplitudes within each population are similar, ${a}_{1}\approx \mathrm{\dots}\approx {a}_{N}$. Importantly, the approximation also assumes the existence of a unique solution to the maximum likelihood equations. Therefore, it is illequipped to predict the issues of stimulus ambiguity. Thus, going back to the twomodule cases, Equation 9 cannot capture the additional effects of ${\lambda}_{2}\ll {\lambda}_{1}$ or when ${\lambda}_{1}$ is close to a multiple of ${\lambda}_{2}$, as in Figure 3d–e. On the other hand, complementing the theory presented in Equation 8, Equation 9 provides a more interpretable expression of the scaling of minimal decoding time. For $c\le 1$, the minimal decoding time, ${T}_{th}$, is expected to increase with decreasing scale factor, $c$ (see Equation 47). The scaling should also be similar for different choices of ${\lambda}_{1}$. Furthermore, assuming all other parameters are constant, the minimal decoding time should grow roughly exponentially with the number of stimulus dimensions.
To confirm the validity of Equation 9, we simulated populations of $N=600$ tuning curves across $L=5$ modules. Again, the spatial periods across the modules were related by a scale factor, $c$ (Figure 4a). To avoid the effects of $c\ll 1$, we limited the range of the scale factor to $0.3\le c\le 1$. The upper bound on $c$ was kept (for ${\lambda}_{1}=1$) to include entirely singlepeaked populations. Again, the assumption of homogeneous amplitudes in Equation 9 was dropped in simulations (Figure 4b, left column) to ensure that the average firing rate across the stimulus domain is equal for all neurons (see Figure 4b, right column, for the empirical average firing rates). This had little effect on Fisher information, where the theoretical prediction was based on the average amplitudes across all populations with the same ${\lambda}_{1}$ and stimulus dimensionality $D$ (see Figure 4c, inset). As before, Fisher information grows with decreasing scale factor, $c$, and with decreasing spatial period ${\lambda}_{1}$. As expected, increasing the stimulus dimensionality decreases Fisher information if all other parameters are kept constant. On the other hand, the minimal decoding time increases with decreasing spatial periods and increases with stimulus dimensionality (Figure 4c). The increase in decoding time between $D=1$ and $D=2$ is also very well predicted by Equation 9, at least for $c>0.5$ (Figure 4—figure supplement 1a). In these simulations, the choice of width parameter is compatible with experimental data (Ringach et al., 2002), but similar trends were found for a range of different width parameters (although the differences become smaller for small $w$, see Figure 4—figure supplement 1b–d).
From Equation 9, we fitted two constants, K_{1} (regressor) and K_{2} (intercept), using least square regression across populations sharing the same largest period, ${\lambda}_{1}$, and stimulus dimensionality, $D$ (see Equation 47). Within the simulated range of scale factors, the regressions provide reasonable fits for the populations with ${\lambda}_{1}=1$ (Figure 4c, coefficient of determination ${R}^{2}\approx 0.89$ and ${R}^{2}\approx 0.90$ for $D=1$ and $D=2$, respectively). For the populations with ${\lambda}_{1}=1/2$, Equation 9 becomes increasingly unable to predict the behavior of the minimal decoding time as $c$ approaches 1 (see the red and yellow lines in Figure 4c–d). On the other hand, as was suggested above, the scaling of the minimal decoding time with $c$ is, in fact, similar for ${\lambda}_{1}=1$ and ${\lambda}_{1}=1/2$ whenever $c$ is less than $\approx 0.9$. As suggested by Figure 4d, there is also a strong correlation between Fisher information and minimal decoding time, again indicating a speedaccuracy tradeoff. Furthermore, similar results are obtained when either decreasing the threshold factor to $\alpha =1.2$ (Figure 4—figure supplement 2) or changing the minimal decoding time criterion to a onesided Kolmogorov–Smirnov test (KStest) between the empirical distribution of errors and the Gaussian error distribution predicted by the CramérRao bound (Figure 4—figure supplement 3, using an adhoc Bonferronitype correction for multiple sequential testing, $\alpha /j$, where $j$ is the $j$ th time comparison and $\alpha =0.05$ is the significance level.)
To further illustrate the relationship between minimum decoding time and the distribution of catastrophic errors, we resimulated the same populations using fixed decoding times, evaluating the RMSE together with the 99.8th and 100th (maximal error) percentiles of the root squared error distributions across 15,000 new uniformly sampled stimuli (Figure 4e–f). As suggested by the minimal decoding times, there is a clear tradeoff between minimizing RMSE over longer decoding times and removing outliers, especially the maximal error, over shorter decoding times. Figure 4—figure supplement 4 shows the time evolution for a few of these populations.
Additionally, to verify that the minimal decoding times are good predictors of the decoding time necessary to suppress large estimation errors, we compared the same error percentiles as in Figure 4e–f (i.e. the 99.8th and 100th percentiles) against the minimal decoding times, ${T}_{th}$, estimated in Figure 4c. For each population, we expect a strong reduction in the magnitude of the largest errors when the decoding time, $T$, is larger than the minimal decoding time, ${T}_{th}$. Figure 5 shows a clear difference in large estimation errors between populations for which ${T}_{th}<T$ and populations with ${T}_{th}>T$ (circles to the left and right of the magenta lines in Figure 5, respectively). Thus, although only using the difference between MSE and Fisher information, our criterion on minimal decoding time still carries important information about the presence of large estimation errors.
To summarize, while periodic tuning curves provide lower estimation errors for long decoding times by minimizing local errors (Figure 4c, inset), a population of singlepeaked tuning curves is faster at producing a statistically reliable signal by removing catastrophic errors (Equation 9 and Figure 4c). Generalizing minimal decoding times to an arbitrary number of stimulus dimensions reveals that the minimal decoding time also depends on the stimulus dimensionality (Figure 4c, compare lines for $D=1$ and $D=2$). Interestingly, however, the approximation predicts that although minimal decoding time grows with increasing stimulus dimensionality, the minimal required spike count might be independent of stimulus dimensionality, at least for populations with integer spatial frequencies, that is, integer number of peaks (see Equation A5.4). The populations simulated here have noninteger spatial frequencies. However, the trend of changes in the mean spike count is still just slightly below 1 (indicating that slightly fewer spikes across the population are needed with increasing $D$, see Figure 4—figure supplement 5). Thus, as the average firing rate decreases with the number of encoded features $D$ (Figure 4b, right column), the increase in minimal decoding time for stimuli of higher dimensionality can be primarily explained by requiring a longer time to accumulate the sufficient number of spikes across the population. Lastly, to rule out that the differences in minimal decoding time cannot be explained by the periodicity of the tuning curves not aligning to that of the stimulus, we also simulated populations with different combinations of integer peaks (Figure 4—figure supplement 6). Again, the same phenomenon is observed: periodic tuning curves increase the required decoding time to remove catastrophic errors. This also highlights that the approximation of minimal decoding time does not require the spatial periods to be related by a scale factor, $c$.
Effect of ongoing activity
Many cortical areas exhibit ongoing activity, that is, activity that is not stimulusspecific (Snodderly and Gur, 1995; Barth and Poulet, 2012). Thus, it is important to understand the impact of ongoing activity on the minimal decoding time, too. Unfortunately, because our approximation of the minimal decoding times did not include ongoing activity, we relied on simulations to study the effect of such nonspecific activity.
When including independent ongoing (background) activity at 2 spikes/s to all neurons for the same populations as in Figure 4, minimal decoding times were elevated across all populations (Figure 6). Furthermore, the minimal decoding time increased faster with decreasing $c$ in the presence of ongoing activity compared to the case without ongoing activity (ratios of fitted regressors ${K}_{1}(b=2)/{K}_{1}(b=0)$ using Equation 47 were approximately 1.69 and 1.72 for $D=1$ and $D=2$, respectively). Similar results are found using $\alpha =1.2$ (Figure 6—figure supplement 1) or the alternative criterion on minimal decoding time based on onesided KStests described earlier (Figure 6—figure supplement 2). Thus, ongoing activity can have a substantial impact on the time required to produce reliable signals. Figure 6 suggests that areas with ongoing activity are less suited for periodic tuning curves. Especially, the combination of multidimensional stimuli and ongoing activity leads to much longer minimal decoding times for tuning curves with small spatial periods ($c\ll 1$). For example, when encoding a twodimensional stimulus, only the populations with ${\lambda}_{1}=1$, $c=1$ and ${\lambda}_{1}=1$, $c=0.95$ could remove catastrophic errors in less than 40ms when ongoing activity at 2 sp/s was present. Thus, the ability to produce reliable signals at high speeds severely deteriorates for periodic tuning curves in the presence of nonspecific ongoing activity.
This result has an intuitive explanation. The amount of catastrophic errors depends on the probability that the trial variability reshapes the neural activity to resemble the possible activities for a distinct stimulus condition (see Figure 1a). From the analysis presented above, periodic tuning curves have been suggested to be more susceptible to such errors. Adding ongoing activity does not reshape the stimulusevoked parts of the tuning curves but only increases the trialbytrial variability. Thus, by this reasoning, it is not surprising that the systems which are already more susceptible also are even more negatively affected by the increased variability induced by ongoing activity. The importance of Figure 6 is that even ongoing activity as low as 2 sp/s can have a clearly visible effect on minimal decoding time.
Implications for a simple spiking neural network with suboptimal readout
Until this point, the arguments about minimal decoding time have relied on ratebased tuning curves encoding static stimuli. To extend beyond static stimuli and to exemplify the role of decoding time for spiking neurons, we simulated simple twolayer feedforward spiking neural networks to decode timevarying stimulus signals. The first layer (${N}_{1}=500$) corresponds to the tuning curves (without connections between the simulated neurons). The stimulusspecific tuning of the Poissonian inputs to these neurons is either fully singlepeaked, creating a population of singlepeaked tuning curves, or periodic with different spatial periods, creating a population of periodic tuning curves (Figure 7a, see Materials and methods for details). The second layer instead acts as a readout layer (${N}_{2}=400$, allowing a weak convergence of inputs from the first layer). This layer receives both stimulusspecific excitatory input from the first layer and external nonspecific Poissonian excitation (corresponding to background activity). The connection strength between the first and second layers depends on the difference in preferred stimulus conditions between the pre and postsynaptic neurons. Such connectivity could, for example, be obtained by unsupervised Hebbian learning. Because the tuning curves in the first layer can be periodic, they can also connect strongly to several readout neurons. We introduced lateral inhibition among the readout neurons (without explicitly modeling inhibitory neurons) to create a winnertakeall style of dynamics. The readout neurons with large differences in preferred stimulus inhibit each other more strongly. Decoding is assumed to be instantaneous and based on the preferred stimulus condition of the spiking neuron in the readout layer. However, to compare the readouts, we averaged the stimulus estimates in sliding windows.
We tested two different types of timevarying stimuli: (1) a steplike change from $s=0.25$ to $s=0.75$ (Figure 7b top row, green trace) and (2) a continuously timevarying stimulus drawn from an Ornstein–Uhlenbeck process (Figure 7b bottom row, green trace; see Materials and methods). In the case of a steplike stimulus change, the readout layer for the singlepeaked population required a shorter time to switch states than the periodic network (Figure 7c). The shorter switching time is consistent with the hypothesis that singlepeaked tuning curves have shorter minimal decoding times than periodic tuning curves. In these simulations, the difference is mainly due to some neurons in the first layer of the periodic network responding both before and after the step change. Thus, the correct readout neurons (after the change) must compensate for the hyperpolarization built up before the change and the continuing inhibitory input from the previously correct readout neurons (which still get excitatory inputs). Note that there are only minor differences in the population firing rates between the readout layers, confirming that this is not a consequence of different excitation levels but rather of the structures of excitation.
The continuously timevarying stimulus could be tracked well by both networks. However, averaging across trials shows that SNNs with periodic tuning curves have larger sporadic fluctuations (Figure 7d). This suggests that decoding with periodic tuning curves has difficulties in accurately estimating the stimulus without causing sudden, brief periods of large errors. To make a statistical comparison between the populations, we investigated the distributions of root mean squared error (RMSE) across trials. In both stimulus cases, there is a clear difference between the network with singlepeaked tuning curves and the network with periodic ones. For the steplike change in stimulus condition, a significant difference in RMSE arises roughly 100 ms after the stimulus change (Figure 8a). For the timevarying stimulus, using singlepeaked tuning curves also results in significantly lower RMSE compared to a population of periodic tuning curves (Figure 8b, RMSE calculated across the entire trial).
Discussion
Several studies have suggested that periodic tuning creates an unparalleled precise neural code by minimizing local errors (Sreenivasan and Fiete, 2011; Mathis et al., 2012; Wei et al., 2015; Malerba et al., 2022). Nevertheless, despite this advantage of periodic tuning, singlepeaked tuning curves are widespread in early sensory areas and especially in the early visual system. There is a long history of studying information representation using ratebased tuning curves. Still, the effect of spatial periodicity and catastrophic errors on the required decoding time has not been addressed. Here, we showed that the possibility of catastrophic estimation errors (Figure 1a) introduces the possibility that different shapes of tuning curves can have different minimal decoding times.
The emerging question is whether there is a tradeoff between the accuracy of a neural code and the minimal required decoding time for singlepeaked and periodic tuning. The answer is yes. We found that minimal decoding time increased with decreasing spatial periods of the tuning curves (Figure 4c), suggesting a tradeoff between accuracy and speed for populations of tuning curves. The differences in minimal decoding time cannot be explained by the periodicity of the tuning curves not aligning to the stimulus domain, as the same holds comparing populations with integer number of peaks (Figure 4—figure supplement 6). Furthermore, our results remained unchanged when we discarded any decoded stimuli which needed the $\text{mod}1$ operation to lie within the stimulus domain ${[0,1)}^{D}$, thus ruling out any possible distortion effect of the periodic stimulus and decoding approach. In addition, we show that our results are valid for a range of population sizes (Figure 9ab), ongoing (Figure 9ab) and evoked activities (Figure 9c), and stimulus dimensions (Figure 9d). We used the more conservative threshold factor on MSE, $\alpha =1.2$, to capture all the nuances w.r.t. the level of ongoing activity even for large population sizes. In simulated networks with spiking neurons, we showed that the use of periodic tuning curves increased the chances of large estimation errors, leading to longer times before switching ‘states’ (Figure 7c) and difficulties tracking a timevarying, onedimensional stimulus (Figure 7d).
Experimental data suggest that decoding times can be very short, of the order of tens of milliseconds, reflecting that a considerable part of the information contained in firing rates over long periods is also present in short sample periods (Tovée et al., 1993). Additionally, the first few spikes have been shown to carry significant amounts of task information in both visual (Resulaj et al., 2018), olfactory (Resulaj and Rinberg, 2015), and somatosensory areas (Panzeri et al., 2001; Petersen et al., 2001). As the tuning curves in this study all have equal average firing rates, we can reinterpret the minimal decoding time in terms of the prominence of the first spikes. In our simulations, tens of spikes carry enough information to produce a reliable stimulus estimate free of catastrophic errors (Figure 4—figure supplement 5). As with decoding time, singlepeaked tuning curves also need fewer spikes to produce reliable signals. Thus, the speedaccuracy tradeoff can be reinterpreted as a tradeoff between being accurate and efficient.
The notion of a speedaccuracy tradeoff is further strengthened when considering highdimensional stimuli that demand longer minimal decoding times. Natural stimuli typically have higher dimensionality than those used in animal experiments. Many sensory neurons are tuned to multiple features of the external stimulus, creating mixed selectivity of features (e.g. Garg et al., 2019). For neurons responding to taskrelated variables, mixed selectivity has been shown to enable linear separability and to improve discriminability (Rigotti et al., 2013; Fusi et al., 2016; Johnston et al., 2020). For continuous stimulus estimations, mixed selectivity has also been proposed to decrease MSE when decoding time is limited (Finkelstein et al., 2018). However, to remove catastrophic errors, which, as we have argued, is not necessarily synonymous with lower MSE, the exponential increase in minimal decoding time could easily lead to very long decoding times. Thus, minimal decoding time should set a bound on the number of features a population can jointly encode reliably. In addition, neurons in sensory areas often exhibit a degree of nonspecific activity (Snodderly and Gur, 1995; Barth and Poulet, 2012). Introducing ongoing activity to the populations in our simulations further amplified the differences in minimal decoding times (Figure 6). Thus, for jointly encoded stimuli, especially in areas with high degrees of ongoing activity, a population of singlepeaked tuning curves might be the optimal encoding strategy for rapid and reliable communication.
We note that these results might extend beyond the visual areas, too. Although this study focused on tuning curves encoding continuous variables, catastrophic errors can also occur in systems with discrete tuning curves. Sensory stimuli can be fastvarying (or even discontinuous), and large errors can potentially harm the animal. Therefore, constraints on decoding time are likely important for any early sensory area. In addition, hippocampal place cells involved in spatial navigation (O’Keefe and Dostrovsky, 1971; Wilson and McNaughton, 1993) are known for their singlepeaked tuning. The interesting observation in this context is that place cells produce reliable signals faster than their input signals from the medial entorhinal cortex with a combination of single and multipeaked tuning (Cholvin et al., 2021). On the other hand, for sufficiently slowvarying stimuli, a periodic population can be used together with error correction to remove catastrophic errors (Sreenivasan and Fiete, 2011). Furthermore, for nonperiodic stimuli with large domains, the combinatorial nature of periodic tuning curves can create unique stimulus representations far exceeding the spatial periods of the tuning curves (Fiete et al., 2008). Thus, periodic tuning curves are ideal for representing space, where the stimulus domain can be vast, and the change in position is constrained by the speed of movement. Interestingly, when faced with very large arenas, place cells can also exhibit multipeaked tuning (Eliav et al., 2021).
To summarize, we provide normative arguments for the singlepeaked tuning of early visual areas. Rapid decoding of stimulus is crucial for the survival of the animals. Consistent with this, animals and humans can process sensory information at impressive speeds. For example, the human brain can generate differentiating eventrelated potentials to go/nogo categorization tasks using novel complex visual stimuli in as little as 150 ms (Thorpe et al., 1996). These ‘decoding’ times do not decrease for highly familiar objects, suggesting that the speed of visual processing cannot be reduced by learning (FabreThorpe et al., 2001). Given constraints on low latency communication, it is crucial that each population can quickly produce a reliable signal. In this regard, singlepeaked tuning curves are superior to periodic ones. The fact that early visual areas exhibit ongoing activity and encode multidimensional stimuli further strengthens the relevance of the differences in minimal decoding time.
To conclude, our work highlights that minimum decoding time is an important attribute and should be considered while evaluating candidate neural codes. Our analysis suggests that decoding of highdimensional stimuli can be prohibitively slow with ratebased tuning curves. Experimental data on the representation of highdimensional stimuli is rather scant as relatively lowdimensional stimuli are typically used in experiments (e.g. oriented bars). Our work gives a compelling reason to understand whether and how biological brains can reliably encode highdimensional stimuli at behaviorally relevant time scales.
Materials and methods
Minimal decoding times  simulation protocols
Request a detailed protocolTo study the dependence of decoding time $T$ on MSE for populations with different distributions of spatial frequencies, we simulated populations of synthetic tuning curves (Equation 1). The stimulus was circular with a ${[0,1)}^{D}$ range. The preferred stimulus conditions ${\mathbf{s}}^{\prime}$ were sampled independently from a random uniform distribution over ${[0,1)}^{D}$ (independently and uniformly for each stimulus dimension). To ensure equal comparison, the preferred locations ${\mathbf{s}}^{\prime}$ were shared across all populations. Each neuron’s amplitude, a_{i}, was tuned according to Equation A1.5 to ensure an equal average firing rate across the stimulus domain for all neurons. In each trial, a stimulus $\mathbf{s}\in {[0,1)}^{D}$ was also independently sampled from a uniform distribution over ${[0,1)}^{D}$. The spike count for each neuron was then sampled according to Equation 3.
Minimal decoding time was defined as the shortest time for which the neural population approximately reaches the CramérRao bound. To estimate the reaction time in simulations, we incrementally increased the decoding time $T$ (using 1 ms increments, starting at $T=1$ ms) until
As the ML estimator is asymptotically efficient (attaining the CramérRao bound in the limit of infinite data), the threshold factor, $\alpha $, in Equation 10 was added as a relaxation (see figure captions for choices of $\alpha $). Note that the mean bars on the left and righthand sides of Equation 10 refer to the means across stimulus dimensions (for multidimensional stimuli) and that $\text{diag}(\cdot )$ refers to taking the diagonal elements from the inverse of the Fisher information matrix, $J(T,\lambda {)}^{1}$. For a given decoding time $T$, the estimation of MSE was done by repeatedly sampling random stimulus conditions (from a uniform distribution), sampling a noisy response to the stimulus (Poisson distributed spike counts), and then applying maximum likelihood estimation (see next section ’Implementation of maximum likelihood estimator’ for details on implementation). In Figures 3 and 9, 15000 stimulus conditions were drawn for each $T$, and in Figure 4, stimulus conditions were repeatedly drawn until the two first nonzero digits of the MSE were stable for 1000 consecutive trials. However, in controls not presented here, we could not see any significant difference between these two sampling approaches. Because the Fisher information matrix $J$ was analytically estimated only in the special case without ongoing activity, it was approximated in simulations by the elementwise average across 10,000 randomly sampled stimulus conditions (also uniformly distributed), where each element was calculated according to Equation A2.12or Equation A2.13 given a random stimulus trial.
Implementation of maximum likelihood estimator
Request a detailed protocolGiven some noisy neural responses, $\mathbf{r}$, the maximum likelihood estimator (MLE) chooses the stimulus condition which maximizes the likelihood function, ${\widehat{\mathbf{s}}}_{ML}={\mathrm{arg}\mathrm{max}}_{\mathbf{s}}\mathcal{L}(\mathbf{r},\mathbf{s})={\mathrm{arg}\mathrm{max}}_{\mathbf{s}}{\prod}_{i=1}^{N}p({r}_{i}\mathbf{s})$. A common approach is to instead search for the maximum of the loglikelihood function (the logarithm is a monotonic function and therefore preserves any maxima/minima). The stimulusdependent terms of the loglikelihood can then be expressed as
Unfortunately, the loglikelihood function is not guaranteed to be concave, and finding the stimulus condition ${\widehat{\mathbf{s}}}_{ML}$ which maximizes the loglikelihood function is not trivial (a nonconvex optimization problem). To overcome this difficulty, we combined gridsearch with the Nelder–Mead method, an unconstrained nonlinear program solver (implemented using MATLAB’s builtin function fminsearch, https://www.mathworks.com/help/matlab/ref/fminsearch.html).
Grid search was used to find a small set of starting points with large loglikelihood values. To do so, we sampled 100 random stimulus conditions within the stimulus interval ${[0,1)}^{D}$ and selected the four stimulus conditions with the largest loglikelihood values. The true stimulus condition, ${\mathbf{s}}_{true}$, was always added to the set of starting points regardless of the loglikelihood value of that condition (yielding a total of 5 starting points).
Then the Nelder–Mead method was used with these starting points to find a set of 5 (possibly local) maxima. The stimulus was decoded as the stimulus, $\widehat{\mathbf{s}}$, yielding the largest loglikelihood of the 5 maxima. As we always included the true stimulus condition in the NelderMead search, this approach should not overestimate the amount of threshold distortion but can potentially miss some global estimation errors instead. Finally, as the Nelder–Mead method is unconstrained but the stimulus domain periodic, the output of the maximum likelihood decoder was transformed into the stimulus interval ${[0,1)}^{D}$ by applying the mod 1 operation on each stimulus dimension,
Given an estimated stimulus, ${\widehat{s}}_{ML}$, the error was then evaluated along each stimulus dimension independently, taking into account the periodic boundary,
for $i\in \{1,\mathrm{\dots},D\}$.
Lastly, to rule out that the estimates before the modoperation, $\widehat{\mathbf{s}}$, outside of the stimulus domain ${[0,1)}^{D}$ did not influence the results, we also discarded these samples, but this produced similar results.
Spiking network model
Stimuli
Request a detailed protocolAs in the previous simulations, we assumed that the stimulus domain was a circular stimulus defined between $[0,1)$. We simulated the responses to two different types of stimuli, (1) a steplike change in stimulus condition from $s=0.25$ to $s=0.75$ and (2) a stimulus drawn from a modified Ornstein–Uhlenbeck process
For parameter values, see Table 1.
Network model
Request a detailed protocolThe spiking networks were implemented as twolayer, feedforward networks using LIF neurons with (Dirac) delta synapses. The dynamics of the membrane potential for the neurons in the first layer were described by
where ${V}_{i}$ is the voltage of neuron $i$, ${\tau}_{mem}$ the membrane time constant, t_{k} the timing of the $k$ th input spike to neuron $i$, and ${J}_{E}$ the induced EPSP. The neurons in the first layer were constructed to correspond to either singlepeaked or periodic tuning curves. Two networks were tested, one network where the first layer corresponds to singlepeaked tuning curves and a second network corresponding to periodic tuning curves (with $L=4$ modules). For each neuron $i$ in module $j$ in the first layer, the input was drawn from independent Poisson point processes with stimulusdependent rates ${f}_{i}^{(j)}(s(t))$
Here, the constants $a$ and $b$ were chosen such that the baseline firing rate was slightly above zero and the maximal firing rate was slightly below 20 sp/s (see Table 3 for all networkrelated parameter values). Because of the choice of ${\lambda}_{j}$, the modulation strengths of the inputs were such that the average input to each neuron was equal. For each module in the first layer, the preferred locations ${s}_{i}^{(j)}$ were equidistantly placed across $[0,{\lambda}_{j})$.
Similarly, for the second layer, the membrane potential was described by
where ${J}_{EE}({\mathrm{\Delta}}_{i,j})$ and ${J}_{I}({\mathrm{\Delta}}_{i,j})$ are synapsespecific EPSPs/IPSPs which depends on the difference in preferred tuning ${\mathrm{\Delta}}_{i,j}$ between the pre and postsynaptic neurons (see Equation 18), ${t}_{k}^{(j)}$ the timing of the $k$ th spike of presynaptic neuron $j$, and $d$ the delay (see Table 2, Table 3 for parameter values). The neurons in the second layer were only tuned to a single preferred stimulus location each, equidistantly placed across $[0,1)$. Whenever a spike occurred in the first layer, it elicited EPSPs with a delay of 1.5 ms in all neurons in the second layer. The size of the EPSPs depended on the difference in preferred tuning, ${\mathrm{\Delta}}_{i,j}$, between the pre and postsynaptic neurons
Here, ${J}_{EE}$ determines the maximal EPSP (mV), and the constant ${w}_{ro}$ was chosen such that the full width at half maximum of the EPSP kernels tiled the stimulus domain without overlap. Note that for periodically tuned neurons in the first layer (i.e. with multiple preferred locations), the ${\mathrm{\Delta}}_{i,j}$ were determined by the smallest difference in preferred tuning across the multiple preferred locations.
As for the excitatory neurons in the first layer, whenever a spike occurred in the second layer, it elicited IPSPs with a delay of 1.5 ms in all other neurons in the second layer. Again, the size of the IPSPs depended on the difference in preferred tuning, ${\mathrm{\Delta}}_{i,j}$ between the two neurons, but this time according to
Thus, the range of inhibition was much broader compared to the excitation.
Evaluating decoding performance
Request a detailed protocolWe assumed that the decoder was instantaneously based on the neuron index of the firing neuron in the readout layer. Let $\mathrm{\Phi}({t}_{k})$ denote a function that provides the index of the neuron firing at time t_{k}. Given the equidistant distribution of preferred locations for the readout neurons, the stimulus is instantaneously decoded by mapping the neuron identity to the interval $[0,1]$
where N_{2} is the number of neurons in the readout layer. For both stimulus cases, the decoding performance was evaluated using (1) the distribution of RMSE (Figure 7d) or estimated stimulus conditions (Figure 7e) in a sliding window or (2) the distributions of accumulated RMSE (Figure 8).
Parameters
Simulation tools
Request a detailed protocolAll the simulations were done using code written in MATLAB and Python (using Brian2 simulator Stimberg et al., 2019). The simulation code is available at https://github.com/movitzle/Short_Decoding_Time (copy archived at Lenninger, 2023 ).
Approximating minimal decoding time in twomodule systems
Request a detailed protocolTo gain an understanding of the interaction between two modules with different spatial periods, consider the likelihood function as a product of the likelihood functions of the two modules individually
Using the Laplace approximation, each of these functions can be approximated as a periodic sum of Gaussians (Wei et al., 2015). Assuming that each module becomes efficient before the joint likelihood, we only focus on the largest, periodically occurring, peaks
where ${\mathbf{r}}^{(i)}$ denotes the activity pattern of module $i$, ${s}_{ML}^{(i)}$ the peak closest to the true stimulus condition, s_{0}, and ${K}_{i}$ is large enough for ${\widehat{Q}}_{i}({r}^{(i)}s)$ to cover the entire stimulus range $[0,1)$. The approximation can be seen as ‘rolling out’ the stimulus domain from $[0,1)$ to $\mathbb{R}$. Therefore, to neglect the impact of the stimulus boundary, we assume that the stimulus is in the middle of the stimulus domain and ${K}_{1}=\lceil \frac{1}{2{\lambda}_{1}}\rceil $ and ${K}_{2}=\lceil \frac{1}{2{\lambda}_{2}}\rceil $. Furthermore, assuming that each module is efficient, the width of the Gaussians can be approximated as
where ${J}_{i}(s)\approx {J}_{i}$ is the Fisher information of module $i$. The joint likelihood function can thus be approximated as
As the likelihood functions depend on the particular realization of the spike counts, the distance between the modes of the respective likelihoods closest to the true stimulus condition s_{0}, ${\delta}_{0,0}={s}_{ML}^{(1)}{s}_{ML}^{(2)}$, is a random variable. Note that in the Results section, ${\delta}_{0,0}$ is simply referred to as $\delta $ for clarity.
The joint likelihood distribution $p(\mathbf{r}s)$ has its maximal peak close to the true stimulus condition s_{0} if ${\delta}_{0,0}$ is the smallest distance between any pairs of peaks of Q_{1} and Q_{2} (see Equation A3.7 for details). Assuming that both modules provide efficient estimates, the distance ${\delta}_{0,0}$ can be approximated as a normally distributed random variable
where ${J}_{k,norm}$ refers to the timenormalized Fisher information of module $k$. Thus, as the decoding time $T$ increases, the variance of ${\delta}_{0,0}$ decreases. Hence, it is necessary for the decoding time $T$ to be large enough such that it is rare for ${\delta}_{0,0}$ not to be the smallest distance between any pair of peaks. Similarly, the distance between the other pair of peaks in Q_{1} and Q_{2} within the stimulus range becomes
where ${n}_{1}\in \{{K}_{1},\mathrm{\dots},{K}_{1}\}$ and ${n}_{2}\in \{{K}_{2},\mathrm{\dots},{K}_{2}\}$ are indexing the different Gaussians as before. Thus, the threshold for catastrophic error is reached when there is another pair of modes with the same distance between them, that is,
for some n_{1} and n_{2} belonging to the index sets as above. Thus, to avoid catastrophic errors, it is necessary that
for all ${n}_{1}\in \{{K}_{1},\mathrm{\dots},{K}_{1}\}$ and ${n}_{2}\in \{{K}_{2},\mathrm{\dots},{K}_{2}\}$. By solving Equation 28, and taking into account that ${\delta}_{0,0}$ can be either positive or negative, we get the maximally allowed displacement
Note that for ${\lambda}_{1}=1$, all n_{1} represent the same mode (but one full rotation 1 away). Thus, we limit the search such that ${\lambda}_{1}{n}_{1}<1$ and ${\lambda}_{2}{n}_{2}<1$. Assuming that the period of the second module is a scaling of the first module, ${\lambda}_{2}=c{\lambda}_{1}$, the above equation becomes
Note that stimulus ambiguity can never be resolved if ${\delta}_{{n}_{1},{n}_{2}}={\delta}_{0,0}$ for some pair $({n}_{1},{n}_{2})\ne (0,0)$, which is analogous to the condition in Mathis et al., 2012. To limit the probability of catastrophic estimation errors from the joint distribution to some small error probability ${p}_{error}$, the following should hold
Because ${\delta}_{0,0}\sim N(0,{J}_{1}^{1}+{J}_{2}^{1})$, we have
where $\text{erf}(\cdot )$ is the errorfunction and $\sigma =\sqrt{{J}_{1}^{1}+{J}_{2}^{1}}$. By rearranging the terms and using Equation A2.8, we can obtain a lower bound on the required decoding time
where ${J}_{i,norm}$ is the timenormalized Fisher information of module $i$. Note that ${\delta}^{*}$ can easily be found using an exhaustive search according to Equation 29 or Equation 30.
Approximating minimal decoding time
Request a detailed protocolTo approximate the order by which the population reaction time scales with the distribution of spatial periods and the stimulus dimensionality, we extended the approximation method introduced by Xie, 2002. The key part of the approximation method is to use a Taylor series to reason about which conditions must hold for the distribution of errors to be normally distributed with a covariance equal to the inverse of the Fisher information matrix. Note that this approximation assumes the existence of a unique solution to the maximum likelihood equations, thus, it does not apply to ambiguous neural codes (e.g. $c=1/2,1/3,1/4,\mathrm{\dots}$, etc.).
First, let’s recollect the Taylor series with Lagrangian reminder for a general function g
where ${x}^{*}$ is somewhere on the interval $[x,x+\delta )$. Thus, in the multivariate case, the derivative in the j:th direction of the loglikelihood function for stimulus condition ${\widehat{s}}_{ML}=\widehat{s}$ can be rewritten using a Taylor series with Lagrangian reminder as
for all $k\in \{1,\mathrm{\dots},D\}$ where ${\mathbf{s}}^{\circ}$ is the true stimulus condition and ${\mathbf{s}}^{*}$ is a stimulus point between ${\mathbf{s}}^{\circ}$ and $\widehat{\mathbf{s}}$.
If the estimated stimulus is close to the true stimulus, then the quadratic order terms are small. If so, the variance of $(\widehat{\mathbf{s}}{\mathbf{s}}^{\circ})$ converges towards $N(0,{J}^{1})$ (in distribution), where $J$ is the Fisher information matrix (Lehmann and Casella, 1998). However, if the estimated stimulus is not close to the true stimulus, then the quadratic terms are not negligible. Therefore, when $T$ is sufficiently large, and the variance of the estimation follows the CramérRao bound, the following should hold for all $k\in \{1,\mathrm{\dots},D\}$
In this regime, we make the following termwise approximations
and
which gives
Because ${M}_{k,l,m}\approx 0$ unless $k=l=m$ (see Equation A4.3, Equation A4.4, Equation A4.5), Equation 39 simplifies to
Furthermore, because $J(\mathbf{s})$ is a diagonal matrix (see Equation A2.18), we have
Next, by taking the square of the absolute values, we obtain
Because we assumed that $N$ and $T$ are sufficiently large to meet the CramérRao bound, we have that
Inserting Equation 43 into Equation 42 gives
or, equivalently,
By approximating the term ${M}_{k,k,k}({\mathbf{s}}^{*})$ with an upper bound ${M}^{*}$ (see Equation A4.10) and using the expression for Fisher information (Equation A2.8), the expression for population reaction times can be obtained as
where $A(w)$ is a function of $w$. Lastly, by casting Equation 46 in terms of the scale factor $c$, and fitting using (for example) least square regression, we obtain
where $M$ is the number of neurons per module, and K_{1} and K_{2} are constants. Note that in the simulations, $w$ is fixed and $A(w)$ can therefore be incorporated into K_{1}.
Appendix 1
Tuning curves and spike count model
In the paper, we study the representation of a multidimensional stimulus $s=({s}_{1},\mathrm{\dots},{s}_{D})$. For simplicity, it is assumed that the range of the stimulus in each dimension is equal, such that ${s}_{j}\in [0,R)$ for all $j\in \{1,\mathrm{\dots},D\}$ for some stimulus range $R$ (in the main text, we consider $R=1$ for simplicity). Note that these assumption does not qualitatively change the results. Furthermore, we assume that the tuning curves were circular (von Mises) tuning curves
where a_{i} is the peak amplitude of the stimulusrelated tuning curve of neuron $i$, $w$ is a width scaling parameter, ${\lambda}_{i}$ defines the spatial period of the tuning, ${s}_{j,i}^{\prime}$ determines the location of the firing field(s) in the $j$:th dimension, and $b$ determines the amount of background activity. The amplitude parameters a_{i} were tuned such that all tuning curves had the same firing rate when averaged across all stimulus conditions (see Supplementary Equation A1.5).
It is possible to reparametrize the stimuli into a phase variable, $\varphi =\frac{{s}_{j}}{R}$. In the article, calculations and numerical simulation are based on phase variables $\varphi $. This only changes the MSE and Fisher information by a constant scaling $\frac{1}{{R}^{2}}$. As we are interested in comparing the minimal decoding times, not the absolute values of the MSE, we can drop the "unnormalized" stimulus $s$. The tuning curves in Supplementary Equation A1.1 can thus be rewritten using the phase variable $\varphi $ as
Given stimulus condition $\mathbf{s}$ (or $\varphi $) and decoding time $T$, the spike count of each neuron was independently sampled from a Poisson distribution with rate $T{f}_{i}(\mathbf{s})$. Thus, the probability of observing a particular spike count pattern $\mathbf{r}=({r}_{1},\mathrm{\dots},{r}_{N})$ given $\mathbf{s}$ is
Adjusting amplitudes
In order to make a fair comparison of decoding times across populations, we constrain each neuron to have the same average firing rate across the stimulus domain, $\overline{f}$. The average firing rate over the stimulus domain is
Thus, given a desired stimulusevoked firing rate, $\overline{{f}_{stim}}$, over a normalized stimulus range ($R=1$), the amplitudes will be set to
Note that the integrals in Equation A1.5 are analytically solvable whenever the relative spatial frequency ${\xi}_{i}=1/{\lambda}_{i}$ is a positive integer, in which case we have
regardless of ${\varphi}_{i,j}^{\prime}$, here ${B}_{0}(\cdot )$ is the modified Bessel function of the first kind. In simulations, $\overline{{f}_{stim,i}}=\overline{{f}_{stim}}$ was set such that tuning curves with integer spatial frequencies ($1/\lambda $) have amplitudes of 20 sp/s, that is,
Appendix 2
Fisher information and the CramérRao bound
Assuming a onedimensional variable, the CramérRao bound gives a lower bound on the MSE of any estimator $G$
where ${b}_{G}(s)=E[G(\mathbf{r})s]$ is the bias of the estimator $G$ and $J(s)$ is Fisher information, defined as
where the last equality holds if $p(\mathbf{r}s)$ is twice differentiable and the neural responses are conditionally independent (Lehmann and Casella, 1998). Assuming an unbiased estimator, the bound can be simplified to
For multiparameter estimation, let $J(\mathbf{s})$ denote the Fisher information matrix, with elements defined analogously to Supplementary Equation A2.2
then (for unbiased estimators) the CramérRao bound is instead stated as the following matrix inequality (Lehmann and Casella, 1998)
in the sense that the difference $\mathrm{\Sigma}{J}^{1}(\mathbf{s})$ is a positive semidefinite matrix. Thus, this implies the following lower bound for MSE of the k:th term
where ${G}^{({s}_{k})}={\widehat{s}}_{k}$, that is, the estimation of s_{k} using estimator $G$. Note that if $J(\mathbf{s})$ is a diagonal matrix, that is, ${\{J(\mathbf{s})\}}_{j,k}=0$ for all $j\ne k$, then the following also holds
For the tuning curves defined in Supplementary Equation A1.1, the diagonal elements of the Fisher information matrix can be analytically solved assuming ${a}_{i}\sim a$ within each module (see Supplementary Equation A2.19)
where the bar indicates the sample average across modules and $R$ is the stimulus range (note that in the main text, we assume $R=1$). The offdiagonal elements, on the other hand, can be shown to be 0 (see below). Thus we have equality in the last inequality of Supplementary Equation A2.6, and the MSE for each stimulus dimension is lower bounded by the inverse of Supplementary Equation A2.8.
Approximating Fisher information
To analytically approximate the Fisher information for a given neural population, we will neglect the impact of ongoing activity $b$. Then, the tuning curves in Supplementary Equation A1.1 factorize as ${f}_{i}(s)=a{q}_{1,i}({s}_{1})\mathrm{\dots}{q}_{D,i}({s}_{D})$ and the loglikelihood for $N$ neurons with conditionally independent spike counts becomes
By taking the second derivatives w.r.t. stimulus dimension, we get for $k=l$:
and for $k\ne l$
Consequently, the elements of the Fisher information matrix are given by
and for $k\ne l$
To simplify calculations, it is possible to reparametrize the stimulus as in Supplementary Equation A1.2 using the formula for Fisher information under reparametrization (Lehmann and Casella, 1998)
to obtain
We can approximate the elements of the Fisher information matrix $J(\varphi )$ in the limit of large $N$ by replacing the sums with integrals, for example,
where $L$ is the number of distinct modules, $M$ is the number of neurons in each module, $d{\varphi}^{\prime}=d{\varphi}_{1}^{\prime}\mathrm{\dots}d{\varphi}_{D}^{\prime}$, and the Ddimensional integral is taken over the interval $[{\varphi}_{p}\frac{1}{2}{\lambda}_{j},{\varphi}_{p}+\frac{1}{2}{\lambda}_{j})$ along each dimension. Making the variable substitution ${\theta}_{p}=\frac{2\pi}{{\lambda}_{j}}({\varphi}_{p}{\varphi}_{p}^{\prime})$ for $p=\{1,\mathrm{\dots},D\}$ we have
where the sample average is taken over the population’s distribution of spatial frequencies and ${B}_{\alpha}(\cdot )$ is the modified Bessel function of the first kind. Similar calculations for the case $k\ne l$ yield
Thus, the stimulus parameters will be asymptotically orthogonal for all of the populations considered in this paper. That is, the covariance matrix will be diagonal. The perneuron average contribution to each diagonal element of the Fisher information matrix, as reported in the main text, is, therefore
Appendix 3
Maximum of the joint likelihood function (2 module case)
Assuming that the responses of the two modules are independent, the joint likelihood function $p(\mathbf{r}s)$ can be decomposed into a product of the likelihood functions of the two modules. Using the approximation of each ${Q}_{1}(\mathbf{r}s)$ and ${Q}_{2}(\mathbf{r}s)$ as Gaussian sums (see Materials and methods), we have the following
Thus, the contribution of the $p$:th and $q$:th mode of ${\widehat{Q}}_{1}$ and ${\widehat{Q}}_{2}$ to the joint likelihood function is
where we in the last step renamed ${s}_{ML}^{(1)}+p{\lambda}_{1}R$ and ${s}_{ML}^{(2)}+q{\lambda}_{2}R$ to s_{p} and s_{q}, respectively. Unless the width $w$ of the tuning curves or the range $R$ is very large, all the modes of ${\widehat{Q}}_{1}$ and ${\widehat{Q}}_{2}$, respectively, are well separated (see the end of the section). Thus, it is a reasonable approximation that the maximum of $p(\mathbf{r}s)={Q}_{1}(s){Q}_{2}(s)$ is defined by the maximum of ${\widehat{Q}}_{1}^{p}(s){\widehat{Q}}_{2}^{q}(s)$ across all combinations of $p$ and $q$. Each combination ${\widehat{Q}}_{1}^{p}(s){\widehat{Q}}_{2}^{q}(s)$ reaches its maximum for some stimulus ${s}_{p,q}^{*}$:
Taking the derivative w.r.t. $s$ on the rightmost terms and solving gives
Thus, using ${\delta}_{p,q}=({s}_{p}{s}_{q})$, the maximal value of each pair ${\widehat{Q}}_{1}^{p}(s){\widehat{Q}}_{2}^{q}(s)$ is
Thus, the maximum likelihood choice will approximately be ${s}_{p,q}^{*}$ for the $p$:th and $q$:th mode with the smallest ${\delta}_{p,q}^{2}$, that is, the smallest distance between the modes. Lastly, all modes of ${\widehat{Q}}_{1}$ and ${\widehat{Q}}_{2}$, respectively, need to be sufficiently separated such that no two pairs of $p$ and $q$ reinforce each other. However, it is well known the full width at half maximum for a Gaussian function is $\text{FWHM}=2\sqrt{2\mathrm{ln}2}{\sigma}_{i}$, where for our functions ${\widehat{Q}}_{1}^{p}$ and ${\widehat{Q}}_{2}^{q}$, ${\sigma}_{1}=1/\sqrt{{J}_{1}}$ and ${\sigma}_{2}=1/\sqrt{{J}_{2}}$. Thus, given the expression for Fisher information in Equation A2.8, the FWHM can be expressed as
Thus, for the modes to be separated, it is reasonable to require that the FWHM is no longer than one period length of the module, that is, ${\lambda}_{i}$. Hence, we have that
Rewriting this into a bound on the time $T$ needed for the assumption of separation, we get
For the parameters used in our simulations, this is satisfied very fast, on the order of tens of microseconds. However, note that the assumption of each module providing efficient estimates, which is a prerequisite for these approximations, requires significantly longer time scales. Thus, the assumption that the individual modes of ${\widehat{Q}}_{1}$ and ${\widehat{Q}}_{2}$ are wellseparated is, in our case, not likely to be a restrictive assumption.
Appendix 4
Calculate ${M}_{k,l,m}$
To approximate the minimal decoding time, we need to calculate (see Equations 38; 39, main text)
For $k\ne l\ne m$, using Supplementary Equation A1.1, Equation A1.2, Equation A1.3, we have
Thus, ${M}_{k,l,m}$ for $k\ne l\ne m$ becomes
as odd functions over even intervals integrate to zero. For $k\ne l=m$ (note that $k=l\ne m$ and $k=m\ne l$ follows by symmetry) we have
and hence,
Lastly, for $k=l=m$ we have,
Thus ${M}_{k,k,k}({\mathbf{s}}^{\ast})$, becomes
Each term above have a dependence on $\mathrm{sin}(\frac{2\pi}{{\lambda}_{i}R}({s}_{k}^{*}{s}_{k,i}^{\prime}))$, with an odd power. Therefore, when multiplying with $f({\mathbf{s}}^{*})$ and integrating as above, these terms vanish. Hence, we can focus only on the terms including $f({\mathbf{s}}^{\circ})$. After some calculus manipulation, it is possible to reduce the expression to include only $T\mathrm{sin}(\frac{2\pi}{{\lambda}_{i}R}({s}_{k}^{*}{s}_{k,i}^{\prime})){f}_{i}({\mathbf{s}}^{\circ})$ (for all $i$).
Unfortunately, as this integral includes both ${s}_{k}^{*}$ and ${s}_{k}^{\circ}$, no simple expression can be obtained. Using the variable substitution ${\theta}_{k}^{*}=\frac{2\pi}{{\lambda}_{j}R}({s}_{k}^{*}{s}_{k}^{\prime})$, we can simplify it slightly to
Instead, we focus on the difference ${\varphi}_{j}^{*}={\varphi}_{k}({\lambda}_{j})={s}_{k}^{\circ}{s}_{k}^{*}$, which maximizes the above integral for each module. Thus, all integrals can be upper bounded by a constant ${C}^{*}$, yielding the upper bound
Note that the constant ${C}^{*}$ can be incorporated into the regression coefficient K_{1} in Equation 47 and that the stimulus range, $R$, is assumed to be $R=1$ in the main text.
Appendix 5
Approximating minimal required spike count
Given the approximation of minimal decoding time in Equation 9 (main text), we seek to reformulate the approximation in terms of the required total spike count, instead. The average total spike count for a given population and stimulus condition is
where $T$ is the decoding time. Thus, the average spike count over both stimulus conditions (assuming uniformly distributed stimulus and integer spatial frequencies) and trials for the entire population is
Consequently, the number of spikes evoked by the stimulusrelated tuning of the population is
Inserting Supplementary Equation A5.3 into Equation 46 (main text) reveals the number of stimulusevoked spikes, ${\mu}_{stim}^{*}$, the population must produce before reaching the predicted lower bound
Data availability
Code has been made publicly available on Github (https://github.com/movitzle/Short_Decoding_Time, copy archived at Lenninger, 2023).
References

Effects of noise correlations on information Encoding and decodingJournal of Neurophysiology 95:3633–3644.https://doi.org/10.1152/jn.00919.2005

Experimental evidence for sparse firing in the neocortexTrends in Neurosciences 35:345–355.https://doi.org/10.1016/j.tins.2012.03.008

Optimal shortterm population coding: When Fisher information failsNeural Computation 14:2317–2351.https://doi.org/10.1162/08997660260293247

Mutual information, Fisher information, and population codingNeural Computation 10:1731–1757.https://doi.org/10.1162/089976698300017115

Reading population codes: a neural implementation of ideal observersNature Neuroscience 2:740–745.https://doi.org/10.1038/11205

A limit to the speed of processing in ultrarapid visual Categorization of novel natural scenesJournal of Cognitive Neuroscience 13:171–180.https://doi.org/10.1162/089892901564234

What grid cells convey about rat locationThe Journal of Neuroscience 28:6858–6871.https://doi.org/10.1523/JNEUROSCI.568407.2008

Why neurons mix: high Dimensionality for higher cognitionCurrent Opinion in Neurobiology 37:66–74.https://doi.org/10.1016/j.conb.2016.01.010

Computing with populations of Monotonically tuned neuronsNeural Computation 15:2115–2127.https://doi.org/10.1162/089976603322297313

Receptive fields, Binocular interaction and functional architecture in the cat's visual cortexThe Journal of Physiology 160:106–154.https://doi.org/10.1113/jphysiol.1962.sp006837

Nonlinear mixed selectivity supports reliable neural computationPLOS Computational Biology 16:e1007544.https://doi.org/10.1371/journal.pcbi.1007544

Correlations and neuronal population informationAnnual Review of Neuroscience 39:237–256.https://doi.org/10.1146/annurevneuro070815013851

Performance breakdown in optimal stimulus decodingJournal of Neural Engineering 12:036012.https://doi.org/10.1088/17412560/12/3/036012

SoftwareShort_Decoding_Time, version swh:1:rev:10086d954d5baaf5bf2c4e5f5b8ec75492e21c19Software Heritage.

Bayesian inference with probabilistic population codesNature Neuroscience 9:1432–1438.https://doi.org/10.1038/nn1790

Optimal population codes for space: grid cells outperform place cellsNeural Computation 24:2280–2317.https://doi.org/10.1162/NECO_a_00319

Novel behavioral paradigm reveals lower temporal limits on mouse olfactory decisionsThe Journal of Neuroscience 35:11667–11673.https://doi.org/10.1523/JNEUROSCI.469314.2015

Orientation selectivity in Macaque V1: diversity and Laminar dependenceThe Journal of Neuroscience 22:5639–5651.https://doi.org/10.1523/JNEUROSCI.221305639.2002

Processing speed in the cerebral cortex and the Neurophysiology of visual maskingProceedings of the Royal Society of London. Series B 257:9–15.https://doi.org/10.1098/rspb.1994.0087

Nonlinear population codesNeural Computation 16:1105–1136.https://doi.org/10.1162/089976604773717559

Communication in the presence of noiseProceedings of the IRE 37:10–21.https://doi.org/10.1109/JRPROC.1949.232969

Grid cells generate an analog errorcorrecting code for singularly precise neural computationNature Neuroscience 14:1330–1337.https://doi.org/10.1038/nn.2901

Information Encoding and the responses of single neurons in the Primate temporal visual cortexJournal of Neurophysiology 70:640–654.https://doi.org/10.1152/jn.1993.70.2.640
Decision letter

Peter LathamReviewing Editor; University College London, United Kingdom

Panayiota PoiraziSenior Editor; Foundation for Research and Technology Hellas, Greece

Stefano PanzeriReviewer; UKE, Germany
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Are singlepeaked tuning curves tuned for speed rather than accuracy?" for consideration by eLife. Your article has been reviewed by 2 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Panayiota Poirazi as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Stefano Panzeri (Reviewer #2).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
While this paper makes an extremely important point, one that should definitely be communicated to the broader community, in our opinion it suffers from two (relatively easily fixable) problems:
I. Unless one is an expert, it's very hard to extract why multipeaked tuning curves lead to catastrophic errors.
II. It's difficult to figure out under what circumstances multipeaked tuning curves are bad. This is important, because there are a lot of neurons in sensory cortex, and one would like to know whether periodic tuning curves are really a bad idea there.
And here are the fixes:
I. Figure 1c seems like a missed opportunity to explain what's really going on, which is that on any particular trial the positions of the peaks of the log likelihood can shift in both phase and amplitude (with phase being more important). The reason it's a missed opportunity is that Figure 1c shows the average log likelihood, which makes it hard to understands why there are catastrophic errors. It would really help if Figure 1c were expanded into its own large figure, with sample log likelihoods showing catastrophic errors for multipeaked tuning curves but not for singlepeaked ones. You could also indicate why, when multipeaked tuning curves do give the right answer, the error tends to be small.
II. What the reader really wants to know is: would sensory processing in real brains be more efficient if multipeaked tuning curves were used? That's certainly hard to answer in all generality, but you could make a comparison between a code with singlepeaked tuning curves and a _good_ code with multipeaked tuning curves. Our guess is that a good code would have λ_1=1 and c around 0.5 (but not, of course, exactly 1/2; you could use the module ratio the grid cell people came up with  we think 1/sqrt(2)  although we doubt if it matters much). What would be great would be a relatively simple relation, possibly fitted empirically to simulations, that told us when multipeaked tuning curves are better than singlepeaked one, given the number of neurons, background and peak firing rates, and dimensionality. Of course, this may be impossible, but if so it's important to state that up front, so that people know how difficult this is.
In addition, we have a number of comments, all designed to improve the paper. Feel free to disagree with any of them; we don't feel massively strongly about them. Also, if you implement them, no reason to provide a reply  we're sensitive to the fact that replies to the reviewer are becoming longer than the paper!
Reviewer #1 (Recommendations for the authors):
1. When catastrophic errors are possible, we agree that it's very hard to find a good measure of error. RMS error clearly isn't very good. However, it's not clear minimal decoding time is any better, since it'd defined in terms of RMS error. Finding a good measure seems critical when comparing codes, given that Fisher info can change by 5 orders of magnitude while decoding time changes by only 1 (e.g., Figure 4d). Moreover, we could be wrong, but our intuition is that when λ_1=1, catastrophic errors aren't much of an issue. Is that correct? If so, time to threshold (defined to be time for the RMS error to be twice that predicted by the inverse of the Fisher info) could be a bad measure.
Here we don't have any really great suggestions. But you might try something like [<(error)^k>/((2k1)!!)]^{1/k} for k larger than 2 and even (the factor of (2k1)!!) makes this equal to the RMS error for Gaussian distributions. But other measures are possible.
2. Are you using periodic boundary conditions? Tuning curves are clearly periodic, but decoding near the boundaries is a bit tricky. For instance, the ML estimator can put the decoded value outside the range [0, 1). This should be clarified.
3. It should be clear that the expression for Fisher info (Equation 3) applies when a_i=a and b=0.
4. Figure 1a: it would be helpful to indicate the location of s_1 and s_2 in the top panel.
Also, isn't Figure 1a a bit misleading? The problem isn't a small number of spikes; it's a phase shift in the log likelihood.
5. l 99100: "For the largest mode of the joint likelihood function to also be centered close to the true stimulus condition, the distance δ between s^(1)_ML and s^(2)_ML must be smaller than between any other pair of modes of Q_1 and Q_2." We can see why this might be reasonable, but it's not totally obvious that the distance between the modes is the only thing that matters. This needs to be shown.
6. l 100101: "Thus, to avoid catastrophic errors, δ must be smaller than some largest allowed distance δ* (see Methods for calculation of δ*)." This makes it seems like δ* is something complicated. But in fact it's exactly half the smallest distance between any other pair of modes. Why not say so?
7. l 1201: "estimates can cause catastrophic shifts in the joint likelihood function. For singlepeaked tuning curves (λ_1 = 1), however, only small scale factors c can pose such problems." Should "For singlepeaked tuning curves" be "When the first mode has a single peak". If not, we're lost.
8. l 128: "Furthermore, only populations with unambiguous codes over the stimulus interval were included [22]." Can't parse.
9. Equation 8 seems unlikely to be correct, given that it doesn't depend on c, which we know is critical (as shown in Figure 1e). However, if you do keep it, it would be helpful to express it in terms of the average firing rate. If you do that, lots of terms go away.
10. Figure caption 2c: "colored lines show the estimated minimal decoding time from simulations and the black lines show the fitted theoretical predictions". Presumably the description of the fitted lines was given on on l. 178. However, we can't tell from this description what you actually did. Equations are needed.
And presumably this also comes up on l. 211: "ratios of fitted regressors K1 were approximately 1.69 and 1.72 for D = 1 and D = 2, respectively".
11. l. 179: "Within this range of scale factors". Within what range?
12. l 1845: "As suggested by Figure 2d, there is also a strong correlation between Fisher information and minimal decoding time again indicating a speedaccuracy tradeoff." There's certainly a tradeoff between Fisher info and minimal decoding time, but, as mentioned in point 1, minimal decoding time isn't always a great measure. You should be up front about this.
13. The tables should be moved to the main text  the parameters (especially the number of modules) are highly relevant. Or what would be even more useful would be to put the parameters in the figure captions.
14. Figure 5: by "Gaussian pop." and "Periodic pop." do you mean "Single peaked" and "multipeaked"? If so, this should be clear.
15. l 303: "Thus, minimal decoding time should set a bound on the number of features a population can jointly encode reliably." Very interesting prediction! Which you point out on lines 3257: "Our work gives a compelling reason to understand whether and how biological brains can reliably encode highdimensional stimuli at behaviorally relevant time scales." But this should be quantified; see point 1.
Reviewer #2 (Recommendations for the Authors):
The paper makes a simple but important point. It is also remarkably well explained especially considering the complexity of the calculations. For once, I do not feel the need to make many suggestions for improvement.
One point that could be better addressed regards the justification of speed of encoding based on data on sensory cortex. The introduction seems heavy on the visual system based on the results of Thorpe and colleagues. In the discussion (lines 288289) the authors point out (though the sentence is not very strongly linked to the overall text flow) that the first few spikes carry significant information in the visual and olfactory system. While the point of raising the importance f speed of encoding in sensory areas is valid, I am not sure that it should privilege the visual system or the visual and olfactory system. I think that somatosensory modalities encode information faster and earlier than visual modalities, including with regard to the prominence of first spikes (see e.g. Panzeri et al. Neuron 2001). If anything, visual cortical data show encoding in longer time scales. It would be nice to have a more precise introduction to and discussion of these topics to motivate this work and to evaluate its implications.
https://doi.org/10.7554/eLife.84531.sa1Author response
Essential revisions:
While this paper makes an extremely important point, one that should definitely be communicated to the broader community, in our opinion it suffers from two (relatively easily fixable) problems:
I. Unless one is an expert, it's very hard to extract why multipeaked tuning curves lead to catastrophic errors.
II. It's difficult to figure out under what circumstances multipeaked tuning curves are bad. This is important, because there are a lot of neurons in sensory cortex, and one would like to know whether periodic tuning curves are really a bad idea there.
And here are the fixes:
I. Figure 1c seems like a missed opportunity to explain what's really going on, which is that on any particular trial the positions of the peaks of the log likelihood can shift in both phase and amplitude (with phase being more important). The reason it's a missed opportunity is that Figure 1c shows the average log likelihood, which makes it hard to understands why there are catastrophic errors. It would really help if Figure 1c were expanded into its own large figure, with sample log likelihoods showing catastrophic errors for multipeaked tuning curves but not for singlepeaked ones. You could also indicate why, when multipeaked tuning curves do give the right answer, the error tends to be small.
We thank the reviewer for this suggestion. We have now split the first figure into two.
In the new Figure 1, we provide an intuitive explanation of local vs catastrophic errors and singlepeaked / periodic tuning curves. We have also added smaller panels to illustrate how the distribution of errors changes with decoding time (using a simulated singlepeaked population).
The new Figure 2 shows sampled likelihoods for 3 different populations. We hope this provides some intuitive understanding of the phase shifts. Unfortunately, it proved difficult not to normalize the “height” of each module’s likelihood as they can differ by several orders of magnitude across the modules. However, due to the multiplication, the peak likelihood values can (approximately) be disregarded in the MLdecoding. Lastly, we have also added more simulation points (scale factors) compared to what we had in the earlier version of the figure (see panels de).
II. What the reader really wants to know is: would sensory processing in real brains be more efficient if multipeaked tuning curves were used? That's certainly hard to answer in all generality, but you could make a comparison between a code with singlepeaked tuning curves and a _good_ code with multipeaked tuning curves. Our guess is that a good code would have λ_1=1 and c around 0.5 (but not, of course, exactly 1/2; you could use the module ratio the grid cell people came up with  we think 1/sqrt(2)  although we doubt if it matters much). What would be great would be a relatively simple relation, possibly fitted empirically to simulations, that told us when multipeaked tuning curves are better than singlepeaked one, given the number of neurons, background and peak firing rates, and dimensionality. Of course, this may be impossible, but if so it's important to state that up front, so that people know how difficult this is.
We thank the reviewer for this comment and the suggestions. We agree, ideally such an expression would be useful. However, as you note it is a very challenging task due to the large parameter space (number of neurons, peak amplitude, ongoing background firing rate, width of tuning, stimulus dimensionality etc) and is beyond the scope of the present study. We have instead included a new figure (see Figure 7 in the manuscript) detailing the minimal decoding times for various choices of parameter values. We believe this gives an indication to how minimal decoding time scales with various parameters.
In addition, we have a number of comments, all designed to improve the paper. Feel free to disagree with any of them; we don't feel massively strongly about them. Also, if you implement them, no reason to provide a reply  we're sensitive to the fact that replies to the reviewer are becoming longer than the paper!
Reviewer #1 (Recommendations for the authors):
1. When catastrophic errors are possible, we agree that it's very hard to find a good measure of error. RMS error clearly isn't very good. However, it's not clear minimal decoding time is any better, since it'd defined in terms of RMS error. Finding a good measure seems critical when comparing codes, given that Fisher info can change by 5 orders of magnitude while decoding time changes by only 1 (e.g., Figure 4d). Moreover, we could be wrong, but our intuition is that when λ_1=1, catastrophic errors aren't much of an issue. Is that correct? If so, time to threshold (defined to be time for the RMS error to be twice that predicted by the inverse of the Fisher info) could be a bad measure.
Here we don't have any really great suggestions. But you might try something like [<(error)^k>/((2k1)!!)]^{1/k} for k larger than 2 and even (the factor of (2k1)!!) makes this equal to the RMS error for Gaussian distributions. But other measures are possible.
We thank the reviewer for these questions, they greatly helped us devise some additional figures for explanation. In the following we provide a reply to each of the points raised by the reviewer.
i. All populations (even singlepeaked) suffer from catastrophic errors when the decoding time is too short.We have added a panel showing this for singlepeaked tuning, see Figure 1ce.
ii. We acknowledge that the choice of measure for minimal decoding time is not a perfect measure, but wedo not agree that it is not better than only studying RMSE. We believe they all have their strengths and weaknesses. The choice of minimal decoding time criteria compares the empirical RMSE with the predicted RMSE if no catastrophic distortion is present (i.e., inverse of Fisher information). Thus, this measure does provide new information that neither RMSE nor Fisher information in themselves capture. We have highlighted this in points ivv.
iii. Regarding the order of magnitude, we believe that although the minimal decoding time differs only onone order of magnitude, ecologically it could be more beneficial to reduce the decoding time with (say) 30 ms per neural population while still providing reasonable accuracy.
iv. Regarding catastrophic errors for λ_1 = 1, we have added a comparison between singlepeakedtuning curves and periodic tuning curves (λ_1 = 1, and c = 1/1.44) using the optimal scale factor for a 2dimensional stimulus (Figure 7 ac). It clearly shows how large estimation errors occur in the periodic population even after the periodic population has lower MSE than the singlepeaked. However, only the singlepeaked population have approached the CRbound. This provides an example of when RMSE and Fisher information alone can provide a possibly erroneous conclusion.
v. About minimal decoding time being the wrong measure, we have also added a supplementary figureusing an alternative definition of minimal decoding time – instead of comparing MSE with the CRbound, we compared the entire distribution of empirical errors against the Gaussian distribution predicted by Fisher information. Using onesided KolmogorovSmirnov (KS) tests, we gradually increased the decoding time until the empirical distribution could not be distinguished from the predicted distribution with a significance level of 0.05. To correct for multiple sequential hypothesis testing, we imposed a Bonferronitype of correction, scaling the significance level with the number of sofar considered decoding times. I.e., 0.05 / j, where “j” is the j:th comparison. This correction does not seem to change the results significantly and it is comparable to the value without the correction. This alternative definition gives qualitatively similar results (see Figure 3 – —figure supplement 4). [See lines 269274]
Thus, we believe that all measures, MSE, Fisher information and minimal decoding time, have their strengths and weaknesses – therefore we argue they should be considered together and not in isolation!
2. Are you using periodic boundary conditions? Tuning curves are clearly periodic, but decoding near the boundaries is a bit tricky. For instance, the ML estimator can put the decoded value outside the range [0, 1). This should be clarified.
Yes, thank you for catching this. We assume that the stimulus is periodic in the range [0,1) for precisely this reason. We have updated the manuscript to clarify this [Line 9699].
3. It should be clear that the expression for Fisher info (Equation 3) applies when a_i=a and b=0.
Thank you for noticing this. We have specified it in the text [Line 110]
4. Figure 1a: it would be helpful to indicate the location of s_1 and s_2 in the top panel.
Thank you for noticing this. We have mentioned it in the figure.
Also, isn't Figure 1a a bit misleading? The problem isn't a small number of spikes; it's a phase shift in the log likelihood.
The catastrophic errors are not just a property of Bayesian estimators, such errors are observed for any nonlocal estimator. This type of illustration as shown in our Figure is well established as an intuitive explanation of these errors (see Shannon, Sreevivasan, Wernersson, etc.). When presenting this work, we have found that this illustration is very useful for providing some intuition about the role of catastrophic errors. Secondly, regarding the number of spikes and phase shift, our study suggests that the phase shift is related to the number of spikes (see Figure 3 – —figure supplement 5). If the number of spikes is small, then the ambiguousness of a multipeaked code might not be resolved. Thus, these two concepts are highly linked.
5. l 99100: "For the largest mode of the joint likelihood function to also be centered close to the true stimulus condition, the distance δ between s^(1)_ML and s^(2)_ML must be smaller than between any other pair of modes of Q_1 and Q_2." We can see why this might be reasonable, but it's not totally obvious that the distance between the modes is the only thing that matters. This needs to be shown.
We thank the reviewer for the suggestion. We have clarified this in the revised manuscript. See the new text in the appendix with a reference to it in the method section, but it is (of course) an approximation. [line 574577 and Appendix 3]
6. l 100101: "Thus, to avoid catastrophic errors, δ must be smaller than some largest allowed distance δ* (see Methods for calculation of δ*)." This makes it seems like δ* is something complicated. But in fact it's exactly half the smallest distance between any other pair of modes. Why not say so?
Strictly speaking δ* is not half the distance of any other pair of peaks.\δ* depends on \δ_{0,0} (see equations 1116). It is unfortunately not easy to describe it in words.
7. l 1201: "estimates can cause catastrophic shifts in the joint likelihood function. For singlepeaked tuning curves (λ_1 = 1), however, only small scale factors c can pose such problems." Should "For singlepeaked tuning curves" be "When the first mode has a single peak". If not, we're lost.
Yes, we agree with the reviewer. We have, corrected this [see Line 183]
8. l 128: "Furthermore, only populations with unambiguous codes over the stimulus interval were included [22]." Can't parse.
We have added a clarification that the only requirement is that c is not ½, ⅓, ¼ etc. [see Lines 196197]
9. Equation 8 seems unlikely to be correct, given that it doesn't depend on c, which we know is critical (as shown in Figure 1e). However, if you do keep it, it would be helpful to express it in terms of the average firing rate. If you do that, lots of terms go away.
Equation 8 is correct, we clarified that it is sample averages and that the scale factor is implicitly defined within this average. We also added the suggested change, but as this is (strictly speaking) only true for integer number of peaks – we opted to add it as an approximate equation. If not integer peaks, the average firing rate is not exactly this expression for homogeneous amplitudes a_i = a. As it turns out in simulations, though, it is approximately true. [See lines 222225]
10. Figure caption 2c: "colored lines show the estimated minimal decoding time from simulations and the black lines show the fitted theoretical predictions". Presumably the description of the fitted lines was given on on l. 178. However, we can't tell from this description what you actually did. Equations are needed.
And presumably this also comes up on l. 211: "ratios of fitted regressors K1 were approximately 1.69 and 1.72 for D = 1 and D = 2, respectively".
We have added equation references for this. [Line 259]
11. l. 179: "Within this range of scale factors". Within what range?
We have changed the text to “Within the simulated range of scale factors” [Line 259]
12. l 1845: "As suggested by Figure 2d, there is also a strong correlation between Fisher information and minimal decoding time again indicating a speedaccuracy tradeoff." There's certainly a tradeoff between Fisher info and minimal decoding time, but, as mentioned in point 1, minimal decoding time isn't always a great measure. You should be up front about this.
See previous comments.
13. The tables should be moved to the main text  the parameters (especially the number of modules) are highly relevant. Or what would be even more useful would be to put the parameters in the figure captions.
Thank you for this suggestion, we have done so for Figures 14. For the spiking model, we have not included any equations in the result section and therefore we are not sure adding the parameters would make sense. Therefore, we kept the tables in the method section for Figures 56.
14. Figure 5: by "Gaussian pop." and "Periodic pop." do you mean "Single peaked" and "multipeaked"? If so, this should be clear.
Thank you for catching this oversight. We have corrected this. See Figures 56
15. l 303: "Thus, minimal decoding time should set a bound on the number of features a population can jointly encode reliably." Very interesting prediction! Which you point out on lines 3257: "Our work gives a compelling reason to understand whether and how biological brains can reliably encode highdimensional stimuli at behaviorally relevant time scales." But this should be quantified; see point 1.
We thank the review for this comment. We hope the new Figure 7 will clarify this.
Reviewer #2 (Recommendations for the Authors):
The paper makes a simple but important point. It is also remarkably well explained especially considering the complexity of the calculations. For once, I do not feel the need to make many suggestions for improvement.
One point that could be better addressed regards the justification of speed of encoding based on data on sensory cortex. The introduction seems heavy on the visual system based on the results of Thorpe and colleagues. In the discussion (lines 288289) the authors point out (though the sentence is not very strongly linked to the overall text flow) that the first few spikes carry significant information in the visual and olfactory system. While the point of raising the importance f speed of encoding in sensory areas is valid, I am not sure that it should privilege the visual system or the visual and olfactory system. I think that somatosensory modalities encode information faster and earlier than visual modalities, including with regard to the prominence of first spikes (see e.g. Panzeri et al. Neuron 2001). If anything, visual cortical data show encoding in longer time scales. It would be nice to have a more precise introduction to and discussion of these topics to motivate this work and to evaluate its implications.
Thank you, this is an important point. We agree that sensory modalities ought to have similar constraints, and might operate on even faster time scales. Our results might also be applicable outside of the visual areas, too. We focused on early visual processing due to the established smooth singlepeaked tuning and the (often) periodic sensory features (e.g., orientation and color). For other modalities, we are not sure the tuning is continuous (e.g., odor) or periodic (e.g., somatosensory stimuli). All the simulations assumed periodic stimuli, however, we do not believe the periodic stimulus is of great importance for our results. For discrete stimuli, catastrophic errors can also occur – but the calculations made here assumed continuous variables. Extending this work to discrete stimulus points is an interesting direction.
We have expanded more on the applicability outside of the visual area in the discussion. [Lines 411, 446450].
https://doi.org/10.7554/eLife.84531.sa2Article and author information
Author details
Funding
Digital Futures
 Movitz Lenninger
 Mikael Skoglund
 Pawel Andrzej Herman
 Arvind Kumar
Vetenskapsrådet
 Arvind Kumar
Institute of Advanced Studies (Fellowship)
 Arvind Kumar
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
We thank the reviewers and editors for their helpful comments on improving the manuscript and Dr. Pascal Helson for proofreading the manuscript.
Senior Editor
 Panayiota Poirazi, Foundation for Research and Technology Hellas, Greece
Reviewing Editor
 Peter Latham, University College London, United Kingdom
Reviewer
 Stefano Panzeri, UKE, Germany
Version history
 Preprint posted: September 10, 2022 (view preprint)
 Received: October 28, 2022
 Accepted: May 11, 2023
 Accepted Manuscript published: May 16, 2023 (version 1)
 Version of Record published: June 9, 2023 (version 2)
Copyright
© 2023, Lenninger et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 631
 Page views

 82
 Downloads

 0
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Neuroscience
How does the human brain combine information across the eyes? It has been known for many years that cortical normalization mechanisms implement ‘ocularity invariance’: equalizing neural responses to spatial patterns presented either monocularly or binocularly. Here, we used a novel combination of electrophysiology, psychophysics, pupillometry, and computational modeling to ask whether this invariance also holds for flickering luminance stimuli with no spatial contrast. We find dramatic violations of ocularity invariance for these stimuli, both in the cortex and also in the subcortical pathways that govern pupil diameter. Specifically, we find substantial binocular facilitation in both pathways with the effect being strongest in the cortex. Nearlinear binocular additivity (instead of ocularity invariance) was also found using a perceptual luminance matching task. Ocularity invariance is, therefore, not a ubiquitous feature of visual processing, and the brain appears to repurpose a generic normalization algorithm for different visual functions by adjusting the amount of interocular suppression.

 Neuroscience
Tastes typically evoke innate behavioral responses that can be broadly categorized as acceptance or rejection. However, research in Drosophila melanogaster indicates that taste responses also exhibit plasticity through experiencedependent changes in mushroom body circuits. In this study, we develop a novel taste learning paradigm using closedloop optogenetics. We find that appetitive and aversive taste memories can be formed by pairing gustatory stimuli with optogenetic activation of sensory neurons or dopaminergic neurons encoding reward or punishment. As with olfactory memories, distinct dopaminergic subpopulations drive the parallel formation of short and longterm appetitive memories. Longterm memories are protein synthesisdependent and have energetic requirements that are satisfied by a variety of caloric food sources or by direct stimulation of MBMP1 dopaminergic neurons. Our paradigm affords new opportunities to probe plasticity mechanisms within the taste system and understand the extent to which taste responses depend on experience.