Representational untangling by the firing rate nonlinearity in V1 simple cells
Abstract
An important computational goal of the visual system is ‘representational untangling’ (RU): representing increasingly complex features of visual scenes in an easily decodable format. RU is typically assumed to be achieved in high-level visual cortices via several stages of cortical processing. Here we show, using a canonical population coding model, that RU of low-level orientation information is already performed at the first cortical stage of visual processing, but not before that, by a fundamental cellular-level property: the thresholded firing rate nonlinearity of simple cells in the primary visual cortex (V1). We identified specific, experimentally measurable parameters that determined the optimal firing threshold for RU and found that the thresholds of V1 simple cells extracted from in vivo recordings in awake behaving mice were near optimal. These results suggest that information reformatting, rather than maximisation, may already be a relevant computational goal for the early visual system.
https://doi.org/10.7554/eLife.43625.001
Introduction
The visual cortex relies on a series of hierarchically organised processing stages to construct increasingly complex representations of the visual environment (Orban, 2008; Tafazoli et al., 2017; Ungerleider and Haxby, 1994). An important goal of this processing hierarchy in the ventral visual stream is to represent information about stimuli in a format that facilitates behaviourally relevant tasks, such as object recognition, identification, or classification. This reformatting of information has been called ‘representational untangling’ (RU; DiCarlo and Cox, 2007; Bengio et al., 2013) and it is often formalised via the concept of linear decodability. Linear decoding measures the extent to which a particular stimulus feature (such as the identity of an object) is explicitly encoded in a representation such that it can be accurately estimated based on a simple weighted sum of the activation of representational units (Bishop, 2006). For example, classification boundaries for different objects appear highly nonlinear and ‘tangled’ in the space of pixel intensities, or the space of retinal or primary visual cortical (V1) activities, thus preventing efficient linear decoding of object identity information. In contrast, in inferior temporal cortex (IT), these boundaries become untangled, making object identity information linearly decodable (DiCarlo et al., 2012). Critically, linear decodability of a stimulus feature requires not only that neural responses are modulated by this feature, but also that they simultaneously remain tolerant to (that is, depend only weakly or trivially on) other, ‘nuisance’ stimulus features that can also change across stimuli.
For example, while neural responses across the whole visual system, from the retina and V1 to IT, will change when different objects are presented, the activation of select IT cells shows tolerance to changes in ‘nuisance’ parameters, such as illumination, location or scale (Brincat and Connor, 2004; DiCarlo and Cox, 2007; Ito et al., 1995; Logothetis et al., 1994; Tanaka, 1996; Vogels and Biederman, 2002).
The neural mechanisms underlying RU are largely unknown, with most previous work focussing on RU of object category information in IT, and how the cascaded nonlinear input-output transformations of the early stages of the visual hierarchy contribute to it (Hung et al., 2005; Pagan et al., 2013; Yamins and DiCarlo, 2016). In contrast, we study an elementary form of RU that is already taking place at the first stage of visual cortical processing, in V1, and uses a simple and ubiquitous property of single neurons: the firing rate nonlinearity (FRNL), that is, the nonlinear transformation between a cell’s membrane potential and its instantaneous firing rate. In V1, there is broad agreement that an image feature that is explicitly represented is local orientation. Indeed, several studies investigated linear decodability of stimulus orientation directly from V1 firing rates (or spike counts) as a function of tuning curve properties (Ecker et al., 2011; Seriès et al., 2004; Seung and Sompolinsky, 1993; Shamir and Sompolinsky, 2006), noise correlations (Berens et al., 2012; Moreno-Bote et al., 2014), or the internal dynamics of V1 (Gutnisky et al., 2017). However, previous work did not examine membrane potential responses, and was thus not suitable for studying the specific contribution of the FRNL to linear decodability. Moreover, these studies only considered at most a single nuisance parameter (contrast), thus requiring only minimal RU to be carried out in neural responses.
In order to study the contribution of the FRNL to RU in V1, we directly compared the linear decodability of stimulus orientation from membrane potentials and firing rates, and considered some of the most prevalent nuisance parameters of visual stimuli: contrast (an elementary aspect of illumination), phase (location), and spatial period (scale). We found that nuisance parameters made the linear decodability of orientation information nontrivial: once this richer set of nuisance parameters was considered, membrane potential responses of a population of orientationselective cells remained highly tangled, and linearly undecodable. However, despite the obvious loss of total information caused by the rectifying aspect of the FRNL, which is due to all membrane potential values below the firing threshold being mapped to zero firing rate, the format of information in firing rates was more amenable to the linear decoding of orientation. This tradeoff between total information and linear decodability resulted in a clear optimum for the value of the firing threshold. In particular, the optimal firing threshold depended on a few key experimentally measurable parameters, and we confirmed in in vivo intracellular recordings that mouse V1 simple cells had their thresholds near the optimal value. These results suggest that RU may be a universal principle of organisation throughout the visual system, and it involves cellular as well as circuitlevel mechanisms.
Results
In order to study the RU of information about stimulus orientation in V1 responses, we adapted a canonical population coding model (Jones and Palmer, 1987) (Figure 1). We chose this model as it had the minimal complexity necessary for systematically studying the effects of specific single-neuron (e.g. FRNL threshold, membrane potential variability) and network parameters (e.g. population size, noise correlations) on the encoding of stimulus orientation in the face of other (nuisance) stimulus features affecting neural responses. More specifically, the membrane potential response of each model neuron was determined by the linear response of the neuron-specific oriented Gabor filter to the stimulus. These Gabor filters approximated the receptive field properties of simple cells, resulting in response characteristics that were modulated by the orientation, frequency, and phase of a static, full-field sinusoidal grating stimulus (Dayan and Abbott, 2005). To model the in vivo variability of V1 membrane potential responses in awake animals (Haider et al., 2013), these membrane potentials were then subject to additive Gaussian noise independently sampled in time windows whose length (20 ms) was approximately matched to the decay time of the autocorrelation function of simple cells (Azouz and Gray, 1999). The firing rate of each cell was obtained using a rectifying nonlinearity that has been shown to capture the FRNL of simple cells (Carandini and Ferster, 2000; Dorn and Ringach, 2003; Priebe and Ferster, 2008). RU was quantified by the performance of a linear decoder which decoded stimulus orientation from membrane potentials or firing rates in the face of noise and variability in other (nuisance) parameters of the stimulus: phase, contrast, and spatial frequency (Figure 1, blue).
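The three stages of this generative model (linear Gabor filtering, additive Gaussian membrane potential noise drawn per 20 ms window, and a rectified power-law FRNL) can be sketched in a few lines. The function names, the gain term, and the parameter values below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def grating(x, y, theta, sf, phase, contrast):
    """Full-field sinusoidal grating on a pixel grid (orientation theta,
    spatial frequency sf, phase, and contrast)."""
    return contrast * np.cos(2.0 * np.pi * sf * (x * np.cos(theta) + y * np.sin(theta)) + phase)

def gabor(x, y, theta, sf, phase, env_sd):
    """Oriented Gabor receptive field approximating a simple cell's filter."""
    carrier = np.cos(2.0 * np.pi * sf * (x * np.cos(theta) + y * np.sin(theta)) + phase)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * env_sd**2))
    return envelope * carrier

def membrane_potential(filt, stim, u_dc, gain, sigma):
    """Affine function of the linear filter response plus Gaussian noise,
    sampled independently for each 20 ms window."""
    return u_dc + gain * np.sum(filt * stim) + sigma * rng.normal()

def firing_rate(u, u_th, kappa=1.0):
    """Rectified power-law FRNL with threshold u_th and exponent kappa."""
    return np.maximum(u - u_th, 0.0) ** kappa
```

With `kappa=1` the last function reduces to the rectified linear FRNL used in most of the analyses below.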
We studied RU as a function of parameters describing the stimulus distribution as well as parameters describing the neural population. As the complete parameter space of the model (including the detailed stimulus filter of every neuron) was vast, it was unfeasible to explore it fully. Thus, we focussed on a few key characteristics of our model neurons (Figure 1, red): the mean membrane potential ($u_{\mathrm{DC}}$) and depth of modulation ($u_{\mathrm{AC}}$), which were defined based on the membrane potential response to a drifting full-field grating at 100% contrast, preferred orientation and preferred spatial period; noise variability ($\sigma$), which determined the magnitude of the noise injected into membrane potentials; and the threshold ($u_{th}$) and exponent ($\kappa$) of the single neuron FRNL. At the population level, we studied the effects of population size ($N$); decoding resolution ($K$), defined as the number of different orientation categories to be decoded; and the magnitude of noise correlations ($\rho$) with a given structure, that is, the correlation between the membrane potential noise of different cells in the population.
The effect of response rectification on representational untangling
The effects of noise and nuisance parameters on linear decodability can be understood by considering the binary discrimination of two orientations in a pair of neurons responding to stimuli with variable orientation (to be decoded) and phase (to which the decoder is required to be invariant) (Figure 2). This binary classification task (discrimination) generalises to multiclass cases, which can be regarded as combinations of pairwise comparisons. In the model, filter responses depend both on the orientation and phase of the stimulus: at any particular orientation, variability in phase induces a manifold of responses (Figure 2A, coloured ellipses). Membrane potential responses are derived from filter responses but are contaminated by noise (Figure 2A, coloured dots), which introduces uncertainty even when phase is fixed and known. Nevertheless, as long as the phase of the stimulus is fixed (and the noise is not overwhelmingly large), all membrane potential values scatter around the same value for each orientation, creating well-separated sets of joint responses so that orientation remains linearly decodable (Figure 2B, green line shows optimal classification boundary). However, variability in phase introduces a substantial amount of additional variability in responses along the corresponding manifolds, which intersect multiple times. This causes the sets of membrane potential responses to become strongly overlapping (Figure 2C, coloured dots) and the optimal decision boundary to become highly nonlinear (‘entangled’, Figure 2C, green line), such that no linear decision boundary can approximate it efficiently. Thus, even the representation of orientation information by orientation-tuned cells can become highly entangled in the presence of nuisance parameters.
Nonlinear transformations of variables can render even complex representations linearly decodable – an insight that underlies many pattern recognition algorithms (Bishop, 2006). Specifically, in our case, we focus on the rectifying aspect of the firing rate nonlinearity of neurons. This rectification effectively ‘removes’ a large part of the membrane potential response space thus letting the decoder operate only on the quadrant of superthreshold responses (Figure 2D). While this drastic removal of a large fraction of responses can clearly lead to severe total information loss (Barak et al., 2013), responses in the remaining quadrant may also become more linearly separable. This is because the density of manifold intersections generally decreases towards higher membrane potential values. In other words, the rectification only allows strongly responding cells to contribute to decoding, and the resulting sparsification of the representation will generally render it more linearly separable (Barak et al., 2013). This explains why the optimal decision boundary for firing rates remains well approximated by a line (Figure 2E), even with variability in stimulus phase (Figure 2F, the decision boundary of the optimal decoder is shown in light green while that of the best linear decoder in dark green). Thus, there is a tradeoff for RU between total information loss and sparsification controlled by the firing threshold, and this tradeoff becomes particularly acute in the face of nuisance parameter variations.
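The sparsification argument can be illustrated with a toy single-cell example (a deliberately simplified caricature, not the population model used in the paper): averaged over phase, a linear simple cell's membrane potential carries no orientation signal, because the phase carrier flips sign, whereas the phase average of the rectified response recovers the orientation tuning curve, since the mean of max(a cos φ, 0) over uniform φ equals a/π for a ≥ 0.

```python
import numpy as np

phases = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)

def mp(theta, phase, pref_theta=0.0):
    """Toy simple-cell membrane potential (arbitrary units): an orientation
    tuning curve multiplied by a phase carrier that flips sign with phase."""
    tuning = np.exp(-((theta - pref_theta) / 0.5) ** 2)
    return tuning * np.cos(phase)

# Averaged over phase, the membrane potential is orientation-blind:
mean_mp_pref = np.mean([mp(0.0, p) for p in phases])             # ~ 0

# After rectification, the phase average recovers the tuning curve:
mean_fr_pref = np.mean([max(mp(0.0, p), 0.0) for p in phases])   # ~ 1/pi
mean_fr_orth = np.mean([max(mp(1.0, p), 0.0) for p in phases])   # much smaller
```

This is only the average-response part of the story; the full argument in the text also involves how thresholding reshapes the trial-to-trial response distribution.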
Orientation decoding from a population under phase variability
In order to study the tradeoff between total information loss and sparsification quantitatively, we parametrically varied the firing threshold in a population of $N=500$ neurons, each characterised by identical firing thresholds and noise but random receptive field parameters (Figure 3—figure supplement 1A). To simplify our analysis, we assumed a rectified linear FRNL (see below for FRNLs with exponents greater than one). At each threshold level, we compared the performance of two decoders: a linear decoder, and an optimal Bayesian decoder (see Materials and methods). The linear decoder was trained and tested on different subsets of data differing in the membrane potential noise added to the linear filter responses of model neurons. The training data set was sufficiently large that asymptotic test performance was achieved, therefore the performance of the linear decoder was limited solely by the properties of the stimuli and not by the amount of data. The optimal decoder was constructed with perfect knowledge of the process generating neural responses and thus did not need to be separately trained. As it used the optimal decision boundaries, the performance of the Bayesian decoder represented a theoretical upper bound on the performance of any decoder, and could also be used to quantify the total information content of responses (Materials and methods; Panzeri et al., 1999). Each decoder was tested with the phase of the stimulus kept fixed (Figure 3A, black: linear decoder, gray: optimal decoder) or being varied (Figure 3A, dark blue: linear decoder, light blue: optimal decoder).
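As an illustration of a linear decoder of the kind used here, a minimal multinomial logistic (softmax) readout can be fit by batch gradient descent. This is a sketch of one standard choice, not necessarily the exact decoder or training procedure used in the study:

```python
import numpy as np

def train_linear_decoder(R, labels, n_classes, lr=0.5, epochs=300):
    """Softmax decoder: one weight vector and bias per orientation category,
    fit by batch gradient descent on the cross-entropy loss."""
    n, d = R.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]                    # one-hot targets
    for _ in range(epochs):
        logits = R @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n                              # cross-entropy gradient
        W -= lr * (R.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def decode(R, W, b):
    """Hard readout: the category with the largest linear score."""
    return np.argmax(R @ W + b, axis=1)
```

Training on responses with fresh noise samples, and testing on held-out noise, mirrors the train/test split described above.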
The performance of the optimal decoder simply decreased monotonically as the threshold was increased (Figure 3A, gray and light blue). This was expected because the thresholding effect of the FRNL loses information in the subthreshold range (as in this range all membrane potential values are mapped to the same zero firing rate) while the superthreshold part of the FRNL (even if it is nonlinear) represents a one-to-one mapping which does not change information content. This means that the net effect of the FRNL can only be information loss, with higher thresholds leading to larger information loss (Figure 3—figure supplement 2B). Therefore, as long as the functional objective of V1 was the maximisation of total orientation information transmitted to downstream areas (the so-called ‘infomax’ principle; Linsker, 1988; Bell and Sejnowski, 1997), one would expect to see low values of the firing threshold, clipping the membrane potential distribution as little as possible (Figure 3A, green shaded area).
In contrast to the simple monotonic decrease in total information, linear decoding performance showed a more complex, nonmonotonic dependence on the firing threshold. At the lowest values of the threshold, all membrane potential responses were superthreshold (Figure 3A, green histogram shows the distribution of membrane potential responses across all stimuli), and so decoding from firing rates was essentially equivalent to decoding from membrane potentials. Thus, as expected (Figure 2), linear decoding with variable stimulus phase was at chance (10%) at this extreme, that is, it was unable to extract any information about orientation from the membrane potential responses of the population (Figure 3B, left blue bar). Correspondingly, the coefficients of the linear decoder did not bear any systematic relationship with the preferred orientations of neurons and the decoded orientations (Figure 3—figure supplement 1B). The failure of linear decoding was due to membrane potential responses fully reversing for stimuli with anti-preferred phases. This meant that depending on stimulus phase, responses for the preferred orientation of a cell could be well above or below responses to non-preferred orientations, thus violating the monotonic relationship that linear decoding requires between responses and the match of stimulus orientation to the preferred orientation of cells. For fixed stimulus phase, linear decodability was well above chance (~87%) even at low threshold values (Figure 3B, left black bar). Note that, unlike the case suggested in Figure 2, it did not reach the performance of the optimal decoder because we allowed orientation itself to vary within each of the $K=10$ discrete decoded orientation categories, making the optimal decision boundaries slightly nonlinear even at a fixed stimulus phase.
At extremely high values of the firing threshold, membrane potential responses always remained subthreshold, keeping firing rates zero at all times. Thus, all decoders performed at chance due to this total loss of information. Between the two extremes, as the threshold increased, there was a tradeoff between two opposing effects: the total information in responses decreased, as shown by the monotonically decreasing performance of the optimal decoder (Figure 3A, gray and light blue; see also above), while responses became increasingly sparser (see Materials and methods, Figure 3A, red), increasing the linear decodability of the remaining information (see Figure 2F). As a result of this tradeoff, linear decoding had a pronounced peak with an optimum at intermediate firing threshold values, around $V_{m}=-57$ mV in this case (Figure 3A, dark blue diamond; Figure 3B, right blue bar), that is, ~3 mV above the average membrane potential (Figure 3A). This optimal firing threshold was largely independent of the precise measure used to quantify performance, whether it was simply the fraction of correct responses used here and in the following, a statistically more appropriate probabilistic fraction correct measure, or a measure that also depended on the magnitude of the (circular) error between true and decoded orientation (Figure 3—figure supplement 2A). The success of linear decoding at the optimal threshold was also reflected in the patterns of decoding coefficients: as expected, they scaled with the difference between a neuron’s preferred orientation and the decoded orientation (Figure 3—figure supplement 1C).
To assess the extent to which the FRNL threshold was effective in helping RU with local image patches instead of grating stimuli covering a large portion of the visual field, we also performed the same analysis with a 500-neuron population resembling a V1 hypercolumn, with receptive fields covering only a 3° circle. This produced results similar to those obtained with full-field stimuli (Figure 3—figure supplement 3).
In order to see how much the tradeoff between total information loss and population sparseness could account for linear decodability, we computed the correlation between the actual performance of the linear decoder and the performance that could be predicted based on scaling the performance of the optimal decoder ($\mathrm{FC}_{opt}$, indicative of total information, see above) by population sparseness: $(\mathrm{FC}_{opt}-\mathrm{chance})\times \mathrm{sparseness}+\mathrm{chance}$. We found a strong correlation across different values of the firing threshold between the actual and predicted performance of the linear decoder ($r=0.98$, Figure 3A, inset). For fixed stimulus phase, decision boundaries were generally more linear, thus the reduction of overall information dominated, which resulted in only a smaller peak in performance at around the same threshold value (Figure 3A, black diamond; Figure 3B, right black bar).
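The predictor used in this correlation analysis is simple enough to state directly in code. Here `population_sparseness` is an assumed operationalisation (the fraction of responses silenced by the threshold); the precise sparseness measure is defined in Materials and methods:

```python
import numpy as np

def population_sparseness(U, u_th):
    """Fraction of (cell, stimulus) membrane potential responses that fall
    below threshold and are therefore mapped to zero firing rate."""
    return float(np.mean(U < u_th))

def predicted_linear_fc(fc_opt, sparseness, chance):
    """Predicted linear-decoder fraction correct: the optimal decoder's
    above-chance performance scaled by population sparseness."""
    return (fc_opt - chance) * sparseness + chance
```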
Representational untangling of orientation is specific to V1
To show that the RU of orientation information is specific to V1 and does not occur at earlier stages of visual processing, we performed simulations with two other model neuron populations in which selectivities of individual neurons resembled those of neurons in the retina and the lateral geniculate nucleus (LGN). For this we used neurons that were sensitive to a single pixel in the stimulus (as a simple model of photoreceptor activations in the retina; Figure 3C, inset) or neurons characterised by center-surround receptive fields (modelling retinal ganglion and LGN cells; Figure 3D, inset). For a fair comparison with our V1 population, we used the same number of cells, with the same set of receptive field locations, and the same amount of overall signal and noise variability in their membrane potentials.
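A minimal sketch of the center-surround receptive fields for the retina/LGN control population is a difference of Gaussians; the particular widths and surround weight below are illustrative, not the values used in the study. Note that the filter is rotationally symmetric, which is why it confers no orientation selectivity:

```python
import numpy as np

def dog_filter(x, y, sigma_c=0.2, sigma_s=0.6, w_s=0.9):
    """Centre-surround (difference-of-Gaussians) receptive field modelling
    retinal ganglion / LGN cells; widths and surround weight are illustrative."""
    centre = np.exp(-(x**2 + y**2) / (2.0 * sigma_c**2)) / (2.0 * np.pi * sigma_c**2)
    surround = np.exp(-(x**2 + y**2) / (2.0 * sigma_s**2)) / (2.0 * np.pi * sigma_s**2)
    return centre - w_s * surround
```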
While linear decodability was similarly high as from V1 responses without phase nuisance (Figure 3C–D, black bars; cf. Figure 3B, black bars), it was markedly different once phase nuisance was introduced (Figure 3C–D, blue bars; cf. Figure 3B, blue bars). Not only was orientation linearly undecodable from the membrane potentials of our model retinal and LGN populations (Figure 3C–D, MP blue bars) but, in contrast to the V1 population, it also remained undecodable from their firing rates, even with the best possible choice of the firing threshold (Figure 3C–D, FR blue bars; see Appendix 1 for an intuition).
Decoding with multiple nuisance features
Simple cells in V1 show mixed selectivity to a number of stimulus features besides orientation and phase. Thus, we extended our analyses to include two more nuisance features, spatial frequency (or its inverse, spatial period) and contrast, which are two of the strongest modulators of V1 responses (Hubel and Wiesel, 1968) and, in addition, are analogous to size and illumination, which are in turn among the most commonly considered nuisance features for high-level RU (Brincat and Connor, 2004; Ito et al., 1995; Vogels and Biederman, 2002). We tested linear decoding performance with all eight possible combinations of these nuisance features varying or being fixed. When varied, each feature was sampled from a probability density that was chosen to reflect the main characteristics of natural stimulus statistics (Figure 4A, Materials and methods). When fixing a nuisance feature, we chose a value that was near the mean of the natural distribution (Figure 4A, ticks on the x-axis).
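Sampling stimuli with a chosen subset of nuisance features varying can be sketched as follows. The specific densities used here (uniform phase, beta-distributed contrast, log-normal spatial period) are illustrative stand-ins: only the uniform phase distribution is stated explicitly in the text, while the actual contrast and period densities are those of Figure 4A and Materials and methods:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_nuisances(n, vary=("phase", "contrast", "period")):
    """Sample stimulus nuisance features: each is either drawn from its
    (illustrative) density or clamped near the mean of that density."""
    phase = rng.uniform(0.0, 2.0 * np.pi, n) if "phase" in vary else np.full(n, np.pi)
    contrast = rng.beta(2.0, 2.0, n) if "contrast" in vary else np.full(n, 0.5)
    period = (np.exp(rng.normal(np.log(3.0), 0.3, n)) if "period" in vary
              else np.full(n, 3.0))
    return phase, contrast, period
```

Iterating over all eight subsets of `vary` reproduces the eight fixed/varying combinations tested above.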
Overall, the pattern of results was similar to that obtained with variability in phase only (Figure 4, black and blue for no variability in nuisance features, and variability in phase only, respectively, repeated from Figure 3A,B): more nuisance variability decreased performance (Figure 4C), and the tradeoff between total information and sparseness resulted in a peak in performance at intermediate threshold levels (Figure 4B, coloured lines). Interestingly, we found the most often studied nuisance parameter to cause the least amount of representational entanglement: variability in contrast affected linear decodability of membrane potentials the least (Figure 4C, red) because the response manifolds corresponding to changes in contrast were radial lines (as membrane potential responses simply scaled with contrast) which all intersected only at one point and thus created little additional ambiguity (not shown). Importantly, while there was a slight variation in the optimal firing threshold values that allowed maximal decoding performance (Figure 4B, coloured ticks), this variation was relatively small across the eight combinations of nuisance features we tested and remained largely unchanged when the receptive field parameters of the neurons were perturbed (Figure 4—figure supplement 1).
To test the robustness of these results to variations in stimulus statistics, we also varied the distributions of spatial frequency and contrast (Figure 5A–B). (We left the uniform phase distribution unchanged as it is unlikely that any realistic stimulus manipulation would lead to particular phases being overrepresented.) By using the original parameter distributions (Figure 4A) to construct training stimuli for the decoder and different parameter distributions to test performance, this analysis provided a test of how well the optimal thresholds generalised across different stimulus distributions. We found that the firing thresholds that were optimal for the original stimulus distributions remained near optimal with these changed distributions: there was no discernible loss of performance compared to a decoder which was trained with the modified stimulus distributions (Figure 5C–D, bars vs. horizontal lines).
While it is orientation coding that is most often associated with simple cell activity in V1, all other stimulus parameters to which simple cells show selectivity can be regarded as targets for decoding. Thus, to further test the robustness of our findings, we also studied the linear decodability of the spatial period or contrast of stimuli, while letting all other (nuisance) stimulus parameters (including orientation) vary (see Materials and methods). We found that the linear decodability of both stimulus parameters had a similar dependence on the firing threshold as that of orientation, with a distinct peak performance close to the optimal threshold for orientation decoding (Figure 4—figure supplement 2; cf. Figure 4, grey).
Parameter-dependence of the optimal firing threshold
We noted that for most model settings we studied so far (identity and distribution of fixed vs. variable nuisance stimulus features) there was a clear optimum for the firing threshold which remained roughly constant across all conditions (Figure 4). In contrast, simple scaling arguments predicted (Materials and methods) that performance should fundamentally depend on specific combinations of single cell parameters (see Figure 2B). In particular, it should depend on how much of the noise variability ($\sigma$) versus signal variability (controlled by the depth of modulation, $u_{\mathrm{AC}}$) in membrane potentials is ‘removed’ (i.e. mapped to zero firing rate) by the thresholding of the FRNL. In turn, the optimal threshold ($u_{th}^{opt}$) should shift with the mean membrane potential ($u_{\mathrm{DC}}$), and scale when noise and signal variance are jointly scaled. We tested these predictions by varying either the depth of modulation (Figure 6A, × symbols) or the noise variance of membrane potentials (Figure 6A, + symbols) and confirmed that in all these cases the appropriately normalised optimal threshold, $(u_{th}^{opt}-u_{\mathrm{DC}})/u_{\mathrm{AC}}$, showed the same, approximately linear, relationship with the noise-to-signal ratio, $\sigma/u_{\mathrm{AC}}$ (Figure 6A).
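The normalisation used throughout this analysis (and again for the experimental comparison below) is a simple affine transformation of the threshold, for example:

```python
def normalised_threshold(u_th, u_dc, u_ac):
    """Firing threshold in normalised units: shifted by the mean membrane
    potential (u_dc) and scaled by the depth of modulation (u_ac)."""
    return (u_th - u_dc) / u_ac

def noise_to_signal(sigma, u_ac):
    """Noise-to-signal ratio: noise standard deviation over depth of modulation."""
    return sigma / u_ac
```

For instance, a threshold of −57 mV with a −60 mV mean and 5 mV modulation depth corresponds to a normalised threshold of 0.6.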
To further test the robustness of this result, we systematically varied a number of other model parameters: the number of neurons in the population ($N$, Figure 6B), the number of orientation categories to be decoded ($K$, Figure 6C), the magnitude of noise correlations in the population ($\rho$, Figure 6D), and the superthreshold shape (exponent) of the FRNL of neurons ($\kappa$, Figure 6E). Wherever possible, we varied these parameters around experimentally found values. For example, average noise correlations measured in spike counts in awake V1 are typically reported to be less than 0.1 (Ecker et al., 2011), which implies average membrane potential correlations (which we are modelling here) in the range of 0.06–0.15 (Bányai et al., 2017) (Figure 6D). The average exponent of the superthreshold part of the FRNL of V1 simple cells was found to be ~1.2 (Carandini, 2004) (Figure 6E). For other parameters, those controlling the size of the population (Figure 6B) and the resolution of the estimation task (Figure 6C), experimentally validated data were not available and so we varied them severalfold to ensure our results remained robust to them. In all cases, we measured linear decoding performance while varying phase as the nuisance stimulus feature (as in Figure 4B, blue).
We found that a clear optimum existed for the firing threshold in all cases (Figure 6—figure supplement 1). Although peak performance could depend strongly on some model parameters (Figure 6—figure supplement 1), the value of the optimal threshold at which it was achieved was largely independent of these parameters (Figure 6). Specifically, as expected, peak performance increased with the number of cells in the population (Figure 6—figure supplement 1A) and decreased with the number of orientation categories to be decoded (Figure 6—figure supplement 1B), but the scaling of the optimal threshold with the noise-to-signal ratio remained invariant to either parameter (Figure 6B–C). Both peak performance (Figure 6—figure supplement 1C) and the optimal threshold (Figure 6D) were only weakly affected by increasing correlations among the responses of the neurons (either uniformly across the population, or such that they had a specific ‘information-limiting’ structure; Figure 6—figure supplement 2, see also Moreno-Bote et al. (2014) and the Appendix 1). The only other parameter that had a substantial effect on the optimal threshold (but not on peak performance, Figure 6—figure supplement 1D) was the exponent of the FRNL (Figure 6E).
V1 simple cells have near-optimal firing thresholds
The finding that the optimal firing threshold depended on only a handful of mostly directly measurable parameters of cellular responses allowed us to test experimentally whether the FRNL of simple cells supports RU in V1. For this, we studied the example of linear decoding of orientation with phase as a nuisance parameter. First, we estimated the mean, the modulation depth (both in mV) and the noise variance (in mV$^{2}$) of membrane potential responses, as well as the threshold of the FRNL (in mV), from intracellular recordings of V1 simple cells in awake mice viewing drifting full-field sinusoidal grating stimuli (i.e. with phase changing systematically; Materials and methods, Figure 7A–B, Figure 7—figure supplement 1). We then constructed model neuron populations with matching membrane potential response properties (using the experimentally measured mean, modulation depth and noise variance parameters) and for each model population computed the optimal threshold. Given our results on the parameter dependence of the optimal threshold (Figure 6), in order to compare experimentally measured and optimal threshold values, we expressed both in normalised units (subtracting the mean membrane potential and dividing by modulation depth) and plotted them as functions of the noise-to-signal ratio (the standard deviation of noise divided by modulation depth). As our data did not allow the reliable estimation of the precise value of the exponent of the FRNL (Figure 7B), we expressed our predictions for each value of the noise-to-signal ratio as the set of normalised threshold values that would result in at least 90% peak performance at any value of the exponent within the range of exponents (between 1 and 2) that earlier reports considered realistic (Carandini, 2004) (Figure 7—figure supplements 2–3).
We found that the firing thresholds determined experimentally were in this near-optimal robust performance regime in all cases, despite large differences in individual parameters across the cells we recorded (Figure 7C, blue circles). Randomly swapping parameters across cells revealed that the incidence of measured thresholds in the robust performance regime was significant (Figure 7C inset, black dots; permutation test, p=0.01), suggesting that the near-optimal thresholds we found in the actual cells required specific co-tuning of these parameters.
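The permutation test can be sketched as follows: measured thresholds are shuffled across cells, and the p-value is the fraction of shuffles that place at least as many thresholds inside the robust performance regime as the actual pairing does. The predicate `in_regime` is a placeholder for the 90%-peak-performance criterion defined above, so this is an illustration of the test's logic rather than the exact analysis:

```python
import numpy as np

def permutation_test(cell_params, thresholds, in_regime, n_perm=2000, seed=0):
    """Shuffle measured thresholds across cells and count how often a shuffled
    pairing yields at least as many thresholds inside the robust performance
    regime as the actual pairing. `in_regime(params, threshold)` is a
    user-supplied predicate."""
    rng = np.random.default_rng(seed)
    observed = sum(in_regime(p, t) for p, t in zip(cell_params, thresholds))
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(thresholds)
        hits = sum(in_regime(p, t) for p, t in zip(cell_params, shuffled))
        if hits >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one smoothing of the p-value
```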
These analyses also offered a way to directly compare our hypothesis that V1 is optimised for RU to the classical infomax hypothesis that V1 is optimised for total information transmission. Recall that infomax predicts that the optimal firing threshold is as low as possible. In principle, this optimum is well below the average membrane potential, which appears to be in contradiction with our experimental data that showed firing thresholds clearly above the average membrane potential (Figure 7C, normalised thresholds are all above 0). However, we also found that total information remained at ceiling for substantially higher values of the firing threshold (Figure 3, optimal decoder), raising the possibility that infomax may also be able to account for our data. Thus, we followed the same approach as for RU, and rather than concentrating on the unique optimal threshold, which was difficult to define without knowing the precise value of the FRNL exponent, we identified the robust performance regime as the set of normalised threshold values that would result in at least 90% maximal total information at each value of the noise-to-signal ratio, and across the whole regime of realistic FRNL exponents (Figure 7D, grey region). Only 3 out of 4 of our recorded cells were in this regime (Figure 7D, blue circles), and given the breadth of the robust performance regime, extending to infinitely low thresholds, this was not significant using the same permutation test as in the case of RU (p=0.54). Thus, in contrast to RU, the specificity of the infomax hypothesis was too limited to convincingly account for the data.
The effect of population heterogeneity
The results above were obtained assuming a homogeneous population of neurons which only differed in their receptive field locations, preferred orientations and phases, but had otherwise identical parameters. As population heterogeneity can have an important influence on neural coding (Ecker et al., 2011; Shamir and Sompolinsky, 2006), we also studied heterogeneous populations. In particular, we wondered whether our finding of near-optimal thresholds in individual cells (Figure 7) would be representative even if those cells were part of a heterogeneous population. Therefore, we constructed a population in which noise variance (${\sigma}^{2}$) and FRNL exponent ($\kappa $) were varied across cells in experimentally found ranges (Carandini, 2004). The FRNL threshold of each individual cell was then set to the value that would have been optimal in a homogeneous population in which all cells had the same parameters (${\sigma}^{2}$ and $\kappa $) as that single cell. (That is, we only optimised thresholds ‘locally’ for each neuron, rather than attempting to find the globally optimal combination of thresholds for the population of cells.) We then randomly varied the thresholds of all cells around their respective locally optimal values and asked whether there were parameter combinations which yielded better performance. We found that modifying the thresholds generally deteriorated performance, such that decoding performance was highest near the original locally optimal thresholds (Figure 8). Thus, the thresholds found to be optimal under the assumption of a homogeneous population still provided high performance in more realistic, heterogeneous populations.
This result also suggests that overall (‘global’) optimality of population decoding performance might be achieved by an optimisation rule acting locally on the firing threshold of each neuron separately, without needing information about the cellular parameters of other neurons, which would be difficult to obtain by biologically plausible mechanisms.
We also studied the effect of additional forms of heterogeneity in individual receptive field properties (size, non-circularity or inclination, preferred spatial frequency) by adjusting these parameters to approximately match those found in experiments (Jones and Palmer, 1987), and found that this also had only a small influence on performance or the optimal threshold (data not shown).
Discussion
We have shown that the firing rate nonlinearity (FRNL) of V1 simple cells contributes to the representational untangling (RU) of orientation and other low-level visual information. While decoding performance is traditionally considered to be limited by the ‘noise’ variability in neural responses (Berens et al., 2012; Chen et al., 2006), our focus on RU warranted that we explicitly took into account another, oft-neglected source of response variability: that due to variability in nuisance parameters of the stimulus, such as phase, contrast, and spatial period. We have quantified RU by the linear decodability of membrane potentials or firing rates, and found that despite the obvious (and substantial) information loss entailed by the FRNL, sparsification of responses made the format of information in firing rates more amenable to linear decoding. Our analyses suggested this effect to be specific to V1 as it did not arise in model populations of retinal or LGN cells with non-oriented receptive fields. We also found that the value of the FRNL threshold that struck the optimal balance between sparsifying responses and preserving information was robust to variations in the identity of decoded and nuisance features, and in fact most other model parameters, and depended only on a few, experimentally well-defined local (as opposed to population-wide) quantities characterising the responses of individual cells. An analysis of intracellular recordings of mouse V1 simple cells showed that the thresholds of these cells were near optimal for RU despite substantial variability in their cellular parameters. In comparison, an alternative computational objective that is often considered to be relevant for V1, information maximisation, was unable to specifically account for these data. These results suggest that the FRNL of V1 simple cells may be specifically adapted to support the RU of orientation information.
Visual processing in ecologically relevant regimes
Although the evolutionary objective of the visual system is to maximise performance on natural images, we used highly simplified, full-field sinusoidal gratings as stimuli (but note that natural image statistics were taken into account in the choice of the distribution of nuisance parameters; Figure 5A). Our choice for artificial stimuli was motivated by a number of factors. First, it allowed our results to be directly compared to a large swathe of the theoretical and experimental literature that used the same stimuli (Ecker et al., 2011; Seung and Sompolinsky, 1993; Shamir and Sompolinsky, 2006; Berens et al., 2012; Gutnisky et al., 2017). Second, it also allowed us to show that nuisance parameters, rather than the traditionally studied factors of single-neuron variability or noise correlations, are the main bottleneck for decoding (and that this bottleneck is at least partially alleviated by the firing rate nonlinearity) even for the same simple stimuli that previous studies have used, without considering the whole complexity of natural images. Third, more complex stimuli will recruit mechanisms based on lateral and feedback connections (e.g. those responsible for extra-classical receptive field effects) that the network architecture we used here cannot capture. However, our main results remained essentially unchanged when we considered a ‘hypercolumnar’ population representing local rather than global orientation (i.e. such that all cells had their receptive fields in the same location; Figure 3—figure supplement 3). Importantly, inasmuch as our model represents an appropriate approximation of at least such a hypercolumnar population, the content of natural images outside this (classical) receptive field location will not affect the decoding of the content at this location, and thus these results should generalise to natural stimuli.
In general, the performance of any (but the optimal) decoder depends on the amount of data used to train it. Indeed, we often need to make decisions based on only a few training examples, something that biological learning systems excel at (Lake et al., 2015). However, in contrast to high-level cognitive tasks requiring flexible decision making, it is reasonable to expect that the decoding of low-level visual features in an early visual area, such as V1, has been optimised on evolutionary time scales and would thus not be limited by the amount of data experienced over the lifetime of an individual. Therefore, in all cases, we trained our linear decoders with sufficient amounts of data to achieve asymptotic performance. This meant that we mostly tested generalisation performance only for new instances of membrane potential noise, not for new stimuli (but see Figure 5 for generalisation to new distributions of stimuli). Nevertheless, this approach also allowed us to demonstrate that even in the limit of infinite training data, nuisance parameters represent a fundamental challenge for RU that can be mitigated by the appropriate firing rate nonlinearity.
Decoding from firing rates versus spike counts
In previous work, decoding was often performed from spike counts rather than firing rates (Berens et al., 2012; Pitkow and Meister, 2012; Seung and Sompolinsky, 1993; Shamir, 2014; but see Abbott and Dayan, 1999; Ecker et al., 2011; Shamir and Sompolinsky, 2006). The transformation between the two entails further information loss due to the discrete nature of spike counts and potential additional (Poisson) stochasticity in them, with the magnitude of this information loss depending on the time window used for counting spikes. However, in the limit of large time windows or large populations, variability in firing rates due to nuisance parameter variability dominates over the effects of spiking variability. While the relevant time window for decoding may depend on the ecological situation and the specific task an animal is facing, the size of the population is likely large enough to allow the effective averaging out of spiking variability. Importantly, we have also shown that population size only scales overall performance but does not affect the value of the optimal threshold (Figure 6—figure supplement 1). Moreover, the Gaussian variability in membrane potentials that we assumed, combined with a deterministic conversion to firing rates and a deterministic spike generation process, has been shown to result in spike count variability that is phenomenologically similar to classical Poisson spike count models (Carandini, 2004), and in fact matches experimentally observed stimulus-dependent (orientation and contrast) changes in spike count (co)variability better (Bányai et al., 2017). Thus, we expect our results to generalise to spike count decoding from large experimentally recorded populations.
Noise correlations
We found an increase, albeit a relatively small one, in decoding performance with an increase in noise correlations (Figure 6D, showing results for uniform noise correlations; similar results, not shown, were obtained with other correlation structures). Although this may at first seem counterintuitive (correlations imply redundancy), it is well known that the effects of noise correlations depend on their relation to the tuning of cells (Averbeck et al., 2006; Lin et al., 2015), and they generally increase linear Fisher information for the particular (tuning-independent) pattern of correlations we studied (Abbott and Dayan, 1999). As expected, information-limiting correlations decreased the performance of both the optimal Bayesian decoder and the linear decoder, such that the efficiency of the linear decoder relative to the optimal decoder became higher; that is, the resulting code was relatively more linearly decodable. While other patterns of correlations may result in a decrease of performance, noise correlations in V1 tend to be small overall (Ecker et al., 2011). Moreover, as we argued above, effective noise correlations will be dominated by correlations induced by nuisance parameter variability in the settings we studied, and so the effects of ‘standard’ noise correlations are likely to be minor.
Complex cells in V1
Orientation selectivity is a central feature of V1 neurons (Hubel and Wiesel, 1968). We argued that the mixed selectivity of neurons affects the linear decodability of stimulus information adversely: if neurons are selective to additional stimulus features then variations in these will likely cause representational entanglement. Stimulus phase is particularly prone to causing representational entanglement of orientation information if neuronal responses are jointly modulated by both phase and orientation. Importantly, the level of phase selectivity greatly varies across neurons (Niell and Stryker, 2008; Skottun et al., 1991). For a neuron that is sensitive to orientation but not to phase, a so-called complex cell, variability in stimulus phase is not detrimental and decodability remains intact. It is unclear if the FRNL-based mechanism of RU contributes to the emergence of such complex cells, but there is a suggestive correspondence between the canonical model of complex cells and the architecture we studied. According to the canonical model of complex cell responses, these responses are brought about by a specific pooling of simple cell responses. Intriguingly, the mathematical form of this pooling is essentially isomorphic to the linear decoder we studied here: it takes a linear combination of the nonlinearly (typically quadratically) transformed responses of a number of simple cells differing in their preferred phases (Hubel and Wiesel, 1968) and potentially other receptive field properties (Rust et al., 2005). This suggests that the same principles that we found determine the optimal firing threshold of simple cells for an abstract linear decoder may also determine the optimal firing threshold of simple cells for the efficient operation of complex cells. Furthermore, our simulations show similar detrimental effects for nuisance parameters other than phase, including spatial period.
For these other nuisance parameters, complex cell properties have limited capacity to prevent the detrimental effect of entanglement. We argue that the FRNL provides a surprisingly effective solution for the more general problem of decoding under nuisance parameter uncertainty.
The computational role of the FRNL
We have shown that the FRNL has an important computational role in RU. Previous work implicated the FRNL of V1 cells in achieving contrast-invariant orientation tuning curves in V1 simple cells (Finn et al., 2007). This effect can also be understood as a special case of a mechanism promoting RU: contrast-invariant tuning curves contribute to more efficient contrast-invariant decoding of firing rates by ensuring that firing rates are simply scaled by contrast, and so decision boundaries for orientation decoding (Figure 2) remain radial and thus linear when contrast varies (Ma et al., 2006). However, contrast is but one of several nuisance parameters whose variability makes RU of orientation information challenging in V1, and as we have shown, other nuisance parameters (phase and spatial period) have even more dramatic effects (Figure 4). Our results thus extend previous work by placing contrast-invariant tuning curves in the wider context of RU and showing that the FRNL of simple cells plays a general role in keeping orientation information linearly decodable in the face of variability in a number of nuisance parameters.
The FRNL has been shown to contribute to performing linear classification on arbitrary mappings of a set of variables (Barak et al., 2013). In such tasks, no single input feature alone can be used for solving the task by linear readout. Thus, mixed selectivity (i.e. the property that neuronal responses depend on multiple stimulus attributes, Asaad et al., 1998; Churchland and Shenoy, 2007; Warden and Miller, 2010) was shown to be advantageous and even necessary (e.g. in higher-order association cortices during tasks requiring cognitive flexibility, Rigotti et al., 2013). In contrast, in the standard task of orientation decoding in V1 considered here, an ‘orderly’ input-output mapping is required, in which one input feature needs to be mapped to the output monotonically. For this task, the mixed selectivity of neurons is less of a blessing: if neurons had pure selectivities for orientation such that their responses did not depend on any other stimulus parameters then variation in nuisance parameters would not lead to representational entanglement. Moreover, earlier work on the effects of response sparsification on linear decodability studied a network of abstract binary neurons (Barak et al., 2013) and thus could not relate sparseness to a biophysically well-defined firing threshold as we did. Our results generalise the utility of the FRNL, showing that it pertains even to an elementary sensory decoding task.
Given that the main direct effect of the FRNL is the sparsification of neural responses (Figure 3A, red), one might intuitively reason that setting the FRNL threshold to very high values to achieve ultra-sparse codes could increase linear separability even more. In this limit, each image would be coded as a one-hot population response vector. Although one-hot population coding achieved by ultra-sparse codes has appealing theoretical properties, it fundamentally relies on assuming no noise, and requires an exponential number of neurons (in the number of nuisance parameters). In fact, in the (unrealistic) limit of no noise, the question of an optimal threshold even becomes somewhat moot, as essentially all thresholds above a minimum will perform equally well (essentially perfectly), as the broadening of the robust performance regime towards low values of the noise-to-signal ratio in Figure 7C (and Figure 7—figure supplement 3) also suggests. Moreover, we expect ultra-sparse coding to be particularly sensitive to contrast as a nuisance parameter (as the correct threshold for achieving a one-hot code will critically depend on the overall scaling of responses, which in turn depends monotonically on contrast, such that selecting a single optimal threshold is impossible). Importantly, the conditions for ultra-sparse codes are also unlikely to be met in real V1; for example, the experimentally measured levels of noise-to-signal ratio in our V1 data were well above 0 (Figure 7). Indeed, at these realistic noise levels, we found that increasing the number of neurons in the population did not favour higher thresholds, which could have potentially led to such ultra-sparse codes (Figure 6B).
The FRNL-induced increase in sparseness has also been shown to contribute to increasing mutual information between visual stimuli and the responses of retinal ganglion cells (Pitkow and Meister, 2012), such that there was an optimum for the FRNL threshold at intermediate values. Although our results may superficially suggest a similar interpretation, they are in fact orthogonal. It is important to note that, in our case, the performance of the optimal decoder (the analogue of mutual information measured by Pitkow and Meister, 2012) was a monotonically decreasing function of the FRNL threshold, without an optimum at intermediate values. The difference is due to the fact that Pitkow and Meister (2012) modelled the effects of the FRNL threshold in a regime in which spiking noise dominated. Specifically, they studied small populations of neurons ($N$ ≤ 8) and kept the average firing rate of neurons constant (by adjusting their peak firing rate) as the FRNL threshold was varied. This meant that a decrease in the threshold in their setting led to sustained firing with low spike counts, which were associated with high relative variability, thus diminishing information in the low threshold regime. In contrast, we considered a large population of neurons in which spiking noise is less relevant (and thus decoding performance does not depend on the overall scaling of firing rates) and instead the effects of nuisance parameters dominate (see above). Indeed, in line with our results, Pitkow and Meister (2012) also found that increasing population size shifted the value of the FRNL threshold at which total (mutual) information was maximised towards smaller values, such that the dominant effect was now a decrease in total information for higher thresholds.
Thus, the optimal intermediate value of the FRNL threshold we found emerged for a fundamentally different reason: because we measured linearly decodable rather than total information, the optimum is brought about by a trade-off between total information and sparseness. Taken together, these results suggest that the FRNL threshold may play an important role in neural computations at different stages of sensory processing via different mechanisms: by maximising total information transmission in the retina, and by achieving RU in the visual cortex.
Materials and methods
Population model of MP responses
The default population model for encoding stimuli consisted of $N$ = 500 simple cells whose membrane potential responses were established by calculating circular Gabor filter responses plus Gaussian noise (Figure 1B). Each circular Gabor filter (indexed by $n$) is described by six parameters: the coordinates of the center of the filter measured from the line of sight (${x}_{n}$, ${y}_{n}$), the spatial period of the plane wave component of the Gabor filter ($\lambda $), the orientation of the sinusoidal component (${\theta}_{n}\in [0,{180}^{\circ})$), the phase offset of the sinusoidal component relative to the center of the filter (${\phi}_{n}\in [0,{360}^{\circ})$), and the standard deviation of the circular Gaussian envelope ($\delta $). All angles are measured in degrees; retinal distances (coordinates, spatial period, envelope width) are measured in degrees of visual angle. Numerical values of the above filter parameters were chosen according to Table 1. In the default model, spatial period and envelope width were identical across the population and were set approximately equal to the empirical averages from Jones and Palmer (1987). Filter locations were uniformly sampled across the whole visual field (90 degrees), which ensured that not only local stimulus effects were considered. Preferred orientations and phases of the Gabor filters were uniformly sampled from the entire range of possible values (Figure 3—figure supplement 1A). Code used for simulation is available on GitHub (Gáspár, 2019; copy archived at https://github.com/elifesciences-publications/representational_untangling).
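As an illustration, such a population can be sketched numerically as follows (a minimal stand-in, not the paper's analytical implementation; the specific values of the shared spatial period and envelope width below are placeholders rather than the values of Table 1):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 500          # population size
FIELD = 90.0     # visual field width in degrees (uniform filter locations)
LAMBDA = 3.0     # shared spatial period (deg); illustrative value
DELTA = 1.5      # envelope standard deviation (deg); illustrative value

# Sample filter centers, preferred orientations and phases uniformly.
x0 = rng.uniform(-FIELD / 2, FIELD / 2, N)
y0 = rng.uniform(-FIELD / 2, FIELD / 2, N)
theta = rng.uniform(0.0, 180.0, N)   # preferred orientation (deg)
phi = rng.uniform(0.0, 360.0, N)     # phase offset (deg)

def gabor(xg, yg, xc, yc, theta_deg, phi_deg, lam=LAMBDA, delta=DELTA):
    """Circular Gabor filter evaluated at point(s) (xg, yg): a Gaussian
    envelope multiplied by a sinusoidal carrier."""
    t = np.deg2rad(theta_deg)
    # Coordinate along the wave vector of the sinusoidal component.
    xr = (xg - xc) * np.cos(t) + (yg - yc) * np.sin(t)
    envelope = np.exp(-((xg - xc) ** 2 + (yg - yc) ** 2) / (2 * delta ** 2))
    carrier = np.cos(2 * np.pi * xr / lam + np.deg2rad(phi_deg))
    return envelope * carrier
```

A model neuron's mean membrane potential is then the inner product of its filter with the stimulus (computed analytically in the paper, numerically here).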
In order to limit simulation time and eliminate discretisation noise, filter responses to sine wave stimuli were calculated analytically. A realistic, strictly finite-sized receptive field would require truncating the filter, but such truncation would prevent the analytical calculation of filter responses; therefore, only the standard exponentially decaying Gaussian envelope of the Gabor filter keeps the filter localised. The response of a circular Gabor filter with infinite domain to an infinite sine wave stimulus, when the spatial periods of the filter and the stimulus are identical, is given by:
Here the response is shifted and scaled such that the predefined value ${u}_{\mathrm{DC}}$ matches the phase-averaged DC component and ${u}_{\mathrm{AC}}$ matches the amplitude of the phase-modulated AC component at the preferred orientation at 100% contrast (for derivations in the case of unequal spatial periods see Appendix 1). In the above expression, $\lambda $ is the common spatial period, $\vartheta $ is the grating stimulus orientation, $\varphi $ is the stimulus phase relative to the line of sight, and the remaining term is a phase offset determined by the filter location. A similar expression can be obtained for more general grating stimuli (not shown). The deviation of this analytical response from the response of a truncated Gabor filter (to a local stimulus) is not substantial. Subscript indices identifying a particular filter of the population are omitted for clarity.
Parameters of the Gabor filters determine the mean response characteristics of a model neuron: the mean membrane potential was assumed to be equal to the linear filter response to a stimulus. As a consequence, mean responses of individual neurons to oriented grating stimuli can be characterised by a tuning curve whose peak response corresponds to the preferred orientation of the neuron. The response of a neuron was the sum of the mean response and Gaussian noise. The noise, however, was not necessarily independent: in some experiments (Figure 6D), correlation between membrane potential responses was introduced by using multivariate Gaussian noise to determine the responses of the complete population. Noise was assumed to have no temporal structure beyond a limited time window; therefore, samples were considered to be i.i.d. samples in 20 ms time bins. Amplitude and contrast parameters of stimuli were kept constant throughout the simulations; filter responses were scaled and shifted to match empirical neural responses, here characterised by ${u}_{\mathrm{DC}}$ and ${u}_{\mathrm{AC}}$ (Table 1). The noise level was set to 25% relative to the signal variance (matched to the example simple cell in Carandini, 2004).
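The uniformly correlated multivariate Gaussian noise used in the correlation experiments can be generated, for example, from a uniform-correlation covariance matrix (a sketch; the population size and correlation value here are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(1)

N = 5          # small population for illustration
sigma2 = 1.0   # noise variance per cell
c = 0.2        # uniform pairwise noise correlation (illustrative)

# Uniform-correlation covariance: sigma2 on the diagonal,
# c * sigma2 for every off-diagonal pair.
cov = sigma2 * (c * np.ones((N, N)) + (1 - c) * np.eye(N))

mean_response = np.zeros(N)   # stand-in for the Gabor filter responses
samples = rng.multivariate_normal(mean_response, cov, size=100_000)

# Empirical correlation matrix of the sampled noise.
emp = np.corrcoef(samples.T)
```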
Population models for LGN and retina responses
To study the potential contribution of the FRNL to RU at earlier stages of the visual processing hierarchy, we defined LGN and retina models by altering the filter properties of the encoding population. Retinal receptive fields were approximated by pixel responses, while LGN receptive fields were approximated by difference-of-Gaussians filters (DoG; Dayan and Abbott, 2005). In order to be able to use the analytic calculations derived for Gabor filters, we approximated DoG responses as the difference of two constrained Gabor filters. The sinusoidal component of each Gabor filter was modified such that the phase relative to the Gabor center (${\phi}_{n}$) was zero and the spatial period was increased so that there was practically no modulation within the Gaussian envelope. The sizes of the central ON-region and peripheral OFF-region were set to 1.5° and 4°, respectively. To be able to contrast the retina, LGN and V1 analyses, filter positions were matched to those of the V1 population, and retina/LGN filter responses were calibrated such that the stimulus-driven variance of the model neurons was equal to that of their corresponding neurons in V1.
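A sketch of this construction (the amplitude normalisation and the reading of the 1.5° and 4° region sizes as Gaussian widths are our assumptions for illustration, not the paper's exact calibration):

```python
import numpy as np

def constrained_gabor(xg, yg, sigma, lam=1e3):
    """Gabor filter with zero phase offset and a spatial period so large
    that the sinusoid is practically unmodulated within the envelope."""
    envelope = np.exp(-(np.asarray(xg) ** 2 + np.asarray(yg) ** 2)
                      / (2 * sigma ** 2))
    envelope = envelope / (2 * np.pi * sigma ** 2)   # normalise amplitude
    return envelope * np.cos(2 * np.pi * np.asarray(xg) / lam)

def dog(xg, yg, sigma_center=1.5, sigma_surround=4.0):
    """Difference-of-Gaussians receptive field approximated as the
    difference of two constrained Gabor filters (ON center, OFF surround)."""
    return (constrained_gabor(xg, yg, sigma_center)
            - constrained_gabor(xg, yg, sigma_surround))
```

With the normalised amplitudes, the filter is positive at the center and negative in the surround, as required for an ON-center DoG.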
Mapping of membrane potentials to firing rates
The nonlinear transformation from membrane potential to firing rate is described by the firing rate nonlinearity (FRNL). A general threshold-power-law function, $r\left(u\right)=\Phi {\left[u-{u}_{\mathrm{th}}\right]}_{+}^{\kappa}$, which has empirically been found to fit simple cell responses well (Carandini, 2004), is used to simulate simple cell firing responses. In this expression, ${\left[.\right]}_{+}$ indicates rectification. The specific value of the scaling factor $\Phi $ does not affect the results. The power-law exponent was set to $\kappa =1$ in the simulations (except in Figure 6 and Figure 6—figure supplement 1), and the range of physiologically plausible values (Carandini, 2004) was explored when fitting membrane potential recordings (Figure 7). Parameters of the FRNL are assumed to be identical across cells (but see Figure 8). Since the nonlinearity is central to our analysis, the FRNL threshold was varied in the simulations in order to study its effect on decoding performance.
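In code, the FRNL is a one-liner (a sketch; parameter names are ours):

```python
import numpy as np

def frnl(u, u_th, kappa=1.0, scale=1.0):
    """Threshold-power-law firing rate nonlinearity:
    r(u) = scale * [u - u_th]_+ ** kappa,
    where [.]_+ denotes rectification (clipping at zero)."""
    return scale * np.maximum(u - u_th, 0.0) ** kappa
```

With kappa = 1 this is a rectified-linear function; larger exponents make the nonlinearity expansive just above threshold.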
Linear decoder
A single-layer decoder was used to perform probabilistic linear decoding of orientation information from the stimulus. The decoder represented $K$ classes and performed multinomial logistic regression by assigning probabilities to the represented classes. Weights, which could take both positive and negative values, were tuned to be optimal by supervised learning on a set of static training stimuli (Figure 3—figure supplement 1B,C). In the training data set, multiple values of the decoded parameter belonged to any given class (${M}_{\vartheta}$ in the case of orientation). Training and testing were performed on different stimulus sets, and training and testing samples differed in membrane potential noise. Wherever nuisance parameters were present, parameters of training images were sampled from the distribution characteristic of the nuisance parameter(s). Parameters of individual test images were sampled from the same distribution, except for Figure 5, where a different distribution was used for one of the nuisance parameters. Stimulus parameter distributions were constructed to approximate the characteristics of natural images. Fitting of the weight parameters of the decoder was performed in MATLAB using custom code that performs gradient descent with the Barzilai-Borwein method (Barzilai and Borwein, 1988; Gáspár, 2019). In the case of phase uncertainty, the training stimulus parameter space was uniformly covered by a 2D grid, and the same grid was used to test the decoder. ${M}_{rep}$ (and ${M}_{rep\text{'}}$) denotes the number of repetitions with newly generated noise used to generate the whole training (or testing) stimulus data set (Table 1). For any nuisance parameter characterised by a non-uniform prior (see Figure 4), a distorted multidimensional grid was used to generate the stimulus bank. For spatial period, a lognormal distribution with parameters $\mu $ = 0.95, $\sigma $ = 0.55 was used.
For contrast, a beta distribution was fitted to the local contrast distribution of a small data set containing 24-hour time-lapse images, resulting in parameters $\alpha $ = 2.4, $\beta $ = 3.6.
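The two nuisance priors above can be sampled directly; in NumPy's parameterisation, the lognormal arguments are the mean and standard deviation of the log of the spatial period, matching the text (a sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 100_000

# Spatial period prior: lognormal with mu = 0.95, sigma = 0.55.
spatial_period = rng.lognormal(mean=0.95, sigma=0.55, size=M)

# Contrast prior: Beta(alpha = 2.4, beta = 3.6) fitted to local contrasts.
contrast = rng.beta(2.4, 3.6, size=M)
```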
The resulting decoder provided class probabilities, and the maximum class probability was used to indicate the decision. We tested the robustness of our claims by using alternative decision rules. These analyses revealed variations in the measured level of decoding performance, but the optimal threshold was invariant to the choice of performance measure (Figure 3—figure supplement 2A). Decoding efficiency was measured as fraction correct.
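A minimal sketch of such a decoder (plain gradient descent on the cross-entropy loss stands in for the Barzilai-Borwein steps of the MATLAB code; all names and hyperparameters are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_decoder(R, labels, K, n_iter=2000, lr=0.5):
    """Multinomial logistic regression on population responses R
    (trials x neurons) with integer class labels in 0..K-1."""
    M, N = R.shape
    W, b = np.zeros((N, K)), np.zeros(K)
    Y = np.eye(K)[labels]                       # one-hot targets
    for _ in range(n_iter):
        G = (softmax(R @ W + b) - Y) / M        # cross-entropy gradient
        W -= lr * R.T @ G
        b -= lr * G.sum(axis=0)
    return W, b

def decode(R, W, b):
    """The maximum class probability indicates the decision."""
    return softmax(R @ W + b).argmax(axis=1)
```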
Optimal decoder
The optimal decoder was constructed by inferring class labels from data by explicitly inverting the process of response generation in the population of neurons. Membrane potential responses of individual neurons to a grating stimulus with a particular orientation and phase were described by a normal distribution,

$$p\left({u}_{n}\mid \vartheta ,\phi \right)=\mathcal{N}\left({u}_{n};\,{g}_{n}(\vartheta ,\phi ),\,{\sigma}^{2}\right),$$
where ${g}_{n}(\vartheta ,\phi )$ is the Gabor filter response and ${\sigma}^{2}$ is the level of membrane potential noise. The likelihood of observing a particular firing rate response is

$$p\left({r}_{n}\mid \vartheta ,\phi \right)={p}_{n}^{0}(\vartheta ,\phi )\,\delta \left({r}_{n}\right)+H\left({r}_{n}\right)\,{\rho}_{n}\left({r}_{n}\mid \vartheta ,\phi \right),$$
where ${p}_{n}^{0}(\vartheta ,\phi )$ is the baseline probability of a zero firing rate response, $H\left(\cdot \right)$ is the Heaviside function, and ${\rho}_{n}\left({r}_{n}\mid \vartheta ,\phi \right)=p\left({u}_{n}\mid \vartheta ,\phi \right){\left(dr/du\right)}^{-1}$. Assuming a discrete set of stimulus phases, the posterior probability of orientation ${\vartheta}_{k}$ is

$$p\left({\vartheta}_{k}\mid \mathbf{r}\right)=\frac{p\left({\vartheta}_{k}\right)\sum_{j}p\left({\phi}_{j}\right)\prod_{n}p\left({r}_{n}\mid {\vartheta}_{k},{\phi}_{j}\right)}{\sum_{{k}^{\prime}}p\left({\vartheta}_{{k}^{\prime}}\right)\sum_{j}p\left({\phi}_{j}\right)\prod_{n}p\left({r}_{n}\mid {\vartheta}_{{k}^{\prime}},{\phi}_{j}\right)}.$$
Priors, $p\left(\vartheta \right)$ and $p\left(\phi \right)$, were chosen to be uniform.
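For the membrane potential case, where the likelihood is a plain Gaussian, the phase-marginalising decoder can be sketched as follows (discrete phase grid, uniform priors; the array layout is our choice for illustration):

```python
import numpy as np

def posterior_orientation(u, g, sigma2):
    """Posterior over orientation classes given membrane potentials u (N,),
    marginalising a discrete set of stimulus phases with uniform priors.
    g[k, j, n] is the mean (Gabor filter) response of neuron n for
    orientation class k and phase j."""
    # Log-likelihood log p(u | theta_k, phi_j), summed over independent neurons.
    ll = -0.5 * (((u - g) ** 2).sum(axis=-1) / sigma2
                 + g.shape[-1] * np.log(2 * np.pi * sigma2))   # shape (K, J)
    # Marginalise phases with a log-sum-exp for numerical stability.
    m = ll.max()
    lk = np.log(np.exp(ll - m).sum(axis=1)) + m                # shape (K,)
    p = np.exp(lk - lk.max())
    return p / p.sum()
```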
An optimal decoder was also derived when information-limiting correlations (ILC) were assumed to contribute to the membrane potential noise. We constructed the decoder by explicitly modelling how ILC are introduced: ILC can be modelled by assuming that the value of the decoded variable (the orientation in our case) is not constant but changes stochastically. We assumed that the orientation of each stimulus, $\vartheta $, is sampled from a normal distribution around the true value, $\Theta $:

$$p\left(\vartheta \mid \Theta \right)=\mathcal{N}\left(\vartheta ;\,\Theta ,\,{\sigma}_{\mathrm{ILC}}^{2}\right).$$
In order to obtain the posterior probability of the true orientation, marginalisation over the stochastically sampled orientations, in addition to the marginalisation over phases, is required:

$$p\left(\Theta \mid \mathbf{r}\right)\propto p\left(\Theta \right)\sum_{i}p\left({\vartheta}_{i}\mid \Theta \right)\sum_{j}p\left({\phi}_{j}\right)\prod_{n}p\left({r}_{n}\mid {\vartheta}_{i},{\phi}_{j}\right).$$
Contrast and spatial period decoding
Orientation decoding is a standard measure of stimulus encoding in V1 simple cells, but the joint selectivity of simple cells to a number of other variables, including contrast and spatial period, means that the contribution of the FRNL to decoding these variables can also be directly tested. Diversity of the filter properties of the encoding population is crucial for efficient decoding. When studying the orientation decoding properties of the population, we ensured diversity in filter orientations and phases but used identical spatial period sensitivities across the population. When studying spatial period and contrast decoding, we constructed a population which represented the spatial period characteristics of the stimuli by sampling the spatial periods of the filters from the period distribution of the stimuli. Such an encoding population was used in all of the analyses in Figure 4—figure supplement 2. Here, after choosing a particular parameter for decoding, all other stimulus parameters were regarded as nuisance parameters. Classes for the decoded stimuli were established by partitioning the decoded parameter according to the cumulative distribution of the stimuli: stimulus classes were defined by the centers of partitions containing equal probability mass.
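The equal-probability-mass class construction amounts to taking quantiles of the stimulus distribution (a sketch; the function name is ours):

```python
import numpy as np

def equal_mass_classes(samples, K):
    """Partition a decoded stimulus parameter into K classes containing
    equal probability mass; class centers are the medians (mid-quantiles)
    of the partitions."""
    edges = np.quantile(samples, np.linspace(0.0, 1.0, K + 1))
    centers = np.quantile(samples, (np.arange(K) + 0.5) / K)
    return edges, centers
```

Applied, for example, to spatial periods drawn from a lognormal prior, each of the K resulting bins contains the same fraction of stimuli.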
Infomax model
The objective of the infomax model is the optimisation of information transmission with respect to the decoded variable, which is formulated through the mutual information:

$$I\left(\vartheta ;\mathbf{r}\right)=\sum_{k}\int p\left({\vartheta}_{k},\mathbf{r}\right)\,\mathrm{log}\,p\left({\vartheta}_{k}\mid \mathbf{r}\right)\,d\mathbf{r}+H\left(\vartheta \right),$$
where the first term on the right-hand side is the conditional entropy of the class label given the possible responses, the second term is the entropy of the category distribution, and the index k runs over the different orientation classes. Note that the conditional probability under the logarithm in the first term is the posterior of the class label. The second term in Equation 8 does not depend on the response distribution, and is therefore not affected by changes in the firing properties of the neurons, including the firing threshold.
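Equation 8 itself is not reproduced in this text; a standard form consistent with the description above, writing $k$ for the class label and $\mathbf{r}$ for the population response, is

```latex
I(k;\mathbf{r}) \;=\; -\,H(k \mid \mathbf{r}) + H(k), \qquad
H(k \mid \mathbf{r}) = -\sum_{k} \int p(\mathbf{r})\, p(k \mid \mathbf{r}) \log p(k \mid \mathbf{r})\; \mathrm{d}\mathbf{r}, \qquad
H(k) = -\sum_{k} p(k) \log p(k)
```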
The probabilistic fraction correct (PFC) performance measure,
where m runs over the trials of the experiment, is intimately related to the mutual information (Equation 8). This can be seen by taking the logarithm of the PFC:
which corresponds to a Monte Carlo approximation to the integral in the conditional entropy in Equation 8 since in individual trials stimuli and population responses to the particular stimulus can be regarded as samples:
This indicates that the PFC performance measure is an approximation to the exponent of the mutual information, up to a scaling constant determined by the entropy of the stimuli (Figure 3—figure supplement 2B).
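Written out (the geometric-mean form of the PFC is our assumption, chosen because it is the form for which the logarithm becomes a Monte Carlo average), with $k_m$ the true class and $\mathbf{r}_m$ the response on trial $m$ of $M$:

```latex
\mathrm{PFC} = \left[ \prod_{m=1}^{M} p\!\left(k_m \mid \mathbf{r}_m\right) \right]^{1/M},
\qquad
\log \mathrm{PFC} = \frac{1}{M} \sum_{m=1}^{M} \log p\!\left(k_m \mid \mathbf{r}_m\right)
\;\approx\; -\,H(k \mid \mathbf{r}),
\qquad
\mathrm{PFC} \;\approx\; e^{I(k;\mathbf{r})}\, e^{-H(k)}
```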
Experimental subjects and surgical procedures
All experimental procedures were approved by the University of California Los Angeles Office for Protection of Research Subjects and the Chancellor's Animal Research Committee. 1- to 12-month-old C57Bl6/J mice underwent implantation of headbars, surgery recovery, acclimation to the spherical treadmill, and craniotomy over V1 as described in Polack et al. (2013). A 3 mm diameter coverslip drilled with a 500 μm diameter hole was placed over the dura such that the coverslip fit entirely in the craniotomy and was flush with the skull surface. The coverslip was held in place using Vetbond and dental cement, and the recording chamber was filled with cortex buffer containing 135 mM NaCl, 5 mM KCl, 5 mM HEPES, 1.8 mM CaCl2 and 1 mM MgCl2. The headbar was fixed to a post and the mouse was placed on the spherical treadmill to recover from anesthesia. All recordings were performed at least 2 hr after the end of anesthesia.
Electrophysiological recordings
Two-photon guided in vivo whole-cell recordings were performed as described in Polack et al. (2013). Long-tapered micropipettes made of borosilicate glass (1.5 mm outer diameter, 0.86 mm inner diameter; Sutter Instrument) were pulled on a Sutter Instruments P-97 pipette puller to a resistance of 3–7 MΩ, and filled with an internal solution containing 115 mM potassium gluconate, 20 mM KCl, 10 mM HEPES, 10 mM phosphocreatine, 14 mM ATP-Mg, 0.3 mM GTP, and 0.01–0.05 mM Alexa-594. Whole-cell current-clamp recordings were performed using the bridge mode of an Axoclamp 2A amplifier (Molecular Devices), further amplified and low-pass filtered at 5 kHz using a Warner Instruments amplifier (LPF 202A). Series of current pulses of small intensity (typically −100 pA) were used to balance the bridge and compensate the pipette capacitance. The membrane potential was not corrected for liquid junction potentials (estimated to be about 10 mV).
Visual presentation
Visual stimuli were presented as described in Polack et al. (2013). A 40 cm diagonal LCD monitor was placed in the monocular visual field of the mouse at a distance of 30 cm, contralateral to the craniotomy. Custom-made software developed with Psychtoolbox in MATLAB was used to display drifting sine-wave gratings (single orientations at the preferred orientation, temporal frequency = 2 Hz, spatial frequency = 0.04 cycles per degree, contrast = 100%). Each orientation was presented for 3 s, preceded by the presentation of a gray isoluminant screen for an additional 3 s.
Data processing of intracellular measurements
MP changes of V1 simple cells induced by the drifting grating stimulus were fitted with three parameters: MP DC level ($u_{\mathrm{DC}}$), signal (AC) amplitude ($u_{\mathrm{AC}}$), and noise level ($\sigma$). In addition, the FRNL threshold was extracted from the MP recordings for each neuron individually. First, a spike removal algorithm was applied to the raw MP data to obtain the generator potential (Carandini, 2004) and the times of action potentials. Our spike removal algorithm used composite analytical fits (Figure 7—figure supplement 1) to accurately subtract spikes from the raw MP data. This allowed precise estimation of residual MP levels at the locations of spikes, which was required for the FRNL calculation. Since the phase of the periodic membrane potential modulation shows some variance across trials, and there are trials in which the phasic modulation of the membrane potential is difficult to determine, we extracted individual cycles of the periodic modulation from the 3 s duration of stimulus presentation (six cycles). The three parameters of the membrane potential response were calculated for each trial from the corresponding data segments. To do so, the cycles from a given trial were overlaid. The DC level (in mV) was established by averaging across cycles and time; the AC level (in mV) was established by averaging across the overlaid cycles and measuring the amplitude of the average modulation; the noise level (in mV²) was measured by calculating the variance across overlaid cycles.
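A minimal sketch of this per-trial parameter extraction, assuming the overlaid cycles are given as a (cycles × samples) array and taking the AC amplitude as half the peak-to-trough range of the mean cycle (both layout and amplitude convention are our assumptions):

```python
import numpy as np

def mp_trial_parameters(cycles):
    """Estimate DC, AC and noise level of the membrane potential from a
    single trial, given as a (n_cycles, n_samples) array of overlaid
    cycles of the periodic response.

    DC (mV):      mean across cycles and time.
    AC (mV):      amplitude of the cycle-averaged modulation, taken here
                  as half the peak-to-trough range of the mean cycle.
    noise (mV^2): variance across cycles, averaged over time points."""
    cycles = np.asarray(cycles, dtype=float)
    dc = cycles.mean()
    mean_cycle = cycles.mean(axis=0)
    ac = 0.5 * (mean_cycle.max() - mean_cycle.min())
    noise = cycles.var(axis=0, ddof=1).mean()
    return dc, ac, noise
```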
For the estimation of the parameters of the FRNL, the method described by Carandini (2004) was used (Figure 7B). The trial-averaged generator potential was segmented into 20 ms windows, and the mean membrane potential levels in these windows were used to construct a histogram with 1 mV bin size. Another histogram was constructed for the number of spikes generated at any given membrane potential level. To obtain the mean firing rate corresponding to a membrane potential level, spike counts were normalised by the total duration of membrane potential segments in that bin. A threshold-linear function was fitted to the FRNL determined this way to estimate the FRNL threshold.
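A compact sketch of this estimation procedure. The inputs are hypothetical per-window summaries (mean MP and spike count per 20 ms window), and the grid-search threshold fit is our illustrative substitute for the authors' fitting method:

```python
import numpy as np

def frnl_from_windows(v_mean, spike_counts, window_s=0.02, bin_mv=1.0):
    """Build the firing-rate-vs-membrane-potential curve from per-window
    data and fit a threshold-linear function rate = g * max(v - theta, 0).

    v_mean:       mean membrane potential (mV) in each 20 ms window
    spike_counts: spikes emitted in each window
    Returns bin centres, rate per bin (Hz), and the fitted threshold."""
    v_mean = np.asarray(v_mean, dtype=float)
    spike_counts = np.asarray(spike_counts, dtype=float)
    edges = np.arange(v_mean.min(), v_mean.max() + bin_mv, bin_mv)
    nbins = len(edges) - 1
    idx = np.clip(np.digitize(v_mean, edges) - 1, 0, nbins - 1)
    # total time spent and spikes fired at each MP level
    time_per_bin = np.bincount(idx, minlength=nbins) * window_s
    spikes_per_bin = np.bincount(idx, weights=spike_counts, minlength=nbins)
    rate = np.where(time_per_bin > 0, spikes_per_bin / np.maximum(time_per_bin, 1e-12), np.nan)
    centres = edges[:-1] + bin_mv / 2
    ok = ~np.isnan(rate)
    # grid search over candidate thresholds; slope by least squares at each
    best_err, best_theta = np.inf, centres[0]
    for theta in centres:
        x = np.maximum(centres[ok] - theta, 0.0)
        g = (x @ rate[ok]) / (x @ x) if x.any() else 0.0
        err = ((g * x - rate[ok]) ** 2).sum()
        if err < best_err:
            best_err, best_theta = err, theta
    return centres, rate, best_theta
```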
Appendix 1
1 Intuition for the specificity of representational untangling of orientation decoding to V1
The reason for the difference between V1 and these upstream areas in the ability of the FRNL to help linear decodability can be understood through a simple intuition. The main effect of the FRNL is distinguishing between stimuli based on whether the corresponding neural responses exceed the firing threshold or not. This is useful for orientation decoding with phase nuisance when cells can attain different maximal membrane potential values depending on stimulus orientation even when pooling across all possible stimulus phases. For Gabor filter receptive fields this is clearly the case: at the preferred orientation of a cell, it will be deeply modulated by phase, and will attain a high maximal membrane potential value at its preferred phase, while at the orthogonal orientation it will be unmodulated by stimulus phase and will thus only attain intermediate values of the membrane potential. If the firing threshold is between these intermediate and maximal values, the FRNL can contribute to decoding. In contrast, cells with nonoriented receptive fields, such as those we used to model upstream areas, will show the same amount of membrane potential modulation at any stimulus orientation, and thus the attained maximal membrane potentials will not differ between stimulus orientations. As a result, no firing threshold will be able to distinguish between different orientations, and linear decoding performance remains at chance, just as we found.
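This intuition can be illustrated numerically (all filter and grating parameters below are hypothetical, not those of the model in the main text): an oriented Gabor filter attains very different maximal responses across stimulus orientations when pooling over phase, while an isotropic Gaussian filter does not.

```python
import numpy as np

# pixel grid and two toy filters
ax = np.linspace(-5, 5, 101)
X, Y = np.meshgrid(ax, ax)
envelope = np.exp(-(X**2 + Y**2) / 2)
gabor = envelope * np.cos(2 * np.pi * X / 2.0)  # oriented carrier along x
isotropic = envelope                            # no preferred orientation

def max_response_over_phase(filt, theta, lam=2.0, n_phases=24):
    """Maximal filter output to a grating of orientation theta, pooled
    over grating phase (a proxy for the maximal membrane potential)."""
    phases = np.linspace(0, 2 * np.pi, n_phases, endpoint=False)
    proj = 2 * np.pi * (np.sin(theta) * X - np.cos(theta) * Y) / lam
    return max(float((filt * np.cos(proj + p)).sum()) for p in phases)
```

With this convention, the Gabor's preferred orientation is theta = pi/2: its phase-pooled maximum there greatly exceeds that at the orthogonal orientation, whereas the isotropic filter's maximum is (up to discretisation) identical at all orientations, so no threshold can separate them.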
2 Informationlimiting correlations
Imposing a flat correlation structure on the membrane potential responses had a small effect on peak performance (Figure 6—figure supplement 1C) and on the optimal threshold (Figure 6D). The effect of correlations, however, depends not only on their magnitude but also on their specific structure: information-limiting correlations (ILC; Moreno-Bote et al., 2014) imply covariations in neural population responses that are indistinguishable from those caused by changes in the decoded stimulus (stimulus orientation in our case) and are known to have detrimental effects on coding. We investigated the effect of ILC by introducing orientation noise in the stimulus: the orientation of the input stimulus was randomly sampled from a five-degree normal distribution. The 'private' noise added to the linear filter responses (Figure 1) was rescaled so that the total variance, the joint effect of the variance in the stimulus drive and that of the 'private' noise, remained the same as in the earlier simulations (3 mV). ILC limits decoding performance by introducing uncertainty about the identity of the stimulus orientation, which can also be seen in the decreased performance of the Bayesian decoder (Figure 6—figure supplement 2). Importantly, while ILC also affected linear decoding by reducing overall performance, the qualitative properties of the firing rate decoder remained intact: the decoder was characterised by a clear optimal threshold and the location of the optimum was similar to that in the independent noise case. Interestingly, the decline in linear decoding performance relative to the performance of the optimal decoder was smaller when variance was partly caused by ILC than in a network where noise was independent across neurons (90% vs. 78% of optimal performance, respectively).
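The rescaling of the 'private' noise can be sketched as follows, assuming the stimulus-drive variability and the private noise are independent so that their variances add (the stimulus-drive SD value in the usage example is purely illustrative):

```python
import math

def private_noise_sd(total_sd, stimulus_drive_sd):
    """Return the SD of the independent 'private' noise such that the
    total membrane potential variability (stimulus-drive jitter plus
    private noise) matches a fixed budget, e.g. the 3 mV used in the
    simulations. Assumes independent sources, so variances add."""
    if stimulus_drive_sd >= total_sd:
        raise ValueError("stimulus-drive variability exceeds the noise budget")
    return math.sqrt(total_sd**2 - stimulus_drive_sd**2)
```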
3 Analytical calculation of Gabor filter responses to sine grating stimuli
The response of a Gabor filter ($G(x,y)$) to a general sine wave stimulus,
is calculated in a coordinate system aligned to the two cardinal axes of the Gabor filter. In this coordinate system the general form of a Gabor filter is given by
and the response of the filter is
The integral is broken down into two terms according to the two terms of Equation S1
We introduce $\tilde{I}_{0}(x,y)$ and $\tilde{I}_{1}(x,y)$ such that $I_{0} = G_{0}\, S_{0}\, \tilde{I}_{0}$ and $I_{1} = G_{0}\, S_{1}\, \tilde{I}_{1}$, where we omit the variables $x$ and $y$ for clarity of notation. We focus on the calculation of $\tilde{I}_{1}$, since $\tilde{I}_{0}$ can be obtained by substituting $p=q=0$ and $r=\pi/2$. Expansion of the trigonometric functions results in sixteen terms, only four of which yield nonzero double integrals:
Using the addition and subtraction formulae for trigonometric functions,
Using $\int_{-\infty}^{\infty} \mathrm{e}^{-a x^{2}} \cos(mx)\, dx = \sqrt{\frac{\pi}{a}}\, \mathrm{e}^{-\frac{m^{2}}{4a}}$, all the integrals can be expressed in closed form:
After expansion and simplification the above equation results in
Finally this can be rewritten in a compact form
Substituting $p=q=0$ and $r=\pi/2$, we obtain a closed-form expression for $\tilde{I}_{0}$:
Up to this point, for mathematical convenience, sine and cosine waves were parametrised by the wave vectors $(u,v)$ and $(p,q)$ together with the phases $w$ and $r$. In general, a wave vector $(k_{x}, k_{y})$ can be mapped onto the more commonly used spatial period ($\lambda$) and orientation ($\vartheta$) parameters through
In the above derivation we assumed the coordinate system to be centred on the centre of the Gabor filter. Since in a population of Gabor filters this simplifying assumption cannot hold for most filters, we relax it to allow an arbitrary centre for the Gabor filter. Under these conditions, a Gabor filter is characterised by eight parameters: $x_{0}$, $y_{0}$, $\vartheta_{G}$, $\sigma$, $\epsilon$, $\delta$, $\lambda_{G}$, $\phi_{G}$, where $(x_{0}, y_{0})$ is the centre of the Gaussian envelope, $\vartheta_{G} \in [0,\pi)$ is the orientation of its major axis measured from the horizontal axis of the reference coordinate system, $\sigma$ is the largest variance of the Gaussian in the direction of the major axis, $\epsilon$ is the ratio of the minor and major axes, $\delta \in [0,\pi)$ is the relative orientation of the cosine wave component measured from the major axis, $\lambda_{G}$ is the period of the cosine, and $\phi_{G}$ is its phase at the centre of the Gaussian. This set of parameters can be mapped to our original set of six parameters ($G_{0}$, $a$, $b$, $u$, $v$, $w$) by
Analogously, $\tilde{I}_{1}$ can also be expressed in this coordinate system, using the parameters spatial period ($\lambda_{S}$), orientation ($\vartheta_{S}$), phase ($\phi_{S}$), and amplitude ($S_{1}$). The spatial period and the amplitude are independent of the coordinate system, but orientation ($\vartheta'$) and phase ($\phi'$) undergo a transformation upon a change of coordinate system:
Taken together, the response of a Gabor filter with parameters $x_{0}$, $y_{0}$, $\vartheta_{G}$, $\sigma$, $\epsilon$, $\delta$, $\lambda_{G}$, $\phi_{G}$ to a sinusoidal grating stimulus, parametrised by $S_{0}$, $S_{1}$, $\lambda_{S}$, $\vartheta_{S}$, $\phi_{S}$, is
where $\phi_{0} = 2\pi \left[ \sin(\vartheta_{S})\, x_{0} - \cos(\vartheta_{S})\, y_{0} \right] / \lambda_{S}$.
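As a numerical sanity check on the derivation above, the Gaussian–cosine integral it relies on, $\int_{-\infty}^{\infty} \mathrm{e}^{-a x^{2}} \cos(mx)\, dx = \sqrt{\pi/a}\, \mathrm{e}^{-m^{2}/(4a)}$, can be verified by simple quadrature (the grid and interval widths below are arbitrary choices):

```python
import numpy as np

def gaussian_cosine_integral(a, m, half_width=50.0, n=200001):
    """Trapezoid-rule estimate of the integral of exp(-a x^2) cos(m x)
    over a wide symmetric interval (the Gaussian tail beyond the
    interval is negligible for the parameters used here)."""
    x = np.linspace(-half_width, half_width, n)
    y = np.exp(-a * x**2) * np.cos(m * x)
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

def closed_form(a, m):
    """The closed-form value used in the derivation."""
    return np.sqrt(np.pi / a) * np.exp(-m**2 / (4 * a))
```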
Data availability
Data is available along with code at the GitHub repository https://github.com/CSNLWigner/representational_untangling (copy archived at https://github.com/elifesciencespublications/representational_untangling).
References
Neural correlations, population coding and computation. Nature Reviews Neuroscience 7:358–366. https://doi.org/10.1038/nrn1888
Cellular mechanisms contributing to response variability of cortical neurons in vivo. The Journal of Neuroscience 19:2209–2223. https://doi.org/10.1523/JNEUROSCI.190602209.1999
Population activity statistics dissect subthreshold and spiking variability in V1. Journal of Neurophysiology 118:29–46. https://doi.org/10.1152/jn.00931.2016
The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. Journal of Neuroscience 33:3844–3856. https://doi.org/10.1523/JNEUROSCI.275312.2013
Two-point step size gradient methods. IMA Journal of Numerical Analysis 8:141–148. https://doi.org/10.1093/imanum/8.1.141
The "independent components" of natural scenes are edge filters. Vision Research 37:3327–3338. https://doi.org/10.1016/S00426989(97)001211
Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence 35:1798–1828. https://doi.org/10.1109/TPAMI.2013.50
A fast and simple population code for orientation in primate V1. Journal of Neuroscience 32:10618–10626. https://doi.org/10.1523/JNEUROSCI.133512.2012
Underlying principles of visual shape selectivity in posterior inferotemporal cortex. Nature Neuroscience 7:880–886. https://doi.org/10.1038/nn1278
Membrane potential and firing rate in cat primary visual cortex. The Journal of Neuroscience 20:470–484. https://doi.org/10.1523/JNEUROSCI.200100470.2000
Optimal decoding of correlated neural population responses in the primate visual cortex. Nature Neuroscience 9:1412–1420. https://doi.org/10.1038/nn1792
Temporal complexity and heterogeneity of single-neuron activity in premotor and motor cortex. Journal of Neurophysiology 97:4235–4257. https://doi.org/10.1152/jn.00095.2007
Untangling invariant object recognition. Trends in Cognitive Sciences 11:333–341. https://doi.org/10.1016/j.tics.2007.06.010
Estimating membrane voltage correlations from extracellular spike trains. Journal of Neurophysiology 89:2271–2278. https://doi.org/10.1152/jn.000889.2002
The effect of noise correlations in populations of diversely tuned neurons. Journal of Neuroscience 31:14272–14283. https://doi.org/10.1523/JNEUROSCI.253911.2011
Receptive fields and functional architecture of monkey striate cortex. The Journal of Physiology 195:215–243. https://doi.org/10.1113/jphysiol.1968.sp008455
Size and position invariance of neuronal responses in monkey inferotemporal cortex. Journal of Neurophysiology 73:218–226. https://doi.org/10.1152/jn.1995.73.1.218
An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. Journal of Neurophysiology 58:1233–1258. https://doi.org/10.1152/jn.1987.58.6.1233
View-dependent object recognition by monkeys. Current Biology 4:401–414. https://doi.org/10.1016/S09609822(00)000890
Bayesian inference with probabilistic population codes. Nature Neuroscience 9:1432–1438. https://doi.org/10.1038/nn1790
Highly selective receptive fields in mouse visual cortex. Journal of Neuroscience 28:7520–7536. https://doi.org/10.1523/JNEUROSCI.062308.2008
Higher order visual processing in macaque extrastriate cortex. Physiological Reviews 88:59–89. https://doi.org/10.1152/physrev.00008.2007
On decoding the responses of a population of neurons from short time windows. Neural Computation 11:1553–1577. https://doi.org/10.1162/089976699300016142
Decorrelation and efficient coding by retinal ganglion cells. Nature Neuroscience 15:628–635. https://doi.org/10.1038/nn.3064
Cellular mechanisms of brain state-dependent gain modulation in visual cortex. Nature Neuroscience 16:1331–1339. https://doi.org/10.1038/nn.3464
Emerging principles of population coding: in search for the neural code. Current Opinion in Neurobiology 25:140–148. https://doi.org/10.1016/j.conb.2014.01.002
Implications of neuronal diversity on population coding. Neural Computation 18:1951–1986. https://doi.org/10.1162/neco.2006.18.8.1951
Inferotemporal cortex and object vision. Annual Review of Neuroscience 19:109–139. https://doi.org/10.1146/annurev.ne.19.030196.000545
'What' and 'where' in the human brain. Current Opinion in Neurobiology 4:157–165. https://doi.org/10.1016/09594388(94)900663
Task-dependent changes in short-term memory in the prefrontal cortex. Journal of Neuroscience 30:15801–15810. https://doi.org/10.1523/JNEUROSCI.156910.2010
Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience 19:356–365. https://doi.org/10.1038/nn.4244
Article and author information
Author details
Funding
Hungarian Academy of Sciences (Lendulet Fellowship)
 Merse E Gáspár
 Gergo Orban
Wellcome Trust
 Merse E Gáspár
 Máté Lengyel
Human Frontier Science Program (RGP0044/2018)
 Peyman Golshani
 Máté Lengyel
 Gergo Orban
National Institutes of Health (R01 MH105427)
 Peyman Golshani
Whitehall Foundation
 Peyman Golshani
National Brain Research Program of Hungary (20171.2.1NKP201700002)
 Gergo Orban
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
This work was supported by a Lendület Award of the Hungarian Academy of Sciences (GO), an award from the National Brain Research Program of Hungary (NAPB, KTIA_NAP_122201, GO), the Wellcome Trust (ML), the Whitehall Foundation (PG), NIH R01 grant MH105427 (PG), and the Human Frontier Science Program (RGP0044/2018; PG, ML, GO). We would like to thank Michael Einstein for assistance with the electrophysiological recordings. The authors declare no competing financial interests.
Ethics
Animal experimentation: All animal experiments were approved by University of California Los Angeles IACUC and Animal Research Committee (Protocol #06066).
Copyright
© 2019, Gáspár et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.