Expansion and contraction of resource allocation in sensory bottlenecks
Abstract
Topographic sensory representations often do not scale proportionally to the size of their input regions, with some expanded and others contracted. In vision, the foveal representation is magnified cortically, as are the fingertips in touch. What principles drive this allocation, and how should receptor density, for example, the high innervation of the fovea or the fingertips, and stimulus statistics, for example, the higher contact frequencies on the fingertips, contribute? Building on work in efficient coding, we address this problem using linear models that optimally decorrelate the sensory signals. We introduce a sensory bottleneck to impose constraints on resource allocation and derive the optimal neural allocation. We find that bottleneck width is a crucial factor in resource allocation, inducing either expansion or contraction. Both receptor density and stimulus statistics affect allocation and jointly determine convergence for wider bottlenecks. Furthermore, we show a close match between the predicted and empirical cortical allocations in a wellstudied model system, the starnosed mole. Overall, our results suggest that the strength of cortical magnification depends on resource limits.
Editor's evaluation
The article develops a mathematical approach to study the allocation of cortical area to sensory representations in the presence of resource constraints. The theory is applied to study sensory representations in the somatosensory system. This problem is largely unexplored, the results are novel, and can be of interest to experimental and theoretical neuroscientists.
https://doi.org/10.7554/eLife.70777.sa0Introduction
In many sensory systems, receptors are arranged spatially on a sensory sheet. The distribution of receptors is typically not uniform, but instead, densities can vary considerably. For example, in vision, cones are an order of magnitude more dense in the fovea than in the periphery (Goodchild et al., 1996; WellsGray et al., 2016). In the somatosensory system, mechanoreceptors are denser in the fingertips than the rest of the hand (Johansson and Vallbo, 1979). Alongside the density of receptors, the statistics of the input stimuli can also vary. For example, the fingertips are much more likely to make contact with objects than the palm (Gonzalez et al., 2014). Subsequent sensory areas are typically arranged topographically, such that neighbouring neurons map to nearby sensory input regions, for example, retinotopy in vision and somatotopy in touch. However, the size of individual cortical regions is often not proportional to the true physical size of the respective sensory input regions and, instead, representations might expand (often called magnification) or contract. For example, both the fovea and the fingertips exhibit expanded representations in early visual and somatosensory cortices, respectively, compared to their physical size (Azzopardi and Cowey, 1993; Engel et al., 1997; Sereno et al., 1995; Martuzzi et al., 2014). What determines this cortical magnification? For somatotopy, it has been proposed that cortical topography might directly reflect the density of sensory receptors (Catani, 2017). On the other hand, receptor density alone is a poor predictor of magnification (Corniani and Saal, 2020) and work on plasticity has established that cortical regions can expand and contract dynamically depending on their usage, suggesting that expansion and contraction might be driven by the statistics of the sensory stimuli themselves (Coq and Xerri, 1998; Merzenich and Jenkins, 1993; Xerri et al., 1996).
Here, we tackle this problem from a normative viewpoint, employing efficient coding theory, which has been widely used to model and predict sensory processing. Efficient coding theory includes a number of different approaches such as sparse coding, redundancy reduction and predictive coding, depending on the constraints of the proposed problem (Chalk et al., 2018). Here, we focus on efficient coding via redundancy reduction (Barlow, 1961), which suggests that neural populations are tuned to maximize the information present in the sensory input signals by removing redundant information (Atick, 1992; Atick and Redlich, 1990; Attneave, 1954; Graham and Field, 2009; Chechik et al., 2006). Efficient coding models have been most prominent in vision (Kersten, 1987; Karklin and Simoncelli, 2011; Atick, 1992; Atick and Redlich, 1990; Doi et al., 2012; Olshausen and Field, 1996; Olshausen and Field, 1997; Olshausen and Field, 2004; Bell and Sejnowski, 1997) and audition (Smith and Lewicki, 2006; Lewicki, 2002). This prior work has mostly focused on predicting the response properties and receptive field structure of individual neurons. In contrast, here we ask how receptive fields—independent of their precise structure—should tile the sensory sheet when the receptors themselves differ in density and activation levels.
Some aspects of magnification in topographic representations have been qualitatively reproduced using selforganizing maps (Ritter et al., 1992). However, these models generally lack a clear cost function and the magnification factor can be determined exactly only in rare cases, while a general expression is lacking (Ritter and Schulten, 1986). Assuming a uniform density of output neurons, cortical maps may be optimizing for the spread of incoming information to be equally distributed over these neurons (Plumbley, 1999).
In contrast to receptor density, there has been some work on how populations of neurons should encode nonuniform stimulus statistics using Fisher information (Ganguli and Simoncelli, 2010; Ganguli and Simoncelli, 2014; Ganguli and Simoncelli, 2016; Yerxa et al., 2020). This approach aims to approximate mutual information and is used to calculate optimal encoding in a neural population (Yarrow et al., 2012) however, its use is restricted to specific conditions and assumptions (Berens et al., 2011; Bethge et al., 2002). Rather than receptive fields uniformly tiling the input space, the optimal population should be heterogeneous, with receptive fields placed more densely over highprobability inputs, at detriment to lowprobability regions (Ganguli and Simoncelli, 2014). Our approach differs from this prior work: rather than maximizing information between the neural population and the stimulus itself, we instead consider information between the neural population and an initial population of receptor neurons. This places a limit on the total amount of information that can be represented.
The need for information maximization is often motivated by resource constraints. These can take the form of an explicit bottleneck, where the number of receptor neurons is greater than the number of output neurons. This is the case in the early visual system, where photoreceptors in the retina are much more numerous than the retinal ganglion cells to which they project (WellsGray et al., 2016; Goodchild et al., 1996). Other sensory systems might lack such explicit bottlenecks, but still place limits on the amount of information that is represented at a higherorder processing stage.
How then should resource allocation change for differentsized bottlenecks, given varying densities of receptors and different stimulus statistics? Here, we derive optimal neural allocations for different bottlenecks, while systematically varying receptor density and stimulus statistics. A preliminary version of these results restricted to the effects of receptor density in a 1D space was previously presented as a conference paper (Edmondson et al., 2019).
Results
We restrict ourselves to linear models and only consider secondorder statistics of the sensory signals, such that redundancy reduction simplifies to decorrelation (for examples from the visual literature, see Hancock et al., 1992; Simoncelli and Olshausen, 2001; Doi and Lewicki, 2005). We also introduce a sensory bottleneck, such that the number of output neurons is smaller than the number of receptors. Specifically, $\mathit{r}=\mathit{W}\mathit{x}$, where $\mathit{r}$ is the $m$dimensional output vector, $\mathit{x}$ is the $n$dimensional input vector (with $n>m$) containing sensory receptor responses, and $\mathit{W}$ contains the receptive field. Assuming a lownoise regime, optimal decorrelation can be achieved if $\mathit{W}$ takes the following form:
where $\mathbf{\Lambda}$ and $\mathrm{\Phi}$ contain the (top $m$) eigenvalues and eigenvectors, respectively, of the covariance matrix $\mathbf{\Sigma}={\mathit{X}}^{T}\mathit{X}$.
In the specific setup we consider here, receptors are placed in nonuniform densities across the sensory sheet, with some regions having a higher density than others. For example, in Figure 1B, the digit tip is tiled more densely than the rest of the finger. The covariance of receptor activation decreases with distance (see Figure 1C). Thus, directly neighbouring units in denser regions covary more strongly than those from less dense regions. The activation level of receptors can also vary between regions, which is modelled through scaling the covariance function (see Appendix 1 for further details). We initially focus on negative exponential covariance functions, for which the eigenvalues and the resulting allocation can be calculated analytically. The covariance between receptors x_{i} and x_{j} for two regions R_{1} and R_{2}, differing in density and activation, can thus be expressed as
where $a$ denotes the receptor activation ratio and $d$ the receptor density ratio between both regions. It can be seen that changing the density affects the decay of the covariance function and thereby ‘stretches’ the space, while changing the activation scales the function.
We approximate the regions as separate, such that no receptors are activated from both regions simultaneously. The covariance matrix across all receptors then forms a block matrix, where the covariance between separate regions is zero (Figure 1D):
Thus, eigenvalues can be calculated separately from the covariance matrices of each region (inset panels in Figure 1D), and for negative exponential covariance functions, this can be done analytically (see ‘Methods’ for mathematical derivations):
Finally, the resulting eigenvalues are ordered by magnitude for both regions combined (grey line in Figure 1E). A bottleneck is introduced by restricting the total number of output neurons. This is done by selecting from the combined set of ordered eigenvalues until the limit is reached. The proportion of eigenvalues originating from each input region determines its allocation for the chosen bottleneck width (see red dashed line in Figure 1E and F).
In the following, we will present results for the common case of 2D sensory sheets, while results for 1D receptor arrangements are summarized in Appendix 6. For ease of analysis, all cases discussed will assume two sensory input regions of equal size, which are differing in receptor density, input statistics, or both. In all our examples, receptors in the lowdensity (baseline) region are always spaced on a grid with a distance of 1.
Resource limits determine the amount of magnification
First, we investigated resource allocation in bottlenecks for heterogeneous density of receptors and heterogeneous stimulus statistics separately, whilst keeping the other factor uniform across the input regions.
Heterogeneous density
For two regions with different receptor densities, we found that the higher density region could either expand or contract relative to its input density, depending on the width of the bottleneck. Specifically, for narrow bottlenecks (smaller than approximately 10% of input neurons), the higher density region is either exclusively represented or its representation is expanded compared to a proportional density allocation (see example in Figure 2A). Mathematically, this can be explained by a multiplicative scaling of the eigenvalue function for the higher density region (see illustration in Figure 2—figure supplement 3A). In contrast, for intermediate bottlenecks, lowdensity regions expand their representation, beyond what is expected for a proportional mapping (see dashed red line in Figure 2A denoting proportional allocation), leading to a contraction of the highdensity region. For negative exponential covariance functions, this allocation converges to a fixed ratio at wider bottlenecks of $1/(1+\sqrt{d})$, where $d$ is the density ratio between both regions (dashed yellow line in Figure 2A; see ‘Methods’ for derivation), as neurons are allocated to either region at a fixed ratio (see inset in Figure 2—figure supplement 3A). Finally, for wide bottlenecks, all information arising from the lowdensity region has now been captured, and any additional output neurons will therefore be allocated to the highdensity region only. The overrepresentation of the lowdensity region thus decays back to the original density ratio. The overall nonlinear effect of bottleneck width is present regardless of the ratio between the densities (see Figure 2B). The spatial extent of the correlations over the sensory sheet (controlled by the decay constant γ in the covariance function, see ‘Methods’) determines allocation at narrow bottlenecks, and how fast the allocation converges, but does not affect the convergence limit itself (Figure 2C). As γ increases, and therefore spatial correlations decrease, the convergence limit is approached only at increasingly wider bottlenecks. The extent of magnification thus heavily depends on the correlational structure of the stimuli for narrower bottlenecks, while receptor densities are more important for wider bottlenecks.
Heterogeneous statistics
Aside from receptor densities, the stimulus statistics can also vary over the input space, leading to differences in receptor activation levels between the regions and their associated response variance. Overall, allocations for heterogeneous receptor activation are similar to those found with heterogeneous density. However, while the allocations are again a nonlinear function of bottleneck width, the representations are solely expanded for the higher activation region for all bottleneck widths (see example in Figure 2D). The extent of this expansion depends on the width of the bottleneck and is again more extreme for narrower bottlenecks (see Figure 2E). The convergence limit is $1/(1+a)$, where $a$ is the activation ratio, implying that the level of expansion and contraction in intermediate bottlenecks is more extreme than in the heterogeneous density case (see difference between red and yellow dashed lines in Figure 2A and D). Finally, the effect of spatial correlations is also more pronounced (Figure 2F).
In the cases described above, the bottleneck was constrained by the number of output neurons. Alternatively, the limiting factor might be set as the amount of information (total variance) captured. Doing so results in allocation curves that retain the nonlinear behaviour described here (see Figure 2—figure supplement 4 for illustrations). An additional consideration is the robustness of the results regarding small perturbations of the calculated eigenvalues. As allocation depends on the ranks of the eigenvalues only, low levels of noise are unlikely to affect the outcome for narrow bottlenecks, especially since eigenvalues are decaying rather steeply in this regime. On the other hand, allocation in wider bottlenecks is determined from small tail eigenvalues which are much more sensitive to noise (which is also evident when comparing the analytical solution to numerical ones). Allocation can therefore be expected to be somewhat less robust in those regimes.
In summary, we find that representations of different input regions can contract or expand, depending on the bottleneck width. This effect plays out similarly for differences in receptor density and receptor activation, however, with some crucial differences. Finally, for narrow bottlenecks, the spatial extent of the correlations across the sensory sheet becomes an important driver.
Interplay between stimulus statistics and receptor density
In sensory systems, such as in touch, both the density and input statistics vary across regions and will therefore jointly determine the resulting allocation. As a consequence, the convergence for intermediate bottlenecks will depend on both density and activation ratios, and can be calculated as $\frac{1}{(1+a\sqrt{d})}$, where $a$ is the activation and $d$ is the density ratio (see ‘Methods’ for derivation).
At narrow bottlenecks, the spread of possible allocations is much wider for varying the activation ratio (see left panel of Figure 3A) compared to varying the density ratio. Thus, for 2D sensory sheets, the activation ratio more strongly determines the possible allocation than does the density ratio. Specifically, this means that the allocation regime, that is, whether the allocation expands, contracts, or exhibits both behaviours across all bottleneck widths, is more dependent on relative receptor activation than densities (Figure 3B).
Finally, it is likely that regions with higher receptor densities will also show greater activation than lower density regions. For example, in touch, the regions on the hand with the highest receptor densities also are the most likely to make contact with objects (Gonzalez et al., 2014). In these cases, both effects reinforce each other and drive the resulting allocation further from proportional allocation (see orange lines in Figure 3A and green ellipse in Figure 3B).
Resource limits determine the strength of plasticity under changes in stimulus statistics
Over most of the lifetime of an organism, the resources available for processing sensory information and the density of sensory receptors should be relatively constant. The stimulus statistics, on the other hand, can and will change, for example, when encountering a new environment or learning new skills. These changes in stimulus statistics should then affect sensory representations, mediated by a variety of plasticity mechanisms. For example, increased stimulation of a digit will lead to an expansion of that digit’s representation in somatosensory cortex (Jenkins et al., 1990).
We asked how representations should adapt under the efficient coding framework and whether resource limits would affect the resulting changes. To answer this question, we calculated optimal allocations for different bottleneck widths, receptor densities, and stimulus statistics. We then introduced a change in stimulus statistics and recalculated the resulting allocations (see illustration in Figure 4A). As expected, we found that when increasing the receptor activation over a region (e.g. by increasing stimulation of the region), more neurons would be allocated to this region than to the other. Interestingly, however, this effect was dependent on the width of the bottleneck. The largest effects are seen for smaller bottlenecks and then diminish as the bottleneck size increases. Figure 4B demonstrates such allocation changes for three different bottleneck widths for a range of receptor densities and activation ratios. This suggests that plasticity should be relatively stronger under extreme resource constraints than in cases where limits on the information are generous.
Generalization to other covariance functions
The results above relate to negative exponential covariance functions, where allocations can be solved analytically and it is unclear whether these findings generalize to other monotonically decreasing covariance functions and are therefore robust. Specifically, negative exponential covariances are relatively ‘rough’ and allocations might conceivably depend on the smoothness of the covariance function. To test directly how optimal allocations depend systematically on the smoothness of the covariance function, we numerically calculated allocations for three covariance functions of the Matérn class, whose smoothness depends on a single parameter ν. The negative exponential considered so far is a special case of the Matérn class when $\nu =1/2$ and we also considered $\nu =3/2$ and $\nu =5/2$ (see ‘Methods’). As ν increases, the decay function becomes smoother, yielding larger correlations at short distances and smaller ones at farther distances (see Figure 5A).
We numerically calculated allocations for different density and activation ratios in 1D (Figure 5B). We focused on smaller bottleneck widths, where solutions are numerically stable, and observed a close match between the numerical and analytical solutions for the negative exponential covariance function (see dashed lines in Figure 5B).
This analysis yielded two main findings. First, allocation curves are qualitatively similar across all tested covariance functions in that regions with higher density, activation, or both are systematically overrepresented. Furthermore, this effect is more extreme at smaller bottleneck widths. Second, the resulting allocations depend systematically on the smoothness of the covariance function, such that smoother covariances yield less extreme differences in allocation. This effect is most obvious when considering the convergence points at larger bottlenecks: smoother covariance functions induce a more uniform allocation (closer to a 50:50 split) than does the negative exponential (Figure 5C). We also noticed that the convergence points appeared to follow a simple function, namely, $\frac{1}{1+\sqrt[6]{ad}}$ for $\nu =5/2$, $\frac{1}{1+\sqrt[4]{ad}}$ for $\nu =3/2$ (see dashed lines in Figure 5C), and $\frac{1}{1+\sqrt{ad}}$ for $\nu =1/2$ (which is the result obtained analytically, see Appendix 6).
In summary, monotonically decreasing covariance functions result in qualitatively similar allocations to the negative exponential studied earlier. However, precise allocations and convergence are determined by the correlational structure: smoother covariance functions lead to less extreme differences in allocation.
Predicting cortical magnification in the starnosed mole
Finally, we investigated to what extent the procedure outlined in the previous sections might predict actual resource allocation in the brain. While our model is relatively simple (see ‘Discussion’), decorrelation has been found to be a strong driver in early sensory processing, including in vision (Atick and Redlich, 1992; Graham et al., 2006; Vinje and Gallant, 2000), audition (Clemens et al., 2011; Smith and Lewicki, 2006), and touch (Ly et al., 2012), and one might therefore expect the approach to at least yield qualitatively valid predictions. As currently available empirical data makes it difficult to test the impact of different bottleneck widths on the resulting allocation directly (see ‘Discussion’), we instead focused on another predicted outcome of the proposed model: the precise interaction between receptor density and receptor activation in driving resource allocation that we presented earlier. We picked the starnosed mole as our model system because stimulus statistics, receptor densities, and cortical allocations have been precisely quantified. Moreover, the starnosed mole displays considerable variation in all these parameters, presenting a good opportunity to put the model to the test.
The starnosed mole is a mostly underground dwelling creature relying on active tactile sensing while foraging for prey (Catania and Kaas, 1997; Catania, 2020). This process is facilitated by two sets of 11 appendages arranged in a starlike pattern that make up the mole’s nose (Figure 6A). Individual rays are tiled with tactile receptors known as Eimer’s organs that detect prey, which is then funnelled towards the mouth (Catania and Kaas, 1997; Sawyer and Catania, 2016). The density of fibres innervating the Eimer’s organs differs across the rays, with rays closer to the mouth exhibiting higher densities (Figure 6B). The rays also vary in size and location with respect to the mouth. This affects their usage as the rays closest to the mouth encounter tactile stimuli much more frequently than other rays (Figure 6C). In the cortex, a clear topographic representation of the rays can be found (Catania and Kaas, 1997). However, the extent of the cortical ray representations is not proportional to the physical size of the rays. Ray 11, which sits closest to the mouth, is cortically magnified several fold and is considered the tactile equivalent of the visual fovea (Catania and Remple, 2004).
Previous work by Catania and Kaas, 1997 found that cortical sizes are correlated more strongly with the different patterns of activation across the rays, rather than their innervation densities. Using the empirical data quantifying receptor densities and stimulus statistics from this study, we investigated whether the efficient coding model could predict typical cortical representation sizes for each ray (see ‘Methods’ for details) and whether innervation densities or usage would lead to more accurate allocation predictions. As the bottleneck size between periphery and cortex is unknown for the starnosed mole, we calculated the optimal allocations over all possible bottleneck sizes. Using the model described above, we calculated allocations considering three different scenarios. The first, ‘density only’, included accurate afferent densities, but with activations set to the mean over all rays. The second model, ‘usage only’, included accurate activation ratios but mean receptor densities. Finally, the ‘full model’ included accurate values for both factors. We found that the empirical cortical representation sizes are most accurately predicted by the models that include receptor activation—the ‘receptor usage only’ and ‘full models’ (Figure 6D)—suggesting the starnosed mole could be employing an efficient coding strategy based on decorrelation in the neural representation of somatosensory inputs.
Discussion
We examined efficient population coding under limited numbers of output neurons in cases of nonuniform input receptor densities and stimulus statistics. Instead of focusing on the precise structure of the receptive field, we asked which coarse region of the sensory sheet the receptive field would fall on. We showed that the resulting allocations are nonlinear and depend crucially on the width of the bottleneck, rather than being proportional to the receptor densities or statistics. Specifically, narrow bottlenecks tend to favour expansion of a single region, whereas for larger bottlenecks, allocations converge to a constant ratio between the regions that is closer to a proportional representation. Whether, across all possible bottlenecks, allocations are always expanded, contracted, or show both expansion and contraction depends on the relative density and activation ratios, but receptor activation plays a bigger role. When allocation changes due to novel stimulus statistics, a larger fraction of output neurons will switch their receptive field to another region for narrow compared to wide bottlenecks. Finally, we demonstrated that in the starnosed mole a model that includes both accurate innervation densities and contact statistics provides a better fit to the sizes of somatosensory cortical regions than considering each of these factors alone.
Comparison with previous approaches
A key feature of efficient coding population models is the nonuniform allocation of output neurons, whereby stimuli occurring at higher probabilities are represented by a greater number of neurons. A common approach is to use Fisher Information as a proxy for mutual information, enabling the calculation of optimal output neuron density and tuning curve placement given a distribution of sensory stimuli (Ganguli and Simoncelli, 2010; Ganguli and Simoncelli, 2014; Ganguli and Simoncelli, 2016; Yerxa et al., 2020). However, these approaches maximize information between the output population and the stimulus distribution itself, such that allocating additional neurons to a given region of the input space will always lead to an increase in the overall amount of information represented. In contrast, our approach assumes a finite number of input receptors in each region. We are thus asking a different question than previous research: once information about a sensory stimulus has been captured in a limited population of receptors, what is the most efficient way of representing this information? This framing implies that once all information from a given region has been fully captured in the output population, our method does not allocate further neurons to that region. Therefore, this also places a limit on the total size of the output population as this cannot exceed the total number of input receptors.
There has also been prior work on how bottlenecks affect sensory representations, though mostly focused on how different levels of compression affect receptive field structure. For example, Doi and Lewicki, 2014 predicted receptive fields of retinal ganglion cells at different eccentricities of the retina, which are subject to different convergence ratios. More recently, such direct effects on representations have also been studied in deep neural networks (Lindsey et al., 2019). Finally, some approaches employ a different cost function than the meansquared reconstruction error inherent to the principal component analysis (PCA) method used here. For example, the Information Bottleneck method (Tishby et al., 2000; Tishby and Zaslavsky, 2015) aims to find a lowdimensional representation that preserves information about a specific output variable, while compressing information available in the input. It is quite likely that the choice of cost function would affect the resulting allocations, a question that future research should pursue.
Implications for sensory processing
The quantitative comparison of the model results with the cortical somatosensory representation in the starnosed mole provided limited evidence that one of the model predictions—namely, that receptor density and activation statistics should jointly determine cortical organization—is borne out in a biological model system. Further direct tests of the theory are hampered by the lack of reliable quantitative empirical data. Nevertheless, the model makes a number of qualitative predictions that can be directly compared to available data or tested in future experiments.
First, there is additional evidence that both receptor density and stimulus statistics drive allocation in neural populations. Similar to the starnosed model, the primate somatosensory system also exhibits nonuniform distributions of both stimulus statistics and receptor density. Both of these factors are also broadly correlated, for example, receptor densities are higher on the fingertips, which are also more likely to be in contact with objects (Gonzalez et al., 2014). Importantly, while receptor densities alone can explain some of the magnification observed in the cortical somatosensory homunculus, they cannot account for this effect fully (Corniani and Saal, 2020). Indeed, evidence from nonhuman primates shows that cortical magnification also critically depends on experience (Xerri et al., 1996; Xerri et al., 1999). Similar results have recently been obtained from the brainstem in mice (Lehnert et al., 2021). Empirically reported differences in sizes of receptive fields between the digit tips and the palm have also been recreated from typical contact across the hand during grasping (Bhand et al., 2011), suggesting an influence of statistics of hand use. Both mechanoreceptor densities (e.g. Verendeev et al., 2015) and hand use statistics (Fragaszy and Crast, 2016) differ across primates, forming the basis for a potential crossspecies study. The model presented here demonstrates how both effects can be treated within a single framework driven by information maximization.
Second, it appears that allocation in sensory systems with severely constrained resources is qualitatively in agreement with model predictions. As our results demonstrated, magnification should be more extreme the tighter the bottleneck. The best characterized and most clearly established bottleneck in sensory processing is likely the optical nerve in vision. Given that the optic nerve serves as a narrow bottleneck (approximately 12–27%; assuming 0.71–1.54 million retinal ganglion cells with 80% Midget cells [Curcio and Allen, 1990; Perry et al., 1984] and 4.6 million cones [Curcio et al., 1990]), and that the fovea contains a much higher density of cone receptors than the periphery (WellsGray et al., 2016; Curcio et al., 1990), the model would predict a large overrepresentation of the fovea, agreeing with experimental observations (see Edmondson et al., 2019, for further qualitative evidence). In order to further test the proposed model, comparisons could be made between individuals to study variations within a population. Specifically, it would be expected that optic nerves containing a high number of fibres would devote proportionally fewer of them to the fovea than optic nerves containing smaller numbers of fibres (assuming equal receptor densities and numbers in the retina). These comparisons could be extended across species, taking advantage of the fact that photoreceptor densities and optic nerve fibre counts differ across many primate species (Finlay et al., 2008). Along these lines, recent computational work has shown that the amount of neural resources allocated to the optic nerve can be expected to affect the structure of the receptive field itself (Lindsey et al., 2019). Finally, the extent of cortical areas can be controlled by experimental interventions in animals (Huffman et al., 1999), which would constitute a direct manipulation of the bottleneck.
Reallocation during development, learning, and ageing
Changing receptor densities, stimulus statistics, or resource limits over the life span of an organism should lead to a dynamic reallocation of the available resources. The most common case will be changes in stimulus statistics as both receptor densities and resources should be relatively stable. In such cases, representations should adapt to the new statistics. For example in touch, changing the nature of tactile inputs affects cortical representations (Coq and Xerri, 1998; Merzenich and Jenkins, 1993; Xerri et al., 1996). Increasing statistics of contact over a region typically leads to expansion of that region in the cortex. Our method suggests that the precise level of expansion would be dependent on the bottleneck width with larger effects observed for narrower bottleneck sizes.
For the other cases, changes in fibre numbers during development and ageing might be interpreted as a change in resources. For example, the optic nerve undergoes a period of rapid fibre loss early during development (Sefton et al., 1985; Provis et al., 1985). Similarly, fibre counts in the optic nerve decrease during ageing (Dolman et al., 1980; Jonas et al., 1990; Sandell and Peters, 2001). In this case, the model would predict a decrease in the size of peripheral representation in the bottleneck compared to the fovea. It is also possible that the receptor densities themselves may change. In touch, ageing leads to reductions in the densities of receptors in older adults (GarcíaPiqueras et al., 2019). In such cases, we have effectively increased the bottleneck width relative to our receptor population, which again should lead to the reallocation of resources.
Expansion and contraction along the sensory hierarchy
Magnification of specific sensory input regions can be observed throughout the sensory hierarchy, from the brainstem and thalamus up to cortical sensory areas. Often, the amount of expansion and contraction differs between areas. For example, the magnification of the fovea increases along the visual pathway from V1 to V4 (Harvey and Dumoulin, 2006). Which of these representations might be best addressed by the model presented here? The main component of the model is decorrelation, which has been shown to be a driving principle for efficient coding in settings where noise is low (Chalk et al., 2018). This is generally the case in lowlevel sensory processing, for example, in touch (Goodwin and Wheat, 2004). Our results might therefore best match early sensory processing up to perhaps lowlevel cortical representations. Beyond this, it is likely that noise will be much higher (Shadlen and Newsome, 1998) and for efficient codes to shift away from decorrelation (Hermundstad et al., 2014). Furthermore, while distinct bottlenecks, whether on the number of neurons or the amount of information, are common in lowlevel processing, it is less clear whether such restrictions constrain cortical processing. Whether and how such different regimes should affect neural allocations remains an open question.
Perceptual consequences
Does the allocation of output neurons lead to testable perceptual consequences? While we do not model neurons’ receptive fields directly, allocating more neurons to a given region would increase perceptual spatial acuity for that region. Indeed, cortical magnification and perceptual acuity are correlated in both vision (Duncan and Boynton, 2003) and touch (Duncan and Boynton, 2007). At the same time, the absolute limits on spatial acuity are determined by the density of receptors in each input region. A naive allocation scheme that assigns output neurons proportional to the density of receptors would therefore result in perceptual spatial acuity proportional to receptor distance. Instead, as our results have shown, the allocation should not be proportional in most cases. Specifically, for narrow bottlenecks we would expect relatively higher spatial acuity for regions with high receptor density than might be expected from a proportional allocation. Conversely, for wider bottlenecks this relationship should be reversed and spatial acuity should be better than expected for lower density regions. In agreement with these results, it has been found that in vision spatial resolution declines faster than expected with increasing eccentricity, suggesting a narrow bottleneck in the optic nerve (Anderson et al., 1991).
A second consequence is that spatial acuity should be better in regions with higher activation probability even when receptor densities are equal. Indeed, spatial discrimination in touch improves with training or even just passive stimulation (Van Boven et al., 2000; Godde et al., 2000), up to a limit that is presumably related to receptor density (Wong et al., 2013; Peters et al., 2009). Assuming a fixed resource limit, training may offer improvements to some digits to the detriment of others. Whether this is indeed the case has to our knowledge not yet been empirically tested.
Finally, previous work has shown that nonuniform tuning curves across a population will lead to characteristic biases in perceptual tasks (Wei and Stocker, 2015). While the original formulation assumed that this heterogeneous allocation of output neurons was driven by stimulus statistics alone, we have shown here that it can also be a consequence of receptor densities. Thus, perceptual biases might also be expected to arise from neural populations that efficiently represent sensory inputs sampled by nonuniform receptor populations.
Limitations and future work
We considered simple linear models based on decorrelation and demonstrated that even such seemingly straightforward models exhibit surprising complexity in how they manage tradeoffs in resource allocation under constraints. Specifically, we found that output neurons were not generally allocated proportionally to input neurons or according to some other fixed rule. It therefore stands to reason that similarly complex tradeoffs would manifest in more complex models, even though the precise allocations might differ. Nevertheless, since PCA is widely employed for decorrelation and dimensionality reduction, and therefore incorporated into many other algorithms, our results immediately generalize to several other methods. For example, it is straightforward to extend the model with additional constraints (e.g. response sparsity or power constraints) that would not affect the resulting allocations (see ‘Methods’ for details), and therefore the model presented here already covers a number of related models. This includes independent component analysis, which considers higherorder rather than secondorder statistics, but relies on a whitened signal, which in the undercomplete (bottleneck) case is obtained via PCA (Hyvärinen and Oja, 2000). Similarly, some models that do incorporate sensory noise and maximize reconstruction accuracy also use an undercomplete set of principal components to reduce the dimensionality of the sensory signal (Doi and Lewicki, 2014). In both of these examples, the resulting receptive field structure will differ, but their allocation—where on the sensory sheet they will fall—will be governed by the same principles described earlier. However, we did not consider nonlinear models and previous work has demonstrated that response nonlinearities can make important contributions to decorrelation (Pitkow and Meister, 2012). Additionally, the precise structure and origin of noise in nonlinear models have been demonstrated to affect efficient coding strategies (Brinkman et al., 2016) and might therefore also influence the resulting allocations. Future work should explore resource allocation in such complex models.
We considered allocations for monotonically decreasing covariance functions. Choosing a negative exponential function and assuming that activations are uncorrelated across different input regions allowed us to derive an analytical solution. How justified are these assumptions? Many sensory systems will likely obey the intuitive notion that receptor correlations decrease with distance; however, there are notable exceptions, for example, in the auditory system (Terashima and Okada, 2012). In touch, the sensory system we are mainly concerned with here, contact might often be made with several fingers (for humans) or rays (for the starnosed mole) simultaneously, which would induce farranging nonmonotonic correlations. Unfortunately there is little quantified evidence on the strength of these correlations, rendering their importance unclear. At least for the starnosed mole, their prey is often small compared to the size of the rays, which move mostly independently, so crossray correlations might be low. Furthermore, the receptive fields of neurons in early somatosensory cortex of both humans and starnosed moles are strongly localized to a single appendage and lie within cortical subregions that are clearly delineated from others (Sur et al., 1980; Nelson et al., 1980; Catania and Kaas, 1995). If longrange nonmonotonic correlations were strong, we would expect to find many multiappendage receptive fields and blurry region boundaries. As this is not the case, it therefore stands to reason that either these correlations are not very strong or that there is some other process, perhaps during development, that prevents these correlations affecting the final allocation. Either way, our assumption of monotonically decreasing covariance functions appears to be a good firstorder match. Still, the question of how to arrive at robust allocations for more complex covariance functions is an important one that should be considered in future research. While it is possible in principle to solve the allocation problem numerically for arbitrary covariance functions, in practice we noticed that the presence of small numerical errors can affect the sorting process and caution is therefore warranted, especially when considering nonmonotonic functions.
Methods
Our main goal was to derive a method for efficiently allocating output neurons to one of several input regions with different correlational response structure in the presence of constraints on the number of output neurons or amount of information being transmitted. In the following, we focus on the main rationale and equations, while specific proofs can be found in Appendix 1–6. First, we outline the framework for combined whitening and dimensionality reduction that is employed. Next, we demonstrate how this framework can be applied to multiple input regions with different statistics and densities of receptors, and how calculation of the eigenvalues of regionspecific covariance matrices solves the problem of resource allocation. Finally, we demonstrate how the problem can be solved analytically for a certain choice of covariance function. The Python code implementing the equations that can be used to recreate the figures in this article is available on GitHub (Edmondson, 2021).
Combined whitening and dimensionality reduction
We assume that receptors are arranged on a 2D sensory sheet. Correlations in the inputs occur as receptors that are nearby in space have more similar responses. Restricting ourselves to such secondorder statistics and assuming that noise is negligible, information is maximized in such a setup by decorrelating the sensory inputs. Here, we decorrelate using a simple linear model. To model the bottleneck, we restrict the number of outputs to $m$ < $n$, where $n$ is the total number of receptors.
If the inputs are represented as a matrix $\mathit{X}$ of dimensions $n\times z$ (where $z$ is the number of sensory input patterns), then our goal is to find an $m\times n$ dimensional matrix $\mathit{W}$ such that $\mathit{W}\mathit{X}$ is uncorrelated:
This is achieved by setting $\mathit{W}={\mathbf{\Sigma}}^{\frac{1}{2}}$, where $\mathbf{\Sigma}={\mathit{X}}^{T}\mathit{X}$. Solutions can then be expressed in terms of the diagonal matrix of eigenvalues, $\mathbf{\Lambda}$, and eigenvectors, $\mathrm{\Phi}$, of the covariance matrix $\mathbf{\Sigma}$:
Whitening filters are not uniquely determined and optimal decorrelation is obtained with any orthogonal matrix $\mathit{P}$. Setting $\mathit{P}=\mathit{I}$ (plain whitening/PCA) leads to the form shown in the main results section (Equation 1). Localized receptive fields are obtained by setting $\mathit{P}=\mathbf{\Phi}$, which is known as zerophase component analysis. In cases with a bottleneck, the solution involves solving an Orthogonal Procrustes problem (Doi and Lewicki, 2014) to find ${\mathit{P}}^{*}$, an $m$dimensional orthogonal matrix (where $m$ is the size of the bottleneck) which minimizes the reconstruction error of the inputs and a set of ideal local receptive fields ${\mathit{W}}_{opt}$:
where $\parallel \cdot {\parallel}_{F}$ denotes the Frobenius norm, and $\mathbf{\Lambda}$ and $\mathbf{\Phi}$ are as above but retaining only those components with the $m$ largest eigenvalues. As the optimally whitened solution is independent of $\mathit{P}$, additional constraints on the solution can be enforced. For example, power constraints are often added, which either limit the total variance or equalize the variance across output neurons, and additional sparsity constraints can be placed on the output neuron’s activity or their receptive field weights (Doi et al., 2012; Doi and Lewicki, 2014). For example, to optimize the sparsity of the weight matrix, one would define an appropriate cost function (such as the ${L}^{1}$ norm of the weight matrix), then iteratively calculate the gradient with respect to $\mathit{W}$, take an update step to arrive at a new ${\mathit{W}}_{\mathit{o}\mathit{p}\mathit{t}}$, and determine $\mathit{P}$ as described in Equation 7 (see Doi and Lewicki, 2014, for further details). Importantly for our problem, such additional constraints will affect the precise receptive field structure (through $\mathit{P}$), but not the eigenvalues and eigenvectors included in $\mathbf{\Lambda}$ and $\mathbf{\Phi}$, respectively. As we will see in the following section, our solution relies on the eigenvalues only, and we can therefore solve the allocation problem irrespective of the precise receptive field structure or additional constraints on the solution.
Extension to multiple input regions
For our specific problem, we are interested in the case of multiple input regions with different correlational structure (i.e. due to differing receptor density or activation). To simplify the derivations, we approximate different input regions as independent, such that the overall covariance matrix will be a block diagonal matrix. The covariance $\mathbf{\Sigma}$ for two input regions, R_{1} and R_{2}, can then be expressed as follows:
This assumption turns out to be a reasonable approximation when region sizes are relatively big and correlations typically do not extend far across the sensory sheet (see Edmondson et al., 2019, for a comparison between block and nonblock region covariance matrices in 1D). Furthermore, in many sensory systems, the borders between regions of differing density tend to be relatively narrow. For example, in touch, the digits of the hand are spatially separated, and regions of differing densities, for example, between the digit tips and proximal phalanges, neighbour along the short rather than long axis of the digit. In the starnosed mole, rays of different innervation densities are separated and neighbour only along their connection to the rest of the nose. However, the block matrix approximation might be problematic in cases with many very small adjacent regions with strong, farranging correlations.
The eigenvalues and eigenvectors of a block diagonal covariance matrix also follow the block diagonal form and can be calculated from the individual region covariances alone by a simple application of the Cauchy interlacing theorem. Thus, the corresponding eigenvalues and eigenvectors are
Due to the imposed bottleneck, only the $m$ largest eigenvalues from the combined set $\mathbf{\Lambda}$ will be retained. If receptive fields are localized such that they are constrained to fall within a single input region, then an eigenvalue selected from $\mathbf{\Lambda}}^{(R1)$ indicates that the receptive field of the corresponding output neuron will fall onto region R_{1}, and analogously for R_{2}. This fact holds independent of the structure of $\mathit{P}$ that is chosen in Equation 7 because in order to preserve decorrelation a given region cannot contain more output neurons than eigenvalues retained from this region. In the following, we show how the eigenvalues can be calculated analytically for certain covariance functions.
Calculation of eigenvalues for negative exponential covariance functions
We model the covariance between receptors as a negative exponential function. The covariance matrix is then calculated as a function of distance between pairs of receptors (see Figure 1C). We will first go through the calculation of eigenvalues for the baseline region R_{1}, and then continue with the derivation for R_{2}, which exhibits different receptor density, receptor activation, or both.
For region R_{1} the covariance between receptors is calculated as
where x_{i} and x_{j} are the locations of the ith and jth receptors, and γ is the decay constant.
The corresponding eigenvalues for an exponential covariance function in the continuous domain can be calculated analytically. The eigenvalue–eigenvector problem is expressed as an integral homogeneous equation, such that for R_{1} we get
where ${\varphi}_{k}(x)$ is the kth eigenfunction and ${\lambda}_{k}$ its corresponding eigenvalue. The domain length $L$ is the input region size for one of the dimensions.
It can be shown that solutions to this problem can be related to the Laplacian operator (see Appendix 2 and Appendix 3 for proofs), such that
where ${\mu}_{k}$ are the eigenvalues of the Laplacian operator.
The general solution for the Laplacian eigenvalue problem for a 2D rectangle with Dirichlet boundary conditions is (Strauss, 2007)
where L_{1} and L_{2} are the sizes of the domain for each dimension.
To calculate the covariance for R_{2}, we need to take into account the potentially different receptor density and response variance for this region. Denoting the ratio of the response variances between both regions by $a$, and the ratio of receptor densities by $d$, the covariance for R_{2} can be expressed as
It can be seen that $a$ scales the overall covariance matrix (see also Figure 1C), while $d$ changes the spatial extent of the correlations and thereby implicitly accounts for the different receptor density.
The calculation of the eigenvalues for R_{2} proceeds analogously. The eigenvalue–eigenvector problem is given as
Since the receptor density ratio $d$ causes an implicit stretching of the space for the higher density region, the region length $L$ needs to be adjusted in order to keep the effective size of the region constant. In 2D, each axis is therefore scaled by $\sqrt{d}$, resulting in an upper integration limit of $L\sqrt{d}$.
Allocation in the bottleneck
Given a sensory system with limited representational capacity, different regions may be allocated different amounts of resources. Here, we calculate the allocations over different bottleneck widths for two regions, while the extension to multiple regions is given in Appendix 5. In the following, we assume 2D square regions of equal size for ease of analysis (see Appendix 6 for the equivalent solution in 1D). A single variable $L$ is therefore used to denote the lengths of the squares. Following Equation 12, the eigenvalues for regions R_{1} and R_{2} can now be calculated as
where $l,m$ and $n,o\in \mathbb{N}$ enumerate different eigenvalues for regions R_{1} and R_{2}, respectively.
In order to calculate how many output neurons are allocated to R_{1} and R_{2} for different bottleneck widths, we will need to establish an ordering of the eigenvalues, such that for each pair $(l,m)$ we can determine the sorted rank of the eigenvalues. In contrast to the 1D case (see Appendix 6), there is no natural ordering of the eigenvalues in two dimensions; however, a close approximation can be obtained by calculating the number of lattice points enclosed by a quarter circle with radius $p={l}^{2}+{m}^{2}$ (see Appendix 4 for full details). Denoting this function as $N(p)$ and setting ${p}^{(R1)}={l}^{2}+{m}^{2}$ and ${p}^{(R2)}={n}^{2}+{o}^{2}$, we can then calculate the number of eigenvalues allocated to R_{1} as a function of the number of neurons allocated to R_{2}, by setting ${\lambda}^{(R1)}={\lambda}^{(R2)}$ and solving for ${p}^{(R2)}$. This yields
As we allocate more neurons to region R_{1}, the ratio $\frac{N({p}^{(R1))}}{N({p}^{(R2)})}$ simplifies to ${lim}_{{R}_{1}\to \mathrm{\infty}}\frac{N({p}^{(R1)})}{N({p}^{(R2)})}=a\sqrt{d}$. The fraction of neurons allocated to each region therefore depends on the size of the bottleneck and converges to $\frac{1}{1+a\sqrt{d}}$ and $\frac{a\sqrt{d}}{1+a\sqrt{d}}$ for R_{1} and R_{2}, respectively.
Alternative covariance functions
To test generalization to other covariance functions that decrease monotonically with receptor distance, we tested a number of functions from the Matérn class, in which the parameter ν controls the smoothness of the function. The negative exponential covariance function we employed in previous sections is equivalent to a Matérn function with $\nu =1/2$. Larger values of ν lead to progressively higher correlations for smaller distances and lower correlations for larger distances. Specifically, we tested a Matérn function with $\nu =3/2$:
and with $\nu =5/2$:
For simplicity and to obtain numerically stable solutions, all calculations were performed for 1D regions only. For all cases tests, the baseline region contained 500 receptors, while the number of receptors for the other region was determined by the density ratio. Eigenvalues were calculated from the covariance matrix using the eigh method of the numpy Python package (Harris et al., 2006). Comparing the numerically obtained allocation for the negative exponential covariance function with its analytical solutions showed a close match for most bottleneck widths. However, numerical errors increased for wider bottlenecks, where eigenvalues became very small and their decay flattened out, affecting the sorting process. For the analyses here, we therefore restricted ourselves to bottleneck widths where a close match between the numerical and analytical solutions could be obtained. This range was sufficient to demonstrate allocation at narrow bottlenecks and estimate the convergence point for larger ones.
Calculations for starnosed mole
The 11 rays were approximated as 2D square regions with areas set to their reported sizes (Sawyer and Catania, 2016). Receptor densities for each ray were calculated as the sensory fibre innervation per mm^{2} (Catania and Kaas, 1997). Approximations of receptor activation on each ray were calculated from empirical data of prey foraging interactions recorded by Catania and Kaas, 1997. Contact statistics were converted to receptor activation probabilities with receptors following a Bernoulli distribution. Finally, activation variance was calculated as the variance of the Bernoulli distribution (see Appendix 1). The decay rate γ of the negative exponential covariance function was determined for each ray using a model of the typical extent of receptor activation during interaction with prey stimuli of varying sizes. Each ray interacts with varying prey sizes at different frequencies. For example, ray 11 is typically contacted by smaller stimuli more often than other rays. A 2D model of the rays was used to simulate average responses to each stimulus size. Each model ray was tiled with receptors, and circular stimuli of different sizes were then randomly placed over the ray. The radii and frequencies of each stimulus size were based on the prey model (Catania and Kaas, 1997). A ray receptor was marked as active if its coordinate position was within the bounds of the stimuli. Response covariance between receptors was then calculated and an exponential function was fit to find the γ decay parameter. See Table 1 for the full set of parameters. The code implementing the receptor model is available on GitHub (Edmondson, 2021). To determine allocations, eigenvalues associated with each ray were calculated analytically, resulting in allocations for each ray at all bottleneck widths. Three models were compared: first, a ‘densityonly’ model, which includes accurate receptor density ratios, but receptor activation ratio remains uniform across all rays; second, an ‘activationonly’ model, which includes heterogeneous receptor activation ratios, but uniform receptor density ratios across all rays; finally, the ‘full model’ combines both accurate densities and receptor activation ratios. Model allocations for each ray were compared to the cortical allocation empirical data from Catania and Kaas, 1997. As the bottleneck size for the starnosed mole is unknown, the rootmeansquare error (RMSE) was calculated for each model at all bottleneck widths. The bottleneck resulting in the lowest error was then selected for each. Allocations for rays at each bottleneck can be found in Figure 6—figure supplement 1A. For the ‘activationonly’ and ‘full’ models, the lowest RMSE values were for bottleneck widths of between 37 and 45%; for the ‘densityonly’ model, the RMSE was similar over all bottlenecks widths (see Figure 6—figure supplement 1C). We also tested a ‘baseline model’ where densities and activation were randomly selected for each ray within the possible range of parameters. The aim was to determine how much explained variance within the cortical allocations was due to the selection of the bestfitting bottleneck and how much was due to the specific density and activation parameters. A total of 20 random models were run, and the average ${R}^{2}$ was –0.09 (see Figure 6—figure supplement 1D).
Appendix 1
Stimulus statistics and response variance
Decorrelation works on secondorder statistics and therefore stimulus statistics would only be taken into account by the model if they affect the covariance matrix. One way this can happen is through the extent of the spatial correlations (parameter γ in the covariance function). For example, in touch the size distribution of stimuli that would typically make contact with a given skin region might differ, leading to a different correlational structure. While we calculate allocations for different values of γ, we keep this value fixed across different input regions, for simplicity.
A more common case is that of receptors in one region being more active than receptors in another region. For example, in touch, the fingertips make contact with objects much more frequently than does the palm. An increased probability of making contact would translate into higher receptor response rates. In turn, higher response rates imply higher response variance, which would be directly reflected in the covariance matrix as a multiplicative scaling of the covariance function. For example, assuming that each receptor follows a simple Bernoulli distribution (either on or off, with a probability of p being on), then the response variance can be calculated as $p(1p)=p{p}^{2}$. Assuming that the likelihood of any receptor being active is generally low, the variance scales almost linearly with receptor activation. Differences in activation between two regions are represented by the activation ratio $a$ throughout the article.
Appendix 2
Relationship between PCA and Laplacian eigenvalue problem
Rationale
Let $\mathrm{\Omega}$ be a region with a density of receptors ρ. In a 1D region $\mathrm{\Omega}=[0,L]$, the density can be expressed as $\rho =\frac{N}{L}$ or number of receptors per unit length. Assuming an exponential decay of correlations, the covariance between receptors $i$ and $j$ is
where $\mathrm{\Delta}x=1/\rho $ is the distance between receptors. Subsampling the space by taking a fraction $N/d$ of the original receptors, $d>1$, the covariance for positions $i,j$, becomes
Therefore, we encode the changes in receptor density in a scaling of the exponential decay rate. For a given distribution of receptors, there is an induced partition of the interval $[0,L]$, therefore, for a fixed $x=i\mathrm{\Delta}x$, the covariance in the $jth$ bin is approximately equal to the area of the exponential covered in that bin:
Summing over all the bins, we arrive at the PCA problem:
The continuum limit is found formally as $\mathrm{\Delta}x\to 0$.
Derivation
In order to find the optimal assignment for a given receptor density, we are interested in solutions to the following equation which can be seen as a continuous version of the traditional PCA problem with an exponentially decaying covariance matrix:
where γ is the decay rate. We are interested in solutions $\varphi \in {C}^{2}(\mathbb{R})$, that is, twice differentiable solutions that satisfy appropriate boundary conditions.
Theorem 1 If $\varphi $ is a solution of Equation 24, then it is an eigenfunction of the Laplacian operator with eigenvalues:
That is, in one dimension, solutions $\varphi $ satisfy
Proof: Differentiating Equation 24 twice using the Leibniz rule we obtain
The second term on the righthand side can be replaced using Equation 24 obtaining the desired result:
The previous is a sufficient condition on the solutions to Equation 24. A necessary condition is given in the following theorem:
Theorem 2 A solution to Equation 26 is also solution to Equation 24 if it satisfies the following boundary conditions:
Proof: Assume $\varphi $ is a solution to Equation 26. We proceed by convolving Equation 26 on both sides with the kernel ${e}^{\mathrm{}\gamma \mathit{}x}$:
Integrating by parts twice we get
Using Equation 25 and Equation 32, we obtain
Repeating the procedure with the kernel ${e}^{\gamma \mathit{}x}$ in the interval $\mathrm{[}x\mathrm{,}L\mathrm{]}$, yields
Adding Equation 35 and Equation 36 we recover Equation 24, which finalizes the proof.
Scaling
In some instances of our problem, the exponential covariance will be scaled by the activation ratio, a. In general, the same reasoning applies to any linear combination of solutions; therefore, our results extend to that case. In particular, we have the following result:
Theorem 3 The eigenvalues of the scaled covariance matrix, ${C}^{\mathrm{\prime}}\mathit{}\mathrm{(}x\mathrm{,}y\mathrm{)}\mathrm{=}a\mathit{}C\mathit{}\mathrm{(}x\mathrm{,}y\mathrm{)}$ are
where λ is an eigenvalue of the original problem.
Proof: Let $\varphi \mathit{}\mathrm{(}x\mathrm{)}$ be a solution of Equation 24. By linearity of the integral we have
Appendix 3
Solutions
In the previous section, we saw that solutions of the PCA problem (Equation 24) and the Laplacian eigenvalue problem (Equation 26) coincide if boundary conditions specified in Equation 32 and Equation 31 are met. Here, we show how these solutions relate to solutions of the boundary value problem of Equation 26 with $\varphi (0)=\varphi (L)=0$, which correspond to the eigenmodes of an idealized vibrating string fixed at the extremes. Such modes are considerably simpler than the exact ones and, as we show, are sufficient for our analysis.
Eigenmodes of a vibrating string
Solving Equation 26, for $\kappa =\sqrt{\mu},\mu >0$, using standard methods (see Simmons, 2016, p. 355 onwards), we find the general expression for the eigenfunctions:
with coefficients to be determined using the boundary conditions. The first boundary condition ($\varphi (0)=0$) implies that $B=0$. The second boundary condition ($\varphi (L)=0$) gives the equation $\mathrm{sin}(\kappa L)=0$, which is satisfied for $\kappa L=n\pi $, or
where $n=1,2\mathrm{\dots}$ is the index of the eigenvalue. The value of $A$ is arbitrary and is usually selected so the solution is normalized.
Exact eigenvalues
In order to find the exact analytical eigenvalues of Equation 24, we again assume $\mu >0$ and a solution of the form given in Equation 39. Using the first boundary condition (Equation 31, Equation 32) we get the following relationship:
and with the second boundary condition, we obtain
replacing Equation 41, we find the transcendental equation
whose solutions lead to the exact eigenvalues of the continuous PCA problem.
Relationship between exact and approximate eigenvalues
The main results presented in this study rely on the ordering of the eigenvalues only, rather than their precise magnitude. Any divergence between the exact and approximate eigenvalues is therefore relevant only if it affects this ordering. Furthermore, since allocation of eigenvalues to regions proceeds cumulatively as more eigenvalues are added, localized disturbances in a few eigenvalues will lead to only minor errors that are quickly corrected as more eigenvalues are added (since the correct allocation of eigenvalues to regions will be restored). In the following, we demonstrate that, comparing the ordering of the exact eigenvalues where the boundary conditions are derived from the integral equation (Equation 43) with that of the approximate ones (Equation 40), this order is altered only in a localized fashion that does not affect the analytical results of this article. To understand the changes in the ordering, consider the functions
and
whose intersection gives the solutions to Equation 43. The function $g$ has two hyperboliclike branches, one to each side of the singularity $\kappa =\gamma $. We can distinguish two cases for the said intersection:
For $\kappa \gg \gamma $, $g$ approaches 0 and there is clearly only one intersection with $f$, which happens to be increasingly close to the approximate eigenvalues. Moreover, the function is also monotonic in this regime. This implies that the order is preserved.
For $\kappa \approx \gamma $, that is, close to the singularity, an additional root is inserted as $f$ cuts both branches of $g$ and the order is disturbed.
Altogether, this implies that, for the cases studied in this article that consider $\gamma <\pi$, only the first two eigenvalues might differ from the order given by the approximate solution. In general, in the intervals $(n\pi /2,(n+1)\pi /2)$ there will be one root for both equations except in the one closest to the singularity which will contain two. Moreover, from Equation 25 we see that, for two regions with densities $\gamma$ and $d\gamma $ and Laplacian eigenvalues ${\kappa}_{1}$ and ${\kappa}_{2}$, $\lambda}_{1}>{\lambda}_{2$ implies $\kappa}_{2}d{\kappa}_{1}>(1d)d{\gamma}^{2$. This inequality is satisfied by the exact eigenvalues, again, in all intervals except the one in which the singularity lies (as can be confirmed by expanding $fg=0$ around the approximate eigenvalues and solving for κ). We have compared the exact and approximate ordering for two regions for a number of (manageable) cases and have found the relationship described above to hold.
Appendix 4
Ordering in the 2D square case
For rectangle regions, the ordering can be solved by calculating the number of lattice points enclosed by a quarter ellipse (Strauss, 2007). Here, we use square regions and therefore the solution is the number of points enclosed in a quarter circle. The Gauss circle problem determines the number of integer lattice points which lie within a circle with radius $p\ge 0$, with its centre at the origin:
The number of lattice points within the circle is approximately equal to its area. The number of points within a square region can be approximated by calculating the area of the upper quarter of the circle (positive values only):
The number of eigenvalues in each region is therefore the area of the intersection of the circle and region.
For each region we calculate the number of lattice points enclosed by a quarter circle; for R_{1} we set the radius equal to ${l}^{2}+{m}^{2}$ and for R_{2} to ${n}^{2}+{o}^{2}$ (the solution of Equation 18), where $l,m,n,o=1,2..$. This number is approximated as the area of the quarter circle. For values of ${l}^{2}+{m}^{2}$ or ${n}^{2}+{o}^{2}$ greater than the total number of eigenvalues in each dimension ($L\sqrt{d}$), the approximation diverges from the true ordering as the area of the quarter circle becomes larger than the area of the lattice (region). In this case, a correction term is added:
where $p$ is either ${l}^{2}+{m}^{2}$ or ${n}^{2}+{o}^{2}$ for R_{1} and R_{2} respectively, and k is the total number of eigenvalues in each region. Assuming a region size of $L$, with each receptor spaced one unit apart, $k={L}^{2}$ for R_{1}, and $k={L}^{2}d$ for R_{2}.
Appendix 5
Allocation for multiple regions in 2D
For more than two regions, density and activation ratios for each additional region are calculated relative to a chosen baseline region. This leads to the following general form for calculation of the eigenvalues of any region x:
where $a$ is the region activation scaling ratio, d_{b} is the density of the baseline region, and d_{x} the density of region $x.l,m\in \mathbb{N}$ enumerate different eigenvalues for region $x$.
Appendix 6
Allocation for the 1D case
The 1D case for changes in density has previously been addressed in Edmondson et al., 2019. Here, we extend this to include changes in activation. For two regions R_{1} and R_{2}, we can calculate their eigenvalues as
where d is the ratio of higher and lower densities, $a$ is the ratio of receptor activation, L is the length of the region, and $l,m\in \mathbb{N}$ denote successive eigenvalues for regions R_{1} and R_{2}, respectively.
To calculate how many output neurons are allocated to region R_{2} as a function of the number of neurons allocated to region R_{1}, we set ${\lambda}_{l}^{(R1)}={\lambda}_{m}^{(R2)}$ and solve for $m$. This yields
It becomes apparent that for $l=1$, that is, the first neuron allocated to region R_{1}, we have already assigned $m={\displaystyle \frac{\sqrt{ad({\pi}^{2}+{L}^{2}{\gamma}^{2}){L}^{2}{\gamma}^{2}}}{\pi}}$ neurons to region R_{2}. As we allocate more neurons to region R_{1}, the ratio $\frac{m}{l}$ simplifies to: ${lim}_{l\to \mathrm{\infty}}\frac{m}{l}=\sqrt{ad}$. The fraction of neurons allocated to each region therefore depends on the size of the bottleneck and converges to $\frac{1}{1+\sqrt{ad}}$ and $\frac{\sqrt{ad}}{1+\sqrt{ad}}$ for R_{1} and R_{2}, respectively.
Data availability
No data was generated for this study. All equations and model parameters are included in the manuscript and supporting files. Additionally, code implementing the model equations has been made available on Github at https://github.com/lauraredmondson/expansion_contraction_sensory_bottlenecks, (copy archived at swh:1:rev:dd6de7c05ae9443d034361b042b053b4f40717f5) (see also Methods section in manuscript).
References

Towards a theory of early visual processingNeural Computation 2:308–320.https://doi.org/10.1162/neco.1990.2.3.308

What does the retina know about natural scenes?Neural Computation 4:196–210.https://doi.org/10.1162/neco.1992.4.2.196

Some informational aspects of visual perceptionPsychological Review 61:183–193.https://doi.org/10.1037/h0054663

Possible principles underlying the transformation of sensory messagesSensory Communication 1:217–234.https://doi.org/10.7551/mitpress/9780262518420.003.0013

The “independent components” of natural scenes are edge filtersVision Research 37:3327–3338.https://doi.org/10.1016/s00426989(97)001211

Optimal shortterm population coding: when fisher information failsNeural Computation 14:2317–2351.https://doi.org/10.1162/08997660260293247

ConferenceUnsupervised learning models of primary cortical receptive fields and receptive field plasticityAdvances in Neural Information Processing Systems.

How do efficient coding strategies depend on origins of noise in neural circuits?PLOS Computational Biology 12:e1005150.https://doi.org/10.1371/journal.pcbi.1005150

Organization of the somatosensory cortex of the starnosed moleThe Journal of Comparative Neurology 351:549–567.https://doi.org/10.1002/cne.903510406

Somatosensory fovea in the starnosed moleJournal of Comparative Neurology 387:215–233.

Tactile foveation in the starnosed moleBrain, Behavior and Evolution 63:1–12.https://doi.org/10.1159/000073755

All in the family  touch versus olfaction in molesAnatomical Record 303:65–76.https://doi.org/10.1002/ar.24057

Tactile innervation densities across the whole bodyJournal of Neurophysiology 124:1229–1240.https://doi.org/10.1152/jn.00313.2020

Topography of ganglion cells in human retinaThe Journal of Comparative Neurology 300:5–25.https://doi.org/10.1002/cne.903000103

Human photoreceptor topographyThe Journal of Comparative Neurology 292:497–523.https://doi.org/10.1002/cne.902920402

BookRelations between the Statistical Regularities of Natural Images and the Response Properties of the Early Visual SystemJapanese Cognitive Science Society, SIG P&P.

Efficient coding of spatial information in the primate retinaThe Journal of Neuroscience 32:16256–16264.https://doi.org/10.1523/JNEUROSCI.403612.2012

A simple model of optimal population coding for sensory systemsPLOS Computational Biology 10:e1003761.https://doi.org/10.1371/journal.pcbi.1003761

Aging of the optic nerveArchives of Ophthalmology 98:2053–2058.https://doi.org/10.1001/archopht.1980.01020040905024

ConferenceNonlinear scaling of resource allocation in sensory bottlenecksAdvances in Neural Information Processing Systems. pp. 7545–7554.

BookFunctions of the hand in primatesIn: Fragaszy DM, editors. The Evolution of the Primate Hand. Springer. pp. 313–344.

Implicit encoding of prior probabilities in optimal neural populationsAdvances in Neural Information Processing Systems 23:658–666.

Efficient sensory encoding and bayesian inference with heterogeneous neural populationsNeural Computation 26:2103–2134.https://doi.org/10.1162/NECO_a_00638

Tactile coactivationinduced changes in spatial discrimination performanceThe Journal of Neuroscience 20:1597–1604.https://doi.org/10.1523/JNEUROSCI.200401597.2000

Analysis of hand contact areas and interaction capabilities during manipulation and explorationIEEE Transactions on Haptics 7:415–429.https://doi.org/10.1109/TOH.2014.2321395

Sensory signals in neural populations underlying tactile perception and manipulationAnnual Review of Neuroscience 27:53–77.https://doi.org/10.1146/annurev.neuro.26.041002.131032

Formation of cortical fields on a reduced cortical sheetThe Journal of Neuroscience 19:9939–9952.

Independent component analysis: algorithms and applicationsNeural Networks 13:411–430.https://doi.org/10.1016/s08936080(00)000265

Histomorphometry of the human optic nerveInvestigative Ophthalmology & Visual Science 31:736–744.

ConferenceEfficient coding of natural images with a population of noisy linearnonlinear neuronsAdvances in Neural Information Processing Systems. pp. 999–1007.

Predictability and redundancy of natural imagesJournal of the Optical Society of America. A, Optics and Image Science 4:2395–2400.https://doi.org/10.1364/josaa.4.002395

Cellular and circuit mechanisms maintain low spike covariability and enhance population coding in somatosensory cortexFrontiers in Computational Neuroscience 6:7.https://doi.org/10.3389/fncom.2012.00007

Human finger somatotopy in areas 3b, 1, and 2: a 7t fmri study using a natural stimulusHuman Brain Mapping 35:213–226.https://doi.org/10.1002/hbm.22172

Representations of the body surface in postcentral parietal cortex of macaca fascicularisThe Journal of Comparative Neurology 192:611–643.https://doi.org/10.1002/cne.901920402

Waveletlike receptive fields emerge from a network that learns sparse codes for natural imagesNature 381:607–609.

Sparse coding of sensory inputsCurrent Opinion in Neurobiology 14:481–487.https://doi.org/10.1016/j.conb.2004.07.007

Diminutive digits discern delicate details: fingertip size and the sex difference in tactile spatial acuityThe Journal of Neuroscience 29:15756–15761.https://doi.org/10.1523/JNEUROSCI.368409.2009

Decorrelation and efficient coding by retinal ganglion cellsNature Neuroscience 15:628–635.https://doi.org/10.1038/nn.3064

Human fetal optic nerve: overproduction and elimination of retinal axons during developmentThe Journal of Comparative Neurology 238:92–100.https://doi.org/10.1002/cne.902380108

On the stationary state of kohonen’s selforganizing sensory mappingBiological Cybernetics 54:99–106.https://doi.org/10.1007/BF00320480

Effects of age on nerve fibers in the rhesus monkey optic nerveThe Journal of Comparative Neurology 429:541–553.https://doi.org/10.1002/10969861(20010122)429:4<541::aidcne3>3.0.co;25

Somatosensory organ topography across the star of the starnosed mole (condylura cristata)The Journal of Comparative Neurology 524:917–929.https://doi.org/10.1002/cne.23943

The development of the optic nerve in rodentsAustralian and New Zealand Journal of Ophthalmology 13:135–145.https://doi.org/10.1111/j.14429071.1985.tb00414.x

The variable discharge of cortical neurons: implications for connectivity, computation, and information codingThe Journal of Neuroscience 18:3870–3896.https://doi.org/10.1523/JNEUROSCI.181003870.1998

Natural image statistics and neural representationAnnual Review of Neuroscience 24:1193–1216.https://doi.org/10.1146/annurev.neuro.24.1.1193

BookThe Topographic Unsupervised Learning of Natural Sounds in the Auditory Cortex, Advances in Neural Information Processing SystemsCurran Associates, Inc.

ConferenceDeep learning and the information bottleneck principle2015 IEEE Information Theory Workshop (ITW).https://doi.org/10.1109/ITW.2015.7133169

Comparative analysis of meissner’s corpuscles in the fingertips of primatesJournal of Anatomy 227:72–80.https://doi.org/10.1111/joa.12327

A bayesian observer model constrained by efficient coding can explain “antibayesian” perceptsNature Neuroscience 18:1509–1517.https://doi.org/10.1038/nn.4105

Experienceinduced plasticity of cutaneous maps in the primary somatosensory cortex of adult monkeys and ratsJournal of Physiology, Paris 90:277–287.https://doi.org/10.1016/s09284257(97)814386

Fisher and shannon information in finite neural populationsNeural Computation 24:1740–1780.https://doi.org/10.1162/NECO_a_00292

Efficient sensory coding of multidimensional stimuliPLOS Computational Biology 16:e1008146.https://doi.org/10.1371/journal.pcbi.1008146
Decision letter

Stephanie E PalmerReviewing Editor; University of Chicago, United States

Michael J FrankSenior Editor; Brown University, United States

Memming ParkReviewer
Our editorial process produces two outputs: (i) public reviews designed to be posted alongside the preprint for the benefit of readers; (ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.
Decision letter after peer review:
Thank you for submitting your article "Expansion and contraction of resource allocation in sensory bottlenecks" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Michael Frank as the Senior Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Memming Park (Reviewer #2).
The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.
Essential revisions:
While the reviewers found the work interesting and engaging, they agreed that some major revisions would greatly strengthen the manuscript. All have consulted with each other and the reviewing editor to compile these major prompts. Please also see the recommended, slightly more minor, revisions below as well as the public reviews provided by each reviewer.
1) Clarifying assumptions made
The paper should explicitly state the assumptions of the model, and discuss their consequences much earlier, and in more depth.
Currently, the Introduction and first paragraphs of the results equate efficient coding with decorrelation (e.g. paragraphs 3 and 4 of the introduction and opening sentences of the Results). The authors address the role of noise only briefly in the Discussion. This could be expanded. "Efficient coding" is a broad conceptual framework, and not a single algorithm such as decorrelation or even redundancy reduction. Currently, the manuscript makes the impression that the two are the same.
Other, unstated assumptions of the model (e.g. constraints on the total variance of the bottleneck, and the assumption that output activity is equalized across all neurons) should be made more explicit. How are those assumptions connected to biology? Is there evidence that cortical neurons in the somatosensory system are equally active? Even if there is not – this should be addressed explicitly.
Also, the analytical results are derived with a number of assumptions about the correlation structure of the stimuli and the structure of the bottleneck. Please clarify these assumptions as well. Importantly, please more clearly define the cost function. Is it simply the squared error loss between receptors and representation?
2) Putting the math more frontandcenter
The fact that all results are derived analytically is a clear strength of the manuscript, but the authors have chosen to refrain from presenting any math in the Results section. This is very unfortunate, because this makes the formal arguments more fuzzy than need be. Please find a compromise between a technical and very highlevel paper, and include the essential formal steps of the arguments in the main text.
3) Expanding the simulation examples provided, increasing the scope of the results
Along these lines, and looping back to prompt 1 and clarifying the assumptions in the model, please include one or two examples with more complicated input statistics or system architecture. For example, it is frequently the case that a narrow sensory bottleneck is followed by an expansion (e.g. retina > optic nerve > V1) – could the theory say something about resource allocation in such scenario?
In another example – what if the correlation structure is more complex? E.g. in the auditory nerve correlations between sensors can spatially nonlocal, due to the presence of harmonic sounds (e.g. Terashima and Okada, NeurIPS 2012). These are just suggestions of more complex scenarios which the authors could consider. These additional results do not have to be obtained analytically, and could be optimized through simulation. Such additions could greatly strengthen the current paper, broaden its scope, and increase its impact.
To show that those results are robust for small variations in the stimulus distribution, please provide numerical experiments with a different covariance function. Perhaps try a Matérn covariance function with nu=3/2 which is slightly smoother than the OU covariance. The experiments are numerically difficult, so this prompt only requires showing a case in a regime where things are numerically stable (this should be possible since the authors have done something similar in the NeurIPS 2019 paper). We would like to see that the presented results are not qualitatively very different.
4) Provide more details on the starnosed mole data and predictions from the model
Please report in more detail how the predictions are generated for the data from the starnosed mole. This could be done in the main text. For example, one could plot predictions for different sizes of the bottleneck. As a field, we should not be afraid of showing the range of predictions for different parameter values (e.g. for different bottleneck sizes), and demonstrating where they match the data, and where they do not – this does not necessarily weaken the theory. In this particular case – the theory could inform us about notyetmeasured properties of the system, such as the size of the bottleneck. Moreover, this will be more explicit way to report these results.
Furthermore, to model the stimulus distribution for starnosed mole case, the authors simulated a noisy sensory system to convert contact statistics (assuming a circular object) to receptor activation statistics. Starnose mole parameters (Appendix table) are in the code provided, but it doesn't seem like code for the conversion from the contact statistics to the receptor activation statistics is included. Please share this code, since the text in the manuscript is missing much of the necessary detail for the reproduction of this work.
Recommendations for the authors:
A) page 2 and Discussion: The authors discuss the use of Fisher Information and other measures for their goal. The use of Fisher Information for optimal population coding can be dangerous especially when taking N to infinity, as then the convergence conditions of 1/FI to the MSE are not met (see Berens et al. 2012). The crucial quantity for convergence is the SNR. Thus, the stated reason is not a reason not to use Fisher Information, although it's fine for the authors to choose not to use it. The logic should be improved in the text, though.
B) The claim that full model has better MSE is not very convincing. The difference between using density only and receptor only makes sense. But the full model has an additional degree of freedom, so naturally it should be better then either if tested on the same data that it was fit to. Unfortunately, the dataset the authors have is only 11 points, so there's no easy way of splitting them to show that this is effect is real. Hence, it's hard to make a strong claim about the tiny change in MSE. Please weaken the claim. Also, Figure 5D scatter plot for the full model is very similar to that of the receptor usage model (I ran your code). Please provide the corresponding plots explicitly in the supplement instead of the code only for transparency.
C) The model makes no distinction between the density of sensors in different regions and the exponential decay of the covariance functions. This is not a problem from the perspective of theoretical predictions, but perhaps it could be explained better (maybe with additional illustrations?).
D) It wasn't clear what the authors mean by "secondorder". One assumes this means that the input statistics is Gaussian, but perhaps this is wrong. Did the authors mean that they are only analyzing the covariance functions, i.e. secondorder statistics? Or was it that the neurons' response entropy were quantified by their variance? Please clarify.
E) In the text, 'activation' and 'activation ratio' are used interchangeably. It would be better to stick with the ratio since it's really only the relative activation of the second population that matters.
E) The 'σ' as 'decay constant' should be explicitly mentioned in the main text in addition to the methods section. At first it seems to be the lengthscale, but it turned out to be inverse lengthscale which was confusing, since σ is often used to denote the inverse lengthscale. (Figures1C, 2C)
F) Please add the following citations to other theoretical work, which considered efficient encoding of somatosensory stimuli and the allocation of cortical space :
a) Bhand, M., Mudur, R., Suresh, B., Saxe, A., and Ng, A. (2011). Unsupervised learning models of primary cortical receptive fields and receptive field plasticity. Advances in neural information processing systems, 24, 19711979.
b) Plumbley, M. D. (1999). Do cortical maps adapt to optimize information density?. Network: Computation in Neural Systems, 10(1), 41.
G) page 2: In the discussion about bottlenecks in the retina, recent work by Lindsey et al. (2019, arxiv 1901:00945) should be cited.
H) page 5: "Decorrelation has been found to be a…" > Please add citations to this claim
Figures:
I) Figure 1D: This figure is quite confusing. The decaying covariance is plotted for a 1D case, the axes are unlabelled, and it takes a while to figure out that these two block matrices do form a single covariance matrix (eventhough they correspond to different regions). Please revise this figure.
J) Figure 2D: small boxes with receptor location is misleading. It shows increased number of receptors when it's the activation ratio that's different. Please revise this figure.
K) Figure 3: A/Blabels seem to be too large; the meaning of the boxes of different colors and the lines was unclear to me. What determines e.g. the line length? The two panels in A should be clearly labeled with what quantity is varied.
L) Figure 4: What are the dashed lines? Is it correct that 4B is a simple restatement of the fact that the lines in 3A left are more spread for smaller bottlenecks? Interpreting this as a sign of more plasticity is an overstatement.
M) Figure 5: Please normalize the RMSE in Figure 5D with respect to the power of the signal. Otherwise the numbers aren't that meaningful, unless some clarifying point is missing. (Or, why not just report r^2?) Please also provide RMSE curves for all different bottleneck sizes and discuss how plausible the optimal values are. Given the right panel, it looks like the RMSE is mainly driven by ray 11, the outlier. How stable is the inferred conclusion if ray 11 is left out? Also, the authors seem to use RMSE to compare percentages. Could the authors comment on the suitability for that loss function?
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "Expansion and contraction of resource allocation in sensory bottlenecks" for further consideration by eLife. Your revised article has been evaluated by Michael Frank (Senior Editor) and a Reviewing Editor.
The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:
1. Regarding lownoise regime, it only appears in the discussion and it does not warn the readers that the numerous tiny tail eigenvalues are very sensitive to noise. This affects the robustness of the analysis and should be clearly stated in the text and appear after the first interpretation of the bottleneck results (Figure 2 and 2supplement 4).
2. It is not stated clearly that decorrelation is an information maximizing transform only in absence of noise, which is a key assumption of the model. In the Methods (lines 493494) it is written that adding the noise does not affect the eigendecomposition (and, in consequence, resource allocation), but if that is the case – it is an important result which should be highlighted and explored further – a sensory coding transformation which is independent of the noise level seems like an important finding. Please revise the text around this point.
3. It is misleading to claim that the analysis "maximizes information" without qualification in the abstract since this work only deals with variances. Please rephrase.
4. In Line 66, please add a qualifier to the statement that Fisher Information can be used to approximate Mutual Information. This is only true under certain assumptions in certain cases and specifically for optimal coding, the relationship is not straightforward (Berens et al. 2012, PNAS; Bethge et al. Neural Comp 2002).
5. lines 8788: "We consider linear models which maximize information by decorrelating the sensory inputs, or, equivalently, minimize the meansquared reconstruction error […]" – in a general case these two goals are not equivalent. The following citations referencing work by Doi and colleagues do not support this claim. This should be revised.
6. In Figure 1, it is hard to parse which colors are kept to mean the same thing across panels and which change meaning. An additional schematic of the model in this figure would help with clarity of the presentation.
7. In Equation 1, is P used at all? If not simplify. The explanation of P sounds complicated without really saying anything specific. Give an example of constraints that could be incorporated.
8. The assumption that the covariance between regions is zero is quite a strong one and seems central to the paper. This should be justified or at least discussed how realistic it is. In the finger tip example, one of course typically touches objects not with one finger but with multiple ones.
9. The brackets in Figure 3 were found to be very confusing, please revise.
10. The covariance functions in Figure 5 are not very different, it is not surprising that these result in very similar outcomes. Comment on this or consider revising.
https://doi.org/10.7554/eLife.70777.sa1Author response
Essential revisions:
While the reviewers found the work interesting and engaging, they agreed that some major revisions would greatly strengthen the manuscript. All have consulted with each other and the reviewing editor to compile these prompts.
We would like to thank the reviewers and editors for their clear and insightful feedback. We have addressed the comments in a revised version. In brief, the major changes are as follows:
We have reorganised the manuscript as suggested, and clarified the model and its assumptions.Specifically, we now make clear that our results are not restricted to a particular decorrelation model but cover a wider class of models. This broadens the applicability of the findings, but also clarifies the essential assumptions the model rests on.
We have included numerical results for additional covariance functions; these demonstrate that theresults hold qualitatively and are robust to small changes in the covariance structure, but also shed some light on how properties of the covariance structure affect the resulting allocation, which we believe will be insightful to the reader.
Additional results have been provided for the starnosed mole example and the language has beentoned down. Throughout the manuscript, many minor improvements have been made and some unwieldy sections of the mathematical appendix have been reorganised. Visualisation has also been improved.
Overall, we believe that the manuscript is much improved and think that the reviewers will agree.
1) Clarifying assumptions made
The paper should explicitly state the assumptions of the model, and discuss their consequences much earlier, and in more depth.
Currently, the Introduction and first paragraphs of the results equate efficient coding with decorrelation (e.g. paragraphs 3 and 4 of the introduction and opening sentences of the Results). The authors address the role of noise only briefly in the Discussion. This could be expanded. "Efficient coding" is a broad conceptual framework, and not a single algorithm such as decorrelation or even redundancy reduction. Currently, the manuscript makes the impression that the two are the same.
We have clarified the broader meaning of ‘efficient coding’ in the manuscript and now explain in more detail how decorrelation/redundancy reduction fits into this framework and why we have chosen to focus on it for this paper. We have also updated our discussion on the role of noise (specifically in the light of model assumptions, see next comment).
Other, unstated assumptions of the model (e.g. constraints on the total variance of the bottleneck, and the assumption that output activity is equalized across all neurons) should be made more explicit. How are those assumptions connected to biology? Is there evidence that cortical neurons in the somatosensory system are equally active? Even if there is not – this should be addressed explicitly.
Also, the analytical results are derived with a number of assumptions about the correlation structure of the stimuli and the structure of the bottleneck. Please clarify these assumptions as well. Importantly, please more clearly define the cost function. Is it simply the squared error loss between receptors and representation?
Thank you for prompting us to clarify our assumptions. We realized that in our original submission, we did not make sufficiently clear which assumptions were strictly necessary for determining receptive field locations (i.e. allocation) and which were choices that need to be made in order to calculate precise receptive fields, but which do not affect the resulting allocation, and where therefore different choices might be made. This is now made clearer both at the start of the Results section (where we now present some key equations, see below) and in the Methods section.
In brief, the framework we employ (developed by Doi and Lewicki in several papers) allows calculation of linear receptive fields in the undercomplete case, enforcing decorrelation of the signals. Dimensionality reduction and decorrelation are achieved via PCA by calculating eigenvectors and eigenvalues. Since decorrelation is not unique, the additional degrees of freedom can be used to enforce additional constraints (via the matrix P in Equation 6). Doi and Lewicki showed how this could be done using an iterative procedure for optimizing P (Equation 7) for a number of constraints e.g. power constraints on individual or all output neurons, sparsity constraints, and even included some simple forms of noise. Crucially, the decorrelation part (eigenvectors) is unaffected by any of these constraints (as they only act through P), and it is this component that determines which region receptive fields will fall onto. Any additional constraints enforced through P will affect the shape and structure of the receptive fields, but not the region they fall onto. Additional constraints, such as power constraints or response sparsity, can thus be added to the model without affecting the results.
2) Putting the math more frontandcenter
The fact that all results are derived analytically is a clear strength of the manuscript, but the authors have chosen to refrain from presenting any math in the Results section. This is very unfortunate, because this makes the formal arguments more fuzzy than need be. Please find a compromise between a technical and very highlevel paper, and include the essential formal steps of the arguments in the main text.
We now present the main equations associated with the efficient coding approach as well as the main steps of the derivation at the start of the Results section, aligned with the presentation of the problem and solution strategy presented in figure 1.
3) Expanding the simulation examples provided, increasing the scope of the results
Along these lines, and looping back to prompt 1 and clarifying the assumptions in the model, please include one or two examples with more complicated input statistics or system architecture. For example, it is frequently the case that a narrow sensory bottleneck is followed by an expansion (e.g. retina > optic nerve > V1) – could the theory say something about resource allocation in such scenario?
We have included numerical calculations for other covariance functions (see next comment). The model assumes a bottleneck, either as a limit on the number of neurons or on the information, and therefore will not directly be able to answer what the allocation should look like in an overcomplete scenario (such as the expansion from optic nerve to V1). The model might still apply if a subsequent stage places a clear limit on the amount of information rather than the number of neurons (see Figure2—figure supplement 4), however empirical data is lacking to answer such questions. We have expanded our discussion on this issue and now make clearer the limits and assumptions of the current model.
In another example – what if the correlation structure is more complex? E.g. in the auditory nerve correlations between sensors can spatially nonlocal, due to the presence of harmonic sounds (e.g. Terashima and Okada, NeurIPS 2012). These are just suggestions of more complex scenarios which the authors could consider. These additional results do not have to be obtained analytically, and could be optimized through simulation. Such additions could greatly strengthen the current paper, broaden its scope, and increase its impact.
To show that those results are robust for small variations in the stimulus distribution, please provide numerical experiments with a different covariance function. Perhaps try a Matérn covariance function with nu=3/2 which is slightly smoother than the OU covariance. The experiments are numerically difficult, so this prompt only requires showing a case in a regime where things are numerically stable (this should be possible since the authors have done something similar in the NeurIPS 2019 paper). We would like to see that the presented results are not qualitatively very different.
We thank the reviewers for these suggestions, which have prompted us to test how well our findings generalize beyond negative exponential covariance functions. To do this we have focused on Matérn covariance functions, as suggested, of which the negative exponential can be considered a special case, while other instances exhibit a smoother covariance structure with higher correlation at small distances but smaller correlation at larger distances compared to the negative exponential. Therefore this class provides a natural starting point to test robustness and it turns out that the results prove rather insightful.
First, in numerical simulations, we found that our results held qualitatively across a range of Matérn parameter values. Allocations were more extreme at narrow compared to wide bottlenecks and regions with higher density and/or activations ended up overrepresented. Thus, it appears the results obtained for negative exponential covariance functions do generalize to other monotonically decreasing covariance functions.
Second, we noted that precise allocations, and specifically the convergence points, differ across Matern functions and that they do so in a systematic fashion. Specifically, the smoother the covariance, the less extreme the overrepresentation. Moreover, the convergence points for the different Matérn functions appear to follow a simple mathematical function. Altogether these results suggest that the precise allocations for monotonically decreasing covariance functions are determined by their smoothness. We have included these findings as a new section in the Results (“Generalisation to other covariance functions”) and also added a new figure.
We also considered more complex covariance functions. Thank you for pointing out a specific example from the auditory system, which we weren’t aware of. Our approach requires that covariance functions decay with distance (to ensure that the approximation of the covariance matrix with a block matrix holds, which allows us to consider regions separately), however, they need not do so monotonically. We therefore tested a damped sinusoid covariance function numerically, whose shape resembles the complex harmonic covariance structure observed in the auditory system. Results suggested that in contrast to monotonically decreasing covariance functions, where allocations converge monotonically towards a limit, this more complex covariance function induced complex nonlinear allocations at different bottleneck widths. Unfortunately, it was difficult to obtain numerically stable solutions. These instabilities arise when the eigenvalue function flattens out and small numerical errors then affect the sorting of eigenvalues between the two regions. For monotonically decreasing covariance functions, this issue only arises close to convergence, however more complex covariance functions contain more such regions and we could not find a method to satisfactorily address this issue. We therefore elected not to include these results in the current manuscript, but have included additional text in the discussion instead.
4) Provide more details on the starnosed mole data and predictions from the model
Please report in more detail how the predictions are generated for the data from the starnosed mole. This could be done in the main text. For example, one could plot predictions for different sizes of the bottleneck. As a field, we should not be afraid of showing the range of predictions for different parameter values (e.g. for different bottleneck sizes), and demonstrating where they match the data, and where they do not – this does not necessarily weaken the theory. In this particular case – the theory could inform us about notyetmeasured properties of the system, such as the size of the bottleneck. Moreover, this will be more explicit way to report these results.
Furthermore, to model the stimulus distribution for starnosed mole case, the authors simulated a noisy sensory system to convert contact statistics (assuming a circular object) to receptor activation statistics. Starnose mole parameters (Appendix table) are in the code provided, but it doesn't seem like code for the conversion from the contact statistics to the receptor activation statistics is included. Please share this code, since the text in the manuscript is missing much of the necessary detail for the reproduction of this work.
We have included a supplementary figure with the predictions for all bottleneck sizes and have run some additional tests to verify that a good fit cannot be obtained by simply varying the bottleneck width if arbitrary densities and activation statistics are chosen. We have also clarified in the text the simulations of the decay parameter and calculation for bottleneck allocations. We have additionally switched from RMSE to R^2 and the results should now be easier to interpret. Finally, the code to convert the contact statistics and calculate the decay parameters for each ray has been added to the Github repository linked in the paper.
Recommendations for the authors:
A) page 2 and Discussion: The authors discuss the use of Fisher Information and other measures for their goal. The use of Fisher Information for optimal population coding can be dangerous especially when taking N to infinity, as then the convergence conditions of 1/FI to the MSE are not met (see Berens et al. 2012). The crucial quantity for convergence is the SNR. Thus, the stated reason is not a reason not to use Fisher Information, although it's fine for the authors to choose not to use it. The logic should be improved in the text, though.
Thank you for the clarification. We have removed the reasoning about population sizes from the manuscript.
B) The claim that full model has better MSE is not very convincing. The difference between using density only and receptor only makes sense. But the full model has an additional degree of freedom, so naturally it should be better then either if tested on the same data that it was fit to. Unfortunately, the dataset the authors have is only 11 points, so there's no easy way of splitting them to show that this is effect is real. Hence, it's hard to make a strong claim about the tiny change in MSE. Please weaken the claim. Also, Figure 5D scatter plot for the full model is very similar to that of the receptor usage model (I ran your code). Please provide the corresponding plots explicitly in the supplement instead of the code only for transparency.
All models have the same number of variable parameters density of each ray and activation. In the models where one of these is fixed, we set the fixed value to the mean of the values for all the rays. To test whether the range of bottleneck sizes results in the possibility to fit any given values for the rays, we included a further random simulation in the supplementary information. Here we randomise the values for the density and activation of each ray within the possible range of values for each. We find that with this randomisation of the values the model performs poorly on fitting even with a range of bottleneck sizes. This suggests that the model can only be fitted when using the empirically measured values.
C) The model makes no distinction between the density of sensors in different regions and the exponential decay of the covariance functions. This is not a problem from the perspective of theoretical predictions, but perhaps it could be explained better (maybe with additional illustrations?).
We have improved the text clarifying this relationship in the Results section. We now also make clear that in all examples shown receptors in the lowdensity region are spaced with a distance of 1, and thus the covariance decay is relative to this spacing.
D) It wasn't clear what the authors mean by "secondorder". One assumes this means that the input statistics is Gaussian, but perhaps this is wrong. Did the authors mean that they are only analyzing the covariance functions, i.e. secondorder statistics? Or was it that the neurons' response entropy were quantified by their variance? Please clarify.
Upon reflection, we have removed the term “secondorder” from the text, as it was redundant and confusing. Instead, the term “decorrelation” that we already use implies that we are acting on secondorder statistics. We don’t require the stimulus distribution itself to be Gaussian, or in fact the processing to only consider secondorder statistics, but merely that decorrelation is a necessary step. If it is, this requires calculating the eigenvalues and eigenvectors of the covariance matrix and then the rest of our argument follows.
E) In the text, 'activation' and 'activation ratio' are used interchangeably. It would be better to stick with the ratio since it's really only the relative activation of the second population that matters.
We have clarified in the text what we mean by activation and activation ratio. Where we mean the activation ratio the text has been amended to clearly state this.
E) The 'σ' as 'decay constant' should be explicitly mentioned in the main text in addition to the methods section. At first it seems to be the lengthscale, but it turned out to be inverse lengthscale which was confusing, since σ is often used to denote the inverse lengthscale. (Figures1C, 2C)
To avoid confusion, we have renamed the σ parameter to γ and have clarified its meaning in the text.
F) Please add the following citations to other theoretical work, which considered efficient encoding of somatosensory stimuli and the allocation of cortical space :
a) Bhand, M., Mudur, R., Suresh, B., Saxe, A., and Ng, A. (2011). Unsupervised learning models of primary cortical receptive fields and receptive field plasticity. Advances in neural information processing systems, 24, 19711979.
b) Plumbley, M. D. (1999). Do cortical maps adapt to optimize information density?. Network: Computation in Neural Systems, 10(1), 41.
Thank you for these additional references. We have added relevant points from these in the introduction relating to the optimal representation of information and magnification in cortical maps (Plumbley) and in the discussion on the influence of receptor density and statistics (Bhand).
G) page 2: In the discussion about bottlenecks in the retina, recent work by Lindsey et al. (2019, arxiv 1901:00945) should be cited.
This work was cited elsewhere in the paper, but indeed should also be mentioned in this context and we have now added the reference.
H) page 5: "Decorrelation has been found to be a…" > Please add citations to this claim
We have added a number of citations to support this claim.
Figures:
I) Figure 1D: This figure is quite confusing. The decaying covariance is plotted for a 1D case, the axes are unlabelled, and it takes a while to figure out that these two block matrices do form a single covariance matrix (eventhough they correspond to different regions). Please revise this figure.
We have updated this figure and believe the new version to be much clearer. We have switched to a 2D version for the covariance so that it aligns with the example introduced in the earlier panels. The fact that we are considering a single covariance matrix with block structure should also be more visible now.
J) Figure 2D: small boxes with receptor location is misleading. It shows increased number of receptors when it's the activation ratio that's different. Please revise this figure.
We have updated the figure and this should be more clearly indicated now.
K) Figure 3: A/Blabels seem to be too large; the meaning of the boxes of different colors and the lines was unclear to me. What determines e.g. the line length? The two panels in A should be clearly labeled with what quantity is varied.
Thank you for pointing out this issue. The original boxes and lines were supposed to indicate groupings of examples that showed similar behaviour. These indicators have been now replaced with symbols resembling brackets which should be more easily interpretable. We have also clarified the content of these plots with a fuller explanation in the figure caption. The size of the labels has been corrected.
L) Figure 4: What are the dashed lines? Is it correct that 4B is a simple restatement of the fact that the lines in 3A left are more spread for smaller bottlenecks? Interpreting this as a sign of more plasticity is an overstatement.
The dashed lines in panel A denote the two example bottlenecks for which allocations are visualized directly below. This has now been clarified in the figure legend.
M) Figure 5: Please normalize the RMSE in Figure 5D with respect to the power of the signal. Otherwise the numbers aren't that meaningful, unless some clarifying point is missing. (Or, why not just report r^2?) Please also provide RMSE curves for all different bottleneck sizes and discuss how plausible the optimal values are. Given the right panel, it looks like the RMSE is mainly driven by ray 11, the outlier. How stable is the inferred conclusion if ray 11 is left out? Also, the authors seem to use RMSE to compare percentages. Could the authors comment on the suitability for that loss function?
The plot in former figure 5D (now figure 6) has been changed to report the R^2 between allocations in model and cortical percentages from the empirical data at the best fitting bottleneck. We have provided the fits across a range of bottlenecks in figure 6—figure supplement 1.
[Editors' note: further revisions were suggested prior to acceptance, as described below.]
The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:
1. Regarding lownoise regime, it only appears in the discussion and it does not warn the readers that the numerous tiny tail eigenvalues are very sensitive to noise. This affects the robustness of the analysis and should be clearly stated in the text and appear after the first interpretation of the bottleneck results (Figure 2 and 2supplement 4).
We have made the assumption of low noise and its implication clearer by making several changes to the text. In addition to the text in the discussion, we now make clear at the start of the Results section that we are assuming a lownoise regime. As requested, we have also added a note about the potential effects of noise after presenting the initial results in Figure 2:
“An additional consideration is the robustness of the results regarding small perturbations of the calculated eigenvalues. As allocation depends on the ranks of the eigenvalues only, low levels of noise are unlikely to affect the outcome for narrow bottlenecks, especially since eigenvalues are decaying rather steeply in this regime. On the other hand, allocation in wider bottlenecks is determined from small tail eigenvalues which are much more sensitive to noise (which is also evident when comparing the analytical solution to numerical ones). Allocation can therefore be expected to be somewhat less robust in those regimes.”
2. It is not stated clearly that decorrelation is an information maximizing transform only in absence of noise, which is a key assumption of the model. In the Methods (lines 493494) it is written that adding the noise does not affect the eigendecomposition (and, in consequence, resource allocation), but if that is the case – it is an important result which should be highlighted and explored further – a sensory coding transformation which is independent of the noise level seems like an important finding. Please revise the text around this point.
We have edited the text in the Results section to clarify that decorrelation maximises information only in the absence of noise. We have removed the text in the Methods section regarding the effect of noise on the eigendecomposition. The idea was that under some specific noise models, receptive fields would be affected but not their location, thus rendering the resulting allocation the same. However, this issue is not the focus of the paper, not fully worked out and in any case has no bearing on the results of the paper, and removing it will avoid confusing the reader.
3. It is misleading to claim that the analysis "maximizes information" without qualification in the abstract since this work only deals with variances. Please rephrase.
We have removed the phrase ‘maximize information’ from the abstract:
“Building on work in efficient coding, we address this problem using linear models that optimally decorrelate the sensory signals.”
See also point 5 below, which concerns the wording in the main manuscript text.
4. In Line 66, please add a qualifier to the statement that Fisher Information can be used to approximate Mutual Information. This is only true under certain assumptions in certain cases and specifically for optimal coding, the relationship is not straightforward (Berens et al. 2012, PNAS; Bethge et al. Neural Comp 2002).
We have added a note to this section (with the suggested references) that explains this important caveat:
“In contrast to receptor density, there has been some work on how populations of neurons should encode nonuniform stimulus statistics using Fisher information (Ganguli and Simoncelli, 2010, 2014, 2016; Yerxa et al., 2020). This approach aims to approximate mutual information and is used to calculate optimal encoding in a neural population (Yarrow et al., 2012), however its use is restricted to specific conditions and assumptions (Berens et al., 2011; Bethge et al., 2002).”
5. lines 8788: "We consider linear models which maximize information by decorrelating the sensory inputs, or, equivalently, minimize the meansquared reconstruction error […]" – in a general case these two goals are not equivalent. The following citations referencing work by Doi and colleagues do not support this claim. This should be revised.
We have revised the wording in this section:
“We restrict ourselves to linear models and only consider secondorder statistics of the sensory signals, such that redundancy reduction simplifies to decorrelation (see Hancock et al. 1992, Simoncell and Olshausen 2001, Doi and Lewicki 2005, for examples from the visual literature).”
We have also ensured that the wording in the Methods section is appropriate:
“We assume that receptors are arranged on a twodimensional sensory sheet. Correlations in the inputs occur as receptors that are nearby in space have more similar responses. Restricting ourselves to such secondorder statistics and assuming that noise is negligible, information is maximised in such a setup by decorrelating the sensory inputs. Here we decorrelate using a simple linear model.”
6. In Figure 1, it is hard to parse which colors are kept to mean the same thing across panels and which change meaning. An additional schematic of the model in this figure would help with clarity of the presentation.
We have redesigned Figure 1 and made two major changes:
7. In Equation 1, is P used at all? If not simplify. The explanation of P sounds complicated without really saying anything specific. Give an example of constraints that could be incorporated.
We have removed mention of P from the Results section, as it is not needed to understand the main arguments and its choice does not affect allocation. We have kept it in the Methods section, to acknowledge that whitening is not unique and because it can be used to develop more complex models that enforce further constraints, but still allow a straightforward determination of the allocation. We now give a specific example of how such a constraint (in this case sparsity) could be enforced:
“As the optimally whitened solution is independent of P, additional constraints on the solution can be enforced. For example, power constraints are often added, which either limit the total variance or equalize the variance across output neurons, and additional sparsity constraints can be placed on the output neuron’s activity or their receptive field weights (Doi et al., 2012; Doi and Lewicki, 2014). For example, to optimize the sparsity of the weight matrix, one would define an appropriate cost function (such as the L1 norm of the weight matrix), then iteratively calculate the gradient with respect to W , take an update step to arrive at a new W_opt, and determine P as described above in Equation 7 (see Doi and Lewicki, 2014, for further details). Importantly for our problem, such additional constraints will affect the precise receptive field structure (through P), but not the eigenvalues and eigenvectors […].”
8. The assumption that the covariance between regions is zero is quite a strong one and seems central to the paper. This should be justified or at least discussed how realistic it is. In the finger tip example, one of course typically touches objects not with one finger but with multiple ones.
If covariance functions are monotonically decreasing and they are doing so fast enough that the extent of correlations is smaller than the size of the different regions, then the block matrix approximation is very accurate (as we have shown previously numerically). Of course, covariance functions might NOT decrease monotonically and, as pointed out here, one might expect this to be the case in touch where multifinger contact would induce longranging and nonlinear correlations.
We have responded to this point by expanding the corresponding paragraph in the ‘Limitations’ section of the Discussion:
“We considered allocations for monotonically decreasing covariance functions. Choosing a negative exponential function and assuming that activations are uncorrelated across different input regions allowed us to derive an analytical solution. How justified are these assumptions? Many sensory systems will likely obey the intuitive notion that receptor correlations decrease with distance; however, there are notable exceptions, for example in the auditory system (Terashima and Okada, 2012). In touch, the sensory system we are mainly concerned with here, contact might often be made with several fingers (for humans) or rays (for the starnosed mole) simultaneously, which might induce farranging nonmonotonic correlations. Unfortunately there is little quantified evidence on the strength of these correlations, rendering their importance unclear. At least for the starnosed mole, their prey is often small compared to the size of the rays, which move mostly independently, so crossray correlations might be low. Furthermore, the receptive fields of neurons in early somatosensory cortex of both humans and starnosed moles are strongly localized to a single appendage and lie within cortical subregions that are clearly delineated from others (Sur et al., 1980; Nelson et al., 1980; Catania and Kaas, 1995). If longrange nonmonotonic correlations were strong, we would expect to find many multiappendage receptive fields and blurry region boundaries. As this is not the case, it therefore stands to reason that either these correlations are not very strong or that there is some other process, perhaps during development, that prevents these correlations affecting the final allocation. Either way, our assumption of monotonically decreasing covariance functions appears to be a good firstorder match. That said, the question of how to arrive at robust allocations for more complex covariance functions is an important one that should be considered in future research. While it is possible in principle to solve the allocation problem numerically for arbitrary covariance functions, in practice we noticed that the presence of small numerical errors can affect the sorting process and caution is therefore warranted, especially when considering nonmonotonic functions.”
9. The brackets in Figure 3 were found to be very confusing, please revise.
We have changed the brackets to markers whose colors match those in panel B.
10. The covariance functions in Figure 5 are not very different, it is not surprising that these result in very similar outcomes. Comment on this or consider revising.
We agree that the results are not particularly surprising, however they demonstrate the robustness of the method (i.e. the results are not an artefact of assuming a negative exponential covariance function) and provide some additional, insightful context on how the particular covariance structure affects the resulting allocation, tested via a systematic change in the smoothness of the function. Last but not least, the covariance functions tested were specifically requested by reviewers in the previous revision round. We have now clarified the motivation behind including these functions and what we can learn from them at the start of the relevant section:
“The results above relate to negative exponential covariance functions, where allocations can be solved analytically and it is unclear whether these findings generalize to other monotonically decreasing covariance functions and are therefore robust. Specifically, negative exponential covariances are relatively 'rough' and allocations might conceivably depend on the smoothness of the covariance function. To test directly how optimal allocations depend systematically on the smoothness of the covariance function, we numerically calculated allocations for three covariance functions […]”
https://doi.org/10.7554/eLife.70777.sa2Article and author information
Author details
Funding
Wellcome Trust (209998/Z/17/Z)
 Hannes P Saal
European Commission (HBPSGA2 785907)
 Alejandro Jiménez Rodríguez
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication. For the purpose of Open Access, the authors have applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
Acknowledgements
This work was supported by the Wellcome Trust (209998/Z/17/Z) and the European Union Horizon 2020 programme as part of the Human Brain Project (HBPSGA2, 785907).
Senior Editor
 Michael J Frank, Brown University, United States
Reviewing Editor
 Stephanie E Palmer, University of Chicago, United States
Reviewer
 Memming Park
Version history
 Preprint posted: May 27, 2021 (view preprint)
 Received: May 28, 2021
 Accepted: July 29, 2022
 Accepted Manuscript published: August 4, 2022 (version 1)
 Version of Record published: August 19, 2022 (version 2)
Copyright
© 2022, Edmondson et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics

 555
 Page views

 117
 Downloads

 0
 Citations
Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading

 Computational and Systems Biology
 Neuroscience
Cerebellar climbing fibers convey diverse signals, but how they are organized in the compartmental structure of the cerebellar cortex during learning remains largely unclear. We analyzed a large amount of coordinatelocalized twophoton imaging data from cerebellar Crus II in mice undergoing ‘Go/Nogo’ reinforcement learning. Tensor component analysis revealed that a majority of climbing fiber inputs to Purkinje cells were reduced to only four functional components, corresponding to accurate timing control of motor initiation related to a Go cue, cognitive errorbased learning, reward processing, and inhibition of erroneous behaviors after a Nogo cue. Changes in neural activities during learning of the first two components were correlated with corresponding changes in timing control and error learning across animals, indirectly suggesting causal relationships. Spatial distribution of these components coincided well with boundaries of AldolaseC/zebrin II expression in Purkinje cells, whereas several components are mixed in single neurons. Synchronization within individual components was bidirectionally regulated according to specific task contexts and learning stages. These findings suggest that, in close collaborations with other brain regions including the inferior olive nucleus, the cerebellum, based on anatomical compartments, reduces dimensions of the learning space by dynamically organizing multiple functional components, a feature that may inspire newgeneration AI designs.

 Neuroscience
Ultrasonic vocalizations (USVs) fulfill an important role in communication and navigation in many species. Because of their social and affective significance, rodent USVs are increasingly used as a behavioral measure in neurodevelopmental and neurolinguistic research. Reliably attributing USVs to their emitter during close interactions has emerged as a difficult, key challenge. If addressed, all subsequent analyses gain substantial confidence. We present a hybrid ultrasonic tracking system, Hybrid Vocalization Localizer (HyVL), that synergistically integrates a highresolution acoustic camera with highquality ultrasonic microphones. HyVL is the first to achieve millimeter precision (~3.4–4.8 mm, 91% assigned) in localizing USVs, ~3× better than other systems, approaching the physical limits (mouse snout ~10 mm). We analyze mouse courtship interactions and demonstrate that males and females vocalize in starkly different relative spatial positions, and that the fraction of female vocalizations has likely been overestimated previously due to imprecise localization. Further, we find that when two male mice interact with one female, one of the males takes a dominant role in the interaction both in terms of the vocalization rate and the location relative to the female. HyVL substantially improves the precision with which social communication between rodents can be studied. It is also affordable, opensource, easy to set up, can be integrated with existing setups, and reduces the required number of experiments and animals.