A principle of economy predicts the functional architecture of grid cells
Abstract
Grid cells in the brain respond when an animal occupies a periodic lattice of ‘grid fields’ during navigation. Grids are organized in modules with different periodicity. We propose that the grid system implements a hierarchical code for space that economizes the number of neurons required to encode location with a given resolution across a range equal to the largest period. This theory predicts that (i) grid fields should lie on a triangular lattice, (ii) grid scales should follow a geometric progression, (iii) the ratio between adjacent grid scales should be √e for idealized neurons, and lie between 1.4 and 1.7 for realistic neurons, (iv) the scale ratio should vary modestly within and between animals. These results explain the measured grid structure in rodents. We also predict optimal organization in one and three dimensions, the number of modules, and, with added assumptions, the ratio between grid periods and field widths.
https://doi.org/10.7554/eLife.08362.001
eLife digest
In the 1930s, neuroscientists studying how rodents find their way through a maze proposed that the animals could construct an internal map of the maze inside their heads. The map was thought to enable the animals to navigate between familiar locations and also to identify shortcuts and alternative routes whenever familiar ones were blocked.
In the 1960s, recordings of electrical activity in the rat brain provided the first clues as to which nerve cells form this spatial map. In a region of the brain called the hippocampus, nerve cells called ‘place cells’ are active whenever the rat finds itself in a specific location. However, place cells alone are not able to support all types of navigation. Some spatial tasks also require cells in a region of the brain called the medial entorhinal cortex (MEC), which supplies most of the information that the hippocampus receives.
Cells in the MEC called ‘grid cells’ represent two-dimensional space as a repeating grid of triangles. A given grid cell is activated if the animal is located at a particular distance and angle away from the center of any of these triangles. The size of the triangles in these grids varies systematically throughout the MEC. Individual grid cells at one end of the structure encode space in finer detail than grid cells at the opposite end.
Wei et al. have now used mathematical modeling to explore how grid cells are organized. The model assumes that the brain seeks to encode space at whatever resolution an animal requires using as few nerve cells as possible. The model successfully reproduces several known features of grid cells, including the triangular shape of the grid, and the fact that the size of the triangles increases in steps of a specific size across the MEC.
In addition to providing a mathematical basis for the way that grid cells are organized in the brain, the model makes a number of testable predictions. These include predictions of the number of grid cells in the rat brain, as well as the pattern that grid cells adopt in three dimensions: a question that is currently being studied in bats. Wei et al.'s findings suggest that the code used by the grid to represent space is an analog of a decimal number system, except that space is not subdivided by factors of 10 to form decimal ‘digits’, but by a quantity related to a famous constant in the field of mathematics called Euler's number.
https://doi.org/10.7554/eLife.08362.002
Introduction
How does the brain represent space? Tolman (1948) suggested that the brain must have an explicit neural representation of physical space, a cognitive map, that supports higher brain functions such as navigation and path planning. The discovery of place cells in the rat hippocampus (O'Keefe, 1976; O'Keefe and Nadel, 1978) suggested one potential locus for this map. Place cells have spatially localized firing fields which reorganize dramatically when the environment changes (Leutgeb et al., 2005). Another potential locus for the cognitive map of space has been uncovered in the main input to hippocampus, a structure known as the medial entorhinal cortex (MEC) (Figure 1, Fyhn et al., 2004; Hafting et al., 2005). When rats freely explore a two-dimensional open environment, individual ‘grid cells’ in the MEC display spatial firing fields that form a periodic triangular grid which tiles space (Figure 1A). It is believed that grid fields provide relatively rigid coordinates on space based partly on self-motion and partly on environmental cues (Moser et al., 2008). The scale of grid fields varies systematically along the dorso–ventral axis of the MEC (Figure 1A) (Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012). Recently, it was shown that grid cells are organized in discrete modules within which cells share the same orientation and periodicity but vary randomly in phase (Barry et al., 2007; Stensola et al., 2012).
How does the grid system represent spatial location and what function does the modular variation in grid scale serve? Here, we propose that the grid system provides a hierarchical representation of space where fine grids provide precise location and coarse grids resolve ambiguity, and that the grids are organized to minimize the number of neurons required to achieve the behaviorally necessary spatial resolution across a spatial range equal in size to the period of the largest grid module. Our analyses thus assume that there is a behaviorally defined maximum range over which a fixed grid represents locations. Our hypotheses, together with general assumptions about tuning curve shape and decoding mechanism, explain the triangular lattice structure of two-dimensional grid cell firing maps and predict a geometric progression of grid scales. Crucially, the theory further predicts that the ratio of adjacent grid scales will be modestly variable within and between animals with a mean in the range 1.4–1.7 depending on the assumed decoding mechanism used by the brain. With additional assumptions the theory also predicts that the ratio between grid scale and individual grid field widths should lie in the same range. These predictions naturally explain the structural parameters of grid cell modules measured in rodents (Barry et al., 2007; Giocomo et al., 2011a; Stensola et al., 2012). Our results follow from general principles, and thus, we expect similar organization of the grid system in other species.
The theory makes further predictions including: (a) the number of grid scales necessary to support navigation over typical behavioral distances (i.e., a logarithmic relation between number of modules and navigational range), (b) possible deficits in spatial behavior that will obtain upon inactivating specific grid modules, (c) the structure of one- and three-dimensional grids that will be relevant to navigation in, for example, bats (Yartsev et al., 2011), (d) an estimate of the number of grid cells we expect in the MEC. Remarkably, in a simple decoding scheme, the scale ratio in an n-dimensional environment is predicted to be close to $\sqrt[n]{e}$.
As we will explain, our results, and their apparent experimental confirmation in Stensola et al. (2012), suggest that the grid system implements a two-dimensional neural analog of a base-b number system. This provides an intuitive and powerful metaphor for interpreting the representation of space in the entorhinal cortex.
Results
The setup
The key features of the grid system in the MEC are schematized in Figure 1A. Grid cells are organized in modules, and cells within a module share a common lattice organization of their firing fields (Barry et al., 2007; Stensola et al., 2012). These lattices have periods λ_{1} > λ_{2} > ⋯ > λ_{m}, measured as the distance between nearest neighbor firing fields. It will prove convenient to define ‘scale factors’ r_{i} = λ_{i}/λ_{i+1} relating the periods of adjacent scales. In each module, the grid firing fields (i.e., the connected spatial regions that evoke firing) are compact (with a diameter denoted l_{i}) after thresholding for activity above the noise level (see, e.g., Hafting et al., 2005). Within any module, grid cells have a variety of spatial phases so that at least one cell will respond at any physical location (Figure 1B,D). Grid modules with smaller field widths l_{i} provide more local spatial information than those with larger scales. However, this increased spatial precision comes at a cost: the correspondingly smaller periodicity λ_{i} of these modules leads to increased ambiguity since there are more grid periods within a given spatial region (e.g., see scale 3 in the schematic one-dimensional grid in Figure 1B,D). By contrast, modules with large periods and field widths have less spatial precision, but also less ambiguity (e.g., in scale 1 in Figure 1B the red cell has only one firing field in the environment and hence no ambiguity).
We propose that the entorhinal cortex exploits this tradeoff to implement a hierarchical representation of space where large scales resolve ambiguity and small scales provide precision. Consistent with existing data for one- and two-dimensional grids (Barry et al., 2007; Brun et al., 2008; Stensola et al., 2012), we will take the largest grid period λ_{1} to be comparable to the range over which space is represented unambiguously by a fixed grid without remapping (Fyhn et al., 2007). (An alternative view, that the range might greatly exceed the largest period, is addressed in the ‘Discussion’.) The spatial resolution of such a grid can be measured by comparing the range of spatial representation set by the largest period λ_{1} to the precision (related to the smallest grid field width l_{m}) to quantify how many distinct spatial ‘bins’ can be resolved. We will assume that the required resolution is set by the animal's behavioral requirements.
Intuitions from a simplified model
What are the advantages of a multiscale, hierarchical representation of physical location? Consider an animal living in an 8 m linear track and requiring spatial precision of 1 m to support its behavior. To develop intuition, consider a simple model where location is represented in the animal's brain by reliable neurons with rectangular firing fields (e.g., Figure 1B). The animal could achieve the required resolution in a place coding scheme by having eight neurons tuned to respond when the animal is in 1 m wide, non-overlapping regions (see Fiete et al., 2008, for a related comparison between grid and place cells). Consider an alternative, the idealized grid coding scheme in Figure 1B. Here, the two neurons at the largest scale (λ_{1}) have 4 m wide tuning curves so that their responses just indicate the left and right halves of the track. The pairs of neurons at the next two scales have grid field widths of 2 m and 1 m respectively, and proportionally shorter periodicities as well. These pairs successively localize the animal into 2 m and 1 m bins. All told, only six neurons are required, fewer than in the place coding scheme. This suggests that grid schemes that integrate multiple scales of representation can encode space more efficiently, that is, with fewer neural resources. In the sensory periphery, there is evidence of selection for more efficient circuit architectures (e.g., Simoncelli and Olshausen, 2001). If similar selection operates in cortex, the experimentally measured grid architecture should be predicted by maximizing the efficiency of the grid system given a behaviorally determined range and resolution. Thus, we seek to predict the key structural parameters of the grid system: the ratios r_{i} = λ_{i}/λ_{i+1} relating adjacent scales (which need not be equal).
The need to avoid spatial ambiguity constrains the ratios r_{i}. Again in our simple model, consider Figure 1C where the cells with the grid fields marked in red respond at scales i and i + 1. Then the animal might be in either of the two marked locations. Avoiding ambiguity requires that λ_{i+1}, the period at scale i + 1, must exceed l_{i}, the grid field width at scale i. Variants of this condition will recur in the more realistic models that we will consider. Theoretically, one could resolve the ambiguity in Figure 1C by combining the responses of more grid modules, provided they have mutually incommensurate periods (Fiete et al., 2008; Sreenivasan and Fiete, 2011). However, anatomical evidence suggests that contiguous subsets of the MEC along the dorso–ventral axis project topographically to the hippocampus (Van Strien et al., 2009). While there is evidence that hippocampal place cells are not formed and maintained by grid cell inputs alone (Bush et al., 2014; Sasaki et al., 2015), for each of these restricted projections to represent a well-defined spatial map, ambiguities like the one in Figure 1C should be resolved at each scale. The hierarchical position encoding schemes that we consider below embody this observation by seeking to reduce position ambiguity at each scale, given the responses at larger scales.
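The neuron counts in this simplified model can be reproduced with a short sketch. It assumes the idealized scheme of Figure 1B: reliable neurons, rectangular fields, and a field width in every module equal to the period divided by the number of cells in that module. The function names and the `cells_per_module` parameter are illustrative and do not come from the paper.

```python
import math


def place_code_neurons(track_length_m: float, resolution_m: float) -> int:
    """One neuron per non-overlapping bin of width `resolution_m`."""
    return math.ceil(track_length_m / resolution_m)


def grid_code_neurons(track_length_m: float, resolution_m: float,
                      cells_per_module: int = 2) -> int:
    """Idealized hierarchical grid code with rectangular fields.

    Each module holds `cells_per_module` cells and refines the location
    estimate by that same factor, so modules are added until the number
    of resolved bins reaches track_length / resolution.
    """
    bins_needed = math.ceil(track_length_m / resolution_m)
    modules = 0
    bins_resolved = 1
    while bins_resolved < bins_needed:
        bins_resolved *= cells_per_module
        modules += 1
    return modules * cells_per_module


# The 8 m track with 1 m precision discussed in the text.
print(place_code_neurons(8, 1))  # 8 neurons
print(grid_code_neurons(8, 1))   # 3 modules x 2 cells = 6 neurons
```

Under these assumptions the saving grows with resolution, because the place code scales linearly with the number of bins while the hierarchical code scales logarithmically.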
Efficient grid coding in one dimension
How should the grid system be organized to minimize the resources required to represent location unambiguously with a given resolution? Consider a one-dimensional grid system that develops when an animal runs on a linear track. As described above, the ith module is characterized by a period λ_{i}, while the ratio of adjacent periods is r_{i} = λ_{i}/λ_{i+1}. Within any module, grid cells have periodic, bumpy response fields with a variety of spatial phases so that at least one cell responds at any physical location (Figure 1D). If d cells respond above the noise threshold at each point, the number of grid cells n_{i} in module i will be n_{i} = dλ_{i}/l_{i}. We will take d, the coverage factor, to be the same in each module. In terms of these parameters, the total number of grid cells is $N=\sum_{i=1}^{m} n_{i}=\sum_{i=1}^{m} d\,\frac{\lambda_{i}}{l_{i}}$, where m is the number of grid modules. How should such a grid be organized to minimize the number of grid cells required to achieve a given spatial resolution? The answer might depend on how the brain decodes the grid system. Hence, we will consider decoding methods at extremes of decoding complexity and show that they give similar answers for the optimal grid.
Winner-take-all decoder
First imagine a decoder which considers the animal as localized within the grid fields of the most responsive cell in each module (Coultrip et al., 1992; Maass, 2000). A simple ‘winner-take-all’ (WTA) scheme of this kind can be easily implemented by neural circuits where lateral inhibition causes the influence of the most responsive cell to dominate. A maximally conservative decoder ignoring all information from other cells and from the shape of the tuning curve (illustrated in Figure 1E) could then take uncertainty in spatial location to be equal to l_{i}. The smallest interval that can be resolved in this way will be l_{m}. We therefore quantify the resolution of the grid system (the number of spatial bins that can be resolved) as the ratio of the largest to the smallest scale, R_{1} = λ_{1}/l_{m}, which we assume to be large and fixed by the animal's behavior. In terms of scale factors r_{i} = λ_{i}/λ_{i+1}, we can write the resolution as $R_{1}=\prod_{i=1}^{m} r_{i}$, where we also defined r_{m} = λ_{m}/l_{m}. As in our simplified model above, unambiguous decoding requires that l_{i} ≤ λ_{i+1} (Figure 1C,E), or, equivalently, $\frac{\lambda_{i}}{l_{i}}\ge r_{i}$. To minimize $N=d\sum_{i}\lambda_{i}/l_{i}$, all the $\frac{\lambda_{i}}{l_{i}}$ should be as small as possible; so this fixes $\frac{\lambda_{i}}{l_{i}}=r_{i}$. Thus, we are reduced to minimizing the sum $N=d\sum_{i=1}^{m} r_{i}$ over the parameters r_{i}, while fixing the product $R_{1}=\prod_{i} r_{i}$. Because this problem is symmetric under permutation of the indices i, the optimal r_{i} turn out to all be equal, allowing us to set r_{i} = r (Optimizing the grid system: winner-take-all decoder, ‘Materials and methods’).
This is our first prediction: (1) the ratios between adjacent periods will be constant. The constraint on resolution then gives m = log_{r}R_{1}, so that we seek to minimize N(r) = d r log_{r}R_{1} with respect to r: the solution is r = e (Optimizing the grid system: winner-take-all decoder, ‘Materials and methods’, and panel B of Figure 5 in Optimizing the grid system: probabilistic decoder, ‘Materials and methods’). This gives a second prediction: (2) the ratio of adjacent grid periods should be close to r = e. Therefore, for each scale i, λ_{i} = eλ_{i+1} and λ_{i} = el_{i}. This gives a third prediction: (3) the ratio of the grid period and the grid field width will be constant across modules and be close to the scale ratio.
More generally, in winner-take-all decoding schemes, the local uncertainty in the animal's location in grid module i will be proportional to the grid field width l_{i}. The proportionality constant will be a function f(d) of the coverage factor d that depends on the tuning curve shape and neural variability. Thus, the uncertainty will be f(d)l_{i}. Unambiguous decoding at each scale requires that λ_{i+1} ≥ f(d)l_{i}. The smallest interval that can be resolved in this way will be f(d)l_{m}, and this sets the positional accuracy of the decoding scheme. Finally, we require that λ_{1} > L, where L is a scale big enough to ensure that the grid code resolves positions over a sufficiently large range. Behavioral requirements fix the required positional accuracy and range. The optimal grid satisfying these constraints is derived in Optimizing the grid system: winner-take-all decoder, ‘Materials and methods’. Again, the adjacent modules are organized in a geometric progression and the ratio between adjacent periods is predicted to be e. However, the ratio between the grid period and grid field width in each module depends on the specific model through the function f(d). Thus, within winner-take-all decoding schemes, the constancy of the scale ratio, the value of the scale ratio, and the constancy of the ratio of grid period to field width are parameter-free predictions, and therefore furnish tests of theory. If the tests succeed, f(d) can be matched to data to constrain possible mechanisms used by the brain to decode the grid system.
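The optimum r = e can be checked numerically. Writing N(r) = d r ln R_{1} / ln r and setting dN/dr = 0 gives ln r = 1. The sketch below confirms this with a golden-section search, treating the number of modules as a continuous quantity. The resolution `R1 = 1000` and coverage factor `d = 1` are arbitrary illustrative values, and the helper names are not taken from the paper.

```python
import math


def neuron_count(r: float, resolution: float, d: float = 1.0) -> float:
    """N(r) = d * r * log_r(R_1), with a continuous number of modules."""
    return d * r * math.log(resolution) / math.log(r)


def argmin_golden(f, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        x1 = b - inv_phi * (b - a)
        x2 = a + inv_phi * (b - a)
        if f(x1) < f(x2):
            b = x2  # minimum lies in [a, x2]
        else:
            a = x1  # minimum lies in [x1, b]
    return (a + b) / 2


R1 = 1000.0  # hypothetical resolution (number of resolvable bins)
r_opt = argmin_golden(lambda r: neuron_count(r, R1), 1.1, 10.0)
modules = math.log(R1) / math.log(r_opt)  # m = log_r(R_1)

print(f"optimal ratio ~ {r_opt:.4f} (e = {math.e:.4f})")
print(f"modules needed ~ {modules:.2f}")
```

The location of the minimum does not depend on `R1` or `d`, since both enter N(r) only as multiplicative constants; only the number of modules changes with the required resolution.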
Probabilistic decoder
What do we predict for a more general, and more complex, decoding scheme that optimally pools all the information available in the responses of noisy neurons within and between modules? Statistically, the best we can do is to use all these responses, which may individually be noisy, to find a probability distribution over physical locations that can then inform subsequent behavioral decisions (Figure 2). Thus, the population response at each scale i gives rise to a likelihood function over location P(x|i), which will have the same periodicity λ_{i} as the individual grid cells' firing rates (Figure 2A). This likelihood explicitly captures the uncertainty in location given the tuning and noise characteristics of the neural population in the module i. Because there are at least scores of neurons in each grid module (Stensola et al., 2012), P(x|i) can be approximated as a periodic sum of Gaussians without making restrictive assumptions about the shapes of the tuning curves of individual grid cells, or about the precision of their periodicity, so long as the variability of individual neurons is weakly correlated and homogeneous. For example, even though individual grid cells can have somewhat different firing rates in each of their firing fields, this spatial heterogeneity will be smoothed in the posterior over the full population of cells, leading to much more accurate periodicity. In other words, individual grid cells show both spiking noise and ‘noise’ due to heterogeneity and imperfect periodicity of the firing rate maps. Both these forms of variability are smoothed by averaging over the population, provided, as we will assume, that there are enough cells and noise is not too correlated between cells.
The standard deviations of the peaks in P(x|i), which we call σ_{i}, depend on the tuning curve shape and response noise of individual grid cells, and will decrease as the coverage factor d increases. To have even coverage of space, the number of grid phases, and thus grid cells in a module, must be uniformly distributed so that equally reliable posterior distributions can be formed at each point in the unit cell of the module response. This requires that the number of cells (and phases) in the module should be proportional to the ratio $\frac{\lambda_{i}}{\sigma_{i}}$. Summing over modules, the total number of grid cells will be $N\propto \sum_{i=1}^{m}\frac{\lambda_{i}}{\sigma_{i}}$. The composite posterior given all m scales and a uniform prior over positions, Q_{m}(x), will be given by the product $Q_{m}(x)\propto \prod_{i=1}^{m} P(x|i)$, assuming independent response noise across scales (Figure 2B). The animal's overall uncertainty about its position depends on the standard deviation δ_{m} of the composite posterior distribution Q_{m}(x). Setting δ_{0} to be the uncertainty in location without using any grid responses at all, we can quantify resolution as R = δ_{0}/δ_{m}.
In this framework, there is a precision-ambiguity tradeoff controlled by the scale factors r_{i}. The larger these ratios, the more rapidly grid field widths shrink in successive modules, thus increasing precision and reducing the number of modules, and hence grid cells, required to achieve a given resolution. However, if the periods of adjacent scales shrink too quickly, the composite posterior Q_{i}(x) will develop prominent side lobes (Figure 2C,D), making decoding ambiguous, as reflected in a large standard deviation δ_{i} of the composite posterior distribution (Figure 2B,D). This ambiguity could be avoided by shrinking the width of Q_{i−1}(x); however, this would require increasing the number of neurons n_{1}, ⋯, n_{i−1} in the modules 1, ⋯, i − 1. Ambiguity can also be avoided by having a smaller scale ratio (so that the side lobes of the posterior P(x|i) of module i do not penetrate the central lobe of the composite posterior Q_{i−1}(x) of modules 1, ⋯, i − 1). But reducing the scale ratios to reduce ambiguity increases the number of modules necessary to achieve the required resolution, and hence increases the number of grid cells. This sets up a tradeoff: increasing the scale ratios reduces the number of modules needed to achieve a fixed resolution but requires more neurons in each module; reducing the scale ratios permits the use of fewer grid cells in each module, but increases the number of required modules. Optimizing this tradeoff (analytical and numerical details in ‘Materials and methods’ and Figure 5) predicts: (1) a constant scale ratio between the periods of each grid module, and (2) an optimal ratio ≈2.3, slightly smaller than, but close to, the winner-take-all value, e.
Why is the predicted scale factor based on the probabilistic decoder somewhat smaller than the prediction based on the winner-take-all analysis? In the probabilistic analysis, when the likelihood is combined across modules, there will be side lobes arising from the periodic peaks of the likelihood derived from module i multiplying the tails of the Gaussian arising from the previous modules. These side lobes increase location ambiguity (measured by the standard deviation δ_{i} of the overall likelihood). Reducing the scale factor reduces the height of side lobes because the secondary peaks from module i move further into the tails of the Gaussian derived from the previous modules. Thus, conceptually, the optimal probabilistic scale factor is smaller than the winner-take-all case in order to suppress side lobes that arise in the combined likelihood across modules (Figure 2). Such side lobes were absent in the winner-take-all analysis, which thus permits a more aggressive (larger) scale ratio that improves precision, without being penalized by increased ambiguity. The theory also predicts a fixed ratio between grid period λ_{i} and posterior likelihood width σ_{i}. However, the relationship between σ_{i} and the more readily measurable grid field width l_{i} depends on a variety of parameters including the tuning curve shape, noise level, and neuron density.
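The side-lobe effect described above can be illustrated with a two-module toy calculation. This is a minimal sketch under stated assumptions, not the optimization carried out in ‘Materials and methods’: each module likelihood is a periodic sum of Gaussians, the true position is at x = 0, the ratio σ_{i}/λ_{i} is the same in both modules (so both would use the same number of cells), and the values λ_{1} = 1 and σ_{1} = 0.1 are arbitrary. All function names are hypothetical.

```python
import math


def module_likelihood(x: float, period: float, sigma: float,
                      n_peaks: int = 20) -> float:
    """Periodic sum of Gaussians with peaks at integer multiples of `period`."""
    return sum(math.exp(-0.5 * ((x - k * period) / sigma) ** 2)
               for k in range(-n_peaks, n_peaks + 1))


def composite_std(periods, sigmas, n_samples: int = 4001) -> float:
    """Standard deviation of the product of module likelihoods.

    The product is evaluated on a regular grid spanning one period of the
    largest module, which corresponds to a uniform prior over that range.
    """
    half = max(periods) / 2
    xs = [-half + 2 * half * j / (n_samples - 1) for j in range(n_samples)]
    q = [math.prod(module_likelihood(x, p, s)
                   for p, s in zip(periods, sigmas)) for x in xs]
    total = sum(q)
    mean = sum(x * w for x, w in zip(xs, q)) / total
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, q)) / total
    return math.sqrt(var)


def two_module_std(r: float, lam1: float = 1.0, sigma1: float = 0.1) -> float:
    """Posterior width for two modules with scale ratio r, fixed sigma/lambda."""
    return composite_std([lam1, lam1 / r], [sigma1, sigma1 / r])


for r in (2.0, 3.0, 5.0):
    print(f"r = {r}: posterior std = {two_module_std(r):.4f}")
```

With these illustrative parameters, the posterior for r = 5 is wider than for r = 2 even though its second module is finer, because the peaks of the second module at ±λ_{2} fall inside the central lobe of the first module and survive as side lobes in the product.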
General grid coding in two dimensions
How do these results extend to two dimensions? Let λ_{i} be the distance between nearest neighbor peaks of grid fields of width l_{i} (Figure 3). Assume in addition that a given cell responds on a lattice whose vertices are located at the points λ_{i}(nu + mv), where n, m are integers and u, v are linearly independent vectors generating the lattice (Figure 3A). We may take u to have unit length (|u| = 1) without loss of generality; however, |v| ≠ 1 in general. It will prove convenient to denote the components of v parallel and perpendicular to u by $v_{\parallel}$ and $v_{\perp}$, respectively (Figure 3A). The two numbers $v_{\parallel},\, v_{\perp}$ quantify the geometry of the grid and are additional parameters that we may optimize over: this is a primary difference from the one-dimensional case. We will assume that $v_{\parallel}$ and $v_{\perp}$ are independent of scale; this still allows for relative rotation between grids at different scales. At each scale, grid cells have different phases so that at least one cell responds at each physical location. The minimal number of phases required to cover space is computed by dividing the area of the unit cell of the grid ($\lambda_{i}^{2}\,\|\mathbf{u}\times \mathbf{v}\| = \lambda_{i}^{2}\,|v_{\perp}|$) by the area of the grid field. As in the one-dimensional case, we define a coverage factor d as the number of neurons covering each point in space, giving for the total number of neurons $N=d\,|v_{\perp}|\sum_{i}(\lambda_{i}/l_{i})^{2}$.
As before, consider a situation where grid fields thresholded for noise lie completely within compact regions and assume a simple decoder which selects the most activated cell and does not take tuning curve shape into account (Coultrip et al., 1992; Maass, 2000; de Almeida et al., 2009). In such a model, each scale i simply serves to localize the animal within a circle of diameter l_{i}. The spatial resolution is summarized by the square of the ratio of the largest scale λ_{1} to the smallest scale l_{m}: R_{2} = (λ_{1}/l_{m})^{2}. In terms of the scale factors $\tilde{r}_{i}=\lambda_{i}/\lambda_{i+1}$, we write $R_{2}=\prod_{i=1}^{m}\tilde{r}_{i}^{2}$, where we also define $\tilde{r}_{m}=\lambda_{m}/l_{m}$. To decode the position of an animal unambiguously, each cell at scale i + 1 should have at most one grid field within a region of diameter l_{i}. We therefore require that the shortest lattice vector of the grid at scale i has a length greater than l_{i−1}, in order to avoid ambiguity (Figure 3B). We wish to minimize N, which will be convenient to express as $N=d\,|v_{\perp}|\sum_{i}\tilde{r}_{i}^{2}\,(\lambda_{i+1}/l_{i})^{2}$. There are two kinds of contributions here to the number of neurons: the factors $\tilde{r}_{i}^{2}$ are constrained by the overall resolution of the grid, while, as we will see, the combination $|v_{\perp}|(\lambda_{i+1}/l_{i})^{2}$ measures a packing density of discs placed on the grid lattice. This suggests that we should separate the minimization of neuron number into first optimizing the lattice and then optimizing ratios. After doing so, we can check that the result is the global optimum.
To obtain the optimal lattice geometry, we can ignore the resolution constraint, as it depends only on the scale factors and not the grid geometry. We may then exploit an equivalence between our optimization problem and the optimal circle-packing problem. To see this connection, consider placing disks of diameter l_{i} on each vertex of the grid at scale i + 1. In order to avoid ambiguity, all points of the grid i + 1 must be separated by at least l_{i}: equivalently, the disks must not overlap. The density of disks is proportional to $l_{i}^{2}/\left(\lambda_{i+1}^{2}\,|v_{\perp}|\right)$, which is proportional to the reciprocal of each term in N. Therefore, minimizing neuron number amounts to maximizing the packing density; and the no-ambiguity constraint requires that the disks do not overlap. This is the optimal circle packing problem, and its solution in two dimensions is known to be the triangular lattice (Thue, 1892), so $v_{\parallel}=1/2$ and $v_{\perp}=\sqrt{3}/2$. Furthermore, the grid spacing should be as small as allowed by the no-ambiguity constraint, giving $\lambda_{i+1}=l_{i}$.
We have now reduced the problem to minimizing $N=\frac{d\sqrt{3}}{2}\sum_{i}\tilde{r}_{i}^{2}$ over the scale factors $\tilde{r}_{i}$, while fixing the resolution R_{2}. This optimization problem is mathematically the same as in one dimension if we formally set $r_{i}\equiv \tilde{r}_{i}^{2}$. This gives the optimal ratio $\tilde{r}_{i}^{2}=e$ for all i (Figure 3C). We conclude that in two dimensions, the optimal ratio of neighboring grid periodicities is $\sqrt{e}\approx 1.65$ for the simple winner-take-all decoding model, and the optimal lattice is triangular.
The optimal probabilistic decoding model from above can also be extended to two dimensions with the posterior distributions P(x|i) becoming sums of Gaussians with peaks on the two-dimensional lattice. In analogy with the one-dimensional case, we then derive a formula for the resolution R_{2} = λ_{1}/δ_{m} in terms of the standard deviation δ_{m} of the posterior given all scales. The quantity δ_{m} may be explicitly calculated as a function of the scale factors $\tilde{r}_{i}$ and the geometric factors $v_{\parallel},\, v_{\perp}$, and the minimization of neuron number may then be carried out numerically (Optimizing the grid system: probabilistic decoder, ‘Materials and methods’). In this approach, the optimal scale factor turns out to be $\tilde{r}_{i}\approx 1.44$ (Figure 3C), and the optimal lattice is again triangular (Figure 3D). Attractor network models of grid formation readily produce triangular lattices (Burak and Fiete, 2009); our analysis suggests that this architecture is functionally beneficial in reducing the required number of neurons.
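The packing argument can be checked by comparing disc densities for a few lattice geometries. The sketch below is illustrative only. It takes u = (1, 0) and v = (v_par, v_perp), sets the disc diameter equal to the shortest lattice vector (the tightest spacing the no-ambiguity constraint allows), and finds that vector by brute force over small integer combinations, which is adequate for lattices that are not strongly skewed. The function name and the example lattices are assumptions made for this example.

```python
import math


def disc_packing_density(v_par: float, v_perp: float) -> float:
    """Fraction of the plane covered by equal, non-overlapping discs
    centered on the lattice generated by u = (1, 0) and v = (v_par, v_perp).

    The disc diameter equals the shortest nonzero lattice vector.
    """
    shortest = min(
        math.hypot(n + m * v_par, m * v_perp)
        for n in range(-3, 4)
        for m in range(-3, 4)
        if (n, m) != (0, 0)
    )
    disc_area = math.pi * (shortest / 2) ** 2
    cell_area = abs(v_perp)  # |u x v| with |u| = 1
    return disc_area / cell_area


square = disc_packing_density(0.0, 1.0)
triangular = disc_packing_density(0.5, math.sqrt(3) / 2)
skewed = disc_packing_density(0.3, 0.9)  # arbitrary intermediate lattice

print(f"square:     {square:.4f}")      # pi / 4
print(f"triangular: {triangular:.4f}")  # pi / (2 * sqrt(3))
print(f"skewed:     {skewed:.4f}")
```

Because each term of N is proportional to the reciprocal of this density, the higher density of the triangular lattice translates directly into fewer neurons per module than the square lattice at the same field width.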
Even though our two decoding strategies lie at extremes of complexity (one relying just on the most active cell at each scale and another optimally pooling information in the grid population) their respective ‘optimal intervals’ substantially overlap (Figure 3C; see Figure 5B in 'Materials and methods' for the onedimensional case). This indicates that our proposal is robust to variations in grid field shape and to the precise decoding algorithm (Figure 3C). The scaling ratio r may lie anywhere within a basin around the optimum at the cost of a small number of additional neurons. Such considerations also suggest that these coding schemes have the capacity to tolerate developmental noise: different animals could develop grid systems with slightly different scaling ratios, without suffering a large loss in efficiency. In two dimensions, the required neuron number will be no more than 5% of the minimum if the scale factor is within the range (1.43, 1.96) for the winnertakeall model and the range (1.28, 1.66) for the probabilistic model. These ‘optimal intervals’ are narrower than in the onedimensional case and have substantial overlap.
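The width of the winner-take-all basin can be estimated with a few lines of code. The sketch assumes that, at fixed resolution and with a constant ratio r in D dimensions, the number of modules is m = ln R_{D} / (D ln r) and each module contributes a number of neurons proportional to r^{D}, so that N ∝ r^{D} / ln r with its minimum at r = e^{1/D}. The bisection helper and function names are illustrative, and the probabilistic interval is not reproduced because it requires the numerical posterior calculation.

```python
import math


def relative_cost(r: float, dim: int = 2) -> float:
    """Neuron number at scale ratio r divided by its minimum (WTA model).

    Uses N proportional to r**dim / ln(r), minimized at r = e**(1/dim).
    """
    def cost(s: float) -> float:
        return s ** dim / math.log(s)

    return cost(r) / cost(math.exp(1.0 / dim))


def bisect_root(f, lo: float, hi: float, tol: float = 1e-10) -> float:
    """Bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def tolerance_interval(excess: float = 0.05, dim: int = 2):
    """Range of r for which the neuron number stays within `excess` of optimal."""
    r_opt = math.exp(1.0 / dim)

    def gap(r: float) -> float:
        return relative_cost(r, dim) - (1.0 + excess)

    lower = bisect_root(gap, 1.0 + 1e-6, r_opt)
    upper = bisect_root(gap, r_opt, 10.0)
    return lower, upper


lower, upper = tolerance_interval(0.05, dim=2)
print(f"2D WTA 5% interval: ({lower:.2f}, {upper:.2f})")
```

Under these assumptions the two-dimensional interval should land close to the (1.43, 1.96) range quoted above for the winner-take-all model; changing `dim` to 1 gives the corresponding one-dimensional basin around r = e.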
In summary, for the two-dimensional case, the theory predicts that (1) the ratio between adjacent scales should be constant; (2) the optimal scaling constant is $\sqrt{e}\approx 1.65$ in a simple winner-take-all (WTA) decoding model, and ≈1.44 in a probabilistic decoding model; (3) the predicted optimal grid field width depends on the specific decoding method; and (4) the grid lattice should be triangular.
Comparison to experiment
Our predictions agree with experiment (Barry et al., 2007; Giocomo et al., 2011a; Stensola et al., 2012) (see Reanalysis of grid data from previous studies, ‘Materials and methods’ for details of the data reanalysis). Specifically, Barry et al. (2007) (Figure 4A) reported the grid periodicities measured at three locations along the dorso–ventral axis of the MEC in rats and found ratios of ∼1, ∼1.7 and ∼2.5 ≈ 1.6 × 1.6 relative to the smallest period (Barry et al., 2007). The ratios of adjacent scales reported in Barry et al. (2007) had a mean of 1.64 ± 0.09 (mean ± std. dev., n = 6), which almost precisely matches the mean scale factor of $\sqrt{e}$ predicted from the winner-take-all decoding model, and is also consistent with the probabilistic decoding model. In another study (Krupic et al., 2012), the scale ratio between the two smaller grid scales, measured by the ratio between the grid frequencies, is reported to be ∼1.57 in one animal. Recent analysis based on a larger data set (Stensola et al., 2012) confirms the geometric progression of the grid scales in individual animals over four modules. The mean ratio between adjacent scales is 1.42 ± 0.17 (mean ± std. dev., n = 24) in that data set, accompanied by modest variability within and between animals. These measurements again match both our models (Figure 4A).
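As a rough illustration of this comparison, the sketch below checks whether each reported scale ratio, taken as mean ± one standard deviation, overlaps the 5% tolerance intervals quoted in the text. The overlap criterion is an assumption chosen for illustration only; it is not the statistical reanalysis described in ‘Materials and methods’.

```python
# 5% tolerance intervals for the two-dimensional scale ratio, as quoted in the text.
INTERVALS = {
    "winner-take-all": (1.43, 1.96),
    "probabilistic": (1.28, 1.66),
}

# (study, mean ratio, std. dev.); a std. dev. of 0.0 marks a single reported value.
MEASUREMENTS = [
    ("Barry et al. (2007)", 1.64, 0.09),
    ("Krupic et al. (2012)", 1.57, 0.0),
    ("Stensola et al. (2012)", 1.42, 0.17),
]

def overlaps(mean: float, sd: float, interval: tuple) -> bool:
    """Illustrative criterion: does [mean - sd, mean + sd] intersect the interval?"""
    lo, hi = interval
    return mean + sd >= lo and mean - sd <= hi

report = {
    study: {model: overlaps(mean, sd, iv) for model, iv in INTERVALS.items()}
    for study, mean, sd in MEASUREMENTS
}

for study, result in report.items():
    print(study, result)
```

With this loose criterion every listed measurement is compatible with both decoding models.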
The optimal grid was triangular in both of our models, again matching measurements (Figure 4C) (Hafting et al., 2005; Moser et al., 2008; Stensola et al., 2012). However, the minimum in Figure 3D is relatively shallow: the contour lines indicating equally efficient grids are widely spaced near the minimum. This leads us to hypothesize that the measured grid geometries will be modestly variable around the triangular lattice, as reported in Stensola et al. (2012).
A recent study measured the ratio between grid periodicity and grid field size to be 1.63 ± 0.035 (mean ± S.E.M., n = 48) in wild-type mice (Giocomo et al., 2011a). This ratio was unchanged, 1.66 ± 0.03 (mean ± S.E.M., n = 86), in HCN1 knockout strains whose absolute grid periodicities increased relative to the wild type (Giocomo et al., 2011a). Such measurements are consistent with the prediction of the simple winner-take-all model, which predicts a ratio between grid period and grid field width of ${\lambda}_{i}/{l}_{i}=\sqrt{e}\approx 1.65$ (Figure 4B).
Discussion
We have shown that a grid system with a discrete set of periodicities, as found in the entorhinal cortex, should use a common scale factor r between modules to represent spatial location with the fewest neurons. In other words, the periods of grid modules should be organized in a geometric progression. In one dimension, this organization may be thought of intuitively as implementing a neural analog of a base-b number system. Roughly, the largest scale localizes the animal into a coarse region of the environment and finer scales successively subdivide the region into b ‘bins’. For example, suppose that the largest scale has one firing field in the environment and that b = 2, so that subsequent scales subdivide this firing field into halves (Figure 1B). Then, keeping track of which half the animal occupies at each scale gives a binary encoding of location. This is just like a binary number system being used to encode a number representing the location. Our problem of minimizing neuron number while fixing resolution is analogous to minimizing the product of the number of digits and the number of decimal places (which we can term complexity) needed to represent a given range R of integers in a base-b number system. The complexity is approximately C ∼ b log_{b} R. What ‘base’ minimizes the complexity of the representation? We can compute this by evaluating the extremum $\partial C/\partial b=0$ and find that the optimum is at b = e (details in Optimizing a ‘base-b’ representation of one-dimensional space, ‘Materials and methods’). Our full theory is a generalization of this simple fixed-base representational scheme for numbers to noisy neurons encoding two-dimensional location. It is remarkable that natural selection seems to have reached such efficient solutions for encoding location.
Our theory quantitatively predicted the ratios of adjacent scales within the variability tolerated by the models and by the data (Figure 4). Further tests of our theory are possible. For example, a direct generalization of our reasoning says that in n dimensions the optimal ratio between grid scales for winner-take-all decoding is $\sqrt[n]{e}$ (as compared to $\sqrt{e}$ in two dimensions). The three-dimensional case is possibly relevant to the grid system in, for example, bats (Yartsev et al., 2011; Yartsev and Ulanovsky, 2013). Robustly, for any given decoding scheme, our theory would predict a smaller scaling ratio for 3d grids than for 2d grids. The packing density argument given above for two-dimensional lattice structure, when generalized to three dimensions, would predict a face-centered cubic lattice or hexagonal close packing, which share the highest packing density. Bats are known to have 2d grids when crawling on surfaces (Yartsev et al., 2011) and if they also have a 3d grid system when flying, similar to their place cell system (Yartsev and Ulanovsky, 2013), our predictions for three-dimensional grids can be directly tested. In general, the theory can be tested by comprehensive population recordings of grid cells along the dorso–ventral axis for animals moving in one-, two-, and three-dimensional environments.
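The dimensional dependence of the winner-take-all optimum can be tabulated directly. The sketch below evaluates $\sqrt[n]{e}$ for n = 1, 2, 3 and cross-checks each value with a coarse grid search, assuming (as an illustration, not a statement of the full derivation) that the neuron number in n dimensions is proportional to r^n / ln(r^n).

```python
import math

def optimal_wta_ratio(n_dim: int) -> float:
    """Closed-form winner-take-all optimum, the n-th root of e."""
    return math.exp(1.0 / n_dim)

def numeric_optimum(n_dim: int, r_min: float = 1.01, r_max: float = 4.0,
                    steps: int = 30_000) -> float:
    """Grid search over r, ASSUMING a cost proportional to r**n / ln(r**n)."""
    def cost(r: float) -> float:
        return r ** n_dim / (n_dim * math.log(r))
    grid = (r_min + (r_max - r_min) * k / steps for k in range(steps + 1))
    return min(grid, key=cost)

ratios = {n: optimal_wta_ratio(n) for n in (1, 2, 3)}
checks = {n: numeric_optimum(n) for n in (1, 2, 3)}

for n in (1, 2, 3):
    print(f"{n}D: closed form {ratios[n]:.4f}, grid search {checks[n]:.4f}")
```

The optimum falls from about 2.72 in one dimension to about 1.65 in two and about 1.40 in three, consistent with the prediction of a smaller scaling ratio for 3d grids than for 2d grids.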
Our theory also predicts a logarithmic relationship between the natural behavioral range and the number of grid modules. We can estimate the number of modules, m, required for a given resolution R_{2} via the approximate relationship $m=\log R_{2}/\log \tilde{r}^{2}$. Assuming that the animal must be able to represent an environment of area ∼(10 m)^{2} (e.g., Davis et al., 1948), with a positional accuracy on the scale of the rat's body size, ∼(10 cm)^{2}, we get a resolution of R_{2} ∼ 10^{4}. Together with the predicted two-dimensional scale factor $\tilde{r}$, this gives m ≈ 10 as an order-of-magnitude estimate. Indeed, in Stensola et al. (2012), 4–5 modules were discovered in recordings spanning up to 50% of the dorsoventral extent of MEC; extrapolation gives a total module number consistent with our estimate.
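The arithmetic behind this estimate can be laid out as a short sketch. It uses only the numbers quoted above (a 10 m range, 10 cm accuracy, and the two predicted scale factors); the function and variable names are illustrative.

```python
import math

def modules_needed(resolution_2d: float, scale_ratio: float) -> float:
    """m = log(R_2) / log(r**2), the approximate relationship quoted in the text."""
    return math.log(resolution_2d) / math.log(scale_ratio ** 2)

range_m = 10.0      # linear extent of the environment, ~10 m
accuracy_m = 0.10   # positional accuracy, ~10 cm (rat body size)
R2 = (range_m / accuracy_m) ** 2   # two-dimensional resolution, ~1e4

estimates = {
    "probabilistic (r = 1.44)": modules_needed(R2, 1.44),
    "winner-take-all (r = sqrt(e))": modules_needed(R2, math.sqrt(math.e)),
}

for label, m in estimates.items():
    print(f"{label}: m = {m:.1f}")
```

Both scale factors give between roughly 9 and 13 modules, which is the sense in which m ≈ 10 is an order-of-magnitude estimate.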
How many grid cells do we predict in total? Consider the simplest case where grid cells are independent encoders of position in two dimensions. Our likelihood analysis (details in Optimizing the grid system: probabilistic decoder, ‘Materials and methods’) gives the number of neurons as N = mc(λ/σ)^{2}, where m is the number of modules and c is constant. In detail, c is determined by factors like the tuning curve shape of individual neurons and their firing rates, but broadly what matters is the typical number of spikes K that a neuron emits during a sampling time, because this will control the precision with which location can be inferred from a single cell's response. General considerations (Dayan and Abbott, 2001) indicate that c will be proportional to 1/K. We can estimate that if a rat runs at ∼50 cm/s and covers ∼1 cm in a sampling time, then a grid cell firing at 10 Hz (Stensola et al., 2012) gives K ∼ 1/5. Using our prediction that the number of modules will be ∼10 and that λ/σ ≈ 5.3 in the optimal grid (see Optimizing the grid system: probabilistic decoder, ‘Materials and methods’), we get N_{est} ≈ 1400. This estimate assumed independent neurons and that the decoder of the grid system will efficiently use all the information in every grid cell's response. This is unlikely to be the case. Given homogeneous noise correlations within a grid module, which will arise naturally if grid cells are formed by an attractor mechanism, the required number of neurons could be an order of magnitude higher (Sompolinsky et al., 2001; Averbeck et al., 2006). (Noise correlation between grid cells was investigated in Mathis et al. (2013); Dunn et al. (2015)—they found positive correlation between aligned grids of similar periods and some evidence for weak negative correlation for grids differing in phase.) Thus, in round numbers, we estimate that our theory requires something in the range of ∼1400–14000 grid cells.
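The estimate N_est ≈ 1400 can be reproduced with a few lines of arithmetic. The sketch below assumes the proportionality constant between c and 1/K is exactly one (the text states only that c is proportional to 1/K), and treats the order-of-magnitude increase for correlated noise as a simple factor of ten; all names are illustrative.

```python
# Quantities quoted in the text.
speed_cm_per_s = 50.0     # running speed of the rat
step_cm = 1.0             # distance covered in one sampling time
firing_rate_hz = 10.0     # grid cell firing rate
n_modules = 10            # predicted number of modules
lambda_over_sigma = 5.3   # optimal period / posterior width in two dimensions

sampling_time_s = step_cm / speed_cm_per_s          # ~0.02 s
spikes_per_sample = firing_rate_hz * sampling_time_s  # K ~ 1/5

# ASSUMPTION: c = 1/K exactly; the text says only that c is proportional to 1/K.
c = 1.0 / spikes_per_sample

n_independent = n_modules * c * lambda_over_sigma ** 2   # N = m c (lambda/sigma)^2
n_correlated = 10.0 * n_independent   # ASSUMED factor of ten for noise correlations

print(f"K = {spikes_per_sample:.2f}")
print(f"N (independent neurons) ~ {n_independent:.0f}")
print(f"N (correlated noise)    ~ {n_correlated:.0f}")
```

This gives roughly 1400 cells for independent neurons and roughly 14000 with the assumed tenfold penalty, matching the quoted range.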
Are there so many grid cells in the MEC? In fact, we need this number of grid cells separately in layer II and layer III of the MEC since these regions likely maintain separate grid codes. (To see this, recall that layers II and III project largely to the dentate gyrus and CA1, respectively [Steward and Scoville, 1976; Dolorfo and Amaral, 1998], while the place map in CA1 survives lesions of the dentate input to CA1 via CA3 [Brun et al., 2002].) Physiological studies (Sargolini et al., 2006) have shown that only about 10% of the cells in MEC are layer II grid cells and another 10% are layer III grid cells. Cells that have weak responsiveness during spatial tasks are probably undersampled in such experiments and so the real proportion of grid cells is likely to be somewhat smaller. Other studies (Mulders et al., 1997) have shown that MEC has ∼10^{5} neurons. Thus, we can estimate that layer II and layer III each contain something in the range of 5000–10000 grid cells. This is well within the predicted theoretical range.
Our analysis assumed that the grid code is hierarchical, with large grids resolving the spatial ambiguity created by the multiple firing fields of the small grids that deliver precision of location. Recall that place cells are thought to provide one readout of the grid system. Anatomical evidence (Van Strien et al., 2009) shows that the projections from the mEC to the hippocampus are restricted along the dorsoventral axis, so that a given place cell receives input from perhaps a quarter of the mEC. The data of Stensola et al. (2012) show additionally that the dorsal mEC is impoverished in large grid modules. If place cells were formed from grids via summation as in the model of (Solstad et al., 2006), the anatomy (Van Strien et al., 2009) and the hierarchical view of location coding that we have proposed would together predict that dorsal place cells should be revealed to have multiple place fields in large environments because their spatial ambiguities will not be fully resolved at larger scales. Preliminary evidence for such a multiplicity of dorsal place fields appears in Fenton et al. (2008); Rich et al. (2014). However, a naive model where place cells are sums of grid cells would also suggest that the multiple place fields would be arranged in an orderly, possibly periodic, manner. To the contrary, the data (Fenton et al., 2008; Rich et al., 2014) show that the multiple place fields of dorsal hippocampal cells are organized in a disorderly fashion. On the other hand, real grid fields show significant variability in period, orientation, and ellipticity even within a module (Stensola et al., 2012)—this variability would disorder any linearly summed place fields, changing the prediction of the naive model. 
We have not attempted to investigate this in detail because there is also significant evidence (summarized in Bush et al., 2014; Sasaki et al., 2015) that place cells are not formed and maintained via simple summation of grid cells alone, although they are influenced by them. It would be interesting for future work to integrate the accumulating information about the complex interplay between the hippocampus and the mEC to better understand the consequences of hierarchical grid organization for the hippocampal place system.
We assumed that the largest scales of grid modules should be roughly comparable to the behavioral range of the animal. This is consistent with the existing data on grid modules (Stensola et al., 2012) and with measurements in the largest environments tested so far (Brun et al., 2008) (periods at least as large as 10 m in an 18 m track). To accommodate very large environments, grids could either increase their scale (as reported at least transiently in Barry et al., 2007; Stensola et al., 2012) or could segment the environment into large sections (Derdikman et al., 2009; Derdikman and Moser, 2010) across which remapping occurs (Fyhn et al., 2007). These predictions can be tested in detail by exploring spatial coding in natural environments of behaviorally appropriate size and complexity. In fact, ethological studies have indicated a typical homing range of a few tens of meters for rats with significant variation between strains (Davis et al., 1948; Fitch, 1948; Stickel and Stickel, 1949; Slade and Swihart, 1983; Braun, 1985). Our theory predicts that the period of the largest grid module and the number of modules will be correlated with homing range.
In our theory, we took the coverage factor d (the number of grid fields overlapping a given point in space) to be the same for each module. In fact, experimental measurements have not yet established whether this parameter is constant or varies between modules. How would a varying d affect our results? The answer depends on the dimensionality of the grid. In two dimensions, if neurons have weakly correlated noise, modular variation of the coverage factor does not affect the optimal grid at all. This is because the coverage factor cancels out of all relevant formulae, a coincidence of two dimensions (see Optimizing the grid system: probabilistic decoder, ‘Materials and methods’, and p. 112 of Dayan and Abbott, 2001). In one and three dimensions, variation of d between modules will have an effect on the optimal ratios between the variable modules. Thus, if the coverage factor is found to vary between grid modules for animals navigating one and three dimensions, our theory can be tested by comparing its predictions for the corresponding variations in grid scale factors. Similarly, even in two dimensions, if noise is correlated between grid cells, then variability in d can affect our predicted scale factor. This provides another avenue for testing our theory.
The simple winner-take-all model assuming compact grid fields predicted a ratio of field width to grid period that matched measurements in both wild-type and HCN1 knockout mice (Giocomo et al., 2011a). Since the predicted grid field width is model dependent, the match with the simple WTA prediction might be providing a hint concerning the method the brain uses to read the grid code. Additional data on this ratio parameter drawn from multiple grid modules may serve to distinguish and select between potential decoding models for the grid system. The probabilistic model did not make a direct prediction about grid field width; it instead worked with the standard deviation σ_{i} of the posterior P(x|i). This parameter is predicted to be σ_{i} = 0.19λ_{i} in two dimensions (see Optimizing the grid system: probabilistic decoder, ‘Materials and methods’). This prediction could be tested behaviorally by comparing discrimination thresholds for location to the period of the smallest module. The standard deviation σ_{i} can also be related to the noise, neural density and tuning curve shape in each module (Dayan and Abbott, 2001).
Previous work by Fiete et al. (2008) proposed that the grid system is organized to represent very large ranges in space by exploiting the incommensurability (i.e., lack of common rational factors) of different grid periods. As originally proposed, the grid scales in this scheme were not hierarchically organized (as we now know they are; Stensola et al., 2012) but were of similar magnitude, and hence it was particularly important to suggest a scheme where a large spatial range could be represented using grids with small and similar periods. Using all the scales together, Fiete et al. (2008) argued that it is easy to generate ranges of representation that are much larger than necessary for behavior, and Sreenivasan and Fiete argued that the excess capacity could be used for error correction over distances relevant for behavior (Sreenivasan and Fiete, 2011). However, recent experiments tell us that there is a hierarchy of scales (Stensola et al., 2012), which should make the representation of a behaviorally plausible range of 20–100 m easily accessible in the alternative hierarchical coding scheme that we have proposed. Nevertheless, we have checked that a grid coding scheme with the optimal scale ratio predicted by our theory can represent space over ranges larger than the largest grid period (‘Range of location coding in a grid system’, Appendix 1). However, to achieve this larger range, the number of neurons in each module will have to increase relative to the minimum in order to shrink the widths of the peaks in the likelihood function over position. It could be that animals sometimes exploit this excess capacity either for error correction or to avoid remapping over a range larger than the period of the largest grid. That said, experiments do tell us that remapping occurs readily over relatively small (meter length) scales at least for dorsal (small scale) place cells and grid cells (Fyhn et al., 2007) in tasks that involve spatial cues.
Our hierarchical grid scheme makes distinctive predictions relative to a non-hierarchical model for the effects of selective lesions of grid modules in the context of specific models where grid cells sum to make place cells (details in ‘Predictions for the effects of lesions and for place cell activity’, Appendix 1). In such a simple grid to place cell transformation, lesioning the modules with small periods will expand place field widths, while lesioning modules with large periods will lead to increased firing at locations outside the main place field, at scales set by the missing module. Similar effects are predicted for any simple decoder of a lesioned hierarchical grid system that has no other location related inputs; that is, animals with lesions to fine grid modules will show less precision in spatial behavior, while animals with lesions to large grid modules will confound well-separated locations. In contrast, in a non-hierarchical grid scheme with similar but incommensurate periods, lesions of any module lead to the appearance of multiple place fields at many scales for each place cell. Recent studies which ablated a large fraction of the mEC at all depths showed an increase in place field widths (Hales et al., 2014), as did the more focal lesions of Ormond and McNaughton (2015) along the dorso–ventral axis of the mEC. However, there are multiple challenges in interpreting these experiments. First, the data of Stensola et al. (2012) show that there are modules with both small and large periods at every depth along the mEC; the ventral mEC is simply enriched in modules with large periods. So Hales et al. (2014) and Ormond and McNaughton (2015) are both removing modules that have both small and large periods. A simple linear transformation from a hierarchical grid to place cells would predict that removing large periods increases the number of place fields, but Hales et al. (2014) did not look for this effect, while in Ormond and McNaughton (2015) the reported number of place fields decreases after lesions (including complete disruption of place fields of some cells). The underlying difficulty in interpretation is that while place cells might be summing up grid cells, there is evidence that they can be formed and maintained through mechanisms that may not critically involve the mEC at all (Bush et al., 2014; Sasaki et al., 2015). Thus, despite the interpretation given in Kubie and Fox (2015) and Ormond and McNaughton (2015) in favor of the partial validity of a linearly summed grid to place model, it is difficult for theory to make a definitive prediction for experiments until the interrelation of the mEC and hippocampus is better understood.
Mathis et al. (2012a) and Mathis et al. (2012b) studied the resolution and representational capacity of grid codes vs place codes. They found that grid codes have exponentially greater capacity to represent locations than place codes with the same number of neurons. Furthermore, Mathis et al. (2012a) predicted that in one dimension a geometric progression of grids that is self-similar at each scale minimizes the asymptotic error in recovering an animal's location given a fixed number of neurons. To arrive at these results the authors formulated a population coding model where independent Poisson neurons have periodic one-dimensional tuning curves. The responses of these model neurons were used to construct a maximum likelihood estimator of position, whose asymptotic estimation error was bounded in terms of the Fisher information; thus the resolution of the grid was defined in terms of the Fisher information of the neural population (which can, however, dramatically overestimate coding precision for neurons with multimodal tuning curves [Bethge et al., 2002]). Specializing to a grid system organized in a fixed number of modules, Mathis et al. (2012a) found an expression for the Fisher information that depended on the periods, populations, and tuning curve shapes in each module. Finally, the authors imposed a constraint that the scale ratio had to exceed some fixed value determined by a ‘safety factor’ (dependent on tuning curve shape and neural variability), in order to reduce ambiguity in decoding position. With this formulation and assumptions, optimizing the Fisher information predicts geometric scaling of the grid in a regime where the scale factor is sufficiently large. The Fisher information approximation to position error in Mathis et al. (2012a) is only valid over a certain range of parameters.
An ambiguity-avoidance constraint keeps the analysis within this range, but introduces two challenges for an optimization procedure: (i) the optimum depends on the details of the constraint, which was somewhat arbitrarily chosen and was dependent on the variability and tuning curve shape of grid cells, and (ii) the optimum turns out to saturate the constraint, so that for some choices of constraint the procedure is pushed right to the edge of where the Fisher information is a valid approximation at all, causing difficulties for the self-consistency of the procedure.
Because of these limits on the Fisher information approximation, Mathis et al. (2012a) also measured decoding error directly through numerical studies. But here a complete optimization was not possible because there are too many interrelated parameters, a limitation of any numerical work. The authors then analyzed the dependence of the decoding error on the grid scale factor and found that, in their theory, the optimal scale factor depends on ‘the number of neurons per module and peak firing rate’ and, relatedly, on the ‘tolerable level of error’ during decoding (Mathis et al., 2012a). Note that decoding error was also studied in Towse et al. (2014) and those authors reported that the results did not depend strongly on the precise organization of scales across modules.
In contrast to Mathis et al. (2012a), we estimated decoding error directly by working with approximated forms of the likelihood function over position rather than by approximating decoding error in terms of the Fisher information. Conceptually, we can think of the winner-take-all analysis as effectively approximating the likelihood in terms of periodic boxcar functions; for the probabilistic analysis, we treat the likelihood as a periodic sum-of-Gaussians. Since at least scores of cells are being combined within modules, the Gaussian approximation to local likelihood peaks is valid, allowing us to circumvent detailed analysis of tuning curves and variability of individual neurons. These approximations allow analytical treatment of the optimization problem over a much wider parameter range without requiring arbitrary hand-imposed constraints. Our formulation of grid resolution then simply estimates the number of distinct regions that a fixed range can be divided into. We then fix this resolution as being behaviorally determined and minimize the number of required neurons while allowing the periods of the modules, and, crucially, the number of modules, to vary to achieve the minimum.
All told, our simpler, and more intuitive, formulation of grid coding embodies very general considerations trading off precision and ambiguity with a sufficiently dense population of grid cells. The simplicity and generality of our setting allows us to make predictions for structural parameters of the grid system in different dimensions. These predictions—scaling ratios in 1, 2, and 3 dimensions; the ratio of grid period to grid field width; the number of expected modules; the shape of the optimal grid lattice; an estimate of the total expected number of grid cells—can be directly tested in experiments.
There is a long history in the study of sensory coding, especially vision, of identifying efficiency principles underlying neural circuits and codes starting with Barlow (1961). Our results constitute evidence that such principles might also operate in the organization of cognitive circuits processing nonsensory variables. Furthermore, the existence of an efficiency argument for grid organization of spatial coding suggests that grid systems may be universal amongst the vertebrates, and not just a rodent specialization. In fact, there is evidence that humans (Doeller et al., 2010; Jacobs et al., 2013) and other primates (Killian et al., 2012) also have grid systems. We expect that our predicted scaling of the grid modules also holds in humans and other primates.
Materials and methods
Optimizing a ‘base-b’ representation of one-dimensional space
Suppose that we want to resolve location with a precision l in a track of length L. In terms of the resolution R = L/l, we argued in the ‘Discussion’ that a ‘base-b’ hierarchical neural coding scheme will roughly require N = b log_{b} R neurons. To derive the optimal base (i.e., the base that minimizes the number of neurons), we evaluate the extremum $\partial N/\partial b=0$:
$\frac{\partial N}{\partial b}=\frac{\partial}{\partial b}\left(\frac{b\,\ln R}{\ln b}\right)=\ln R\,\frac{\ln b-1}{(\ln b)^{2}}.$
Setting $\partial N/\partial b=0$ gives ln b − 1 = 0. Therefore, the number of neurons is extremized when b = e. It is easy to check that this is a minimum. Of course, the base of a number system is usually taken to be an integer, so the argument should be taken as motivating the more detailed treatment of neural representations of space above. Neurons are of course not constrained to organize the periodicity of their tuning curves in integer ratios.
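A quick numerical check of this extremum is sketched below. It evaluates N = b log_b R on a fine grid of real-valued bases and also compares the integer bases 2, 3, and 4. The resolution R = 10^4 is a hypothetical value chosen for illustration; the location of the minimum does not depend on it.

```python
import math

def neurons_needed(base: float, resolution: float) -> float:
    """N = b * log_b(R), the rough neuron count of a base-b hierarchical code."""
    return base * math.log(resolution) / math.log(base)

R = 1e4  # hypothetical resolution, for illustration only

# Real-valued bases from 1.5 to 5.0 in steps of 0.001.
bases = [1.5 + 0.001 * k for k in range(3501)]
best_base = min(bases, key=lambda b: neurons_needed(b, R))

integer_costs = {b: neurons_needed(b, R) for b in (2, 3, 4)}

print(f"best real-valued base ~ {best_base:.3f} (e = {math.e:.3f})")
for b, cost in integer_costs.items():
    print(f"base {b}: N ~ {cost:.2f}")
```

Among integer bases, 3 is the cheapest because it lies closest to e, while bases 2 and 4 cost the same since 4/ln 4 = 2/ln 2.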
Optimizing the grid system: winner-take-all decoder
Deriving the optimal grid
We have seen that, for a winner-take-all decoder, the problem of deriving the optimal ratios of adjacent grid scales in one dimension is equivalent to minimizing the sum of a set of numbers ($N=d\sum_{i=1}^{m} r_{i}$) while fixing the product ($R_{1}=\prod_{i=1}^{m} r_{i}$) to take the value R. Mathematically, it is equivalent to minimize N while fixing ln R_{1}. When N is large, we can treat it as a continuous variable and use the method of Lagrange multipliers as follows. First, we construct the auxiliary function H(r_{1}⋯r_{m}, β) = N − β (ln R_{1} − ln R) and then extremize H with respect to each r_{i} and β. Extremizing with respect to r_{i} gives
$\frac{\partial H}{\partial r_{i}}=d-\frac{\beta}{r_{i}}=0\;\Rightarrow\; r_{i}=\frac{\beta}{d}\equiv r.$
Next, extremizing with respect to β to implement the constraint on the resolution gives
$\frac{\partial H}{\partial \beta}=\ln R-m\ln r=0\;\Rightarrow\; r=R^{1/m}.$
Having thus implemented the constraint that ln R_{1} = ln R, it follows that H = N = dmR^{1/m}. Alternatively, solving for m in terms of r, we can write H = d r (ln R/ln r) = d r log_{r} R. It remains to minimize the number of cells N with respect to r,
$\frac{\partial N}{\partial r}=d\,\ln R\,\frac{\ln r-1}{(\ln r)^{2}}=0.$
This in turn implies our result
$r=e$
for the optimal ratio between adjacent scales in a hierarchical grid coding scheme for position in one dimension, using a winner-take-all decoder. In this argument, we employed the sleight of hand that N and m can be treated as continuous variables, which is approximately valid when N is large. This condition obtains if the required resolution R is large. A more careful argument is given below that preserves the integer character of N and m.
Integer N and m
Above we used Lagrange multipliers to enforce the constraint on resolution and to bound the scale ratios to avoid ambiguity while minimizing the number of neurons required by a winner-take-all decoding model of grid systems. Here, we will carry out this minimization while recognizing that the number of neurons is an integer. First, consider the arithmetic mean–geometric mean inequality which states that, for a set of nonnegative real numbers, x_{1}, x_{2},…, x_{m}, the following holds:
$\frac{1}{m}\sum_{i=1}^{m} x_{i}\ge \left(\prod_{i=1}^{m} x_{i}\right)^{1/m},$
with equality if and only if all the x_{i}'s are equal. Applying this inequality, it is easy to see that to minimize $\sum_{i=1}^{m} r_{i}$, all of the r_{i} should be equal. We denote this common value as r, and we can write r = R^{1/m}.
Therefore, we have
$N=d\,m\,r=d\,m\,R^{1/m}.$
Suppose R = e^{z + ϵ}, where z is an integer, and ϵ ∈ [0, 1). By taking the first derivative of N with respect to m, and setting it to zero, we find that N is minimized when m = z + ϵ. However, since m is an integer the minimum will be achieved either at m = z or m = z + 1. (Here, we used the fact that mR^{1/m} is monotonically decreasing between 0 and z + ϵ and is monotonically increasing between z + ϵ and ∞.) Thus, minimizing N requires either
$r=R^{1/z}=e^{(z+\epsilon)/z}\quad\text{or}\quad r=R^{1/(z+1)}=e^{(z+\epsilon)/(z+1)}.$
In either case, when z is large (and therefore R, N and m are large), r → e. This shows that when the resolution R is sufficiently large, the total number of neurons N is minimized when r_{i} ≈ e for all i.
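The integer argument can be illustrated numerically. The sketch below searches over integer module numbers m for the one that minimizes N = d m R^{1/m}, then reports the implied common ratio r = R^{1/m}. The resolutions tested and the choice d = 1 are hypothetical values used only for illustration.

```python
import math

def neurons(m: int, R: float, d: float = 1.0) -> float:
    """N = d * m * R**(1/m) for m modules sharing a common scale ratio."""
    return d * m * R ** (1.0 / m)

def best_integer_modules(R: float, m_max: int = 200) -> int:
    """Integer m in [1, m_max] that minimizes the neuron number."""
    return min(range(1, m_max + 1), key=lambda m: neurons(m, R))

results = {}
for R in (1e2, 1e4, 1e8):   # hypothetical resolutions
    m_best = best_integer_modules(R)
    r_best = R ** (1.0 / m_best)
    results[R] = (m_best, r_best)
    print(f"R = {R:.0e}: ln R = {math.log(R):.2f}, "
          f"best m = {m_best}, r = {r_best:.3f}")
```

In each case the best integer m is one of the two integers bracketing ln R, and the implied ratio moves closer to e as R grows.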
Optimal winner-take-all grids: general formulation
As described above, we wish to choose the grid system parameters {λ_{i}, l_{i}}, 1 ≤ i ≤ m, as well as the number of scales m, to minimize neuron number:
$N=d\sum_{i=1}^{m}\frac{\lambda_{i}}{l_{i}},$
where d is the fixed coverage factor in each module, while constraining the positional accuracy of the grid system and the range of representation. We can take the positional accuracy to be proportional to the grid field width of the smallest module. This gives
To give a sufficiently large range of representation in our hierarchical scheme we will require that
Following the main text, to eliminate ambiguity at each scale we need that
where c_{2} depends on the tuning curve shape and coverage factor (written as f(d) above).
We will first fix m and solve for the remaining parameters, then optimize over m in a subsequent step. Optimization problems subject to inequality constraints may be solved by the method of Karush–Kuhn–Tucker (KKT) conditions (Kuhn and Tucker, 1951). We first form the Lagrange function,
The KKT conditions include that the gradient of $\mathrm{\mathcal{L}}$ with respect to {λ_{i},...,l_{i}} vanish,
together with the ‘complementary slackness’ conditions,
From Equations 15, 16, we obtain:
It follows that β_{i} ≠ 0, and so the complementary slackness conditions give:
Substituting this result into Equation 19 yields,
that is, the scale factor r is the same for all modules. Once we obtain a value for r, Equations 20–22 yield values for all λ_{i} and l_{i}. Since the resolution constraint may now be rewritten,
we have m = ln (c_{1}L/A)/lnr. Therefore, r determines m and so minimizing N over m is equivalent to minimizing over r. Expressing N entirely in terms of r gives,
Optimizing with respect to r gives the result r = e, independent of d, c_{1}, c_{2}, L, and R.
Optimizing the grid system: probabilistic decoder
Consider a probabilistic decoder of the grid system that pools all the information available in the population of neurons in each module by forming the posterior distribution over position given the neural activity. In this general setting, we assume that the firing of different grid cells is weakly correlated, that noise is homogeneous, and that the tuning curves in each module i provide dense, uniform, coverage of the interval λ_{i}. With these assumptions, we will first consider the one-dimensional case, and then analyze the two-dimensional case by analogy.
Onedimensional grids
With the above assumptions, the likelihood of the animal's position, given the activity of grid cells in module i, P(x|i), can be approximated as a series of Gaussian bumps of standard deviation σ_{i} spaced at the period λ_{i} (Dayan and Abbott, 2001). As defined in 'Results', the number of cells (n_{i}) in the ith module is expressed in terms of the period (λ_{i}), the grid field width (l_{i}) and a ‘coverage factor’ d representing the cell density as n_{i} = dλ_{i}/l_{i}. The coverage factor d controls the relation between the grid field width l_{i} and the standard deviation σ_{i} of the local peaks in the likelihood function of location. If d is larger, σ_{i} will be narrower since we can accumulate evidence from a denser population of neurons. The ratio $l_{i}/\sigma_{i}$ will in general be a monotonic function of the coverage factor d, which we write as $l_{i}/\sigma_{i} = g(d)$. In the special case where the grid cells have independent noise, $g(d) \propto \sqrt{d}$, so that $\sigma_{i}/l_{i} \propto 1/\sqrt{d}$. That is, the uncertainty σ_{i} decreases as the inverse square root of the cell density, as expected because the relevant parameter is the number of cells within one grid field rather than the total number of cells. Note that this does not imply an inverse square root relation between the number of cells n_{i} and σ_{i}, because n_{i} is also proportional to the period λ_{i}, and in our formulation the density d is fixed while λ_{i} can be varied. Note also that if the neurons have correlated noise, g(d) may scale substantially slower than $\sqrt{d}$ (Britten et al., 1992; Zohary et al., 1994; Sompolinsky et al., 2001). Putting all of these statements together, we have, in general, $n_{i} = \frac{d}{g(d)} \frac{\lambda_{i}}{\sigma_{i}}$.
Assuming that the coverage factor d is the same across modules, we can simplify the notation and write $n_{i} = c\,\frac{\lambda_{i}}{\sigma_{i}}$, where c = d/g(d) is a constant. (Again, for independent noise σ_{i} ∝ 1/√d as expected (see above), and this does not imply a similar relationship to the number of cells n_{i} as one might have naively assumed.) In sum, we can write the total number of cells in a grid system with m modules as $N = \sum_{i=1}^{m} n_{i} = c \sum_{i=1}^{m} \frac{\lambda_{i}}{\sigma_{i}}$.
The likelihood of position derived from each module can be combined to give an overall probability distribution over location. Let Q_{i}(x) be the likelihood obtained by combining modules 1 (the largest period) through i. Assuming that the different modules have independent noise, we can compute Q_{i}(x) from the module likelihoods as $Q_{i}(x) \propto \prod_{j=1}^{i} P(x \mid j)$. We will take the prior probability over locations to be uniform here so that this combined likelihood is equivalent to the Bayesian posterior distribution over location. The likelihoods from different scales have different periodicities, so multiplying them together will tend to suppress all peaks except the central one, which is aligned across scales. We may thus approximate each Q_{i}(x) by a single Gaussian whose standard deviation we denote δ_{i}. (The validity of this approximation is taken up in further detail below.)
Since Q_{i}(x) ∝ Q_{i−1}(x)P(x|i), δ_{i} is determined by δ_{i−1}, λ_{i} and σ_{i}. These all have dimensions of length. Dimensional analysis (Rayleigh, 1896) therefore says that, without loss of generality, the ratio δ_{i}/δ_{i−1} can be written as a dimensionless function of any two cross-ratios of these parameters. It will prove useful to use this freedom to write $\delta_{i} = \delta_{i-1}/\rho\left(\frac{\lambda_{i}}{\sigma_{i}}, \frac{\sigma_{i}}{\delta_{i-1}}\right)$. The standard error in decoding the animal's position after combining information from all the grid modules will be proportional to δ_{m}, the standard deviation of Q_{m}. We can iterate our expression for δ_{i} in terms of δ_{i−1} to write $\delta_{m} = \delta_{0}/\prod_{i=1}^{m} \rho_{i}$, where δ_{0} is the uncertainty in location without using any grid responses at all. (We are abbreviating ρ_{i} = ρ(λ_{i}/σ_{i}, σ_{i}/δ_{i−1}).) In the present probabilistic context, we can view δ_{0} as the standard deviation of the a priori distribution over position before the grid system is consulted, but it will turn out that the precise value or meaning of δ_{0} is unimportant. We assume a behavioral requirement that fixes δ_{m} and thus the resolution of the grid, and that δ_{0} is likewise fixed by the behavioral range. Thus, there is a constraint on the product $\prod_{i} \rho_{i}$.
Putting everything together, we wish to minimize $N = c \sum_{i=1}^{m} \frac{\lambda_{i}}{\sigma_{i}}$ subject to the constraint that $R = \prod_{i=1}^{m} \rho_{i}$, where ρ_{i} is a function of λ_{i}/σ_{i} and σ_{i}/δ_{i−1}. Given the formula for ρ_{i} derived in the next section, this can be carried out numerically. To understand the optimum, it is helpful to observe that the problem has a symmetry under permutations of i. So we can guess that in the optimum all the λ_{i}/σ_{i}, σ_{i}/δ_{i−1} and ρ_{i} will be equal to a fixed λ/σ, σ/δ, and ρ. We can look for a solution with this symmetry and then check that it is an optimum. First, using the symmetry, we write N = cm(λ/σ) and R = ρ^{m}. It follows that m = ln R/ln ρ and hence N = c ln R (λ/σ)/ln ρ. Since c and R are fixed, we want to minimize (λ/σ)/ln ρ with respect to λ/σ and σ/δ. Now, ρ(λ/σ, σ/δ) is a complicated function of its arguments (Equation 30) which has a maximum value as a function of σ/δ for any fixed λ/σ. To minimize N at fixed λ/σ, we should maximize ρ with respect to σ/δ (Figure 5). Given this ρ_{max}, we can minimize N ∝ (λ/σ)/ln ρ_{max}(λ/σ) with respect to λ/σ, and then plug back in to find the optimal ρ. It turns out to be $\rho_{\max}^{*} = 2.3$.
In fact, ρ is equal to the scale factor of the grid: ρ_{i} = r_{i} = λ_{i}/λ_{i+1}. To see this, we have to express ρ_{i} in terms of the parameters λ_{i}/σ_{i} and σ_{i}/δ_{i−1}: $\rho_{i} = \frac{\delta_{i-1}}{\delta_{i}} = \frac{\delta_{i-1}}{\sigma_{i}} \frac{\sigma_{i}}{\lambda_{i}} \frac{\lambda_{i}}{\lambda_{i+1}} \frac{\lambda_{i+1}}{\sigma_{i+1}} \frac{\sigma_{i+1}}{\delta_{i}}$. Since the factors σ_{i}/δ_{i−1} and λ_{i}/σ_{i} are independent of i, they cancel in the product and we are left with ρ_{i} = λ_{i}/λ_{i+1}.
Thus, the probabilistic decoder predicts an optimal scale factor r^{*} = 2.3 in one dimension. This is similar to, but somewhat different from, the winner-take-all result r^{*} = e ≈ 2.7 (Figure 5). At a technical level, the difference arises because the function ρ_{max}(λ/σ) is effectively $\rho_{\max} \propto \frac{\lambda}{\sigma}$ in the winner-take-all analysis, but in the probabilistic case, it is more nearly a linear function with a positive offset, $\rho \approx \alpha^{-1}\left(\frac{\lambda}{\sigma} + \beta\right)$. Conceptually, the optimal probabilistic scale factor is smaller in order to suppress side lobes that can arise in the combined likelihood across modules (Figure 2). Such side lobes were absent in the winner-take-all analysis. The optimization also predicts λ^{*} = 9.1σ. This relation between the period and standard deviation at each scale could be converted into a relation between grid period and grid field width given specific measurements of tuning curves, noise levels, and cell density in each module. For example, if neurons within a module have independent noise, then general population coding considerations (Dayan and Abbott, 2001) show that σ = βd^{−1/2}l, where l is a measure of grid field width, d is the density of neurons in a module, and β is a dimensionless number that depends on noise (given the integration time) and tuning curve shape.
Two-dimensional grids
A similar probabilistic analysis can be carried out for two-dimensional grid fields. The posteriors P(x|i) become two-dimensional sums-of-Gaussians, with the centers of the Gaussians laid out on the vertices of the grid. Q_{i}(x) is then similarly approximated by a two-dimensional Gaussian. Generalizing from the one-dimensional case, the number of cells in module i is given by n_{i} = d(λ_{i}/l_{i})^{2}, where d is the density of grid fields. As in one dimension, increasing the density d will decrease the standard deviation σ_{i} of the local bumps in the posterior P(x|i); that is, l_{i}/σ_{i} = g(d), where g is an increasing function of d. In the special case where the neurons have independent noise, g(d) ∝ √d, so that the standard deviation σ_{i} decreases as the inverse square root of d. Putting all of these statements together, we have, in general, $n_{i} = \frac{d}{g(d)^{2}} \left(\frac{\lambda_{i}}{\sigma_{i}}\right)^{2}$. In the special case where noise is independent so that g(d) ∝ √d, the density d cancels out in this expression, and in this case, or when the density d is the same across modules, we can write $n_{i} = c \left(\frac{\lambda_{i}}{\sigma_{i}}\right)^{2}$, where c is just a constant. Redoing the optimization analysis from the one-dimensional case, the form of the function ρ changes (Calculating $\rho \left(\frac{\lambda}{\sigma},\frac{\sigma}{\delta}\right)$, ‘Materials and methods’), but the logic of the above derivation is otherwise unaltered. In the optimal grid, we find that λ^{*} ≈ 5.3σ (or equivalently σ ≈ 0.19λ^{*}).
Calculating $\rho \left(\frac{\lambda}{\sigma},\frac{\sigma}{\delta}\right)$
Above, we argued that the function $\rho \left(\frac{\lambda}{\sigma}, \frac{\sigma}{\delta}\right)$ can be computed by approximating the posterior distribution of the animal's position given the activity in module i, P(x|i), as a periodic sum-of-Gaussians:
where K is assumed large. We further approximate the posterior given the activity of all modules coarser than λ_{i} by a Gaussian with standard deviation δ_{i−1}:
(We are assuming here that the animal is really located at x = 0 and that the distributions P(x|i) for each i have one peak at this location.) Assuming noise independence across scales, it then follows that $Q_{i}(x) = \frac{P(x \mid i)\, Q_{i-1}(x)}{\int \mathrm{d}x\, P(x \mid i)\, Q_{i-1}(x)}$. Then ρ(λ_{i}/σ_{i}, σ_{i}/δ_{i−1}) is given by δ_{i−1}/δ_{i}, where δ_{i} is the standard deviation of Q_{i}. We therefore must calculate Q_{i}(x) and its variance in order to obtain ρ. After some algebraic manipulation, we find,
where $\Sigma^{2} = \left(\sigma_{i}^{-2} + \delta_{i-1}^{-2}\right)^{-1}$, $\mu_{n} = \left(\frac{\Sigma}{\sigma_{i}}\right)^{2} \lambda_{i}\, n$, and
Z is a normalization factor enforcing $\sum_{n} \pi_{n} = 1$. Q_{i} is thus a mixture-of-Gaussians, seemingly contradicting our approximation that all the Q are Gaussian. However, if the secondary peaks of P(x|i) are well into the tails of Q_{i−1}(x), then they will be suppressed (quantitatively, if $\lambda_{i}^{2} \gg \sigma_{i}^{2} + \delta_{i-1}^{2}$, then $\pi_{n} \ll \pi_{0}$ for |n| ≥ 1), so that our assumed Gaussian form for Q holds to a good approximation. In particular, at the values of λ, σ and δ selected by the optimization procedure described above, π_{1} = 1.3 × 10^{−3}π_{0}. So our approximation is self-consistent.
Next, we find the variance ${\delta}_{i}^{2}$:
We can finally read off $\rho \left(\frac{\lambda_{i}}{\sigma_{i}}, \frac{\sigma_{i}}{\delta_{i-1}}\right)$ as the ratio δ_{i−1}/δ_{i}:
For the calculations reported in the text, we took K = 500.
We explained above that we should maximize ρ over $\frac{\sigma}{\delta}$, while holding $\frac{\lambda}{\sigma}$ fixed. The first factor in Equation 30 increases monotonically with decreasing $\frac{\sigma}{\delta}$; however, $\sum_{n} n^{2} \pi_{n}$ also increases and this has the effect of reducing ρ. The optimal $\frac{\sigma}{\delta}$ is thus controlled by a tradeoff between these factors. The first factor is related to the increasing precision given by narrowing the central peak of P(x|i), while the second factor describes the ambiguity from multiple peaks.
Generalization to two-dimensional grids
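The one-dimensional optimization can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' code: ρ is rebuilt from the mixture description above by taking weights π_{n} ∝ exp(−n²λ²/2(σ² + δ²)) (the one-dimensional analogue of the two-dimensional weights), component means μ_{n} = (Σ/σ)²λn and component variance Σ², so that the variance of Q_{i} is Σ² + Σ_{n} π_{n}μ_{n}². Lengths are measured in units where σ = 1, the sum is truncated at a small `n_max` instead of K = 500, and simple grid searches replace a proper optimizer.

```python
import math

def rho_1d(lam_over_sigma, sigma_over_delta, n_max=10):
    """Ratio delta_{i-1}/delta_i for one module, in units where sigma_i = 1."""
    lam = lam_over_sigma
    delta_prev_sq = 1.0 / sigma_over_delta ** 2
    spread = 1.0 + delta_prev_sq           # sigma^2 + delta_{i-1}^2
    big_sigma_sq = delta_prev_sq / spread  # (sigma^-2 + delta_{i-1}^-2)^-1
    # Unnormalized weights of the peaks n = 1..n_max (the n = 0 peak has weight 1).
    w = [math.exp(-(n * lam) ** 2 / (2.0 * spread)) for n in range(1, n_max + 1)]
    z = 1.0 + 2.0 * sum(w)
    n_sq_mean = 2.0 * sum(n * n * wn for n, wn in zip(range(1, n_max + 1), w)) / z
    # Mixture variance: component variance plus the spread of the component means.
    variance = big_sigma_sq + big_sigma_sq ** 2 * lam ** 2 * n_sq_mean
    return math.sqrt(delta_prev_sq / variance)

def rho_max(lam_over_sigma):
    """Maximize rho over sigma/delta on a grid; returns (rho, sigma/delta)."""
    grid = [0.2 + 0.004 * k for k in range(201)]
    return max((rho_1d(lam_over_sigma, s), s) for s in grid)

# Minimize the relative neuron count (lambda/sigma) / ln(rho_max) over lambda/sigma.
best = None
for k in range(101):
    lam = 7.0 + 0.05 * k
    rho, s = rho_max(lam)
    cost = lam / math.log(rho)
    if best is None or cost < best[0]:
        best = (cost, lam, rho, s)
best_cost, best_lam, best_rho, best_s = best
print(f"lambda/sigma = {best_lam:.2f}, rho = {best_rho:.2f}, sigma/delta = {best_s:.2f}")
```

The cost is very flat near its minimum, so the optimal λ/σ found this way is only determined to within the resolution of the search grid.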
The derivation can be repeated in the two-dimensional case. We take P(x|i) to be a sum-of-Gaussians with peaks centered on the vertices of a regular lattice generated by the vectors $\left(\lambda_{i}\vec{u}, \lambda_{i}\vec{v}\right)$. We also define $\delta_{i}^{2} \equiv \frac{1}{2} \langle |x|^{2} \rangle_{Q_{i}}$. The factor of 1/2 ensures that the variance so defined is measured as an average over the two dimensions of space. The derivation is otherwise parallel to the above, and the result is,
where $\pi_{n,m} = \frac{1}{Z} e^{-|n\vec{u} + m\vec{v}|^{2} \lambda_{i}^{2}/2\left(\sigma_{i}^{2} + \delta_{i-1}^{2}\right)}$.
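A two-dimensional version of the numerical optimization can be sketched in the same spirit. The assumptions here are ours: a triangular lattice with unit generators at 60 degrees (so that |n u + m v|² = n² + nm + m²), weights π_{n,m} ∝ exp(−|n u + m v|²λ²/2(σ² + δ²)), a per-dimension variance δ_{i}² = Σ² + ½(Σ/σ)⁴λ² Σ π_{n,m}|n u + m v|² obtained from the definition of δ_{i}² above, units where σ = 1, and a relative neuron count proportional to (λ/σ)²/ln ρ.

```python
import math

# Squared lengths |n u + m v|^2 on a triangular lattice, origin excluded.
K_SQ = [n * n + n * m + m * m
        for n in range(-4, 5) for m in range(-4, 5) if (n, m) != (0, 0)]

def rho_2d(lam_over_sigma, sigma_over_delta):
    """Ratio delta_{i-1}/delta_i in two dimensions, in units where sigma_i = 1."""
    lam_sq = lam_over_sigma ** 2
    delta_prev_sq = 1.0 / sigma_over_delta ** 2
    spread = 1.0 + delta_prev_sq           # sigma^2 + delta_{i-1}^2
    big_sigma_sq = delta_prev_sq / spread  # (sigma^-2 + delta_{i-1}^-2)^-1
    w = [math.exp(-k * lam_sq / (2.0 * spread)) for k in K_SQ]
    z = 1.0 + sum(w)                       # the origin contributes weight 1
    k_sq_mean = sum(k * wk for k, wk in zip(K_SQ, w)) / z
    # Per-dimension variance: component variance plus half the spread of the means.
    variance = big_sigma_sq + 0.5 * big_sigma_sq ** 2 * lam_sq * k_sq_mean
    return math.sqrt(delta_prev_sq / variance)

def rho_max_2d(lam_over_sigma):
    """Maximize rho over sigma/delta on a grid; returns (rho, sigma/delta)."""
    grid = [0.4 + 0.01 * k for k in range(121)]
    return max((rho_2d(lam_over_sigma, s), s) for s in grid)

# Minimize the relative neuron count (lambda/sigma)^2 / ln(rho_max).
best_2d = None
for k in range(61):
    lam = 4.0 + 0.05 * k
    rho, s = rho_max_2d(lam)
    cost = lam ** 2 / math.log(rho)
    if best_2d is None or cost < best_2d[0]:
        best_2d = (cost, lam, rho, s)
cost_2d, lam_2d, rho_opt_2d, s_2d = best_2d
print(f"lambda/sigma = {lam_2d:.2f}, scale ratio = {rho_opt_2d:.2f}")
```

As in one dimension, the minimum is shallow, so the sketch is meant to show the structure of the calculation rather than to pin down the optimum precisely.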
Reanalysis of grid data from previous studies
We reanalyzed the data from Barry et al. (2007) and Stensola et al. (2012) in order to get the mean and the variance of the ratio of adjacent grid scales. For Barry et al. (2007), we first read the raw data from Figure 3B of their paper using the software GraphClick, which allows retrieval of the original (x, y) coordinates from the image. This gave the scales of grid cells recorded from six different rats. For each animal, we grouped the grids that had similar periodicities (i.e., differed by less than 20%) and calculated the mean periodicity for each group. We defined this mean periodicity as the scale of each group. For four out of six rats, there were two scales in the data. For one out of six rats, there were three grid scales. For the remaining rat, only one scale was obtained as only one cell was recorded from that rat. We excluded this rat from further analysis. We then calculated the ratio between adjacent grid scales, resulting in six ratios from five rats. The mean and variance of the ratio were 1.64 and 0.09, respectively (n = 6).
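The grouping procedure can be sketched as follows. The numbers in `toy_periods` are made-up placeholders for illustration only and are not values from Barry et al. (2007). The sketch also makes two assumptions that the text does not specify: a period joins the current group if it is less than 20% larger than the smallest member of that group, and the variance is the sample variance.

```python
from statistics import mean, variance

def group_scales(periods, tolerance=0.20):
    """Group similar periods and return the mean period (scale) of each group."""
    groups = []
    for p in sorted(periods):
        # Assumption: compare against the smallest member of the current group.
        if groups and p < groups[-1][0] * (1.0 + tolerance):
            groups[-1].append(p)
        else:
            groups.append([p])
    return [mean(g) for g in groups]

def adjacent_ratios(scales):
    """Ratios between consecutive scales, larger over smaller."""
    return [big / small for small, big in zip(scales, scales[1:])]

# Hypothetical placeholder data (arbitrary units), NOT measurements from any study.
toy_periods = {
    "rat A": [30.0, 32.0, 45.0, 47.0],
    "rat B": [28.0, 41.0, 43.0, 60.0],
    "rat C": [35.0],  # a single scale yields no ratio and drops out
}

ratios = []
for rat, periods in toy_periods.items():
    ratios.extend(adjacent_ratios(group_scales(periods)))

print(f"n = {len(ratios)}, mean = {mean(ratios):.2f}, variance = {variance(ratios):.3f}")
```

A different grouping rule (for example, comparing against the running group mean) could assign borderline cells differently, so the 20% criterion should be applied as defined in the original analysis when reproducing the reported values.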
For Stensola et al. (2012), we first read in the data using GraphClick from Figure 5D of their paper. This gave the scale ratios between different grids for 16 different rats. We then pooled all the ratios together and calculated the mean and variance. The mean and variance of the ratio were 1.42 and 0.17, respectively (n = 24).
Giocomo et al. (2011a) reported the ratios between the grid period and the radius of the grid field (measured as the radius of the circle around the center field of the autocorrelation map of the grid cells) to be 3.26 ± 0.07 and 3.32 ± 0.06 for wild-type and HCN KO mice, respectively. We halved these measurements to obtain the ratios between the grid period and the diameter of the grid field (approximately 1.63 and 1.66), to facilitate comparison with our theoretical predictions. The results are plotted in a bar graph (Figure 4B).
Finally, in Figure 4C, we replotted Figure 1C from Hafting et al. (2005) by reading in the data using GraphClick and then translating that information back into a plot.
Appendix 1
Range of location coding in a grid system
The main text describes hierarchical grid coding schemes where the larger periods resolve ambiguity and smaller periods give precision in location coding. We took the largest grid period to be comparable to the behavioral range. In fact, if the periods λ_{i} of the different modules are incommensurate with each other (i.e., they do not share common integer factors), it should be possible to resolve location over ranges larger than the largest grid period (Fiete et al., 2008; Sreenivasan and Fiete, 2011). The grid schemes that we predict share this virtue since they predict scale ratios that are not simple rational numbers. However, the precise maximum range will also depend on the widths of the grid fields l_{i} relative to the period and on the number of grid cells n_{i} in each module. In the probabilistic decoding scheme described in the main text, these parameters determine the standard deviation σ_{i} of the periodic peaks in the likelihood of position given the activity in module i. The full range of unambiguous location representation depends on the ratios λ_{i}/σ_{i}. Increasing this ratio will tend to increase the range of unambiguous representation, but at the cost of increasing the number of cells in each module.
To illustrate, consider a one-dimensional grid system with four modules with a ratio of 2.7 between adjacent scales (this is close to the optimal ratio predicted by our analysis). Suppose the animal's true location is at 0. We can calculate the overall probability of the animal's location by multiplying together the likelihood functions resulting from activity in each individual module (see main text for details). We will examine the extent to which location can be decoded unambiguously over a range (−3λ_{max}, 3λ_{max}) where λ_{max} is the largest period. When λ_{i}/σ_{i} is close to the value of 9.1 predicted by the probabilistic analysis in Optimizing the grid system: probabilistic decoder, ‘Materials and methods’, the overall likelihood shows substantial ambiguity over this range because of secondary peaks in the likelihood distribution (Appendix figure 1A). As λ_{i}/σ_{i} increases (requiring more neurons in each module), these secondary peaks decrease in amplitude. In Appendix figure 1B, we show that when λ_{i}/σ_{i} = 30, the 4-module grid system can represent location at least within the range (−3λ_{max}, 3λ_{max}).
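The comparison can be reproduced qualitatively with a short sketch. It assumes that each module likelihood is a periodic sum of unit-height Gaussian bumps (only the three nearest bumps are summed), sets λ_{max} = 1, and measures ambiguity as the largest value of Q(x)/Q(0) found away from the central peak; the function names and the cutoff |x| ≥ 0.5λ_{max} are our own choices, not part of the original analysis.

```python
import math

def module_likelihood(x, period, sigma):
    """Periodic sum of Gaussian bumps; only the three nearest bumps are summed."""
    k = round(x / period)
    return sum(
        math.exp(-((x - (k + j) * period) ** 2) / (2.0 * sigma ** 2))
        for j in (-1, 0, 1)
    )

def combined_likelihood(x, lam_over_sigma, ratio=2.7, n_modules=4):
    """Product of module likelihoods, with the largest period set to 1."""
    q = 1.0
    for i in range(n_modules):
        period = ratio ** (-i)
        q *= module_likelihood(x, period, period / lam_over_sigma)
    return q

def worst_secondary_peak(lam_over_sigma, span=3.0, step=2e-4):
    """Largest Q(x)/Q(0) for 0.5 <= x < span (Q is symmetric about x = 0)."""
    central = combined_likelihood(0.0, lam_over_sigma)
    n_first = int(round(0.5 / step))
    n_last = int(round(span / step))
    return max(
        combined_likelihood(k * step, lam_over_sigma) / central
        for k in range(n_first, n_last)
    )

ambiguity_near_optimum = worst_secondary_peak(9.1)
ambiguity_dense = worst_secondary_peak(30.0)
print(f"lambda/sigma = 9.1: worst secondary peak = {ambiguity_near_optimum:.3g}")
print(f"lambda/sigma = 30:  worst secondary peak = {ambiguity_dense:.3g}")
```

Narrowing the bumps leaves the positions of near-coincidences between modules unchanged but makes each mismatch larger in units of σ_{i}, which is why the secondary peaks shrink as λ_{i}/σ_{i} grows.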
If there is a biological limitation to the largest period possible in a grid system, and if the organism must represent very large ranges without grid remapping, it may prove beneficial to add neurons to expand range. Analyzing this tradeoff requires knowledge of the range, biophysical limits on grid periods, and the degree of ambiguity (the maximum heights of secondary peaks in the probability of position) that can be behaviorally tolerated. This information is not currently available for any species, and so we do not attempt the analysis.
Predictions for the effects of lesions and for place cell activity
In the grid coding scheme that we propose there is a hierarchy of grid periods governed by a geometric progression. The alternative schemes of Fiete et al. (2008); Sreenivasan and Fiete (2011) are designed to produce a large range of representation from grids with similar periods. These two alternatives make very different predictions for the effects of lesions in the entorhinal cortex on location coding. In a hierarchical scheme, losing a grid module produces location ambiguities that increase in size with the period of the missing module. In the alternative scheme of Fiete et al. (2008); Sreenivasan and Fiete (2011) lesions of a module produce periodic ambiguities that are sporadically tied to the missing period. An illustrative example is shown in Appendix figure 2.
The grid cell representation of space in the entorhinal cortex is related in a complex manner to the hippocampal place cell representation (Bush et al., 2014; Sasaki et al., 2015). Simplistic models of this transformation assume that grid cells are pooled in the hippocampus and that some form of synaptic plasticity selects inputs with the same spatial phase (Solstad et al., 2006). In the context of such a model (which does not reflect many aspects of the known physiology), our grid scheme makes specific predictions for the effects of module lesions on place fields.
We use a firing rate model for both place cells and grid cells. The 1d grid cell firing rate is modeled as a periodic sum of truncated Gaussians (a full Gaussian mixture model gives similar results but the truncated model is easier to handle numerically). We will consider four grid modules with module periods λ_{i}, Gaussian standard deviations σ_{i} of the bump of the grid cell tuning curve, and ratios λ_{i}/σ_{i} = 9.1. The grid periods follow a scaling λ_{i}/λ_{i + 1} = 2.7, and we examine place coding over the range set by the biggest period λ_{1}.
The place cell response is modeled via linear pooling of grid cells with the same phase followed by a threshold and an exponential nonlinearity:
Here, g_{i}(x) is the grid cell firing rate, c = 0.3 sets the threshold and $m = \max\left\{\exp\left(\sum_{i=1}^{4} g_{i}(x)\right)\right\}$ is the maximum activation. This is a simplified description of the essential features of many models of the grid-place transformation (see, e.g., [Solstad et al., 2006; de Almeida et al., 2009] and the review [Giocomo et al., 2011b]). To model the effect of lesioning grid module i, we set g_{i}(x) = 0. The results are shown in Appendix figure 3. Qualitatively, lesioning the smallest grid module increases the place cell width, while lesioning the largest grid module leads to increased firing in locations outside the main place fields. In general, lesioning different grid modules along the hierarchy leads to different effects on the place field. This is a testable prediction for future experiments. Note that lesions of dorsal-ventral bands are not a direct test, because multiple grid modules co-exist in each location along the dorsal-ventral axis (Stensola et al., 2012).
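A minimal simulation of this lesion experiment is sketched below. The explicit form of the place cell response is an assumption on our part: we take p(x) = max(0, exp(Σ_{i} g_{i}(x))/m − c), with m recomputed after a lesion, which is one reading of "linear pooling followed by a threshold and an exponential nonlinearity". Each grid cell is modeled by unit-height Gaussian bumps truncated at half a period, with λ_{1} = 1, λ_{i}/λ_{i+1} = 2.7 and λ_{i}/σ_{i} = 9.1 as in the text.

```python
import math

PERIODS = [2.7 ** (-i) for i in range(4)]  # lambda_1 = 1 is the largest period
SIGMAS = [p / 9.1 for p in PERIODS]

def grid_rate(x, period, sigma):
    """Gaussian bump around the nearest lattice point (truncated at period/2)."""
    d = x - period * round(x / period)
    return math.exp(-d * d / (2.0 * sigma * sigma))

def place_response(xs, lesioned=None, c=0.3):
    """Assumed form: p(x) = max(0, exp(sum_i g_i(x)) / m - c)."""
    drive = [
        sum(grid_rate(x, p, s)
            for i, (p, s) in enumerate(zip(PERIODS, SIGMAS)) if i != lesioned)
        for x in xs
    ]
    m = max(math.exp(v) for v in drive)  # maximum activation over the track
    return [max(0.0, math.exp(v) / m - c) for v in drive]

def central_width(xs, resp):
    """Width of the contiguous active region around the middle of the track."""
    lo = hi = len(xs) // 2
    while lo > 0 and resp[lo - 1] > 0:
        lo -= 1
    while hi < len(xs) - 1 and resp[hi + 1] > 0:
        hi += 1
    return xs[hi] - xs[lo]

def outside_activity(xs, resp, cutoff=0.2):
    """Summed response at positions farther than `cutoff` from the main field."""
    return sum(r for x, r in zip(xs, resp) if abs(x) > cutoff)

xs = [-0.5 + 5e-4 * k for k in range(2001)]  # range set by lambda_1
intact = place_response(xs)
no_smallest = place_response(xs, lesioned=3)
no_largest = place_response(xs, lesioned=0)

width_intact = central_width(xs, intact)
width_no_smallest = central_width(xs, no_smallest)
outside_intact = outside_activity(xs, intact)
outside_no_largest = outside_activity(xs, no_largest)
print(f"main field width: intact {width_intact:.3f}, smallest module lesioned {width_no_smallest:.3f}")
print(f"out-of-field activity: intact {outside_intact:.3f}, largest module lesioned {outside_no_largest:.3f}")
```

Under these assumptions the sketch shows the two qualitative effects described above: removing the smallest module widens the main field, and removing the largest module produces firing away from the main field.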
For comparison purposes, we also simulated a nonhierarchical model where grid periods are similar but incommensurate. In this model, the place cell response is
where $\tilde{c} = 0.35$ is a threshold, $\tilde{m} = \max\left\{\exp\left(\sum_{i=1}^{4} \tilde{g}_{i}(x)\right)\right\}$, and $\tilde{g}_{i}(x)$ is the grid cell firing rate, again modeled as a sum of truncated Gaussians. In each module, we took the standard deviation of the Gaussians to be 1/210 of the whole range. The periods of the grids in the four modules were 1/105 (fourth), 1/70 (third), 1/42 (second), and 1/30 (first) of the whole range, respectively. Again, to model the effect of lesioning grid module i, we set $\tilde{g}_{i}(x) = 0$. In this grid scheme, lesioning any grid module leads to qualitatively similar effects on the place cell activity, as they all lead to the emergence of several place fields (Appendix figure 3). This is in contrast with the hierarchical scheme, in which lesioning the largest scale leads to an expansion of place fields rather than an increase in the number of fields.
References

1. Neural correlations, population coding and computation. Nature Reviews Neuroscience 7:358–366. https://doi.org/10.1038/nrn1888
2. Possible principles underlying the transformation of sensory messages. Sensory Communication pp. 217–234.
3. Experience-dependent rescaling of entorhinal grids. Nature Neuroscience 10:682–684. https://doi.org/10.1038/nn1905
4. Optimal short-term population coding: when Fisher information fails. Neural Computation 14:2317–2351. https://doi.org/10.1162/08997660260293247
5. Home range and activity patterns of the giant kangaroo rat, Dipodomys ingens. Journal of Mammalogy 6:1–12. https://doi.org/10.2307/1380950
6. The analysis of visual motion: a comparison of neuronal and psychophysical performance. The Journal of Neuroscience 12:4745–4765.
7.
8.
9. Accurate path integration in continuous attractor network models of grid cells. PLOS Computational Biology 5:e1000291. https://doi.org/10.1371/journal.pcbi.1000291
10. What do grid cells contribute to place cell firing? Trends in Neurosciences 37:136–145. https://doi.org/10.1016/j.tins.2013.12.003
11.
12. Studies on home range in the brown rat. Journal of Mammalogy 29:207–225. https://doi.org/10.2307/1375387
13.
14. The input–output transformation of the hippocampal granule cells: from grid cells to place fields. The Journal of Neuroscience 29:7504–7512. https://doi.org/10.1523/JNEUROSCI.6048-08.2009
15. A manifold of spatial maps in the brain. Trends in Cognitive Sciences 14:561–569. https://doi.org/10.1016/j.tics.2010.09.004
16. Fragmentation of grid cell maps in a multicompartment environment. Nature Neuroscience 12:1325–1332. https://doi.org/10.1038/nn.2396
17.
18.
19. Correlations and functional connections in a population of grid cells. PLOS Computational Biology 11:e1004052. https://doi.org/10.1371/journal.pcbi.1004052
20.
21. What grid cells convey about rat location. The Journal of Neuroscience 28:6858–6871. https://doi.org/10.1523/JNEUROSCI.5684-07.2008
22. Habits and economic relationships of the Tulare kangaroo rat. Journal of Mammalogy 29:5–35. https://doi.org/10.2307/1375277
23.
24. Spatial representation in the entorhinal cortex. Science 305:1258–1264. https://doi.org/10.1126/science.1099901
25.
26.
27.
28.
29. Direct recordings of grid-like neuronal activity in human spatial navigation. Nature Neuroscience 16:1188–1190. https://doi.org/10.1038/nn.3466
30.
31.
32. Do the spatial frequencies of grid cells mold the firing fields of place cells? Proceedings of the National Academy of Sciences of USA 112:3860–3861. https://doi.org/10.1073/pnas.1503155112
33. Nonlinear programming. In: Proceedings of the second Berkeley symposium on mathematical statistics and probability, Volume 5. Berkeley, California: University of California Press.
34.
35. On the computational power of winner-take-all. Neural Computation 12:2519–2535. https://doi.org/10.1162/089976600300014827
36. Optimal population codes for space: grid cells outperform place cells. Neural Computation 24:2280–2317. https://doi.org/10.1162/NECO_a_00319
37. Resolution of nested neuronal representations can be exponential in the number of neurons. Physical Review Letters 109:018103. https://doi.org/10.1103/PhysRevLett.109.018103
38.
39. Place cells, grid cells, and the brain's spatial representation system. Annual Review of Neuroscience 31:69–89. https://doi.org/10.1146/annurev.neuro.31.061307.090723
40. Neuron numbers in the presubiculum, parasubiculum, and entorhinal area of the rat. Journal of Comparative Neurology 385:83–94. https://doi.org/10.1002/(SICI)1096-9861(19970818)385:1<83::AID-CNE5>3.0.CO;2-8
41. Place units in the hippocampus of the freely moving rat. Experimental Neurology 51:78–109. https://doi.org/10.1016/0014-4886(76)90055-8
42.
43. Place field expansion after focal MEC inactivations is consistent with loss of Fourier components and path integrator gain reduction. Proceedings of the National Academy of Sciences of USA 112:4116–4121. https://doi.org/10.1073/pnas.1421963112
44.
45.
46. Spatial and memory circuits in the medial entorhinal cortex. Current Opinion in Neurobiology 32:16–23. https://doi.org/10.1016/j.conb.2014.10.008
47. Natural image statistics and neural representation. Annual Review of Neuroscience 24:1193–1216. https://doi.org/10.1146/annurev.neuro.24.1.1193
48. Home range indices for the hispid cotton rat (Sigmodon hispidus) in northeastern Kansas. Journal of Mammalogy 64:580–590. https://doi.org/10.2307/1380513
49. From grid cells to place cells: a mathematical model. Hippocampus 16:1026–1031. https://doi.org/10.1002/hipo.20244
50. Population coding in neuronal systems with correlated noise. Physical Review E 64:051904. https://doi.org/10.1103/PhysRevE.64.051904
51. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature Neuroscience 14:1330–1337. https://doi.org/10.1038/nn.2901
52.
53. Cells of origin of entorhinal cortical afferents to the hippocampus and fascia dentata of the rat. Journal of Comparative Neurology 169:347–370. https://doi.org/10.1002/cne.901690306
54. A Sigmodon and Baiomys population in ungrazed and unburned Texas prairie. Journal of Mammalogy pp. 141–150. https://doi.org/10.2307/1375262
55.
56. Om nogle geometrisk-taltheoretiske theoremer. Forhandlinger ved de Skandinaviske Naturforskeres pp. 352–353.
57.
58. Optimal configurations of spatial scale for grid cell firing under noise and uncertainty. Philosophical Transactions of the Royal Society B: Biological Sciences 369:20130290. https://doi.org/10.1098/rstb.2013.0290
59. The anatomy of memory: an interactive overview of the parahippocampal–hippocampal network. Nature Reviews Neuroscience 10:272–282. https://doi.org/10.1038/nrn2614
60.
61.
62.
Decision letter

Frances K Skinner, Reviewing Editor; University Health Network, Canada
eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see review process). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.
Thank you for submitting your work entitled “A principle of economy predicts the functional architecture of grid cells” for peer review at eLife. Your submission has been favorably evaluated by Timothy Behrens (Senior Editor), a Reviewing Editor, and three reviewers.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
In this study, the authors present a simpler, more elegant, and more intuitive presentation (relative to other studies) of coding optimality principles/constraints leading to grid cell formation. While there was agreement about this, two main aspects came forth that should be addressed in a revised submission:
1) Be clear and specific about how this work differs from other studies (e.g. Mathis, Herz et al.). That is, the analysis in the current paper is simpler and does not rely on Fisher information, but rather on quite straightforward assumptions related to the nature of optimal coding.
2) Be clear about how the novel analyses presented (as different from other studies) allow a closer link with experimental studies.
More details of revisions are given below.
Reviewer #1:
The authors present a theoretical analysis of the ideal properties that a “grid coding” system for spatial location should exhibit, by assessing the conditions which allow location to be encoded with maximum precision across a specific spatial range using the minimum number of grid cells. Unlike several previous theoretical studies (most notably, Fiete et al., 2008), the authors assume that this spatial range is equal to the scale of the largest grid module (i.e. ∼10m), rather than exploiting the combinatorial properties of the grid cell code to encode location over a range equal to the lowest common multiple of all grid scales. Thus, although a similar topic has been addressed in various guises by several previous publications (i.e. Fiete et al., 2008; Mathis et al., 2012, 2013; Towse et al., 2014), there is some novelty to the current work. The manuscript is well written, thorough, and makes some interesting and specific predictions (such as the optimal ratio between grid field size and grid module scale) that have not – to the best of my knowledge – been described elsewhere. Concerns to be addressed are described below.
Specific comments:
In the subsection “Intuitions from a simplified model”, the current body of experimental data in this field simply does not support the authors’ repeated assertion that “[…] anatomical and functional evidence suggests that place cells selectively read out contiguous subsets of entorhinal grid modules along the dorsoventral axis (Van Strien, Cappaert and Witter, 2009; Solstad, Moser and Einevoll, 2006)”. First, citations Van Strien, Cappaert and Witter, 2009 and Solstad, Moser and Einevoll, 2006 are an anatomical review and a theoretical paper, respectively – neither of which can reasonably be described as providing “functional evidence”. Second, several groups have recently published review papers summarising a wide body of functional evidence that directly contradicts the hypothesis that place cells ‘selectively readout’ a subset of grid cell inputs (i.e. Bush et al., 2014; Sasaki et al., 2015). The authors should edit the text here, and at several other junctures throughout the manuscript (listed below) to address this point. The hypothesis that place cells represent a readout of the grid cell system has no bearing on the theoretical work presented, and only serves to misrepresent our current understanding of the grid cell system.
In the Discussion, the authors state that: “Together with the anatomy (Van Strien, Cappaert and Witter, 2009), the hierarchical view of location coding that we have proposed then predicts that dorsal place cells should be revealed to have multiple place fields in large environments because their spatial ambiguities will not be fully resolved at larger scales. Preliminary evidence for this prediction has appeared in Fenton et al., 2008; Rich, Liaw and Lee, 2014.” However, those studies show no systematic relationship between the locations of dorsal place cells' multiple firing fields in large environments, which directly contradicts the predictions of a grid cell to place cell model (see, for example, Appendix–figure 3 and Appendix–figure 4 in this manuscript). The authors should note this caveat, along with the other experimental data showing that a grid to place cell model is overly simplistic (see above), or remove the corresponding piece of text.
In the Discussion, the authors “[…] predict that lesioning the modules with small periods will expand place field widths, while lesioning modules with large periods will lead to increased firing at locations outside the main place field, at scales set by the missing module. Our prediction is supported by a recent study demonstrating effects of lesions including dorsal mEC on place field widths in small environments (Hales et al., 2014)”. This is misleading for several reasons. First, as described above, this statement neglects to mention the wider body of evidence contradicting the hypothesis that grid cell inputs solely generate place cell firing fields. Second, the aim of the cited study (Hales et al., 2014) was to examine the effect of eliminating all grid cell inputs to place cells, and ∼85% of the total mEC volume was ablated, including 94.6% of layer II and 83.5% of layer III. Are the authors suggesting that the observed effects on place cell firing are a result of remaining grid cell inputs from modules with a large or small period? Third, the more specific prediction is that lesioning grid modules with large periods will lead to the appearance of additional place fields at periodic, grid-like locations in two dimensions, but that analysis is not made or discussed in Hales et al. (2014). Fourth, more recent experimental evidence indicates that focal inactivations of dorsal or ventral mEC each produce place field expansion, and neither generated increased firing at locations outside the main place field – in fact, those data showed a trend towards a decrease in the number of firing fields exhibited by each place cell following focal inactivations (Figure S7 in Ormond et al., 2015). Each of those results also contradicts the predictions of a grid cell to place cell model, despite the strange interpretation of the data made in that paper. Hence, the authors should include a citation to that paper and edit the text accordingly.
In the Appendix, the authors again suggest that “the grid cell representation of space in the entorhinal cortex is […] transformed in the hippocampus into the place cell representation.” They should edit this text to more accurately represent the current understanding of the relationship between grid and place cell firing patterns.
Abstract and Introduction: Given that the principal difference between the analysis in this manuscript and that presented in several previous publications (i.e. Fiete et al., 2008; Towse et al., 2014) is that the authors assume a spatial range equal to the size of the largest grid module, this should be more explicitly stated in the Abstract and Introduction. This would make the novelty of this manuscript more apparent. For example, the Abstract should be edited to read: “We propose that the grid system implements a hierarchical code for spatial location that economizes the number of neurons required to encode location with a given resolution across a spatial range equal in size to the period of the largest grid module” or similar. Likewise, the Introduction should be edited to read: “minimize the number of neurons required to achieve the behaviourally necessary spatial resolution across a spatial range equal in size to the period of the largest grid module” or similar.
In the Introduction, it is not clear to me what the authors mean by the following statement: “Consistent with studies of grid cell and place cell remapping, our analyses assume that there is a behaviorally defined maximum range over which a fixed grid represents locations (Fyhn et al., 2007).” I fail to see the relevance of remapping to the behavioural range of an animal. Could the authors explain their rationale here please?
In the Introduction, when the authors state “three dimensional grids that will be relevant to navigation in, e.g., bats”, they should include a reference to Yartsev et al., 2011, which demonstrates that bats do have grid cell responses, even though they have only been recorded in two dimensional environments so far.
In the subsection “Intuitions from a simplified model”, when the authors stress that “the animal could achieve the required resolution in a place coding scheme […]”, they must incorporate a reference to Fiete et al., 2008, which makes a very similar comparison of “place coding” and “grid coding” schemes.
Figure 2E is cited before Figures 2A–D in the main text, which is confusing (subsection “Winner-Take-All Decoder”). Similarly, in subsection “General grid coding in two dimensions”, Figure 3A is not referred to in the text at all, and Figures 3B and 3C are cited before Figure 2F, which is not ideal. It would be preferable if the authors placed all figures pertaining to the 2D case in Figure 3 (i.e. move Figure 2F into Figure 3) and moved Figure 2E to match the flow of the text (i.e. before Figure 2A).
In the Discussion, the authors state: “Given homogeneous positive noise correlations within a grid module, which will arise naturally if grid cells are formed by an attractor mechanism, the required number of neurons could be an order of magnitude higher (Sompolinsky et al., 2001; Averbeck, Latham and Pouget, 2006)”. It has recently been demonstrated that positive noise correlations appear to be largely absent in the rodent grid cell system, and the authors may wish to note this point and cite the corresponding paper (Mathis et al., 2013) here.
In the Discussion, “the answer depends on the dimension of the grid” should be “the answer depends on the dimensionality of the grid”.
Again, in the Discussion, the authors mention that they “have checked that the optimal grid scheme predicted by our theory, if decoded in the fashion of (Fiete et al., 2008), can represent space over ranges longer than the largest scale”, but do not mention (in the main text) whether it could or not. They should incorporate a brief description of the outcome of those simulations in this section of text, for clarity.
Reviewer #2:
The paper by Wei et al. uses an optimality argument to explain the empirically observed geometric progression of the spacing of grid modules and the ratio of this geometric progression. The paper is nicely written and the arguments are clear. I am, however, not fully convinced of the potential influence of this paper, for the following reasons.
1) The major assumption of this paper is the ambiguity of the grid cell firing, that is, from the firing of one grid cell, it is not possible to infer at which of the many vertices of the grid one is located. This assumption, however, does not take into account the fact that the peak firing rates of a grid cell at its fields vary significantly. In other words, the translational symmetry is about the positions at which peak firing occurs, not that each field is identical to the others in terms of firing rate. In my view, this experimental fact fundamentally affects the argument offered here.
2) The minimum of the cost function versus ratio of the spacing of successive modules is very wide, raising the question whether one can really say anything meaningful about the value that that ratio should take. It should not escape our attention that in Figure 2E and F, the authors plot the cost function versus the logarithm of the ratio between successive modules, which gives the impression of a narrower minimum (though still wide). Even with this, the authors’ prediction is stated to “[…] robustly lie in the range 1.4–1.7 […]”. This is a 20% range.
3) (a) The idea of using optimality for predicting the ratio of grid spacing of the modules has already been employed by Mathis et al., Neural Comp 2012. There are differences between the two works, e.g. Mathis et al. maximize the resolution given a fixed number of neurons while Wei et al. minimize the number of neurons given the resolution, and Mathis et al. only focus on the one-dimensional case. Despite the differences, it is not clear what the major conceptual advance is. As far as I can tell, the argument of Mathis et al. can be easily extended to 2D to produce a geometric progression.
(b) It is true that, as stated in the conclusion, in the work of Mathis et al. the optimal ratio depends on “the number of neurons per module and peak firing rate”. But the prediction of the optimal ratio here also varies over a wide range, depending on the assumption on the decoding scheme (and probably the shape of the tuning curves, assumption on the correlation between neurons etc.).
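The width of the minimum raised in point 2 can be gauged with a rough calculation. The sketch below assumes an idealized two-dimensional cost proportional to r^2 / ln(r) for scale ratio r, which is a simplification and not the cost function plotted in Figure 2E and F. The function name `relative_cost` is hypothetical.

```python
import math

def relative_cost(ratio, dim=2):
    """Assumed idealized cost: ratio**dim cells per module, 1/ln(ratio) modules."""
    return ratio ** dim / math.log(ratio)

optimum = math.sqrt(math.e)  # minimizer of the assumed 2D cost

# Cost penalty, relative to the optimum, at the two ends of the quoted range.
penalty_low = relative_cost(1.4) / relative_cost(optimum)
penalty_high = relative_cost(1.7) / relative_cost(optimum)

# Width of the quoted range relative to its midpoint.
range_width = (1.7 - 1.4) / ((1.7 + 1.4) / 2)

print("penalty at 1.4:", penalty_low)
print("penalty at 1.7:", penalty_high)
print("relative width of range:", range_width)
```

Under this assumed cost, the penalty is about 7% at a ratio of 1.4 and well under 1% at 1.7, consistent with the remark that the minimum is shallow.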
Reviewer #3:
This excellent paper uses a very simple principle for demonstrating that coding of grid cells is better than coding of place cells, and generates some postdictions following this simple principle. The basic idea is that grid cells act as a kind of “base-b” representation of space, and it is shown that the representation is optimal when the base chosen is base e (2.71828…). From that, various postdictions follow (which conform nicely with known experiments). Specifically, grid cell modules have a constant scale ratio, which should be √e in the simplest model, and closer to the real experimental value (1.4) in a probabilistic model of the cells' coding. Furthermore, there should be a certain optimal ratio between the grid field width and the spacing between grid points.
The paper interacts nicely with the papers of the group of Andreas Herz, which deal with similar issues using Fisher information. I have no major concerns, as I think the paper is well written, deals with an important subject, looks sound mathematically, and has a nice treatment of its relation to experimental data.
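The statement that base e is optimal, and √e in two dimensions, can be checked numerically under a simplified cost. The sketch below assumes that each module needs a number of cells proportional to r^d for scale ratio r in d dimensions, and that the number of modules scales as 1 / ln(r) for a fixed resolution. Both the cost form and the function names are assumptions for illustration, not the authors' code.

```python
import math

def relative_cost(ratio, dim):
    """Assumed cost per unit log-range: ratio**dim cells per module, 1/ln(ratio) modules."""
    return ratio ** dim / math.log(ratio)

def optimal_ratio(dim, lo=1.01, hi=10.0, iterations=200):
    """Ternary search for the minimum of the (unimodal) cost on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if relative_cost(m1, dim) < relative_cost(m2, dim):
            hi = m2  # minimum lies to the left of m2
        else:
            lo = m1  # minimum lies to the right of m1
    return (lo + hi) / 2

for dim in (1, 2, 3):
    # Compare the numerical minimizer with the closed form e**(1/dim).
    print(dim, optimal_ratio(dim), math.exp(1 / dim))
```

Setting the derivative of r^d / ln(r) to zero gives ln(r) = 1/d, so the search should recover e in one dimension and √e in two.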
The only issue I would like to be dealt with is to make the Discussion more clear as to the relation between this paper and the papers from the Herz group (including the relevant recent one from 2015). Specifically, they have a treatment of the issue of grid cell coding through Fisher information, and it could be of value to connect the work performed here to their line of thought, at least minimally by adding some discussion to the paper (elaborating on the existing paragraph).
Another small question I am curious about is whether the winner-take-all decoder could be seen as a limit case of the probabilistic decoder. But if that is the case, I do not completely understand the “leap” from e to 2.4.
https://doi.org/10.7554/eLife.08362.011
Author response
In this study, the authors present a simpler, more elegant, and more intuitive presentation (relative to other studies) of coding optimality principles/constraints leading to grid cell formation. While there was agreement about this, two main aspects came forth that should be addressed in a revised submission.
1) Be clear and specific about how this work differs from other studies (e.g., Mathis, Herz et al.). That is, the analysis in the current paper is simpler and does not rely on Fisher information, but rather on quite straightforward assumptions related to the nature of optimal coding.
To respond to this recommendation, we have expanded our discussion of what sets our work apart from others, especially Mathis et al. The comments of the referees have been helpful in this regard. We have attempted to be clear and specific that Mathis et al. explored grid coding in one dimension using Fisher information and numerical simulation to explore decoding error. They found that the set of periods that maximizes the Fisher information is approximated by a geometric series in a regime of large scale ratios. By contrast, we rely on a simpler formulation of optimal coding with straightforward assumptions about tradeoffs between ambiguity and resolution in a hierarchical grid. We take a simpler definition of the resolution (as the largest scale divided by the smallest scale the system can discriminate), assume that the grid encodes location within a restricted range, and then seek to minimize the resources (number of neurons) required to achieve a given resolution within this range. Our simpler formulation allows us to extend our analysis to any number of dimensions, and to predict the values of structural parameters of the grid such as the ratio between periods, the grid geometry etc. We have added substantially to the Discussion to address these points.
In more detail, the Fisher information approximation to position error in Mathis et al. is only valid over a certain range of parameters. They introduce a no-ambiguity constraint to keep them within this range, but this creates two challenges for an optimization procedure: (1) The optimum depends on the details of the constraint, which was somewhat arbitrarily chosen and dependent on the variability, the tuning curve shape of grid cells and a “tolerable error level”, and (2) The optimum turns out to saturate the constraint, so for some choices of the constraint the procedure is pushed right to the edge of where the Fisher information is a valid approximation at all, causing difficulties for the self-consistency of the procedure. Because of these limits on the Fisher information approximation, Mathis et al. proceed to measure decoding error directly through their numerical studies. But here a complete optimization is not possible because there are too many interrelated parameters. This last point is a limitation of any numerical study. In contrast, we estimated decoding error directly by working with approximated forms of the posteriors rather than by approximating decoding error in terms of the Fisher information. For the winner-take-all analysis, we effectively approximate posteriors as periodic boxcar functions, and for the probabilistic analysis as periodic sums-of-Gaussians. These choices allow analytical treatment of the optimization problem over a much wider parameter range without requiring arbitrary hand-imposed constraints.
While going through the paper of Mathis and collaborators carefully for the purpose of this revision, we have also developed some concerns about their analysis. First, with the assumptions as formulated in their paper, and the scores of neurons that are known to exist in each grid module, the scale ratios would generically be predicted to be much larger than 1 (please see the detailed response to Reviewer 2). This is in tension with data. Even ignoring this point, optimizing the Fisher information generally predicts a hierarchy of scale ratios, and only predicts geometric scaling if the scale ratio is significantly greater than 1. Experimentally the scale ratio is ∼1.5. Thus, it seems that optimizing Fisher information does not predict geometric scaling in the regime of relevance to experiment. What is more, while Mathis et al. can only predict a geometric scaling if the scale ratio is large, their Figure 5B illustrates that for large scale ratios the Fisher information does a poor job in approximating the decoding error. So this means that their prediction of a geometric series of periods, even in one dimension, has a limited range of validity. The optimal one-dimensional grid in our work is perched near the edge of the estimated range where their analysis appears to be valid.
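The notions of a geometric series of periods and of resolution as a ratio of scales can be illustrated with a short sketch. It assumes, as a simplification, that each module refines the position estimate by a factor equal to the scale ratio, so that m modules give a resolution of ratio**m. The function names and numerical values are hypothetical.

```python
import math

def geometric_periods(largest_period, ratio, n_modules):
    """Periods decreasing geometrically from the largest one."""
    return [largest_period / ratio ** i for i in range(n_modules)]

def modules_needed(resolution, ratio):
    """Smallest number of modules m with ratio**m >= resolution,
    assuming each module refines position by a factor `ratio`."""
    return math.ceil(math.log(resolution) / math.log(ratio))

# Illustrative values: a 10 m largest period and a scale ratio of 1.5.
periods = geometric_periods(10.0, 1.5, 4)
print(periods)

# Number of modules for a hypothetical target resolution of 100.
print(modules_needed(100, 1.5))
print(modules_needed(100, math.e))
```

A smaller ratio requires more modules to reach the same resolution, but each module is cheaper. This is the tradeoff that the optimization resolves.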
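As a rough illustration of the two posterior approximations named above, the sketch below builds a periodic boxcar and a periodic Gaussian profile in one dimension and combines three modules by multiplication. The periods, widths, true position, and function names are assumptions chosen for illustration. The Gaussian uses only the nearest peak, which approximates a periodic sum of Gaussians when the width is much smaller than the period.

```python
import math

def wrapped_offset(x, period, phase):
    """Signed distance from x to the nearest lattice point phase + k * period."""
    return (x - phase + period / 2) % period - period / 2

def periodic_boxcar(x, period, width, phase):
    """1 inside a window of the given width around each lattice point, else 0."""
    return 1.0 if abs(wrapped_offset(x, period, phase)) <= width / 2 else 0.0

def periodic_gaussian(x, period, sigma, phase):
    """Nearest-peak approximation to a periodic sum of Gaussians (sigma << period)."""
    return math.exp(-0.5 * (wrapped_offset(x, period, phase) / sigma) ** 2)

true_x = 3.0                                  # hypothetical true position
periods = [8.0 / 1.5 ** i for i in range(3)]  # geometric series, ratio 1.5
step = 0.01
grid = [i * step for i in range(int(periods[0] / step))]

def combined_posterior(x):
    """Product of per-module posteriors, each peaked at the true position."""
    product = 1.0
    for period in periods:
        product *= periodic_gaussian(x, period, period / 10, true_x % period)
    return product

# Within one period of the largest module, the product has a single best peak.
best_x = max(grid, key=combined_posterior)
print(best_x)
```

Each module alone is ambiguous because its profile repeats every period. The product over modules removes that ambiguity within the range set by the largest period.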
We have described these last points in the detailed response to the referees. However, pointing out limitations of Mathis et al. is not our goal in this paper. Hence, we have simply added the phrase “in a regime where the scale factor is sufficiently large” in our description of their prediction of geometric scaling in one dimension, without further comment on the limitation of this prediction.
2) Be clear about how the novel analyses presented (as different from other studies) allow a closer link with experimental studies.
We have expanded our discussion of the link with experiments. Specifically, we have pointed out that, as distinct from other studies, our approach allows us to predict that: (a) grid fields should lie on a triangular lattice, (b) grid periods should follow a geometric progression, (c) the ratio between grid scales should be e^{1/2} for idealized neurons, lying between 1.4 and 1.7 for realistic neurons, (d) the scale ratio should vary modestly within and between animals, (e) the optimal scale ratio in one and three dimensions. With some additional assumptions we also predict: (i) the number of grid modules should be ∼10, and (ii) the ratio between grid periods and field widths. Finally, we examine possible deficits in spatial behavior that will obtain upon inactivating grid modules in the context of specific models of grid cell readout. Most of this material is in the Abstract, Introduction, Comparison to Experiment and Discussion sections.
In the original submission, we used a simple model of linear summation of grid cells to make place cells to investigate the effects of grid module inactivation in a hierarchical grid system like the one we study. Reviewer 1 pointed out that: (a) the idea that place cells are a readout of grid cells has no direct bearing on our theoretical work, (b) recent experimental work very strongly suggests that while the hippocampal place system is certainly affected by the grid system, place cells are not a “readout” of the grid system in any simple sense of that term. We agree entirely and have edited the text in detail to reflect these points. The changes are in Results, Discussion and Appendix.
Details of the changes are described below.
Reviewer #1:
[…] In the subsection “Intuitions from a simplified model”, the current body of experimental data in this field simply does not support the authors’ repeated assertion that “[…] anatomical and functional evidence suggests that place cells selectively read out contiguous subsets of entorhinal grid modules along the dorsoventral axis (Van Strien, Cappaert and Witter, 2009; Solstad, Moser and Einevoll, 2006)”. First, citations Van Strien, Cappaert and Witter, 2009 and Solstad, Moser and Einevoll, 2006 are an anatomical review and a theoretical paper, respectively – neither of which can reasonably be described as providing “functional evidence”. Second, several groups have recently published review papers summarising a wide body of functional evidence that directly contradicts the hypothesis that place cells ‘selectively readout’ a subset of grid cell inputs (i.e. Bush et al., 2014; Sasaki et al., 2015). The authors should edit the text here, and at several other junctures throughout the manuscript (listed below) to address this point. The hypothesis that place cells represent a readout of the grid cell system has no bearing on the theoretical work presented, and only serves to misrepresent our current understanding of the grid cell system.
We agree that, as written, our paper suggests an understanding of the relation between grid and place cells that is both overly definitive, and one which is challenged by recent findings (e.g. that place cells are active before grid cells, and that place fields survive sustained inactivation of grid cells). The material that the referee is commenting on arose from multiple discussions with audiences of seminars and readers of our manuscript. We were repeatedly asked (and are still asked during talks) how our view of a hierarchical grid code would affect readout of the grid system, perhaps via place cells, as compared to a non-hierarchical grid system of the sort proposed by Burak and Fiete. We decided that a concrete way of addressing these questions would be to pick a simple model relating grid and place cells (e.g. the Fourier summation setup of Solstad et al.), and explicitly show that different grid schemes can have different effects.
However, we fully agree with the reviewer that: (a) the idea that place cells are a readout of grid cells has no direct bearing on our theoretical work, (b) recent experimental work very strongly suggests that while the hippocampal place system is certainly affected by the grid system, place cells are not a “readout” of the grid system in any simple sense of that term. The two reviews cited above (Bush et al., 2014 and Sasaki et al., 2015) make the latter point very clearly and effectively.
As we see it there are two options: (1) we could completely remove any mention of a readout via place cells or otherwise and simply discuss the architecture of the grid system; (2) we could be clear that we are going to look at linear summation models of grid cell readout as a toy model, not because we think they accurately represent the readout, but to show that different assumptions about the grid architecture can lead to different specific effects for manipulations like lesions. Of course, in order to make specific predictions for how lesions in the grid system would affect place cells, we would need detailed knowledge about the precise relationship between these systems, and while there are many hints of a complex relationship, the details remain unclear.
We decided to go with option (2), because we are repeatedly asked, “I know that the relation between place cells and grid cells is complicated, but can you tell me what would happen if you imagine, for purposes of argument, a simple linear summation readout and then remove some modules?” So we think that including this material (which is mostly in the Appendix and Discussion), with appropriate nuance and caveats, might be helpful to some readers. We have revised to try to achieve this goal, but are open to the idea of simply leaving this material out.
We have made a series of changes starting with the remark that the referee mentions (in the subsection “Intuitions from a simplified model”) and continuing throughout the paper (please also see our responses to the comments below). Also, a minor point – as the referee says, references to Van Strien, Cappaert and Witter, 2009, and Solstad, Moser and Einevoll, 2006 are an anatomical review and a theory paper, and we have modified the citation accordingly.
In the Discussion, the authors state that: “Together with the anatomy (Van Strien, Cappaert and Witter, 2009), the hierarchical view of location coding that we have proposed then predicts that dorsal place cells should be revealed to have multiple place fields in large environments because their spatial ambiguities will not be fully resolved at larger scales. Preliminary evidence for this prediction has appeared in Fenton et al., 2008; Rich, Liaw and Lee, 2014.” However, those studies show no systematic relationship between the locations of dorsal place cells' multiple firing fields in large environments, which directly contradicts the predictions of a grid cell to place cell model (see, for example, Appendix–figure 3 and Appendix–figure 4 in this manuscript). The authors should note this caveat, along with the other experimental data showing that a grid to place cell model is overly simplistic (see above), or remove the corresponding piece of text.
The studies of Fenton et al., 2008, and Rich et al., 2014, find that dorsal place cells often have multiple place fields. Fenton et al. claim that 85% of dorsal CA1 place cells have multiple place fields, and the majority of cells in Rich et al. have multiple place fields in a 48m track, some having dozens of place fields. However, these studies show that the locations of these place fields are sporadic, perhaps even random according to some distribution. This disordered structure may be in tension with a simplistic summation view of the grid-to-place cell transformation. On the other hand, it should be noted that the data of Stensola et al. show that there is significant variability of the period, orientation and ellipticity within each module. As part of a different collaboration, one of us (VB) has been investigating the consequences of this variability for spatial coverage – it seems to produce significant differences in the relative phase of grid cells between unit cells of the grid lattice. This variability can change the prediction of regularity in the locations of multiple place fields in a naive summation model. That said, investigating this properly lies outside the scope of the present paper. Thus we have contented ourselves with adding appropriate nuance and caveats as follows: (a) indicate that a naive summation model of place cells along with the anatomy of mEC-hippocampal projections predicts multiple place fields for a single dorsal place cell, as seen in experiments, (b) a naive model of this kind also predicts an orderly distribution of place fields which is not seen, (c) however, the variability in the grids even within a module likely interferes with the predicted order, (d) and in any case there is significant evidence (see Bush et al., and Sasaki et al. for a summary) that place cells are not formed and maintained by grid cells alone. The changes have been added to the sixth paragraph of the Discussion.
In the Discussion, the authors “[…] predict that lesioning the modules with small periods will expand place field widths, while lesioning modules with large periods will lead to increased firing at locations outside the main place field, at scales set by the missing module. Our prediction is supported by a recent study demonstrating effects of lesions including dorsal mEC on place field widths in small environments (Hales et al., 2014)”. This is misleading for several reasons. First, as described above, this statement neglects to mention the wider body of evidence contradicting the hypothesis that grid cell inputs solely generate place cell firing fields. Second, the aim of the cited study (Hales et al., 2014) was to examine the effect of eliminating all grid cell inputs to place cells, and ∼85% of the total mEC volume was ablated, including 94.6% of layer II and 83.5% of layer III. Are the authors suggesting that the observed effects on place cell firing are a result of remaining grid cell inputs from modules with a large or small period? Third, the more specific prediction is that lesioning grid modules with large periods will lead to the appearance of additional place fields at periodic, grid-like locations in two dimensions, but that analysis is not made or discussed in Hales et al. (2014). Fourth, more recent experimental evidence indicates that focal inactivations of dorsal or ventral mEC each produce place field expansion, and neither generated increased firing at locations outside the main place field – in fact, those data showed a trend towards a decrease in the number of firing fields exhibited by each place cell following focal inactivations (Figure S7 in Ormond et al., 2015). Each of those results also contradicts the predictions of a grid cell to place cell model, despite the strange interpretation of the data made in that paper. Hence, the authors should include a citation to that paper and edit the text accordingly.
We agree entirely that there is a substantial body of evidence that grid cells do not solely generate place cells, although they do influence some of the functional properties of the hippocampus. Further, Hales et al. were indeed attempting to examine the effects of eliminating all the grid inputs and found that the substantial ablation they performed led to fewer, smaller and less stable place fields. These ablations were not specific to a given module, but we would expect in a hierarchical grid code that elimination of many contributions with small periods would decrease the precision of spatial coding that exploits grid cell responses. Of course elimination of large periods should lead to ambiguities in large environments and Hales et al. did not test for or discuss this – they are working with small 1m x 1m environments so we might not expect to see many ambiguities.
There is also the paper of Ormond et al., which discussed focal inactivations. We also find the results in this paper difficult to interpret. For starters, the inactivations are focal along the dorsoventral axis of the mEC, but the data in Stensola et al. seem to indicate that many grid modules are present at every mEC depth, with a dorsal enrichment of small periods. So the focal inactivations of Ormond et al. would seem to still be inactivating modules with multiple periods. Thus, even in a naive summation model, we would expect expansion of place fields for all of these inactivations with larger expansion for dorsal fields. This is because dorsal lesions would get rid of more cells in the smallest modules, and would lead to a greater broadening. Ventral lesions would still remove some cells with small periods, and so the broadening effect should be smaller as Ormond et al. appear to see.
The data in the supplement of Ormond et al. seem to indicate, on the one hand, a decrease in the number of distinct place fields associated with place cells (in tension with a grid to place model), but on the other hand they seem to show an increase in the “out-of-field” firing rate. What is more, these figures likely include various cases where the place cells went from having one field to having none (i.e. the place field simply disappeared) – it is hard to be sure, because the information was not provided as far as we can tell, and thus it is hard to evaluate whether this is in fact inconsistent with a summation model. One could also imagine that the general increase in the size and noisiness of place fields might lead to a decrease in the number of place fields that can be accommodated in the 7m linear track that they were working with. In particular, the decrease in theta power makes their measurements much more prone to noise. Finally, Ormond et al. are recording dorsally in CA1 and CA3. If the anatomy of Witter et al. that we cite is accurately indicative of functional connectivity, then ventral inactivation in mEC would have less clear effects on these dorsal hippocampal recordings. That makes it still more problematic to interpret what is going on. We also note that none of the individual examples depicted in Figure S5 of that paper show a decrease in ambiguity after lesions. So we conclude from Ormond et al. that there is no clear evidence for a decrease in ambiguity following lesion, but it is similarly debatable whether there is any indication of an increase in ambiguity.
Given the complex, and, in our view, difficult to interpret, experimental situation we have edited as follows. First, we have modified our remarks to make it clear that we regard a linear summation model of place cells as simplistic in view of recent experimental developments reviewed in Bush et al. and Sasaki et al. Second, we are more clear that we are simply seeking to illustrate that different grid schemes can have different effects on specific readouts. Finally we refer to Hales et al. and Ormond et al. in a nuanced way, clarifying that these experiments do not lesion individual modules and thus do not constitute specific evidence for the results of such lesions. These changes have been added to the Discussion section.
In the Appendix, the authors again suggest that “the grid cell representation of space in the entorhinal cortex is […] transformed in the hippocampus into the place cell representation.” They should edit this text to more accurately represent the current understanding of the relationship between grid and place cell firing patterns.
We have edited this text to be more nuanced and accurate about the current state of understanding of the relationship between the mEC and the hippocampus (see Section F of the Appendix). Specifically we say: “The grid cell representation of space in the entorhinal cortex is related in a complex manner to the hippocampal place cell representation (Bush et al., 2014, Sasaki et al., 2015). […] In the context of such a model (which does not reflect many aspects of the known physiology), our grid scheme makes specific predictions for the effects of module lesions on place fields.”
The papers mentioned above are now cited in the main text, and again in the Appendix.
Abstract and Introduction: Given that the principal difference between the analysis in this manuscript and that presented in several previous publications (i.e. Fiete et al., 2008; Towse et al., 2014) is that the authors assume a spatial range equal to the size of the largest grid module, this should be more explicitly stated in the Abstract and Introduction. This would make the novelty of this manuscript more apparent. For example, the Abstract should be edited to read: “We propose that the grid system implements a hierarchical code for spatial location that economizes the number of neurons required to encode location with a given resolution across a spatial range equal in size to the period of the largest grid module” or similar. Likewise, the Introduction should be edited to read: “minimize the number of neurons required to achieve the behaviourally necessary spatial resolution across a spatial range equal in size to the period of the largest grid module” or similar.
Thank you for this suggestion. We have made this change, and it helps to clarify the differences between the frameworks.
We have also added a reference to Towse et al. (please see the Discussion section) as part of our analysis of the relation of our study to previous work: “Note that decoding error was also studied in Towse et al. and those authors reported that the results did not depend strongly on the precise organization of scales across modules.”
In the Introduction, it is not clear to me what the authors mean by the following statement: “Consistent with studies of grid cell and place cell remapping, our analyses assume that there is a behaviorally defined maximum range over which a fixed grid represents locations (Fyhn et al., 2007).” I fail to see the relevance of remapping to the behavioural range of an animal. Could the authors explain their rationale here please?
We intended to say that assuming that the grid code can only represent location up to some maximum range without additional information, it would be necessary that a new grid should be loaded upon reaching the edge of the representational range. The ability of grids to remap in new environments suggests that this should be possible. In effect, we were trying to suggest that an animal could “stitch together” a representation of a very large environment by remapping its grids between segments. To keep things simple we have removed the phrase “Consistent with studies of grid cell and place cell remapping”.
In the Introduction, when the authors state “three dimensional grids that will be relevant to navigation in, e.g., bats”, they should include a reference to Yartsev et al., 2011, which demonstrates that bats do have grid cell responses, even though they have only been recorded in two dimensional environments so far.
We have added the citation.
In the subsection “Intuitions from a simplified model”, when the authors stress that “the animal could achieve the required resolution in a place coding scheme […]”, they must incorporate a reference to Fiete et al., 2008, which makes a very similar comparison of “place coding” and “grid coding” schemes.
We have added this citation. Thank you for the suggestion.
Figure 2E is cited before Figures 2A-D in the main text, which is confusing (subsection “Winner-Take-All Decoder”). Similarly, in subsection “General grid coding in two dimensions”, Figure 3A is not referred to in the text at all, and Figures 3B and 3C are cited before Figure 2F, which is not ideal. It would be preferable if the authors placed all figures pertaining to the 2D case in Figure 3 (i.e. move Figure 2F into Figure 3) and moved Figure 2E to match the flow of the text (i.e. before Figure 2A).
Thank you for these suggestions. In order to respect the flow of the text we have reorganized as follows. We moved Figure 2E (the optimization curve in 1D) to be a panel of Appendix–figure 1 where other material on the optimization in one dimension is gathered. We removed Figure 3A (which was not referred to) and we moved Figure 2F into Figure 3. Now Figure 2 is focused on illustrating the precision-ambiguity tradeoff in the setting of probabilistic decoding. Figure 3 is focused on the two dimensional optimization. We hope that this helps with clarity.
In the Discussion, the authors state: “Given homogeneous positive noise correlations within a grid module, which will arise naturally if grid cells are formed by an attractor mechanism, the required number of neurons could be an order of magnitude higher (Sompolinsky et al., 2001; Averbeck, Latham and Pouget, 2006)”. It has recently been demonstrated that positive noise correlations appear to be largely absent in the rodent grid cell system, and the authors may wish to note this point and cite the corresponding paper (Mathis et al., 2013) here.
Thank you for this suggestion. We have added this citation. However, the authors of the paper seem to explicitly say that they did find noise correlations. More specifically, their abstract says: “We analyze the noise correlations between pairs of grid code neurons in behaving rodents. We find that if the grids of the two neurons align and have the same length scale, the noise correlations between the neurons can reach 0.8. For increasing mismatches between the grids of the two neurons, the noise correlations fall rapidly.” This is apparently also the message that they derive from their Figure 9. Meanwhile Dunn, Mørreaunet and Roudi (2015) also report positive noise correlations for grids with similar phases, and vanishing or sometimes negative correlations for grids with very different phases. Since most grid cells differ in their mutual phase or period, we take the referee’s point. Since the presence or absence of noise correlations is not a main point of our paper, we have simply indicated (Discussion, fourth paragraph) that Mathis et al. and Dunn et al. investigated noise correlations between grid cells and found positive correlations for aligned grids (i.e. similar phase) of the same scale and weak correlations otherwise.
In the Discussion, “the answer depends on the dimension of the grid” should be “the answer depends on the dimensionality of the grid”.
We have made this change.
Again, in the Discussion, the authors mention that they “have checked that the optimal grid scheme predicted by our theory, if decoded in the fashion of (Fiete et al., 2008), can represent space over ranges longer than the largest scale”, but do not mention (in the main text) whether it could or not. They should incorporate a brief description of the outcome of those simulations in this section of text, for clarity.
We were trying to indicate the following. The optimization analysis predicts a particular scale ratio and enough neurons in each module to ensure that the likelihood function over position in each module has peak widths that are a certain fraction of the period. The range of representation can be extended by shrinking the widths of the likelihood ratio peaks. This requires increasing the number of neurons in each module beyond the minimum required for the spatial range that we started with. We have edited this text to say: “Nevertheless, we have checked that a grid coding scheme with the optimal scale ratio predicted by our theory can represent space over ranges larger than the largest grid period (Appendix, Section E). However, to achieve this larger range, the number of neurons in each module will have to increase relative to the minimum in order to shrink the widths of the peaks in the likelihood function over position.” The edited text is in the Discussion.
Reviewer #2:
1) The major assumption of this paper is the ambiguity of the grid cell firing, that is, from the firing of one grid cell, it is not possible to infer in which of the many vertices of the grid one is located. This assumption, however, does not take into account the fact that the peak firing rates of a grid cell vary significantly across its fields. In other words, the translational symmetry is about the positions at which peak firing occurs, not that each field is identical to the other in terms of firing rate. In my view, this experimental fact fundamentally affects the argument offered here.
Consider the probabilistic decoder. In this case, P(x|i) can be approximated as a periodic sum of Gaussians without making restrictive assumptions about the shapes of the tuning curves of individual grid cells, or about the precision of their periodicity, so long as, on average, the variability of individual neurons is weakly correlated and homogeneous.
For example, even though individual grid cells can have somewhat different firing rates in each of their firing fields, this spatial heterogeneity will be smoothed in the posterior over the full population of cells, leading to much more accurate periodicity. In other words, individual grid cells show both spiking noise and “noise” due to heterogeneity and imperfect periodicity of the firing rate maps. Both these forms of variability are smoothed out by averaging over the population, provided there are enough cells and noise is homogeneous and not too correlated – we assume this. The first paragraph in the “Probabilistic Decoder” subsection makes these points.
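The averaging argument above can be illustrated with a toy calculation. The following is a minimal sketch, not the analysis in the paper: it assumes that each cell's peak rate in each field is an independent positive random amplitude with mean 1 (a gamma distribution is used here purely for convenience), and it asks how variable the population-summed peak response is from field to field. The function name `field_peak_cv` and the parameters `n_fields` and `amp_sd` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def field_peak_cv(n_cells, n_fields=200, amp_sd=0.5):
    """Coefficient of variation, across fields, of the summed peak response.

    Assumption (illustrative only): every cell draws an independent
    amplitude for every field, with mean 1 and standard deviation amp_sd.
    """
    shape = 1.0 / amp_sd**2          # gamma shape giving mean 1, sd amp_sd
    scale = amp_sd**2
    amps = rng.gamma(shape, scale, size=(n_cells, n_fields))
    summed = amps.sum(axis=0)        # population response at each field
    return summed.std() / summed.mean()

for n_cells in (1, 10, 100, 1000):
    print(f"{n_cells:5d} cells: field-to-field CV = {field_peak_cv(n_cells):.3f}")
```

Under these independence assumptions the field-to-field variability of the summed response falls roughly as one over the square root of the number of cells. Correlated heterogeneity across cells, which the response explicitly assumes to be weak, would slow this decay.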
Even if the experimentally measured heterogeneity is too strong to be completely neglected, we still feel that our framework provides value in studying the grid system. Developing a theoretical framework that solves the simpler case of perfect periodicity is a natural starting point for studying the more complex, realistic case. Experimental details that deviate from our idealized assumptions may be added, and our calculations modified to see how these complications modify our predicted optimality conditions. We think this is an exciting avenue for future work building on the results and framework we have reported here.
2) The minimum of the cost function versus ratio of the spacing of successive modules is very wide, raising the question whether one can really say anything meaningful about the value that that ratio should take. It should not escape our attention that in Figure 2E and F, the authors plot the cost function versus the logarithm of the ratio between successive modules, which gives the impression of a narrower minimum (though still wide). Even with this, the authors’ prediction is stated to “[…] robustly lie in the range 1.4-1.7[…]”. This is a 20% range.
Please note that our text goes to some pains to point out that the minima of the cost functions are not extremely sharp (in the subsection “General grid coding in two dimensions”). In our view this is a virtue, not a problem, because it means that a degree of variability in the grid parameters can be tolerated. Please also note that, as we say, the predictions of the simple winner-take-all and probabilistic models lie within the “overlapping shallow basins” of the two models. Given that these two models lie at extremes of decoding complexity, this adds to our confidence that over a wide range of assumptions the optimal grids will lie within a similar range. Similar considerations apply to both the one dimensional and two dimensional grids.
As we also state, the relative shallowness of the minima leads us to expect that the parameters of the grid should be somewhat variable between cells within a module, and between individuals. Indeed, the experimental measurements are variable in this way. It is difficult to formulate a theory of precisely how much variability, and associated cost in the number of neurons, is acceptable to the animal. In this situation, the sensible prediction to make is that the grid period ratios will be localized around a certain value, and to ask what deviation in cost relative to the optimum is implied by the experimentally determined spread of these ratios (we find a ∼5% deviation in cost). This is what we have done in Figure 4.
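The shallowness of the minimum can be made concrete in the most idealized setting. The sketch below is not the calculation behind Figure 4. It assumes the simplest winner-take-all cost in two dimensions, where each module needs a number of cells proportional to the squared scale ratio and the number of modules is ln(R)/ln(ratio) for a resolution R, treated as a continuous quantity. The function name `relative_cost` is hypothetical, and the resolution cancels when the cost is normalized by its minimum.

```python
import numpy as np

def relative_cost(ratio, dim=2):
    """Neuron count at a given scale ratio divided by the count at the optimum.

    Assumes cost ~ ratio**dim / ln(ratio), i.e. ratio**dim cells per module
    times ln(R)/ln(ratio) modules; the resolution R cancels in the quotient.
    """
    ratio = np.asarray(ratio, dtype=float)
    optimum = np.exp(1.0 / dim)      # minimizer of ratio**dim / ln(ratio)
    cost = ratio**dim / np.log(ratio)
    return cost / (optimum**dim / np.log(optimum))

ratios = np.linspace(1.4, 1.7, 301)
penalty = relative_cost(ratios) - 1.0   # fractional excess over the minimum
print(f"largest excess cost on [1.4, 1.7]: {penalty.max():.3f}")
print(f"ratio with the smallest cost:      {ratios[np.argmin(penalty)]:.3f}")
```

In this idealization the minimizer is the square root of e in two dimensions (and e in one dimension, with `dim=1`), and the excess cost stays small across the whole 1.4-1.7 interval. The probabilistic decoder has a different cost function, so the numbers here should be read only as an illustration of how flat such a basin can be.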
We can further illustrate these points by considering an additional kind of variability in the experimentally measured grids. It has been noted that grids can have an ellipticity – i.e. they can be “squished”. Our analysis of grid geometries in two dimensions showed that the triangular grid is optimal, but geometries close to the triangular one will do well also (see the contour plot in Figure 3D). How does the range of ellipticity in the experiments compare to the tolerable extent predicted by the theory?
To address this point, we can re-examine the contour plot in Figure 3D which shows N/N_{min} (number of neurons/minimum number of neurons) as a function of the array geometry after minimizing over the scale factors between modules for a fixed resolution R. The plateau around the triangular array geometry (the point in the middle of the plot) shows that a range of ellipticities will be similarly efficient. To show this range explicitly in a different way, we can keep N fixed, and plot the logarithm of the resolution (as defined in the main text), normalized to its maximum, which is achieved at the optimal triangular grid; we will call this the “relative efficiency”. The array geometry is parametrized in terms of two variables, v parallel and v perpendicular as described in the main text, with the triangular lattice being given by v∥= 1/2, v⊥= √3/2. The plot below shows the relative efficiency as a function of v∥ for v⊥= 1/2; there is a plateau surrounding the triangular lattice parameters, with a sharp decline in efficiency on either side. Marked on the figure is a range of ellipticities 1.0-1.4 that is wider than the range reported in Stensola et al., 2012 (the largest ellipticity there was 1.26, albeit with a small sample). Satisfyingly, the ellipticities reported by Stensola et al. will all be closely arranged along the plateau, as our theory would predict.
We have not included this analysis (Author response image 1) in the paper because it is not comprehensive and we hope to include a more detailed version in work that is ongoing. But we hope that the result helps to answer the referee’s question.
3) (a) The idea of using optimality for predicting the ratio of grid spacing of the modules has been already employed by Mathis et al., Neural Comp 2012. There are differences between the two works, e.g. Mathis et al. maximize the resolution given a fixed number of neurons while Wei et al. minimize the number of neurons given the resolution and Mathis et al. only focus on the one dimensional case. Despite the differences, it is not clear what is the major conceptual advancement. As far as I can say, the argument of Mathis et al. can be easily extended to 2D to produce a geometric progression.
First, we would like to be clear that, contrary to the assertion here, Mathis et al. did not claim to predict a value for the ratio of grid spacings in grid modules in any dimensions, and did not attempt to say anything about the optimal period ratio and grid shape in two dimensions. They formulated the Fisher information for decoding position from populations of periodic, one dimensional tuning curves and found that under some conditions the set of periods that maximizes the Fisher information approximates a geometric series. However, as discussed in detail below, their derivation, as they present it: (1) generically implies either unrealistically large period ratios or unreasonably small numbers of cells in each module, both of which disagree with experiment, and (2) in general does not predict a constant scale ratio r unless r >> 1, which it is not in the data. Incidentally, the derivations in Mathis et al. also have minor mathematical errors which are fixed in the discussion below.
Mathis et al. write an expression for the Fisher Information of the ith module which takes the form (Equation 3.22):
J_{i} = C_{1} (M_{i}/λ_{i})^{2} (1)
where C_{1} is a constant, M_{i} is the number of cells in module i, and λ_{i} is the period of module i. The sum of M_{i} over L modules is N, the number of cells in the grid system. The total Fisher Information J is the sum of the J_{i}. In any treatment of the grid system we must understand how information from different modules is integrated to eliminate the ambiguity in position left by the responses in a single module. Mathis et al. resolve this ambiguity by a hard constraint (which is reminiscent of our Winner-Take-All model) by setting (Equation 3.24 and the text below it):
λ_{i+1} = D(ε)/(J_{i})^{1/2} = (D(ε)/(M_{i} C_{1})) λ_{i} (2)
Here D(ε) is a “safety factor” that depends on noise and the tuning curve shapes. They arrive at this equation by first placing a bound on how small the period of module i+1 can be to achieve a tolerable degree of ambiguity (set by D(ε)) and then saying that the Fisher Information is optimized when this bound is met.
But Equation (2) above implies that the ratio of scales that we seek to predict has the form:
r_{i} = λ_{i}/λ_{i+1} = M_{i} (C_{1}/D(ε)) (3)
Now Mathis et al. are taking C_{1}/D(ε) to be a parameter of O(1) (bottom of their p. 17). But M_{i}, the number of grid cells in each module is expected to be in the scores or maybe the hundreds. So, given the general assumptions of Mathis et al., the period ratio r_{i} would be expected to be much larger than 1, contrary to experiment. Alternatively, to get an O(1) scale ratio with C_{1}/ D(ε) ∼ O(1), you could take M_{i} to be O(1). This seems to be the scenario considered in Mathis et al., where they say that M_{i} ∼ 3 would be optimal. But we know that each module contains many more cells than that. A final option may be to suppose that C_{1}/D(ε) is small in their formalism. But they do not seem to consider this in their paper. Hence we conclude that the analysis of Mathis et al., as presented in their paper, gives estimates that are in tension with data.
Ignoring this for the moment, we can proceed further with their analysis. As far as we can tell there is a minor mathematical error in going from their Equation (3.26) to their Equation (3.27) for the population Fisher Information. They chose to scale M_{i} as M_{i}’ = (C_{1}/D(ε))^{1/2} M_{i}, but this does not lead to the scaling in Equation 3.27. The correct choice seems to be to scale M_{i} as P_{i} = M_{i} C_{1}/D(ε). This difference leads to different constant factors in front of Equation 3.27 and different scalings of variables in their analysis of regimes of validity. Neither of these changes makes a big difference to the analysis, but it is worth correcting the small error anyway. In any case, the bottom line is that the Fisher Information can be written as:
J = (D(ε)/λ_{0})^{2} (r_{0}^{2} + r_{0}^{2}r_{1}^{2} + r_{0}^{2}r_{1}^{2}r_{2}^{2} +…) (4)
This is (Equation 3.27) in Mathis et al. rewritten in terms of the scale ratios. To get the coefficients in front right you have to fix the minor scaling error mentioned above. Meanwhile, the constraint on the total number of cells is:
N (C_{1}/D(ε)) = r_{0}+r_{1}+r_{2}+… (5)
This is simply the constraint on the sum of M_{i} written in terms of Equation (3) above.
It is obvious that (4) is not symmetric between the r_{i} and hence the Fisher Information equations of Mathis et al. do not in general predict a geometric series of periods (i.e. constant r_{i}). In fact, one can show by optimizing (4) above with a Lagrange multiplier imposing the constraint (5) that:
r_{i}>r_{i+1} (6)
when the Fisher Information is optimized. (For example, in the case of two scales the problem becomes maximizing r_{0}^{2} + r_{0}^{2}r_{1}^{2} subject to the constraint r_{0} + r_{1} = constant. If r_{1} is large we can ignore the contribution from the first term, and optimizing gives r_{0} = r_{1}. However, the additional r_{0}^{2} term favors making r_{0} slightly larger as compared to making r_{0} and r_{1} equal.)
So the scale ratios are not all equal at the optimum. Of course, we can try to get a symmetric solution optimizing the Fisher Information J in Equation (4) by supposing that the r_{i} are much larger than 1, so that the symmetric product term, (r_{0}^{2}r_{1}^{2} r_{2}^{2} r_{3}^{2}…), dominates Equation (4). Indeed, below their Equation 3.28, this is precisely the limit that Mathis et al. are considering. For r_{i} < 3 or so, their analysis is invalid in its prediction of a geometric series of periods. On the other hand, Figure 5B in their paper very clearly illustrates that for small “contraction factors” (i.e. large scale ratios) the Fisher information does a poor job in approximating the decoding error. This means that the prediction of a geometric series of periods is based on a tenuous analysis with an uncertain range of validity. The optimal one dimensional grid in our work is perched near the edge of the estimated range where their analysis appears to be valid. Thus Mathis et al. predict a geometric scaling of the grid system only when the scale ratios r_{i} are large, while we know that these ratios are O(1) from experiment. What is more, their equations explicitly predict a hierarchy of scale ratios (Equation 6 above) in the O(1) regime.
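The two-scale example in the parenthetical above can be checked numerically. The following is a minimal sketch that maximizes r_{0}^{2} + r_{0}^{2}r_{1}^{2} subject to r_{0} + r_{1} = constant by a brute-force grid search. As an added assumption, both ratios are restricted to be at least 1 (periods that do not grow from one module to the next), which also excludes the degenerate boundary solution with r_{1} = 0. The function name `optimal_two_scale_ratios` and the grid size are hypothetical.

```python
import numpy as np

def optimal_two_scale_ratios(total, n_grid=200001):
    """Maximize r0**2 + r0**2 * r1**2 subject to r0 + r1 = total.

    Assumption (not from the source): both ratios are constrained to be >= 1,
    so r0 is searched on the interval [1, total - 1].
    """
    r0 = np.linspace(1.0, total - 1.0, n_grid)
    r1 = total - r0
    objective = r0**2 + (r0 * r1) ** 2
    best = np.argmax(objective)
    return r0[best], r1[best]

for total in (4.0, 6.0, 20.0):
    r0, r1 = optimal_two_scale_ratios(total)
    print(f"r0 + r1 = {total:4.1f}:  r0 = {r0:.4f}, r1 = {r1:.4f}, "
          f"r0 - r1 = {r0 - r1:.4f}")
```

In this sketch the optimum always has r_{0} larger than r_{1}, in line with Equation (6), and the gap between them shrinks as the sum grows, so the two ratios become equal only in the limit of large ratios.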
We then considered the possibility that the Fisher information approach of Mathis et al. could be rescued in two dimensions. Translating everything for two dimensional lattices would be a formidable task, so we contented ourselves with the following observations: (1) in two dimensions the Fisher information would scale the same way with the r_{i}^{2} and so would take a similar form to Equation (4) above in terms of these variables, and (2) the constraint expression for the number of cells in terms of the r_{i} would still be symmetric between the r_{i}. In two dimensions, experiments have shown a geometric series of periods with a period ratio of ∼1.5. This is too small for the last, symmetric term in the Fisher Information to dominate. Thus in two dimensions the analysis of Mathis et al. cannot predict a geometric series of grid periods in the regime of O(1) period ratios that applies to the data.
Our paper uses very simple, general assumptions to make a number of specific quantitative predictions that Mathis et al. do not. Specifically, we: (1) predict a constant grid scaling ratio (in a regime where their alternative theory predicts a hierarchy of scales), (2) predict the grid scale factor (which they explicitly state they cannot do), (3) explain the 2d triangular grid geometry (which they do not even try to do), and (4) predict the ratio of grid period to grid field width under specific assumptions (which they parametrize in terms of tuning curve widths, contributing to their inability to predict the grid scale factor). We additionally predict the expected number of modules, and estimate the number of cells required in the mEC to implement our proposed grid scheme (see our Discussion).
The extension to two-dimensional grids is certainly nontrivial. There are many regular two-dimensional lattices, and our paper shows that the triangular lattice is favored. This is in no way implied by a one-dimensional analysis. We were able to study the two-dimensional lattices because we developed an analytical calculation (presented in the Appendix) that greatly simplified the numerical analyses.
Within the context of specific models of grid to place cell transformations we also show effects on spatial coding of selectively lesioning grid modules. (The latter analyses have significant caveats arising from our lack of knowledge of the precise relation between grid and place cells; this is an important component of the comments of Referee 1 and our corresponding edits.)
All of these results go beyond the idea of a geometric progression of scales that Mathis et al. arrive at in one dimension through an extensive and sometimes inconclusive numerical analysis coupled with a study of the Fisher Information in the grid system, subject to the caveats described above.
(b) It is true that, as stated in the conclusion, in the work of Mathis et al. the optimal ratio depends on “the number of neurons per module and peak firing rate”. But the prediction of the optimal ratio here also varies over a wide range, depending on the assumption on the decoding scheme (and probably the shape of the tuning curves, assumption on the correlation between neurons etc.).
For the reasons stated above, our analysis has a wider range of validity than that of Mathis et al. (please see our Discussion).
We do not agree that the optimal ratio here varies over a “wide range”. It is quite remarkable to us that an extremely simplistic winner-take-all decoder and an optimal probabilistic decoder give optimal ratios that are so closely clustered. The results do not depend in detail on the shapes of tuning curves etc. (please see our response to comment 1). Concerning the roles of correlations between grid cells, as Reviewer 1 points out, there is now work by the Herz group (Mathis et al., 2013) and by the Roudi group (Dunn et al., 2015) that suggests that there are only weak noise correlations between grid cells that are not aligned and of the same period (we now cite this work in the fourth paragraph of the Discussion).
Whether our prediction is “tight” or not may here be a case of beauty being in the eye of the beholder. The art of doing theory often involves making the right assumptions about the relevant and irrelevant factors. We made simple general assumptions that lead to remarkable (in our view) predictions for the architecture that agree with experiment. A legitimate way to do theoretical neuroscience is to make informed assumptions, build a theory with these assumptions, and then use the match between predictions of the theory and data as a guide to whether the assumptions are reasonable. Certainly, methodologically, this seems like a very reasonable way to proceed, and is well within the venerable tradition of theoretical work in the older field of physics.
Reviewer #3:
This excellent paper uses a very simple principle for demonstrating that coding of grid cells is better than coding of place cells, and generates some postdictions following this simple principle. The basic idea is that grid cells act as a kind of “Base-b” representation of space, and it is shown that the representation is optimal when the base chosen is base e (2.71828…). From that, various postdictions follow (which conform nicely with known experiments). Specifically, grid cell modules have a constant scale ratio, which should be √e in the simplest model, and closer to the real experimental value (1.4) in a probabilistic model of the cells’ coding. Furthermore, there should be a certain optimal ratio between the grid field width and the spacing between grid points.
The paper interacts nicely with the papers of the group of Andreas Herz, which deal with similar issues using Fisher information. I have no major concerns, as I think the paper is well written, deals with an important subject, looks sound mathematically, and has a nice treatment of relation to experimental data.
Thank you for these remarks. Please see below for changes we have made in response to the specific comments.
The only issue I would like to be dealt with is to make the Discussion more clear as to the relation between this paper and the papers from the Herz group (including the relevant recent one from 2015). Specifically, they have a treatment of the issue of grid cell coding through Fisher information, and it could be of value to connect the work performed here to their line of thought, at least minimally by adding some discussion to the paper (elaborating on the existing paragraph).
We have now elaborated on the connection to the work from the Herz group. Specifically, we have added discussion of their use of the Fisher information and their different formulation of a resolution constraint. Please see the Discussion. Please also see our response to the editor’s remarks, and our response to Reviewer 2.
Another small question I am curious about is whether the winner-take-all decoder could be seen as a limit case of the probabilistic decoder. But if that is the case, I do not completely understand the “leap” from e to 2.4.
The winnertakeall decoder is not quite a simple limit case of the probabilistic decoder as we have formulated it. One way to think of it is to imagine the WTA decoder as an approximation that replaces the smooth posterior with a flat function that drops to zero outside its support. The slight difference in the optimal ratio arises technically from the truncation of the tails in the Gaussian posterior, and the flattening of the posterior inside the region of support. Compared to the Gaussian, a boxcar likelihood has less precision (because it spreads out uniformly rather than being concentrated on the center), but it also implies less possibility of ambiguity (because it has zero tails). So the WTA decoder chooses a more aggressive (larger) scale ratio that improves precision, without being penalized by increased ambiguity.
We explain this point in the paragraph of the subsection “Probabilistic decoder” that starts “Why is the predicted scale factor based on the probabilistic decoder somewhat smaller than the prediction based on the winner-take-all analysis? […]”. We have slightly extended this paragraph.
https://doi.org/10.7554/eLife.08362.012
Article and author information
Author details
Funding
National Science Foundation (NSF) (PHY-1058202)
 Xue-Xin Wei
 Jason Prentice
 Vijay Balasubramanian
PSL Research University Paris (Fondation Pierre-Gilles de Gennes)
 Vijay Balasubramanian
The Starr Foundation
 Jason Prentice
National Science Foundation (NSF) (PHY-1066293)
 Vijay Balasubramanian
National Science Foundation (NSF) (EF-0928048)
 Xue-Xin Wei
 Jason Prentice
 Vijay Balasubramanian
National Science Foundation (NSF) (PHY-1125915)
 Vijay Balasubramanian
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Acknowledgements
NSF grants PHY-1058202, EF-0928048, PHY-1066293, and PHY-1125915 supported this work, which was completed at the Aspen Center for Physics and the Kavli Institute for Theoretical Physics. VB was also supported by the Fondation Pierre-Gilles de Gennes. JP was supported by the C.V. Starr Foundation. XW conceived of the project and developed the winner-take-all framework with VB. JSP developed the probabilistic framework and two-dimensional grid optimization. VB and XW carried out simulated lesion studies. XW, JSP, and VB wrote the article.
Reviewing Editor
 Frances K Skinner, University Health Network, Canada
Publication history
 Received: April 27, 2015
 Accepted: September 1, 2015
 Accepted Manuscript published: September 3, 2015 (version 1)
 Version of Record published: October 23, 2015 (version 2)
Copyright
© 2015, Wei et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.