Abstract
For 20 years the beautiful structure in the grid cell code has presented an attractive puzzle: what computation do these representations subserve, and why does it manifest so curiously in neurons? The first question quickly attracted an answer: grid cells subserve path-integration, the ability to keep track of one’s position as one moves about the world. Subsequent work has only solidified this link: bottom-up mechanistic models that perform path-integration match the measured neural responses, while experimental perturbations that selectively disrupt grid cell activity impair performance on path-integration dependent tasks. A more controversial area of work has been top-down normative modelling: why has the brain chosen to compute like this? Floods of ink have been spilt attempting to build a precise link between the population’s objective and the measured implementation. The holy grail is a normative link with broad predictive power which generalises to other neural systems. We review this literature and argue that, despite some controversies, the literature largely agrees that grid cells can be explained as a (1) biologically plausible, (2) high-fidelity, non-linearly decodable code for position that (3) subserves path-integration. As a rare area of neuroscience with mature theoretical and experimental work, this story holds lessons for normative theories of neural computations, and on the risks and rewards of integrating task-optimised neural networks into such theorising.
1 Introduction
It has been 20 years since the discovery of the most surprising single neuron response yet described: grid cell activity correlates with an animal’s self-position, activating when the animal is in a hexagonal lattice of positions (Hafting et al.), fig. 1A. Perhaps even more surprising than their original discovery is the finding that grid cell lattices come in discrete modules, of which a rodent will have a handful (Stensola et al.), fig. 1C. Grid cells in the same module have receptive fields that are translated (but not rotated) versions of one another which uniformly tile the space of possible phases, fig. 1B. Finally, alongside the grid cells in layer II of medial entorhinal cortex, layer III hosts cells that fire at conjunctions of a hexagonal lattice of positions and a particular heading direction (Sargolini et al.), fig. 1D. There exists extensive additional phenomenology, but these four phenomena form a cohesive explanatory target:
P1. Hexagonal-lattice tuning curves
P2. For each grid cell there is a family of grid cells, called a module, which share the same tuning curve but translated, tiling the whole space.
P3. The grid cell code contains multiple modules with different lattices.
P4. The existence of paired conjunctive grid-heading direction cells.
Given these striking findings, our questions are clear: what do grid cells do? And why in this way?

Structure of the grid cell code.
A: Neurons are tuned to a hexagonal lattice of positions in 2D space. B: They are grouped into modules: neurons in the same module have translated (but not rotated) receptive fields, and across a module they uniformly sample the phases (translations). C: There are only a handful of modules in one animal, each with its own lattice, and ~1000s of neurons covering the possible phases. D: For each grid module there is a population of grid cells that are conjunctively tuned to both the underlying grid of the module, and a particular heading direction. E: These conjunctive neurons can implement path-integration by pushing the bump of neural activity around the module (Burak and Fiete), like the ring attractor in the fly central complex (Hulse and Jayaraman), using a shifted connectivity pattern: pure spatial neurons project to conjunctive neurons with the same spatial tuning profile (red connections), which project back to the spatial neurons shifted by their velocity tuning (blue connections). When the rightward neurons are more active than the leftward, this will cause the activity bump to move rightwards on the ring, implementing path-integration.
A large body of work has convincingly answered the first question: the grid cell representation subserves path-integration. It has long been posited that the mammalian brain is capable of integrating its velocity to track self-position (Tolman), and as soon as grid cells were discovered they became the likely neural implementation (McNaughton et al.). In the intervening time the evidence has only built.
The second question is normative: why has biology chosen to perform path-integration using grid cells? Answering this question does not just satisfy curiosity; it promises principles to predict grid cell behaviour in novel situations, and the possibility that the same principles will generalise to other neural circuits. With the wealth of careful evidence that has accumulated the normative question seems well-posed and tractable. Despite this, there has been significant controversy in the field, producing a menagerie of different models whose commonalities and relative advantages are unclear.
This review seeks to clarify the normative grid cell theory literature. We proceed as follows:
We begin with path-integration. We recall perturbative and mechanistic evidence that links grid cells to path-integration. Then we intuitively link the existence of translated tuning curves, P2, to path-integration.
We then describe non-path-integrating ‘efficient coding’ theories that model grid cells as only a high-quality positional encoding, not as position codes that connect to one another via path-integration. We contrast these with some natural instantiations of efficient coding for which place cells rather than grid cells are optimal. Then we show that those efficient coding approaches that do generate hexagonal tuning curves, P1, are unable to match the modular structure: sets of grid cells with translated axis-aligned tuning curves, P2. We justify this by explaining how this feature is detrimental for an efficient code, but crucial for path-integration.
Next, we describe models that combine efficient coding with path-integration, and show that many classes of such models are able to capture the translated, axis-aligned, structure of grid cells, though most are limited to a single module. Further, we discuss the precise velocity update mechanism and the discrepancies between normative models and biology, in particular, P4.
Finally, we discuss how nonlinear encoding objectives differ qualitatively from linear. Only with a nonlinear objective, along with a path-integration constraint, do multiple modules of grid cells appear, matching data, P3.
We conclude with a unified normative view: theories that combine path-integration, nonlinear position encoding, and efficiency in the form of biological constraints (usually synaptic or neural activity energy efficiency and nonnegative firing rates) can cohesively capture P1, P2 and P3: multiple axis-aligned grid modules. Further, we sketch remaining puzzles, including regarding P4, and lessons for the future.
2 Grid Cells Perform Path-Integration
In this section we link the existence of a translated set of tuning curves, P2, to path-integration. We begin by reviewing evidence that grid cells are involved in path-integration. We then sketch intuitively how a translated set of tuning curves can naturally underlie path-integration.
2.1 Non-Normative Evidence Linking Grid Cells to Path-Integration
In this section we briefly review two of the key strands of evidence that suggest grid cells subserve path-integration: mechanistic models and perturbation effects.
Mechanistic Models
Mechanistic models that perform path-integration match neural observations. The most successful of these are continuous attractor neural networks (CANNs). CANNs were originally developed to model path-integration of heading direction (Skaggs et al.; Redish, Elga, and Touretzky). Their simplest implementations comprise one population of neurons that encode the animal’s heading direction, and two further populations that code for conjunctions of heading direction and angular velocity, either to the left or right, fig. 1E. These conjunctive heading-velocity neurons can then be used to update the heading direction representation. First theoretically posited in the 90s, these circuits have since been verified experimentally, most beautifully in the fruit fly (Kim et al.).
Subsequent work extended CANNs to two-dimensional space, initially to model hippocampal place cells (Touretzky and Redish; Samsonovich and McNaughton; Conklin and Eliasmith). One difficulty in moving from a compact space of heading directions to an infinite space of (2D) positions is encoding the space in a finite set of neurons. Work that predated the discovery of grid cells proposed encoding space periodically, predicting lattice tuning curves but with square rather than hexagonal lattices (Samsonovich and McNaughton). Subsequent work has shown how attractor dynamics in these 2D continuous attractor circuits can naturally lead to hexagonal grid and conjunctive cells (Fuhs and Touretzky; Guanella, Kiper, and Verschure; Pastoll et al.; Burak and Fiete), and multiple modules (Kang and Balasubramanian; Khona, Chandra, and Fiete).
P4, the layer III conjunctive neurons, provide crucial evidence for these models. In a CANN each pure grid cell (i.e. tuned only to space) excites a set of conjunctive grid cells which have the same spatial tuning curve, fig. 1E, but are additionally tuned to movement in a particular direction, fig. 1D. In a CANN these cells implement path-integration by projecting back to the pure grid cell whose receptive field is translated along the direction of motion tuning, fig. 1E. Not only do these modelled cells match those observed in layer III, but, remarkably, measured connections between layer II and III neurons estimated from spike-time connectivity match the shifted projection pattern (Vollan et al.), presenting a ringing endorsement for the model.
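The shifted projection pattern can be illustrated with a deliberately stripped-down sketch. It assumes a 1D ring of binary neurons, a one-hot activity bump, and velocity that simply gates which conjunctive population is active; the names `N`, `W_to_conj`, `W_right`, `W_left` and `step` are hypothetical, and none of the attractor dynamics of a real CANN are modelled.

```python
import numpy as np

N = 8  # hypothetical number of pure grid cells on a 1D ring

# Pure -> conjunctive: same spatial tuning, so identity weights (assumption).
W_to_conj = np.eye(N)
# Conjunctive -> pure: projection shifted along the preferred movement direction.
W_right = np.roll(np.eye(N), 1, axis=0)   # rightward cell i excites pure cell i + 1
W_left = np.roll(np.eye(N), -1, axis=0)   # leftward cell i excites pure cell i - 1


def step(bump, velocity):
    """Move a one-hot activity bump by one cell in the direction of velocity."""
    conj = W_to_conj @ bump
    right_drive = conj * (velocity > 0)   # rightward conjunctive cells active
    left_drive = conj * (velocity < 0)    # leftward conjunctive cells active
    stay = bump * (velocity == 0)         # no movement: bump is unchanged
    return W_right @ right_drive + W_left @ left_drive + stay


bump = np.eye(N)[0]                       # start with cell 0 active
for v in [1, 1, 1, -1, 0]:
    bump = step(bump, v)
position = int(np.argmax(bump))           # net displacement of the bump
```

The same two shifted matrices move the bump correctly from every starting cell, which is the sense in which the mechanism generalises across the ring.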
There are other mechanistic models, notably the oscillatory-interference model (Burgess, Barry, and O’Keefe; Burgess; Bush and Burgess; Giocomo and Hasselmo; Hasselmo). These models were motivated by the strong theta-frequency effects in entorhinal cortex, including grid-cell phase precession (Hafting et al.; Reifenstein et al.). However, they are unable to explain the presence of conjunctive grid cells, and more recent versions of CANN models that include theta-modulations can explain frequency effects like phase precession and theta sweeps (Vollan et al.). As such, there is strong mechanistic evidence that circuits supporting path-integration can match the measured biological effects.
Perturbation Effects
Concurrently, behavioural evidence has shown that perturbing the grid cell system impairs animals’ ability to perform path-integration dependent tasks. First, lesions to the medial entorhinal cortex impair path-integration (Van Cauter et al.; Steffenach et al.). Second, disrupted spatial navigation is a known symptom of Alzheimer’s disease, and this effect is thought to arise due to disruptions in grid coding in the medial entorhinal cortex. Evidence comes from genetic knock-in models of Alzheimer’s which have disrupted grid cells (Jun et al.; Ying et al.), alongside impaired path-integration abilities (Ying et al.). Further, people at genetic risk of Alzheimer’s show disrupted grid coding long before displaying other symptoms of Alzheimer’s (Kunz et al.). Finally, and most precisely, removal of NMDA glutamate receptors from retro-hippocampal regions led to a selective disruption of grid cells while leaving other spatially selective cells intact. This perturbation caused behavioural disruptions to path integration (Gil et al.). In sum, the behavioural evidence is specific and strong.
2.2 An Intuitive Guide to the Grid Cell Solution to Path-Integration
We now outline how translational symmetry amongst tuning curves, P2, forms a natural substrate for path-integration. For simplicity, we work here with binary neurons that are either on or off, but the arguments generalise.
Path-integration involves updating your representation in response to movement. Upon taking a step, Δx, you have to update your internal encoding of position, g(x), appropriately:
$$g(x) \;\mapsto\; g(x + \Delta x)$$
A place cell code would make such updates very easy. The combination of the currently-active place cell and your movement specify the next representation: the place cell displaced by the movement, fig. 2A. However, this requires a place cell for every potential position, limiting how many positions you can encode.

Path-integration with different codes
A: Path-integrating with a place cell code is easy: current cell plus step uniquely determines next cell, but it is limited by the number of cells. B: Multifield cells improve the coding capacity but make path-integration more challenging: resources must instead be devoted to learning a mapping between unique combinations of cells. C: Within a grid module, current cell plus movement again uniquely determines the next cell: no matter which firing field of a grid cell you are in, thanks to the translational symmetry, you always know which cell to activate after a step. As such, grid cells elegantly combine the easy path-integration of place cells with the higher capacity coding of multifield cells, and the path-integration mechanism generalises across space.
Instead imagine a cell that activates in multiple positions—a multifield place cell, fig. 2B. These neurons can improve your encoding of position: rather than giving each position a unique cell, they are given a unique combination of cells, of which there are many more, improving the capacity of the code. However, this implies a more complex path-integration mechanism: knowing that a neuron is active and which movement you make is not enough; you need to know the full set of currently active neurons, and, upon stepping north, must have a mechanism to map each combination to its neighbour one step north. This, while possible, is much more complex and specific only to the particular arrangement of firing fields.
Modules of grid cells combine the coding quality of multifield cells with simple path-integrability (Kubie and Fenton). Each position is encoded by a combination of neurons, one in each module, leading to a more informative multifield-like code. Crucially, however, the path-integration problem is separated by modules, and within each module it is simple. Knowing that one neuron in a module is active and that you make a movement north uniquely determines which neuron in that module should be active next: the one with a receptive field translated one step north, fig. 2C. By baking translational symmetry into the multifield pattern, path-integration is made easy.
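A toy version of this argument can be written down directly. The sketch below assumes a 1D track, binary neurons, and two modules with hypothetical coprime periods of 3 and 4 cells; `PERIODS`, `encode`, `step` and `decode` are illustrative names, and the brute-force decoder is only there to check the code, not a claim about how readout is done.

```python
import numpy as np

PERIODS = (3, 4)  # hypothetical, coprime periods of two 1D modules


def encode(x):
    """One-hot activity per module: neuron (x mod period) is active."""
    return [np.eye(p)[x % p] for p in PERIODS]


def step(code, dx):
    """Path-integrate: cyclically shift each module independently by dx."""
    return [np.roll(m, dx) for m in code]


def decode(code):
    """Brute-force lookup of the unique position in [0, capacity) with this code."""
    phases = tuple(int(np.argmax(m)) for m in code)
    capacity = int(np.prod(PERIODS))
    for x in range(capacity):
        if tuple(x % p for p in PERIODS) == phases:
            return x


# 3 + 4 = 7 neurons distinguish 3 * 4 = 12 positions, and the update rule
# within each module is the same cyclic shift wherever the animal is.
new_position = decode(step(encode(5), 4))
```

Each module alone is ambiguous (positions 1 and 4 activate the same neuron in the period-3 module); the combination across modules is what gives the higher capacity, while the shift within each module stays simple.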
In short, these are the functional insights that underlie the grid cell code: a dense multifield code for position combined with easy module-wise path-integration. Indeed, in the final section, we conclude by outlining how a combination of these two functional goals with simple biological considerations (nonnegative small firing rates) leads to grid cells. For now we turn to attempts to model grid cells without reference to path-integration.
3 Grid Cells are not the most Efficient Code for Space
In the previous section we outlined the links between path-integration and grid cells, in particular their modular translated receptive-field structure, P2. In contrast, in this section we review what we term ‘efficient coding’ theories of grid cells. These normative models posit that grid cells are the most efficient encoding of position, without mentioning path-integration. We labour on these models as many have become prevalent, yet they lack the key computational feature that defines entorhinal cortex, path-integration, and do not match many critical aspects of grid cell data. We will begin by showing instantiations of efficient coding that do not generate hexagonal tuning curves. We will then discuss efficient coding models that do generate hexagonal tuning curves, P1, but will show that in each case they do not capture the translated receptive fields, P2, a symptom of dropping path-integration.
3.1 Context: Many Efficient Coding Models do not generate Grid Cells
Most efficient coding theories can be decomposed into two parts. The first measures the quality of the encoding, for example, how well a linear decoder can predict where you are from your representation. The second measures or enforces the efficiency or biological plausibility of the code, for example via low nonnegative firing rates. Combinations of the two lead to some of the famous results in theoretical neuroscience, such as histogram equalisation via the fly eye’s nonlinearity (Laughlin), whitening via centre-surround in retinal ganglion cells (Atick and Redlich), or sparsification of natural images via the V1 Gabor code (Olshausen and Field).
Before studying efficient coding theories that generate grid cells, we make a useful counterpoint: very natural instantiations of efficient coding of space do not produce grids. Comparing between these theories clarifies the choices that lead to grids. Sengupta et al. use the similarity matching objective: given two inputs (e.g. positions), x and x′, and their neural encodings, g(x) and g(x′), this objective encourages the dot-product similarity of the representation, g(x)ᵀg(x′), to match that of the input similarity structure, xᵀx′, through maximising the following loss:

Sengupta et al. take inputs from a compact continuous space, such as angles on a ring, and (reasonably) assume that the input similarity, xᵀx′, decays with distance: nearby points are similar, distant points are dissimilar. From this they analytically derive that, with infinitely many neurons, place cells are the optimal nonnegative representation. This is not specific to this loss: recent work has drawn similar conclusions from an information theoretic measure of coding quality (Deighton et al.). This is somewhat natural: place cells are a very informative code, and a much simpler one than multifield codes. When there are enough neurons such that a place cell code can tile the space with sufficient resolution, these works present evidence that some efficient coding approaches prefer place cells (in section 5 we also show that place cells are preferred even with few neurons).
As such, it seems difficult for efficient coding of space alone to produce grid cells. To modify an efficient coding theory we can either change how coding quality is measured or the efficiency constraints. Many efficient coding models can be described in this way and succeed in generating hexagonal lattice tuning curves, P1. They are, however, unable to account for each module’s axis-aligned translated receptive field structure, P2. We conceptually cluster these approaches into two groups, nonnegative bandpass filter models, which we review next, and clustering models, which we review in appendix A.
3.2 Grid Cells via Nonnegative Bandpass Filtering
We now review nonnegative efficient coding grid cell models that generate hexagonal lattices via nonnegative Fourier combinations, and in particular, a bandpass filter effect. These include nonnegative PCA models (Dordek et al.; Sorscher et al.; Sorscher et al.) and metric encoding models (Pettersen et al.).
Nonnegative PCA of difference-of-Gaussian Place Cells
The first set of models use an encoding objective that rewards the representation for containing high power at a critical spatial frequency, then use nonnegativity to produce a hexagonal lattice. The pivotal link in these arguments was first described by Dordek et al. who modelled grid cells as the nonnegative PCA of difference-of-Gaussian place cells, producing hexagonal receptive fields. This link is neat, but, in brief, it suffers from two major flaws. First, it relies on the use of difference-of-Gaussian place cells which are not observed; second, it fails to produce modules of translationally-symmetric grid cells.
The similarities to the approaches in section 3.1 are large; the largest difference is the choice of target, x: rather than something like Gaussian place cells, whose similarity structure decays with distance, they use difference-of-Gaussian cells. Dordek et al. (later paralleled by Sorscher et al. and Sorscher et al.) nicely explain the effect of this substitution: difference-of-Gaussian cells lead to a bandpass covariance structure peaked at a particular frequency band, fig. 3A, leading the optimal linearly-decodable representation to strongly encode this frequency. Combining this with a lattice discretisation effect from the finite room leads to square grid cells (Dordek et al.). Finally, enforcing nonnegative firing rates changes the optimal solution from square to hexagonal grids, justified either through a triplet interaction effect (Sorscher et al.; Sorscher et al.), or the efficiency in positivising the code (Dordek et al.).
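The bandpass effect of the substitution is easy to check numerically. The sketch below assumes a periodic 1D track tiled by translated copies of one tuning curve, in which case the population covariance is translation-invariant and its spectrum is the power spectrum of that tuning curve; the track length and the two widths (8 and 16 bins) are hypothetical choices, not values from the papers above.

```python
import numpy as np

n = 512                        # hypothetical number of bins on a periodic 1D track
x = np.arange(n) - n // 2      # positions centred on the place field


def gaussian(sigma):
    """Unit-mass Gaussian tuning curve of width sigma (in bins)."""
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()


def peak_frequency(tuning_curve):
    """Index of the spatial frequency with most power in the covariance spectrum."""
    power = np.abs(np.fft.rfft(tuning_curve)) ** 2
    return int(np.argmax(power))


# Gaussian place cells: power is largest at zero frequency (lowpass).
gauss_peak = peak_frequency(gaussian(8.0))
# Difference-of-Gaussians: zero total mass, so power peaks away from zero (bandpass).
dog_peak = peak_frequency(gaussian(8.0) - gaussian(16.0))
```

Because the two unit-mass Gaussians cancel at zero frequency, the difference-of-Gaussian spectrum vanishes there and rises to a maximum at an intermediate frequency, which is the frequency the optimal linear code is pushed to represent.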

Grid Cells via Bandpass Filtering.
A: A Gaussian place cell code has a covariance whose frequency content is a smoothly-decaying Gaussian, left, but a difference-of-Gaussian code has covariance whose frequency content peaks at a non-zero frequency, figure from Sorscher et al. B: The grid cells that result from nonnegative PCA on difference-of-Gaussian place cells are not translationally symmetric: each population contains grid cells whose axes are rotated relative to one another (for example, the left and rightmost grid cells from Dordek et al. have lattices rotated 30° relative to one another), figures from Dordek et al. and Sorscher et al. C: We create a representation, g(x), that contains a single frequency, and plot the conformal loss, eq. (3), as a function of this single frequency for a few σ values. This loss is minimised (dark blue) at an intermediate value of frequency: a bandpass filtering effect. D: Metric encoding also produces a population of grid cells that are rotated relative to one another, figure from Pettersen et al.
This approach has been influential with many papers using the nonnegative PCA of difference-of-Gaussian place cells (Dordek et al.; Sorscher et al.; Sorscher et al.; Schøyen et al.; Tang, Barron, and Bogacz). It has also been controversial, prompting a rebuttal (Schaeffer, Khona, and Fiete), a rebuttal to the rebuttal (Sorscher et al.), and two further rebuttals cubed (Schaeffer et al.; Schaeffer et al.). One point of disagreement lay in the finetuning of parameters required to produce grid cells: an interesting point, but clearly not fatal since the brain could simply use these parameters. A more existential threat comes from the choice of difference-of-Gaussian tuning curves. These fit hippocampal place cells less well than Gaussian curves, but, as the theoretical analysis states, are clearly vital for the production of hexagonal grid cells. Many more realistic choices of place cells don’t produce grid cells in this framework (Schaeffer et al.), since they don’t generate the required bandpass filter. This could be an interesting prediction about the relationship between place and grid coding, but currently there’s no evidence this particular link exists.
Second, and fundamentally, these approaches do not capture the translated receptive field structure of grid modules. Instead, they produce grid cells whose orientations cluster into two groups offset by 30 degrees (Pettersen et al.), fig. 3B, a pattern that is not observed experimentally. Further, when they do produce multiple modules, the intermodule relationship appears to be worryingly governed by numerical discretisation effects (Sorscher et al.). Nor does the framework offer an explanation of conjunctive cells, P4. Only when combined with a path-integrating task (for example by training an RNN to both path-integrate and linearly project to difference-of-Gaussian place cells) do you get axis-aligned grid cells, a topic we’ll return to. Hence, this theory appears to be, at best, part of the solution.
Metric Encoding
A seemingly-distinct class of theories study a loss that encourages the ‘neural metric’ to match the metric of space. We will show that we can understand these as performing a similar bandpassing effect as discussed.
A metric is a function that measures distances between points. Matching a particular metric means that the distance between two points, x and x + Δx, is preserved in the distance between the representations of those points, g(x) and g(x + Δx), at least for a small region of space (small Δx):
$$\| g(x + \Delta x) - g(x) \| = s \, \| \Delta x \|$$
where s is a scaling factor. Normative approaches including losses like these are common routes to grid cells, often in combination with path-integration (Gao et al.; Gao et al.; Xu et al.; Pettersen et al.). Here we focus on the findings of Pettersen et al.: optimising a nonnegative unit-norm representation to preserve distances while penalising the L1 norm of the firing rates is sufficient to generate hexagonal firing fields without path-integration. The loss used is:


The first term, called the conformal loss, forces the neural distance, ‖g(x) – g(x′)‖2, to match the separation in space, but only when x and x′ are close, via the 

When σ is smaller this loss generates hexagonal grids. We now show that this can also be understood as a Fourier bandpass effect. The loss contains two biases, one that penalises high frequencies, another low frequencies, that together create a bandpass filter. The local region, encapsulated by σ, sets a lower bound on the frequency content of the code: if your code contains a component oscillating slower than
Having established the bandpass filter, similar arguments to the previous section can then be used to justify how positivity and capacity constraints might lead to grid cells. Indeed, hexagonal grid cells with a single lengthscale emerge from this optimisation, with the lengthscale controlled by σ (Pettersen et al.). This is not a complete picture: for example, it is an interesting mathematical puzzle that combining this loss with an L1 capacity constraint, but not an L2, leads to hexagonal grids (Pettersen et al.). Regardless, these grid cells still suffer from the same shortcoming as other efficient-coding-only approaches: the grids are not aligned within the same module; rather, they feature the same loose 30° alignment as the Fourier approaches, fig. 3D. Only by adding path-integration is this effect removed.
Summary
Nonnegative combinations of Fourier components can generate hexagonal grid cells. In addition to some plausibility concerns (place cells are not well modelled by difference-of-Gaussians), without path-integration, these models are unable to reproduce the translationally symmetric modular structure that is vital for path-integration.
3.3 Conclusion: Inefficiency of Axis-Aligned Grid Cells
From this large body of work (see also clustering models in appendix A) we conclude that grid cells, despite clearly being a good code, are not the optimal efficient code of 2D space. In natural instantiations of the efficient coding problem the optimal solutions are place cells (with either one or multiple fields depending on the problem, section 5). This matches unpublished findings from Tzushuan Ma’s PhD thesis (Ma), and recent work that shows multifield place cells, as in the hippocampus, are a very good code (Rich, Liaw, and Lee; Harland et al.; Eliav et al.). Changing the problem in various ways can make hexagonal-lattice receptive fields optimal, either through a bandpass filter, section 3.2, or a dense packing argument, appendix A. However, it never recovers translational symmetry. This is intuitive: the grid-cell code has some glaring design flaws from a pure efficient coding perspective. The periodicity of grid cells means they identically encode points separated by the lattice symmetry, rendering a single cell unable to distinguish them. The translational symmetry within a module means that, rather than helping each other to decode new points, neurons in a module share their ambiguities: points that are indistinguishable to one neuron are also indistinguishable to all neurons in the module! Breaking the symmetry, either by rotating and scaling the grid lattices of different neurons or removing the lattice entirely, usually improves the coding quality. As such, translated receptive fields, P2, are a key symptom of grid cells’ role in path-integration, and very hard to justify from an efficient coding perspective.
4 Path-integration + Position Encoding = A Module of Grid/Place cells
In section 2.2, we outlined how grid modules’ translational symmetry forms an ideal substrate for path-integration, something that purely efficient coding approaches are unable to capture. Here, we review various models that combine path-integration with an encoding loss and recover a single module of axis aligned grid cells.
4.1 Path-Integrating Models of Grid Cells
Path-Integrable Efficient Codes
Dorrell et al., similarly to unpublished work (Ma), use mathematical analysis to combine path-integration with the earlier efficient coding approaches. Identically to an efficient coding approach, the representation is asked to encode space subject to some efficiency constraints. However, crucially, the code is also asked to permit path-integration: g(x + Δx) = f(g(x), Δx), predicting the next representation, g(x + Δx), from the current representation, g(x), and velocity, Δx. For mathematical analysis, this constraint is enforced using action-dependent weight matrices: each weight matrix has to correctly implement all transformations of the code for a given action, independent of the animal’s current position:
$$g(x + \Delta x) = W(\Delta x)\, g(x) \quad \text{for all } x$$
This constraint ensures that if the agent is at a position x, it can use W(Δx) to predict where it will reach next, permitting path-integration. Further, it can be mathematically derived that this constraint forces the code to contain a small number of Fourier features, providing a basis for further analysis. Combining this with an efficient coding loss leads to either one or multiple modules depending on the choice of loss (Dorrell et al.). It does not directly explain the conjunctive grid coding, nor are action dependent weight matrices particularly biologically plausible. Both of these problems can be alleviated through action gating, a plausible scheme to implement action-dependent weight matrices as seen in other models (Logiaco, Abbott, and Escola).
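To see why Fourier features are compatible with a position-independent update matrix, consider the following minimal sketch. It assumes a 1D space and a code made only of cosine and sine pairs at two hypothetical frequencies (`FREQS`); it ignores nonnegativity and the efficiency terms of the loss, so it illustrates the path-integration constraint alone rather than the full optimisation of Dorrell et al.

```python
import numpy as np

FREQS = np.array([1.0, 2.5])  # hypothetical spatial frequencies (radians per unit length)


def g(x):
    """Code built from a few Fourier features: one (cos, sin) pair per frequency."""
    return np.concatenate([[np.cos(w * x), np.sin(w * x)] for w in FREQS])


def W(dx):
    """Block-diagonal rotation matrix: depends on the step dx, not on position."""
    out = np.zeros((2 * len(FREQS), 2 * len(FREQS)))
    for i, w in enumerate(FREQS):
        c, s = np.cos(w * dx), np.sin(w * dx)
        out[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, -s], [s, c]]
    return out


# The same matrix W(0.3) implements a step of 0.3 from any starting position.
prediction_error = max(
    np.abs(W(0.3) @ g(x) - g(x + 0.3)).max() for x in (0.0, 1.3, -2.0)
)
```

Each frequency contributes a 2D rotation whose angle is proportional to the step, so composing steps amounts to multiplying matrices: W(a) W(b) = W(a + b).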
Efficient Coding of Trajectories
Rebecca et al., following similar work by Waniek, formulate grid cells in a reversed manner: rather than requiring velocity to update the encoding from one timestep to the next, they instead predict velocity from each current and next encoding. From this approach, and a small number of assumptions, they show that a single hexagonal grid module is optimal for predicting velocity. While elegant, this argument suffers from using binary neurons and a discretisation of space, and struggles to naturally encapsulate multiple modules. Regardless, this alternate formulation of path-integration makes some useful novel predictions, such as how a 2D module should encode a 1D sequence.
Grids as Eigendecomposition of Transition Matrices
A set of models have formalised spatial coding via transitions on 2D graphs. For example, Stachenfeld, Botvinick, and Gershman argue that the hippocampus encodes a successor representation (a simple function of a transition matrix) of space, and that the thresholded-nonnegative eigenvectors of the successor representation (and thus of the transition matrix), which are periodic, correspond to grid cells. Later Yu, Behrens, and Burgess generalised this approach, showing that directed, rather than diffusive, transition matrices can be used to path-integrate. However, the grid cells that emerge from eigendecomposition of such transition matrices are unlike real grid cells. They exist in modules of only two neurons, many of which are not hexagonal grids but instead form bands or amorphous blobs, fig. 3C, especially in non-square rooms (Stachenfeld, Botvinick, and Gershman). Further, while one of the selling points of the successor representation theory is its sensitivity to transition statistics, pure grid cells only emerge with a diffusive policy, whereas real grid cells are more robustly hexagonal (Stensola et al.; Vollan et al.). Thus, while these models are an elegant mathematical framing, they leave several unanswered questions: why only some eigenvectors match grid behaviour; why each modelled grid module has only two neurons; why empirical grid cells are not so dramatically affected by transition statistics; and how this model could account for conjunctive grid cells.
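The relationship between the transition matrix, the successor representation and its periodic eigenvectors can be sketched in a few lines. The example assumes a diffusive random walk on a 1D ring rather than a 2D room, with a hypothetical track length `n` and discount factor `gamma`, and uses the standard definition of the successor representation as the discounted sum of powers of the transition matrix.

```python
import numpy as np

n, gamma = 20, 0.9   # hypothetical track length and discount factor

# Diffusive random walk on a 1D ring: step left or right with equal probability.
T = 0.5 * (np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1))

# Successor representation: discounted expected future state occupancy,
# M = sum_k gamma^k T^k = (I - gamma T)^{-1}.
M = np.linalg.inv(np.eye(n) - gamma * T)

# T is symmetric here, so eigh applies; M shares T's eigenvectors, with
# eigenvalues 1 / (1 - gamma * lambda).
t_vals, t_vecs = np.linalg.eigh(T)
m_vals = 1.0 / (1.0 - gamma * t_vals)

# Thresholding at zero gives nonnegative, periodic model firing fields.
fields = np.maximum(t_vecs, 0.0)
```

On the ring the eigenvectors are sinusoids, and each non-trivial frequency contributes a degenerate cosine-like and sine-like pair, which is one way to see why such models yield only two cells per modelled module.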

Successor representation eigenvectors are poor models of grid cells, figure from Stachenfeld, Botvinick, and Gershman.
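The eigendecomposition argument above can be illustrated with a minimal sketch. It assumes a diffusive random walk on a small periodic grid and a discount factor `gamma`; the grid size, the periodic boundary, and all variable names are illustrative choices, not taken from the cited models. It checks numerically that the successor representation M = (I − γT)⁻¹ shares its eigenvectors with the transition matrix T, and builds one thresholded eigenvector as a candidate rate map.

```python
import numpy as np

def torus_random_walk(n):
    """Transition matrix of a diffusive random walk on an n x n periodic grid.

    Each state moves to one of its four neighbours with probability 1/4.
    The periodic boundary is an assumption made here for simplicity.
    """
    T = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            s = i * n + j
            for di, dj in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
                T[s, ((i + di) % n) * n + (j + dj) % n] += 0.25
    return T

n, gamma = 8, 0.9                      # illustrative grid size and discount factor
T = torus_random_walk(n)
M = np.linalg.inv(np.eye(n * n) - gamma * T)   # successor representation

# T is symmetric here, so eigh returns real eigenvalues (ascending) and
# orthonormal eigenvectors.
evals, evecs = np.linalg.eigh(T)

# Each eigenvector of T is also an eigenvector of M,
# with eigenvalue 1 / (1 - gamma * lambda).
sr_evals = 1.0 / (1.0 - gamma * evals)
residual = np.abs(M @ evecs - evecs * sr_evals).max()

# Candidate "grid cell": a thresholded (nonnegative) eigenvector as a rate map.
rate_map = np.maximum(evecs[:, -2], 0.0).reshape(n, n)
print(f"max eigenvector residual: {residual:.2e}")
```

On this symmetric, diffusive walk the eigenvectors are periodic plane-wave mixtures, so the thresholded maps look band-like or checkered rather than hexagonal, in keeping with the criticism above.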
Neural Network Models
The most common path-integration approach is to train recurrent neural networks (RNNs) to path-integrate, and then to use the learnt internal representation as a model of grid cells. In the simplest instantiation, the RNN is provided with a sequence of actions and required to output the corresponding sequence of positions. This captures all three aspects of the efficient path-integrating code above: the code must path-integrate, it must distinguish different points so they can be decoded, and it must do so efficiently, with low weights (if using regularisation) and with nonnegative activities (if using ReLU nonlinearities). However, the precise design choices, and the results, have varied considerably.
Some models provide the action as a standard input to the RNN, a(t):

while others learn a mapping between the action and the recurrent weight matrix, similar to the normative models above:

Some networks predict (x,y) coordinates, others Gaussian place cells or difference-of-Gaussian place cells.
Some networks use a ReLU nonlinearity, enforcing nonnegativity, others use tanh.
Weights or activities are often constrained, either through a regulariser or through a unit-norm constraint.
Other regularisers may be added, most often the conformal isometry loss, section 3.2.
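The two update styles described above can be sketched as follows. This is a minimal illustration with random, untrained weights. The specific forms (an additive action input for the standard update, a base matrix plus an action-weighted sum of matrices for the action-dependent update) and names such as `W_rec`, `W_in`, `W_act`, and `W_out` are assumptions made for illustration, not the exact parameterisations used in the cited models.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_actions = 16, 2

def relu(z):
    # ReLU keeps activities nonnegative
    return np.maximum(z, 0.0)

# Variant 1 (assumed form): the action a(t) is an ordinary additive input.
W_rec = rng.normal(scale=0.1, size=(n_neurons, n_neurons))
W_in = rng.normal(scale=0.1, size=(n_neurons, n_actions))
b = np.zeros(n_neurons)

def step_standard(g, a):
    return relu(W_rec @ g + W_in @ a + b)

# Variant 2 (assumed form): the action selects the recurrent matrix itself.
W_base = np.eye(n_neurons)             # zero action leaves a nonnegative state unchanged
W_act = rng.normal(scale=0.1, size=(n_actions, n_neurons, n_neurons))

def step_action_dependent(g, a):
    W_a = W_base + np.tensordot(a, W_act, axes=1)   # (n_neurons, n_neurons)
    return relu(W_a @ g)

# Roll both networks along the same action sequence, with a linear (x, y) readout.
W_out = rng.normal(scale=0.1, size=(2, n_neurons))

def rollout(step, g, actions):
    positions = []
    for a in actions:
        g = step(g, a)
        positions.append(W_out @ g)
    return np.array(positions)

g0 = relu(rng.normal(size=n_neurons))
actions = rng.normal(scale=0.1, size=(20, n_actions))
xy_standard = rollout(step_standard, g0, actions)
xy_action_dep = rollout(step_action_dependent, g0, actions)
```

In a real experiment the weights would be trained so that the readout matches the true positions (or a place cell code); here the point is only the structural difference between where the action enters each update.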
An early pair of results suggested that path-integrating RNNs could model grid cells. Cueva and Wei trained standard RNNs to path-integrate and found grid and band-like neurons, though these grids were often square rather than hexagonal. Key choices included the use of tanh rather than ReLU nonlinearity, meaning the activities were both positive and negative, and reading out (x,y) coordinates rather than a place cell code. Concurrently, Banino et al. trained a large reinforcement learning model and showed that a feedforward layer in the network, heavily regularised by dropout, learnt somewhat griddy neurons, though there are concerns that these ‘grid cells’ are as grid-cell-like as low-pass filtered noise (Sorscher et al.).
Since then, the class of models that learn an action-dependent weight matrix, eq. (7), has been very successful. First studied by Issa and Zhang, who derived conditions for such a model to work, these were then used as part of a larger model of the hippocampal-entorhinal system by Whittington et al. and Whittington et al., who trained sub-networks to path-integrate, and found hexagonal modules of grid cells, though they baked the modular structure into the network. Another vein of work used the conformal isometry losses and a difference-of-Gaussian place cell readout to learn a single module of hexagonal grid cells (Gao et al.; Gao et al.; Xu et al.). Finally, Schaeffer et al. showed that training the action-dependent matrices in a ReLU RNN with a unit-norm constraint, an activity loss to reduce network capacity, a conformal loss, and a separation loss, led to multiple modules of axis-aligned grid cells. Since these models do not explicitly capture the way velocity is coded by neurons, instead embedding it in the changing weight matrix, this architecture will never capture the conjunctive grid cells. Despite this, they present a ringing endorsement for the idea that optimising for a good, efficient, path-integrating code for position is sufficient for recovering grid cells.
Path-integrating in more standard RNNs, eq. (6), can also lead to grid cells. Sorscher et al. and Sorscher et al. trained such an RNN to predict difference-of-Gaussian place cells and found a single axis-aligned module of grid cells, later supported by Tang, Barron, and Bogacz. A similar story was seen in Pettersen et al., who showed that a metric approach combined with path-integration led to a single module of axis-aligned hexagonal grid cells. Finally, Xu et al. show that a standard RNN formulation with a unit-norm, positivity, and conformal constraint is sufficient to generate a single module of grid cells, matching theoretical work (Schøyen et al.). Each of these approaches highlights a move from efficient coding-only approaches to path-integration: the coding losses alone produce hexagonal grid cells, but the axes of these grid cells are not aligned, section 3.2. Additionally asking for path-integration aligns the axes.
Each of these models demonstrates that RNNs trained to path-integrate naturally generate a module of grid cells. We will focus on two further discrepancies. In section 5, we will discuss how many of these models are limited to a single module. First, however, no model has reported the path-integration mechanism using conjunctive grid cells, P4, as in purely mechanistic models (Burak and Fiete), a discrepancy we will discuss next.
4.2 A Velocity Update Puzzle
In this section we review an ongoing puzzle regarding the precise grid cell velocity-update mechanism. In section 2.1 we discussed how the pre-eminent mechanistic models, CANNs, use conjunctive neurons to path-integrate, matching connectivity measurements (Vollan et al.). Here, we outline a discrepancy between this and normative models.
Of the path-integrating theories listed in section 4.1, most do not comment on the velocity-update mechanism. They either abstract away from this part of the model, or use an action-dependent weight matrix that muddies how such dependence arises. The only models which do include such effects are RNNs with standard updates, eq. (6). Surprisingly, Schøyen et al. and Pettersen et al. found that such networks learn a population of band-like cells, and that these are the neurons that seem to do the work of performing path-integration: the network can path-integrate without the grid cells! This is in contrast to a CANN model, in which the grid cells are vital for the path-integration. Chu et al. elegantly explain this finding: in task-optimised RNNs the two-dimensional path-integration problem is effectively broken down into two one-dimensional problems. Along each of two directions a population of cells integrates motion using a standard ring attractor architecture and, due to their focus on one dimension, these cells’ tuning curves resemble band cells. Then, since these networks are trained with a bandpass filter loss which specifically encourages the formation of grid cells, section 3.2, a module of axis-aligned grid cells is generated from the band cells.
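The decomposition into two one-dimensional problems can be illustrated with a toy sketch. It replaces each ring attractor with an idealised phase variable that integrates only the velocity component along its own direction. The directions, wavelength, time step, and rectified-cosine tuning are hypothetical choices made for illustration and are not taken from Chu et al. or the trained RNNs.

```python
import numpy as np

rng = np.random.default_rng(1)
wavelength = 0.5                                    # spatial period of each band (arbitrary units)
dirs = np.array([[1.0, 0.0],                        # two unit directions, 60 degrees apart
                 [0.5, np.sqrt(3) / 2]])            # (an illustrative choice)

dt, n_steps = 0.01, 500
velocity = rng.normal(scale=0.3, size=(n_steps, 2)) # random 2D velocity sequence
position = np.cumsum(velocity * dt, axis=0)         # ground-truth trajectory

# Each 1D integrator only sees the velocity component along its own direction
# and keeps a phase on a ring (modulo 2*pi).
phase = np.zeros(2)
phases = np.empty((n_steps, 2))
for t in range(n_steps):
    phase = (phase + 2 * np.pi * (dirs @ velocity[t]) * dt / wavelength) % (2 * np.pi)
    phases[t] = phase

# Band-like tuning: activity depends on position only through its projection
# onto one direction, so firing forms stripes in 2D.
band_rates = np.maximum(np.cos(phases), 0.0)

# Check: the integrated phases match the phases computed from the true position.
expected = (2 * np.pi * position @ dirs.T / wavelength) % (2 * np.pi)
phase_error = np.abs(np.angle(np.exp(1j * (phases - expected)))).max()
print(f"max phase error: {phase_error:.2e}")
```

The two phases jointly track 2D displacement (up to the periodicity of the bands) without any cell having hexagonal tuning, which is the sense in which such a network can path-integrate without grid cells.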
As such, it seems that the brain and task-optimised RNNs with standard architectural choices use fundamentally different path-integration mechanisms. Resolving this discrepancy remains an open question.
4.3 Conclusion: Path-Integration and Axis-Aligned Grid Cells
Overall, it seems well established that RNNs optimised to perform a task that includes (1) path-integration, (2) encoding of position, and (3) biological constraints (mainly nonnegativity and low firing rates) robustly learn grid cells. However, as yet the precise structure of the set of necessary constraints is unclear, especially when using a more standard RNN architecture, and the discrepancy between velocity-update mechanisms remains puzzling.
5 Only with Nonlinear Encoding are Multimodular/Combinatorial Solutions Optimal
By encoding each position with a unique combination of cells, combinatorial codes achieve higher capacity than unimodal codes, section 2.2. However, this comes at a trade-off in ease of decoding position from such a code. In particular, here we outline how ‘linear’ approaches cannot make use of multi-field codes and instead prefer either place cells or one module of grid cells; only with more powerful ‘nonlinear’ approaches do combinatorial multifield place or multimodular grid representations become optimal. Lastly, we provide a cohesive summary of the conditions in which grid cells are optimal positional representations—nonlinear efficient codes of path-integration—and review successes at predicting the optimal size and alignment of grid modules.
5.1 Combinatorial Codes Require Nonlinearity
Consider a population of N binary neurons; assigning each position its own disjoint set of cells can encode at most N positions, one per neuron. Alternatively, a combinatorial scheme which assigns each position a unique but overlapping set of cells can produce up to 2^N unique codes, enormously expanding the set of encodable positions. It is this basic fact that makes combinatorial positional codes, be that the apparently random multi-scale code in the hippocampus (Eliav et al.) or the multimodular structure of grid cells, more effective.
Yet, using such a combinatorial code requires nonlinear processing. Imagine trying to decode whether or not you are in position x. In a simple place cell code this can be done linearly: simply check whether the place cell uniquely corresponding to x is on or off. It’s similarly easy to decode position in a rotation of a place cell code. But in a combinatorial code, x corresponds to many place cells, and each place cell corresponds to many x. Decoding x from a combinatorial code thus requires responding to a specific conjunction of place cells, and this is not something that a linear decoder can do. It requires nonlinearity.
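A small numerical sketch of this argument, under stated assumptions: binary neurons, a one-hot place code versus the full set of 2^N binary patterns, and a least-squares linear readout (with bias) onto one-hot position labels. The function names `linear_readout_error` and `conjunction_readout` are illustrative and not from the cited work.

```python
import numpy as np
from itertools import product

N = 3
place_code = np.eye(N)                                                # N positions, one neuron each
combo_code = np.array(list(product([0, 1], repeat=N)), dtype=float)   # 2**N positions

def linear_readout_error(codes):
    """Worst-case error of the best linear (plus bias) map from code to one-hot position label."""
    X = np.hstack([codes, np.ones((len(codes), 1))])
    targets = np.eye(len(codes))
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return np.abs(X @ W - targets).max()

def conjunction_readout(codes):
    """One thresholded unit per position, firing only for its exact on/off pattern."""
    W = 2 * codes - 1                  # +1 where the pattern is on, -1 where it is off
    bias = 1 - codes.sum(axis=1)       # an exact match gives 1 before the ReLU, anything else <= 0
    return np.maximum(codes @ W.T + bias, 0.0)

print(len(place_code), "positions with a place code,", len(combo_code), "with a combinatorial code")
print("linear readout error, place code:", linear_readout_error(place_code))
print("linear readout error, combinatorial code:", linear_readout_error(combo_code))
print("conjunction units recover every position:",
      np.array_equal(conjunction_readout(combo_code), np.eye(2 ** N)))
```

The linear readout is exact for the place code but cannot be for the combinatorial code, since a linear map from N neurons (plus a bias) has rank at most N + 1 while there are 2^N positions to separate. A single rectification per output unit is enough to fix this.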
‘Functionally linear’ losses prefer single grid modules
Losses that rely on linear decoding of place cells, PCA of place cells, or linear similarity objectives, such as eq. (2), struggle to profit from multimodularity. Indeed, in our previous work we demonstrated that losses that are a linear function of similarity, such as eq. (2), exhibit a failure mode: they encourage further distinguishing already well distinguished positions rather than those that are poorly distinguished. This representational pressure leads to place cells or single modules of grid cells, rather than a combinatorial code (Dorrell et al.). This finding reflects a broader pattern: all prior works that use metric encoding or nonnegative PCA of difference-of-Gaussian place cells are similarly ‘functionally linear’, and to the best of our knowledge, all works that combine such losses with path-integration lead to a single module (Sorscher et al.; Sorscher et al.; Tang, Barron, and Bogacz; Schøyen et al.; Pettersen et al.). We note that while some models do report multiple modules using these losses, they only do so by baking a multimodular structure into the code to begin with (Gao et al.; Gao et al.; Xu et al.), i.e. multiple modules do not emerge as the optimal code.
‘Functional nonlinearity’ profits from multiple modules
This failure mode of linear losses motivated us to introduce the following ‘nonlinear’ similarity matching objective (Dorrell et al.):

In this loss, if the representations of two points are already well distinguished (g(x) and g(x′) are already further apart than σ), no further gain is achieved by distinguishing them further. Instead, the code focuses its efforts on distinguishing poorly distinguished points. This encourages the formation of combinatorial codes, which make best use of the available neurons. Indeed, we know of only two normative models that derive multiple translationally symmetric modules as the optimal solution, ours (Dorrell et al.) and Schaeffer et al. Both use the nonlinear similarity matching objective we proposed, eq. (8).
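The saturation argument can be made concrete with a toy comparison. The two per-pair losses below are assumed stand-ins rather than the exact forms of eq. (2) and eq. (8): a loss linear in similarity (for unit-norm codes, similarity is 1 minus half the squared distance) and a Gaussian-saturating loss with width `sigma`. The values of `sigma`, `near`, `far`, and `delta` are arbitrary illustrative choices.

```python
import numpy as np

sigma = 0.5   # assumed lengthscale beyond which two codes count as "well distinguished"

def linear_loss(sq_dist):
    # For unit-norm codes the dot-product similarity equals 1 - sq_dist / 2,
    # so a loss linear in similarity is linear in squared distance.
    return 1.0 - sq_dist / 2.0

def saturating_loss(sq_dist):
    # Assumed saturating form: close to zero once the distance exceeds sigma.
    return np.exp(-sq_dist / (2 * sigma ** 2))

def gain_from_separating(loss, sq_dist, delta=0.1):
    """Reduction in loss from pushing a pair a further `delta` apart (in squared distance)."""
    return loss(sq_dist) - loss(sq_dist + delta)

near, far = 0.05, 1.5   # squared distances of a poorly and a well distinguished pair

for name, loss in [("linear", linear_loss), ("saturating", saturating_loss)]:
    print(name,
          "near-pair gain:", round(float(gain_from_separating(loss, near)), 4),
          "far-pair gain:", round(float(gain_from_separating(loss, far)), 4))
```

Under the linear loss both pairs offer the same reward for further separation, so nothing directs effort towards the poorly distinguished pair. Under the saturating loss almost all of the available gain comes from the near pair.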
In sum, we suggest that this division between ‘functionally linear’ and ‘functionally nonlinear’ losses (which correspond to linear and nonlinear decodability of position, respectively) can neatly explain which approaches generate single or multiple modules, depending on whether the loss is flexible enough to take full advantage of a combinatorial code.
5.2 The Interplay of Path-Integration, Nonlinear Decoders, and Resource Constraints
We are now in a position to summarise the optimality of different spatial representations as a function of a small number of key modelling choices: linear versus nonlinear loss functions, whether path integration is required, and neural resource constraints (i.e., the number of neurons; throughout, we assume nonnegative neural activity with unit norm).
One initially surprising finding is that, when many neurons are available, place cells are optimal independent of other considerations. In section 3.1 we related how place cells are the optimal nonnegative similarity matching code when there are more neurons than positions to be distinguished. We find that the same is true with a nonlinear similarity matching loss, and/or with an additional path-integration constraint (for example, by enforcing actionability, eq. (5), Dorrell et al.). We suggest this is because when there are enough neurons, simple place cell codes can tile the space at sufficient resolution.
When neurons are scarce, under linear losses place cells are optimal without a path-integration requirement, and a single module of grid cells is optimal when path-integration is required. Neither of these codes is combinatorial, as linear losses do not profit from combinatorial codes, fig. 5 top. On the other hand, with a nonlinear loss, multifield (combinatorial) place cells are optimal without a path-integration requirement, while multiple modules of axis-aligned grid cells are optimal when path-integration is required, fig. 5 bottom.

A Space of Optimal Codes.
We optimise a nonnegative, unit-norm representation of position to minimise a similarity matching objective either linear, eq. (2), or nonlinear, eq. (8), with or without a path-integrating constraint, eq. (5). With more neurons than positions all choices lead to place cells (not shown). With few neurons and no path-integration (left column) we get place cells with a linear objective, and random multifields with a nonlinear objective (see also fig 15C, Dorrell et al.). Adding a path-integration constraint leads to either one grid module for the linear similarity loss, or multiple under the nonlinear loss (for more discussion, see Dorrell et al.).
5.3 Efficient Coding using Multimodular Codes
We have discussed how combining low nonnegative firing rates with a sufficiently flexible nonlinear decoding and path-integration leads to multiple modules of translationally symmetric grid cells. We now consider one final normative question: how should these modules actually be structured? What lattice should they use (e.g. square or hexagon)? What should the relative size and orientation between modules be? And how many neurons per module?
The first forays into tackling these questions assumed a multimodular structure and then optimised the remaining parameters to maximise the mutual information between neural activity and position, through proxies such as the Fisher information. Having demonstrated that a multimodular grid code encodes space with a higher accuracy than a place cell code (Sreenivasan and Fiete; Mathis, Herz, and Stemmler), it was found that, of all lattice choices, hexagonal lattices were optimal (Mathis, Herz, and Stemmler; Mathis, Herz, and Stemmler). Subsequent related works derived similar results (Stemmler, Mathis, and Herz; Wei, Prentice, and Balasubramanian) and emphasised the effect of independent per-module noise (Towse et al.). Further, the same set of ideas has been used to suggest that fewer neurons are required in grid modules with longer lengthscales (Mosheiff et al.).
Much work then analysed the optimal choice of ratio between the lattice lengthscales of successive grid modules. Early experimental work suggested a geometric progression of lengthscales with a constant ratio of between 1.4 and 1.7 (Stensola et al.; Barry et al.), findings that were matched by multiple theoretical accounts (Wei, Prentice, and Balasubramanian; Mathis, Herz, and Stemmler). However, it remains unclear whether a geometric progression model is actually well-matched to data, especially as measuring multiple modules simultaneously is technically difficult. Indeed, recent models based on developmental arguments predict non-geometric ratios that also appear to match measurements well (Khona, Chandra, and Fiete), while our own work suggests that grid modules should be related by non-harmonic ratios (Dorrell et al.).
Grid modules are defined not only by their lengthscale, but also by their orientation relative to other modules. To understand these relative orientations, we used the same efficient coding arguments (that show multiple modules of grid cells are optimal) to predict that successive grid modules should be oriented at small angles relative to one another (Dorrell et al.), matching measurements (Stensola et al.; Lykken et al.). Finally, encoding arguments have also proved useful for understanding how grid cells code 1D space (Rebecca et al.), the alignment of grid axes to square rooms (Rebecca et al.), and the changing of grid lattice parameters to different room shapes (Stensola et al.; Dorrell et al.).
In sum, having arrived at a multimodular structure, efficient coding is a useful framework for understanding the details of the multimodular arrangement.
6 Discussion
Over a decade of normative grid cell theorising points to a core claim: grid cells form a (1) high-fidelity, (2) path-integrating, (3) biologically-plausible code for space. In contrast, normative attempts to explain grid cells without path-integration cannot match their translational symmetry, section 3; and theories using ‘overly linear’ measures of coding capacity struggle to explain multimodular structure, section 5. This coheres with mechanistic and perturbative work to support a compelling narrative regarding the grid cell code.
There remain puzzles. While models based on action dependent weight matrices recover the multi-modular axis-aligned structure of grid cells in multiple models (Dorrell et al.; Schaeffer et al.), these models are unable to model the conjunctive grid cells. Models using standard RNNs can make statements about precise velocity update mechanisms (Sorscher et al.; Schøyen et al.; Chu et al.), but do so in ways that don’t match biology (Schøyen et al.; Chu et al.), are at times badly behaved (Schaeffer, Khona, and Fiete; Schøyen et al.; Pettersen et al.), and struggle to produce multiple modules of grid cells. As such, a normative model that cohesively captures all four grid cell phenomena we began with remains at large. That said, it seems likely that a careful combination of the best parts of existing models might succeed. We now discuss two broader open questions, and a few implications of this body of work.
6.1 Future Work
Grid Cells in Other Spaces
We have focused on grid cells in 2D; a natural question is how they might behave in other spaces. Normative theories of path-integrable representations naturally generalise to other spaces, and almost always predict multiple modules of densely packed lattices (Stemmler, Mathis, and Herz; Dorrell et al.), matching similar formulations in one dimension (Aceituno, Dall’Osto, and Pisokas). However, it appears that grid cells are a bespoke 2-dimensional system: 1-dimensional maps are understood by mapping onto a slice of the grid lattice (Yoon et al.; Jacob et al.; Rebecca et al.); conversely, 3D grid cells appear to have multiple randomly scattered fields (Ginosar et al.; Grieves et al.), in contrast to both the models discussed so far and more boutique projection models (Klukas, Lewis, and Fiete). Models have been proposed that cohesively capture some aspects of both 2D and 3D coding (Ginosar et al.), but, as reviewed in appendix A, they do a poor job of fitting 2D behaviour. Whether there is some preserved structure in the 3D recordings, or a more general model that explains how grid cells encode spaces beyond 2D, remains a topic for further work.
Warping of Grid Cells to Environments or Rewards
One finding is that grid cells don’t always look so… griddy. In trapezoidal environments the lattice bends along the walls (Krupic et al.), the lattice lengthscale gets smaller near boundaries (Hägglund et al.), in large environments there are often inhomogeneities (Stensola et al.; Gutiérrez-Guzmán, Hernández-Pérez, and Dannenberg) (though these sometimes disappear with experience; Carpenter et al.), grid fields warp in response to rewards (Boccara et al.), and the grid metric stretches in inhomogeneous environments (Wen et al.). Some models have taken this at face value, and attempted to normatively explain the warped grid responses, for example as the optimal code for uncertainty (Kang, Wolpert, and Lengyel). Others have argued that the warping is the effect of an optimally mixed encoding of additional variables beyond space (Whittington et al.; Dorrell et al.). A final approach models these effects as a re-centering of the grid code in response to an external cue, such as a boundary (Ocko et al.). Since these last two approaches understand inhomogeneities through perturbations to an underlying pure grid cell code, they are consistent with existing normative theories. Indeed, the observed rate maps could represent a pure grid code after a spatially dependent re-centering operation, making perfect grids appear bent in some environments or towards some rewards. However, the same is not true of the first model, and, as yet, no model is able to bridge these two domains clearly.
6.2 Some Implications
How constrained are these ideas?
Across this body of work, the way in which the three ideas (‘high-fidelity’, ‘path-integrable’, and ‘biological’) have been formalised has varied. This is a good thing, demonstrating robustness to ad hoc modelling choices. However, some recurring motifs stand out. In all cases, the biological constraints limit the capacity of the system (e.g. by limiting the range of firing rates), and ensure the problem is not rotationally invariant, using a nonnegativity constraint either on neural firing or on weights. Similarly, path-integration always implies some mechanism for forward modelling: predicting the next encoding from your previous encoding and an action. Finally, the implementation of a high-fidelity code has relied on some form of ‘functional nonlinearity’ in the decoding loss.
Single Neurons are Pleasingly Constraining
Broadly, it is unclear how much measuring a small number of single neurons can reliably guide our understanding of the brain (Whittington and Dorrell). Alternative approaches advocate for studying population-level metrics (e.g. Stringer et al.). There are only tens of thousands of grid cells in a rat (using estimates from Clark and Nolan; Gatome et al.; Diehl et al.), yet reviewing this literature we see that measuring them has been incredibly constraining. Fitting just four high-level properties of the system has identified a core set of computational principles across models, and has proved adept at discounting alternative hypotheses. This is a ringing endorsement for the plodding progress of standard neuroscience.
RNNs as neural models
Using task-optimised neural networks as neural models is somewhat controversial; in complex tasks they are often as confusing as the brain (Banino et al.), limiting the insights we can gain from them. Yet the grid cell literature presents a compelling case for their power when coupled with clear experimentation, and thorough analysis. Task-optimised networks permit you to try a variety of hypotheses relatively quickly and flexibly. Their downside is that the signal you measure might have been caused by any number of choices made in architecture, training, or regularisation, and it is often hard to test for all of these. Simplifying the model to the point where theoretical work is possible can provide insight, allowing fine-tuning of the RNN experiments. For grid cells, iterations of this cycle seem to have nearly converged. We are optimists, and hope this will be more broadly true, suggesting a version of ‘analytic connectionism’ that pairs careful theory and network modelling. Yet, we note that in the grid cell world this has already taken a decade of intense arguments: it is not necessarily easy.
The Power of Normative Modelling
Early work demonstrated that multimodular grid cells are a much more informative code for space than place cells (Mathis, Herz, and Stemmler), leading to a view of grid cells as an efficient code for space. We hope this review has disabused you of this notion: grid cells are an efficient, but not the most efficient, code for space. Random multifield place cells are the most efficient code, fig. 5; grid cells are instead the most efficient path-integrating code for space. This highlights a role for normative modelling: by searching amongst all possible codes we are forced to consider all alternatives, highlighting how, if the only goal was efficiency, the best choice would never be grid cells. This null result cleanly highlights a key missing ingredient: path-integration.
6.3 Conclusion
In conclusion, the manifold structures present in the grid cell system have provided impressive constraints for normative theorising. After much work, the field has settled on a consistent set of normative theories: grid cells are a high-fidelity, path-integrable, biological (i.e. constrained and axis-dependent) code for space, agreeing with mechanistic and experimental work. In the future we hope these insights will generalise to grid cells in more complex settings, other neural systems, and provide broad lessons for successful normative theorising.
Code
A simple jupyter notebook to generate the optimal representations in fig. 5 and fig. 6 can be found at https://github.com/WilburDoz/If_Grid_Cells_are_the_answer_what_is_the_Question.git.
Supplementary material
A Hexagonal Lattices via Dense Packing Arguments
Hexagonal lattices are the densest packing of spheres in 2D space, or analogously, the best arrangement of sensors to minimise the average distance between all points in 2D space and the nearest sensor. One family of efficient-coding-only approaches use this idea to produce hexagonally tuned cells.
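The packing claim can be checked with a short worked calculation. Place equal, non-overlapping discs on a lattice with unit nearest-neighbour spacing; the packing density is the disc area divided by the area of one lattice cell. The function name `packing_density` is an illustrative choice.

```python
import math

def packing_density(cell_area, spacing=1.0):
    """Fraction of the plane covered by touching discs centred on a lattice.

    `spacing` is the nearest-neighbour distance, so each disc has radius spacing / 2;
    `cell_area` is the area of the lattice's unit cell (one disc per cell).
    """
    radius = spacing / 2
    return math.pi * radius ** 2 / cell_area

square = packing_density(cell_area=1.0)                  # unit square cell
hexagonal = packing_density(cell_area=math.sqrt(3) / 2)  # rhombic cell of the hexagonal lattice

print(round(square, 4), round(hexagonal, 4))  # 0.7854 0.9069
```

The hexagonal lattice covers about 90.7% of the plane compared with about 78.5% for the square lattice, which is the quantitative sense in which it is the denser arrangement.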
Mok and Love argue that place cells form a conceptual clustering of inputs: which place cell is active for each input corresponds to its cluster, and the quality of the encoding is given by the resolution of the clustering (i.e. the best clustering would give every input its own cluster, the worst would assign all inputs to one cluster). They argue that space can be thought of as a uniform continuum of inputs to be explained, and that, thanks to dense packing, the optimal choice of a finite set of place cells (clusters) is a hexagonal grid. They then argue that grid cells are a measure of proximity between points in space and their nearest cluster, which in this model is a measure of how well fit that point is by the learnt clusters. Since the data is best explained at cluster centres, this forms a hexagonal lattice.
Ginosar et al., prompted by their discovery of non-periodic encodings by grid cells of three-dimensional space (see discussion), present a parsimonious model that explains both 3D and 2D representations. They model grid fields as particles that repel each other at short distances and attract at intermediate distances; the dynamics then push the particles towards lower energy states, and the optimal state is a dense packing. Matching neural observations, running these dynamics in 2D leads to densely packed hexagonal lattices, while in 3D it often leads to jammed sub-optimal solutions without global periodic structure.
A slightly related idea appears in Huber. In this memory model the classic roles of place and grid cells are reversed: place cells encode where a memory happens (a conjunction of a thing and a place) while grid cells encode the thing that is happening. Grid tuning curves are produced by arguing that the grid cell is encoding a variable that is uniform across space. The model then assumes that inputs that are nearby in space will be grouped into the same memory, while those beyond a critical distance will trigger a new memory. These dynamics lead to a hexagonal lattice receptive field, which can be understood via dense packing.
Despite the elegant simplicity of these approaches, simple functional questions remain non-obvious and key phenomena unexplained. Most pertinently for our current argument, no approach naturally incorporates the translational symmetry of a grid module: in Mok and Love or Huber it is not obvious why grid cells would code for a translated version of either the conceptual fit to data or a set of memories, while in Ginosar et al. some mechanism would be required to align these densely packed lattices across neurons. Similarly unclear is why there are modules with a discrete set of lengthscales or conjunctive grid cells. Finally, it is unclear why we should think of grid cells as a measure of hippocampal fit, as a discretised version of a uniform variable, or as a set of repulsing particles, when more compelling narratives exist. Nonetheless, in conjunction with other ideas, dense packing does explain the choice of hexagonal lattice in many models (Stemmler, Mathis, and Herz; Dorrell et al.).
B Efficient Coding Metric Loss with Large Lengthscale Produces Place Cells

We optimise a metric encoding loss, eq. (3), with large σ and find the optimal representation is place cells, matching the correspondence with the similarity matching objective, section 3.1.
We use a periodic environment for convenience, hence the multiple patches observed correspond to parts of the same field.
Acknowledgements
We thank Ben Sorscher, Mikhail Khona, Rylan Schaeffer, Tim Behrens, and Peter Doohan for reading earlier drafts of this work, and especially highlight Charles Burns and Markus Pettersen for their detailed and helpful comments.
We thank the following funding sources: Gatsby Charitable Foundation (GAT3755; W.D.); Sir Henry Wellcome Post-doctoral Fellowship (222817/Z/21/Z; J.C.R.W); European Research Council Starting Grant (NARFB/101222868; J.C.R.W).
References
- Theoretical principles explain the structure of the insect head direction circuiteLife 13:e91533https://doi.org/10.7554/eLife.91533PubMedGoogle Scholar
- Towards a theory of early visual processing. Neural computation 2:308–320. https://doi.org/10.1162/neco.1990.2.3.308
- Vector-based navigation using grid-like representations in artificial agents. Nature 557:429–433. https://doi.org/10.1038/s41586-018-0102-6
- Experience-dependent rescaling of entorhinal grids. Nature neuroscience 10:682–684. https://doi.org/10.1038/nn1905
- The entorhinal cognitive map is attracted to goals. Science 363:1443–1447. https://doi.org/10.1126/science.aav4837
- Accurate path integration in continuous attractor network models of grid cells. PLoS computational biology 5:e1000291. https://doi.org/10.1371/journal.pcbi.1000291
- Grid cells and theta as oscillatory interference: theory and predictions. Hippocampus 18:1157–1174. https://doi.org/10.1002/hipo.20518
- An oscillatory interference model of grid cell firing. Hippocampus 17:801–812. https://doi.org/10.1002/hipo.20327
- A hybrid oscillatory interference/continuous attractor network model of grid cell firing. Journal of Neuroscience 34:5065–5079. https://doi.org/10.1523/jneurosci.4017-13.2014
- Grid cells form a global representation of connected environments. Current Biology 25:1176–1182. https://doi.org/10.1016/j.cub.2015.02.037
- Unfolding the Black Box of Recurrent Neural Networks for Path Integration. bioRxiv. https://doi.org/10.1101/2025.10.25.684492
- Task-anchored grid cell firing is selectively associated with successful path integration-dependent behaviour. eLife 12:RP89356. https://doi.org/10.7554/eLife.89356
- A controlled attractor network model of path integration in the rat. Journal of computational neuroscience 18:183–203. https://doi.org/10.1007/s10827-005-6558-z
- Emergence of grid-like representations by training recurrent neural networks to perform spatial localization. arXiv preprint arXiv:1803.07770. https://doi.org/10.48550/arxiv.1803.07770
- Higher-Order Spatial Information for Self-Supervised Place Cell Learning. arXiv preprint arXiv:2407.06195.
- Grid and nongrid cells in medial entorhinal cortex represent spatial location and environmental features with complementary coding schemes. Neuron 94:83–92. https://doi.org/10.1016/j.neuron.2017.03.004
- Extracting grid cell characteristics from place cell inputs using non-negative principal component analysis. eLife 5:e10094. https://doi.org/10.7554/eLife.10094
- Actionable Neural Representations: Grid Cells from Minimal Constraints. The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=xfqDe72zh41
- Range, not Independence, Drives Modularity in Biologically Inspired Representations. The Thirteenth International Conference on Learning Representations.
- Multiscale representation of very large environments in the hippocampus of flying bats. Science 372:eabg4020. https://doi.org/10.1126/science.abg4020
- A spin glass model of path integration in rat medial entorhinal cortex. Journal of Neuroscience 26:4266–4276. https://doi.org/10.1523/jneurosci.4353-05.2006
- Learning grid cells as vector representation of self-position coupled with matrix representation of self-motion. International Conference on Learning Representations.
- On Path Integration of grid cells: isotropic metric, conformal embedding and group representation. Advances in neural information processing systems 34.
- Number estimates of neuronal phenotypes in layer II of the medial entorhinal cortex of rat and mouse. Neuroscience 170:156–165. https://doi.org/10.1016/j.neuroscience.2010.06.048
- Impaired path integration in mice with disrupted grid cell firing. Nature neuroscience 21:81–91. https://doi.org/10.1038/s41593-017-0039-3
- Locally ordered representation of 3D space in the entorhinal cortex. Nature 596:404–409. https://doi.org/10.1038/s41586-021-03783-x
- Computation by oscillations: implications of experimental data for theoretical models of grid cells. Hippocampus 18:1186–1199. https://doi.org/10.1002/hipo.20501
- Irregular distribution of grid cell firing fields in rats exploring a 3D volumetric space. Nature neuroscience 24:1567–1573. https://doi.org/10.1038/s41593-021-00907-4
- A model of grid cells based on a twisted torus topology. International journal of neural systems 17:231–240. https://doi.org/10.1142/s0129065707001093
- Tiling of large-scaled environments by grid cells requires experience. bioRxiv. https://doi.org/10.1101/2025.02.16.638536
- Hippocampus-independent phase precession in entorhinal grid cells. Nature 453:1248–1252. https://doi.org/10.1038/nature06957
- Microstructure of a spatial map in the entorhinal cortex. Nature 436:801–806. https://doi.org/10.1038/nature03721
- Grid-cell distortion along geometric borders. Current Biology 29:1047–1054. https://doi.org/10.1016/j.cub.2019.01.074
- Dorsal CA1 hippocampal place cells form a multi-scale representation of megaspace. Current Biology 31:2178–2190. https://doi.org/10.1016/j.cub.2021.03.003
- Grid cell mechanisms and function: contributions of entorhinal persistent spiking and phase resetting. Hippocampus 18:1213–1229. https://doi.org/10.1002/hipo.20512
- A memory model of rodent spatial navigation in which place cells are memories arranged in a grid and grid cells are non-spatial. eLife 13:RP95733. https://doi.org/10.7554/eLife.95733
- Mechanisms underlying the neural computation of head direction. Annual review of neuroscience 43:31–54. https://doi.org/10.1146/annurev-neuro-072116-031516
- Universal conditions for exact path integration in neural systems. Proceedings of the National Academy of Sciences 109:6716–6720. https://doi.org/10.1073/pnas.1119880109
- Path integration maintains spatial periodicity of grid cell firing in a 1D circular track. Nature communications 10:840. https://doi.org/10.1038/s41467-019-08795-w
- Disrupted place cell remapping and impaired grid cells in a knockin model of Alzheimer’s disease. Neuron 107:1095–1112. https://doi.org/10.1016/j.neuron.2020.06.023
- A geometric attractor mechanism for self-organization of entorhinal grid modules. eLife 8:e46687. https://doi.org/10.7554/eLife.46687
- Spatial uncertainty and environmental geometry in navigation. bioRxiv. https://doi.org/10.1101/2023.01.30.526278
- Global modules robustly emerge from local interactions and smooth gradients. Nature :1–10. https://doi.org/10.1038/s41586-024-08541-3
- Ring attractor dynamics in the Drosophila central brain. Science 356:849–853. https://doi.org/10.25378/janelia.5648314.v1
- Efficient and flexible representation of higher-dimensional cognitive variables with grid cells. PLoS computational biology 16:e1007796. https://doi.org/10.1371/journal.pcbi.1007796
- Grid cell symmetry is shaped by environmental geometry. Nature 518:232–235. https://doi.org/10.1038/nature14153
- Linear look-ahead in conjunctive cells: an entorhinal mechanism for vector-based navigation. Frontiers in neural circuits 6:20. https://doi.org/10.3389/fncir.2012.00020
- Reduced grid-cell-like representations in adults at genetic risk for Alzheimer’s disease. Science 350:430–433. https://doi.org/10.1126/science.aac8128
- A simple coding procedure enhances a neuron’s information capacity. Zeitschrift für Naturforschung c 36:910–912.
- Thalamic control of cortical dynamics in a model of flexible motor sequencing. Cell reports 35. https://doi.org/10.1016/j.celrep.2021.109090
- Functional independence of entorhinal grid cell modules enables remapping in hippocampal place cells. bioRxiv. https://doi.org/10.1101/2025.09.24.677985
- Towards a theory for the emergence of grid and place cell codes. PhD thesis, Massachusetts Institute of Technology.
- Optimal population codes for space: grid cells outperform place cells. Neural computation 24:2280–2317. https://doi.org/10.1162/neco_a_00319
- Multiscale codes in the nervous system: the problem of noise correlations and the ambiguity of periodic scales. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 88:022713. https://doi.org/10.1103/physreve.88.022713
- Resolution of nested neuronal representations can be exponential in the number of neurons. Physical review letters 109:018103. https://doi.org/10.1103/physrevlett.109.018103
- Path integration and the neural basis of the 'cognitive map'. Nature Reviews Neuroscience 7:663–678. https://doi.org/10.1038/nrn1932
- A non-spatial account of place and grid cells based on clustering models of concept learning. Nature communications 10:5685. https://doi.org/10.1038/s41467-019-13760-8
- An efficient coding theory for a dynamic trajectory predicts non-uniform allocation of entorhinal grid cells to modules. PLoS computational biology 13:e1005597. https://doi.org/10.1371/journal.pcbi.1005597
- Emergent elasticity in the neural code for space. Proceedings of the National Academy of Sciences 115:E11798–E11806. https://doi.org/10.1073/pnas.1805959115
- Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381:607–609. https://doi.org/10.1038/381607a0
- Feedback inhibition enables theta-nested gamma oscillations and grid firing fields. Neuron 77:141–154. https://doi.org/10.1016/j.neuron.2012.11.032
- Self-supervised grid cells without path integration. bioRxiv. https://doi.org/10.1101/2024.05.30.596577
- Spatial periodicity in grid cell firing is explained by a neural sequence code of 2-D trajectories. eLife 13:RP96627. https://doi.org/10.7554/eLife.96627
- A coupled attractor model of the rodent head direction system. Network: computation in neural systems 7:671.
- Grid cells in rat entorhinal cortex encode physical space with independent firing fields and phase precession at the single-trial level. Proceedings of the National Academy of Sciences 109:6301–6306. https://doi.org/10.1073/pnas.1109599109
- Large environments reveal the statistical structure governing hippocampal representations. Science 345:814–817. https://doi.org/10.1126/science.1255635
- Path integration and cognitive mapping in a continuous attractor neural network model. Journal of Neuroscience 17:5900–5920. https://doi.org/10.1523/jneurosci.17-15-05900.1997
- Conjunctive representation of position, direction, and velocity in entorhinal cortex. Science 312:758–762. https://doi.org/10.1126/science.1125572
- No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit. ICML 2022 Workshop AI4Science.
- Disentangling Fact from Grid Cell Fiction in Trained Deep Path Integrators. arXiv preprint arXiv:2312.03954. https://doi.org/10.48550/arxiv.2312.03954
- Self-supervised learning of representations for space generates multi-modular grid cells. Advances in Neural Information Processing Systems 36:23140–23157.
- Testing assumptions underlying a unified theory for the origin of grid cells. arXiv preprint arXiv:2311.16295. https://doi.org/10.48550/arxiv.2311.16295
- Coherently remapping toroidal cells but not grid cells are responsible for path integration in virtual agents. Iscience 26. https://doi.org/10.1016/j.isci.2023.108102
- Hexagons all the way down: Grid cells as a conformal isometric map of space. PLOS Computational Biology 21:e1012804. https://doi.org/10.1371/journal.pcbi.1012804
- Manifold-tiling localized receptive fields are optimal in similarity-preserving neural networks. Advances in neural information processing systems 31.
- A model of the neural basis of the rat’s sense of direction. Advances in neural information processing systems 7.
- A unified theory for the computational and mechanistic origins of grid cells. Neuron 111:121–137. https://doi.org/10.1016/j.neuron.2022.10.003
- A unified theory for the origin of grid cells through the lens of pattern formation. Advances in neural information processing systems 32.
- When and why grid cells appear or not in trained path integrators. bioRxiv. https://doi.org/10.1101/2022.11.14.516537
- Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature neuroscience 14:1330–1337. https://doi.org/10.1038/nn.2901
- The hippocampus as a predictive map. Nature neuroscience 20:1643–1653. https://doi.org/10.1038/nn.4650
- Spatial memory in the rat requires the dorsolateral band of the entorhinal cortex. Neuron 45:301–313. https://doi.org/10.1016/j.neuron.2004.12.044
- Connecting multiple spatial scales to decode the population activity of grid cells. Science Advances 1:e1500816. https://doi.org/10.1126/sciadv.1500816
- The entorhinal grid map is discretized. Nature 492:72–78. https://doi.org/10.1038/nature11649
- Shearing-induced asymmetry in entorhinal grid cells. Nature 518:207–212. https://doi.org/10.1038/nature14151
- High-dimensional geometry of population responses in visual cortex. Nature 571:361–365. https://doi.org/10.1038/s41586-019-1346-5
- Learning grid cells by predictive coding. arXiv preprint arXiv:2410.01022. https://doi.org/10.48550/arxiv.2410.01022
- Cognitive maps in rats and men. Psychological review 55. https://doi.org/10.1037/h0061626
- Theory of rodent navigation based on interacting representations of space. Hippocampus 6:247–270. https://doi.org/10.1002/(sici)1098-1063(1996)6:3<247::aid-hipo4>3.0.co;2-k
- Optimal configurations of spatial scale for grid cell firing under noise and uncertainty. Philosophical Transactions of the Royal Society B: Biological Sciences 369:20130290. https://doi.org/10.1098/rstb.2013.0290
- Distinct roles of medial and lateral entorhinal cortex in spatial cognition. Cerebral Cortex 23:451–459. https://doi.org/10.1093/cercor/bhs033
- Left-right-alternating theta sweeps in entorhinal-hippocampal maps of space. Nature :1–11. https://doi.org/10.1038/s41586-024-08527-1
- Transition scale-spaces: A computational theory for the discretized entorhinal cortex. Neural computation 32:330–394. https://doi.org/10.1162/neco_a_01255
- A principle of economy predicts the functional architecture of grid cells. eLife 4:e08362. https://doi.org/10.7554/eLife.08362
- One-shot entorhinal maps enable flexible navigation in novel environments. Nature 635:943–950. https://doi.org/10.1038/s41586-024-08034-3
- Generalisation of structural knowledge in the hippocampal-entorhinal system. Advances in neural information processing systems 31. https://doi.org/10.48550/arxiv.1805.09042
- How much neuroscience does a neuroscientist need to know? arXiv preprint arXiv:2601.02063. https://doi.org/10.48550/arxiv.2601.02063
- Disentanglement with biological constraints: A theory of functional cell types. The Eleventh International Conference on Learning Representations.
- The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell 183:1249–1263. https://doi.org/10.1016/j.cell.2020.10.024
- On conformal isometry of grid cells: Learning distance-preserving position embedding. The Thirteenth International Conference on Learning Representations.
- A Theory of Usable Information under Computational Constraints. International Conference on Learning Representations. https://doi.org/10.48550/arxiv.2002.10689
- Disruption of the grid cell network in a mouse model of early Alzheimer’s disease. Nature Communications 13:886. https://doi.org/10.1038/s41467-022-28551-x
- Grid cell responses in 1D environments assessed as slices through a 2D lattice. Neuron 89:1086–1099. https://doi.org/10.1016/j.neuron.2016.01.039
- Prediction and Generalisation over Directed Actions by Grid Cells. arXiv preprint arXiv:2006.03355. https://doi.org/10.48550/arxiv.2006.03355
Cite all versions
You can cite all versions using the DOI https://doi.org/10.7554/eLife.111058. This DOI represents all versions, and will always resolve to the latest one.
Copyright
© 2026, William Dorrell & James Whittington
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.