1 Introduction

Being able to accurately determine your location in an environment is an essential skill shared by any navigating system, animal or machine. Hippocampal place cells [1] are believed to be crucial for this ability in animals. Place cells get their name from their distinct spatial tuning: a single place cell tends to fire only in select locations within a given recording environment [2], [3].

When an animal is moved between different recording arenas, or a familiar environment is significantly manipulated, place cells can undergo global remapping [4], wherein spatial responses are uncorrelated across environments. For less severe changes to the environment (e.g., mild changes in smell or color), place cells can also exhibit less drastic tuning curve changes in the form of partial [5], rate [6] and orientation [7] remapping. Furthermore, geometric modifications of a recording environment elicit distinct place field changes. For example, elongating an environment induces field elongation [8]. Adding a novel wall to a familiar environment may spur so-called field doubling [9], where a second place field emerges, situated at the same distance from the new wall as the original field was from the original wall.

Since the discovery of place cells, a range of other neuron types with navigational behavior correlates have been discovered experimentally. These include head direction cells [10], grid cells [11], border cells [12], [13], band cells [14] and object vector cells [15]. Some of these spatial cells can also exhibit changes in their firing profile when an animal is moved between different recording arenas or a familiar environment is sufficiently manipulated [10], [16].

How does the orchestra of spatial cell types observed in the brain cooperate to support navigation? One popular theory posits that spatial cells collectively set up cognitive maps of the animal's surroundings [17]–[19]. In the past, the term cognitive map has been used colloquially, referring to everything from a neural representation of geometry to charts of social relationships [17], [19]–[22]. In this work, we make the intuitive notion of a spatial cognitive map precise by proposing a mathematical definition of it. As we will show, this definition serves as a foundation for developing models of spatial cell types and can be used to describe several normative models in the literature.

A range of models have already been proposed in an attempt to explain the striking spatial tuning and remapping behaviors exhibited by place cells. One prominent theory holds that place cell activity results from upstream input from grid cells in the medial Entorhinal Cortex (mEC) [5], [23], [24]. However, there are several experimental findings that challenge this so-called forward theory. For instance, place cells tend to mature prior to grid cells in rodent development [25], [26]. Also, place cell inactivation has been associated with abolished grid cell activity, rather than the other way around [27]. Another approach is to suggest that non-grid spatial cells are responsible [9], [27], [28]. However, the exact origins of place fields and their remapping behavior remain undetermined.

How, then, would one go about modeling place cells in a way that allows for discovering how place fields emerge, how remapping occurs, and how different cell types relate? An exciting recent alternative is to leverage normative models of the Hippocampus and surrounding regions, using neural networks optimized for a navigation task. When trained, such models learn tuning profiles similar to their biological counterparts [22], [29]–[35]. To the best of our knowledge, however, no normative models have tackled the problem of directly learning place cell formation and remapping. Some models address remapping, but do so for other cell types or brain regions [22], [35]–[37].

Using our definition of a cognitive map, we therefore propose a normative model of spatial navigation with the flexibility required to study place cells and remapping in one framework. In our model, the output representations of a neural network are decoded into a position estimate, and the network is tasked with accurate position reconstruction while path integrating. Crucially, the non-trainable decoding operation is inspired by the localized firing patterns of place cells, but imposes minimal constraints on individual tuning profiles and population coding properties.

We find that our model learns representations with spatial tuning profiles similar to those found in the mammalian brain, including place units in the downstream output layer and predominantly border-tuned units in the upstream recurrent layer. We thus find that border representations are the main spatially tuned basis for forming place cell representations, aligning with previous mechanistic theories of place cell formation from border cells [9].

Interestingly, our model does not learn grid-like representations despite being able to path integrate. Thus, our work raises questions about the necessity of grid cells for path integration. However, we find that the centers of the learned place fields arrange on a hexagonal lattice in open arenas. This indicates that although grid-like cells are not necessary to form place cells, optimal position decoding still dictates hexagonal symmetry. Inspired by this, we decode center locations for CA1 place fields in mice (data provided by [38]), and find that biological place cells exhibit clustering in a manner similar to our model.

We train our model in multiple environments and observe that the network learns global, rate and geometric remapping akin to biological place cells. We find that remapping in the place-like units of the network can be understood as a consequence of sparse input from near-independent sets of upstream, rate-remapping boundary-tuned units. Thus, we show that border cell input can explain not only place field formation, but also remapping.

2 Results

2.1 Decoding the Cognitive Map

The foundation for the proposed place cell model is a learned cognitive map of space. We define a spatial cognitive map as a (vector-valued) function û ∈ ℝ^N that minimizes

$$\ell\big(u(x_t),\, \hat{u}(z_t)\big) + R(\hat{u}), \quad (1)$$

where u(x_t) ∈ ℝ^M is some target spatial representation at a true location x_t (e.g. x_t ∈ ℝ²) at a particular time t, while û is the learned representation, constrained according to some conditions R. Lastly, z_t is a latent position estimate corresponding to x_t. In our case, we consider a recurrently connected neural network architecture navigating along simulated trajectories. As such, z_t can be thought of as the network's (internal) position estimate at a particular trajectory step, formed by integrating earlier locations and velocities. For details, refer to Model & Objective.

Each entry in û can be viewed as the firing rate of a unit in an ensemble of N simulated neurons. On the other hand, u is an alternative representation of the space that we wish to represent. In machine learning terms, ℓ is the loss function, while R is a regularization term. In our case, we want ℓ to gauge the similarity between the learned and target representations, and R to impose biological constraints on the learned û.

The target representation u does not need to be of the same dimensionality as û, or even particularly biologically plausible. This is evident in several prominent models in the literature [29]–[31], [39], which can be accommodated by the proposed definition in Eq. (1) (see A Taxonomy of Cognitive Maps for complete descriptions). As an example, Cueva et al. trained a recurrent neural network (RNN) to minimize the mean squared error between a Cartesian coordinate target representation and a predicted coordinate decoded from the neural network [29]. Remarkably, by adding biologically plausible constraints (including "energy" constraints and noise injection) to this network, the authors found that the learned representations resembled biological (albeit square) grid cells.

As the goal of this work is to arrive at a model of place cells, we will denote the learned representation as p. We take p to be produced by a neural network with non-negative firing rates, whose architecture is illustrated in Fig. 1a). Specifically, the network features recurrently connected units (with states g) that project onto output units (with states p), in loose analogy to the connectivity of the Entorhinal Cortex and CA1 subfield of the Hippocampus [4], [40].

The model and task.

a) Overview of the decoding approach: Given a simulated trajectory with coordinates x, the output states of the network are decoded in terms of their spatial center locations µ, which in turn are used to decode an estimate x̂ of the current location. The network is trained to minimize the squared difference between true and decoded positions. b) Illustration of the proposed decoding procedure. For a single unit, the center location is estimated as the average location, weighted by the unit activity along a trajectory. By iterating this procedure, every unit can be assigned a center location. A location can then be estimated as the average center location, weighted by the activity of the corresponding unit at a particular time. Repeating this for every timestep, full trajectories can be reconstructed. c) The investigated geometries, each with an example simulated trajectory. Each environment is labelled by its context signal (one-hot vector). d) Illustration of the network architecture and inputs. g features recurrently connected units, while p receives densely connected feedforward input from g. When moved between environments, the state of the RNN is maintained (g_prev). The input v denotes Cartesian velocities along simulated trajectories, while c is a constant (in time and space) context signal.

We constrain the "energy" of the learned representations by imposing an L1 penalty on the recurrent state magnitude (see Methods), use Cartesian coordinates as our target representation, and the mean squared error as our loss. In other words,

$$\mathcal{L} = \frac{1}{T} \sum_{t=1}^{T} \Big( \big| x_t - \hat{x}_t \big|_2^2 + \lambda \big| g_t \big|_1 \Big), \quad (2)$$

where x_t is a true Cartesian coordinate, |·|_p denotes the p-norm and λ is a regularization hyperparameter. Crucially, however, Cartesian coordinates are not predicted directly by the network, but are decoded from the population activity of the output layer. This decoder is non-trainable and inspired by the localized firing profile of place cells. Concretely, we form a position estimate x̂_t directly from the population activity, according to

$$\hat{x}_t = \frac{\sum_{i} p_i(t)\, \hat{\mu}_i}{\varepsilon + \sum_{i} p_i(t)}, \quad (3)$$

where ε is a small constant to prevent zero-division, while

$$\hat{\mu}_i = \frac{\sum_{t} p_i(t)\, x_t}{\varepsilon + \sum_{t} p_i(t)} \quad (4)$$

is the estimated center of a given output unit. Note that the decoding essentially just consists of two soft maximum operations: Eq. (4) estimates the location of a cell's maximal activity and Eq. (3) yields a predicted position using a weighted average (i.e. an approximate center of mass) of unit activity and their corresponding center locations.

Intuitively, if one cell is highly active at a particular location, its center location will be pulled toward that position. If the centers of the entire ensemble can be established, a position estimate can then be formed as a weighted (by firing rate) sum of the ensemble activity. If multiple units in a particular region of space are co-active, the position estimate is pulled towards the (weighted) average of their center locations. This approach allows us to extract a position estimate directly from the neural ensemble without knowing the shape or firing characteristics of a given unit.
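As a concrete illustration, the snippet below implements the two decoding steps of Eqs. (3) and (4) in NumPy. It is a minimal sketch under the definitions above, not the released implementation; array names and example shapes are our own.

```python
import numpy as np

def decode_centers(rates, positions, eps=1e-8):
    """Eq. (4): each unit's center is its activity-weighted average location."""
    # rates: (T, N) non-negative unit activities; positions: (T, 2) true coordinates.
    weights = rates / (eps + rates.sum(axis=0, keepdims=True))  # normalize over time
    return weights.T @ positions  # (N, 2) center estimates

def decode_position(rates_t, centers, eps=1e-8):
    """Eq. (3): position estimate as the activity-weighted average center."""
    # rates_t: (N,) unit activities at a single timestep; centers: (N, 2).
    return rates_t @ centers / (eps + rates_t.sum())

# Illustrative usage with random data (shapes only).
rng = np.random.default_rng(0)
positions = rng.uniform(0, 1, size=(1000, 2))  # T = 1000 trajectory points
rates = rng.random(size=(1000, 100))           # N = 100 output units
centers = decode_centers(rates, positions)
x_hat = decode_position(rates[0], centers)
```

The same small constant guards both normalizations, mirroring ε in Eqs. (3) and (4).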

Fig. 1a) provides a high-level overview of the proposed decoding scheme and the network explored in this work, and Fig. 1b) provides a more detailed account of how output unit activity is decoded to estimate the network’s position.

The network is tasked with minimizing Eq. (2) while simultaneously path integrating (see 4.1 for details) along simulated trajectories in six distinct environments. Each environment, along with an example trajectory, is shown in Fig. 1c). To discriminate between environments, the network is also provided with a constant context signal that uniquely identifies the geometry. An overview of the network architecture and inputs is given in Fig. 1d), and each context signal is inset in Fig. 1c).

2.2 Learned Representations and Remapping

With a model in place, we proceed by investigating the learned representations and behaviors of the trained network. Fig. 2a) shows the evolution of the decoding error (the average Euclidean distance between true and predicted trajectories) as a function of training time for the RNN. The validation set error closely trails the training error, and appears to converge around 0.15. The error is computed as an average over all six environments and over full trajectories (time). Thus, the decoding error includes initial timesteps, where the network has no positional information. Disentangled errors for each environment and along trajectories (time) can be seen in supplementary Fig. A1, showing how different environments have different error profiles, and how errors decrease after some initial exploration. This can also be seen in Fig. 2b), which showcases a slice of a true and corresponding predicted trajectory for timesteps 250 to 400 in the square environment.

Trained network performance and representations.

a) Euclidean distance (error) between target and reconstructed trajectories over training time. Shown is the error for both training and validation datasets. b) A slice (timesteps 250 to 400) of a decoded trajectory (dashed, red) and the corresponding target trajectory (black). c) Ratemaps for the 16 output units with the largest mean activity, in the square environment. d) Same as c), but for recurrent units.

Having established that network predictions align with target trajectories (confirming that the network has learned to path integrate), we consider the learned representations of the network. Fig. 2c) displays ratemaps of the 16 most active output units in the square environment. Notably, the responses of these units resemble biological place cell tuning curves. The learned place fields appear unimodal and isotropic, with firing rates decaying monotonically from their centers, much like a Gaussian profile. However, some units display more irregular fields, especially near boundaries and corners. Responses of the most active recurrent units resemble biological border cells (Fig. 2d)).

For both the output and recurrent layers, a large fraction of units are silent, or display no clear spatial tuning (see Fig. A2 and A3 for ratemaps in all environments). For example, in the square arena, approximately half of all output units are silent.

Interestingly, when comparing network spatial responses across environments, units display changes in their tuning curves. This effect can be clearly observed in the unit ratemaps shown in Fig. 3a). In the numerical experiment, the trained network is first run in the square environment (A), before being moved to the square with a central wall (B), and subsequently returned to the original square (A'). Visually, many output units exhibit marked shifts in their preferred firing location when the network is moved between contexts (i.e. transfers A to B or B to A'). However, returning to the original context appears to cause fields to revert to their original preferred firing locations. Besides firing location modifications, units also exhibit distinct rate changes.

Comparing representations across environments.

a) Top: The network is run in a familiar square environment (A), transferred to the square with a central wall (B) and revisits the original square (A’). The network state persists between environments, and starting locations are randomly drawn. Bottom: i) Ratemaps for a subset of recurrent units (g) with largest minimum mean rate across arenas. Rows represent unit activity, with max rate inset on the right. ii) Same as i), for output units. b) Distribution of spatial correlations comparing ratemaps from active units across similar contexts (A, A’) and distinct contexts (A, B). Shuffled distributions are randomly paired units across environments. The dashed red line indicates the 95th percentile of the shuffled distribution. c) Distribution of rate overlaps for all units with non-zero activity in any environment. d) Distribution of rate differences. e) Ratemap population vector correlations for units with nonzero activity at every timestep for transitions (timestep 500) from A to B.

Besides output units, recurrently connected units also display remapping behaviors when the network is moved between environments. As shown in the unit ratemaps of Fig. 3a) i), boundary units also exhibit rate changes. In particular, several units are silenced when moving between conditions. However, none of the included units exhibit changes in their preferred firing location. Thus, recurrent units appear to remap mainly through pronounced rate changes.

These observations are supported by multiple analyses (Fig. 3b-e)). In particular, the distribution of output unit spatial correlations across different environments (A and B) matches that expected from a shuffled distribution. Conversely, correlations comparing different visits of the same environment (A and A’) are different from a shuffled distribution (Fig. 3b)). This behavior is consistent with global remapping behavior [4], [6]. Notably, the network’s remapping occurs with fixed weights (i.e., after training).

Rate overlaps (Fig. 3c)) display similar distributional properties: Comparing across environments yields rate overlaps resembling those from a shuffled distribution, and comparing similar environments yields higher rate overlaps.

Rate differences (Fig. 3d)) also follow the same trend. In this case, the differences in rates between A and A' are chiefly zero-centered and approximately symmetric, suggesting that there are only small rate changes when revisiting an environment. The rate difference between environments (and between shuffled units) is also roughly symmetric. However, in this case the distribution is bimodal, with peaks corresponding to maximally different rates. Thus, a large number of output units are active in only one environment. Again, the distribution of differences between distinct contexts closely trails a shuffled distribution.

As shown in Fig. 3e), ratemap population vector correlations mirror the transition between environments, both for recurrent units and output units. Included are correlations for the transition from A to B (at timestep 500). Notably, there is a sharp drop-off in correlation at the transfer time, demonstrating that population vectors are uncorrelated between environments for both unit types. Conversely, ratemaps are highly correlated within an environment. However, there is a time delay before maximum correlation is achieved.

Together, these findings show that the model learns place- and border-like spatial representations. Moreover, output units exhibit global remapping between contexts, whereas recurrent units mainly rate remap.

2.3 Effects of Geometric Manipulations

In addition to remapping between different contexts, we show that manipulating familiar geometries induces distinct representational changes. In particular, Fig. 4a) shows how unit ratemaps respond as the familiar square environment is elongated horizontally. Intriguingly, the learned place-like fields of the output units appear to expand with the arena. For sufficient elongation, responses even appear to split, with an additional, albeit weaker, firing field emerging (e.g. the lower right output unit). Elongation behavior has also been observed in biological place cells in similar experiments [8].

Effects of geometric manipulations on learned representations while maintaining the original context signal.

a) Ratemaps of 9 recurrent (g) and output units (p) during horizontal elongation of a familiar square context. The top inset indicates the geometry and context signal (A), as well as manipulation of the environment (horizontal stretch by factors of 2 and 3). b) Similar to a), but the geometric manipulation consists of filling in the central hole of the familiar context (square with central hole, context B). c) Similar to a), but for joint horizontal and vertical elongation. d) Similar to c), but for uniform expansion of a familiar circular environment (C).

Expanding the square also elicits a distinct response in recurrent units: Unit firing fields extend to the newly accessible region, while maintaining their affinity for boundaries. A similar effect can be observed in Fig. 4b), where the central hole is removed from a familiar geometry. In this case, both recurrent and output units perform field completion by extending existing firing fields to previously unseen regions. This shows that the network is capable of generalizing to never-before-seen regions of space.

Finally, we also considered the effects of expanding environments in a symmetric fashion. Included are results for the familiar square (Fig. 4c)) and circular (Fig. 4d)) environments. Unlike the single-axis expansion in Fig. 4a), network representations expand symmetrically in response to uniform expansion. However, some output units display distinct field doubling (see both Fig. 4c), bottom right, and Fig. 4d), middle row). For large expansions (3x), output responses become more irregular. However, in the square environment, there are still visible subpeaks within unit ratemaps. Also notable is the fact that some output units reflect their main boundary input (with greater activity near one boundary). Recurrent units, on the other hand, largely maintain their firing profile. In the circular environment, some output units display an almost center-surround-like profile (e.g. middle row, two rightmost units). This peculiar tuning pattern is an experimentally testable prediction of our model.

2.4 Contexts are Attractive

We have demonstrated that the RNN exhibits signs of global remapping between different familiar contexts, and field changes when a familiar geometry is altered. In this section, we further explore the behavior of the network when perturbing its internal states out of their normal operating range. In doing so, we also uncover possible mechanisms supporting the network's remapping ability.

The first analysis consists of injecting noise into the recurrent state of the network, to determine whether it exhibits attractor-like behavior. Fig. 5 shows resulting ratemap population vector correlations for 800-step trajectories in the square context, when noise is injected at the midpoint of the sequence. When no noise is injected (σ = 0), both recurrent units (Fig. 5a)) and output units (Fig. 5b)) quickly settle into a stable, high correlation state. Unit ratemaps reveal that this state corresponds to network units firing at their preferred locations.

Effects of noise injection during navigation.

a) Ratemap population vector Pearson correlation between timepoints of 800-step trajectories in the square environment. At timestep 400, additive Gaussian noise (with standard deviation σ) is injected into the recurrent state (g). The top row shows correlations for different noise levels (σ = 0, 0.01, 0.1, and 1.0). The bottom row features ratemaps of the four units with largest mean activity, at different timepoints. Ratemaps are shown for σ = 0 and σ = 1.0. b) Same as a), but for output units (p).

When noise is injected, ratemap correlations temporarily decrease, before the network settles back into a steady-state configuration. Importantly, states before and long after noise injection are highly correlated. We observe that this is the case even for large amounts of injected noise, as can be seen from unit ratemaps for σ = 1.0. We also observe that the time required to reach a steady state increases in proportion to the amount of noise injected. Thus, even though the network was trained without noise, it appears robust even to large perturbations. This suggests that the learned solutions form an approximate attractor.

To further explore the network’s behavior, we applied dimensionality reduction techniques to network states along a single trajectory visiting all geometries (and contexts).

Remarkably, we find that a low-dimensional projection of the recurrent state captures the shape of each traversed environment. The top row of Fig. 6a) showcases a 3D projection of the recurrent state, where each point is color coded by the visited environment. Besides reflecting the shape of the environment, the low-dimensional projection also showcases transitions between environments. For output units (bottom row of Fig. 6a)), the low-dimensional projection consists of intersecting hyperplanes that appear to maintain some of the structure of the original geometry. For example, states produced in the square with a central hole appear to maintain a central void in the low-dimensional projection. The difference between recurrent and output states may reflect the pronounced sparsity of the recurrent layer, as well as the observed reuse of output units during remapping. In other words, a large number of recurrent units are mutually silent across environments, which could make for easily separable states. In contrast, a larger fraction of output units are used, and reused, across environments, leading to entangled and less separable states.

Low-dimensional behavior of the trained recurrent network.

a) Low-dimensional UMAP projection of the recurrent (top) and output unit (bottom) activity for a trajectory visiting all six environments. The color of a point in the cloud indicates the corresponding environment. b) Fractional and cumulative explained variance using PCA for recurrent units for each environment. c) Similar to b), but for output units (color scheme as in a)). d) Eigenvalue spectrum of the recurrent weight matrix. The unit circle (gray) is inset for reference. e) Jitter plots of context weights corresponding to each environment. For every environment, the weight to each recurrent unit is indicated. f) Pearson correlation between context weights corresponding to different environments.

Using PCA, we find that the recurrent states of the network within an environment can be well described using just a few principal components (four principal components explain >90 % of the variance). For reference, Fig. 6b) showcases the fraction of explained variance, as well as the cumulative variance of the recurrent state. However, the same is not true for the full trajectory visiting all environments (which requires around 20 principal components to achieve a similar amount of explained variance). This hints that the multi-environment representation can be factorized into several independent, low-dimensional representations, possibly one for each environment.

A similar trend is evident for output unit responses (shown in Fig. 6c)). However, in this case, a larger number of components is needed to explain a substantial fraction of the state variance for each environment (> 25 for approximately 70-90 % explained variance), with noticeable differences between environments. Also, almost all (> 75 out of 100) principal components are required to account for the full output state across environments. It thus appears that more, independent units are active within a given environment, and that all 100 units are involved in encoding the full set of environments.

To begin exploring possible mechanisms supporting remapping, and the apparent independence of network states across environments, we investigated the weights of the recurrent layer. Fig. 6d) shows the eigenvalues of the recurrent weight matrix. It has several eigenvalues with above-unit magnitude. In other words, the RNN is potentially unstable. However, as shown in Fig. A1, the network exhibits stable decoding errors, even for very long sequences. Moreover, we know from Fig. 5 that the network is stable in the face of transient noise injection. One possibility is that large eigenvalues are associated with remapping where unstable eigenvectors are used to transition between discrete attractors representing distinct environments.

How, then, does the network switch between representations? While a complete description depends on the non-linear behavior of the full RNN, we observe a relationship within the context input weights that could shed light on the network's behavior. Concretely, we find that a large proportion of context weights are negative (Fig. 6e)), and the rows of this matrix are largely uncorrelated (Fig. 6f)). Thus, the context signal (which is non-negative) could inhibit independent sets of units, leading to sparse and orthogonal recurrent representations across environments through rate changes.

2.5 Distribution of Learned Centers

Experimentally, place fields appear to be irregularly distributed throughout large environments, with a small increase in the number of fields near boundaries [41]. Place field phases have also been shown to correlate with the peak firing locations of grid cells [22]. We therefore explore whether there is structure to the spatial arrangement of the model's learned place fields.

Fig. 7a) shows the arrangement of decoded centers for all units, collected over 100 long-sequence trajectories, in each environment. In other words, for each cell in the population, their centers are decoded 100 times, once for each trajectory. Surprisingly, we find that the decoded centers tend to reside on the vertices of a semi-hexagonal grid, especially in larger symmetrical geometries. This effect is especially evident in the square and large square environments. However, in all environments, this grid structure exhibits distortions, and in the case of an anisotropic environment (the rectangle), the grid is clearly elongated along the horizontal axis. Our findings accord with the notion that place fields are likely to reside on the vertices of a hexagonal grid [22]. However, our model does not feature any grid-like units.

a) All center locations for each unit in every geometry, decoded from 100 30000-timestep trajectories, for units with high spatial information. b) Center locations and marginal distributions of centers in each environment, for active units along a single trajectory. c) Displacement of centers between environments for units with high spatial information. Every unit is color coded by its spatial location in the environment on the diagonal. For each row, the distribution of the included units is shown in every other environment. d) Same as c), but for all units. e) Experimental CA1 place field centers decoded from ratemaps for a mouse foraging in a square 75×75 cm environment (left) and corresponding kernel density estimate (right). f) Ripley's H for the field centers in e) and random (uniform) distributions on the same 15×15 grid as in e). The shaded region indicates two standard deviations for 100 random samplings of the grid.

In Fig. 7b) we display an example decoding of centers along with their one-dimensional marginal distributions. We find that the centers seem to cluster somewhat along the borders of the environment, similar to experimental observations in [41]. Unlike the aggregate over multiple trajectories, the single-trajectory decoding does not reveal an equally pronounced hexagonal arrangement.

Besides exhibiting a striking hexagonal arrangement within an environment, we also observe that there is no apparent pattern to the transformation between environments. This once again supports the finding that units undergo global-type remapping between environments (see Fig. 7c-d), where color coding is relative to position in the environment along the diagonal).

To investigate whether fields in biological place cells display center clustering similar to our model, we decoded the field centers of 225 CA1 place cells (data provided by [38]); see Ripley's H & Clustering for details. Fig. 7e) shows the distribution of place field centers, and a corresponding kernel density estimate. We can see that field centers cluster near boundaries. Moreover, there appears to be a tendency for the clusters to arrange in a hexagonal fashion, similar to our computational findings.

To further quantify the regularity in the spatial arrangement of field centers, we considered Ripley's H function. Fig. 7f) shows Ripley's H for the field centers, as well as a random baseline sampled on a 15×15 grid matching the experimental ratemap resolution. We find that Ripley's H is larger for the experimental data than for the random uniform samples. This indicates that the place field centers cluster more than expected (outside two standard deviations) for uniform sampling. The clustering is stronger at small and intermediate distances (0-5 cm and around 7-12 cm). We also observed similar clustering for other animals, but none exhibited a similarly pronounced spatial arrangement in the kernel density estimate (see Experimental Phase Distributions for more).

3 Discussion & Conclusion

In this work, we have proposed a neural network model that forms place-like spatial representations by decoding learned cognitive maps. Remarkably, the trained network exhibits a range of behaviors previously reserved for biological place cells, including global remapping across environments and field deformation during geometric manipulations of familiar arenas. Besides reproducing existing place cell experiments, our model makes some surprising predictions.

Our first prediction is that border-type input is sufficient to explain not only place field formation, but also place cell global remapping. While a strong relationship between border cells and place cells has been argued previously [9], [28], possible influences on Hippocampal remapping remain relatively unexplored. In our model, we find that place cell remapping arises as a result of sparse input from independent cell assemblies, enacted through strong boundary cell rate changes. Current experimental evidence suggests that border cells largely maintain their firing rate during conditions that elicit place cell remapping [12], [13]. However, we find that the border code is highly sparse, and so only a small number of such rate-remapping, boundary-type cells would actually be required. Thus, investigating whether border cells can display rate changes could be an interesting avenue for future research.

While it could be that border cells in the brain do not (rate) remap, a border-to-place model could still be viable through alternate pathways, such as via gating mechanisms. In this case, a boundary signal projected onto downstream place cells could be gated by contextual signals originating from the lateral Entorhinal Cortex (lEC). Jeffery demonstrated that a gated grid cell input signal could give rise to biologically plausible, place-like spatial signals [5]. In a similar way, gated boundary input could conceivably account for not only place field formation and boundary-selectivity, but also remapping.

Given the range of place cell behaviors our model reproduces, we hold that the border-to-place model it learns should be taken seriously. However, it is worth noting that there are still place cell behaviors unaccounted for in our work. For instance, we do not observe field doubling when walls are inserted in familiar environments (results not shown), as observed in vivo [9]. However, it is reasonable to suspect that this is due to the lack of sensory information available to the network, as there is no way for the network to detect novel walls. Therefore, adding boundary-selective sensory input to our network could conceivably uncover even more place cell behaviors. This is also supported by the fact that Uria et al. observed field doubling in the responses of their model, which utilizes visual input. Thus, adding other sensory modalities may be a fruitful extension of the current model.

A related missing feature is the lack of multiple firing fields, as expressed by biological place cells, particularly in large recording environments [3]. While our network does exhibit more firing fields when familiar contexts are expanded, place cells can reliably exhibit multiple fields, likely as part of a coding strategy. In contrast, our decoding operation only extracts a single center location, which may limit the expressivity of the network. Future work could therefore consider alternative decoding approaches that place even less strict requirements on learned representations.

Our second surprising finding is the model's conspicuous lack of grid cells. As already mentioned, grid cells have been proposed as a possible precursor to place cells [5], [23], [24]. Grid cells are also often posited as being important for path integration [11], [42], [43]. Accurate path integration is especially important in the absence of location-specific cues such as landmarks. The only pieces of information available to our model are a velocity signal, an environment-identifying context signal, and a weak, implicit boundary signal (since trajectories cannot exit environment boundaries). As such, there is no explicit sensory information, and path integration is required to solve the position reconstruction task. If grid cells are optimized chiefly for path integration, one would expect the model to learn grid-like solutions. As our model only learns border-type recurrent representations, our findings raise questions concerning the necessity of grid cells for path integration, as well as the causal relationship between place cells and grid cells. That grid cells may not be required to do path integration has also been shown in other recent normative models [36].

While the lack of grid cells in this model is interesting, it does not disqualify grid cells from serving as a neural substrate for path integration. Rather, it suggests that path integration may also be performed by other, non-grid spatial cells, and/or that grid cells may serve additional computational purposes. If grid cells are involved during path integration, our findings indicate that additional tasks and constraints are necessary for learning such representations. This possibility has been explored in recent normative models, in which several constraints have been proposed for learning grid-like solutions. Examples include constraints concerning population vector magnitude, conformal isometry [32], [34], [44], capacity, spatial separation and path invariance [34]. That our model performs path integration without grid cells, and that a myriad of independent constraints are sufficient for grid-like units to emerge in other models, presents strong computational evidence that grid cells are not solely defined by path integration, and that path integration is not only reserved for grid cells.

Besides functional constraints, an important consideration when building neural network models is their architecture. In our model, information primarily flows from recurrently connected, mEC-type units, to CA1-type units via feedforward projections. However, CA1 responses also feed back to the Entorhinal Cortex, via the Subiculum [40]. Such a loop structure is explored in [27], which also makes use of non-grid spatial cells to inform place field formation, similar to our findings. Incorporating a feedback pathway (from output units to recurrent units) could allow for exploring the connection between grid cells, place cells and remapping.

While our model does not produce grid-like representations, we do observe a striking, grid-like structure in the arrangement of output unit centers. Notably, these centers arrange hexagonally in arenas with open interiors, such as the large square. While a hexagonal placement of field centers has yet to be uncovered experimentally, Whittington et al. showed that place cell phases are correlated with grid cell peak locations across environments [22]. Because the network has learned this particular arrangement to optimize position reconstruction, a hexagonal phase pattern may be optimal for decoding one's position, even in diverse geometries. This is also supported by the fact that we observe clustering in the center locations of CA1 place fields in mouse data from [38]. While all the animals we analyzed show clustering around the edges, those with dense exploration in the middle of the environment seem to show clustering in the middle as well. In the future, larger recordings across different animals could help solidify whether place field center clustering is a robust and ubiquitous phenomenon.

A hexagonal place field arrangement also suggests a possible connection between boundary, place, and grid cells. Boundary-tuned cells could inform place cell pattern formation, which in turn guides grid cell patterns. Such a border-to-place-to-grid cell model could explain grid cell behavior in non-standard or changing environments. For example, grid cells can exhibit (temporary) pattern elongation in novel environments [45]. This grid elongation could be induced by field elongation in place cells, which in turn is caused by boundary field continuation. Besides temporary rescaling, grid patterns are also permanently influenced by environment geometry [46], hinting that grid cells receive boundary-dependent input. Furthermore, it has been suggested that border cells serve an error-correcting function for grid cells during navigation [47]. In a boundary-to-place-to-grid model, grid error correction could arise from place-cell inputs informed by boundary responses, or from border cells directly.

In summary, our proposed model, with its notion of a spatial cognitive map and fixed decoding, allows for exploring place cell formation and remapping. In particular, we find that learned place-like representations are formed by boundary input from upstream recurrent units. Global remapping arises from sparse input from differentially activated boundary units. Our work has important implications for understanding Hippocampal remapping, place field formation, as well as the place cell-grid cell system.

4 Methods

Code Availability

Code to reproduce models, datasets and numerical experiments is available from https://github.com/bioAI-Oslo/VPC.

4.1 Model & Objective

In this work, we trained a recurrent neural network to solve the proposed position reconstruction task (Eq. (2)) using stochastic gradient descent. As the proposed objective function does not impose explicit constraints on the functional form or spatial arrangement of the learned representations, we trained the network in a small set of diverse geometries (see Fig. 1c) for an illustration). This was done to explore whether the network could learn different representations in different rooms, as a way of optimally encoding the space.

The recurrent network was only given velocity information as input and, therefore, had to learn to perform path integration in order to solve the position reconstruction task. Concretely, the path integration task consisted of predicting self-position along simulated trajectories. For each trajectory, the network was initialized at a random location in the environment, without initial position information (see 4.2 for initialization details). At every time step t along a trajectory, the network received a Cartesian velocity signal v_t, mimicking biological self-motion information. Denoting a particular point along a trajectory parameterized by time as x_t, the decoded position of the network was

$$\hat{x}_t = \frac{\sum_i p_i(z_t)\, \hat{\mu}_i}{\varepsilon + \sum_i p_i(z_t)}, \quad (5)$$

where z_t is the network's latent estimate of position at time t, formed by integration of previous positions and velocities. In our case, output states p_t are computed as rectified linear combinations of an upstream recurrent layer (see 4.2 for a description). Meanwhile,

$$\hat{\mu}_i = \frac{\sum_t p_i(z_t)\, x_t}{\varepsilon + \sum_t p_i(z_t)} \quad (6)$$

is the center location estimate for output unit i, formed using the network states during navigation.

Lastly, we provided the network with a constant one-hot context signal at every timestep, as a token for identifying the environment. The input at time t was therefore a concatenation of vt and a time-independent context signal c. See Fig. 1d) for an illustration.

4.2 Neural Network Architecture and Training

In this work, we consider a one-layer vanilla recurrent neural network (RNN) featuring Ng = 500 units. These recurrent units project linearly onto an output layer consisting of Np = 100 units. Both recurrent and output units were equipped with ReLU activation functions and no added bias.

At time t, the hidden state of the recurrent layer was given by

$$g_t = \big[ W g_{t-1} + W_{\text{in}}\, u_t \big]_+, \quad (7)$$

where W is an N_g × N_g matrix of recurrent weights, W_in an N_g × N_I matrix of input weights, u_t the input at time t, N_I the dimensionality of the input signal, and [·]_+ the ReLU nonlinearity. The input consisted of a concatenation of a velocity signal and a six-entry, one-hot context signal, i.e. u_t = cat(v_t, c). Subsequently, output states were computed according to

$$p_t = \big[ W_{\text{out}}\, g_t \big]_+, \quad (8)$$

where W_out is an N_p × N_g weight matrix.

Feedforward weights were all initialized according to a uniform distribution 𝒰(−k_i, k_i), where k_i = 1/√N_i, with N_i being the number of inputs to that layer. For the recurrent layer, the RNN weight matrix was initialized to the identity. This was done to mitigate vanishing/exploding gradients caused by the long sequence lengths used for training, as suggested by Le et al. [48].
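The architecture and initialization just described can be summarized in a brief PyTorch sketch. It follows the stated hyperparameters (N_g = 500, N_p = 100, ReLU activations, no biases, identity-initialized recurrent weights), but the class and attribute names are our own, and the input size assumes the 2D velocity concatenated with the six-entry context signal.

```python
import math
import torch
import torch.nn as nn

class PathIntegrationRNN(nn.Module):
    """Vanilla RNN (Eq. (7)) projecting onto a rectified output layer (Eq. (8))."""

    def __init__(self, n_in=8, n_g=500, n_p=100):
        super().__init__()
        # Feedforward weights: uniform in (-k, k), k = 1/sqrt(fan-in), no biases.
        k_in = 1.0 / math.sqrt(n_in)
        self.w_in = nn.Parameter(torch.empty(n_g, n_in).uniform_(-k_in, k_in))
        k_out = 1.0 / math.sqrt(n_g)
        self.w_out = nn.Parameter(torch.empty(n_p, n_g).uniform_(-k_out, k_out))
        # Recurrent weights initialized to the identity (Le et al. [48]).
        self.w_rec = nn.Parameter(torch.eye(n_g))

    def forward(self, u, g0):
        # u: (T, B, n_in) velocity + context inputs; g0: (B, n_g) initial state.
        g, gs, ps = g0, [], []
        for u_t in u:
            g = torch.relu(g @ self.w_rec.T + u_t @ self.w_in.T)  # Eq. (7)
            gs.append(g)
            ps.append(torch.relu(g @ self.w_out.T))               # Eq. (8)
        return torch.stack(gs), torch.stack(ps)
```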

To explore network dynamics when transitioning between different environments, we trained the recurrent network in a stateful fashion. This involved maintaining the recurrent state from the end of one trajectory, and using it as the initial state along a new trajectory. For each transition, the new environment and the starting location within that environment were sampled randomly (and uniformly). The network state was initially set to all-zero, providing no positional information at the start of any trajectory. While the network state was carried between different environments, gradient calculations were truncated at the end of each episode. To ensure stability, the network state was reset every ten trajectories, to an all-zero initial state.

Because the network is not provided with initial position information (all-zero initial state), it has to infer its location within an environment (the identity of which is known from the context input) based on its geometry, e.g. through border interactions. This requires a large sample (long trajectory) of the geometry. The recurrent network was therefore trained on trajectories of sequence length T = 500, with a minibatch size of 64 for gradient descent. Because of statefulness, the network experienced effective sequences of 5000 timesteps during training. However, no gradient information was carried between subsequent trajectories.
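A sketch of this stateful scheme, assuming the `PathIntegrationRNN` module sketched above; `loss_fn` is a hypothetical stand-in for Eq. (2) (decoded-position error plus the L1 penalty on g), and the episode iterator is illustrative.

```python
import torch

def train_stateful(model, optimizer, episodes, n_g=500, reset_every=10):
    """Carry the recurrent state across episodes; truncate gradients between them."""
    g = None
    for i, (u, x_true) in enumerate(episodes):  # u: (T, B, n_in), x_true: (T, B, 2)
        if i % reset_every == 0 or g is None:
            g = torch.zeros(u.shape[1], n_g)    # all-zero reset every ten episodes
        gs, ps = model(u, g)
        loss = loss_fn(gs, ps, x_true)          # hypothetical stand-in for Eq. (2)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        g = gs[-1].detach()                     # keep the state, drop its gradient
```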

For implementing models, we used the PyTorch Python library [49]. We used the Adam optimizer [50] for training, with a learning rate of 10⁻⁴ and otherwise default parameters [49]. The network was trained for a total of 100 epochs using the training dataset detailed in 4.3. To regularize the network, we applied an L1 penalty to the recurrent network states g (see Eq. (2)). The associated L1 hyperparameter λ was set to 10.

4.3 Trajectory Simulation and Datasets

Networks were trained using simulated datasets of trajectories traversing 2D geometries. The starting location of a trajectory was sampled randomly and uniformly within an environment. To sample points uniformly from non-square geometries, a rejection sampling strategy was used: First, points were sampled according to a uniform distribution, whose support was given by the smallest rectangle that completely covered the given geometry. Then, ray casting was done to determine whether points were in the interior of the geometry. Concretely, a horizontal ray was cast from a given point and the number of intersections with the enclosure walls was determined. If the number of intersections was odd, the point was accepted as being inside the environment. If the number of intersections was even, the point was resampled. This procedure was iterated until the desired number of samples was obtained. Note that the interior determination method only works for extended objects, such as holes. Therefore, to add thin environment boundaries (infinitely thin walls), we simply superimposed two boundaries with no spatial separation.
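In code, the even-odd ray-casting test and the surrounding rejection loop might look like the following sketch; the wall-segment representation and function names are illustrative assumptions.

```python
import numpy as np

def point_inside(point, walls):
    """Even-odd rule: cast a horizontal ray from `point` and count wall crossings."""
    x, y = point
    crossings = 0
    for (x1, y1), (x2, y2) in walls:    # walls: list of segment endpoints
        if (y1 > y) != (y2 > y):        # segment straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:             # count crossings to the right of the point
                crossings += 1
    return crossings % 2 == 1           # odd number of crossings -> interior

def sample_uniform(walls, bbox, n, seed=0):
    """Rejection-sample n points uniformly inside the geometry."""
    (xmin, ymin), (xmax, ymax) = bbox   # smallest rectangle covering the geometry
    rng = np.random.default_rng(seed)
    samples = []
    while len(samples) < n:
        p = rng.uniform((xmin, ymin), (xmax, ymax))
        if point_inside(p, walls):
            samples.append(p)
    return np.array(samples)
```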

To approximate the semi-smooth motions of foraging rodents, trajectory steps were generated by drawing step sizes according to a Rayleigh distribution with σ = 0.5, and heading directions from a von Mises distribution centered at the previous heading, with scale parameter κ = 4. To ensure that the random walk remained within the geometry, we checked whether a proposed step intersected with any of the environment walls. If an intersection was detected, the heading direction was resampled until an allowed step was achieved. This procedure was iterated until the desired number of timesteps was reached. Note that step sizes were not resampled. This procedure yields smooth trajectories, with inherent turning away from boundaries. Trajectory positions were generated using a forward Euler integration scheme, with timestep dt = 0.1.
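The step-generation loop can be sketched as follows, using the stated Rayleigh and von Mises parameters; `crosses_wall` is a hypothetical segment-intersection helper not shown here.

```python
import numpy as np

def simulate_trajectory(walls, start, n_steps, dt=0.1, sigma=0.5, kappa=4.0, seed=0):
    """Semi-smooth random walk: Rayleigh step sizes, von Mises turning."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(start, dtype=float)
    heading = rng.uniform(0.0, 2.0 * np.pi)
    path = [pos.copy()]
    for _ in range(n_steps):
        speed = rng.rayleigh(sigma)
        heading = rng.vonmises(heading, kappa)  # centered on the previous heading
        step = speed * dt * np.array([np.cos(heading), np.sin(heading)])
        # Resample the heading (never the step size) until no wall is crossed.
        while crosses_wall(pos, pos + step, walls):  # hypothetical helper
            heading = rng.vonmises(heading, kappa)
            step = speed * dt * np.array([np.cos(heading), np.sin(heading)])
        pos = pos + step                             # forward Euler position update
        path.append(pos.copy())
    return np.array(path)
```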

For computational efficiency, the network was trained and evaluated on precomputed datasets of trajectories. The full dataset contained 15000 trajectories, each of which was 500 timesteps long. Of these, 80 % was reserved for training, and the remaining 20 % for validation. In both datasets, an equal number of samples were included for every environment. All analyses were conducted using newly generated test trajectories.

4.4 Contexts and Geometries

To explore the possibility of the model learning to remap, we trained networks in multiple distinct environments, each labeled by a unique, one-hot context vector. The included geometries were square, circular and rectangular. In addition, we also included a large square, a square with a thin, central dividing wall, and finally, a square with a central hole. Each geometry and associated context signal is illustrated in Fig. 1c).

4.5 Numerical Experiments

4.5.1 Remapping Experiments

We conducted two remapping experiments to study whether the behavior of the trained neural networks aligned with that observed experimentally in rodents. The first consisted of running the trained network (with frozen weights) in multiple familiar geometries, similar to canonical remapping experiments [4], [16]. Referring to Fig. 3a), we ran the trained recurrent network along 25000-timestep sequences that initially visited the square environment. Then, the network was transferred to the square with a central wall, before being returned to the square environment. For each trajectory, the starting position was sampled randomly within the geometry, and the state of the network was maintained between environments. The initial state of the network in the first environment was set to the zero vector.

The second set of experiments was designed to explore the consequences of geometric manipulations of familiar environments on the learned spatial representations. To do so, we ran the trained network (with fixed weights) in elongated versions of the familiar environments, similar to the experimental setup in O’Keefe et al. [8].

The first of these trials involved running the trained RNN in the square environment, with the appropriate context. However, during inference the environment walls were elongated by factors of 2 and 3 compared to their original length. For reference, Fig. 4a) illustrates the environment rescaling protocol.

The second trial concerned the effects of extending a familiar environment to previously unseen locations. Concretely, this experiment entailed transforming the environment with a central hole into a square environment, while retaining the original context signal. In other words, the four walls of the central hole were removed, allowing movement in previously inaccessible parts of the arena. The third trial featured rescaling of the square environment into a larger square, i.e. proportional scaling in both horizontal and vertical directions while maintaining the context cue of the square environment. Again, wall lengths were scaled by factors of 2 and 3. The final geometric manipulation involved expanding the circular environment uniformly, again by factors of 2 and 3, respectively.

4.6 Attractor Dynamics and Noise Injection

To investigate whether the learned representations exhibited attractor-like behavior, we performed a noise-injection experiment. The experiment consisted of evaluating the trained RNN on 1000 800-timestep trajectories within the square environment. At the midpoint of each trajectory, Gaussian noise was injected into the recurrent state. This perturbed state was subsequently rectified, before the network was run normally for the remainder of the trajectory. We performed the same experiment for multiple noise levels σ = {0.0, 0.01, 0.1, 1}, where σ determines the scale of the normal distribution used for noise generation. The state of the RNN directly after noise injection could therefore be described as

$$g_\tau \leftarrow \big[ g_\tau + \sigma \chi \big]_+,$$

where χ is a vector of random variables drawn from a multivariate normal distribution, τ denotes the time of noise injection, taken to be timestep 400, while [·]_+ is a rectification operation.
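In code, the perturbation amounts to a single rectified update of the recurrent state at τ. The sketch below reuses the weights of the `PathIntegrationRNN` sketched in 4.2 and is illustrative only.

```python
import torch

def run_with_noise(model, u, g0, tau=400, sigma=1.0):
    """Run the RNN, perturbing the recurrent state at timestep tau."""
    g, states = g0, []
    for t, u_t in enumerate(u):  # u: (T, B, n_in)
        if t == tau:
            g = torch.relu(g + sigma * torch.randn_like(g))  # inject, then rectify
        g = torch.relu(g @ model.w_rec.T + u_t @ model.w_in.T)  # Eq. (7)
        states.append(g)
    return torch.stack(states)
```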

To assess whether the representation was stable, and whether the state of the network was attractive, we computed ratemap population vector correlations (see 5 for details) between every timepoint in the sequence for each noise level.

4.7 Low Dimensional Representations and Explainability

To better understand the behavior of the recurrent network, we performed PCA, alongside dimensionality reduction using UMAP [51]. PCA was done on the recurrent and output states of the network as it was run on long (10000 timesteps in each environment) trajectories that visited every environment sequentially. For each environment transition, the state of the network was maintained. PCA was performed for each environment separately, as well as for the full trajectory visiting every environment. As an example, for a trajectory of length T, the output activity was projected to a low-dimensional representation of dimension T × n_pca, with n_pca being the number of principal components.

The dimensionality reduction consisted of performing UMAP [51] on the states of the network along the full trajectory. This was done to explore whether network activity resided on a low-dimensional manifold. Population activity at each timepoint was subsequently projected down to three dimensions, yielding a dimensionality-reduced vector representing the full network activity at a particular point along the trajectory.

To further explore the dynamics of the network, we computed the eigenvalue spectrum of the recurrent weight matrix. Finally, we computed Pearson correlation coefficients between columns of the input weight matrix corresponding to different context signals.

5 Analyses

To compare the representational similarity of the network output across environments and time, we performed several analyses using unit ratemaps.

5.1 Ratemaps

Ratemaps of unit activity were computed by discretizing environments into bins. The rate in each bin was then determined by dividing the accumulated unit activity by the number of visitations to that bin along a single trajectory. Unless otherwise specified, ratemaps were formed using 25000-timestep trajectories. For long-sequence experiments, a burn-in period of 500 initial timesteps was excluded from ratemap creation. This was done to only include the steady-state behavior of the network. For the remapping dynamics in Fig. 3e), ratemaps were created by aggregating responses over 500 distinct, 800-timestep trajectories.
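Such a binned ratemap can be computed in a few lines. The sketch below uses SciPy's `binned_statistic_2d` (our choice, not necessarily the authors' implementation), with the 16×16 binning used in the population vector analyses.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

def ratemap(positions, activity, bounds, n_bins=16):
    """Mean unit activity per spatial bin; NaN marks unvisited bins."""
    (xmin, xmax), (ymin, ymax) = bounds
    rm, _, _, _ = binned_statistic_2d(
        positions[:, 0], positions[:, 1], activity,
        statistic="mean",  # summed activity divided by bin visitation count
        bins=n_bins, range=[[xmin, xmax], [ymin, ymax]],
    )
    return rm  # (n_bins, n_bins)
```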

5.2 Spatial Correlation

Following [16], we computed unitwise ratemap spatial correlations to investigate possible remapping behavior. For a single unit, the spatial correlation was calculated by computing the Pearson correlation coefficient between flattened unit ratemaps. We considered the correlations between the square environment and the square with a central wall, due to their geometric similarity. In other words, the ratemap of a unit in the square environment was correlated with its ratemap in the square with a central wall environment. This procedure was repeated for all units that were active (exhibited nonzero activity) in both environments, and a distribution of spatial correlations was formed. As a baseline, a shuffled distribution was computed by correlating every active unit with every other active unit, across environments. Finally, correlations were computed for relative ratemap rotations of 0, 90, 180 and 270 degrees, and the maximal correlation reported. This was done to account for the possibility that remapping consisted of a rigid rotation in space.
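The per-unit computation, including the maximization over rotations, can be sketched as follows; ignoring NaN (unvisited) bins is our assumption.

```python
import numpy as np

def spatial_correlation(rm_a, rm_b):
    """Max Pearson correlation over 0/90/180/270-degree rotations of rm_b."""
    best = -1.0
    for k in range(4):
        a = rm_a.ravel()
        b = np.rot90(rm_b, k).ravel()
        valid = ~(np.isnan(a) | np.isnan(b))  # skip unvisited bins
        best = max(best, np.corrcoef(a[valid], b[valid])[0, 1])
    return best
```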

5.3 Ratemap Population Vector Correlation

To compare the representational similarity of entire unit populations at different timepoints (as in Fig. 5), we computed the Pearson correlation between ratemap population vectors at different times. A ratemap population vector was constructed by stacking the flattened ratemaps of every unit into a single array of dimension N_units · N_x · N_y, with N_units being the number of units in the relevant layer, and N_x = N_y = 16 the number of bins along the canonical x, y directions. Using Astropy [52], Gaussian smoothing with NaN interpolation was used to fill in unvisited regions. The smoothing kernel standard deviation was one pixel.

For the experiment featuring transfers between different environments (Fig. 3e)), only units with nonzero activity in one or more environments were included in the population vector.

5.4 Rate Overlap & Difference

As a measure of rate changes between conditions, we computed the rate overlap [4] and the rate difference. Considering two conditions (e.g., two environments), the rate overlap was computed by dividing the mean activity in the less active condition by that in the more active one. Only units that were active in at least one condition were included in the analysis.

The rate difference was computed by subtracting the activity in one condition from that in the other, and dividing by the sum of activity in both conditions. This measure is similar to the rate difference used in [6], but retains the sign of the difference. As with the rate overlap, only units that were active in at least one condition were included.

For both the overlap and the difference, a shuffled distribution was formed by randomly pairing units across conditions; 1000 random pairings were performed for each quantity.
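A sketch of both measures and the shuffling procedure (mean_rates_a and mean_rates_b are hypothetical arrays of per-unit mean activity in each condition):

import numpy as np

def rate_overlap(a, b):
    """Mean rate in the less active condition over that in the more active one."""
    return min(a, b) / max(a, b)

def rate_difference(a, b):
    """Signed rate difference, normalized by the summed activity."""
    return (a - b) / (a + b)

def shuffled_distribution(measure, mean_rates_a, mean_rates_b,
                          n_pairings=1000, seed=0):
    """Baseline obtained by randomly pairing units across conditions."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(mean_rates_a), n_pairings)
    j = rng.integers(0, len(mean_rates_b), n_pairings)
    return np.array([measure(mean_rates_a[k], mean_rates_b[l])
                     for k, l in zip(i, j)])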

5.5 Spatial Information

To select the most place-like units for the phase distribution visualization, we computed the spatial information content [53] of all units. Using unit ratemaps of M bins, the spatial information of a single unit was computed as

$$I = \sum_{i=1}^{M} p_i \frac{f_i}{\bar{f}} \log_2\!\left(\frac{f_i}{\bar{f}}\right),$$

where $p_i$ is the occupancy of bin i, $f_i$ the firing rate in that bin, and $\bar{f}$ the unit's average firing rate over all bins. High spatial information units were subsequently selected as those whose spatial information was above the 2.5th percentile in all environments.
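A direct transcription of the formula (ratemap and occupancy are flattened over the M bins):

import numpy as np

def spatial_information(ratemap, occupancy):
    """Spatial information content of a single unit, following the formula above."""
    p = occupancy / occupancy.sum()  # occupancy probability per bin
    f_mean = np.sum(p * ratemap)     # average firing rate over all bins
    valid = ratemap > 0              # 0 log 0 is taken as 0
    ratio = ratemap[valid] / f_mean
    return np.sum(p[valid] * ratio * np.log2(ratio))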

5.6 Ripley’s H & Clustering

To assess whether biological place fields exhibit non-uniform clustering, we computed Ripley's H statistic [54] for the center locations of real place cells [38].

For a set of N points, we computed Ripley's H in two steps. First, we determined Ripley's K, which counts the average number of points within a distance R of a point, given by

$$K(R) = \frac{|\Omega|}{N(N-1)} \sum_{x \neq y} \mathbb{1}\left(|x - y| \leq R\right) f(x, y),$$

where x and y are distinct points, |Ω| is the area of the domain Ω encompassing the set of points, and $\mathbb{1}$ is the indicator function. f(x, y) is a boundary correction factor, accounting for the lack of observations outside the region Ω. We followed Lagache et al. and took

$$f(x, y) = \frac{\left|\partial b(x, |x - y|)\right|}{\left|\partial b(x, |x - y|) \cap \Omega\right|},$$

where $\partial b(x, |x - y|)$ is the circumference of a ball centered at x with radius |x − y|, and the denominator is the circumference of the part of the ball that lies inside the geometry. We used the Shapely Python library [55] to compute intersections between balls and the enclosing geometry.
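A sketch of the correction factor using Shapely (the domain argument is a shapely Polygon describing the arena; names are illustrative):

import numpy as np
from shapely.geometry import Point, Polygon

def boundary_correction(x, y, domain):
    """Full ball circumference over the arc length lying inside the domain."""
    radius = float(np.linalg.norm(np.asarray(x) - np.asarray(y)))
    circle = Point(x[0], x[1]).buffer(radius).boundary  # circle around x
    inside = circle.intersection(domain).length         # arc inside the geometry
    return circle.length / inside if inside > 0 else np.nan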

The second step was to normalize and center Ripley's K, obtaining Ripley's H, given by

$$H(R) = \sqrt{\frac{K(R)}{\pi}} - R.$$

For our analysis, we computed Ripley's H for center locations of place cells in mice traversing a 75×75 cm square environment [38] over four distinct recording days. Centers, in this case, were decoded as the maximum firing location in 15×15 smoothed ratemaps. A total of 225 cells were included, corresponding to cells with spatial information above the 70th percentile. For each cell, the ratemap from the recording day with the largest spatial information was selected.

As a baseline, Ripley's H was computed for 100 sets of 225 points, sampled randomly and uniformly on a 15×15 square grid, matching the spatial discretization of the ratemaps. For both baseline and real data, ball radii were varied from $\varepsilon = 10^{-8}$ cm to approximately 26.5 cm, corresponding to a quarter of the square's diagonal.

To visualize possible clustering of place fields, we computed Gaussian kernel density estimates of decoded field centers. This procedure was repeated for all animals, and only centers of cells with spatial information above the 70th percentile were included. For all kernel density estimates, the bandwidth parameter was set to 0.2, and kernels were evaluated on 64×64 grids. See [38] for details on ratemap creation and experiments.
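A sketch of the density estimate with SciPy (the unit-square evaluation grid is a placeholder for the actual arena coordinates):

import numpy as np
from scipy.stats import gaussian_kde

def center_density(centers, bandwidth=0.2, grid_size=64):
    """Gaussian KDE of field centers, evaluated on a grid_size x grid_size grid.
    centers: (N, 2) array of decoded field center coordinates."""
    kde = gaussian_kde(centers.T, bw_method=bandwidth)
    xs = np.linspace(0.0, 1.0, grid_size)
    xx, yy = np.meshgrid(xs, xs)
    grid = np.vstack([xx.ravel(), yy.ravel()])
    return kde(grid).reshape(grid_size, grid_size)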

6 Acknowledgements

We would like to thank J. Quinn Lee and Mark Brandon of McGill University as well as their co-authors, for graciously sharing their data with us. We hope others follow their example of open and helpful collaboration.

7 Author Contributions

MBP conceived the original model, did simulations and wrote the article. VSS developed the model, did simulations and wrote the article. AMS developed the model and wrote the article. MEL developed the model, supervised the process, and wrote the article.

Appendix

A Long Sequence Evaluation

To verify that the model performs accurate path integration even for very long sequences, we computed Gaussian kernel density estimates of the Euclidean decoding error at every step along 100 10000-timestep trajectories in each environment. The bandwidth parameter was set to approximately 0.4 according to Scott's Rule, and the resulting error distributions are shown in Fig. A1.

Figure A1: Error distribution for long-sequence evaluation. Each pane shows the distribution and median of Euclidean distances (errors) between true and decoded trajectories for the trained RNN evaluated on 100 long (10000-timestep) test trajectories in a particular environment (inset). Color indicates the kernel density estimate value at a particular timestep.

Figure A2: Ratemaps of all 100 output units in each environment. The geometry is indicated atop every ensemble. Unit identity is given by its location on the grid (e.g., unit 1 is top left in each environment).

B Extended Model Ratemaps

Figures A2 and A3 show ratemaps for all 100 output units and 100 recurrent units, respectively. Responses in every environment are included. Notably, both output and recurrent units are sparse, with most recurrent units silent in a given environment. Output units display field shifts between environments, indicative of remapping.

C Experimental Phase Distributions

Figure A4 shows estimated distributions of center locations, for centers decoded from ratemaps of high spatial-information place cells in mice (data provided by [38]). While some distributions display no clear pattern in their center arrangements (e.g., animal QLAK-CA1-74), others show signs of clustering, and even regular structure (e.g., QLAK-CA1-50, whose arrangement resembles a hexagonal lattice).

Figure A3: Ratemaps of all 100 recurrent units in each environment. The geometry is indicated atop every ensemble. Unit identity is given by its location on the grid (e.g., unit 1 is top left in each environment).

Figure A4: Place cell center distributions in mice. Kernel density estimates of center distributions, for centers decoded from 15×15 ratemaps of place cell activity for seven mice (indicated by title). Trajectories for all involved recording days are also shown, and the number of included cells is inset (N).

D A Taxonomy of Cognitive Maps

With the definition of a cognitive map in Eq. (1), we can categorise and compare recent normative neural navigation models. A range of models have recently been put forward that solve tasks similar to ours. In this section, we provide a brief recap of these models, and show that they may be viewed as instances of the cognitive map in Eq. (1) with different constraints and target representations.

Common to these models is that they all make use of random sampling of space in the form of simulated spatial trajectories in bounded 2D spaces, motivated by the foraging behaviour of rats. In addition, most works employ gradient-based optimization schemes, and optimize over independent minibatches. We will, however, omit indexing by minibatches for brevity.

For example, Dordek et al. used a target representation u(r) = p(r) of place cells modelled as either zero-mean Gaussians or differences of Gaussians, with r being a Cartesian coordinate that is encoded into a target place code. Target unit centres were sampled from a uniform distribution.

The task, in this case, was to perform non-negative PCA on the label place cells in a square domain Ω, i.e., finding a constrained low-dimensional representation of the label activity. Concretely, we can formulate PCA as the minimization problem

$$\min_{W \geq 0} \; \left\langle \left\| p(r) - W^{\top} W p(r) \right\|_2^2 \right\rangle_{r \in \Omega},$$

with $W \in \mathbb{R}^{M \times N}$, $M \leq N$, and where $\hat{g}(r) = W p(r)$ is the low-dimensional representation and $\hat{p}(r) = W^{\top} \hat{g}(r)$ the reconstruction. The authors found that grid-like responses ĝ appear as an optimal low-dimensional representation of the target place code p. This formulation [39] is suitable for studying optimal cognitive maps in an idealized spatial setting, where the candidate map is learned directly from true spatial coordinates. This contrasts with the case where location information is latent, and agents have to build estimates of their position by integrating several sources of spatial information, such as landmark locations and path integration.
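A projected-gradient sketch of this objective (the learning rate and step count are arbitrary, and this is our illustration rather than the optimization scheme of [39]):

import numpy as np

def nonnegative_pca(P, M, lr=1e-3, steps=5000, seed=0):
    """Minimize ||P - W^T W P||^2 subject to W >= 0.
    P: (N, T) place cell activity over T samples; returns W: (M, N)."""
    rng = np.random.default_rng(seed)
    W = 0.01 * np.abs(rng.standard_normal((M, P.shape[0])))
    for _ in range(steps):
        R = P - W.T @ (W @ P)                        # reconstruction residual
        grad = -2.0 * (W @ R @ P.T + (W @ P) @ R.T)  # gradient of the squared error
        W = np.maximum(W - lr * grad, 0.0)           # gradient step, then project
    return W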

Cueva et al., Banino et al., and Sorscher et al. learn latent spatial representations through path integration in a recurrent neural network model. The state of the recurrent network at time t is given by a recurrence relation

$$\hat{g}_{t + \Delta t} = f\left(\hat{g}_t, v(t); \Theta\right), \qquad \text{(A9)}$$

where Θ denotes a set of model parameters and v(t) the input velocity at time t, while Δt is an increment of time. For the RNNs described in the coming sections, we suppress the dependency on the parameters Θ for the sake of readability.
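As a concrete discrete-time illustration of Eq. (A9), with tanh standing in for an arbitrary choice of f (names and the particular form are ours):

import numpy as np

def rnn_step(g, v, W_rec, W_in, b):
    """One step of the recurrence: the next state from the current
    state g and velocity input v."""
    return np.tanh(W_rec @ g + W_in @ v + b)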

Cueva et al. considered a version of the cognitive map Eq. (1) in which a recurrent neural network was trained to minimize the reconstruction error and soft constraints

$$\mathcal{L} = \sum_t \left\| u(r_t) - \hat{u}_t \right\|_2^2 + \lambda_{\Theta} \left\| \Theta \right\|_2^2 + \lambda_g \sum_t \left\| \hat{g}_t \right\|_2^2,$$

where $\hat{g}$ is implemented using a continuous-time RNN with initial state $\hat{g}_0 = 0$, subsequent states given by the recurrence relation in Eq. (A9), and stationary noise $\xi_t \sim \mathcal{N}(\mu, \sigma^2)$. Moreover, $\hat{u}_t = W_{\mathrm{out}} \hat{g}_t$, and $W_{\mathrm{in}}$ is a weight matrix for the velocity input $v_t$ to the RNN. The domain Ω is a 2D square arena visited along simulated trajectories, and the network only received velocity inputs along trajectories, necessitating path integration. In this case, the target representation is Cartesian coordinates, $u(r_t) = r_t \in \Omega$. The authors report that the learned recurrent representations ĝ appear square grid, band and border cell-like.

Banino et al. considered the case of a recurrent long short-term memory (LSTM) network trained to do supervised position prediction. Unlike [29], the training objective featured two target representations, $u_t = (p_t, z_t)$. The first target representation, $p_t = p(r_t)$, was given by an ensemble of normalized, Gaussian place-like units, with $r_t \in \Omega$ being Cartesian coordinates along discretized spatial trajectories in a square domain Ω. The second target representation consisted of an ensemble of units encoding heading direction, $z_t = z(\phi_t)$, where $\phi_t$ is the head direction at time t. The representations of the head direction ensemble were given by a normalized mixture of von Mises distributions. At each step of path integration, the network received linear and angular velocity information along simulated trajectories. In summary, the loss and corresponding soft constraints can be written as

$$\mathcal{L} = \sum_t \left[ \mathrm{CE}\left(p_t, \hat{p}_t\right) + \mathrm{CE}\left(z_t, \hat{z}_t\right) \right] + \lambda \left\| W_g \right\|_2^2, \quad \text{with Dropout on } \hat{g}_t,$$

where CE is the cross entropy and Dropout [56] is a method that ablates random units at a specified rate during training to promote redundancy.

The cognitive map, in this case, is given by $\hat{u}_t = (\hat{p}_t, \hat{z}_t)$, with $\hat{h}_t$ defined by a recurrent neural network with a tanh activation function, while $\hat{g}_t = W_g \hat{h}_t$ is an intermediate linear layer. Finally, $\hat{p}_t = \operatorname{softmax}(W_p \hat{g}_t)$ and $\hat{z}_t = \operatorname{softmax}(W_z \hat{g}_t)$, with $W_p$ and $W_z$ being readout weight matrices. The intermediate representations ĝt, a subset of the cognitive map, were found to display heterogeneous, grid-like responses.

Sorscher et al. reproduced [29], [30], [39] and refined the grid cell model in [30] by considering a simpler network structure (a vanilla RNN, although other variations have been tested and shown to provide similar results [57]), removing head-direction inputs and outputs, the intermediate linear layer and dropout, refining the place cell target representation, and selecting the ReLU as the recurrent activation function [31]. The loss, in this case, reads

$$\mathcal{L} = \sum_t \mathrm{CE}\left(p_t, \hat{p}_t\right),$$

where CE is again the cross entropy, and $p_t = p(r_t)$ is a difference-of-softmax place cell encoding of the current position of the virtual agent. In this case, the cognitive map is given by $\hat{p}_t = \operatorname{softmax}(W \hat{g}_t)$, where $\hat{g}_t$ is the recurrent state and W is a weight matrix. Notably, ĝt is computed using a vanilla RNN that learns implicit path integration from Cartesian velocity inputs. The authors report that the recurrent responses ĝt learn to exhibit striking hexagonal firing fields, similar to [39].
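A sketch of the readout and cross-entropy objective at a single timestep (numerically stabilized log-softmax; names are illustrative):

import numpy as np

def place_prediction_loss(g, W, p_target):
    """Cross entropy between the softmax readout of the recurrent
    state g and the target place code p_target."""
    logits = W @ g
    logits -= logits.max()  # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum())
    return -np.sum(p_target * log_softmax)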

This brief taxonomy of normative navigation models hopefully shows how our definition of a cognitive map can be used to describe a range of different models that learn biologically inspired representations through the lens of machine learning. Furthermore, our definition, and the notion of a target representation, could inspire new models. For example, one could consider decoding into a target representation of simulated grid cells.