1. Neuroscience
Download icon

A neural-level model of spatial memory and imagery

  1. Andrej Bicanski  Is a corresponding author
  2. Neil Burgess  Is a corresponding author
  1. University College London, United Kingdom
Research Article
  • Cited 2
  • Views 1,813
  • Annotations
Cite this article as: eLife 2018;7:e33752 doi: 10.7554/eLife.33752

Abstract

We present a model of how neural representations of egocentric spatial experiences in parietal cortex interface with viewpoint-independent representations in medial temporal areas, via retrosplenial cortex, to enable many key aspects of spatial cognition. This account shows how previously reported neural responses (place, head-direction and grid cells, allocentric boundary- and object-vector cells, gain-field neurons) can map onto higher cognitive function in a modular way, and predicts new cell types (egocentric and head-direction-modulated boundary- and object-vector cells). The model predicts how these neural populations should interact across multiple brain regions to support spatial memory, scene construction, novelty-detection, ‘trace cells’, and mental navigation. Simulated behavior and firing rate maps are compared to experimental data, for example showing how object-vector cells allow items to be remembered within a contextual representation based on environmental boundaries, and how grid cells could update the viewpoint in imagery during planning and short-cutting by driving sequential place cell activity.

https://doi.org/10.7554/eLife.33752.001

Introduction

The ability to reconstruct perceptual experiences into imagery constitutes one of the hallmarks of human cognition, from the ability to imagine past episodes (Tulving 1985) to planning future scenarios (Schacter et al., 2007). Intriguingly, this ability (also known as ‘scene construction’ and ‘episodic future thinking’) appears to depend on the hippocampal system (Schacter et al., 2007; Hassabis et al., 2007; Buckner, 2010), in which direct (spatial) correlates of the activities of single neurons have long been identified in rodents (O'Keefe and Nadel, 1978; Taube et al., 1990a; Hafting et al., 2005) and more recently in humans (Ekstrom et al., 2003; Jacobs et al., 2010). The rich catalog of behavioral, neuropsychological and functional imaging findings on one side, and the vast literature of electrophysiological research on the other (see e.g. Burgess et al., 2002), promises to allow an explanation of higher cognitive functions such as spatial memory and imagery directly in terms of the interactions of neural populations in specific brain areas. However, while attaining this type of understanding is a major aim of cognitive neuroscience, it cannot usually be captured by a few simple equations because of the number and complexity of the systems involved. Here, we show how neural activity could give rise to spatial cognition, using simulations of multiple brain areas whose predictions can be directly compared to experimental data at neuronal, systems and behavioral levels.

Extending the Byrne, Becker and Burgess model of spatial memory and imagery of empty environments (Burgess et al., 2001a; Byrne et al., 2007), we propose a large-scale systems-level model of the interaction between Papez’ circuit, parietal, retrosplenial, and medial temporal areas. The model relates the neural response properties of well-known cells types in multiple brain regions to cognitive phenomena such as memory for the spatial context of encountered objects and mental navigation within familiar environments. In brief, egocentric (i.e. body-centered) representations of the local sensory environment, corresponding to a specific point of view, are transformed into viewpoint-independent (allocentric or world-centered) representations for long-term storage in the medial temporal lobes (MTL). The reverse process allows reconstruction of viewpoint-dependent egocentric representations from stored allocentric representations, supporting imagery and recollection.

Neural populations in the medial temporal lobe (MTL) are modeled after cell types reported in rodent electrophysiology studies. These include place cells (PCs), which fire when an animal traverses a specific location within the environment (O'Keefe and Dostrovsky, 1971); head direction cells (HDCs), which fire according to the animal’s head direction relative to the external environment, irrespective of location (Taube and Ranck, 1990; Taube et al., 1990a; Taube et al., 1990b); boundary vector cells (Lever et al., 2009); henceforth BVCs), which fire in response to the presence of a boundary at a specific combination of distance and allocentric direction (i.e. North, East, West, South, irrespective of an agent’s orientation); and grid cells (GCs), which exhibit multiple, regularly spaced firing fields (Hafting et al., 2005). Evidence for the presence of these cell types in human and non-human primates is mounting steadily (Robertson et al., 1999; Ekstrom et al., 2003; Jacobs et al., 2010; Doeller et al., 2010; Bellmund et al., 2016; Horner et al., 2016; Nadasdy et al., 2017).

The egocentric representation supporting imagery has been suggested to reside in medial parietal cortex (e.g. the precuneus; Fletcher et al., 1996; Knauff et al., 2000; Formisano et al., 2002; Sack et al., 2002; Wallentin et al. (2006); Hebscher et al., 2018). In the model, it is referred to as the ‘parietal window’ (PW). Its neurons code for the presence of scene elements (boundaries, landmarks, objects) in peri-personal space (ahead, left, right) and correspond to a representation along the dorsal visual stream (the ‘where’ pathway; Ungerleider, 1982; Mishkin et al., 1983). The parietal window boundary coding (PWb) cells are egocentric analogues of BVCs (Barry et al., 2006; Lever et al., 2009), consistent with evidence that parietal areas support egocentric spatial processing (Bisiach and Luzzatti, 1978; Nitz, 2009; Save and Poucet, 2009; Wilber et al., 2014).

The transformation between egocentric (parietal) and allocentric (MTL) reference frames is performed by a gain-field circuit in retrosplenial cortex (Burgess et al., 2001a; Byrne et al., 2007; Wilber et al., 2014; Alexander and Nitz, 2015; Bicanski and Burgess, 2016), analogous to gain-field neurons found in posterior parietal cortex (Snyder et al., 1998; Salinas and Abbott, 1995; Pouget and Sejnowski, 1997; Pouget et al., 2002) or parieto-occipital areas (Galletti et al., 1995). Head-direction provides the gain-modulation in the transformation circuit, producing directionally modulated boundary vector cells which connect egocentric and allocentric boundary coding neurons. That this transformation between egocentric directions (left, right, ahead) and environmentally-referenced directions (nominally North, South, East, West) requires input from the head-direction cells found along Papez’s circuit (Taube et al., 1990a; Taube et al., 1990b) is consistent with its involvement in episodic memory (e.g. Aggleton and Brown, 1999; Delay and Brion, 1969).

During perception the egocentric parietal window representation is based on (highly processed) sensory inputs. That is, it is driven in a bottom-up manner, and the transformation circuit maps the egocentric PWb representation to allocentric BVCs. When the transformation circuit acts in reverse (top-down mode), it reconstructs the parietal representation from BVCs which are co-active with other medial temporal cell populations, forming the substrate of viewpoint-independent (i.e. allocentric) memory. This yields an orientation-specific (egocentric) parietal representation (a specific point of view) and constitutes the model’s account of (spatial) imagery and explicit recall of spatial configurations of known spaces (Burgess et al., 2001a; Byrne et al., 2007). Figure 1 depicts a simplified schematic of the model.

Simplified model schematic.

(A) Processed sensory inputs reach parietal areas and support an egocentric representation of the local environment (in a head-centered frame of reference). Retrosplenial cortex uses current head or gaze direction to perform the transformation from egocentric to allocentric coding. At a given location, environmental layout is represented as an allocentric code by activity in a set of BVCs, the place cells (PCs) corresponding to the location, and perirhinal neurons representing boundary identities (in a familiar environment, all these representations are associated via Hebbian learning to form an attractor network). Black arrows indicate the flow of information during perception and memory encoding (bottom-up). Dotted arrows indicate the reverse flow of information, reconstructing the parietal representation from view-point invariant memory (imagery, top-down). (B) Illustration of the egocentric (left panel) and allocentric frame of reference (right panel), where the vector s indicates South (an arbitrary reference direction) and the angle a is coded for by head direction cells, which modulate the transformation circuit. This allows BVCs and PCs to code for location within a given environmental layout irrespective of the agent’s head direction (HD). The place field (PF, black circle) of an example PC is shown together with possible BVC inputs driving the PC (broad grey arrows).

https://doi.org/10.7554/eLife.33752.002

To account for the presence of objects within the environment, we propose allocentric object vector cells (OVCs) analogous to BVCs, and show how object-locations can be embedded into spatial memory, supported by visuo-spatial attention. Importantly, the proposed object-coding populations in the MTL map onto recently discovered neuronal populations (Deshmukh and Knierim, 2013; Hoydal et al., 2017). We also predict a population of egocentric object-coding cells in the parietal window (PWo cells: egocentric analogues to OVCs), as well as directionally modulated boundary and object coding neurons (in the transformation circuit). Finally, we include a grid cell population to account for mental navigation and planning, which drives sequential place cell firing reminiscent of hippocampal ‘replay’ (Wilson and McNaughton, 1994; Foster and Wilson, 2006; Diba and Buzsáki, 2007; Karlsson and Frank, 2009; Carr et al., 2011) and preplay (Dragoi and Tonegawa, 2011; Ólafsdóttir et al., 2015). We refer to this model as the BB-model.

Methods

Here, we describe the neural populations of the BB-model and how they interact in detail. Technical details of the implementation, equations, and parameter values can be found in the Appendix.

Receptive field topology and visualization of data

We visualize the firing properties of individual spatially selective neurons as firing rate maps that reflect the activity of a neuron averaged over time spent in each location. We also show population activity by arranging all neurons belonging to one population according to the relative locations of their receptive fields (see Figure 2A–C), plotting a snapshot of their momentary firing rates. In the case of boundary-selective neurons such a population snapshot will yield an outline of the current sensory environment (Figure 2C). Naturally, these neurons may not be physically organized in the same way, and these plots should not be confused with the firing rate maps of individual neurons (Figure 2D). Hence, population snapshots (heat maps) and firing rate maps (Matlab ‘jet’ colormap) are shown in distinct color-codes (Figure 2).

Figure 2 with 1 supplement see all
Receptive field topology and visualization of neural activity.

(A1) Illustration of the distribution of receptive field centers (RFs) of place cells (PCs), which tile the environment. (A2) Receptive fields of boundary responsive neurons, be they allocentric (BVCs) or egocentric (PWb neurons), are distributed on a polar grid, with individual receptive fields centered on each delineated polygon. Two example receptive fields (calculated according to Equation 14) are overlaid (bright colors) on the polar grids for illustration. Note that each receptive field covers multiple polygons, that is neighboring receptive fields overlap. The polar grids of receptive fields tile space around the agent (red arrow head at center of plots), that is they are anchored to the agent and move with it (for both BVCs and PWb neurons). In addition, for PWb neurons the polar grid of receptive fields also rotates with the agent (i.e. their tuning is egocentric). (B1) As the agent (black arrowhead) moves through an environment, place cells (B2) track its location. (B2) Snapshot of the population activity of all place cells arranged according to the topology of their firing fields (see A1). (C1,2) Snapshots of the population activity for BVCs and boundary selective PW neurons (PWb), respectively. Cells are again distributed according to the topology of their receptive fields (see A2), that is each cell is placed at the location occupied by the centre of its receptive field in peri-personal space (ahead is shown as up for PW neurons; North is shown as up for BVCs). See Section on the transformation circuit, Video 1, and Figure 2—figure supplement 1 for the mapping between PW and BVCs patterns via the transformation circuit. (D1,2) Unlike snapshots of population activity, firing rate maps show the activity of individual neurons averaged over a whole trial in which the agent explores the environment, here for a place cell (D1) and for a boundary vector cell with a receptive field due East (D2, tuning distance roughly 85 cm).

https://doi.org/10.7554/eLife.33752.003
Video 1
Surface plots (heat maps) visualize theneural activity of populations of cells.

The video shows a visualization of the simulated neural activity in the retrosplenial transformation circuit as a simulated agent moves in a simple, familiar environment (See Figure 2-figure supplement 1 for further details). Individual sublayers of the transformation circuit are shown in a circular arrangement around the head direction ring. Head direction cells track the agent's heading and confer a gain modulation on the retrosplenial sublayers. The transformation circuit then drives boundary vector cells (see main text). Surface plots (heat maps) visualize the neural activity of populations of cells. Individual cells correspond to pixels/polygons on the heat maps (compare to figures). Cells are arranged according to the distribution of their receptive fields; however, this arrangement does not necessarily reflect anatomical relations. Bright colors indicate strong firing. Abbreviations: PWb, Parietal Window, egocentric boundary representations (ahead is up); HDCs, Head Direction Cells; TR, Retrosplenial transformation sublayers; BVCs, Boundary Vector Cells (North is up); egoc. agent view, egocentric field of view of the agentwithin the environment, purple outlines denote visibleboundary segments which correspond to sensoryinputs to the PWb (ahead is up); alloc. agent position, allocentric position of the agent in the environment (North is up).

https://doi.org/10.7554/eLife.33752.005

The parietal window

Perceived and imagined egocentric sensory experience is represented in the ‘parietal window’ (PW), which consists of two neural populations - one coding for extended boundaries (‘PWb neurons’), and one for discrete objects (‘PWo neurons’). The receptive fields of both populations lie in peri-personal space, that is are tuned to distances and directions ahead, left or right of the agent, tile the ground around the agent, and rotate together with the agent (Figure 2A2, Figure 3). Reciprocal connections to and from the retrosplenial transformation circuit (RSC/TR, see below) allow the parietal window representations to be transformed into allocentric (orientation-independent) representations (i.e. boundary and object vector cells) in the MTL and vice versa. Intriguingly, cells that encode an egocentric representation of boundary locations (akin to parietal window neurons in the present model) have recently been described (Hinman et al., 2017).

The agent model and population snapshots for object representations.

(A) Top panel: The egocentric field of view of the agent (black arrow head). Purple boundaries fall into the forward-facing 180 degree field of view and provide bottom-up drive to the parietal window (PWb; not shown, but see Figure 2C2). The environment contains two discrete objects (green circles). Bottom panel: Allocentric positions of the agent (black triangle) and objects (green circles). (B) Object-related parietal window (PWo) activity (top panel) and OVC activity (bottom panel) due to object 2, South-East of the agent, at time T1. (C) PWo activity (top panel) and OVC activity (bottom panel) due to object 1, North-East of the agent, at time T2. A heuristically implemented attention model ensures that only one object at a time drives the parietal window (PWo). (D) Illustration of the encoding of an object encountered in a familiar environment. Dashed connections are learned (as Hebbian weight updates) between active cells. Solid lines indicate connections learned in the training phase, representing the spatial context. Note that place cells (PCs) anchor the object representation to the spatial context.

https://doi.org/10.7554/eLife.33752.006

The agent model and perceptual drive

An agent model supplies perceptual information, driving the parietal window in a bottom-up manner. The virtual agent moves along trajectories in simple 2D environments (Figure 2B1). Turning motions of the agent act on the head direction network to shift the activity packet in the head direction ring attractor. Egocentric distances to environmental boundaries in a 180-degree field of view in front of the agent are used to drive the corresponding parietal window (PWb) neurons. The retrosplenial circuit (section "The Head Direction Attractor Network and the Transformation Circuit") transforms this parietal window activity into BVC activity, which in turn drives PC activity in the pattern-completing MTL network (O'Keefe and Burgess, 1996; Hartley et al., 2000). Thus, simplified perceptual drive conveyed to the MTL allows the model to self-localize in the environment based purely on sensory inputs.

The medial temporal lobe network

Spatial context

The medial temporal lobe (MTL) network for spatial context is comprised of three interconnected neural populations: the PCs and BVCs code for the position of the agent relative to a given boundary configuration, and perirhinal neurons code for the identity (e.g. texture, color etc) of boundaries (PRb neurons). Identity has to be signaled by cells separate from BVCs because the latter respond to any boundary at a given distance and direction.

Discrete objects

The allocentric object code is comprised of two populations of neurons. First, similarly to extended boundaries, the identity of discrete objects must be coded for by perirhinal neurons (PRo neurons). Second, we hypothesize an allocentric representation of object location, termed object vector cells (OVCs), analogous to BVCs (Figure 3B), with receptive fields at a fixed distance in an allocentric direction.

Interestingly, cells which respond to the presence of small objects and resemble OVCs have recently been identified in the rodent literature (Deshmukh and Knierim, 2013; Hoydal et al., 2017), and could reside in the hippocampus proper or one synapse away. Although we treat them separately, BVCs and OVCs could in theory start out as one population in which individual cells specialize to respond only to specific types of object with experience (e.g. to small objects in the case of OVCs; see Barry and Burgess, 2007).

The role of perirhinal neurons

OVCs, like BVCs and parietal window (PWo and PWb) neurons signal geometric relations between object or boundary locations and the agent, but not the identity of the object or boundary. OVCs and BVCs fire for any object or boundary occupying their receptive fields. Conversely, an object’s or boundary’s identity is indicated, irrespective of its location, by perirhinal neurons. They lie at the apex of the ventral visual stream (the ‘what’ pathway; Ungerleider, 1982; Mishkin et al., 1983; Goodale and Milner, 1992; Davachi, 2006; Valyear et al., 2006) and encode the identities or sensory characteristics of boundaries and objects, driven by a visual recognition process which is not explicitly modeled. Only in concert with perirhinal identity neurons does the object or boundary code uniquely represent a specific object or boundary at a specific direction and distance from the agent.

Connections among medial temporal lobe populations

BVCs and OVCs have reciprocal connections to the transformation circuit, allowing them to be driven by perceptual inputs (‘bottom up’), or to project their representations to the parietal window (‘top down’).

For simulations of the agent in a familiar environment, the connectivity among the medial temporal lobe populations which comprise the spatial context (PCs, BVCs, PRb neurons) is learned in a training phase, resulting in an attractor network, such that mutual excitatory connections between neurons ensure pattern completion. Hence, partial activity in a set of PCs, BVCs, and/or PRb neurons - will re-activate a complete, previously learned representation of spatial context in these populations. OVCs and PRo neurons are initially disconnected from the populations that represent the spatial context. The simulated agent can then explore the environment and encode objects into memory along the way.

The head direction attractor network and the transformation circuit

Head direction cells (HDCs) are arranged in a simple ring-attractor circuit (Skaggs et al., 1995; Zhang, 1996). Current head direction, encoded by activity in this attractor circuit, is updated by angular velocity information as the agent explores the environment. The head direction signal enables the egocentric-allocentric transformation carried out by retrosplenial cortex.

Because of their identical topology, the PWb/BVC population pair and the PWo/OVC population pair can each make use of the same transformation circuit. For simplicity we illustrate its function via BVCs and their PWb counterparts. The retrosplenial transformation circuit (RSC/TR) consists of 20 sublayers. Each sublayer is a copy of the BVC population, with firing within each sublayer also tuned to a specific head-direction (directions are evenly spaced in the [0360] degree range). That is, individual cells in the transformation circuit are directionally modulated boundary vector cells, and connect egocentric (parietal) PWb neurons and allocentric BVCs (in the MTL) in a mutually consistent way. All connections are reciprocal. For example, a BVC with a receptive field to the East is mapped onto a PWb neuron with a receptive field to the right of the agent when facing North, but is mapped onto a PWb neuron with a receptive field to the left of the agent when facing South. Similarly, a PWb neuron with a receptive field ahead of the agent is mapped onto a BVC with a receptive field to the West when facing West but is mapped onto a BVC with a receptive field to the North when facing North. Figure 2C depicts population snapshots that are mapped onto each other by the transformation circuit (also see Video 1), while Figure 2—figure supplement 1 illustrates the connections and firing rate maps at the single cell level. We hypothesize that the egocentric-allocentric transformation circuit is set up during development (see Appendix for the setup of the circuit).

Bottom-up vs top-down modes of operation

During perception, the egocentric parietal window representation is based on sensory inputs (‘bottom-up’ mode). The PW representations thus determine MTL activity via the transformation circuit. ‘Running the transformation in reverse’ (‘top-down’ mode), that is reconstructing parietal window activity based on BVCs/OVCs, is the BB-models account of visuo-spatial imagery. To implement the switch between modes of operation, we assume that the balance between bottom-up and top-down connections is subject to neuromodulation (see e.g. Hasselmo, 2006); Appendix, Equation 3 and following). For example, connections from the parietal window (PWb and PWo) populations to the transformation circuit and thence onto BVCs/OVCs are at full strength in bottom-up mode, but down-regulated to 5% of their maximum value in top-down mode. Conversely, connections from BVCs/OVCs to the transformation circuit and onwards to the parietal window are down-regulated during bottom-up perception (5% of their maximum value) and reach full strength only during imagery (top-down reconstruction).

Embedding object-representations into a spatial context: attention and encoding

Unlike boundaries, which are hard-coded in the simulations (corresponding to the agent moving in a familiar environment), object representations are learned on the fly (simulating the ability to remember objects found in new locations in the environment).

As noted above (section "The Role of Perirhinal Neurons"), to uniquely characterize the egocentric perceptual state of encountering an object within an environment requires the co-activation of perirhinal (PRo) neurons (signaling identity) and the corresponding parietal window (PWo) (signaling location in peripersonal space). Moreover, maximal co-firing of only one PRo neuron with one PWo neuron (or OVC, in allocentric terms) at a given location is required for an unambiguous association (Figure 3A–C). If multiple conjunctions of object location and identity are concurrently represented then it is impossible to associate each object identity uniquely with one location - that is, object-location binding would be ambiguous. To ensure a unique representation, we allow the agent to direct attention to each visible object in sequence (compare Figure 3B and C; for a review of attentional mechanisms see VanRullen, 2013). This leads to a specific set of PWo, OVC and PRo neurons, corresponding to a single object at a given location, being co-active for a short period while connections between MTL neurons develop (including those with PCs, see Figure 3D). Then, attention is redirected and a different set of PWo, OVC and PRo neurons becomes co-active. We set a fixed length for an attentional cycle (600 time units). However, we do not model the mechanistic origins of attention. Attention is supplied as a simple rhythmic modulation of perceptual activity in the parietal window.

To encode objects in their spatial context the connections between OVCs, PRo neurons and currently active PCs are strengthened. By linking OVCs and PRo neurons to PCs, the object code is explicitly attached to the spatial context because the same PCs are reciprocally connected to the BVCs that represent the geometric properties of the environment (Figure 3D). A connection between PRo neurons and HDCs is also strengthened to allow recall to re-instantiate the head direction at encoding during imagery (see Simulation 1.0 below).

Finally, if multiple objects are present in a scene we do not by default encode all perceivable objects equally strongly into memory. We trigger encoding of an object when it reaches a threshold level of ‘salience’. In general, ‘salience’ could reflect many factors; here, we simulate relatively few objects and assume that salience becomes maximal at a given proximity, and prevent any further learning thereafter.

Imagery and the role of grid cells

Grid cells (GCs; Hafting et al. 2005) are thought to interface self-motion information with place cells (PCs) to enable vector navigation (Kubie and Fenton, 2012; Erdem and Hasselmo, 2012; Bush et al., 2015; Stemmler et al., 2015), shortcutting, and mental navigation (Bellmund et al., 2016); Horner et al. 2016). Importantly, both self-motion inputs (via GCs) and sensory inputs (e.g. mediated via BVCs and OVCs) converge onto PCs and both types of inputs may be weighted according to their reliability (Evans et al., 2016). GCs could thus support PC activity when sensory inputs are unreliable or absent. Here, GC inputs can drive PC firing during imagined navigation (see Section Novelty Detection (Simulations 1.3, 1.4)), whereas perceived scene elements, mediated via BVC and OVCs, provide the main input to PCs during unimpaired perception.

We include a GC module in the BB-model that, driven by heuristically implemented mock-motor-efference signals (self-motion signals with suppressed motor output), can update the spatial memory network in the absence of sensory inputs. The GC input allows the model to perform mental navigation (imagined movement through a known environment). By virtue of connections from GCs to PCs, the GCs can shift an activity bump smoothly along the sheet of PCs. Pattern completion in the medial temporal lobe network then updates the BVC representation according to the shifted PC representation. BVCs in turn update the parietal window representation (top-down), smoothly shifting the egocentric field of view in imagery (i.e. updating the parietal window representations) during imagined movement. Thus, self-motion related updating (sometimes referred to as ‘path integration’) and mental navigation share the same mechanism (Tcheang et al., 2011).

Connection weights between GCs and PCs are calculated as a simple Hebbian association between PC firing at a given coordinate (according to the mapping shown in Figure 2A,B) and pre-calculated firing rate maps of GCs (7 modules with 100 cells each, see Appendix for details).

Model summary

An agent employing a simple model of attention alongside dedicated object-related neural populations in perirhinal, parietal and parahippocampal (BVCs and OVCs) cortices allow the encoding of scene representations (i.e. objects in a spatial context) into memory. Transforming egocentric representations via the retrosplenial transformation circuit yields viewpoint-independent (allocentric) representations in the medial temporal lobe, while reconstructing the parietal window representation (which is driven by sensory inputs during perception) from memory is the model’s account of recall as an act of visuo-spatial imagery. Grid cells allow for mental navigation. Figure 4 shows the complete schematic of the BB-model, see Figure 2—figure supplement 1 for details of the RSC transformation circuit.

The BB-model.

‘Bottom-up’ mode of operation: Egocentric representations of extended boundaries (PWb) and discrete objects (PWo) are instantiated in the parietal window (PWb/o) based on inputs from the agent model while it explores a simple 2D environment. Attention sequentially modulates object-related PW activity to allow for unambiguous neural representations of an object at a given location. The angular velocity of the agent drives the translation of an activity packet in the head direction ring attractor network. Retrosplenial cortex (RSC) carries out the transformation from egocentric representations in the PW to allocentric representations in the MTL (driving BVCs and OVCs). The transformation circuit consists of 20 sublayers, each maximally modulated by a specific head direction while the remaining circuit is inhibited (Inh). In the medial temporal lobe network, perirhinal neurons (PRb/o) code for the identity of an object or extended boundary. PCs, BVCs and perirhinal neurons are reciprocally connected in an attractor network. Following encoding after object encounters, PCs are also reciprocally connected to OVCs and PRo neurons. ‘Top-down’ mode of operation: Activity in a subset of PCs, BVCs, and/or perirhinal neurons spreads to the rest of the MTL network (pattern completion) by virtue of intrinsic connectivity. With perceptual inputs to the PW disengaged (i.e. during recollection), the transformation circuit reconstructs parietal window (PWb/o) activity based on the current BVC and OVC activity. Updating PCs via entorhinal cortex (EC) GC inputs allows for a shift of viewpoint in imagery.

https://doi.org/10.7554/eLife.33752.007

Quantification

To obtain a measure of successful recall or of novelty detection (i.e. mismatch between the perceived and remembered scenes), we correlate the population vectors of the model’s neural populations between recall (the reconstruction in imagery) and encoding. These correlations are compared to correlations between recall and randomly sampled times as the agent navigates the environment in bottom-up mode. This measure of mismatch could potentially be compared to experimental measures of overlap between neuronal populations (e.g. Guzowski et al., 1999) in animals, or ‘representational similarity’ measures in fMRI, e.g. Ritchey et al., 2013).

Simulations

In this section, we explore the capabilities of the BB-model in simulations and derive predictions for future research. Each simulation is accompanied by a Figure, a supplementary video visualizing the time course of activity patterns of neural populations, and a brief discussion. In Section Discussion, we offer a more general discussion of the model.

Encoding of objects in spatial memory and recall (Simulation 1.0)

We let the agent model explore the square environment depicted in Figure 3A. However, the spatial context now contains an isolated object (Figure 5). During exploration, parietal window (PWb) neurons activate BVCs via the retrosplenial transformation circuit (RSC/TR), which in turn drive place cell activity. Similarly, when the object is present PWo neurons are activated, which drive OVCs via the transformation circuit. At the same time, object/boundary identity is signalled by perirhinal neurons (PRb/o). When the agent comes within a certain distance (here 55 cm) of an object, the following connection weights are changed to form Hebbian associations: PRo neurons are associated with PCs, HDCs, and OVCs; OVCs are associated with PCs and PRo neurons (also see Figure 3D); PCs are already connected to BVCs (in a familiar context). The weight change is calculated as the outer product of population vectors of the corresponding neuronal populations (yielding the Hebbian update), normalized, and added to the given weight matrix.

(A) Bottom-up mode of operation. Population snapshots at the moment of encoding during an encounter with a single object in a familiar spatial context. Left to right: PWb/o populations driven by sensory input project to the head-direction-modulated retrosplenial transformation circuit (RSC/TR, omitted for clarity, see Video 1 and Figure 2—figure supplement 1); The transformation circuit projects its output to BVCs and OVCs; BVCs and PRb neurons constitute the main drive to PCs; perirhinal (PRb/o) neurons are driven externally, representing object recognition in the ventral visual stream. At the moment of encoding, reciprocal connections between PCs and OVCs, OVCs and PRo neurons, PCs and PRo neurons, and PRo neurons and current head direction are learned (see Figure 3D). Right-most panels show the agent in the environment and the PC population snapshot representing current allocentric agent position. (B) Top-down mode of operation, after the agent has moved away from the object (black triangle, right-most panel). Current is injected into a PRo neuron (bottom right of panel), modelling a cue to remember the encounter with that object. This drives PCs associated to the PRo neuron at encoding (dashed orange connections show all associations learned at encoding). The connection weights switch globally from bottom-up to top-down (connections previously at 5% of their maximum value now at 100% and vice versa; orange arrows). PCs become the main drive to OVCs, BVCs and PRb neurons. BVC and OVC representations are transformed to their parietal window counterparts, thus reconstructing parietal representations (PWb/PWo) similar to those at the time of encoding (compare left-most panels in A and B). That is, the agent has reconstructed a point of view embodied by parietal window activity corresponding to the location of encoding (red triangle, right-most panel). Heat maps show population firing rates frozen in time (black: zero firing; white: maximal firing).

https://doi.org/10.7554/eLife.33752.008

After the agent has finished its assigned trajectory, we test object-location memory via object-cued recall. That is, modeling some external trigger to remember a given object (e.g. ‘Where did I leave my keys?'), current is injected into the PRo neuron coding for the identity of the object to-be recalled. By virtue of learned connections, the PRo neuron drives the PCs which were active at encoding. Pattern completion in the MTL recovers the complete spatial context by driving activity in BVCs and PRb neurons. The connections from PRo neurons to head direction cells (Figure 3D) ensure a modulation of the transformation circuit such that allocentric BVC and OVC activity will be transformed to yield the parietal representation (i.e. a point of view) similar to the one at the time of encoding. That is, object-cued recall corresponds to a full reconstruction of the scene when the object was encoded. Figure 5 depicts the encoding (Figure 5A) and recall phases (Figure 5B) of simulation 1.0. Video 2 shows the entire trial. To facilitate matching simulation numbers and figures to videos, Table 1 lists all simulations and relates them to their corresponding figures and videos.

Table 1
List of simulations, their content, corresponding Figures and videos
https://doi.org/10.7554/eLife.33752.009
Simulation no.ContentRelated figuresVideo no.
 0Activity in the transformation circuitFigure 2—figure supplement 11
 1.0Object-cued recallFigures 5 and 6,8A2
 1.0n1Object-cued recall with neuron lossFigure 8B3
 1.0n2Object-cued recall with firing rate noiseFigure 8C4
 1.1Papez’ circuit Lesion (anterograde amnesia)Figure 7A5
 1.2Papez’ circuit Lesion (retrograde amnesia)Figure 7B6
 1.3Object novelty (intact hippocampus)Figure 9A7
 1.4Object novelty (lesioned hippocampus)Figure 9B8
 2.1Boundary trace responsesFigure 10A,B,C9
 2.2Object trace responsesFigure 10D10
 3.0Inspection of scene elements in imageryFigure 1111
 4.0Mental NavigationFigure 1212
 5.0Planning and short-cuttingFigure 1313
Video 2
This video shows a visualization of the simulated neural activity as the agent moves in a familiar environment and encounters a novel object.

The agent approaches the object and encodes it into long-term memory. Upon navigating past the object the agent initiates recall, reinstating patterns of neural activity similar to the patterns present during the original object encounter. Recall is identified with the re-construction of the original scene in visuo-spatial imagery (see main text). Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.010

Recollection in the BB-model results in visuo-spatial imagery of a coherent scene from a single viewpoint and direction, that is it implements a process of scene construction (Burgess et al., 2001a; Byrne et al., 2007; Schacter et al., 2007; Hassabis et al., 2007; Buckner, 2010) at the neuronal level. A mental image is re-constructed in the parietal window reminiscent of the perceptual activity present at encoding. Note that during imagery BVCs (and hence PWb neurons, Figure 5B) all around the agent are reactivated by place cells, because the environment is familiar (the agent having experienced multiple points of view at each location during the training phase). We do not simulate selective attention for boundaries (i.e. PWb neurons), although see Byrne et al., 2007.

Similar tasks in humans appear to engage the full network, including Papez’ circuit, where head direction cells are found (for review see Taube, 2007); retrosplenial cortex (where we hypothesize the transformation circuit to be located) (Burgess et al., 2001a; Lambrey et al., 2012; Auger and Maguire, 2013; Epstein and Vass, 2014; Marchette et al., 2014; Shine et al., 2016); medial parietal areas (Fletcher et al., 1996; Hebscher et al., 2018); parahippocampus and hippocampus (Hassabis et al., 2007; Addis et al., 2007; Schacter et al., 2007; Bird et al., 2010); and possibly the entorhinal cortex (Atance and O'Neill, 2001; Bellmund et al., 2016; Horner et al. 2016; also see Simulation 4.0).

At the neuronal level, a key component of the BB-model are the object vector cells (OVCs) which code for the location of objects in peri-personal space. In Figure 5 the cells are organized according to the topology of their receptive fields in space, with the agent at the center (also compare to Figure 2A2). However, in rodent experiments individual spatially selective cells (like PCs or GCs) are normally visualized as time-integrated firing rate maps. We ran a separate simulation with three objects in the environment to examine firing rate maps of individual cells. OVCs show firing fields at a fixed allocentric distance and angle from objects (Figure 6). The BB-model predicts that OVC-like responses should be found as close as one synapse away from the hippocampus and were introduced as a parsimonious object code, analogous to BVCs and exploiting the existing transformation circuit. However, these rate maps show a striking resemblance to similar data from cells recently reported in the hippocampus of rodents (Figure 6C, compare to Deshmukh and Knierim, 2013). While Deshmukh and Knierim (2013) found these cells in the hippocampus, the object selectivity of these hippocampal neurons may have been inherited from other areas, such as lateral entorhinal cortex (Tsao et al., 2013), parahippocampal cortex (due to their similarities to BVCs) or medial entorhinal cortex (Solstad et al., 2008; Hoydal et al., 2017).

Firing fields of object vector cells.

(A) Firing rate maps for representative object vector cells (OVCs), firing for objects with a fixed allocentric location and direction relative to the agent. Object locations superimposed as green circles. Note that the objects have different identities, which would be captured by perirhinal neurons, not OVCs. Compare to Figure 4 in Deshmukh and Knierim, 2013. White lines point from objects to firing fields. Red dotted line added for comparison with B. (B) Distribution of the objects in the arena and an illustration of a possible agent trajectory.

https://doi.org/10.7554/eLife.33752.011

Anatomical connections between the potential loci of BVCs/OVCs and retrosplenial cortex (the suggested location of the egocentric-allocentric transformation circuit) exist. BVCs have been found in the subicular complex (Lever et al., 2009), and the related border cells and OVCs in medial entorhinal cortex (Solstad et al., 2008; Hoydal et al., 2017). Both areas receive projections from retrosplenial cortex (Jones and Witter, 2007), and project back to it (Wyss and Van Groen, 1992).

Papez’ circuit lesions induce amnesia (Simulations 1.1, 1.2)

Figure 5 depicts the model performing encoding and object-cued recall. However, the model also allows simulation of some of the classic pathologies of long-term memory. Lesions along Papez’ circuit have long been known to induce amnesia (Delay and Brion, 1969; Squire and Slater, 1978; 1989; Parker and Gaffan, 1997; Aggleton et al., 2016). Thus, lesions to the fornix and mammilary bodies severely impact recollection, although recognition can be less affected (Tsivilis et al., 2008). In the context of spatial representations, Papez’ circuit is notable for containing head direction cells (as well as many other cell types not in the model). That is, the mammillary bodies (more specifically the lateral mammillary nucleus, LMN), anterior dorsal thalamus, retrosplenial cortex, parts of the subicular complex and medial entorhinal cortex all contain head direction cells (Taube, 2007; Sargolini et al., 2006). Thus, lesioning Papez’ circuit removes (at least) the head direction signal from our model, and is modeled by setting the input from head direction cells to the retrosplenial transformation circuit (RSC/TR) to zero.

In the bottom-up mode of operation (perception), the lesion removes drive to the transformation circuit and consequently to the boundary vector cells and object vector cells. That is, the perceived location of an object (present in the egocentric parietal representation) cannot elicit activity in the MTL and thus cannot be encoded into memory (Figure 7). Some residual MTL activity reflects input from perirhinal neurons representing the identity of perceived familiar boundaries (i.e. recognition mediated by perirhinal cells is spared). In the top-down mode of operation (recall) there are two effects: (i) Since no new elements can be encoded into memory, post-lesion events cannot be recalled (anterograde amnesia; Simulation 1.1, Figure 7A, Video 5); and (ii) For pre-existing memories (e.g. of an object encountered prior to the lesion), place cells (and thus the remaining MTL populations) can be driven via learned connections from perirhinal neurons (e.g. when cued with the object identity; Simulation 1.2, Figure 7B, Video 6), but no meaningful egocentric representation can be instantiated in parietal areas, preventing episodic recollection/imagery. Equating the absence of parietal activity with the inability to recollect is strongly suggested by the fact that visuo-spatial imagery in humans relies on access to an egocentric representation (as in hemispatial representational neglect; Bisiach and Luzzatti, 1978). Simulations 1.1 and 1.2 show that the egocentric neural correlates of objects and boundaries present in the visual field persist in the parietal window only while the agent perceives them (they could also be held in working memory, which is not modelled here). Note that perirhinal cells and upstream ventral visual stream inputs are spared, so that an agent could still report the identity of the object.

Papez’ circuit lesions.

(A) In the bottom-up mode of operation (perception), a lesion to the head direction circuit removes drive to the transformation circuit and consequently to the boundary vector cells (BVCs) and object vector cells (OVCs). A perceived object (present in the egocentric parietal representation, PWo) cannot elicit activity in the MTL and thus cannot be encoded into long-term memory, causing anterograde amnesia. Place cells fire at random locations, driven by perirhinal neurons. (B) For memories of an object encountered before the lesion, place cells can be cued by perirhinal neurons, and pattern completion recruits associated OVC, BVCs and perirhinal neurons, but no meaningful representation can be instantiated in parietal areas, preventing episodic recollection/imagery (retrograde amnesia for hippocampus-dependent memories).

https://doi.org/10.7554/eLife.33752.012

Quantification, robustness to noise and neuron loss

Figure 8A shows correlations between population vectors of neural patterns during imagery/recall and those during encoding for Simulation 1.0 (Object-cued recall; Figure 5). OVCs and PCs exhibit correlation values close to one, indicating faithful reproduction of patterns. BVC correlations are somewhat diminished because recall reactivates all boundaries fully, compared to a field of view of 180 degrees during perception with limited reactivation of cells representing boundaries outside the field of view. PW neurons show correlations below one because at recall reinstatement in parietal areas requires the egocentric-allocentric transformation (i.e. OVC signals passed through retrosplenial cells), which blurs the pattern compared to perceptual instatement in the parietal window (i.e. imagined representations are not as precise as those generated by perception).

Correlation of neural population vectors between recall/imagery and encoding.

(A) In the intact model, OVCs and place cells exhibit correlation values close to one, indicating faithful reproduction of patterns. (B) Random neuron loss (20% of cells in all populations except for the head direction ring). (C) The effect of firing rate noise. Noise is also applied to all 20 retrosplenial transformation circuit sublayers (as is neuron loss; correlations not shown for clarity). Firing rate noise is implemented as excursions from the momentary firing rate as determined by the regular inputs to a given cell (up to peak firing rate). The amplitudes of perturbations are normally distributed (mean 20%, standard deviation 5%) and applied multiplicatively at each time step). White bars show the correlation between the neural patterns at encoding vs recall (RvE), while black bars show the average correlation between the neural patterns at recall vs pattern sampled at random times/locations (here every 100 ms; RvRP). Each bar is averaged over 20 separate instances of the same simulation (with newly drawn random numbers). Error bars indicate standard deviation across simulations.

https://doi.org/10.7554/eLife.33752.013

To test the model’s robustness with regard to firing rate noise and neuron loss, we perform two sets of simulations (modifications of Simulation 1.0, object-cued recall). In the first set we randomly chose cells in equal proportions in all model areas (except HDCs) to be permanently deactivated and assess recall into visuo-spatial imagery. Up to 20% of the place cells, grid cells, OVCs, BVCs, parietal and retrosplenial neurons were deactivated. Head direction cells were excluded because of the very low number simulated (see below). Although we do not attempt to model any specific neurological condition, this type of simulation could serve as a starting point for models of diffuse damage, as might occur in anoxia, Alzheimer’s disease or aging. The average correlations between the population vectors at encoding versus recall are shown in Figure 8B.

The ability to maintain a stable attractor state among place cells and head direction cells is critical to the functioning of the model, while damage in the remaining (feed-forward) model components manifests in gradual degradation in the ability to represent the locations of objects and boundaries (see accompanying Video 3). For example, if certain parts of the parietal window suffer from neuron loss, the reconstruction in imagery is impaired only at the locations in peri-personal space encoded by the missing neurons (indeed, this can model representational neglect; Byrne et al., 2007), see also Pouget and Sejnowski, 1997). The place cell population was more robust to silencing than the head-direction population (containing only 100 neurons), simply because greater numbers of neurons were simulated, giving greater redundancy. As long as a stable attractor state is present, the model can still encode and recall meaningful representations, giving highly correlated perceived and recalled patterns (Figure 8B).

Video 3
This video shows the same scenario as Video 2 (object-cued recall), however, with 20% randomly chosen lesioned cells per area.

The agent moves in a familiar environment and encounters a novel object. The agent approaches the object and encodes it into long-term memory. Upon navigating past the object, the agent initiates recall, reinstating patterns of neural activity similar to the patterns present during the original object encounter. Please see caption of Video 1 for abbreviations.

https://doi.org/10.7554/eLife.33752.014

The model is also robust to adding firing rate noise (up to 20% of peak firing rate) to all cells. Correlations between patterns at encoding and recall remain similar to the noise-free case, see Figure 8C. Videos 3 and 4 show an instance from the neuron-loss and firing rate noise simulations respectively.

Video 4
This video shows the same scenario as Video 2 (object-cued recall), however, with firing rate noise applied to all neurons (max. 20% of peak rate).

The agent moves in a familiar environment and encounters a novel object. The agent approaches the object and encodes it into long-term memory. Upon navigating past the object the agent initiates recall, reinstating patterns of neural activity similar to the patterns present during the original object encounter. Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.015
Video 5
This video shows a visualization of the simulated neural activity as the agent encounters an object and subsequently tries to engage recall similar to Simulation 1.0 (Video 2).

However, a lesion to the head direction system (head direction cells are found along Papez' circuit) precludes the agent from laying down new memories, because the transformation circuit cannot drive the medial temporal lobe. That is the transformation circuit cannot instantiate OVC/BVC representations derived from sensory input for subsequent encoding, leading to anterograde amnesia in the model agent (see main text). Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.016
Video 6
This video shows a visualization of the simulated neural activity as the agent moves through an empty environment and tries to engage recall of a previously present object.

A lesion to the head direction system (head direction cells are found along Papez' circuit) has been implemented similar to Simulation 1.1 (Video 5). The agent is supplied with the connection weights learned in Simulation 1.0 (Video 2), where it has successfully memorized a scene with an object. That is, the agent has acquired a memory before the lesion. However, even though cueing with the object re-activates the correct medial temporal representations, due to the lesion no reinstatement in the parietal window cannot occur, leading to retrograde amnesia for hippocampus-dependent memories in the model agent. Note, it is hypothesized that a cognitive agent only has conscious access to the egocentric parietal representation, as suggested by hemispatial representational negelct (Bisiach and Luzzatti, 1978) (see main text). Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.017

Novelty detection (Simulations 1.3, 1.4)

In the model, hippocampal place cells bind all scene elements together. The locations of these scene elements relative to the agent are encoded in the firing of boundary vector cells (BVCs) and object vector cells (OVCs). Rats show a spontaneous preference for exploring novel/altered stimuli compared to familiar/unchanged ones. We simulate one of these experiments (Mumby et al., 2002), in which rats preferentially explore one of two objects that has been shifted to a new location within a given environment, a behavior impaired by hippocampal lesions. In Simulations 1.3 and 1.4, the agent experiences an environment containing two objects, one of which is later moved. We define a mismatch signal as the difference in firing of object vector cells during encoding versus recall (modelled as imagery, at the encoding location), and assume that the relative amounts of exploration would be proportional to the mismatch signal.

With an intact hippocampus (Figure 9; Video 7), the moved object generates a significant novelty signal, due to the mismatch between recalled (top-down) OVC firing and perceptual (bottom-up) OVC firing at the encoding location. That detection of a change in position requires the hippocampus is consistent with place cells binding the relative location of an object (via object vector cells) to perirhinal neurons signalling the identity of an object.

Detection of moved objects via OVC firing mismatch.

(A) Two objects are encoded from a given location (left). After encoding, object one is moved further North. When the agent returns to the encoding location, the perceived position of object one differs from that at encoding (blue line, middle panel). When the agent initiates recall (right) the perceived location of object 1 (green filled circle) and its imagined location (end point of blue line) differ. (B) PC activity is the same in all three circumstances, that is PC activity alone is insufficient to tell which object has moved. (C-D) The perceived location as represented by OVCs during perception (C; objects 1 and 2 sampled sequentially at times T1, T2) and during recall (D; objects 1 and 2 sampled sequentially at times T3, T4). Blue circle in panel D indicates the previously perceived position of object 1. Inset bar graphs show the concurrent activity of perirhinal cells (PRo). (E) The mismatch in OVC firing results in near zero correlation between encoding and recall patterns for object 1 (black bar), while object 2 (white bar) exhibits a strong correlation, so that object one would be preferentially explored. Note, the correlation for object two is less than 1 because of the residual OVC activity of the other object (secondary peaks in both panels in D, driven by learned PC-to-OVC connections). (F) A hippocampal lesion removes PC population activity, so that OVC activity is not anchored to the agent’s location at encoding. (G-H) An incidental match between learned and recalled OVC patterns can occur for either object at specific locations (red arrow heads in second panel in G), but otherwise mismatch is signaled for both objects equally and neither object receives preferential exploration.

https://doi.org/10.7554/eLife.33752.018
Video 7
This video shows a visualization of the simulated neural activity in a reproduction of the object novelty paradigm of Mumby et al., 2002; detecting that one of two objects has been moved).

The agent is faced with two objects and encodes them (sequentially) into memory. Following some behavior one of the two objects is moved. Note, in real experiments the animal is removed for this manipulation. In simulation, this is unnecessary. Once the agent has returned to location of encoding, it is faced with the manipulated object array. The agent then initiates recall for objects one and two in sequence. The patterns of OVC re-activation can be compared to the corresponding patterns during perception (population vectors correlated, see main text). For the moved object, the comparison signals a change (near zero correlation). That object would hence be preferentially explored by the agent, and the next movement target for the agent is set accordingly (see main text). Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.019

Hippocampal lesions are implemented by setting the firing rates of hippocampal neurons to zero. A hippocampal lesion (Figure 9; Video 8) precludes the generation of a meaningful novelty signal because the agent is incapable of generating a coherent point of view for recollection, and the appropriate BVC configuration cannot be activated by the now missing hippocampal input. Connections between object vector cells and perirhinal neurons (see Figure 3D) can still form during encoding in the lesioned agent. Thus some OVC activity is present during recall due to these connections. However, this activity is not location specific. Without the reference frame of place cells and thence BVC activity this residual OVC activity during recall can be generated anywhere (see Figure 9F–H). It only tells the agent that it has seen this object at a given distance and direction, but not where in the environment it was seen. Hence, the mismatch signal is equal for both objects, and exploration time would be split roughly evenly between them. However, if the agent happens to be at the same distance and direction from the objects as at encoding, then perceptual OVC activity will match the recalled OVC activity (Figure 9G,H), which might correspond to the ability of focal hippocampal amnesics to detect the familiarity of an arrangement of objects if tested from the same viewpoint as encoding (King et al., 2002; but see also Shrager et al., 2007).

Video 8
should be compared to Video 7.

It shows a reproduction of the object novelty paradigm of Mumby et al., 2002; detecting that one of two objects has been moved). The agent is faced with two objects and encodes an association between relative object location (signaled by OVCs) and object identity (signaled by perirhinal neurons) - see Video 7 for encoding phase. Due to the hippocampal lesion, these associations cannot be bound to place cells. Once one of the two objects is moved (compare to Simulation 1.3) the agent initiates recall and the patterns of OVC re-activation are compared to the corresponding patterns during perception (population vectors correlated, see main text). Recall is initiated at two distinct locations to highlight the following effect of the lesion: Since associations between OVCs and perirhinal neurons are not bound to a specific environmental location a comparison of OVC patterns between perception and recall signals mismatch everywhere for both objects except for the two special locations at which imagery is engaged in the video. At each of those locations, the neural pattern due to the learned association happens to coincide with the pattern during perception for one of the two objects. Hence no object can be singled out for enhanced exploration. Match and Mismatch is signaled equally for both objects (see main text). Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.020

Rats also show preferential exploration of a familiar object that was previously experienced in a different environment, compared with one previously experienced in the same environment, and this preference is also abolished by hippocampal lesions (Mumby et al., 2002; Eacott and Norman, 2004; Langston and Wood, 2010). We have not simulated different environments (using separate place cell ensembles), but note that ‘remapping’ of PCs between distinct environments (i.e. much reduced overlap of PC population activity; e.g. Bostock et al., 1991; Anderson and Jeffery, 2003; Wills et al., 2005) suggests a mismatch signal for the changed-context object would be present in PC population vectors. Initiating recall of object A, belonging to context 1, in context 2, would re-activate the PC ensemble belonging to context 1, creating an imagined scene from context one which would mismatch the activity of PCs representing context two during perception. A hippocampal lesion precludes such a mismatch signal by removing PCs.

Finally, it has been argued that object recognition (irrespective of context) is spared after hippocampal but not perirhinal lesions (Aggleton and Brown, 1999; Winters et al., 2004; Norman and Eacott, 2004) which would be compatible with the model given that its perirhinal neuronal population signals an object’s identity irrespective of location.

‘Top-down’ activity and trace responses (Simulations 2.1, 2.2)

Simulations 1.3 and 1.4 dealt with a moved object. Similarly, if a scene element (a boundary or an object) has been removed after encoding, probing the memorized MTL representation can reveal trace activity reflecting the previously encoded and now absent boundary or object.

Section (Bottom-up vs top-down modes of operation) summarizes how top-down and bottom-up phases are implemented by a modulation of connection strengths (see Figures 1 and 4, Materials and methods section Embedding Object-representations into a Spatial Context: Attention and Encoding, and Appendix). During perception, the ‘top-down’ connections from the MTL to the transformation circuit and thence to the parietal window are reduced to 5% of their maximum strength, to ensure that learned connections do not interfere with on-going, perceptually driven activity. During imagery, the ‘bottom-up’ connections from the parietal window to the transformation circuit and thence to the MTL are reduced to 5 percent of their maximum strength.

In rodents, it has been proposed that encoding and retrieval are gated by the theta rhythm (Hasselmo et al., 2002): a constantly present modulation of the local field potential during exploration. In humans, theta is restricted to shorter bursts, but is associated with encoding and retrieval (Düzel et al., 2010). If rodent theta determines the flow of information (encoding vs retrieval) then it may be viewed as a periodic comparison between memorized and perceived representations, without deliberate recall of a specific item in its context (that is, without changing the point of view). In Simulations 2.1 and 2.2, we implement this scenario. There is no cue to recall anything specific, regular sensory inputs are continuously engaged, and we periodically switch between bottom-up and top-down modes (at roughly theta frequency) to allow for an on-going comparison between perception and recall. Activity due to the modulation of top-down connectivity during perception propagates to the parietal window representations (PWb/o), allowing for a detection of mismatch between sensorily driven and imagery representations.

In Simulation 2.1, the agent has a set of MTL weights which encode the contextual representation of a square room with an inserted barrier (i.e. a barrier was present in the training phase). However, when the agent explores the environment, the barrier is absent (Figure 10A). Due to the modulation of top-down connectivity, the memory of the barrier (in form of BVC activity) periodically bleeds into the parietal representation during perception (Figures 10B1, 2 and 3 and Video 9). The resultant dynamics carry useful information. First, letting the memory representation bleed into the perceptual one allows an agent, in principle, to localize and attend to a region of space in the egocentric frame of reference (as indicated by parietal window activity) where a change has occurred. A mismatch between the perceived (low bottom-up gain) and partially reconstructed (high bottom-up gain) representations, can signal novelty (compare to Simulations 1.3, 1.4), and could underlie the production of memory-guided attention (e.g. Moores et al., 2003; Summerfield et al., 2006). Moreover, the theta-like periodic modulation of top-down connectivity causes the appearance of ‘trace’ responses in BVC firing rate maps, indicating the location of previously encoded, now absent, boundary elements (Figures 10C1 and 2)

‘Top-down’ activity and ‘trace’ responses.

(A) An environment containing a small barrier (red outline) has been encoded in the connection weights in the MTL, but the barrier has been removed before the agent explores the environment again. (B) Activity snapshots for PWb (B1), BVC (B2) and PC (B3) populations during exploration. The now absent barrier is weakly represented in parietal window activity due to the periodic modulation of top-down connectivity during perception, although ‘bottom-up’ sensory input due to visible boundaries still dominates (see main text). (C1) High gain for top-down connections yields BVC firing rate maps with trace fields due to the missing boundary. Left: BVC firing rate map. Right: An illustration of the BVC receptive field (teardrop shape attached to the agent at a fixed allocentric direction and distance) with the agent shown at two locations where the cell in the left panel fires maximally. (C2) Same as C1 for a cell whose receptive field is tuned to a different allocentric direction. (D1) Similarly to the missing boundary in A, a missing object (small red circle) can produce ‘trace’ firing in an OVC (D2). Every time the agent traverses the location from which the object was encoded (large red circle in D1), learned PC-to-OVC connections periodically reactivate the associated OVC. (D3) The same PCs also re-activate the associated perirhinal identity cell (PRo), yielding a spatial trace firing field for a nominally non-spatial perirhinal cell (red circle).

https://doi.org/10.7554/eLife.33752.021
Video 9
This video shows a visualization of the simulated neural activity as the agent moves in a familiar environment.

However, a previously present boundary has been removed. The agent is supplied with a periodic (akin to rodent theta) modulation of the top-down connection weights (please see main text). The periodic modulation of these connections allows for a probing of the memorized spatial context without engaging in full recall and reveals the memory of the environment to be incongruent with the perceived environment. BVC activity due to the memorized (now removed) boundary periodically 'bleeds' into the egocentric parietal window ¨representation, in principle allowing the agent to attend to the part of environment which has undergone change (location of removed boundary). Time integrated neural activity from this simulation yields firing rate maps which show traces of the removed boundary (see Figure 10 in the manuscript). Note, the video is cut after 1 min to reduce filesize. The full simulation covers approximately 300 s of real time. Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.022

Simulation 2.2 (Figures 10D1,D2 and Video 10) shows similar’ trace’ responses for OVCs. The agent has a set of MTL weights which encode the scene from Simulation 1.0 (Figure 5) where it encountered and encoded an object. The object is now absent (small red circle in Figure 10D1), but the periodic modulation of top-down connectivity reactivates corresponding OVCs, yielding trace fields in firing rate maps. This activity can bleed into the parietal representation during perception (e.g. at simulation time 9:40-10:00 in Video 6), albeit only when the location of encoding is crossed by the agent, and with weaker intensity than missing boundary activity (the smaller extent of the OVC representation leads to more attenuation of the pattern as it is processed by the transformation circuit).

Video 10
This video shows a visualization of the simulated neural activity as the agent moves in a familiar environment.

However, a previously present (and encoded) object has been removed. The agent is supplied with a periodic (akin to rodent theta) modulation of the top-down connection weights (please see main text). The periodic modulation of these connections allows for a probing of the memorized spatial context. With every passing through the encoding location OVC activity (reflecting the now removed object) and perirhinal activity is generated by place cells covering the encoding location. This re-activation yields firing rate maps which show traces of the removed object in OVCs, and induces a spatial firing field for the nominally non-spatially selective perirhinal neuron (compare to Figure 10 in the manuscript). Note, the video is cut after 1 min to reduce filesize. The full simulation covers approximately 300 s of real time. Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.023

Interestingly, perirhinal identity neurons, which normally fire irrespective of location, can appear as spatially selective trace cells due to the periodic modulation of top-down connectivity at the location of encoding. Figure 10D3 shows the firing rate map of a perirhinal identity neuron. Every time the memorized representation is probed (high top-down gain), if the agent is near the location of encoding, the learned connection from PCs to perirhinal cells (PRo) lead to perirhinal firing for the absent object, yielding a spatial trace firing field for this nominally non-spatial cell.

The presence of some memory-related activity during nominally bottom-up (perceptual) processing can have benefits beyond the assessment of change discussed above. For instance, additional activity in the contextual representations (BVCs, PC, PRb neurons) due to pattern completion in the MTL can enhance the firing of BVCs coding for scene elements outside the current field of view. This activity can propagate to the PW, as is readily apparent during full recall/imagery (Figure 5) but is also present in weaker form during perception. Such activity may support awareness of our spatial surrounding outside of the immediate field of view, or may enhance perceptually driven representations when sensory inputs are weak or noisy.

Sampling multiple objects in imagery (Simulation 3.0)

Humans can focus attention on different elements in an imagined scene, sampling one after another, without necessarily adopting a new imagined viewpoint. This implies that the set of active PCs need not change while different objects are inspected in imagery. Moreover, humans can localize an object in imagined scenes and retrieve its identity (e.g. ‘What was next to the fireplace in the restaurant we ate at?”).

In Simulation 1.0 (encoding and object-cued recall, Figure 5), in addition to connection weights from perirhinal (PRo) neurons to PCs and OVCs, the reciprocal weights from OVCs to PRo neurons were also learned. These connections allow the model to sample and inspect different objects in an imagined scene. To illustrate this we place two objects in a scene and allow the agent to encode both visible objects from the same point of view. Encoding still proceeds sequentially. That is, our attention model first samples one object (boosting its activity in the PW) and then the other.

We propose that encoded objects that are not currently the focus of attention in imagery can attract attention by virtue of their residual activity in the parietal window (weak secondary peak in the PWo population in Figure 11B). Thus, any of these targets can be focused on by scanning the parietal window and shifting attention to the corresponding location (e.g. ‘the next object on a table’). Boosting the drive to such a cluster of PWo cells in the parietal window leads to corresponding activity in the OVC population (via the retrosplenial transformation circuit). The learned connection from OVCs to perirhinal PRo neurons will then drive PRo activity corresponding to the object which, at the time of encoding, was at the location in peripersonal space which is now the new focus of attention. Mutual inhibition between PRo neurons suppresses the previously active PRo neuron. The result is a top-down drive of perirhinal neurons (as opposed to bottom-up object recognition), which allows inferring the identity of a given object. That is, by shifting its focus of attention in peripersonal space (i.e. in the parietal window) during imagery the agent can infer the identity of scene elements which it did not initially recall.

Inspecting scene elements in imagery.

The agent encounters two objects. (A) Activity in PWo (left) and OVCs (right) populations when the agent is attending to one of the two objects during encoding. Both objects are encoded sequentially from the same location (time index 0.22 in Video 11). The agent then moves past the objects. (B) Imagery is engaged by querying for object 1, raising activity in corresponding PRo neurons (far right) and switching into top-down mode (similar to Simulation 1.0, Figure 5 and Video 2), leading to full imagery from the point of view at encoding. Residual activity in the OVC population at the location of object 2 (encoded from the same position, that is driven by the same place cells) translates to weak residual activity in the PWo population. (C) Applying additional current (i.e. allocating attention) to the PWo cells showing residual activity at the location of object 2 (leftmost blue arrow) and removing the drive to the PRo neuron corresponding to object 1 (because the initial query has been resolved) leads to a build-up of activity at the location of object two in the OVC population (blue arrow between PWo and OVC plots). By virtue of the OVC to PRo connections (blue between OVC and Pro plots), the PRo neuron for object two is driven (and inhibits PRo neuron 1, right-most blue arrows). Thus, the agent has inferred the identity of object 2, after having initiated imagery to visualize object 1, by paying attention to its egocentric location in imagery.

https://doi.org/10.7554/eLife.33752.024

Figure 11 and Video 11 show sequential (attention-based) encoding, subsequent recall and attentional sampling of scene elements. The agent sequentially encodes two objects from one location (Figure 11A), moves on until both objects are out of view, and engages imagery to recall object one in its spatial context (Figure 11B). The agent can then sample object two by allocating attention to the secondary peak in the parietal window (boosting the residual activity by injecting current in the PWo cells corresponding to the location of object 2). This activity spreads back to the MTL network, via OVCs, driving the corresponding PRo neuron (Figure 11C). Thus, the agent infers the identity of object 2, by inspecting it in imagery. Attention ensures disambiguation of objects at encoding, while reciprocity of connections in the MTL is necessary to form a stored attractor in spatial memory.

Video 11
This video shows a visualization of the simulated neural activity as the agent sequentially encodes two objects into long-term memory.

Upon navigating past the objects the agent initiates recall, cueing with the first object. The OVC representations of both objects are bound to the same place cells. These place cells thus generate a secondary peak in the OVC representation corresponding to the non-cued object. This activity propagates to the parietal window. Allocating attention to this secondary peak in the egocentric parietal representation (i.e. injecting current), propagates back to OVCs, which then drive the perirhinal cells for the non-cued object. That is, the agent infers the identity of the second object which is part of the scene (see main text). Please see caption of Video 1 for abbreviations. 

https://doi.org/10.7554/eLife.33752.025

Grid cells and mental navigation (Simulation 4.0)

The parietal window neurons encode the perceived spatial layout of an environment, in an egocentric frame of reference, as an agent explores it (i.e. a representation of the current point of view). In imagery, this viewpoint onto a scene is reconstructed from memory (top-down mode as opposed to bottom-up mode). We refer to mental navigation as internally driven translation and rotation of the viewpoint in the absence of perceptual input. In Simulation 4.0, we let the agent encode a set of objects into memory and then perform mental navigation with the help of grid and head direction cells.

Grid cell (GC) firing is thought to update the location represented by place cell firing, driven by signals relating to self-motion (O'Keefe and Burgess, 2005; McNaughton et al., 2006; Fuhs and Touretzky, 2006; Solstad et al., 2006). During imagination, we suppose that GC firing is driven by mock motor-efference signals (i.e. imagined movement without actual motor output) and used to translate the activity bump on the sheet of place cells. Pattern completion in the MTL network would then update the BVC population activity accordingly, which will spread through the transformation circuit and update parietal window activity. That is, mock motor efference could smoothly translate the viewpoint in imagery (i.e. scene elements represented in the parietal window flow past the point of view of the agent). Similarly, mock rotational signals to the head direction attractor could rotate the viewpoint in imagery. Both together are sufficient to implement mental navigation.

GCs are implemented heuristically, approximating the output of more sophisticated models (e.g., Burgess et al., 2007; Burak and Fiete, 2009; Bush and Burgess, 2014). Firing rate maps for 7 modules of 100 cells each are pre-calculated (see Appendix), providing the firing rates of GCs as a function of location. GC to PC weights are pre-calculated as Hebbian associations (to simulate a familiar environment), where the connection strength is maximal if the center of a PC’s receptive field coincides with (one of) the GC’s firing peaks. During (bottom-up) perception and navigation, GC input provides a small contribution to PC activity, which is mainly determined by BVC inputs (O'Keefe and Burgess, 1996; Hartley et al., 2000; Lever et al., 2009), to highlight the ability of the model to self-localize based on sensory inputs. Stronger grid cell input simply makes the location estimate more stable without detriment to the model. In the absence of reliable sensory information strong GC inputs are required to make PCs fire reliably (Bush et al., 2014; Poucet et al., 2014; Evans et al., 2016). Imagery is an extreme case of this situation, where no sensory input is provided to PCs. Consequently, GC input is up-regulated during imagery (similar to other connections in the switch from bottom-up to top-down modes), constituting a major input to PCs. This GC input can then translate the agent’s viewpoint in imagery (via their effect on PCs) without directly affecting the transformation circuit.

Figure 12 and Video 12 show an example of mental navigation. The agent approaches three objects in sequence, encodes them into memory and then initiates recall cued by object 1. From that (imagined) location, it initiates mental navigation in a straight line. GCs shift the PC activity bump along the trajectory. The allocentric boundary representation (BVCs) follows the shifting PCs (due to pattern completion) and the retrosplenial transformation circuit (RSC/TR) translates the shifting BVC representation into a shifting (i.e. ‘flowing past the agent’) egocentric representation of boundary distance (imagery of motion in an imagined scene, not shown in Figure 12, however, see Video 12). Importantly, the imagined trajectory takes the agent through the area of space at which object three was encoded, however this time coming from a novel direction. The transformation circuit nevertheless instantiates the correct activity in the parietal window for object 3, making it appear to the agent’s right, instead of to it’s left (as during its original encoding, coming from object 2). Not only does the object populate the imagined scene as the agent mentally navigates past it, the event also generates an imagined representation which has never been experienced by the agent.

Translating an established BVC pattern due to updated perceptual input (in response to real motion) also translates the PC activity bump. In fact, this is how perceptual information updates the estimate of position (self-localization) in a familiar environment (PWb→RSC→BVC→PC) in the model. Similarly, shifting the activity pattern across PCs via GCs in mental navigation can update the parietal window (PWb) during mental navigation (GC→PC→BVC→RSC→PWb). With this account of mental exploration of different routes (including potentially novel imagined experiences; see next section), the model provides a neural implementation of important aspects of ‘scene construction’ (Hassabis et al., 2007) and ‘episodic future thinking’ (Schacter et al., 2007), although these concepts also extend beyond the capabilities of the model (see Discussion). The inclusion of GCs allows for a parsimonious account of mental navigation in humans, consistent with observation of grid-like activity during imagined movement through familiar environments (Bellmund et al., 2016; Horner et al. 2016).

Mental navigation with grid cells.

Left to right: allocentric agent position (black triangle) and recent trajectory (black dashed line); PWo, OVC, and PC population snapshots; GC input to PCs (i.e. GC firing rates multiplied by connection weights from GCs to PCs); PRo neurons. The rightmost panel indicates which objects have been encoded. (A) The agent is exploring the environment and has just encoded the second object into memory (right-most bar chart). Object one has been encoded near the start of the trajectory. (B) After encoding the third object and moving past it, the agent initiates imagery, recalling object one in its spatial context (top-down mode) from a point of view West of object 1, facing East (red triangle). (C) Mock motor efference shifts GC activity (dashed arrow on GC input to PCs) and thence drives the PC activity bump representing (imagined) agent location. The allocentric (BVCs) and egocentric (PWb) boundary representation follow suit (see main text and Video 12). As the PC activity bump passes the location at which object 3 was encoded, corresponding OVC activity is elicited by learned connections (and is transformed into PWo activity (solid orange arrows indicate GCs updating PCs, PCs updating OVCs, etc). NB object 3 appears in the reconstructed scene ahead-right of the agent (PWo snapshot, second panel), despite being encoded ahead-left of the agent when it moved southwards from object 2 toward object 3. The corresponding perirhinal neuron is also driven to fire by PCs (orange arrow in PRo panel).

https://doi.org/10.7554/eLife.33752.026

Shortcutting and ‘preplay-like’ activity (Simulation 5.0)

It is a small step from imagined movement to planned navigation. GCs have been suggested to compute the vector to a goal location from the current location (see Kubie and Fenton, 2012; Erdem and Hasselmo, 2012; Bush et al., 2015; Stemmler et al., 2015), a capability necessary to explain the ability to take a shortcut across previously unexplored territory (Tolman, 1948). We propose that this ability is based on mental (vector-based) navigation supported by GCs. In Simulation 5, we let the agent explore a novel part of the environment, extending a pre-existing representation of a spatial context. Simulation 5.0 consists of three distinct phases: planning movement across a previously unvisited area to a reward location (phase 1); actual navigation of this shortcut (phase 2); and finally mental navigation across the now familiar area (phase 3).

In phase 1, the agent generates a trajectory along the shortest path to the goal using GCs (i.e. a straight line where the barriers happened to be in the way, Figure 13B). However, unlike in Simulation 4.0 (Figure 12), this process differs from mental navigation since the unexplored part of the environment is devoid of any meaningful PC-BVC connections and so a scene cannot be generated in the parietal window (PWb). Extending the medial temporal lobe (MTL) representations requires incorporating additional place cells into the MTL attractor. These (future) place cells are referred to as ‘reservoir cells’ and have no relationship to physical space yet, so visualizing their firing rates in a topographic manner is not possible. However, as the agent generates a trajectory towards its goal using GCs, sparse random GC-to-PC connections cause a subset of the reservoir cells to fire (Figure 13B and Video 13). The activity of reservoir cells does not form an attractor bump, as PC-PC connections have not been learned, but their firing is normalised to a level of activity similar to when an attractor bump is present (implemented by an adaptive feedback current, see Appendix). In Figure 13B (rightmost panel) reservoir cells are ordered according to their time of maximum firing along the imagined trajectory.

Planning, taking and imaging a trajectory across an unexplored area.

The agent is located in an environment where the direct trajectory between two salient locations (purple dots, left column) covers an unexplored part of the environment. PCs potentially firing in the unexplored area (‘reservoir cells’) receive only random connections from GCs (see unstructured grid cell input in column 5). Left to right panel columns: allocentric agent position (triangle); PWb, BVC, PC population rates; GC inputs to PCs (see Figure 12); and (B-D only) firing of ‘reservoir’ PCs along the trajectory (x axis), stacked and ordered by time of peak firing along the trajectory (y axis). (A) Starting situation. (B) Phase 1; imagined movement across the obstructed space leads to preplay-like activity in reservoir PCs (rightmost panel). Red arrow indicates the reservoir PCs are driven by grid cells. No egocentric representation can be generated from BVCs because ‘reservoir’ PCs have no connections to BVCs, that is they are not yet part of the MTL attractor. (C) Phase 2; the barrier is removed and the agent navigates the trajectory in real space. GCs again drive PCs (thick grey arrow), so the temporal sequence of reservoir cell activity in (A) is recapitulated in the spatial sequence of PC activity. Sensory inputs drive the PW (bottom-up mode) and hence BVCs (grey arrow between panels 2 and 3). Hebbian learning proceeds between PCs and BVCs (dashed grey line), and from GCs to PCs (reinforcing the drive from GCs to PCs, grey arrow between panels 4 and 5). (D) Phase 3; having traversed the novel part of the environment, the agent initiates imagery and performs mental navigation along the newly learned trajectory. The learned connections now instantiate the correct BVC and PW activity in top-down mode (orange arrows indicating flow of information, similar to Figure 12).

https://doi.org/10.7554/eLife.33752.027
Video 12
This video shows a visualization of the simulated neural activity as the agent performs a complex trajectory and encodes three objects into long-term memory along the way.

Upon navigating past the third object the agent initiates recall, cueing with the first object, and subsequently performs mental navigation (imagined movement in visuo-spatial imagery) with the help of grid cells. Grid cells update the place cell representation along the trajectory. The egocentric parietal representation is updated along the imagined trajectory, that is scene elements are flowing past the point of view of the agent. Note, the imagined trajectory does not correspond to a previously taken route. Nevertheless, when the imagined trajectory takes the agent past the encoding location of object 3, it is instantiated in the OVC and PWo representations (see main text). Grid cell firing rates are shown multiplied by their connection weights to place cells. Please see caption of Video 1 for further abbreviations.

https://doi.org/10.7554/eLife.33752.028
Video 13
This video shows a visualization of the simulated neural activity as the agent performs mental navigation across a blocked shortcut.

Newly recruited cells in the hippocampus exhibit activity reminiscent of preplay. Upon removal of the barrier the agent traverses the shortcut and associates the newly recruited hippocampal cells with the perceptually driven activity in the MTL. Subsequent mental navigation across the short cut yields activity in hippocampal cells reminiscent of replay (see main text). Grid cell firing rates are shown multiplied by their connection weights to place cells. Please see caption of Video 1 for further abbreviations.

https://doi.org/10.7554/eLife.33752.029

In phase 2, the barriers are removed and the agent performs the previously imagined trajectory in real space, using the novel shortcut. The same GCs are active along the trajectory and hence the same reservoir cells which fired before exploring the area, now fire in a spatial sequence along the actual trajectory. Since the agent is now actively perceiving its environment BVCs are driven in a bottom-up manner and Hebbian-like plasticity can strengthen connections between BVCs and reservoir cells as they fire along the trajectory (an analogous mechanism should also associate perirhinal neurons, which is omitted here). Hence, the reservoir cells have now effectively become place cells, with firing fields tied to the agent’s location in space (Figure 13C and Video 13). Figure 13C (rightmost panel) shows place cell activity along the trajectory during phase 2. Crucially, these cells are plotted in the order of activity shown during the previous imagined navigation (Figure 13B), indicating ‘pre-play-like’ behavior, in that the sequence of PC firing seen prior to first exploration is subsequently recapitulated during actual navigation (Figure 13C; Dragoi and Tonegawa, 2011; Ólafsdóttir et al., 2015).

Finally, in phase 3, the agent initiates imagery and performs mental navigation along the shortcut (i.e. recalls the episode of traversal), demonstrating that the MTL representation has been extended, and that a scene can be generated (Figure 13D and Video 13). The newly learned connections from reservoir PCs to BVCs complete the MTL representation of the spatial context and the transformation circuit reinstates the corresponding parietal window (PWb) representation (imagery, Figure 10D, panel 2).

The ability to plan a route by driving a sweep of PC activity with a sweep of GC activity via established GC-PC connections (Figures 1213) could relate to the observation of ‘forward sweeps’ of PC activity during navigation (Johnson and Redish, 2007; Pfeiffer and Foster, 2015) and ‘replay’ during rest (Wilson and McNaughton, 1994; Foster and Wilson, 2006; Diba and Buzsáki, 2007; Karlsson and Frank, 2009; Carr et al., 2011). However, both of these phenomena, and the ‘pre-play-like’ activity discussed above, occur at compressed timescales in experimental animals. Thus, modeling forward sweeps, replay or pre-play would require a spiking neuron model able to capture the faster time scale of the sharp-wave ripple events associated with replay and pre-play, and the theta sequences associated with forward sweeps (Burgess et al., 1994; Skaggs et al., 1995; Gupta et al., 2012).

Discussion

We propose a model of how sensory experiences, which are ultimately egocentric in nature, are transformed into viewpoint-invariant representations for long-term spatial memory in the medial temporal lobe (MTL) via processing in parietal and retrosplenial cortices. According to the model, imagery and recollection of scenes correspond to the re-construction of egocentric representations in parietal areas (the parietal window, PWb/o) from MTL representations. The MTL is the repository of viewpoint-invariant knowledge which is used to generate spatially coherent scenes by retrieving information consistent with perception from a single location and orientation. Pattern completion (via attractor dynamics) implements retrieval of a neural representation across the MTL, while head-direction cells enable the translation into egocentric coordinates via a retrosplenial transformation circuit, making use of gain-field neurons (Snyder et al., 1998; Galletti et al., 1995; Pouget and Sejnowski, 1997). Thus, for example, unilateral lesions in parietal regions could cause perceptual hemispatial neglect, and unilateral lesions to parietal or retrosplenial cortex could cause representational hemispatial neglect (in imagery) for a scene for which the MTL representation is complete (Bisiach and Luzzatti, 1978; see also Pouget and Sejnowski, 1997; Burgess et al., 2001a; Byrne et al., 2007).

The model can be used to account for human spatial cognition at the level of single neurons far from the sensory periphery: Place cells, head direction cells, gain-field neurons, boundary- and object-vector cells (BVCs and OVCs), and grid cells. Future work should try to integrate the present account of spatial cognition with recent progress concerning spatial coding in parietal areas (Nitz, 2006; Nitz, 2009; Nitz, 2012; Harvey et al., 2012; Whitlock et al., 2012; Raposo et al., 2014; Vedder et al., 2017), and a broader view of retrosplenial function (e.g., Alexander and Nitz, 2015; Alexander and Nitz, 2017). Notably, BVCs were predicted by an early predecessor of the present model (Hartley et al., 2000; Burgess et al., 2001a). Here, we have introduced OVCs to show how items introduced into a familiar environment may be coded for and incorporated into long-term memory. Intriguingly, OVC-like responses have been reported recently (Deshmukh and Knierim, 2013; Hoydal et al., 2017). We also explored how long-term memory might be probed to assess novelty. Finally, we incorporated grid cells and investigated their role in exploratory behavior and mental navigation. We can thus begin to frame abstract notions such as episodic future thinking and scene construction in terms of neural mechanisms, although we note that these concepts extend beyond our model to include completely fictional scenes/scenarios (Burgess et al., 2001a; Hassabis et al., 2007; Schacter et al., 2007).

Recall of objects in a spatial context

We have proposed that items/objects are associated to a given (spatial) context via place cells, which index the local sensory panorama, including local objects. Attaching representations of discrete objects (in the form of object vector cell activity) to a contextual representation via place cells aligns well with neuropsychological experiments that show position specificity in visual object memory (Hollingworth, 2007). In such experiments, object memory is superior when the target object is presented at the same position in the scene as it had been viewed originally (also see object novelty Simulations 1.3, 1.4). The hippocampus in particular has been implicated in combining information about objects, locations, and contexts (Warburton and Brown, 2010; Eacott and Gaffan, 2005; Barker and Warburton, 2015), consistent with the model. Similarly, studies suggest the hippocampus and precuneus are necessary for maintaining object-location binding even in working memory (Piekema et al., 2006; Olson et al., 2006).

The direction-independence of place cell firing in open environments implies that all possible local views at a given location could be associated with the corresponding place cells, potentially encompassing the boundaries in all directions around that location. Only by supplying head (or gaze) direction, and transforming the activity to the parietal window, a specific point of view can be represented. Note that given the anatomical loci of head direction cells along Papez circuit, the role of head direction as a modulatory factor in the egocentric-allocentric transformation (modeled as within retrosplenial cortex) provides a good explanation for impaired episodic memory resulting from Papez circuit lesions (Figure 7; Delay and Brion, 1969; Squire and Slater, 1978; 1989; Parker and Gaffan, 1997; Aggleton et al., 2016; Tsivilis et al., 2008). It also explains why permanent landmarks should evoke stronger responses in retrosplenial cortex (Auger et al., 2012), because permanent landmarks provide a more stable directional reference for the transformation circuit (see also Bicanski and Burgess, 2016). In summary, head direction cells likely serve to specify a direction of view, and not a movement direction (Raudies et al., 2015), which could instead be expressed in the firing phase of grid cells or place cells (Maurer et al., 2014; Cei et al., 2014).

The encoding strategy for objects allows an agent to reactivate the set of place cells which were active when the object was encountered and thus reconstruct the local view at encoding in the parietal window. This models the explicit recollection of a spatial scene populated with objects as an act of visuo-spatial imagery. It provides an explanation for the neural activity seen in the MTL, retrosplenial cortex and precuneus during imagery for familiar scenes (Burgess et al., 2001a; Hassabis et al., 2007; Schacter et al., 2007). The agent could also use the place cells activated during imagery as ‘goal cells’ and use grid cells to calculate a vector to navigate to the remembered location (Bush et al., 2015); not simulated here), accounting for the role of the MTL in goal-directed navigation (e.g. reviewed in Burgess et al., 2002).

The present model of explicit recall for items in context is a small step on the long road to understanding episodic memory at the neuronal level. However, not all memories for items requires reconstruction of a spatial scene. Recall of factual information (semantic memory) is not modeled, while memory for the attributes of an object irrespective of its context would require only perirhinal involvement. The BB-model only applies to imagery for coherent spatial scenes, and suggests that this is necessary for episodic recollection in which the past event is ‘re-experienced’ (Tulving, 1983), and certainly for remembering the spatial context of encountering an object.

Key components of the model are the ‘bottom-up’ transition from egocentric perceptual representations to allocentric MTL representations and the ‘top-down’ transition from MTL representations back to egocentric imagery. By informing perception in a top-down manner, the MTL can effectively predict perceptual input in familiar environments, allowing novelty detection and enhancing perception with remembered information. If we view imagery as a top-down reconstruction of perceptual representations, the MTL together with the retrosplenial transformation circuit could be seen as a generative model for scenes, consistent with generative models of memory such as (Káli and Dayan, 2001). It has been proposed that the bottom-up/top-down transition between encoding and retrieving occurs rhythmically at the frequency of the theta rhythm (Hasselmo et al., 1996; Burgess et al., 2001a; Hasselmo et al., 2002; Byrne et al., 2007; Douchamps et al., 2013). Theta might underlie a periodic probing of memorized representations; however, full recollection in imagery can last for long periods of time and need not correspond to specific phases of theta in humans (Düzel et al., 2010).

Mnemonic effects of newly learned connections and ‘trace cells’

We have proposed (Figure 10) that the relative strength of top-down and bottom-up connections can change smoothly and under control of the agent (e.g. via the release of a neuromodulator) to allow memory representations to influence neural activity during perception. This allows the agent to localize and attend to a region of space in the egocentric frame of reference where a given scene element used to be located, even if it has subsequently been moved, changed or removed. Moreover, the neural activity caused by increasing top-down connections can signal where the environment has changed. Interestingly, Tsao et al. (2013) recently reported ‘trace cells’ in lateral entorhinal cortex, whose firing reflects the previous presence of a now missing object, while related ‘mis-place’ cells have been reported in CA1 (O'Keefe, 1976).

We have shown that nominally non-spatially selective cells like perirhinal identity neurons can manifest a spatial trace firing field when re-activation occurs at the encoding location (Figure 10D3). This may help to reconcile the notion that lateral entorhinal cortex processes non-spatial information (Van Cauter et al., 2013; Hargreaves et al., 2005) with the spatial responses of trace cells (Tsao et al., 2013) in lateral entorhinal cortex. However, the trace cells of Tsao et al. (2013) do not fire when the object is present, but only in the subsequent absence of the object. Thus they might signal the mismatch between the remembered object and its absence, that is reflecting a comparison of perceptually driven and memory driven firing of the model perirhinal cells.

Finally, even in the absence of changes to the memorized spatial configuration, mnemonic representations can enhance perception, for example allowing the firing of cells coding for scene elements outside the current field of view. This activity is supported by pattern completion in the MTL, and may support people’s awareness of the presence of boundaries or objects outside of their field of view within a familiar environment.

Attention

Although we do not model the mechanistic origins of attention (see e.g. Itti and Koch, 2001), attentional modulation in the present model is crucial for unambiguous representations of multiple objects within a scene. If multiple objects are encoded from the same viewpoint, multiple OVC and perirhinal (PRo) neurons can be co-active, precluding the formation of a unique representation for each object-location conjunction, that is, precluding the solving the object-location binding problem. Thus, we require the objects to be sampled rhythmically and encoded sequentially in the parietal window (Figures 3 and 11), consistent with experimental literature suggesting rhythmic and sequential sampling (VanRullen et al., 2007; Landau and Fries, 2012; for review see VanRullen, 2013). If attentional cycles have a limited duration, then there may be insufficient time for activity to build up in the corresponding neuronal populations and support robust encoding into memory if there are too many objects within a scene, producing a capacity limit (see also Lisman and Idiart, 1995; Bays and Husain, 2008).

The attentional modulation described above can also act in imagery (within the parietal window), allowing the agent to inspect different parts of an imagined scene (see also Byrne et al., 2007). The model proposes that, in the absence of perceptual inputs, perirhinal neurons can be driven in a top-down fashion from hippocampus, thus reinstating an activity pattern in perirhinal cortex similar to the one present at encoding. Only the co-firing of these perirhinal neurons (PRo) and the corresponding BVCs and OVCs provides a unique representation of a given object in a given context, at a given location. The proposed binding of OVCs and PRo neurons, subject to attention, might also provide a functional interpretation of the hippocampus’ role in memory-guided attention (e.g., Summerfield et al., 2006).

Mental navigation, short-cutting, and planning

The model suggests a role for grid cell activity in human spatial cognition. Since both self-motion related inputs (via grid cells) and sensory inputs converge onto place cells, grid cells can update the point of view and allow an agent to translate its imagined location. If imagery can inform degraded perception (e.g. in the dark), obstacles can be identified and a suitable path can be planned. Thus, although mental navigation cannot be equated with path integration, we suggest that they reflect a common grid cell-dependent mechanism, which is required when sensory inputs are absent or unreliable. Indeed, humans likely make use of spatial imagery even in apparently non-visual tasks such as triangle completion in darkness (Tcheang et al., 2011), and there is evidence for grid-like brain activity during mental navigation (Bellmund et al., 2016; Horner et al. 2016).

The model of mental navigation provides a mechanistic neural-level account of some aspects of ‘scene construction’ and ‘episodic future thinking’ (Schacter et al., 2007; Hassabis et al., 2007; Buckner, 2010) with regard to familiar spaces. Mental navigation allows an agent to test future behavior, like the approach of a target from a new direction as depicted in Video 12. This suggests that the same neural infrastructure involved in scene perception and reconstruction also subserves planning and hypothesis testing (e.g. asking ‘Which way should I go?’ or ‘What would I encounter if I went that way?’). If grid cells (acting on place cells) change the point of view during imagined movement this must be reconciled with the relationships between grid cells and place cells seen during periods of rest or planning (see e.g., Ólafsdóttir et al., 2016; O'Neill et al., 2017; Trettel et al., 2017; Buzsáki and Chrobak, 1995).

Grid cells have been proposed to support the computation of vectors to a goal (Kubie and Fenton, 2012; Erdem and Hasselmo, 2012; Bush et al., 2015; Stemmler et al., 2015). That is, they can plan trajectories across known and potentially unknown terrain (shortcuts). We proposed that grid cells recruit new hippocampal cells (future place cells) in previously unexplored parts of a familiar environment (Figure 13 and Video 13). Planning a trajectory across unexplored space engenders preplay-like activity in place cells (Dragoi and Tonegawa, 2011; Ólafsdóttir et al., 2015), whereas mental navigation is reminiscent of ‘replay’ (Wilson and McNaughton, 1994; Foster and Wilson, 2006; Diba and Buzsáki, 2007; Karlsson and Frank, 2009; Carr et al., 2011) or ‘forward sweeps (Johnson and Redish, 2007; Pfeiffer and Foster, 2015), although the faster propagation speed (e.g. during sharp wave ripples) of these sequences of place cell activity are beyond the scope of the present model. Nevertheless, the model suggests that sweeps of activity in the grid cell population may play are role in these aspects of place cell firing, and could correspond to route planning (Kubie and Fenton, 2012; Erdem and Hasselmo, 2012; Bush et al., 2015; Yamamoto and Tonegawa, 2017).

Conclusions

It has been argued that the MTL-retrosplenial-parietal system supports the construction of coherent scenes (Burgess et al., 2001b; Byrne et al., 2007; Hassabis et al., 2007; Schacter et al., 2007; Buckner, 2010). However, if recollection corresponds to the (re-)construction of something akin to a perceptual experience (the defining characteristic of episodic memory; Tulving 1985), then this places strong spatial constraints on how episodic memory works. A vast number of different combinations of information could be retrieved from the body of long-term knowledge in the MTL, but only a small subset would be consistent with a single point of view, making the episodic ‘re-experiencing’ of events or visuo-spatial imagery congruent with perceptual experiences. The BB-model combines this insight with established knowledge and new hypotheses about how location, orientation, and surrounding environmental features are associated and represented by neural population activity.

This account includes functional roles for the specific firing characteristics of diverse populations of spatially selective cells across multiple brain regions, and distinguishes the egocentric representations supporting conscious (re-)experience from the more abstract (allocentric) representations involved in supporting computations. The resultant systems-level account provides a strong conceptual framework for considering the interplay between structures in the MTL, retrosplenial cortex, Papez circuit’ and parietal cortex in support of spatial memory. It follows Tulving’s theoretical specification of episodic memory, and - spanning Marr’s theoretical, algorithmic and implementational levels (Marr and Poggio, 1976) - bridges the gap between a neuropsychological description of spatial cognition (founded on behavioral and functional imaging data) and the neural representations supporting it.

Appendix 1 

BB-Model Details

Neuron model

All neuron populations in the BB-model, with the exception of for grid cells, are composed of rate-coded neurons and implemented according to the following equations.

(1) xit+1=xit+dtτkit
(2) rit+1=11+exp-2βixit+1-αi

Where x is the vector of activations (Equation 1, vectors/matrices displayed in bold) for all neurons belonging to the population marked by the subscript i (e.g. PCs, BVC, etc.). Within a population all neurons are identical. The superscript indicates the temporal dimension, with t + 1 referring to the updated state variable for the next time step (step size dt). τ is the decay time-constant of the rate equation. The sigmoid with parameters α, β (Equation 2) serves as a non-linearity to map activations onto firing rates. The term ki in Equation 1 contains all population specific inputs. Equations 3 through 13 summarize the inputs to the model populations.

(3) kPC=xPC+φPC,PCWPC,PCrPC+ PmodφBVC,PCWBVC,PCrBVC+φPRb,PCWPRb,PCrPRb+φOVC,PCWOVC,PCrOVC+ImodφPRo,PCWPRo,PCrPRo+φGC,PCWGC,PCrGC+IFB

Here (and below) Wi,j is the matrix of connection weights from population i to j, φi,j is a gain factor, and rj refers to the vector of firing rates of population j. IFB is a feedback current ensuring a set total of activity in the place cell sheet (numerical value 15). Imod and Pmod refer to neuromodulation for bottom-up vs top-down modes of operation, and the mode is set determined externally. that is setting these values according to behavioural needs of the agent (perception vs imagery/recollection) implements the switch between bottom-up and top-down modes. Pmod is one in bottom-up mode of operation and 0.05 in top-down mode. Imod is 0.05 in bottom-up mode of operation and one in top-down mode. Abbreviations: PC; place cells, BVC; boundary vector cells, OVC; object vector cells; PRb; boundary selective perirhinal neurons, PRo; object selective perirhinal neurons, PW; parietal window neurons, TR; transformation circuit neurons, HDC; head direction cells, GC; grid cells.

(4) kBVC=xBVC+BImodφPC,BVCWPC,BVCrPC+φBVC,PCWBVC,PCrBVC+φPRb,BVCWPRb,BVCrPRb+ B1 PmodφTRb,BVCWTR,BVCi=120rTRbi
(5) kOVC=xOVC+φOVC,OVCWOVC,OVCrOVC+B ImodφPC,OVCWPC,OVCrPC+ B ImodφPRo,OVCWPRo,OVCrPRo+ B1 PmodφTRo,OVCWTR,BVCi=120rTRoi

B is the ‘bleed’ parameter for a smooth modulation of bottom-up vs top-down connectivity, during perception (simulations 2.1, 2.2). Sums over transformation sublayers run from 1 to 20, the number of distinct sublayers (see description of transformation circuit in main text and below).

(6) kPRb=-xPRb+ImodφPC,PRbWPC,PRbrPC+φBVC,PRbWBVC,PRbrBVC+IPRb
(7) kPRo=xPRo+φPRo,PRoWPRo,PRorPRo+φPC,PRoWPC,PRorPC+φOVC,PRoWOVC,PRorOVC+IPRo+Icue

Icue is an externally supplied (i.e. not causally determined by other model components) trigger current to initiate recall in imagery.

IPRo and IPRb are externally supplied inputs to perirhinal identity neurons that represent the result of a recognition process along the ventral visual stream which is not explicitly modelled, and both inputs are only present in bottom-mode (i.e. during perception). IPRo is binary (object attended and present vs not attended/not present), while the magnitude of IPRb depends linearly on the extent of the boundary that is visible and its distance to the agent.

(8) kPWb=xPWbPWbbathrPWb+ B1IPWbagent+ B ImodφTR,PWbi=120WTR,PWbirTRi
(9) kPWo=xPWoPWobathrPWo+B1IPWoagent+ B ImodφTR,PWoi=120WTR,PWoirTRi

PWb/obath is an inhibitory input based on the total activity in the PWb/o population (sum of the population vector in Equations 8 and 9). IPWb/oagent refers to the sensory/perceptual inputs to the PWb/o populations. That is, these input currents are generated in response to the presence of boundaries/objects in the field of view in order to be injected into the corresponding populations.

(10) rIP= 1+exp-2βIP φHD,IPrHD-αIP-1

 connects onto the different sublayers of the transformation circuit (small hexagon in Figure 4), ensuring suppression of activity in all sublayers except where the positive modulatory input from HDCs ensures that inhibition is overcome.

(11) kTRbi=xTRbiTRbbathrTRbi+φHD,TRbWHD,TRbirHD+φIP,TRbrIP+ImodφBVC,TRbWBVC,TRbrBVC+B1 PmodφPWb,TRWPWb,TRirPWb for i ϵ {1,20}
(12) kTRoi=xTRoiTRobathrTRoi+φHD,TRoWHD,TRoirHD+φIP,TRorIP+ImodφOVC,TRoWOVC,TRorOVC+B1 PmodφPWo,TRoWPWo,TRoirPWo for i ϵ {1,20}

The superscript i in Equations 11 and 12 refers to the individual sublayers of the retrosplenial transformation circuit (i ranging from 1 to 20). For convenience and in order to visualize object (item) and boundary (contextual) related representations separately the transformation is applied separately to the PWb/o representations but the same connectivity is used. TRb/obath are analogous to PWb/obath.

(13) kHD=xHD+φHD,HDWHD,HDrHD+IcueφPRo,HDWPRo,HDrPRo+φrotcwWrotrHD+φrotccwWrotrHD

cw and ccw in Equation 13 are 0 or 1 depending on whether on the agent is performing a clockwise or counterclockwise turn, respectively. The scaling factor φrot is set to ensure a match between the agent’s rotations speed and the translation of the activity packet in the head direction ring attractor.

The firing rate dynamics of GCs are not modelled. GCs exist as firing rate maps which span the environment. GC rates are sampled from these rate maps by looking up the pixel value closest to the agent’s location. See section Grid cell rate maps, mental navigation, and preplay setup for the generation of the grid maps.

See Appendix 1—table 1 for population sizes.

Appendix 1—table 1
Model Parameters.

Top to bottom: α, β sigmoid parameters; φ connection gains; Φ constants subtracted from given weight matrices (e.g. PC to PC connections) to yield global inhibition; bath parameters; range thresholds for object encoding; l learning rates for simulation 5; S sparseness of connections for reservoir PCs; σρ, σϑ spatial dispersion of the rate function for BVCs. The additive constant (σρ = (r + 8) * σ0) corresponds to half the range of BVC grid and prevents σρ from converging to zero close to the agent. Ni population sizes. Products of numbers reflect geometric and functional aspects. E.g. receptive fields of PCs tile 2 × 2 m arena with 44 × 44 cells. Polar grids are given by 16 radial distance units (see A.2) and 51 angular distance units. For the transformation circuit this number is multiplied by the number of transformation sublayers, that is 20.

https://doi.org/10.7554/eLife.33752.032
α5
β0.1
αIP50
βIP0.1
φPWb-TR50
φTR-PWb35
φTR-BVC30
φBVC-TR45
φHD-HD15
φHD-IP10
φHD-TR15
φHDrot2
φIP-TR90
φPC-PC25
φPC-BVC1100
φPC-PRb6000
φBVC-PC440
φBVC-PRb75
φPRb-PC25
φPRb-BVC1
φGC-PC3
φPWo-TR60
φTR-PWo30
φTR-OVC60
φOVC-TR30
φPC-OVC1.7
φPRo-OVC6
φPC-PRo1
φOVC-PC5
φOVC-oPR5
φPRo-PC100
φPRo-PRo115
φinh-PC0.4
φinh-BVC0.2
φinh-PRb9
φinh-PRo1
φinh-HD0.4
φinh-TR0.075
φinh-TRo0.1
φinh-PW0.1
φinh-OVC0.5
φinh-PWo1
ΦPC-PC0.4
ΦBVC-BVC0.2
ΦPR-PR9
ΦHD-HD0.4
ΦOVC-OVC0.5
ΦPRo-PRo01
PWbath0.1
PWbath0.2
TRbath0.088
Object enc. threshold18 cm
Object enc. Threshold (3.1)36 cm
lGC-resPC0.65*10^−5
lresPC-BVC0.65*10^−5
lBVC-resPC0.65*10^−5
SGC-resPC3%
SresPC-resPC6%
σρ(r + 8) * σ0
σ00.08
σϑ0.2236
NPC44 × 44
NBVC16 × 51
NTRb/o20 × 16×51
NOVC16 × 51
NPRb/oDependent on simulation environment
NPWb/o16 × 51
NIP1
NHD100
NGC100 per module
Nreservoir437

Receptive fields of place cells and boundary vector cells

In the training phase for the contextual representation (see section Connection Profiles) BVC and PWb neurons have activation functions of the following type. If a boundary segment is located at the coordinates (ρ,ϑ), then the activity of each boundary selective cell is proportional to the distance of its receptive field from that boundary segment. If i, ϑi) are the polar coordinates of the receptive field of the i-th BVC or PWb neuron, then the firing rate r is calculated according to the following equation:

(14) rBVCi=1ρexp-ϑia-ϑaσϑ2exp-ρi-ρσρ2

where σϑ and σρ define the spatial dispersion of the rate function r. The radial dispersion increases with distance (i.e. σρ is a function of the radius; see e.g. Barry and Burgess 2007).

The radial separation of distance bins (see Figure 2A2) increases linearly from 0.21 to 1.71 along the radius of length 16 distance units (corresponding to approx. 145 cm for the 2 × 2 m environment). Internal to the model a distance unit is given by 2/NPC (see place cell resolution below). The same function is used to calculate the perceptual input to the parietal window due to objects and boundaries during simulation and to calculate activations of parietal window neurons and retrosplenial cells during the setup of the transformation circuit (see below). The receptive fields of BVCs, OVCs, PWb, PWo neurons and retrosplenial cells tile the space in polar coordinates with a radial resolution of 1 receptive fields per arbitrary distance unit (range: 0–16, see above) and an angular resolution of 51 receptive fields over 2π radians.

Similarly, to set up the PC weights in the training phase PC rates are calculated via the following equation:

(15) rPCi=expxi-x2+yi-y20.52

where (x,y) is the location of the agent and (xi,yi) the location of the receptive field of the PC in question. The firing fields of PCs tile the environment in a Cartesian grid with resolution 0.5 (i.e. two PCs per arbitrary distance unit). However, note that during simulations PCs are never driven by this activation function. Only BVCs, PR neurons and GCs drive PCs during simulation, unlike PWb/o neurons which must receive sensory/perceptual inputs in bottom-up mode.

Connection profiles

The encoding procedure (section Bottom-up vs top-down modes of operation) describes how object related connections are learned. The contextual representation of BVC, PC and PRb neurons, as well as the connections to and from the transformation circuit are set up in a training phase prior to running any simulations. To set up the transformation circuit randomly oriented boundary segments are chosen (20.000 times per transformation sublayer for a total of 400.000 instances), and the corresponding firing rates (calculated according to Equation 14) for PWb neurons and the transformation circuit sublayers are instantiated. For each transformation circuit sublayer the randomly generated activity pattern is rotated by a different angle (rotation angle chosen from 20 evenly spaced head directions). Connection weights are then calculated as outer products of the population vectors, yielding a matrix of Hebbian-like associations between the populations. The connections from the retrosplenial transformation circuit to BVCs are one-to-one connections between BVCs and the cells in each of the 20 transformation sublayers (i.e. the connections are given by the identity matrix) since this connection only needs to convey the outcome of the gain modulation across the RSC sublayers. That is, rotations of activity patterns occur on the connection to and from the parietal window. Video 1 shows all sublayers of the transformation circuit, subject to gain modulation from HDCs as a simulated agent navigates a simple environment, see also Figure 2—figure supplement 1. The entire transformation of egocentric boundary inputs to BVCs effectively constitutes a model of BVC generation from sensory inputs. Finally, connections between HDCs and the 20 transformation circuit sublayers are calculated algorithmically, by associating each sublayer with one of 20 evenly spaced HD activity bumps on the head direction ring.

With a functioning transformation circuit, and after specifying the location and extent of extended boundaries in the environment, the agent is placed at a random location and orientation in the environment and the activations of BVCs and PCs are calculated via equations 14 and 15. PRb activations are instantiated based on the identity of the visible landmark segments. Connection weights between these three populations (supporting the contextual representation) are again calculated as outer products of the corresponding population vectors, yielding matrices of Hebbian-like associations between the populations.

Weights are normalized such that the sum total of weights converging on a given target neuron is 1, which is assumed to be the result of some homeostatic process, a widely agreed upon feature of synaptic plasticity (Keck et al., 2017). Weights are scaled by scalar gain factors φi (see Appendix 1—table 1) to produce appropriate responses in targets of afferent connections.

Grid cell rate maps, mental navigation, and preplay setup

Grid cells are implemented as firing rate maps. Each map consists of a matrix of the same dimensions as the PC sheet (44 × 44 pixels) and is computed as 60 degrees offset, superimposed cosine waves using the following set of equations.

(16) b0=cos0sin0b1=cosπ3sinπ3b2=cos2π3sin2π3
(17) zi=RjbiFx+xoffset
(18) rGC=max0,cosz0+cosz1+cosz2

Here b0, b1 and b2 are the normal vectors for the cosine waves. Rj is the standard 2D rotation matrix where the index j ranges from 1 to 7 and refers to the rotation angle of the matrix (7 random orientations for 7 grid modules, here 0, π/3, π/4, π/2, π/6, 1.2π, 1.7π). F is the frequency of the grids, starting at 0.0028*2π. The scales of successive grids are related by the scaling factor 2 (Stensola et al. 2012). For each grid scale offsets are sampled uniformly along the principle axes of two adjacent equilateral triangles on the grid (i.e. the rhomboid made of 4 grid vertices).

Motion through GC maps (i.e. a GC sweep) during mental navigation and preplay is implemented by sampling the GC rate along the imagined trajectory superimposed on the GC rate map. The firing rate value (i.e. the pixel of the rate map) is determined by rounding the x and y values of the imagined trajectory to the nearest integer value. This sampling is equivalent to a shift of a hexagonal pattern of activation on a 2D sheet of entorhinal cells, as suggested in mechanistic models of grid cells (Burak and Fiete, 2009).

For simulation 5.0 (planning; Video 13 and Figure 13) the reservoir place cells are supplied with random afferent connections from grid cells (sparseness 3%), and are also randomly interconnected amongst themselves (sparseness 6%). Place cells representing the familiar context and reservoir PCs inhibit each other (inhibitory connections 50% stronger than the default inhibition among place cells representing the context). Weights among reservoir place cells are normalized to the mean of the total amount of positive weights converging onto a typical place cell representing the familiar context. Grid cell weights to reservoir place cells are similarly normalized (80% stronger than default). These additions suffice to produce random, preplay-like activity in reservoir place cells as soon as the central peak of the grid cell ensemble begins to drive the reservoir. The inhibitory connections to and from the context network assure that either the reservoir place cells or the context network wins out. No changes to the adaptive feedback current are necessary (IFB in Equation 3). Finally, during preplay connections from BVCs and perirhinal neurons to place cells are turned off to avoid interference which can arise due to the very simple layout of the environment (many boundary configurations experienced by the agent are similar). No other changes to the default model are necessary. To visualize the spatio-temporal sequence of the firing of reservoir place cells during the three phases of simulation 5.0 (planning, perception, recall; see main text) the firing of reservoir cells is recorded along the imagined or real trajectory, and stacked (rightmost panels in Figure 13) to yield figures akin to typical preplay/replay experiments. The firing rates are normalized and thresholded at 10% of the maximum firing rate for clarity. That is, cells that do not fire, or fire at very low rates are not shown. Due to learning during the actual traversal of the novel part of the environment (phase 2, perception) some cells can increase their firing rate above the threshold. As a consequence the number of cells which is plotted in the stacked rate maps grows marginally between phase 1 (preplay) and phase 2. However, ordering PCs in phases 2 and 3 according to the sequence derived from the preplay is done before thresholding. Hence the correct order derived from phase 1 (preplay) is applied to the cells recorded in phases 2 and 3.

Agent and attention models

To ensure an unambiguous representation of an object at a given location (see main text) we implement a heuristic model of directed attention. A fixed length for an attentional cycle (600 ms) is allocated and divided by the number of visible objects, yielding a time per object tO. The PWo population is then driven for tO ms with the cueing current IPWoagent (see Equation 9) for each visible object in sequence.

The agent moves in straight lines within the environment, following a path defined by a list of coordinates. Upon reaching a target the rotation towards the next subgoal is performed, followed by the next segment of translation. The rotational velocity is implicitly given by a fixed offset of the translation weights for the HD ring attractor (approximately 18 degrees; see e.g. Zhang, 1996; Song and Wang, 2005; Bicanski and Burgess, 2016 for more sophisticated methods of integrating rotational velocity). Translational velocity is fixed at 25 cm per second.

The agent model is agnostic about the size of the arena and nature of the agent. It can be viewed as rodent like agent or alternatively a human-like agent. The environment is covered by 44 × 44 PCs. that is 1/44 of the length/width of the environment corresponds to one distance unit. Assuming a timestep of e.g. 1ms and an arena size of approximately 2 × 2 m2 for a rodent-like agent yields a translation speed of approximately 10 cm/s. Assuming a human-like agent in an environment of approximately 10 × 10 m2 yields a translation speed of approximately 56 cm/s, corresponding to a slow paced walk for a human subject. In either case the speed is orders of magnitude below the time scale of neural rate dynamics.

References

  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
  11. 11
  12. 12
  13. 13
  14. 14
  15. 15
  16. 16
  17. 17
  18. 18
  19. 19
  20. 20
  21. 21
  22. 22
    Memory for events and their spatial context: models and experiments
    1. N Burgess
    2. S Becker
    3. JA King
    4. J O'Keefe
    (2001b)
    Philosophical Transactions of the Royal Society B: Biological Sciences 356:1493–1503.
    https://doi.org/10.1098/rstb.2001.0948
  23. 23
  24. 24
  25. 25
  26. 26
  27. 27
  28. 28
  29. 29
  30. 30
  31. 31
  32. 32
  33. 33
  34. 34
    Le Syndrome De Korsakoff. Masson
    1. J Delay
    2. S Brion
    (1969)
    Le Syndrome De Korsakoff. Masson.
  35. 35
  36. 36
  37. 37
  38. 38
  39. 39
  40. 40
  41. 41
  42. 42
  43. 43
  44. 44
  45. 45
  46. 46
    Neural systems for landmark-based wayfinding in humans
    1. RA Epstein
    2. LK Vass
    (2014)
    Philosophical Transactions of the Royal Society B: Biological Sciences 369:20120533.
    https://doi.org/10.1098/rstb.2012.0533
  47. 47
  48. 48
  49. 49
  50. 50
  51. 51
  52. 52
  53. 53
  54. 54
  55. 55
  56. 56
  57. 57
  58. 58
  59. 59
  60. 60
  61. 61
  62. 62
  63. 63
  64. 64
  65. 65
  66. 66
  67. 67
  68. 68
  69. 69
  70. 70
    Egocentric representation of environmental boundaries in the striatum
    1. JR Hinman
    2. GW Chapman
    3. ME Hasselmo
    (2017)
     Society for Neuroscience.
  71. 71
  72. 72
  73. 73
  74. 74
  75. 75
  76. 76
  77. 77
  78. 78
  79. 79
  80. 80
  81. 81
  82. 82
  83. 83
    Advances in Neural Information Processing Systems
    1. S Káli
    2. P Dayan
    (2001)
    24–30, Hippocampally-dependent consolidation in a hierarchical model of neocortex, Advances in Neural Information Processing Systems.
  84. 84
  85. 85
  86. 86
  87. 87
  88. 88
  89. 89
  90. 90
  91. 91
  92. 92
  93. 93
  94. 94
  95. 95
  96. 96
  97. 97
  98. 98
  99. 99
  100. 100
  101. 101
  102. 102
  103. 103
  104. 104
  105. 105
  106. 106
    The Hippocampus as a Cognitive Map
    1. J O'keefe
    2. L Nadel
    (1978)
    Oxford: Clarendon Press.
  107. 107
  108. 108
  109. 109
  110. 110
  111. 111
  112. 112
  113. 113
  114. 114
  115. 115
  116. 116
  117. 117
  118. 118
  119. 119
  120. 120
  121. 121
  122. 122
  123. 123
  124. 124
  125. 125
  126. 126
  127. 127
  128. 128
  129. 129
    A model of the neural basis of the rat's sense of direction
    1. WE Skaggs
    2. JJ Knierim
    3. HS Kudrimoti
    4. BL McNaughton
    (1995)
    Advances in Neural Information Processing Systems 7:173–182.
  130. 130
  131. 131
  132. 132
  133. 133
  134. 134
  135. 135
  136. 136
  137. 137
  138. 138
  139. 139
  140. 140
  141. 141
    Head direction cells in the deep layer of dorsal presubiculum in freely moving rats. InSociety of neuroscience abstract
    1. JS Taube
    2. JB Ranck
    (1990)
    The Journal of Neuroscience : The Official Journal of the Society for Neuroscience 10:420–435.
  142. 142
  143. 143
  144. 144
  145. 145
  146. 146
  147. 147
  148. 148
    Elements of Episodic Memory
    1. E Tulving
    (1983)
    Clarenden Press.
  149. 149
    Two cortical visual systems
    1. LG Ungerleider
    (1982)
    Analysis of visual behavior pp. 549–586.
  150. 150
  151. 151
  152. 152
  153. 153
  154. 154
  155. 155
  156. 156
  157. 157
  158. 158
  159. 159
  160. 160
  161. 161
  162. 162
  163. 163
  164. 164
  165. 165
  166. 166
  167. 167
  168. 168

Decision letter

  1. Laura Colgin
    Reviewing Editor; The University of Texas at Austin, Center for Learning and Memory, United States
  2. Michael J Frank
    Senior Editor; Brown University, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "A Model of Spatial Memory and Imagery – From Single Neurons to Cognition" for consideration by eLife. Your article has been reviewed by three peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Michael Frank as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This paper describes a model that proposes a framework to explain how multiple, highly complex regions interact to produce egocentric and allocentric representations of space. Although all reviewers found the model to be interesting and novel, they also agreed that the Results, Materials and methods, and rationale for the model were often not presented in a clear and concise manner that would be optimal for a broad readership. Reviewers also felt that the simulation movies were insufficient to demonstrate the importance of the model and request a number of essential revisions, including incorporation of clear performance goals for the model, together with appropriate quantitative measures.

Essential revisions:

1) A primary shortcoming is that the model is not used to simulate any specific behavioral tasks. Instead, after describing the architecture, the paper presents a series of several simulations (depicted by figures and videos) to demonstrate examples of neural population activity (over time periods lasting tens of seconds) during a few selected types of navigational behavior. These example simulations do not provide adequate support for strong claims made in the Discussion section, where it is argued that the model accounts for a wide range of findings including position specificity in visual object memory (Hollingworth, 2007), impaired episodic memory resulting from Papez circuit lesions (Delay and Brion 1969), neural activity seen during imagery for scenes in the MTL, retrosplenial cortex and precuneus (Burgess, Maguire et al., 2001; Hassabis et al. 2007; Schacter et al. 2007), and 'some aspects' of 'scene construction' and 'episodic future thinking' (Schacter et al., 2007; Hassabis et al., 2007; Buckner 2010). To support such claims, the model should more faithfully reproduce experimental designs from these prior studies (as is, there does not seem to be any quantitative metric by which these broad claims can be evaluated). It is also implied that the model accounts for "trace cells" that fire when the agent visits the prior location of a missing object (Tsao et al., 2015), and although trace-like activity is shown in some example simulations, it is not specifically stated which neural population in the model would correspond to trace cells, nor are simulated trace cell firing rate maps presented. It is additionally claimed that the model accounts for preplay and replay activity of place cells, but it appears that simulated preplay and replay events do not occur on a compressed time scale in the model as they do in real rodents, which is not addressed. In other words, the heavy reliance on qualitative (rather than quantitative) performance assessments make it difficult to offer anything more than a subjective evaluation of the model's capabilities. A more objective evaluation might be possible if the authors run more simulations to quantitatively compare simulated vs. real neural activity (or simulated vs. real task performance), explore how the model's performance degrades under realistic noise or uncertainty conditions, etc.

2) Several key mechanisms – such as sequential shifting of attentional focus (which is essential for solving the place-object binding problem), neuromodulation (which allows the model to transition between sensory processing and mental imagery modes), and population activity in the grid cell map (which is essential for generating preplay/replay trajectories) – are not explicitly simulated by the model. Rather, these signals are provided "for free" as inputs to the network. When reciting the list of phenomena that the model can explain (see prior point), the authors should take care to include only phenomena that fall within the purview of what is actually being simulated by the network, rather what is being provided for free.

3) The depiction and explanation of BVC and PWb data (initially shown in Figure 2A2, C1, and C2 and also described in subsequent figures) is unclear. Why are small receptive fields shown close to the agent in Figure 2A2? What does this mean? In the Video 1, receptive fields do not appear to get smaller or bigger as the agent moves around. Are the cells with receptive fields close to the agent the ones that fire at the actual boundaries, not a distance away from the boundaries? If not, in Figure 2C1, why are BVCs shown to be firing, presumably at the north and east boundaries of the environment, when the agent is in the center of the environment (i.e., not in the boundary)? Why are so many BVCs firing at the same time when the agent is in a particular location? Are these different cells that fire at different distances from the boundaries? Figure 2A2 is described as showing receptive fields for BVCs, but these receptive fields are usually more rectangular in shape, whereas they are depicted as circular here. In "bottom up" mode (e.g., Figure 5, top) it appears that only BVCs encoding boundaries ahead of the animal (those in the visual field?) are active. Is this consistent with the firing of real neurons (such as border cells)? Is there any evidence that real BVCs are active only when a rodent faces toward but not away from their coded boundary?

4) The authors show a schematic of their model in Figure 4, which includes top-down and bottom-up connections between different brain areas. Lacking from the paper are multiple citations to anatomical studies that demonstrate that these connections are realistic (e.g., Jones and Witter, 2007, for projections from retrosplenial cortex to deep layers of entorhinal cortex). For example, has it been shown that the entorhinal cortex projects back to the retrosplenial cortex?

5) In Figure 7D, why isn't anything activated for the new object?

6) The authors go to great lengths to realistically model MTL activity based on rodent literature. However, it was not clear how the model relates to rodent literature regarding parietal coding (e.g. Harvey, Coen and Tank, 2012; Raposo, Kaufman and Churchland, 2014; Whitlock et al., 2012) and retrosplenial coding (Vedder, Miller, Harrison, Smith, Cerebral Cortex, 2017; Alexander and Nitz, 2017; Alexander and Nitz, 2015). Can the model account for the type of coding observed in navigating rodents in these regions? Or are there specific predictions the model can make about what would be observed or how these types of observed neural codes can be interpreted in the context of spatial cognition?

7) In some places, terms are used that assume prior knowledge to a degree that can make the paper difficult to parse. Some examples include 'gain-field circuit' (Introduction), the current model as an extension of the 'BBB model' (Introduction), 'heuristically implemented', 'mock-motor-efference'. Terms like this could use additional (but very brief) explanation when introduced.

8) The paper can be a bit long in places – particularly the sections on preplay and replay.

9) Figure 4: was activity from place cells to grid cells considered?

10) The presence of object specific coding appears to be a 'strong prediction' of the model, as object coding reported thus far is minimally selective for specific objects. The authors detail potential places where this activity might be found but they may want to consider that these types of representations lay outside the cortex or emerge as a population level representation (i.e. would not be observable at the level of single cell tuning curve responses).

[Editors’ note: this article was subsequently rejected after the authors submitted their revisions but they were invited to resubmit after an appeal against the decision.]

Thank you for submitting your work entitled "A Neural-Level Model of Spatial Memory and Imagery" for consideration by eLife. Your article has been reviewed by one peer reviewer, and the evaluation has been overseen by a Reviewing Editor and a Senior Editor. The reviewers have opted to remain anonymous.

We regret to inform you that your work will not be considered further for publication in eLife.

After consultation, we feel that the paper remains too cumbersome for a general audience, and some points remain unclear.

Reviewer #3:

In this paper, Bicanski and Burgess present a network model to propose how multiple populations of spatially tuned neurons are functionally interconnected with one another to form a spatial memory network with multiple capabilities.

A core feature of the model is that it proposes how bidirectional transformations between the egocentric and allocentric reference frames might be performed by networks in the retrosplenial cortex. The paper ambitiously attempts to demonstrate numerous capabilities for the model, such as mapping a familiar environment, learning the locations of unique landmark objects in such an environment, detecting novelty when changes occur in an environment, and simulating effects of brain lesions upon memory and navigation. The paper incorporates 13 videos, 13 figures, and 18 methods equations. Despite its length (nearly 60 pages), the paper does not provide enough detail for readers to fully understand some key aspects of the model (see below). This should not be interpreted as an entreaty to make the paper even longer. Rather, the paper should probably be split up into at least two publications, each dedicated to exploring specific capabilities of the model with more clarity and depth.

As noted in the prior review, simulations of mental navigation and route planning do not seem to be rooted in the model's core ability to perform bidirectional egocentric / allocentric transformations. Instead, it is the addition ("for free") of a grid cell network connected to place cells endows the model with this capability. How does the egocentric / allocentric transformation network contribute to the mental navigation and route-planning process? What is the functional purpose for transforming allocentric replay events back into the egocentric coordinate frame to generate imagery? The grid cell driven simulations do not even appear until the subsection “Grid cells and mental navigation (Simulation 4.0)”, and given how information dense the preceding pages are, perhaps the addition of the grid cell network and accompanying simulations might be better suited to a second paper, dedicated more specifically to the topic of route planning?

Another problem raised in review, which has not been adequately addressed, is that the sequential shifting of attentional focus is not explicitly simulated by the model, and not explained fully enough for readers to comprehend how this process contributes to the simulation results. To solve the problem of binding each object or boundary's location to its identity, the model adopts a sequential shift of attention strategy. If I understand correctly, each population of boundary (BVC) or object (OVC) cells is activated one at a time in conjunction with its corresponding boundary (PRb) or object (PRo) identity cell. The length of each attentional cycle is stated to be 600 time units. How does information about multiple objects and boundaries get integrated across multiple attentional cycles to form a stable and non-fluctuating representation of the environment as a whole? Do PCs have slow activity decay kinetics that can span multiple attentional cycles? How is such temporal integration accounted for in analyses like that shown in Figure 8, where momentary "snapshots" of population vectors are being correlated with one another across encoding vs. recall? Because of the sequential attention mechanism, one would expect that an incomplete representation of the environment (i.e., just one object and one boundary) would be active at any given time. So I don't quite see how it is possible to use an instantaneous snapshot of the population vector to perform these correlation analyses, unless the snapshot is being taken at the end of some temporal integration process that spans multiple attention cycles?

Finally, since the heart of the model is the retrosplenial transformation network, it is a bit surprising that no simulation results are shown to demonstrate the predicted firing properties of retrosplenial neurons that perform the egocentric / allocentric transformation (in either direction) during imagery and recall. Do any testable predictions for unit recording studies arise from the firing properties of these model neurons? If so, then it seems like they should be included in the paper.

In summary, there are some innovative features of this model that would of interest to the research community, but the format of the paper (including its length) may not be best suited for publication in eLife. The authors might wish to consider splitting the paper up into two publications (e.g., one on coordinate transformation, and one on mental navigation and route following) so that some of the missing details of the model can be more thoroughly described without adding even more to the current manuscript's excessive length.

[Editors’ note: what now follows is the decision letter after the authors submitted for further consideration.]

Thank you for resubmitting your work entitled "A Neural-Level Model of Spatial Memory and Imagery" for further consideration at eLife. Your revised article has been favorably evaluated by Michael Frank (Senior Editor), a Reviewing Editor, and one new reviewer.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Reviewer #1:

In the revised manuscript, the authors have done a lot of work to respond to previous criticisms, including adding new results to respond to the initial criticisms.

There is much value and novelty in this model, which provides a potential explanation of how egocentric and allocentric representations of space are reconciled. Also, as the authors point out in their appeal, the excessive length of the manuscript is due to results that were added to satisfy the original reviews. I feel confident that the specific feedback provided below will allow the authors to clarify points that remain puzzling.

1) I didn't really understand what the "transformation sublayers" are. Are they between areas or in the retrosplenial cortex part of the model? How are they different from "individual sublayers" (subsection “The Head Direction Attractor Network and the Transformation Circuit”, second paragraph)?

2) In all of the figures and videos involving the top-down mode/recall/imagery, why does firing occur for all directions of boundaries in PWbs and BVCs?

3) In Figure 12A, why are OVCs to the east firing when object 2 is in the northeast of allocentric space?

4) In Video 8, I don't understand at time = 1.67 seconds how the OVC pattern matches that during perception for object 2. It looks like object 1. OVC activity at NW at 1.67 seconds looks the same as OVC activity at NW at 6.52 seconds, the time at which the text states that activity matches that during perception for object 1. Related to this, in Video 12, at time = 11.52 s, OVC do not seem to be representing object 1 (i.e., it is not in the SE). The mental navigation representation of object 3 in OVC at 15.21 seconds looks the same as the mental navigation representation of object 1 at time = 11.36 seconds.

https://doi.org/10.7554/eLife.33752.035

Author response

Summary:

This paper describes a model that proposes a framework to explain how multiple, highly complex regions interact to produce egocentric and allocentric representations of space. Although all reviewers found the model to be interesting and novel, they also agreed that the Results, Materials and methods, and rationale for the model were often not presented in a clear and concise manner that would be optimal for a broad readership. Reviewers also felt that the simulation movies were insufficient to demonstrate the importance of the model and request a number of essential revisions, including incorporation of clear performance goals for the model, together with appropriate quantitative measures.

We hope our changes and additions to the manuscript in response to the comments provided remedy any criticisms. We now replicate specific experimental findings to supplement the previously submitted simulations and movies and expand the quantitative treatment of results. Please see in particular our extensive response to comment 1.

Quantification

In addition to the replication of several experimental paradigms (i.e. replication of behaviour/phenomenology), we have introduced a quantitative measure to assess the performance of the model. Our model’s main claim is that it constitutes a model of spatial memory and imagery specified at the level of single neurons. Hence we now quantify the similarity between the model representations during encoding and during recall, by correlating the population vectors during these two distinct states for the model’s populations. That is, we now measure the extent of re-activation of population vectors to quantify the extent of memory or novelty being signaled by the various neural populations. The magnitude of the correlation is compared to correlations of the recalled pattern to randomly sampled times while the model operates in bottom-mode (i.e. during perception). Author response image 1 shows an example from simulation 1.0 (object-cued recall). This measure can also be compared between simulations with and without noise (see end of replies to comment 1). Perirhinal neurons are not shown because of their small number (4 for the 4 walls, 1 for the object) and because the correlation is trivially 1 when cueing with the object (current is injected into the object-specific perirhinal neuron to induce recall).

Author response image 1
Example of pattern comparison via correlations of population vectors from simulation 1.0 (object cued recall).

White bars show the correlation between the neural patterns during imagery/recall and those during encoding (RvE), while black bars show the average correlation between the neural patterns during imagery/recall and random patterns (sampled every 100 ms; RvRP). Note that OVCs and PCs exhibit correlation values close to one, indicating faithful reproduction of patterns. BVC correlations are somewhat diminished because recall fully reactivates all boundaries indiscriminately compared to a limited field of view during perception with only modest reactivation outside the field of view. PW neurons show correlations below one because at recall reinstatement in parietal areas requires the egocentric allocentric transformation (i.e. OVC signals passed through retrosplenial cells), which smears out the pattern compared to perceptual instatement in the parietal window (i.e. imagined representations are not as precise as their counterparts during perception).

https://doi.org/10.7554/eLife.33752.034

This quantification of the mismatch between population vectors reflecting perceived and recalled representations could potentially be compared to experimental measures of overlap between neuronal populations (e.g. Guzowski et al., 1999 in animals, or ‘representational similarity’ measures in fMRI, e.g. Ritchey et al., 2013) if perception and recall can be separated experimentally. We use the same measure to infer the extent of novelty-related exploration (see response 1A, and original submission) and to assess robustness with regards to noise (please see reply section 1G).

Comment 1 requires the most comprehensive response and we split our reply in multiple parts (A-H) to focus on the individual points raised.

Essential revisions:

1) A primary shortcoming is that the model is not used to simulate any specific behavioral tasks. Instead, after describing the architecture, the paper presents a series of several simulations (depicted by figures and videos) to demonstrate examples of neural population activity (over time periods lasting tens of seconds) during a few selected types of navigational behavior. These example simulations do not provide adequate support for strong claims made in the Discussion section, where it is argued that the model accounts for a wide range of findings including position specificity in visual object memory (Hollingworth, 2007), impaired episodic memory resulting from Papez circuit lesions (Delay and Brion 1969), neural activity seen during imagery for scenes in the MTL, retrosplenial cortex and precuneus (Burgess, Maguire et al., 2001; Hassabis et al. 2007; Schacter et al. 2007), and 'some aspects' of 'scene construction' and 'episodic future thinking' (Schacter et al., 2007; Hassabis et al., 2007; Buckner 2010). To support such claims, the model should more faithfully reproduce experimental designs from these prior studies (as is, there does not seem to be any quantitative metric by which these broad claims can be evaluated).

We agree with the referees that our more general simulations (designed to showcase the capabilities of the model) should be supplemented by replications of published experiments to better link our claims to the cited literature. We have added several items to the Results section, replicating behaviour/phenomenology from both human and rodent studies, focusing and on Papez’ circuit lesions and object novelty respectively. We have also restructured the section on top-down activity and trace cells, conducting new simulations replacing the previous simulations 2.1-4.

1A) Object novelty, hippocampal vs. perirhinal lesions

We now simulate experiments showing that hippocampal vs. perirhinal lesions differentially affect change detection, and suggest that object vector cells play a crucial role. In experiments the effects of lesions are commonly tested by exploiting the spontaneous preference that rats display for exploring novel/altered stimuli compared to familiar/unchanged ones. We assume that the relative amount of exploration will be proportional to the mismatch signal in the model (exploiting the introduction of a quantitative measure, please see above), and simulate key aspects of the following study.

Mumby et al. (2002) tested (among other manipulations) whether rats can detect that one of two objects has been shifted to a new location in a given environment. We have added simulations of this type of task (a moved object) with and without an intact hippocampus to the manuscript (simulations 1.3 and 1.4). The agent experiences an environment with two objects in which one is later moved. The mismatch signal is quantified as the difference in the firing of object vector cells during bottom-up mode (perception at the location of encoding) and during recall (imagery, at the same location).

Intact hippocampus:With an intact hippocampus (Simulation 1.3 and Video 7) the agent can generate a meaningful novelty signal, since the perceived location of the object relative to the agent is different from the recalled position. We compare the mismatch signal between the two objects in the arena and set the next movement target for the agent (to have a visual manifestation of the outcome of the comparison in the video) to the object for which the biggest mismatch has been detected (lowest correlation of population vectors). Figure 9 depicts the crucial aspects of this simulation and has been added to the manuscript as an application of the model to a real task. That detection of a change in position requires the hippocampus is consistent with place cells binding the relative location of an object (via object vector cells) to perirhinal neurons signalling the identity of an object.

Hippocampal lesion:Hippocampal lesions (Simulation 1.4 and Video 8) are implemented by setting the firing rates of hippocampal neurons to zero at all times. A hippocampal lesion precludes the generation of a meaningful novelty signal because the agent is incapable of generating a coherent point of view, since no appropriate BVC configuration can be selected by the now missing hippocampal input. A connection between object vector cells and perirhinal neurons (see Figure 3D in original manuscript) can still form in the lesioned agent. Thus upon recall some OVC activity is present (in imagery) due to a connection between perirhinal neurons and OVCs. However, such activity is not location specific. That is, without the reference frame of BVCs/PCs (place cells selecting active BVCs) this residual spurious activity cannot specify absolute object location since it can be generated everywhere (see Figure 2F-H). It only tells agent it has seen this object at given relative distance, but not where. Hence the mismatch signal is equal for both objects, and consequently exploration time would be split roughly 50/50 between the two objects.

On rare occasions, by chance, an object could be at the correct distance for the agent to give a good match between the perceived location and the unanchored OVC activity elicited by perirhinal neurons in imagery (Figure 9G, H). However, such an incidental match can occur for both the un-moved, as well as the moved object. Note that this account is corroborated by human data, which shows that some focal hippocampal amnesics can recall a familiar arrangement of objects if the point of view between test phase and sample phase is matched (King et al., 2002; but see also Shrager et al., 2007).

Contextual effects:Rats show preferential exploration of a familiar object that was previously experienced in a different environment, compared with one previously experienced in the same environment, and this preference is also abolished by hippocampal lesions (Mumby et al., 2002; Eacott and Norman 2004; Langston and Wood 2009). We have not simulated different environments (using separate place cell ensembles), but note that the ‘remapping’ of place cells between distinct environments (i.e. much reduced overlap of place cell population activity, Bostock, Muller and Kubie, 1991; Andersen and Jeffery 2003; Wills et al., 2005) suggests a mismatch signal for the changed-context object can be derived by comparing PC population vectors. Initiating recall of object 1, belonging to context 1, in context 2, would drive the place cells ensemble belonging to context 1, creating an imagined scene in context 1. During perception place cells representing context 2 would instead continue to be active. That is, since two environments would be represented by distinct place cells ensembles in the model (approximating remapping conditions) this would trivially generate a mismatch signal, and a hippocampal lesion obviously precludes such a mismatch signal specific to either the same-context or changed-context object.

Perirhinal lesions:Finally, it has been argued that object recognition (irrespective of context) is spared after hippocampal lesions but not perirhinal lesions (Aggleton and Brown, 1999; Winters et al., 2004; Norman and Eacott, 2004). These results are certainly compatible with the model given that its perirhinal neuronal population signals an object’s identity irrespective of location.

1B) The effect of Papez’ Circuit lesions

The second type of experimental finding we now replicate is the effect of lesions along Papez’ Circuit, outside of the hippocampus, in causing amnesia (Delay and Brion 1969; Squire and Slater 1978, Squire et al., 1989; Parker and Gaffan 1997; Aggleton et al., 2016). Lesions to the fornix and mammillary bodies severely impact recollection, although recognition is less affected (Tsivillis et al., 2008). Although Papez’ Circuit includes more regions than are covered by our model, in the context of spatial representations Papez’ circuit brain areas are notable for containing head direction cells. That is, the mammillary bodies (more specifically the lateral mammillary nucleus, LMN), anterior dorsal thalamus, retrosplenial cortex, parts of the subicular complex and medial entorhinal cortex. Thus lesioning Papez’ circuit corresponds to (at least) removing the head direction signal in our model. This signal is crucial for the operation of the transformation circuit. (In reality the disruption of inputs from the medial and lateral septum, conveying theta rhythmicity and cholinergic tone could also be important, but are not part of the current model.)

We have conducted new simulations to illustrate the ability of the model to account for the known results of such lesions (Simulations 1.1, 1.2 and Videos 5, 6). The lesion is modelled by setting the input provided by head direction cells to the retrosplenial transformation circuit to zero at all times.

In the bottom-up mode of operation (perception) a lesion to the head direction cell ensemble removes drive to the transformation circuit and consequently to the boundary vector cells and object vector cells. That is, the perceived location of an object (present in the egocentric parietal representation) cannot elicit activity in the MTL and thus cannot be encoded into memory. I.e. there is no BVC/OVC activity which could be associated with currently active place cells. In the top-down mode of operation (recall)there are two effects. 1) Since no new elements can be encoded into memory there is nothing to recall (anterograde amnesia) (Simulation 1.1), and 2) Even if there are pre-existing memories (e.g. of an object encoded prior to the lesion) and place cells can be driven via a learned connection from perirhinal neurons (e.g. when cued with the object encoded prior to the lesion; Simulation 1.2), no meaningful representation can be instantiated in parietal areas, preventing recollection/imagery, thus causing retrograde amnesia for hippocampus-dependent episodic memories (other forms of retrieval of neocortical memories would not necessarily be affected). Simulations 1.1 and 1.2 show that the egocentric neural correlates of objects and boundaries present in the visual field would persist in the parietal window only as long the agent perceives them (they could also be held in working memory, which we do not model). Figure 7 depicts the key elements of simulations 1.1 and 1.2.

Note that perirhinal cells and any presumed upstream ventral visual stream inputs are spared, and hence an agent could still report the identity of an attended object.

Also note that grid cell firing would also be disrupted by removal of head direction inputs (Winter, Clark and Taube, 2015).

Winter SS, Clark BJ, Taube JS. Disruption of the head direction cell network impairs the parahippocampal grid cell signal. Science. 2015 Feb 5:1259591

1C) Position specificity and Hollingworth (2007)

In Hollingworth (2007) participants viewed images of natural scenes or object arrays in a change detection task and memory performance was higher when the target object position remained the same from study to test. This advantage was lost when the relative spatial relationships among contextual objects was disrupted. The author concludes that episodic scene representations are formed through the binding of objects to scene locations, and object position is defined relative to a larger spatial representation coding the relative locations of contextual objects (also see Hannula et al., 2006). This seems like a very good match to the role played by object vector cells (OVCs) in our model. OVCs are bound to a context via place cells, which means that all objects, when represented by OVC firing (in conjunction with perirhinal identity cells), preserve their relative arrangements (see e.g. Simulations 1.3, 1.4, and 3.0 for two objects encoded from the same position).

Hannula DE, Tranel D, Cohen NJ. The long and the short of it: relational memory impairments in amnesia, even at short lags. Journal of Neuroscience. 2006 Aug 9;26(32):8352-9.

The rationale behind the inclusion of this citation on our part was that transferring Hollingworth’s paradigm into 3D (e.g. overlooking an arrangement of objects in a 3D scene), maps perfectly onto the toy environment our virtual agent moves in, and would provide a good example of the use of object vector cells. In fact, detecting a change of an object’s position within the scene maps 1:1 to the simulation conducted for 1A, which exploits position specificity conferred by object vector cells. Hence we hope the referees agree that the inclusion of the object novelty paradigm described in 1A warrants the citation of Hollingworth (2007).

1D) 'Scene construction' and 'episodic future thinking'

We address the concerns about our suggestion that the model serves as an explanation for some aspects of 'scene construction' and 'episodic future thinking' separately.

First, we clarify that the model provides a potential explanation for the neural activity seen in MTL, retrosplenial cortex and precuneus (e.g. Hassabis, Kumaran and Maguire, 2007) during imagery for previously experienced scenes (i.e. when subjects were instructed to perform recall of a past event) in terms of the overlap between the model structures and the reported brain regions (see also, Burgess et al., 2001; Schacter, Addis and Buckner, 2007). However, we now note that the explanatory power of the model with regards to imagined future (or potentially non-existent) scenes is more limited (please also see our brief discussion of episodic future thinking below).

With regard to the neural activity seen in experiments during imagery for scenes in the MTL Hassabis, Kumaran and Maguire (2007) note that “the imaginary constructions produced by four of the five patients ([hippocampal amnesics]) were greatly reduced in richness and content compared with those of controls. The impairment was especially pronounced for the measure of spatial coherence, indicating that the constructions of the amnesic patients tended to consist of isolated fragments of information, rather than connected scenes.” It is currently impossible to map our quantification onto subjective, participant-dependent metrics like ‘spatial coherence’ as used in Hassabis, Kumaran and Maguire (2007). However, we have shown that a hippocampal lesion (see 1A) at best allows for the association of fragments of a scene (e.g. perirhinal cells being linked OVCs). No coherent scene can be constructed in the model without hippocampal place cells binding all scene elements together. We would cautiously suggest that such a deficit could potentially underlie such lack of spatial coherence reported by amnesics in Hasssabis, Kumaran and Maguire (2007) with regard to previously experienced scenes.

Concerning the model providing an explanation for some aspects of 'episodic future thinking' (Schacter, Addis and Buckner, 2007; Hassabis, Kumaran and Maguire, 2007; Buckner 2010), we apologise if this was overstated. Episodic future thinking explicitly links memory with planning for future behaviour. “Brain regions that have traditionally been associated with memory appear to be similarly engaged when people imagine future experiences” Schacter, Addis and Buckner (2007). Our model does describe how spatial scenes can be imagined, and how this process relies on structures traditionally associated with memory. However, our simulations do not include imagination of completely novel scenes, and we now take care to state that the concept of episodic future thinking extends beyond our simulations. Nevertheless, our simulations do include imagination of novel trajectories within a familiar environment, which could be useful for planning future actions. For example, in simulation 4.0 (Figure 12; mental navigation), the agent imagines performing new trajectory and object 3 appears in the egocentric frame of reference in a relative position to the agent not experienced during encoding. We believe this constitutes a small (albeit crucial) step towards a neural account of episodic future thinking. We are now more careful in our phrasing regarding claims about episodic future thinking.

It is also implied that the model accounts for "trace cells" that fire when the agent visits the prior location of a missing object (Tsao et al., 2015), and although trace-like activity is shown in some example simulations, it is not specifically stated which neural population in the model would correspond to trace cells, nor are simulated trace cell firing rate maps presented.

1E) Trace cells

The reviewers are right that the topic of trace cells merits some clarification. In response to this comment we have restructured the section of the manuscript previously devoted to novelty and top-down activity. Novelty is now covered separately, by simulations 1.3,1.4 (replication of Mumby et al. 2002). What used to be simulation set 2 (simulations 2.1-4 in the previous version of the manuscript) is now devoted to a more in-depth discussion of trace responses.

The trace property is not necessarily tied to a specific anatomical area or cell type. We use the term trace cell to refer to any cell that used to fire in response to an object or boundary and subsequently fires in its absence. As requested we now show examples of rate maps which predict trace fields as a result of repeated memory probes which may occur roughly with the frequency of theta in rodents.

We define a memory probe as follows: a smooth modulation of top-down connectivity during perception, subject to the following criteria: 1) Sensory inputs are not disengaged, and 2) there is no cue to recall anything specific (like for instance an object in its context). This kind of probe of MTL representations can inform the perceptually driven egocentric parietal representation about absent scene elements (please see main manuscript text and Figure 10A, B). The exact duration of the probe is not important as long as it leaves enough time for activity to build up in the MTL. For the trace response simulations the memory probe occurs with a frequency of rodent theta (though no mechanism of theta generation is modelled) because it has previously been suggested that theta orchestrates the flow of information (Hasselmo, Bodelón and Wyble, 2002), at least in rodents.

However, note that some trace cell firing could also result from non-spatially selective cells. We now show that nominally non-spatially selective cells like perirhinal identity neurons can manifest a spatial trace firing field if encoding is restricted to a limited region of space (cf. Figure 10D3). This finding may help to reconcile the common notion that lateral entorhinal cortex processes non-spatial information (Van Cauter et al., 2012) with spatial responses of trace cells (Tsao, Moser and Moser, 2013) in lateral entorhinal cortex. However, unlike perirhinal neurons in the model the trace cells of Tsao et al. (2015) do not fire when the object is present, but only subsequently, in the absence of the object. Thus they might signal the mismatch between the expected presence of an object and its actual absence. However, crucially such a signal may be based on trace firing of cells similar to perirhinal cells in the model. I.e. an object specific cell which was active at the time of encoding must necessarily participate in the generation of a mismatch signal.

It is additionally claimed that the model accounts for preplay and replay activity of place cells, but it appears that simulated preplay and replay events do not occur on a compressed time scale in the model as they do in real rodents, which is not addressed.

1F) Replay and Preplay

We agree that the section on “replay” and “preplay” would benefit from a more careful presentation. We simplified the corresponding section considerably (which also addresses point 8 below, shortening the section), making clear the limitations of the model to explain these types of phenomena.

The same results are reported as before but we more correctly refer to them as simulations of planning and recall. The main aim of simulation 5.0 (Figure 13) was to show how the model can cope with shortcuts, extending its cognitive map into un-explored territory by incorporating new place cells from a reservoir. That is, the agent engages in planning, performing mental navigation along the shortest path to the goal by using a forward sweep of grid cell activity to drive corresponding activity in established place cells (across familiar ground) or in proto-place cells (across unexplored ground). We now simply point out that the internally-generated sequential activity of place cells is reminiscent of forward replay or preplay (referring to ‘preplay-like’ activity) but do not claim (and state so explicitly) to have fully modelled those phenomena.

The planning stage engenders preplay-like activity in place cells (Dragoi and Tonegawa 2011; Olafsdottir et al., 2016), whereas the mental navigation phase is reminiscent of replay (Wilson and McNaughton 1994; Foster and Wilson 2006; Diba and Buzsaki 2007; Karlsson and Frank 2009; Carr, Jadhav and Frank 2011) or ‘forward sweeps (Johnson and Redish 2007; Pfeiffer and Foster 2015), although the faster propagation of these sequences compared to those seen during actual navigation is not addressed. Both replay and preplay usually occur during sharp wave ripple events (at a compressed time scale). To account for sharp wave ripple events and theta sequences requires a spiking neuron model. Nevertheless, the causal role played by grid cells in the model suggests that some forms of replay in place cells could be driven by a sweep of activity in the grid cell population along a given trajectory, and may correspond to route planning (Kubie and Fenton 2012; Erdem and Hasselmo 2012; Bush et al. 2015; Yamamoto and Tonegawa 2017).

It is possible that a compressed time scale could be simulated if grid cell sweeps occur at faster timescales within SWRs or theta cycles, and without re-activation of parietal representations. But such extensions are outside the scope of the model (spiking neurons, theta rhythmicity, SWRs) and are not simulated.

In other words, the heavy reliance on qualitative (rather than quantitative) performance assessments make it difficult to offer anything more than a subjective evaluation of the model's capabilities. A more objective evaluation might be possible if the authors run more simulations to quantitatively compare simulated vs. real neural activity (or simulated vs. real task performance), explore how the model's performance degrades under realistic noise or uncertainty conditions, etc.

1G) Comparison to real tasks, quantification, and noise

We hope to have addressed the reviewers’ main concerns regarding quantification and comparisons to real tasks by replicating several finding reported in the literature (please see 1A, B, E), and by providing a quantitative measure of the model’s performance (correlation of population vectors). We have also provided more examples of neural activity shown in the form of rate maps (trace cells) for comparison with experimental data. We note that the design of the model was also constrained by known data in terms of the tuning curves used for spatial cells and the assumed anatomical connections between regions.

Noise/robustness

As far as noise and robustness are concerned, we perform two additional sets of simulations (all modifications of simulation 1.0, object-cued recall). In the first simulation set we randomly chose cells throughout the model to be permanently deactivated (equal proportions of place cells, grid cells, OVCs, BVCs, parietal and retrosplenial neurons etc.). This allows us to assess the robustness of the model’s capability to perform encoding and recall in visuo-spatial imagery with regards to neuron loss. Simulations with neuron loss could also serve as a model for the effects of diffuse damage, as might occur in anoxia, Alzheimer’s disease or normal aging, although we do not attempt to model any specific neurological condition. We have repeated this simulation 20 times (each time with newly drawn random numbers) and present the average correlation between the population vectors at encoding vs. recall in Figure 8.

We have tested silencing up to 20% of the cells per affected model component. The agent is still able to encode and recall the corresponding representations (cf. Figure 8). The ability to maintain a stable attractor state among place cells and head direction cells is critical to the functioning of the model, while damage in the remaining (feed-forward) model components manifests in gradual degradation in the ability to represent the locations of objects and boundaries (please see accompanying Video 3). For example, if certain parts of the parietal window suffer from neuron loss, the reconstruction in imagery is impaired only at the locations in peri-personal space which were coded for by the now missing neurons (indeed, this can be the basis for a model of hemi-spatial neglect, Byrne, Becker and Burgess 2007). The place cell populations were more robust to silencing than the head-direction population, simply because they were simulated in comparatively greater numbers of neurons and therefore exhibit redundancy.

Similarly, the model is also robust to adding firing rate noise (up to 20% of peak firing rate) to all cells. Correlations between patterns at encoding and recall remain similar to the noise-free case. Figure 8 shows the correlation between the population vectors at encoding vs. recall for all model components for 20 iterations. We include one representative example from both simulation sets as new videos (Videos 3 and 4).

2) Several key mechanisms – such as sequential shifting of attentional focus (which is essential for solving the place-object binding problem), neuromodulation (which allows the model to transition between sensory processing and mental imagery modes), and population activity in the grid cell map (which is essential for generating preplay/replay trajectories) – are not explicitly simulated by the model. Rather, these signals are provided "for free" as inputs to the network. When reciting the list of phenomena that the model can explain (see prior point), the authors should take care to include only phenomena that fall within the purview of what is actually being simulated by the network, rather what is being provided for free.

We wholeheartedly agree with the reviewers. It was certainly not our intention to claim that the model explains the mechanistic origins of attention, how grid cell firing patterns are generated by the brain, or the exact nature of the proposed neuromodulatory signal. We now emphasise more strongly in the main text that these phenomena are not explained by the model at a mechanistic level. However, we believe it remains fair to say that the model proposes what role attention plays is memory encoding. This statement can be divorced from the mechanistic origins of attention and thus constitutes a valid prediction. Similarly, assuming grid cell-like inputs to the model (e.g. in the form of pre-calculated firing rate maps) allows for predictions about the role of grid cells (e.g. in mental navigation, driving the point of view) without needlessly replicating one of the numerous articles that have proposed models of the mechanistic origin of grid cell signals.

3) The depiction and explanation of BVC and PWb data (initially shown in Figure 2A2, C1, and C2 and also described in subsequent figures) is unclear. Why are small receptive fields shown close to the agent in Figure 2A2? What does this mean? In the Video 1, receptive fields do not appear to get smaller or bigger as the agent moves around. Are the cells with receptive fields close to the agent the ones that fire at the actual boundaries, not a distance away from the boundaries? If not, in Figure 2C1, why are BVCs shown to be firing, presumably at the north and east boundaries of the environment, when the agent is in the center of the environment (i.e., not in the boundary)? Why are so many BVCs firing at the same time when the agent is in a particular location? Are these different cells that fire at different distances from the boundaries? Figure 2A2 is described as showing receptive fields for BVCs, but these receptive fields are usually more rectangular in shape, whereas they are depicted as circular here. In "bottom up" mode (e.g., Figure 5, top) it appears that only BVCs encoding boundaries ahead of the animal (those in the visual field?) are active. Is this consistent with the firing of real neurons (such as border cells)? Is there any evidence that real BVCs are active only when a rodent faces toward but not away from their coded boundary?

We have endeavoured to clarify these aspects in the first presentation of boundary vector cell and PW firing. Figure 2 was poorly explained previously and we take care now to be more precise. In particular the relationship between receptive field centers mapped on a polar grid and the firing of individual cells is now hopefully clearer.

Further clarifications

- The videos and figures never show receptive fields (with the exception of Figure 2A2 now), but neural firing of cells arranged according to where in space their receptive fields are centered.

- In the previously submitted version of the manuscript, receptive fields only scaled along the angular axis with distance (constant angular resolution) because we had modelled PW neurons and BVCs after the previously published model of Byrne, Becker and Burgess (2007). However, a better fit to the known data on BVCs is that the centers of receptive fields should also exhibit increasing radial separation as shown in panel A2 now. This has now been remedied. Similarly, the radial dispersion of the receptive fields (given by σ_rho in Equation 14) now also scales with distance from the center, such that far away receptive fields still cover multiple bins/grid points. We have rerun all model simulations with this updated topology of polar receptive fields (see Appendix), which provides a closer fit to Barry and Burgess (2007) and Lever et al. (2009).

- There are BVCs for all distances, as suggested by the arrangement of the receptive fields as a polar grid around the agent. This is why BVCs fire for boundaries of the environment, when the agent is e.g. in the center of the environment.

- Many BVCs firing at the same time collectively represent all the currently perceived boundaries. That is, for extended boundaries multiple grid points/receptive fields are covered by a boundary and hence multiple cells fire. Without extra allocation of attention to specific portions of a boundary BVC firing thus represents, to a first approximation, a context of the environment (in conjunction with place cells).

- The shape of the receptive is not rectangular but rather given by the product of exponentials in Equation 14. Panel A2 hopefully makes this clearer.

- To emphasise the clear distinction between perception and recall/imagery we have restricted perceptual activity to the field of view. To the best of our knowledge there currently is not enough data to assess if BVCs or border cells fire only when a boundary is actively perceived as long an animal is not engaged in recall. However note that boundaries outside the field of view are weakly represented due to pattern completion (see for instance in panel C1). A periodically recurring memory probe as outlined for the simulation that yielded BVC trace rate maps (see 1E) can increase this effect further.

4) The authors show a schematic of their model in Figure 4, which includes top-down and bottom-up connections between different brain areas. Lacking from the paper are multiple citations to anatomical studies that demonstrate that these connections are realistic (e.g., Jones and Witter, 2007, for projections from retrosplenial cortex to deep layers of entorhinal cortex). For example, has it been shown that the entorhinal cortex projects back to the retrosplenial cortex?

We thank the referees for catching this oversight. We now include proper citations for all the connections. Jones and Witter (2007) report that “projections from the remaining cingulate areas [including retrosplenial cortex] preferentially target the postrhinal and medial entorhinal cortices as well as the presubiculum and parasubiculum”. Since we make no strong claims about the locus of OVCs we believe this reference is compatible with the present model, regardless of whether BVCs/OVCs are preferentially located in the subicular complex (Lever et al., 2009; with OVC co-located by analogy) or in medial entorhinal cortex (Solstad et al., 2008; Høydal et al., 2017).

Regarding the back projection to retrosplenial cortex, Wyss and van Groen (1992) have reported such connections. We briefly discuss these connections when we discuss the possible locus of OVCs.

5) In Figure 7D, why isn't anything activated for the new object?

Figure 7D depicted the neural representations during imagery and the novel object had not yet been encoded into memory. In addition, if it had been, it would have needed to be in the focus of attention to be accompanied by, for instance, perirhinal activity. For a similar simulation, note that, if it had been encoded in addition to an old, removed object, the situation would have been covered by what was previously Figure 8 (sampling multiple items in imagery, now Figure 11). However, we have restructured the corresponding manuscript section to focus on trace cells, which can be compared to experimental findings and which yield explicit predictions.

6) The authors go to great lengths to realistically model MTL activity based on rodent literature. However, it was not clear how the model relates to rodent literature regarding parietal coding (e.g. Harvey, Coen and Tank, 2012; Raposo, Kaufman and Churchland, 2014; Whitlock et al., 2012) and retrosplenial coding (Vedder, Miller, Harrison, Smith, Cerebral Cortex, 2017; Alexander and Nitz, 2017; Alexander and Nitz, 2015). Can the model account for the type of coding observed in navigating rodents in these regions? Or are there specific predictions the model can make about what would be observed or how these types of observed neural codes can be interpreted in the context of spatial cognition?

We agree with the reviewers that the rodent literature concerning the parietal side of the model has been neglected in the text. We have instead been more inspired by the human side of the literature for this model component. This is mainly due to the fact that neurons resembling parietal window cells, as presented in the model, have only recently been reported (Hinman, Chapman and Hasselmo 2017), compared to vast literature on place cells, grid cells, head direction cells, and, to some extent, boundary (vector) cells. Some findings, e.g. the firing patterns correlated with multiple frames of reference (e.g. Wilber et al., 2014) have parallels with the model (in this case with our retroplenial model component. However, although intriguing correlations have been reported in parietal cortex (Nitz 2006, 2009, 2012; Harvey, Coen and Tank 2012; Whitlock et al., 2012; Raposo, Kaufman and Churchland 2014; Vedder et al., 2016) it remains difficult (compared to place cells or grid cells) to identify a unified representational code underlying all of the reported recordings. It is however an intriguing topic for future work. Also because trying to incorporate these findings would considerably inflate the already lengthy manuscript.

7) In some places, terms are used that assume prior knowledge to a degree that can make the paper difficult to parse. Some examples include 'gain-field circuit' (Introduction), the current model as an extension of the 'BBB model' (Introduction), 'heuristically implemented', 'mock-motor-efference'. Terms like this could use additional (but very brief) explanation when introduced.

We now clarify these terms upon first occurrence. The BBB abbreviation has been removed since it only occurred once.

8) The paper can be a bit long in places – particularly the sections on preplay and replay.

We have shortened these sections. We realize that the rest of the paper is a bit long in places and have endeavoured to be more concise where possible, within the constraints of the requested corrections and additional work.

9) Figure 4: was activity from place cells to grid cells considered?

We did not consider this type of activity for multiple reasons. 1) On a practical level, since grid cells are implemented as firing rate maps it is not clear how any influence of place cells on grid cells could be implemented. 2) On a theoretical level, it has been argued that place cells (when driven by sensory inputs) can help anchor grid cells, which exhibit drift due to path integration errors. Regarding the dynamics of the model, since grid cells are represented by static firing rate maps, there was no need for stabilizing place cell inputs to grid cells. We believe the functional meaning of the connection from place cells to grid cells is not relevant for the model in its current form, although intriguing new work on the issue (e.g. grid cells performing PCA on place cell representations; Dordek et al., 2016) might warrant a closer look in future work.

10) The presence of object specific coding appears to be a 'strong prediction' of the model, as object coding reported thus far is minimally selective for specific objects. The authors detail potential places where this activity might be found but they may want to consider that these types of representations lay outside the cortex or emerge as a population level representation (i.e. would not be observable at the level of single cell tuning curve responses).

We are not quite sure of the question here. Our model does indeed take into account that ‘object coding reported thus far is minimally selective for specific objects.’ A given object vector cell fires for any object, as long as an object is at a certain distance in a certain direction: it is not selective for specific objects. This is a key prediction, similar to the original prediction that BVCs respond to any boundary at the correct distance and direction (Hartley et al., 2000). Object specific coding is only present in the perirhinal identity cells, and there are many reports of this type of coding there (e.g. Miller et al., 1993; Xiang and Brown 1998; Zhu et al., 1995). Only the conjunction of OVC firing and perirhinal identity cells uniquely defines a specific object at a specific place relative to the agent. Uniquely specifying absolute position requires place cells as well. Thus, we agree that, although individual cells have well-defined tuning curves, it is a population code of OVCs, PCs and perirhinal cells which uniquely defines the neural representation of an object in its place.

Miller, E. K., Li, L. & Desimone, R. Activity of neurons in anterior inferior temporal cortex during a short-term memory task. J. Neurosci. 13, 1460–1478 (1993).

Xiang, J. Z. & Brown, M. W. Differential neuronal encoding of novelty, familiarity and recency in regions of the anterior temporal lobe. Neuropharmacology 37, 657–676 (1998).

Zhu, X. O., Brown, M. W. & Aggleton, J. P. Neuronal signalling of information important to visual recognition memory in rat rhinal and neighbouring cortices. Eur.J. Neurosci. 7, 753–765 (1995).

[Editors’ note: the authors’ responses to first round of re-review follow]

Reviewer #3:

In this paper, Bicanski and Burgess present a network model to propose how multiple populations of spatially tuned neurons are functionally interconnected with one another to form a spatial memory network with multiple capabilities.

A core feature of the model is that it proposes how bidirectional transformations between the egocentric and allocentric reference frames might be performed by networks in the retrosplenial cortex. The paper ambitiously attempts to demonstrate numerous capabilities for the model, such as mapping a familiar environment, learning the locations of unique landmark objects in such an environment, detecting novelty when changes occur in an environment, and simulating effects of brain lesions upon memory and navigation. The paper incorporates 13 videos, 13 figures, and 18 methods equations. Despite its length (nearly 60 pages), the paper does not provide enough detail for readers to fully understand some key aspects of the model (see below). This should not be interpreted as an entreaty to make the paper even longer. Rather, the paper should probably be split up into at least two publications, each dedicated to exploring specific capabilities of the model with more clarity and depth.

As noted in the prior review, simulations of mental navigation and route planning do not seem to be rooted in the model's core ability to perform bidirectional egocentric / allocentric transformations. Instead, it is the addition ("for free") of a grid cell network connected to place cells endows the model with this capability. How does the egocentric / allocentric transformation network contribute to the mental navigation and route-planning process? What is the functional purpose for transforming allocentric replay events back into the egocentric coordinate frame to generate imagery? The grid cell driven simulations do not even appear until the subsection “Grid cells and mental navigation (Simulation 4.0)”, and given how information dense the preceding pages are, perhaps the addition of the grid cell network and accompanying simulations might be better suited to a second paper, dedicated more specifically to the topic of route planning?

Contrary to the reviewer’s new statement (not mentioned in the first review), mental navigation and route planning do indeed require the model's core ability to perform bidirectional egocentric / allocentric transformations. As stated multiple times throughout the manuscript one of the core hypotheses of the present model is that an agent only has conscious access to the egocentric representation of space around itself, which embodies a point of view. This hypothesis is strongly supported by striking, classical empirical findings like hemispatial neglect. Thus, updating the (imagined) location of the agent by the action of grid cells on place cells in the medial temporal lobe still requires the egocentric-allocentric transformation to allow imagery of the simulated outcome. Without this transformation, there would be no mental navigation.

Another problem raised in review, which has not been adequately addressed, is that the sequential shifting of attentional focus is not explicitly simulated by the model, and not explained fully enough for readers to comprehend how this process contributes to the simulation results. To solve the problem of binding each object or boundary's location to its identity, the model adopts a sequential shift of attention strategy. If I understand correctly, each population of boundary (BVC) or object (OVC) cells is activated one at a time in conjunction with its corresponding boundary (PRb) or object (PRo) identity cell. The length of each attentional cycle is stated to be 600 time units. How does information about multiple objects and boundaries get integrated across multiple attentional cycles to form a stable and non-fluctuating representation of the environment as a whole? Do PCs have slow activity decay kinetics that can span multiple attentional cycles? How is such temporal integration accounted for in analyses like that shown in Figure 8, where momentary "snapshots" of population vectors are being correlated with one another across encoding vs. recall? Because of the sequential attention mechanism, one would expect that an incomplete representation of the environment (i.e., just one object and one boundary) would be active at any given time. So I don't quite see how it is possible to use an instantaneous snapshot of the population vector to perform these correlation analyses, unless the snapshot is being taken at the end of some temporal integration process that spans multiple attention cycles?

With regard to the details of the attentional modulation, the OVCs are subject to the attentional modulation because we simulate the encoding of novel objects into a familiar context. Attention is required at encoding to allow the relevant representations to be bound together (in this case OVCs, place cells and object-identity neurons). At retrieval, the associations have been formed, and a complete scene representation is stably present. Population vectors are sampled at the time of encoding, which is a precise moment in simulation: there is no need for averaging across a cycle because the encoded object is in the focus of attention at the time of encoding. These population vectors are then compared to the stable representations during retrieval/imagery.

Finally, since the heart of the model is the retrosplenial transformation network, it is a bit surprising that no simulation results are shown to demonstrate the predicted firing properties of retrosplenial neurons that perform the egocentric / allocentric transformation (in either direction) during imagery and recall. Do any testable predictions for unit recording studies arise from the firing properties of these model neurons? If so, then it seems like they should be included in the paper.

The firing properties of retrosplenial neurons have been shown in Video 1, and we would happily add a static supplementary figure (e.g. of rate maps) to formulate an additional prediction.

In summary, there are some innovative features of this model that would of interest to the research community, but the format of the paper (including its length) may not be best suited for publication in eLife. The authors might wish to consider splitting the paper up into two publications (e.g., one on coordinate transformation, and one on mental navigation and route following) so that some of the missing details of the model can be more thoroughly described without adding even more to the current manuscript's excessive length.

In addition, the second round reviewer requests full models of attention, grid cells, and parietal cortex. We do not think that would be possible, even if split over 2 papers. More importantly, we do not see how the absence of these additional models details diminishes the thorough and comprehensive theoretical account of spatial memory we have proposed.

[Editors’ note: the author responses to the re-review follow.]

Thank you for resubmitting your work entitled "A Neural-Level Model of Spatial Memory and Imagery" for further consideration at eLife. Your revised article has been favorably evaluated by Michael Frank (Senior Editor), a Reviewing Editor, and one new reviewer.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

Reviewer #1:

In the revised manuscript, the authors have done a lot of work to respond to previous criticisms, including adding new results to respond to the initial criticisms.

There is much value and novelty in this model, which provides a potential explanation of how egocentric and allocentric representations of space are reconciled. Also, as the authors point out in their appeal, the excessive length of the manuscript is due to results that were added to satisfy the original reviews. I feel confident that the specific feedback provided below will allow the authors to clarify points that remain puzzling.

1) I didn't really understand what the "transformation sublayers" are. Are they between areas or in the retrosplenial cortex part of the model? How are they different from "individual sublayers" (subsection “The Head Direction Attractor Network and the Transformation Circuit”, second paragraph)?

The transformation sublayers are the same as the individual sublayers. The whole transformation circuit consists of 20 individual sublayers (please see Video 1 and the new supplementary figure, Figure 2—figure supplement 1), each maximally modulated by a different head direction. We apologise for the confusion and have standardized the terminology throughout, using the term ‘sublayer’ consistently with retrosplenial cortex. The term ‘circuit’ is used to refer to the whole of the retrosplenial model component, and the generic term ‘layer’ is replaced by the term ‘population’. We have also partially rewritten (and shortened) subsection “The Head Direction Attractor Network and the Transformation Circuit” to further clarify the structure of the retrosplenial component further.

2) In all of the figures and videos involving the top-down mode/recall/imagery, why does firing occur for all directions of boundaries in PWbs and BVCs?

This is because the simulated environment is familiar, so that the agent has experienced it from many different points of view at each location. Thus there are reciprocal connections between place cells and BVCs for all boundaries, for a given location. Since, during imagery, PWb neurons are driven by BVC activity (via the transformation circuit) they fire for boundaries in all directions (see Figure 2—figure supplement 1), even though the PWb neurons behind the agent would not be driven by perception. We now make extra note of this at the first occurrence of recall/imagery (near Figure 5).

3) In Figure 12A, why are OVCs to the east firing when object 2 is in the northeast of allocentric space?

The object is in the North East of the map, but relative to the agent it lies East. Hence OVCs representing the location of object 2 are firing East of the origin of the OVC plot because the object is East of the agent. The OVC grid of receptive fields (like the BVC grid) is anchored to the agent (i.e. the origin is the location of the agent and moves with it). The allocentric aspect lies in the independence of BVC/OVC responses with regard to the orientation of the agent (e.g. whether the object is to the left or the right). We now make an extra note of this in the discussion of receptive field topology (in the caption of Figure 2). Hopefully the new Figure 2—figure supplement 1 also helps to clarify the nature of these receptive fields.

4) In Video 8, I don't understand at time = 1.67 seconds how the OVC pattern matches that during perception for object 2. It looks like object 1. OVC activity at NW at 1.67 seconds looks the same as OVC activity at NW at 6.52 seconds, the time at which the text states that activity matches that during perception for object 1. Related to this, in Video 12, at time = 11.52 s, OVC do not seem to be representing object 1 (i.e., it is not in the SE).

In Video 8 the time in imagery (during the 1st recall episode) can be divided into 2 phases. From roughly 1.60 to 2.70s object 1 is being recalled (with corresponding OVC pattern). From 2.70 to 2.80s object 2 is being recalled (with corresponding OVC pattern). Only for object two there is a match to the perceptual representation (experienced prior to recall). The confusion is likely due to the fact that the annotation appears already at time 1.60 but actually refers mainly to phase two, when object 2 is recalled after object 1. We apologize for the inaccuracy and have now split the annotation to be more precise.

The mental navigation representation of object 3 in OVC at 15.21 seconds looks the same as the mental navigation representation of object 1 at time = 11.36 seconds.

This is again due to the fact that the OVC grid (similar to the BVC grid) translates with the agent, but is independent of the orientation of the agent. At both times (around 11.35s and around 15.21s) the object is roughly South East to the agent at a similar distance (though not precisely at the same angle) but, both times with different place cell representations, which disambiguate the representational code overall. We hope the changes we made in response to comment 3 also clarify this case.

https://doi.org/10.7554/eLife.33752.036

Article and author information

Author details

  1. Andrej Bicanski

    Institute of Cognitive Neuroscience, University College London, London, United Kingdom
    Contribution
    Conceptualization, Software, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    a.bicanski@ucl.ac.uk
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-3356-1034
  2. Neil Burgess

    Institute of Cognitive Neuroscience, University College London, London, United Kingdom
    Contribution
    Conceptualization, Supervision, Funding acquisition, Methodology, Project administration, Writing—review and editing
    For correspondence
    n.burgess@ucl.ac.uk
    Competing interests
    Reviewing Editor, eLife
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0003-0646-6584

Funding

European Research Council (NEUROMEM)

  • Andrej Bicanski
  • Neil Burgess

Human Brain Project SGA1 (720270)

  • Andrej Bicanski
  • Neil Burgess

European Commission (SpaceCog)

  • Andrej Bicanski
  • Neil Burgess

Human Brain Project SGA2 (785907)

  • Andrej Bicanski
  • Neil Burgess

Wellcome Trust

  • Neil Burgess

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We acknowledge funding by the ERC Advanced grant NEUROMEM, the Wellcome Trust, the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 720270 Human Brain Project SGA1 and grant agreement No. 785907 Human Brain Project SGA2, and the EC Framework Program 7 Future and Emerging Technologies project SpaceCog. We thank all members of the SpaceCog project, and James Bisby, Daniel Bush and Tim Behrens for useful discussions. The authors declare no competing financial interests.

Senior Editor

  1. Michael J Frank, Brown University, United States

Reviewing Editor

  1. Laura Colgin, The University of Texas at Austin, Center for Learning and Memory, United States

Publication history

  1. Received: November 28, 2017
  2. Accepted: July 29, 2018
  3. Version of Record published: September 4, 2018 (version 1)

Copyright

© 2018, Bicanski et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,813
    Page views
  • 250
    Downloads
  • 2
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.

Download links

A two-part list of links to download the article, or parts of the article, in various formats.

Downloads (link to download the article as PDF)

Download citations (links to download the citations from this article in formats compatible with various reference manager tools)

Open citations (links to open the citations from this article in various online reference manager services)