A model of hippocampal replay driven by experience and environmental structure facilitates spatial learning

  1. Nicolas Diekmann
  2. Sen Cheng (corresponding author)
  1. Institute for Neural Computation, Faculty of Computer Science, Ruhr University Bochum, Germany
  2. International Graduate School of Neuroscience, Ruhr University Bochum, Germany
9 figures, 1 table and 1 additional file

Figures

Figure 1
The grid world.

(A) An example of a simple grid world environment. An agent can transition between the different states, that is, squares in the grid, by moving in the four cardinal directions depicted by …
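
As a minimal illustration of this transition structure, the sketch below indexes grid squares row-major and keeps the agent in place when a move would leave the grid; the indexing scheme and boundary handling are assumptions for illustration, not the authors' implementation.

```python
# Minimal grid-world sketch: states are grid squares (indexed row-major),
# actions are the four cardinal directions. Staying in place when moving
# into a wall is an assumption of this sketch.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action, height, width):
    """Return the successor state for a move in one cardinal direction."""
    row, col = divmod(state, width)
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row, 0), height - 1)
    new_col = min(max(col + d_col, 0), width - 1)
    return new_row * width + new_col

# Example: in a 5x5 open field, moving right from the top-left corner (state 0).
print(step(0, "right", height=5, width=5))  # -> 1
```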

Figure 2 with 3 supplements
Illustration of the Spatial structure and Frequency-weighted Memory Access (SFMA) replay model.

(A) The interaction between the variables in our replay mechanism. Experience strength C(e), experience similarity D(e|e_t) and inhibition of return I(e) are combined to form reactivation prioritization …
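
A minimal sketch of how such a prioritization could be computed, assuming a multiplicative combination of the three terms and softmax selection with inverse temperature β_M; the combination rule and the form of the inhibition term are assumptions of this sketch, not the model's published equations.

```python
import numpy as np

def reactivation_probabilities(C, D, I, beta_M=5.0):
    """Combine experience strength C(e), similarity D(e|e_t) and inhibition of
    return I(e) into reactivation probabilities (multiplicative combination and
    softmax selection are assumptions of this sketch)."""
    priority = C * D * (1.0 - I)      # recently reactivated experiences are suppressed
    logits = beta_M * priority
    logits -= logits.max()            # for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Toy example with five stored experiences.
C = np.array([1.0, 2.0, 1.0, 3.0, 1.0])  # experience strengths
D = np.array([0.1, 0.5, 0.9, 0.2, 0.4])  # similarity to the current experience e_t
I = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # experience 2 was just reactivated
print(reactivation_probabilities(C, D, I))
```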

Figure 2—figure supplement 1
Step-by-step example of a replay sequence generated by SFMA.

From left to right: the different variables of SFMA as well as the reactivated experiences. Experience strength C(e) had the same value for all experiences and was therefore omitted. From top to …

Figure 2—figure supplement 2
Reconciling non-local replays and preferential replay for current location in online replay.

(A) Histogram of starting locations of offline replay recorded in the virtual version of Gupta et al.’s experiment. Raw occurrences for each bin were divided by the maximum number of bin occurrences …

Figure 2—figure supplement 3
Possible replay modes and example replay trajectories.

(A) Experience similarities of the different replay modes, that is, default, reverse, forward and attractor mode, for a given experience (marked in green). (B) Example replay trajectories generated …

Figure 3 with 4 supplements
The statistics of replay have a large impact on spatial learning.

(A) The three goal-directed navigation tasks that were used to test the effect of replay on learning: linear track, open field and maze. In each trial, the agent starts at a fixed starting location …

Figure 3—figure supplement 1
Directionality of consecutive replay pairs.

(A) Pairs produced on a linear track when using the default mode for replays generated at the start (top) and end (bottom) of a trial. A pair of consecutively reactivated experiences e_{t-1} and e_t was …

Figure 3—figure supplement 2
Example replay trajectories in different spatial navigation tasks.

(A) Replay trajectories generated at the start and end of trials in the linear track for different replay modes (default and reverse) and different stages of learning (early and late). Color …

Figure 3—figure supplement 3
Reward changes trigger a higher probability of activating the reverse mode.

(A) The probability of replay being generated in the reverse mode for agents trained in open field (left) and T-maze (right) for 300 trials. Reward was changed once after 100 trials (+0.1) and again …

Figure 3—figure supplement 4
The number of sequences decreases with experience.

The number of sequences per replay as a function of experienced trials for three different environments, that is, linear track, open field and labyrinth. The replay lengths were 10, 10 and 50, …

Figure 4 with 5 supplements
Replays resemble random walks across different parameter values for Default Representation (DR) discount factor and inhibition decay.

(A) Example replay sequences produced by our model. Reactivated locations are colored according to recency. (B) Displacement distributions for four time steps (generated with β_M = 5). (C) A linear …
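
One way to check whether replayed trajectories resemble random walks is to examine how displacements grow with the number of replay steps: for diffusive dynamics, the mean squared displacement increases roughly linearly with the step interval. The sketch below runs that check on a toy trajectory; it illustrates the idea and is not the analysis code behind the figure.

```python
import numpy as np

def displacements(trajectory, interval):
    """Euclidean displacements between replayed locations `interval` steps apart."""
    traj = np.asarray(trajectory, dtype=float)
    return np.linalg.norm(traj[interval:] - traj[:-interval], axis=1)

# Toy replay trajectory: a random walk on a grid, standing in for model output.
rng = np.random.default_rng(0)
moves = np.array([(-1, 0), (1, 0), (0, -1), (0, 1)])
trajectory = np.cumsum(moves[rng.integers(0, 4, size=200)], axis=0)

# For a diffusive (random-walk-like) process the mean squared displacement
# grows roughly linearly with the step interval.
for interval in (1, 2, 3, 4):
    msd = np.mean(displacements(trajectory, interval) ** 2)
    print(f"interval {interval}: MSD ~ {msd:.2f}")
```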

Figure 4—figure supplement 1
Replays generated using the reverse mode also resemble random walks across different parameter values.

Replays generated using the reverse mode also resemble random walks across different parameter values for Default Representation (DR) discount factor and inhibition decay. (A) Displacement …

Figure 4—figure supplement 2
Displacement distributions for different inverse temperature values.

Displacement distributions for different inverse temperature values (γ_DR = 0.9 and λ = 0.9). (A) Displacement distribution for the first four time steps with β_M = 5 given homogeneous experience strength. (B) The …

Figure 4—figure supplement 3
For heterogeneous experience strengths, replays also resemble random walks across different parameter values.

For heterogeneous experience strengths, replays also resemble random walks across different parameter values for Default Representation (DR) discount factor and inhibition decay. (A) Homogeneous …

Figure 4—figure supplement 4
The starting positions of replay are randomly distributed across the environment.

The starting positions of replay are randomly distributed across the environment in a 2-d open field. (A) Distribution of replay starting positions given homogeneous experience strengths. Starting …

Figure 4—figure supplement 5
Prioritized Memory Access’ (PMA) ability to produce sequences is severely disrupted when the gain calculation for n-step updates is adjusted.

(A) Sequences generated by PMA (Mattar and Daw, 2018) given uniform need and all-zero gain. Because utility is defined as a product of gain and need, the gain must have a nonzero value to prevent …
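
The argument rests on the fact that in Prioritized Memory Access utility is the product of gain and need, so an all-zero gain makes every experience's utility zero no matter how need is distributed. A toy numerical illustration (the values are made up):

```python
import numpy as np

# In PMA (Mattar and Daw, 2018), the utility of reactivating an experience is
# utility = gain * need. The values below are illustrative only.
need = np.full(4, 0.25)  # uniform need across four experiences, as in the caption
gain = np.zeros(4)       # all-zero gain: no update would change behavior

utility = gain * need
print(utility)           # [0. 0. 0. 0.] -- no experience is preferred over any other
print(utility.argmax())  # ties are resolved arbitrarily, so sequential structure is lost
```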

Figure 5 with 3 supplements
Replay of shortcuts results from stochastic selection of experiences and the difference in relative experience strengths.

(A) Simplified virtual version of the maze used by Gupta et al., 2010. The agent was provided with reward at specific locations on each lap (marked with an R). Trials started at the bottom of the center …

Figure 5—figure supplement 1
The number of shortcut replays on a trial-by-trial basis differs depending on behavioral statistics.

Shortcut replays in each trial for different experimental conditions: alternating-alternating (AA), right-left (RL), right-alternating (RA) and alternating-left (AL). For all conditions the number …

Figure 5—figure supplement 2
Experience strengths resulting from different behavioral statistics in the Gupta et al., 2010 experiment.

Experience strengths after the first half (left panels) and second half (right panels) given the different behavioral statistics (rows). For simplicity, the experience strength shown in each state is the …

Figure 5—figure supplement 3
Following strongly stereotypical behavior, efficient learning occurs for the default mode after a change in goal location.

Top: The learning performance for different replay modes in a T-maze environment. The goal arm changes after 300 trials. For learning of the initial goal, the default mode is clearly outperformed by …

Figure 6 with 1 supplement
The default representation (DR) allows replay to adapt to environmental changes.

(A) Reactivation maps, that is, the fraction of state reactivations, for all replays recorded in three environments while the agent is located at the white square. Experience similarity is based on …
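
A reactivation map of this kind can be computed by counting how often each state occurs across all recorded replays and dividing by the total number of reactivations; the normalization below is an assumption, not necessarily the one used for the figure.

```python
import numpy as np

def reactivation_map(replays, num_states):
    """Fraction of reactivations per state, pooled over all recorded replays."""
    counts = np.zeros(num_states)
    for replay in replays:
        for state in replay:
            counts[state] += 1
    return counts / counts.sum()

# Toy example: three short replays in an environment with six states.
replays = [[0, 1, 2], [2, 3], [2, 4, 5]]
print(reactivation_map(replays, num_states=6))
```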

Figure 6—figure supplement 1
Reverse mode replays adapt to environmental changes.

(A) Reactivation maps, that is, the fraction of state reactivations, for reverse mode replay recorded in three environments (γ_DR = 0.1). The color bar indicates the fraction of reactivations across all …

Figure 7 with 1 supplement
Preplay of cued, but unvisited, locations can be explained by visual exploration.

(A) Schematic of the simplified virtual maze and task design used to model the study by Ólafsdóttir et al., 2015. First, in an initial run phase, the agent ran up and down the stem of the maze. Second, …

Figure 7—figure supplement 1
Preplay of yet unvisited cued locations can be explained by visual exploration (reverse mode).

(A) Fractions of reactivated arm locations. Experiences associated with the cued arm (right) are preferentially reactivated. (B) The fractions of choosing the cued arm and uncued arm before and …

Figure 8
Increasing the reward modulation of experience strengths leads to an over-representation of rewarded locations in replays.

(A) Reactivation maps for online replays generated in an open field environment at trial begin (left) and end (middle) as well as for offline replays (right). Reward modulation was one. Replay tends …

Figure 9
Replay will enter aversive zones given previous experience.

(A) The reactivation map indicates that the dark zone (left half), which contains the shock zone, was preferentially replayed even after the shock was administered and the agent stopped in the middle. (B) …

Tables

Table 1
Training settings for each task.
Environment  | Trials | Steps per Trial | Replay Length
Linear Track | 20     | 100             | 10
Open Field   | 100    | 100             | 10
Labyrinth    | 100    | 300             | 50
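
The same settings written as a simple configuration mapping (the keys and field names are illustrative and not taken from the original code base):

```python
# Training settings from Table 1; keys and field names are illustrative only.
TRAINING_SETTINGS = {
    "linear_track": {"trials": 20, "steps_per_trial": 100, "replay_length": 10},
    "open_field":   {"trials": 100, "steps_per_trial": 100, "replay_length": 10},
    "labyrinth":    {"trials": 100, "steps_per_trial": 300, "replay_length": 50},
}
```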

Additional files
