(A) An example of a simple grid world environment. An agent can transition between the different states, that is, squares in the grid, by moving in the four cardinal directions depicted by …
(A) The interaction between the variables in our replay mechanism. Experience strength, experience similarity and inhibition of return are combined to form reactivation prioritization …
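As a rough illustration of how such a prioritization could be computed, the sketch below combines the three variables multiplicatively and converts the resulting priorities into reactivation probabilities with a softmax. The multiplicative combination, the variable names, and the inverse temperature `beta` are illustrative assumptions, not the exact implementation described in the text.

```python
import numpy as np

def prioritize(strength, similarity, inhibition, beta=5.0):
    """Combine experience strength, similarity to the current experience,
    and inhibition of return into reactivation priorities, then map the
    priorities to reactivation probabilities via a softmax.

    strength, similarity, inhibition: 1D arrays, one entry per stored experience.
    beta: inverse temperature controlling how strongly high priorities are favored.
    """
    # Assumed multiplicative combination; inhibition suppresses recently reactivated experiences.
    priority = strength * similarity * (1.0 - inhibition)
    logits = beta * priority
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()
```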
From left to right: the different variables of SFMA as well as the reactivated experiences. Experience strength had the same value for all experiences and was therefore omitted. From top to …
(A) Histogram of starting locations of offline replay recorded in the virtual version of Gupta et al.’s experiment. Raw occurrences for each bin were divided by the maximum number of bin occurrences …
(A) Experience similarities of the different replay modes, that is, default, reverse, forward and attractor mode, for a given experience (marked in green). (B) Example replay trajectories generated …
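A replay trajectory like those in panel (B) could, in principle, be generated by repeatedly sampling the next experience from such a prioritization while inhibition of return suppresses just-reactivated experiences. The sketch below assumes a mode-specific similarity matrix as input (default, reverse, forward or attractor mode) and an exponentially decaying inhibition term; these dynamics are illustrative assumptions.

```python
import numpy as np

def generate_replay(strength, similarity_matrix, start, length=10, beta=5.0, decay=0.9):
    """Generate one replay sequence of `length` experiences.

    similarity_matrix[i, j]: mode-specific similarity of experience j to the
    currently reactivated experience i (shape: n_experiences x n_experiences).
    """
    n = len(strength)
    inhibition = np.zeros(n)
    sequence = [start]
    current = start
    for _ in range(length - 1):
        inhibition[current] = 1.0  # inhibition of return for the current experience
        priority = strength * similarity_matrix[current] * (1.0 - inhibition)
        logits = beta * priority
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        current = np.random.choice(n, p=probs)
        sequence.append(current)
        inhibition *= decay  # assumed decay of inhibition over reactivation steps
    return sequence
```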
(A) The three goal-directed navigation tasks that were used to test the effect of replay on learning: linear track, open field and maze. In each trial, the agent starts at a fixed starting location …
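The effect of replay on learning in these tasks can be pictured as applying value updates along each replayed sequence. A minimal sketch, assuming tabular Q-learning over replayed (state, action, reward, next_state) tuples; the learning rate, discount factor and tuple layout are illustrative assumptions.

```python
import numpy as np

def replay_update(Q, replay_sequence, alpha=0.9, gamma=0.99):
    """Apply Q-learning updates along a replayed sequence.

    Q: array of shape (n_states, n_actions).
    replay_sequence: iterable of (state, action, reward, next_state) tuples.
    """
    for state, action, reward, next_state in replay_sequence:
        td_target = reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (td_target - Q[state, action])
    return Q
```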
(A) Pairs produced on a linear track when using the default mode for replays generated at the start (top) and end (bottom) of a trial. A pair of consecutively reactivated experiences $e_t$ and $e_{t+1}$ was …
(A) Replay trajectories generated at the start and end of trials in the linear track for different replay modes (default and reverse) and different stages of learning (early and late). Color …
(A) The probability of replay being generated in the reverse mode for agents trained in open field (left) and T-maze (right) for 300 trials. Reward was changed once after 100 trials (+0.1) and again …
The number of sequences per replay as a function of experienced trials for three different environments, that is, linear track, open field and labyrinth. The replay lengths were 10, 10 and 50, …
(A) Example replay sequences produced by our model. Reactivated locations are colored according to recency. (B) Displacement distributions for four time steps (generated with ). (C) A linear …
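Displacement distributions of the kind shown in panel (B) can be estimated by collecting the Euclidean distances between reactivated locations a fixed number of reactivation steps apart. The sketch below is one way to do this; the binning and the diffusive (square-root-of-time) expectation mentioned in the comment are assumptions used for the random-walk comparison.

```python
import numpy as np

def displacement_distribution(trajectories, delta_t, bins=20):
    """Normalized histogram of displacements between reactivated locations
    that are `delta_t` reactivations apart, pooled over all replays.

    trajectories: list of replays, each a sequence of (x, y) positions.
    A Gaussian shape whose width grows with sqrt(delta_t) would be
    consistent with a diffusive random walk.
    """
    displacements = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        if len(traj) > delta_t:
            diffs = traj[delta_t:] - traj[:-delta_t]
            displacements.extend(np.linalg.norm(diffs, axis=1))
    hist, edges = np.histogram(displacements, bins=bins, density=True)
    return hist, edges
```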
Replays generated using the reverse mode also resemble random walks across different parameter values for Default Representation (DR) discount factor and inhibition decay. (A) Displacement …
Displacement distributions for different inverse temperature values ( and ). (A) Displacement distribution for the first four time steps with given homogeneous experience strength. (B) The …
For heterogeneous experience strengths, replays also resemble random walks across different parameter values for Default Representation (DR) discount factor and inhibition decay. (A) Homogeneous …
The starting positions of replay are randomly distributed across the environment in a 2-d open field. (A) Distribution of replay starting positions given homogeneous experience strengths. Starting …
(A) Sequences generated by PMA (Mattar and Daw, 2018) given uniform need and all-zero gain. Because utility is defined as a product of gain and need, the gain must have a nonzero value to prevent …
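The dependence on a nonzero gain follows directly from the product form of utility in PMA (Mattar and Daw, 2018): if gain is exactly zero everywhere, utility is zero for every experience and the need term can no longer differentiate them. The sketch below illustrates this with a small floor on gain; the floor value and function signature are illustrative assumptions.

```python
import numpy as np

def pma_utility(gain, need, min_gain=1e-6):
    """Utility of updating each experience as the product of gain and need.

    With all-zero gain the product would be zero for every experience, so a
    small floor on gain (an illustrative choice) keeps need-driven
    prioritization effective.
    """
    return np.maximum(gain, min_gain) * need
```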
(A) Simplified virtual version of the maze used by Gupta et al., 2010. The agent was provided with reward at specific locations on each lap (marked with an R). Trials started at the bottom of the center …
Shortcut replays in each trial for different experimental conditions: alternating-alternating (AA), right-left (RL), right-alternating (RA) and alternating-left (AL). For all conditions the number …
Experience strengths after the first half (left panels) and the second half (right panels) given the different behavioral statistics (rows). For simplicity, the experience strength shown in each state is the …
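One way to picture how such experience strengths arise from behavior is sketched below: each experienced transition increments the strength of the corresponding experience, and a per-state value is obtained by aggregating the strengths of experiences associated with that state. The increment rule, the optional decay, and the choice of summing over experiences ending in a state are illustrative assumptions.

```python
from collections import defaultdict

def accumulate_strengths(episodes, decay=1.0):
    """Accumulate experience strengths from behavior and aggregate per state.

    episodes: list of episodes, each a list of (state, action, next_state) tuples.
    decay: optional multiplicative decay applied to all strengths before each increment.
    """
    strength = defaultdict(float)
    for episode in episodes:
        for transition in episode:
            for key in strength:
                strength[key] *= decay
            strength[transition] += 1.0
    per_state = defaultdict(float)
    for (state, action, next_state), s in strength.items():
        per_state[next_state] += s  # assumed aggregation: sum over experiences ending in the state
    return strength, per_state
```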
Top: The learning performance for different replay modes in a T-maze environment. The goal arm changes after 300 trials. For learning of the initial goal, the default mode is clearly outperformed by …
(A) Reactivation maps, that is, the fraction of state reactivations, for all replays recorded in three environments while the agent is located at the white square. Experience similarity is based on …
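Reactivation maps of this kind amount to counting how often each state appears across the recorded replays and normalizing by the total number of reactivations. A minimal sketch, assuming replays are stored as sequences of state indices:

```python
import numpy as np

def reactivation_map(replays, n_states):
    """Fraction of state reactivations across all recorded replays."""
    counts = np.zeros(n_states)
    for replay in replays:          # replay: sequence of state indices
        for state in replay:
            counts[state] += 1
    return counts / counts.sum()
```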
(A) Reactivation maps, that is, the fraction of state reactivations, for reverse mode replay recorded in three environments (). The color bar indicates the fraction of reactivations across all …
(A) Schematic of the simplified virtual maze and task design used to model the study by Ólafsdóttir et al., 2015. First, in an initial run phase, the agent ran up and down the stem of the maze. Second, …
(A) Fractions of reactivated arm locations. Experiences associated with the cued arm (right) are preferentially reactivated. (B) The fractions of choosing the cued arm and uncued arm before and …
(A) Reactivation maps for online replays generated in an open field environment at the beginning (left) and end (middle) of trials as well as for offline replays (right). Reward modulation was one. Replay tends …
Environment | Trials | Steps per Trial | Replay Length
---|---|---|---
Linear Track | 20 | 100 | 10
Open Field | 100 | 100 | 10
Labyrinth | 100 | 300 | 50
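For reference, the table's settings could be kept in a small configuration structure such as the sketch below; the key names are illustrative, while the values are taken directly from the table.

```python
# Simulation settings from the table above; key names are illustrative.
SIMULATIONS = {
    "linear_track": {"trials": 20,  "steps_per_trial": 100, "replay_length": 10},
    "open_field":   {"trials": 100, "steps_per_trial": 100, "replay_length": 10},
    "labyrinth":    {"trials": 100, "steps_per_trial": 300, "replay_length": 50},
}
```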