CMR-replay.

a. Consider a task of encoding a sequence consisting of four items, each denoted by a shade of blue. b. We propose a model of replay that builds on the context-maintenance and retrieval model (CMR), which we refer to as CMR-replay. The model consists of four components: experience (f), context (c), item-to-context associations (M^fc), and context-to-item associations (M^cf). At each timestep during awake encoding, f represents the current item and c is a recency-weighted average of the associated contexts of past and present items. CMR-replay associates f and c at each timestep, updating M^fc and M^cf according to a Hebbian learning rule. M^fc and M^cf support, respectively, the retrieval of an item's associated context and of a context's associated items. During replay, f represents the currently reactivated item and c is a drifting context representing a recency-weighted average of the associated contexts of past and present reactivated items. Here, too, the model updates M^fc and M^cf to associate the reactivated f and c. The figure illustrates the representations of f, c, M^fc, and M^cf as the model encodes the third item during learning. The lengths of the color bars in f and c represent the relative magnitudes of different features. Shades of grey illustrate the weights in M^fc and M^cf. Orange features represent task-irrelevant items, which do not appear as inputs during awake encoding but compete with task-relevant items for reactivation during replay. c. During both awake encoding and replay, context c drifts by incorporating the current item f's associated context c^f and downweighting the associated contexts of previous items. The figure illustrates how context drifts the first time the model encodes the example sequence. d. The figure illustrates the updates to M^fc and M^cf as the model encodes the third item during the first presentation of the sequence. e. Consider the activation of items at the onset of sleep and awake rest across sessions of learning. At replay onset, an initial probability distribution across experiences, a_0, varies according to the behavioral state (i.e., awake rest or sleep). Compared to sleep, during awake rest a_0 is strongly biased toward features associated with external inputs. For awake rest, the figure shows an example of a_0 when the model receives a context cue related to the fourth item. Through repeated exposure to the same task sequence across sessions of learning, the activities of the four task-related items (i.e., blue items) become suppressed in a_0 relative to task-irrelevant items (i.e., orange items). f. Each replay period begins by sampling an experience f_{t=0} according to a_0, where t denotes the current timestep. If f_{t=0} is a task-related item, its associated context c^{f_{t=0}} is reinstated as c_0 to enter a recursive process. During this process, at each timestep t ≥ 1, c_{t-1} evokes a probability distribution a_t that excludes previously reactivated experiences. Given a_t, the model samples an experience f_t and reinstates f_t's associated context c^{f_t}, which is combined with c_{t-1} to form a new context c_t that guides the ensuing reactivation. The dashed arrow indicates that c_t becomes c_{t-1} for the next timestep. At any t, the replay period ends with probability 0.1, or whenever a task-irrelevant item is reactivated.
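
To make the encoding and replay dynamics described above concrete, the following is a minimal NumPy sketch of one possible implementation: a toy one-hot item space, an assumed drift rate (BETA), an assumed learning rate (GAMMA), and the 0.1 stop probability described in f. The parameter values, normalization, function names, and the treatment of task-irrelevant items are illustrative assumptions for exposition, not the fitted CMR-replay implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ITEMS = 6        # 4 task items + 2 task-irrelevant items (illustrative)
N_TASK = 4         # indices 0-3 are task items; 4-5 are task-irrelevant
BETA = 0.5         # context drift rate (assumed value)
GAMMA = 1.0        # Hebbian learning rate (assumed value)
P_STOP = 0.1       # probability that a replay period ends at each timestep

# Pre-experimental associations: each item starts weakly linked to its own context feature.
M_fc = np.eye(N_ITEMS) * 0.1   # item -> context associations (c^f = M_fc @ f)
M_cf = np.eye(N_ITEMS) * 0.1   # context -> item associations (a = M_cf @ c)

def one_hot(i):
    v = np.zeros(N_ITEMS)
    v[i] = 1.0
    return v

def drift(c_prev, c_in, beta=BETA):
    """Context drift: c_t = (1 - beta) * c_{t-1} + beta * c^f, renormalized,
    so c remains a recency-weighted average of retrieved item contexts."""
    c = (1.0 - beta) * c_prev + beta * c_in
    norm = np.linalg.norm(c)
    return c / norm if norm > 0 else c

def encode(sequence):
    """Awake encoding: drift context with each item and apply Hebbian updates."""
    global M_fc, M_cf
    c = np.zeros(N_ITEMS)
    for i in sequence:
        f = one_hot(i)
        c = drift(c, M_fc @ f)            # incorporate the item's associated context
        M_fc += GAMMA * np.outer(c, f)    # strengthen f -> c
        M_cf += GAMMA * np.outer(f, c)    # strengthen c -> f
    return c

def replay_period(a0, learn=True):
    """One replay period: sample f_{t=0} from a0, then let the drifting context
    cue further reactivations until a stop is sampled or a task-irrelevant item wins."""
    global M_fc, M_cf
    f0 = int(rng.choice(N_ITEMS, p=a0))
    if f0 >= N_TASK:                      # task-irrelevant onset: no replay sequence
        return []
    reactivated = [f0]
    c = drift(np.zeros(N_ITEMS), M_fc @ one_hot(f0))   # reinstate c^{f_{t=0}} as c_0
    while rng.random() > P_STOP:
        a = np.clip(M_cf @ c, 0.0, None)
        a[reactivated] = 0.0              # exclude previously reactivated experiences
        if a.sum() == 0:
            break
        f_t = int(rng.choice(N_ITEMS, p=a / a.sum()))
        if f_t >= N_TASK:                 # task-irrelevant item ends the period
            break
        reactivated.append(f_t)
        c = drift(c, M_fc @ one_hot(f_t))
        if learn:                         # replay itself also updates the associations
            M_fc += GAMMA * np.outer(c, one_hot(f_t))
            M_cf += GAMMA * np.outer(one_hot(f_t), c)
    return reactivated

# Encode the four-item sequence once, then run a few "sleep" replay periods with a
# uniform a0 (awake rest would instead bias a0 toward the currently cued item).
encode([0, 1, 2, 3])
a0 = np.ones(N_ITEMS) / N_ITEMS
for _ in range(5):
    print(replay_period(a0))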

Context-dependent variations in memory replay.

a. As observed in rodents (left), replay in CMR-replay (right) is predominantly forward at the start of a run and backward at the end of a run on a linear track. b. Consistent with rodent data (left), in CMR-replay (right), the proportion of forward replay is higher during sleep than during awake rest. c. The presence of external cues during sleep biases replay toward their associated memories both in animals (left) and in CMR-replay (right). Error bars represent ±1 standard error of the mean. *p<0.05; **p<0.01; ***p<0.001. Left image in a adapted from ref [28], Nature Publishing Group; left image in b adapted from ref [35], Wiley; left image in c adapted from ref [34], Nature Publishing Group.

Reward leads to over-representation in sleep and modulates the rate of backward replay.

a. Sleep over-represents experiences associated with reward in animals (left) and in CMR-replay (right). Error bars represent ±1 standard error of the mean. b. Varying the magnitude of reward outcome leads to differences in the frequency of backward but not forward replay in animals (left) and CMR-replay (right). In the animal data (left), error bars show 95% confidence intervals. For simulation results (right), error bars show ±1 standard error of the mean. Left image in a adapted from ref [17], eLife; images in the left column adapted from ref [19], Elsevier.

Replay activates remote experiences and links temporally-separated experiences.

a. The two panels show examples of remote and novel (shortcut) replay sequences observed in animals. The colored items indicate the temporal order of the sequences (light blue, early; purple, late). The red item denotes the resting position. b. CMR-replay also generates remote and shortcut replay during rest, as illustrated by the predicted sequences of neural firing in the two panels. c. Proportion of replay events that contain remote sequences in animals (left) and in CMR-replay (right). Error bars show ±1 standard error of the mean in the data and model. d. In Liu et al. [21], participants encoded scrambled versions of two true sequences X1X2X3X4 and Y1Y2Y3Y4: X1X2Y1Y2, X2X3Y2Y3, and X3X4Y3Y4 (Fig. 7g). After learning, human spontaneous neural activity showed stronger evidence of sequential reactivation of the true sequences (left). CMR-replay encoded scrambled sequences as in the experiment. Consistent with the empirical observation, subsequent replay in CMR-replay over-represents the true sequences (right). Error bars show ±1 standard error of the mean in the model. Images in a adapted from ref [15], Elsevier; left image in c adapted from ref [15], Elsevier; left image in d adapted from ref [21], Elsevier.

Variations in replay as a function of experience.

a. In CMR-replay, through repeated exposure to the same task, the frequency of replay events decreases (left), the average length of replay events increases (middle), and the proportion of replay events that are backward remains stable after a slight initial uptick (right). b. With repeated experience in the same task, animals exhibit lower rates of replay (left) and longer replay sequences (middle), while the proportion of replay events that are backward stays relatively stable (right). c. In a T-maze task, where animals display a preference for traversing a particular arm of the maze, replay more frequently reflects the opposite arm [23] (left). CMR-replay preferentially replays the right arm after exposure to the left arm, and vice versa (right). Error bars show ±1 standard error of the mean in all panels. Images in b adapted from ref [31], Elsevier; left image in c adapted from ref [23], Nature Publishing Group.

Learning from replay.

a. Sleep increases the likelihood of reactivating the learned sequence in the correct temporal order in CMR-replay, as seen in an increase in the proportion of replay of learned sequences post-sleep. b. Sleep leads to greater reactivation of rewarded than non-rewarded experiences, indicating that sleep preferentially strengthens rewarded memories in CMR-replay. c. In the simulation of Liu et al. [9], CMR-replay encoded six sequences, each of which transitioned from one of three start items to one of two end items. After receiving a reward outcome for the end item of a sequence, we simulated a period of rest. After but not before rest, CMR-replay exhibited a preference for non-local sequences that led to the rewarded item. This preference emerged through rest even though the model never observed reward in conjunction with those non-local sequences, suggesting that rest replay facilitates non-local learning in the model. d. We trained a “teacher” CMR-replay model on a sequence of items. After encoding the sequence, the teacher generated replay sequences during sleep. We then trained a separate blank-slate “student” CMR-replay model exclusively on the teacher’s sleep replay sequences. To assess knowledge of the original sequence, we collected sleep replay sequences from both models and measured the probability that each model reactivates the item at position i + lag of the sequence immediately after reactivating the i-th item, conditioned on the item at position i + lag still being available for reactivation. Both models demonstrated a tendency to reactivate the item that immediately follows or precedes the just-reactivated item in the original sequence. This result suggests that the student acquired knowledge of the temporal structure of the original sequence by encoding only the teacher’s replay sequences. Error bars show ±1 standard error of the mean.
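
The lag-conditioned measure in d can be sketched in code. The helper below (a hypothetical name and layout, not code from the paper) tallies, for each replay sequence, how often the item at serial position i + lag is reactivated immediately after the item at position i, normalized by the number of transitions for which that item was still available.

```python
from collections import defaultdict

def lag_crp(replay_sequences, sequence):
    """For each lag, estimate P(the next reactivated item sits at position i + lag |
    the item at position i was just reactivated and the i + lag item is still available)."""
    pos = {item: i for i, item in enumerate(sequence)}
    numer, denom = defaultdict(int), defaultdict(int)
    for replay in replay_sequences:
        seen = set()
        for cur, nxt in zip(replay, replay[1:]):
            seen.add(cur)
            if cur not in pos or nxt not in pos:
                continue                       # ignore items outside the original sequence
            i = pos[cur]
            for j in range(len(sequence)):     # every still-available lag could have occurred
                if sequence[j] not in seen:
                    denom[j - i] += 1
            numer[pos[nxt] - i] += 1
    return {lag: numer[lag] / denom[lag] for lag in sorted(denom)}

# Example: a four-item teacher sequence with one forward and one backward replay period.
print(lag_crp([["A", "B", "C"], ["D", "C", "B"]], ["A", "B", "C", "D"]))
```

In this toy example the forward (lag +1) and backward (lag -1) transition probabilities both come out at 1.0, while all other lags are 0, mirroring the tendency described in d to reactivate the item that immediately follows or precedes the just-reactivated item.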

Task simulations.

Each enclosed box corresponds to a unique item. Each arrow represents a valid transition between two items. Each dashed arrow represents a distractor that causes a drift in context between two items. Task sequences initiate at light grey boxes. Dark grey boxes represent salient items in each task. For tasks with multiple valid sequences, the order in which sequences are presented is randomized. a. Simulation of a linear track. b. Simulation of the task in Liu et al. [9]. c. Simulation of a two-choice T-maze. d. Simulation of a T-maze. e. Simulation of the task in Bendor and Wilson [34]. f. Simulation of a linear track task with distinct directions of travel. g. Simulation of input sequences in Liu et al. [21]. h. Simulation of a fix-item sequence.
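
As an illustration of how such task structures can be specified, the snippet below encodes two of the schematics (a linear track and a T-maze) as items, valid transitions, start items, and salient items. The dictionary layout and field names are our own shorthand for exposition, not the configuration format used in the simulations.

```python
# Illustrative shorthand for two of the task schematics above (names are ours, not the paper's).
# Each task lists its items, the valid transitions (solid arrows), the start items
# (light grey boxes), and the salient items (dark grey boxes); dashed-arrow distractors
# would be listed analogously where a task includes them.
TASKS = {
    "linear_track": {
        "items": ["1", "2", "3", "4", "5"],
        "transitions": [("1", "2"), ("2", "3"), ("3", "4"), ("4", "5")],
        "start": ["1"],
        "salient": ["5"],
    },
    "t_maze": {
        "items": ["stem", "choice", "left_arm", "right_arm"],
        "transitions": [("stem", "choice"),
                        ("choice", "left_arm"),
                        ("choice", "right_arm")],
        "start": ["stem"],
        "salient": ["left_arm", "right_arm"],
    },
}

def valid_sequences(task):
    """Enumerate the valid item sequences of a task by walking its transition arrows."""
    edges = {}
    for a, b in task["transitions"]:
        edges.setdefault(a, []).append(b)
    out = []
    def walk(path):
        nexts = edges.get(path[-1], [])
        if not nexts:
            out.append(path)
        for b in nexts:
            walk(path + [b])
    for s in task["start"]:
        walk([s])
    return out

print(valid_sequences(TASKS["t_maze"]))  # [['stem', 'choice', 'left_arm'], ['stem', 'choice', 'right_arm']]
```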