Experimental procedure in the MEG.

Control: Eight minutes of resting state activity were recorded before participants encountered any stimuli. Localizer task: Next, the ten individual items were repeatedly presented to participants auditorily and visually to extract multisensory activity patterns. Learning: Participants learned triplets of the ten items, pseudo-randomly generated from an underlying graph, by performing a three-alternative forced-choice task. Participants were unaware of the graph layout. Consolidation: Eight minutes of resting state activity were recorded. Retrieval: Participants’ recall was tested by cueing triplets from the graph. The letters in the pictograms are placeholders for individual images. Figure and caption adapted from Kern et al. (2024). For details, see Supplement Figure 1.

A) Memory performance of participants after completing the first block of learning, the last block of learning (block 2 to 6, depending on speed of learning), and retrieval performance after the resting state. B) Decoding accuracy of the currently displayed item during the localizer task for participants with a decoding accuracy above 30% (n=21; data from 4 participants did not pass this criterion). The mean peak time point across all participants corresponded to 210 ms, with an average decoding peak accuracy of 42% (n=21). Note that the displayed graph combines accuracies across participants, whereas peak values were computed on an individual level and then averaged. Therefore, the mean of the individual peaks does not necessarily correspond to the peak of the group-average curve. C) Classifier transfer within the localizer when trained and tested at different time points, as determined by cross-validation. After ∼200 ms, most training and testing time points generalize to each other (significant clusters indicated by cluster permutation testing, alpha<0.05).
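The difference between the mean of individual peaks and the peak of the group-average curve noted for panel B can be illustrated with a minimal sketch. The function name `peak_summary` and the toy accuracies are hypothetical and not taken from the study.

```python
import numpy as np

def peak_summary(acc):
    """acc: (n_participants, n_timepoints) array of decoding accuracies.

    Returns the mean of the individual peak accuracies and the peak of
    the group-average accuracy curve.
    """
    acc = np.asarray(acc, dtype=float)
    mean_of_peaks = acc.max(axis=1).mean()   # peak per participant, then average
    peak_of_mean = acc.mean(axis=0).max()    # average curve first, then peak
    return mean_of_peaks, peak_of_mean

# Toy data (made up): two participants peaking at different time points.
toy = np.array([[0.1, 0.5, 0.1],
                [0.1, 0.1, 0.5]])
mean_of_peaks, peak_of_mean = peak_summary(toy)
print(mean_of_peaks, peak_of_mean)
```

Because individual peaks fall at different time points, averaging the curves first flattens the group peak, so the mean of individual peaks is never lower than the peak of the averaged curve.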

A) Forward and backward sequenceness values on the y-axis at different time lags (x-axis) for control resting state (light coloured) and post-learning resting state (dark coloured). Two significance thresholds are indicated: the more lenient 95% quantile of the maximum sequenceness of all permutations (lightly dotted line) and the maximum mean peak sequenceness across all permutations of all subjects (bold dotted line). Neither threshold was surpassed. The peak mean sequenceness is marked by a vertical line in each case. B) Difference between pre-learning and post-learning resting states across time lags. FDR-corrected paired t-tests for each time lag revealed no significant difference between conditions (without correction, three sequenceness differences were significant: forward 100 ms, p=0.024 and 240 ms, p=0.045; backward 200 ms, p=0.047). An additional cluster permutation test found a non-significant cluster at 230-240 ms in the forward direction (p=0.44). Bands indicate two standard errors of the mean of the difference scores. C) Individual participants’ retrieval memory performance relative to their sequenceness values at the peak mean sequenceness (indicated in A). Regression lines and 95% confidence intervals are fitted. For correlation with performance deltas between post and pre resting state retrievals, see Supplement Figure 10. For individual sequenceness curves, see Supplement Figure 4.
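The effect of the FDR correction in panel B can be sketched as follows. This assumes the Benjamini-Hochberg procedure, which the caption does not specify. The three smallest p-values are the uncorrected values reported above; the total of 30 tests and the remaining p-values are made up purely for illustration.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of tests rejected under Benjamini-Hochberg FDR control."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[:k + 1]] = True
    return reject

# Assumption: 30 tests in total; the 27 filler p-values are hypothetical.
p_uncorrected = np.concatenate([[0.024, 0.045, 0.047],
                                np.linspace(0.1, 0.9, 27)])
print((p_uncorrected < 0.05).sum())               # nominally significant tests
print(benjamini_hochberg(p_uncorrected).sum())    # tests surviving correction
```

Under these assumptions, three tests are nominally significant but none survives the correction, since the smallest p-value would have to fall below 0.05/30.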

Sequenceness (heatmap) separately for each of the eight minutes of resting state on the y-axis, and different time lags on the x-axis.

For each minute, we took either the first or the second 30-second half for the final calculation and the other 30-second segment for the exploration data set. Sequenceness crossed the 95% permutation threshold only in minute 4 of the post-learning resting state (backward, 20-millisecond time lag). However, the sequenceness score was negative, rendering its interpretation non-trivial. Note that we did not apply any control for multiple comparisons across the segments.
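The per-minute split into a final and an exploration segment can be sketched as below. How the half was chosen is not stated in the caption, so a random choice per minute is assumed here, as are the sampling rate of 100 Hz and the function name `split_minutes`.

```python
import numpy as np

def split_minutes(n_minutes=8, sfreq=100, seed=0):
    """Assign one 30-second half of each minute to the final data set
    and the other half to the exploration data set.

    Returns two lists of (start, stop) sample indices.
    Assumptions: random choice of half, sfreq in Hz.
    """
    rng = np.random.default_rng(seed)
    half = 30 * sfreq
    final, exploration = [], []
    for minute in range(n_minutes):
        start = minute * 2 * half
        halves = [(start, start + half), (start + half, start + 2 * half)]
        pick = int(rng.integers(2))       # 0 = first half, 1 = second half
        final.append(halves[pick])
        exploration.append(halves[1 - pick])
    return final, exploration

final, exploration = split_minutes()
```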

Schematic of procedure for inserting simulated replay into the control resting state.

First, neural patterns are extracted for each stimulus from the peak of decodability during the localizer task. Participant-specific patterns are normalized by subtracting the average sensor values of seeing any stimulus from the sensor values of seeing only the stimulus of interest. Next, these subtle patterns are inserted into the control resting state at a fixed interval, following a one-step transition pattern according to the task sequence. A refractory period is retained between replay events to prevent overlaps. For details, see the Methods section.
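The extraction and insertion steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the cycling through transitions, and the sample-based `lag` and `refractory` parameters are hypothetical.

```python
import numpy as np

def stimulus_patterns(epochs, labels):
    """epochs: (n_trials, n_sensors) sensor values at the decoding peak.

    Returns one pattern per class: class mean minus the mean over all trials.
    """
    epochs = np.asarray(epochs, dtype=float)
    labels = np.asarray(labels)
    grand_mean = epochs.mean(axis=0)
    return {c: epochs[labels == c].mean(axis=0) - grand_mean
            for c in np.unique(labels).tolist()}

def insert_replay(rest, patterns, sequence, n_events, lag, refractory):
    """Add one-step transitions (item i, then item i+1 after `lag` samples)
    to a copy of `rest` (n_samples, n_sensors) at a fixed interval.

    Assumption: events cycle through the transitions of `sequence`, and
    `refractory` samples are left empty after each event.
    """
    out = np.array(rest, dtype=float)     # np.array copies, `rest` is untouched
    step = lag + 1 + refractory           # distance between event onsets
    onsets = []
    for k in range(n_events):
        t = k * step
        if t + lag >= len(out):
            break                         # recording too short for more events
        i = k % (len(sequence) - 1)
        out[t] += patterns[sequence[i]]
        out[t + lag] += patterns[sequence[i + 1]]
        onsets.append(t)
    return out, onsets

# Toy example (made-up values): two classes, two sensors, 80 ms lag at 100 Hz.
patterns = stimulus_patterns([[1, 0], [3, 0], [0, 2], [0, 4]], [0, 0, 1, 1])
rest = np.zeros((100, 2))
simulated, onsets = insert_replay(rest, patterns, [0, 1],
                                  n_events=3, lag=8, refractory=10)
```

Because the step between onsets is longer than the lag, the two patterns of one event never overlap with those of the next.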

A) Sequenceness at an 80-millisecond state-to-state time lag (y-axis) at different replay densities (x-axis), with the standard error shaded in orange. The 95% confidence interval significance threshold is reached at a density of ∼80 min⁻¹, while the maximum permutation threshold is reached only at around 150 min⁻¹. B) Sequenceness across time lags for replay at baseline (0 min⁻¹) and simulated for densities of 40, 80 and 120 min⁻¹ with a simulated time lag of 80 milliseconds.

A) Relationship (Pearson’s r) between memory performance (behaviour) and sequenceness at 80 ms time lag on the y-axis, with replay density on the x-axis, for two scaling rules. Realistic scaling: Replay density was inversely scaled with participants’ performance by 50-100% (i.e., the worst performer had 100% while the best performer had only 50% of the replay density). Maximal scaling: Replay density was scaled from 0-100% (i.e., the best performer had no replay inserted). Significance of the correlation is only reached in the maximal scaling case at around 200 min⁻¹. B) Correlation between sequenceness and memory performance, computed at different time lags in the control condition (i.e., without inserted replay). Even when no replay is present, the correlation fluctuates significantly depending on the time lag at which it is computed, possibly due to baseline variations that fluctuate differentially between participants. C) Bootstrap power analysis showing how many participants would be necessary to detect a significant correlation between replay and performance in the maximally scaled scenario with a replay density of 80 min⁻¹. 80% power to detect a correlation is reached at a sample size of around 90. D) Sequenceness at 80 milliseconds time lag in the control condition (left), in the realistic scenario with replay scaling from 50-100% (middle) and the maximal scenario with scaling from 0-100% (right). Participants’ performance is indicated by increasing saturation of blue. While all participants show an increase in sequenceness with increased scaling, some participants have a higher sequenceness value, and an ordering according to performance is barely visible in the maximal scaling case.
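The two scaling rules in panel A can be written as one function with a different lower bound. The sketch assumes a linear mapping between the worst and best performer, which the caption does not state explicitly; `scaled_density` and the toy performance values are hypothetical.

```python
import numpy as np

def scaled_density(performance, base_density, min_fraction):
    """Replay density per participant, inversely scaled with performance.

    The worst performer receives 100% of `base_density`, the best performer
    receives `min_fraction` of it (0.5 = realistic, 0.0 = maximal scaling).
    Assumption: linear interpolation in between.
    """
    perf = np.asarray(performance, dtype=float)
    span = perf.max() - perf.min()
    if span == 0:
        return np.full(perf.shape, float(base_density))
    rank = (perf - perf.min()) / span               # 0 = worst, 1 = best
    return base_density * (1.0 - rank * (1.0 - min_fraction))

performance = [0.5, 0.75, 1.0]                      # made-up memory performance
realistic = scaled_density(performance, 80, min_fraction=0.5)
maximal = scaled_density(performance, 80, min_fraction=0.0)
print(realistic, maximal)
```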

Normalized probability estimates of time points in the control resting state for the target class of which the pattern has been added and non-target class.

Left: At time points at which no pattern has been inserted, both classes have a similar probability estimate at baseline. Right: At time points where the stimulus-specific pattern was added, the decoded probability estimate is significantly increased. Note that the probability estimate of non-target classes is also slightly increased at the time point of pattern insertion, which is due to slight shifts in sensor values that heighten all probabilities by a small amount.

Detailed overview of the experiment procedure.

A) During the localizer task, a word describing the stimulus was played via headphones and the corresponding visual item was then shown to the participant. In 4% of trials, the audio and visual cue did not match; in this case, participants were instructed to press a button on detection (to sustain attention to the stimuli and check for inattention). B) Graph layout of the task. Two elements could appear in two different triplets. The graph was directed such that each tuple had exactly one successor (e.g., apple→zebra could only be followed by cake and not mug), but individual items could have different successors (zebra alone could be followed by mug or cake). Participants never saw the illustrated bird’s-eye view. C) During learning, in each trial one node was randomly chosen as the current node. First, its predecessor node was shown, followed by the current node, and the participant was then given a choice of three items. They were required to choose the node that followed the displayed cue tuple. Feedback was then provided to the participant. This process was repeated until the participant reached 80% accuracy for any block or reached a maximum of six blocks of learning. D) The retrieval followed the same structure as the learning task, except that no feedback was given.

Mean classifier probabilities before and after normalization, per participant for time points selected for replay pattern insertion.

Left: Variance in baseline probability estimates can already be seen before pattern insertion, i.e., for some participants a higher baseline is present than for others. These higher baseline probability values (or rather their stronger absolute fluctuation) will lead to falsely increased sequenceness values in the GLM betas. Right: After normalization, baseline probabilities are similar across all participants. Blue shading indicates the participant’s performance, with the lightest shading at 50% and the darkest shading at 100% memory performance.

Distribution of raw sensor values before and after insertion of the simulation pattern.

As MEG measures differences in magnetic field strength, the signal is centred on zero. Adding a pattern that is extracted from a zero-centred distribution should not affect the distribution’s mean and hence should still produce a valid MEG signal, which is shown in the figure. However, a very small difference in spread between the two distributions can be seen, indicating that sensor values do not follow exactly the same distribution characteristics after adding the stimulus-specific pattern, which might explain the small increase in baseline probability for non-target classes in Figure 8.

Individual sequenceness curves for all participants in the control resting state (left) and the post-learning resting state (right), for forward (upper), and backward sequenceness (bottom).

The participant’s performance is indicated in shades of blue, from the lightest shade at 50% to the darkest shade at 100% retrieval performance. Note that an oscillatory component is present in all sequenceness curves; however, phase and exact amplitude seem to be shifted between curves.

Individual sequenceness curves for different segments of A) control resting state and B) post-learning resting state.

Each segment is taken from minute N of the respective resting state. For each participant, either the first or the second 30-second interval was taken from each segment. Two significance thresholds are indicated: the more lenient 95% quantile of the maximum sequenceness of all permutations (lightly dotted line) and the maximum mean peak sequenceness across all permutations of all subjects (bold dotted line). Sequenceness crossed the 95% permutation threshold in the post-learning resting state in minute 4 (backward, 20-millisecond time lag). Note that sequenceness scores are negative at this time lag, which would indicate a replay suppression of these specific sequences.
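One way to read the two thresholds described above is sketched below. It assumes that sequenceness from permuted transition matrices is averaged across participants, that the absolute peak across time lags is taken per permutation, and that the thresholds are the 95% quantile and the maximum of these peaks. The array layout and the function name are hypothetical.

```python
import numpy as np

def permutation_thresholds(perm_seq):
    """perm_seq: (n_subjects, n_permutations, n_lags) sequenceness values
    obtained with permuted transition matrices.

    Returns (lenient 95% threshold, strict maximum threshold).
    Assumption: absolute peaks across lags are used.
    """
    perm_seq = np.asarray(perm_seq, dtype=float)
    mean_over_subjects = perm_seq.mean(axis=0)               # (n_perms, n_lags)
    peak_per_perm = np.abs(mean_over_subjects).max(axis=1)   # (n_perms,)
    return np.quantile(peak_per_perm, 0.95), peak_per_perm.max()

# Toy input (made up): 1 subject, 3 permutations, 2 time lags.
toy = np.array([[[1.0, 0.0], [0.0, -2.0], [3.0, 1.0]]])
thr_95, thr_max = permutation_thresholds(toy)
print(thr_95, thr_max)
```

By construction the 95% threshold can never exceed the maximum threshold, which is why it is the more lenient of the two.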

Sequenceness curves when not correcting for alpha oscillations.

Alpha correction is usually applied to mitigate nuisance effects introduced by strong eyes-closed alpha, by adding probability time lines in 100 ms steps to the GLM (Y. Liu, Dolan, et al., 2021).

A) Original TDLM (Y. Liu, Dolan, et al., 2021) simulation with a replay density of 2000 min⁻¹. B) Re-run of the code with a realistic replay density of 15 min⁻¹. Using this density, significance thresholds are barely reached. C) Baseline sequenceness with no inserted sequences. Some participants reach significance even in the absence of replay, as can be seen from the confidence intervals. D) Left: At time points at which no pattern has been inserted, both classes have a similar probability estimate at baseline. Right: At time points where the stimulus-specific pattern was added, the decoded probability estimate is significantly increased, cf. Figure 8. E) Probability estimates before and after insertion of the replay pattern, visualised as in Supplement Figure 2. The probability estimates are close to zero at baseline and increase by several orders of magnitude after insertion. Using synthetic data probably overestimates probability estimates of true events. Note that, unlike in our simulation, the replay time lag here was sampled from a gamma prior with a mean of 40 milliseconds, such that the majority of events are simulated at 40 milliseconds, with some events assigned a lag of 30 or 50 milliseconds.

Cross-validation accuracy during the localizer across different regularization values.

For each regularization value, a cross-validation was performed per participant as documented in the Methods section, evaluated at 150 to 250 milliseconds after stimulus onset. The best regularization value was found at around C=9.1.

Visualization of the sensor patterns that were inserted into the resting state per class, averaged across all participants.

Patterns were created on a per-participant basis by taking the mean sensor value across all trials of the specific class at the peak time point of decodability during the localizer. From these patterns, the mean across all trials (the ERP) at the peak time point was subtracted to remove activity related to general visual processing. Saturation indicates value magnitude (u.U.); red indicates positive sensor values, blue negative sensor values.

Individual participants’ retrieval performance delta between post-rest and pre-rest retrieval, relative to their sequenceness values at the peak mean sequenceness (indicated in A).

Regression lines and 95% confidence intervals are fitted.