Introduction

A well-known feature of working memory (WM) is its limited capacity (Baddeley, 2000; Cowan, 2001), which constrains the amount of information that can be temporarily retained for future behavior. The limited capacity is posited to come in the form of discrete slots (W. Zhang & Luck, 2008) or continuous distribution resources (Bays & Husain, 2008; Ma et al., 2014). Meanwhile, in daily experiences, memorized items do not exist independently but are always part of a common framework or share the same structure, which could be leveraged to compress information and overcome the WM capacity challenge (Brady et al., 2011). For example, while shopping at a supermarket, numerous items could be grouped into a few categories, such as drinks, vegetables, fruits, and meats, to facilitate memory. Other types of abstract associations such as relational regularities could also mediate WM organization (Al Roumi et al., 2021; Mathy & Feldman, 2012). In addition, computational modeling suggests that higher-order structures (e.g., summaries and relative relations) reduce memory uncertainty by constraining individual-item representations (Brady & Tenenbaum, 2013; Ding et al., 2017).

Cognitive maps provide a general structural framework for organizing information in different tasks and across various domains (Whittington et al., 2020). They were first identified as representations of physical maps during navigation, but have recently been shown to also support other higher-level processes, such as conceptual knowledge, reasoning, planning, and decision-making (Behrens et al., 2018; Bellmund et al., 2018; O’Keefe & Nadel, 1978). Two major neural signatures of cognitive maps, grid-like code (Constantinescu et al., 2016; Doeller et al., 2010; Hafting et al., 2005; Park et al., 2021) and neural replay in the hippocampal-entorhinal system (D. J. Foster & Wilson, 2006; Liu et al., 2019; Liu, Mattar, et al., 2021; Schuck & Niv, 2019; Skaggs & McNaughton, 1996), are identified in both spatial and non-spatial tasks. Neural replay denotes a time-compressed dynamic reactivation sequence in the forward or backward direction, which is posited not just to repeat past experience but to reflect an internal model of the world (Kurth-Nelson et al., 2023; Ólafsdóttir et al., 2018).

Essentially, these higher-level processes can be described as mental explorations of a sequence of states within an abstract map, similar to tracing a route on a physical map, implicating a common substrate for cognitive maps across domains. In line with this hypothesis, a recent study revealed conjoined cognitive maps in the rodent hippocampus (i.e., of physical space and abstract task variables) (Nieh et al., 2021), and alignment of different feature maps has also been found to speed learning (Aho et al., 2022). Furthermore, the generalization of cognitive maps is distance-dependent when searching for correlated rewards across domains (Wu et al., 2020). As such, cognitive maps might also be used to reorganize memory information across domains to overcome capacity bottlenecks in WM.

Here we sought to examine whether cognitive maps shared by multiple features in WM could be naturally leveraged and combined across domains to overcome WM storage limits. To address the question, participants were asked to memorize a color sequence presented at a list of spatial locations and later reproduce both the color and location sequences within two rings, respectively. In other words, subjects needed to retain in WM two sequences of features, i.e., color and location, both of which could be characterized as sequence trajectories on their respective rings. Crucially, we manipulated the consistency between the color and location sequence trajectories in the ring map. Specifically, in the aligned condition, the color and location sequences share a common trajectory, i.e., successive items are separated by the same distance in both maps (Figure 1B), whereas in the misaligned condition, they have distinct relative trajectories (Figure 1C). We hypothesize that humans would naturally detect and combine the structure shared by the two sequences to facilitate memory formation, even though it is unsupervised and non-mandatory, as proposed by efficient coding theory (Attneave, 1954).

Experimental paradigm

(A) Participants were presented with a sequence of disks of different colors and at different locations. They were asked to memorize both the location and color of the sequence and later reproduce the full location and color sequences one after another by clicking the corresponding positions on the respective report rings. During the “recall location” phase, a grey ring appeared and participants prepared for subsequent location recall without motor movement, to ensure memory decoding without motor interventions. During the following “response period”, subjects serially selected memorized spatial locations on a “location ring”. Next, a color ring appeared (“recall color”) for subjects to be ready for subsequent color recall. They then clicked the remembered colors on a “color ring” (“response period”). (B) Aligned trajectory condition (AT) wherein the trajectory distances between consecutive items (i.e., 1st to 2nd, 2nd to 3rd) of location and color sequences are identical, although the two sequences occupy different locations within their respective rings. (C) Misaligned condition (MAT), wherein the trajectory distances between consecutive items are different for location and color sequences.

To preview our findings, we provide converging behavioral and neuronal evidence for spontaneously leveraging common structures in a map to facilitate working memory. Behavioral results reveal that sequences with consistent color-location trajectories show enhanced memory precision and a significant correlation between reproduced color and location sequence trajectories. Interestingly, neural decoding of memory contents reveals that color sequences undergo temporally compressed, forward neural replay in a spontaneous manner when subjects recall a location sequence that shares a common underlying trajectory. Further neural decoding of the common trajectory reveals that previously formed segments of the sequence trajectory are reactivated along with the newly formed trajectory segments to construct the full trajectory, which is associated with trajectory memory in behavior and with neural replay. Taken together, our findings demonstrate that, to make efficient use of limited capacity, the spontaneous integration of multiple sequences based on shared structures results in spontaneous neural replay of the associated sequence and enhanced memory performance.

Results

Experimental procedure and behavioral performance

Thirty-three human participants performed a visual sequence WM task while their brain activity was recorded using EEG. As shown in Figure 1A, at the beginning of each trial, three disks with different spatial locations and colors were sequentially presented. Participants were required to concurrently remember their locations and colors as well as their orders, i.e., one location sequence and one color sequence. After a 2-second memory delay, a grey ring (location ring) was presented to instruct participants to prepare for subsequent location sequence recall without making motor responses (Figure 1A, “recall location” period). This ensured that memory signals were decoded without explicit motor interventions. Next, a cursor appeared at the center of the screen (“response period”), and participants clicked the three spatial locations on a “location ring” in their correct order. Upon completion of location recall, participants were instructed to prepare for color sequence recall (“recall color”), and they clicked three locations on the color ring based on the color sequence (“response period”). One key aspect was manipulating the consistency between location and color trajectories so that the two sequences share or do not share a common structure in a cognitive map. Specifically, in the aligned trajectory condition (AT condition), despite the location and color sequences occupying different positions within their respective rings, their trajectory distances (between 1st and 2nd items and between 2nd and 3rd) were the same (Figure 1B). In other words, rotating one ring by a certain angle brings the three points of the two sequences into exact correspondence; the rotation angle varied from trial to trial, which allowed us to separately decode the location sequence and the color sequence in the following analysis. In contrast, the location and color sequences in the misaligned trajectory condition (MAT condition) differed both in positions and trajectory distances within rings (Figure 1C).
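The alignment manipulation can be illustrated with a short sketch. It assumes that ring positions are expressed as angles in radians and that "the same trajectory distance" means equal signed steps between successive items, which is what a pure rotation preserves. The helper names (`wrap`, `steps`, `rotate`, `is_aligned`) are hypothetical and are not taken from the study's materials.

```python
import math

def wrap(angle):
    """Wrap an angle (radians) to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def steps(seq):
    """Signed circular steps between successive items of a ring sequence."""
    return [wrap(seq[k + 1] - seq[k]) for k in range(len(seq) - 1)]

def rotate(seq, rotation):
    """Rotate every item by the same angle (assumed trial-specific offset)."""
    return [(a + rotation) % (2 * math.pi) for a in seq]

def is_aligned(loc_seq, col_seq, tol=1e-9):
    """True when both sequences have the same step between successive items."""
    return all(abs(wrap(a - b)) < tol
               for a, b in zip(steps(loc_seq), steps(col_seq)))

# Illustrative values only: an AT-like trial is a rotated copy of the
# location sequence; an MAT-like trial has different steps.
location = [0.5, 2.0, 5.5]
color_at = rotate(location, 1.3)
color_mat = [1.8, 2.5, 4.0]
print(is_aligned(location, color_at), is_aligned(location, color_mat))
```

Under these assumptions, an aligned trial differs from its location sequence only by the rotation offset, so the starting points carry independent information while the steps are shared.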

Aligned color-location trajectory improves memory performance

We first estimated memory precision for color and location sequences by calculating the reciprocal of the circular standard deviation of response error (the circular difference between the reported and the correct location or color) across trials (1 ∕ σ) (Bays et al., 2009). As shown in Figure 2A, a two-way repeated-measures ANOVA (alignment (AT vs. MAT) × task (location vs. color)) revealed significant main effects of alignment (F(1,32) = 4.279, p = 0.047, ηp² = 0.118) and task (F(1,32) = 139.382, p < 0.001, ηp² = 0.813), but no significant interaction (F(1,32) = 0.618, p = 0.438, ηp² = 0.019). Specifically, the AT condition yielded better memory performance than the MAT condition, supporting our hypothesis that shared structure facilitates memory of multiple sequences. Moreover, location memory was better than color memory. Further comparison revealed that alignment mainly enhanced color memory (paired t-test, t(32) = 2.446, p = 0.020, Cohen’s d = 0.426) but not location memory (paired t-test, t(32) = 1.538, p = 0.134, Cohen’s d = 0.268). The better location than color memory performance suggests that alignment is less effective in improving memory that is already very robust, i.e., the location sequence (Wu et al., 2020). In terms of serial position in the sequence, color sequences demonstrated better memory performance under AT than MAT conditions, especially for the 2nd and 3rd items (paired t-test, 1st: t(32) = −0.315, p = 0.755, Cohen’s d = 0.055; 2nd: t(32) = 4.069, p < 0.001, Cohen’s d = 0.709; 3rd: t(32) = 2.583, p = 0.015, Cohen’s d = 0.450) (Figure 2B). Meanwhile, the location sequences exhibited similar performance under the two conditions at all positions (paired t-test, 1st: t(32) = 0.972, p = 0.338, Cohen’s d = 0.169; 2nd: t(32) = 1.245, p = 0.222, Cohen’s d = 0.216; 3rd: t(32) = 1.290, p = 0.206, Cohen’s d = 0.225) (Figure 2C).
Overall, behavioral findings demonstrate an improvement in WM performance with a common trajectory across feature domains, and indicate that the aligned trajectories (1st-2nd, 2nd-3rd) may be applied to reduce the memory uncertainty for the 2nd and 3rd colors.
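The precision measure described above (1 ∕ σ) can be sketched as follows. The sketch assumes that response errors are angles in radians and that σ is the standard circular standard deviation, sqrt(−2 ln R), where R is the mean resultant length; any additional correction used in the original analysis is not reproduced here, and the function names are hypothetical.

```python
import math

def circular_difference(reported, correct):
    """Signed circular difference (radians), wrapped to [-pi, pi)."""
    return (reported - correct + math.pi) % (2 * math.pi) - math.pi

def circular_sd(errors):
    """Circular SD, sqrt(-2 ln R), with R the mean resultant length."""
    n = len(errors)
    c = sum(math.cos(e) for e in errors) / n
    s = sum(math.sin(e) for e in errors) / n
    r = min(math.hypot(c, s), 1.0)  # guard against rounding above 1
    return math.sqrt(-2.0 * math.log(r))

def memory_precision(reported, correct):
    """Reciprocal of the circular SD of response errors across trials.

    Raises ZeroDivisionError if all errors are identical (sigma = 0).
    """
    errors = [circular_difference(r, c) for r, c in zip(reported, correct)]
    return 1.0 / circular_sd(errors)

# Illustrative values only: tighter responses give higher precision.
correct = [1.0, 1.0, 1.0, 1.0]
tight = [1.1, 0.9, 1.05, 0.95]
loose = [1.5, 0.5, 1.3, 0.7]
print(memory_precision(tight, correct), memory_precision(loose, correct))
```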

Behavioral performance

(A) Memory precision performance of location (black) and color (green) sequences for AT (dark color) and MAT (light color) conditions. Horizontal line in the boxplots denotes the median; box outlines denote the 25th and 75th percentiles; whiskers denote 1.5 × the interquartile range. Extreme values are denoted by crosses. (*: p < 0.05; **: p < 0.01; ***: p < 0.001). (B) Memory precision of 1st (purple), 2nd (turquoise) and 3rd (blue) items of location sequence, for AT (dark color) and MAT (light color) conditions. (C) Memory precision of 1st (purple), 2nd (turquoise) and 3rd (blue) items of color sequence, for AT (dark color) and MAT (light color) conditions. (D) Grand average (mean ± SEM) correlation coefficients of recalled trajectory error between location and color sequences, for 1st-to-2nd trajectory (brown), 2nd-to-3rd trajectory (brickred), and 1st-to-3rd trajectory (orange), under AT (dark color) and MAT (light color) conditions. Dots indicate individual participants. (E) Scatterplot of 1st-to-2nd trajectory memory error for location sequence (X-axis) and color sequence (Y-axis) under AT condition. Note that the trajectory errors of all trials within each subject were divided into 4 bins according to the location trajectory error, resulting in 33 (subjects) × 4 (bins) dots in the plot. The brown line represents the best linear fit. (F) Same as E, but for 2nd-to-3rd trajectory. (G) Same as E, but for 1st-to-3rd trajectory.

Aligned color-location trajectory elicits color-location correlation in recalled trajectories

We further investigated whether the location-color trajectory alignment was truly leveraged in the memory process. Note that participants reproduced the color and location sequences by clicking three positions on the respective rings, i.e., reproducing two spatial trajectories, one for location and one for color (Figure 1A, reporting period). We therefore could examine the correlation between the reported location and color trajectories in their maps to determine whether the AT condition would result in a correlated pattern based on the reported sequences.

Specifically, we first calculated trajectory error (the circular difference between the reported trajectory and the true trajectory) for location and color features, and then assessed the correlation between the (signed) trajectory errors of location and color features, for each subject. As shown in Figure 2D, the AT condition showed significant correlations for both 1st-2nd (one-sample t-test, t(32) = 5.022, p < 0.001) and 2nd-3rd (one-sample t-test, t(32) = 3.113, p = 0.004) trajectories, but not for the 1st-3rd trajectory (one-sample t-test, t(32) = 1.579, p = 0.124). In contrast, the MAT condition did not display any significant correlation (one-sample t-test; 1st-2nd: t(32) = 1.361, p = 0.183; 2nd-3rd: t(32) = 0.490, p = 0.628; 1st-3rd: t(32) = −0.582, p = 0.565).
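A minimal sketch of this analysis is given below. It assumes angles in radians, a Pearson correlation across trials within each subject, and a one-sample t statistic on the per-subject coefficients. Whether the coefficients were transformed before testing is not stated, so no transform is applied, and all function names are hypothetical.

```python
import math

def wrap(angle):
    """Wrap an angle (radians) to [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def trajectory(seq, i, j):
    """Signed circular step from item i to item j of a ring sequence."""
    return wrap(seq[j] - seq[i])

def trajectory_error(reported_seq, true_seq, i, j):
    """Circular difference between reported and true trajectory."""
    return wrap(trajectory(reported_seq, i, j) - trajectory(true_seq, i, j))

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def one_sample_t(values):
    """t statistic testing the mean of `values` against zero."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(var / n)

# Illustrative trial: the true steps are 1.0 and 1.0 rad.
true_seq = [0.0, 1.0, 2.0]
reported_seq = [0.1, 1.3, 2.0]
print(trajectory_error(reported_seq, true_seq, 0, 1))  # 1st-to-2nd error
```

For each subject, `trajectory_error` would be computed per trial for both features, `pearson_r` applied to the two lists of signed errors, and `one_sample_t` applied to the resulting coefficients across subjects.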

At the group level, motivated by a previous study (H. H. Li et al., 2021), we quantified the trajectory correlations by first binning all trials based on the location trajectory error and then extracting the color trajectory error for each bin, for each subject. As shown in Figure 2E–G, for the AT condition a significant correlation was observed for the 1st-2nd (r = 0.270, p = 0.002) and 2nd-3rd trajectories (r = 0.276, p = 0.003), but not for the 1st-3rd trajectory (r = 0.070, p = 0.426). In contrast, the MAT condition did not exhibit any location-color correlation in trajectories (1st-2nd: r = 0.097, p = 0.277; 2nd-3rd: r = 0.025, p = 0.790; 1st-3rd: r = −0.065, p = 0.443; also see Supplementary Figure 1A–C), which excludes the possibility that the reported trajectory correlation was solely due to systematic response bias.
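The binning step can be sketched as follows. The sketch assumes equal-count bins ordered by the signed location trajectory error and the arithmetic mean of errors within each bin (reasonable only for small angular errors); neither choice is specified in the text, and the function name is hypothetical.

```python
def bin_by_location_error(loc_err, col_err, n_bins=4):
    """Sort trials by location trajectory error, split into n_bins
    equal-count bins, and return (mean location error, mean color error)
    per bin. Requires at least n_bins trials.
    """
    order = sorted(range(len(loc_err)), key=lambda k: loc_err[k])
    points = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        idx = order[start:stop]
        mean_loc = sum(loc_err[k] for k in idx) / len(idx)
        mean_col = sum(col_err[k] for k in idx) / len(idx)
        points.append((mean_loc, mean_col))
    return points

# Illustrative values only: one subject, eight trials.
loc_err = [0.4, -0.3, 0.1, -0.1, 0.2, -0.2, 0.3, -0.4]
col_err = [0.5 * e for e in loc_err]
points = bin_by_location_error(loc_err, col_err)
print(points)
```

Pooling the four points from each of the 33 subjects gives the 132 dots over which the group-level correlation is computed.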

Together, behavioral findings indicate that memory facilitation arises from an automatic alignment of recalled trajectories across feature domains to compress information. In other words, instead of memorizing two 3-item sequences independently, subjects may just maintain two starting points and a common trajectory.

Neural decoding of location and color features during sequence presentation

We employed a time-resolved inverted encoding model (IEM) (Brouwer & Heeger, 2009, 2011) on EEG signals to examine the neural representation of location and color. Specifically, the slope of the reconstructed channel response was estimated to quantify the time-resolved decoding performance for the 1st, 2nd, and 3rd location and color, respectively (see details in Materials and methods) at each time point. We first focused on the encoding period when the 3-disk sequence was physically presented.
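A slope index of this kind is often obtained by folding the reconstructed channel response around the channel tuned to the presented feature and fitting a line to the folded profile, so that a positive slope indicates a response peaked at the presented feature. The sketch below illustrates that idea only. It assumes an odd number of channels already re-centered on the target, and it is not the procedure of the study, which is described in Materials and methods and may differ. The function name is hypothetical.

```python
def channel_response_slope(response):
    """Slope of a re-centered channel response.

    `response` is an odd-length list whose middle element is the channel
    tuned to the presented feature. Channels at equal distance from the
    center are averaged, and a least-squares line is fitted to the folded
    profile ordered from farthest to nearest channel.
    """
    center = len(response) // 2
    folded = [response[center]]
    for d in range(1, center + 1):
        folded.append((response[center - d] + response[center + d]) / 2.0)
    y = folded[::-1]            # farthest channel first, target last
    x = list(range(len(y)))
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Illustrative values only: a peaked profile and a flat profile.
peaked = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]
flat = [1.0] * 7
print(channel_response_slope(peaked), channel_response_slope(flat))
```

Applied at every time point, such an index yields a time course that can be tested against zero, as in the cluster-based permutation tests reported below.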

As shown in Figure 3A, the location of each of the 3 disks could be successfully decoded from EEG signals for both AT (1st location: 0.03–0.56 s, 1.14–1.26 s, 1.55–1.84 s; 2nd location: 1.56–2.10 s, 2.65–2.92 s, 3.12–3.35 s; 3rd location: 3.06–3.50 s; corrected cluster p < 0.001) and MAT conditions (1st location: 0.03–0.52 s, 1.15–1.39 s, 1.58–1.84 s; 2nd location: 1.52–2.10 s, 3.11–3.36 s; 3rd location: 3.06–3.54 s; corrected cluster p < 0.001) during the stimulus presentation period. Similarly, color information could also be decoded for both AT (1st color: 0.09–0.50 s, corrected cluster p < 0.001; 2nd color: 1.57–2.01 s, corrected cluster p < 0.001; 3rd color: 3.11–3.49 s, corrected cluster p = 0.002) and MAT conditions (1st color: 0.04–0.45 s; 2nd color: 1.57–1.91 s, corrected cluster p < 0.001; 3rd color: 3.11–3.55 s; corrected cluster p < 0.001) (Figure 3B). It is noteworthy that location and color features were generated with the constraint that they could not occupy the same position within their respective rings. This ensured the independent decoding of location and color features from the same neural signals. Moreover, the color feature exhibited weaker decoding strength than location, consistent with the behavioral results (Figure 2A).

Neural representation of memory contents during encoding period

(A) Grand average (mean ± SEM) neural decoding (slope of channel response) of location information for the 1st (purple), 2nd (turquoise) and 3rd (blue) disk as a function of time during the encoding period, for AT (left panel) and MAT conditions (right panel). Horizontal lines with corresponding colors denote significant time ranges (cluster-based permutation test, cluster-defining threshold p < 0.001, corrected significance level p < 0.001). (B) Same as A, but for color feature decoding.

Spontaneous replay of color sequence during location recall

After confirming location and color representations during the encoding period, we next examined the neuronal correlates of sequence memory during retrieval. We were particularly interested in the “recall location” period, during which subjects needed to remain still without making motor responses but at the same time prepare for subsequent location recall (see Figure 1A, and upper panel of Figure 4). During this period, subjects needed to maintain two sequences: the location sequence, which is immediately task-relevant, and the color sequence, which is not task-relevant at that moment but will be recalled later. Behavioral analysis indicates a correlation between recalled location and color trajectories for the AT condition (Figure 2), which suggests an active combination of common trajectories across features. We therefore sought neural evidence for the integration of the color sequence with the location sequence for the AT condition during this period.

Spontaneous color sequence replay during “recall location”

(A) Grand average (mean ± SEM) decoding performance for 1st (purple), 2nd (turquoise) and 3rd (blue) locations as a function of time during “recall location” period, for AT (left panel) and MAT conditions (right panel). (B) Grand average (mean ± SEM) decoding performance for 1st (purple), 2nd (turquoise) and 3rd (blue) colors as a function of time during “recall location” period, for AT (left panel) and MAT conditions (right panel). (Horizontal solid line: cluster-based permutation test, cluster-defining threshold p < 0.05, corrected significance level p < 0.05; Horizontal dashed line: marginal significance, cluster-defining threshold p < 0.1, 0.05 < cluster p < 0.1) (C) Grand average decoding performance within the respective significant time range, for 1st (purple), 2nd (turquoise) and 3rd (blue) colors, under AT (dark color) and MAT (light color) conditions. (D) Cross-correlation coefficient, calculated to quantify the extent to which the neural representations of two adjacent items followed a forward (positive y) or backward (negative y) transition as a function of time lag, between 1st and 2nd colors (brown color) and between 2nd and 3rd colors (brickred color), and their average (grey color). Dashed vertical line denotes the peak of the averaged cross-correlation time courses. Dashed horizontal lines denote the nonparametric statistical significance threshold (p < 0.05, permutation test). (E) Left panel: theoretical transition pattern for 3-item forward replay, i.e., 1st-2nd-3rd, characterized by cross-correlation at a certain time lag. Right panel: empirical transitional pattern (actual cross-correlation matrix) at 130 ms time lag. A significant correlation was found between the two matrices (r = 0.690, p = 0.040), further confirming the forward replay of color sequence.

As shown in Figure 4A (left panel), the currently task-relevant location sequence in the AT condition displayed strong decoding performance for the 1st location (0.11–0.46 s, corrected cluster p = 0.003), weak but significant decoding performance for the 3rd location (0.27–0.41 s, corrected cluster p = 0.011), but not for the 2nd location. Moreover, the MAT condition (Figure 4A, right panel) showed similar location decoding profiles (1st location: 0.11–0.46 s, corrected cluster p < 0.001; 3rd location: 0.13–0.35 s, corrected cluster p = 0.002). The primacy effect might be due to the fact that the 1st location is the first to be recalled afterwards, and therefore it denotes either the most task-relevant feature or motor preparation. In fact, a similar position effect has also been observed for the color sequence during the color recall period (Supplementary Figure 2).

Most importantly, we asked whether the “recall location” period also contains color sequence information, which is not task-relevant at the moment but will be recalled later. As shown in the left panel of Figure 4B, we observed significant reactivation of the color sequence for the AT condition. Specifically, the color sequence undergoes a temporally compressed, forward replay (1st color: 0.10–0.16 s, corrected cluster p = 0.048; 2nd color: 0.21–0.27 s, corrected cluster p = 0.046; 3rd color: 0.31–0.38 s, corrected cluster p = 0.089). In contrast, the MAT condition did not display any neural reactivation of the color sequence (Figure 4B, right panel). Direct comparison between AT and MAT conditions showed a significant reactivation difference (two-way repeated-measures ANOVA, F(1,32) = 14.213, p = 0.001, ηp² = 0.308). Further analysis (Figure 4C) reveals that the AT-MAT difference is mainly due to the 1st (paired t-test, t(32) = 3.151, p = 0.003, Cohen’s d = 0.548) and 2nd items (t(32) = 1.914, p = 0.065, Cohen’s d = 0.334).

We next used a “sequenceness” approach (Kurth-Nelson et al., 2016; Liu et al., 2019) to characterize the sequential replay profile. Specifically, we calculated the cross-correlation coefficients between consecutive items, and then performed a permutation test to examine the statistical significance of the sequential replay profile by shuffling color labels across participants. Figure 4D shows a significant temporal lag around 110 ms and 140 ms for the 1st-2nd and 2nd-3rd color pairs, respectively, indicating a forward replay profile with temporal compression within approximately 130 ms (peak of the average of the two cross-correlation time courses).
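The lagged cross-correlation underlying such a replay profile can be sketched as follows. The sketch assumes one common formulation from the sequenceness literature, in which evidence for a forward transition (item A leading item B) at a given lag is the forward lagged correlation minus the backward one. The exact computation and the permutation test of the study are not reproduced here, and the function names are hypothetical. Lags are in samples and convert to milliseconds via the sampling period.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lagged_corr(a, b, lag):
    """Correlation between a[t] and b[t + lag], for lag >= 0 samples."""
    if lag == 0:
        return pearson_r(a, b)
    return pearson_r(a[:-lag], b[lag:])

def sequenceness(a, b, max_lag):
    """Forward-minus-backward lagged correlation for lags 1..max_lag."""
    return [lagged_corr(a, b, lag) - lagged_corr(b, a, lag)
            for lag in range(1, max_lag + 1)]

# Synthetic decoding time courses: item B peaks 3 samples after item A.
item_a = [math.exp(-((t - 20) ** 2) / 2) for t in range(60)]
item_b = [math.exp(-((t - 23) ** 2) / 2) for t in range(60)]
profile = sequenceness(item_a, item_b, 8)
best_lag = 1 + max(range(len(profile)), key=lambda k: profile[k])
print(best_lag)
```

Positive values indicate that the first item tends to precede the second at that lag, and the lag of the peak estimates the interval between successive reactivations.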

Moreover, motivated by previous studies (Liu, Dolan, et al., 2021; Liu, Mattar, et al., 2021), we first constructed the theoretical transitional pattern for the 3-item sequence by assuming a Δt temporal lag between consecutive items (i.e., cross-correlation matrix). As shown in Figure 4E (left panel), the 1st item at time T could predict the reactivation of 2nd item at T+Δt, and the 2nd item at time T could predict the appearance of 3rd item at T+Δt. We then calculated the actual cross-correlation matrix (empirical transitional pattern) at 130 ms time lag which denotes the time lag of consecutive items in our findings (see Figure 4D, grey line), resulting in a 3 × 3 matrix (Figure 4E, right panel). A significant correlation was found between the empirical cross-correlation matrix and the theoretical transitional pattern (r = 0.690, p = 0.040), further confirming the forward replay of color sequence.
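The comparison between the theoretical and empirical transition patterns can be sketched as follows. The sketch assumes that the theoretical matrix has ones at the 1st-to-2nd and 2nd-to-3rd entries and zeros elsewhere, that each empirical entry is the correlation between item i at time T and item j at T+Δt, and that all nine entries enter a Pearson correlation. The significance test is not reproduced, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def theoretical_forward_matrix(n_items=3):
    """Entry [i][j] is 1 when item j is assumed to follow item i."""
    return [[1.0 if j == i + 1 else 0.0 for j in range(n_items)]
            for i in range(n_items)]

def empirical_matrix(series, lag):
    """Entry [i][j]: correlation of item i at t with item j at t + lag."""
    n = len(series)
    return [[pearson_r(series[i][:-lag], series[j][lag:])
             for j in range(n)] for i in range(n)]

def matrix_similarity(a, b):
    """Pearson correlation between the flattened matrices."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return pearson_r(flat_a, flat_b)

# Synthetic time courses: three items reactivated 3 samples apart.
series = [[math.exp(-((t - peak) ** 2) / 2) for t in range(60)]
          for peak in (20, 23, 26)]
empirical = empirical_matrix(series, lag=3)
similarity = matrix_similarity(theoretical_forward_matrix(), empirical)
print(similarity)
```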

Taken together, when subjects prepare to reproduce location sequence during “recall location”, the currently task-irrelevant color sequence that shares a common trajectory with location sequence demonstrates a spontaneous sequential replay profile. Together with the color-location trajectory correlation in behavior, the findings suggest that replay-based neural mechanisms in WM mediate sequence combinations based on common structures.

Neural representation of common trajectory and its behavioral relevance

After supporting multi-sequence association based on common structure, we further directly assessed the neural representations of the common trajectory structure. Specifically, a linear support vector machine (SVM) was employed to decode the 1st-to-2nd and 2nd-to-3rd trajectory distance, with the 1st-to-3rd trajectory as a control. Since there are only 8 possible circular distances between every two locations on a ring, the chance level of decoding performance is 0.125. As shown in Figure 5A, the neural representation of the 1st-to-2nd trajectory appeared right after the presentation of the 2nd item for the AT condition (1.55–2.11 s, corrected cluster p < 0.001), but not after the onset of the 1st item. This is to be expected since the relationship between the 1st and 2nd items along the trajectory can only be established when the 2nd item occurs. Similarly, the 2nd-to-3rd trajectory could be decoded only after the appearance of the 3rd item (3.10–3.48 s, corrected cluster p < 0.001) (Figure 5B). In contrast, the control condition, i.e., the 1st-3rd trajectory, showed nonsignificant neural representation (Figure 5C), which was consistent with the nonsignificant color-location trajectory correlation in behavior. Interestingly, as shown in Figure 5A, in addition to emerging right after the 2nd item, the 1st-to-2nd trajectory appeared again right after the 3rd item (3.19–3.48 s, corrected cluster p = 0.006). Therefore, when the color and location sequences share the same trajectory (i.e., AT condition), brain activity tends to co-represent the previously formed 1st-to-2nd trajectory (reactivation) and the newly formed 2nd-to-3rd trajectory, which may help to establish the full trajectory.
Notably, the AT condition entails the same trajectory for the color and location sequences, while the MAT condition involves different trajectories for location and color; in the MAT condition, no significant reactivation of the 1st-to-2nd location trajectory representation was observed (see Supplementary Figure 1DEF).

(A) Grand average (mean ± SEM) neural decoding of 1st-to-2nd trajectory as a function of time during the encoding period, for AT condition. (B) Same as A, but for 2nd-to-3rd trajectory. (C) Same as A, but for 1st-to-3rd trajectory. (D) Participants were divided into two groups (higher-correlation group and lower-correlation group), based on their 1st-to-2nd color-location trajectory memory behavioral correlation. Grand average (mean ± SEM) neural decoding of 1st-to-2nd trajectory as a function of time, for higher-correlation group (N = 16; left panel) and lower-correlation group (N = 16; right panel). (Horizontal solid line: cluster-based permutation test, cluster-defining threshold p < 0.05, corrected significance level p < 0.05).

We then asked whether the common trajectory reactivation was associated with behavioral performance. Specifically, we divided all participants into two groups based on their 1st-to-2nd color-location trajectory memory correlations in behavior (Figure 2D) and then calculated the corresponding neural representation of the 1st-to-2nd trajectory for each group. As shown in Figure 5D, both the higher-correlation and lower-correlation groups displayed significant neural decoding of the 1st-to-2nd trajectory right after the 2nd item (higher group: 1.56–1.99 s, corrected cluster p < 0.001; lower group: 1.64–1.97 s, corrected cluster p < 0.001). Interestingly, only the higher-correlation group exhibited a significant reactivation of the 1st-to-2nd trajectory after the onset of the 3rd item (3.25–3.47 s, corrected cluster p = 0.018; higher group vs. lower group: bootstrap test, p = 0.068). Moreover, we examined the relationship between the common trajectory representation and the later forward replay pattern. Based on the average trajectory neural representation (1st-2nd and 2nd-3rd trajectories) during presentation of the 3rd item, when common trajectory reactivation was observed, we divided all participants into two groups and observed that the higher trajectory representation group was accompanied by clearer forward replay (see Supplementary Figure 3). These findings provide direct evidence for the active involvement of common structure in organizing WM information across domains.

Taken together, when color and location sequences share the same trajectory (AT condition), the previously formed 1st-to-2nd trajectory is reactivated along with the newly formed 2nd-to-3rd trajectory. The co-occurrence of individual segments may contribute to the consolidation of a common sequence structure shared by color and location sequences. Importantly, the common structure representation could potentially predict neural replay and color-location correlation in memory performance, which further corroborates its importance in efficiently integrating sequences across domains in WM.

Discussion

Identifying the underlying structure to facilitate efficient information storage in WM is crucial to human intelligence. Here, we asked whether common structures shared across different feature domains would be spontaneously employed to facilitate memory of multiple sequences. Since both location and color features could be characterized by positions along a continuous ring, we systematically manipulated the trajectory consistency between the location and color sequences. Our results indicate that color-location trajectory alignment is associated with better memory performance than misalignment, and the memory benefit is attributed to structure-induced constraints on individual items that decrease representational uncertainty. We also provide novel neural evidence that spontaneous neural replay is essential for consolidating multiple sequences with a common underlying structure. That is to say, the employment of common structure can effectively associate two independent sequences and induce spontaneous forward replay to consolidate corresponding information.

Events in daily experiences are not isolated but are always linked to each other. Therefore, instead of treating individual events as independent information, a more efficient way is to seek the link between seemingly unrelated events, i.e., “connecting the dots” in the WM system. Here the trajectories for both location and color sequences are defined in a ring coordinate system, providing an abstract-level cognitive map for memory formation. Our results demonstrate that subjects spontaneously realign sequence trajectories across features to facilitate memory of two sequences. In other words, instead of memorizing two 3-item sequences, subjects could just maintain two starting points and a common trajectory, an apparently more efficient way. Note that the trajectory alignment manipulation differs from the Gestalt principles of perceptual organization such as proximity and similarity principles (Goldstone & Medin, 1994), and it instead reflects a higher-order relationship between maps. Moreover, the alignment manipulation could not be accounted for by associative memory (Aho et al., 2022; Roads & Love, 2020), since the rotational orientation for alignments between the two maps differed on a trial-by-trial basis. As a consequence, participants could not predict the color sequence solely based on the location sequence in each trial. Finally, our study is also different from recent works on structure learning and generalization (Dekker et al., 2022; Garvert et al., 2017; Liu, Mattar, et al., 2021; Ren et al., 2022; Schapiro et al., 2013), as our task does not involve pre-exposure training or task-related rewards.

The fact that, without any task requirement, participants still spontaneously extracted the underlying common structure and leveraged it to organize multiple item storage reflects the intelligence of our brain in achieving efficient information coding (Attneave, 1954). Indeed, the common structure we manipulated here is inspired by theories of cognitive maps (Behrens et al., 2018; Bellmund et al., 2018; O’Keefe & Nadel, 1978), which have argued that reasoning in abstract domains follows similar computational principles as in spatial domains. This view has been supported by accumulated neuroscientific evidence suggesting common neural substrates for knowledge representation across domains (Constantinescu et al., 2016; Garvert et al., 2017; Park et al., 2021; Schuck et al., 2016; Solomon et al., 2019; Theves et al., 2019). In line with this evidence, recent behavioral studies further demonstrate the integration of information representation across domains based on common computing principles. For example, learning is accelerated when two different feature maps are aligned (Aho et al., 2022), and distance-dependent generalization is observed across two different domains in order to search for correlated rewards (Wu et al., 2020). Here, we extend the functional role of cognitive maps to efficiently organizing information across domains in human working memory and reveal replay-based neural mechanisms that facilitate multiple sequence storage based on common structures.

Neural replay refers to sequential reactivation in the same or reversed order as previous experience. It was first observed in the rodent hippocampus, mainly during spatial navigation (Wilson & McNaughton, 1994), but has recently been found in many higher-level non-spatial tasks in human brains (Kurth-Nelson et al., 2016; Liu et al., 2019; Liu, Mattar, et al., 2021; Schapiro et al., 2018; Schuck & Niv, 2019; H. Zhang et al., 2018). In fact, neural replay has been posited to represent abstract structure (Huang et al., 2021; Liu et al., 2019), structure-based inference (Liu, Mattar, et al., 2021), and generalization (Barry & Love, 2022). Here, when subjects prepare to recall location sequences (“location recall”), neural replay occurs for color sequences but not for location sequences, supporting the spontaneous nature of neural replay, since color features are not task-relevant at that point. The results also exclude other interpretations, such as motor preparation, eye movements, attentional sampling, and sequential rehearsal, since if any of these were the case, we would expect a similar neural replay profile for location sequences that are about to be serially recalled. Furthermore, color neural replay only appears for the color-location aligned sequences (AT condition) but not for the misaligned sequences (MAT condition), implying that neural replay serves to consolidate sequences that share a common structure. Therefore, our findings demonstrate a new role of neural replay in structure representation, namely, mediating structure alignment between sequences in the WM system.

It is posited that structure and content are represented in a factorized manner (Behrens et al., 2018; Bengio et al., 2013), and that a sequence structure representation independent of the attached contents guides the replay of new experiences (Liu et al., 2019). Factorized representation is thought to support fast generalization of a previously learned structure to new contents (Sheahan et al., 2021; Zhou et al., 2020). Indeed, the ability to spontaneously perceive relational structures is posited to signify the major distinction between humans and nonhuman primates (Dehaene et al., 2015; H. Zhang et al., 2022). Meanwhile, previous modelling work also suggests that higher-order structures incorporated in WM serve as constraints on individual-item representations to reduce representational uncertainty (Brady & Tenenbaum, 2013; Ding et al., 2017). In this work, we provide evidence that not only is structure dissociated from content representation, i.e., factorized coding, but that abstract structures can also be aligned in a spontaneous manner, i.e., linking structures, which together contribute to efficient representation of memory information.

Materials and methods

Participants

Thirty-six participants (18 males, age ranging from 17 to 25 years) were recruited to complete our multi-sequence working memory task. Three participants were excluded because they could not finish the whole experiment. No statistical methods were used to predetermine sample sizes, but our sample sizes are similar to those of previous studies (Li et al., 2021; Wolff et al., 2017). All participants had normal or corrected-to-normal vision with no history of neurological disorders. They were naïve to the purpose of the experiments, and provided written informed consent prior to the start of the experiment. This study was approved by the Research Ethics Committee at Peking University, and carried out in accordance with the Declaration of Helsinki.

Stimuli and tasks

Participants sat in a dark room, 60 cm in front of a Display++ monitor with a 100 Hz refresh rate and a resolution of 1920 × 1080, with their head stabilized on a chin rest. At the beginning of each trial, three disks (1.5° × 1.5° visual angle) were sequentially presented at different locations on the screen, with different colors. The spatial location of each disk was independently drawn from a fixed set of 9 locations, which were evenly distributed on an imaginary circle with a radius of 7° visual angle from central fixation and spaced 40° from the nearest locations, with a small random jitter (± 1° – ± 3°) added to each. The color of each disk was also independently selected from a fixed set of 9 colors, which were evenly distributed along a circle in Commission Internationale de l’Eclairage (CIE) L*a*b* space, equidistant from the gray point at L* = 50, a* = 0, and b* = 0 (Brouwer & Heeger, 2009), and spaced by 40°, with a small random jitter (± 1° – ± 3°). Each disk was presented for 1 s, with a 0.5 s interval between two adjacent disks. After a 2 s delay during which only the fixation point remained on screen, a grey ring with the same radius (7° visual angle) from central fixation appeared for 0.5 s to instruct participants to recall the three spatial locations without any movement. Then, a cursor appeared at fixation, and participants reported the remembered locations sequentially in their presented order by using a mouse to click on the grey location ring. After the three spatial location responses were given, a color ring (7° visual angle in radius) was presented for 0.5 s to instruct participants to recall the three colors without any movement. Similarly, a cursor then appeared, and participants reported the remembered colors sequentially in their presented order by clicking on the color ring.

Note that even though color and location were different features, their values were both chosen from nine positions/values on their respective ring (0°–320° in 40° increments), with the constraint that the color value and location value for the same item could not be the same. In order to investigate whether a common structure would organize the storage of multiple pieces of information in different domains, we modulated trajectory consistency. Specifically, in the aligned trajectory condition (AT), both the 1st-to-2nd and 2nd-to-3rd disk trajectory distances in the location domain were the same as those in the color domain. In other words, after rotation by a certain angle, the whole trajectory (from the 1st to the 3rd point) matched between the location and color maps. At the same time, we varied the rotation angle that aligned the two maps on a trial-by-trial basis, such that the color sequence could not be predicted solely from the location sequence. This manipulation was critical for decoding the color sequence and the location sequence independently. In the misaligned trajectory condition (MAT), the whole trajectories in the two maps were different, which meant that no rotation of one map could exactly match the other map. Trials from the AT and MAT conditions were interleaved, in order to investigate the spontaneous information organization process in a more natural way. In each trial, three locations were chosen independently from 9 values (0°–320° in 40° increments, each occurring 36 times in random order), with the constraint that they should differ by at least 40°. The same rule was applied to the three colors. Moreover, the color and location values of the same object were also constrained to be different. Participants completed 648 trials in total, divided into two sessions on two separate days at most one week apart. Each session took approximately 3 hours (including breaks).
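The alignment constraint can be illustrated with a minimal sketch. It assumes that an AT trial is built by rotating the location sequence by a nonzero multiple of 40° to obtain the color sequence; the jitter is omitted, and the function names (wrap, trajectory, make_aligned_trial) are hypothetical and not taken from the experiment code.

```python
import random

RING = list(range(0, 360, 40))  # nine ring values: 0, 40, ..., 320


def wrap(deg):
    """Wrap an angular difference into the interval (-180, 180]."""
    d = deg % 360
    return d - 360 if d > 180 else d


def trajectory(seq):
    """Signed circular steps between consecutive items of a sequence."""
    return [wrap(b - a) for a, b in zip(seq, seq[1:])]


def make_aligned_trial(rng):
    """Draw three distinct locations, then rotate them to obtain the colors.

    Assumption: the rotation is a nonzero multiple of 40 deg, so the color
    value never equals the location value of the same item.
    """
    locations = rng.sample(RING, 3)
    rotation = rng.choice(RING[1:])  # excludes 0 deg
    colors = [(loc + rotation) % 360 for loc in locations]
    return locations, colors


rng = random.Random(0)
locations, colors = make_aligned_trial(rng)
print(locations, colors, trajectory(locations), trajectory(colors))
```

Because the same rotation is applied to all three items, the 1st-to-2nd and 2nd-to-3rd steps are identical in the two maps, while the rotation itself (and hence the mapping from location to color) changes from trial to trial.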

EEG acquisition and preprocessing

The EEG data was recorded using a 64-channel EasyCap and two BrainAmp amplifiers (BrainProducts). Horizontal electrooculography (EOG) was recorded by an additional electrode placed near the participants’ right eye. The impedances of all electrodes were kept below 10 kΩ. The EEG data was preprocessed offline using FieldTrip software (Oostenveld et al., 2011). Specifically, the data was first referenced to the average value of all channels, band-pass filtered between 2 and 50 Hz, and down-sampled to 100 Hz. The data was then baseline-corrected by subtracting the mean of the time range from 300 ms to 100 ms before the presentation of the 1st disk in each trial. Then, independent component analysis (ICA) was performed independently for each participant to remove eye-movement and artifact components, and the remaining components were back-projected onto the EEG electrode space. To further identify artifacts, we calculated the variance (collapsed over channels and time) for each trial. Trials with excessive variance were removed. Note that the following decoding analyses were based on all electrodes, except that location decoding in the encoding period was based on the posterior electrodes (P7, P5, P3, P1, Pz, P4, P6, P8, PO7, PO3, POz, PO4, PO8, O1, Oz and O2), considering that eye movements were not strictly controlled in the present study.

Data analysis

Behavioral performance analysis

For each spatial location and color, the response error was first quantified by the circular difference between the reported location (color) and the true target location (color) in each trial. The memory precision was then estimated as the reciprocal of the circular standard deviation of the response error. To explore the similarity of the perceived trajectory in the spatial location and color domains, we calculated the circular correlation of the perceived trajectory (trajectory memory error) between the two domains for each participant. Trajectory response error was quantified by the circular difference between the reported trajectory and the true trajectory, e.g., the 1st-to-2nd trajectory error in location was calculated as the difference between the 1st location error and the 2nd location error. At the group level, we quantified the circular correlation of trajectory errors by first sorting each participant's trials into four bins based on their location trajectory error, then computing the color trajectory error for each bin, and finally pooling the data across participants.
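The error measures described above can be sketched as follows. This is an illustration only: it assumes the common definition of the circular standard deviation, sqrt(-2 ln R) with R the mean resultant length (the text does not state which definition was used), and the function names are hypothetical.

```python
import numpy as np


def circ_diff(a_deg, b_deg):
    """Signed circular difference a - b in degrees, wrapped to [-180, 180)."""
    return (np.asarray(a_deg, dtype=float) - np.asarray(b_deg, dtype=float)
            + 180.0) % 360.0 - 180.0


def circ_std(errors_deg):
    """Circular standard deviation in radians (assumed: sqrt(-2 ln R))."""
    ang = np.deg2rad(np.asarray(errors_deg, dtype=float))
    R = np.abs(np.mean(np.exp(1j * ang)))  # mean resultant length
    return np.sqrt(-2.0 * np.log(R))


def memory_precision(errors_deg):
    """Reciprocal of the circular standard deviation of the response errors."""
    return 1.0 / circ_std(errors_deg)


def trajectory_error(reported_deg, target_deg):
    """Trajectory error per step for arrays of shape (n_trials, 3).

    The error of the k-th step equals the (k+1)-th item error minus the
    k-th item error, i.e. reported step minus true step.
    """
    err = circ_diff(reported_deg, target_deg)
    return circ_diff(err[:, 1:], err[:, :-1])
```

For a trial with targets (0°, 40°, 80°) and reports (350°, 45°, 80°), the item errors are (-10°, 5°, 0°), so the 1st-to-2nd and 2nd-to-3rd trajectory errors are 15° and -5°.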

Time-resolved location and color decoding

Similar to previous studies (Brouwer & Heeger, 2009, 2011; Huang et al., 2021), in order to assess the time-resolved location and color information in the EEG signals, we implemented the inverted encoding model (IEM) to reconstruct the location and color information from the neural activity at each time point. The IEM assumes that the response of each sensor can be approximated as a linear sum of underlying neural populations encoding different values of the feature-of-interest (i.e., tuning channels). Here, the numbers of location and color tuning channels were both set to 9. Following previous work (Ester et al., 2015; Yu et al., 2020), the idealized feature tuning curves of the nine channels were defined as nine half-wave rectified sinusoids centered at different location (color) values (0°, 40°, 80°, and so on) and raised to the 8th power.
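A minimal sketch of such a basis set is given below. The exact functional form is an assumption: each channel is taken to be cos(θ − center), clipped at zero (half-wave rectification) and raised to the 8th power, defined over the full 360° ring. The function name basis_set is hypothetical.

```python
import numpy as np


def basis_set(feature_deg, centers_deg=np.arange(0, 360, 40), power=8):
    """Idealized channel responses, shape (k channels, n trials).

    Assumed tuning function: half-wave rectified cosine of the angular
    distance to the channel center, raised to `power`.
    """
    feature = np.deg2rad(np.atleast_1d(feature_deg).astype(float))[None, :]
    centers = np.deg2rad(np.asarray(centers_deg, dtype=float))[:, None]
    rectified = np.maximum(np.cos(feature - centers), 0.0)
    return rectified ** power


# Predicted channel responses for three example trials (0, 40 and 80 deg).
C = basis_set([0, 40, 80])
print(C.shape)
```

Each column of the returned matrix peaks at the channel whose center matches the trial's feature value and falls to zero for channels more than 90° away.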

We began by modeling the response of each EEG sensor as a linear sum of nine information channels, characterized by $B_1 = W C_1$ in the training data set, where $B_1$ (m sensors × n trials) represents the observed response at each sensor, $C_1$ (k channels × n trials) represents the predicted channel responses, and $W$ (m sensors × k channels) represents the weight matrix that characterizes the linear mapping from ‘channel space’ to ‘sensor space’. Therefore, given $B_1$ and $C_1$, the weight matrix was estimated using least-squares regression: $\hat{W} = B_1 C_1^{T} (C_1 C_1^{T})^{-1}$. Finally, the channel responses ($C_2$) for the test data set ($B_2$) could be estimated using $\hat{W}$, by $\hat{C}_2 = (\hat{W}^{T} \hat{W})^{-1} \hat{W}^{T} B_2$.

Regarding the division into training and test sets, a leave-one-block-out cross-validation was implemented, such that data from all but one block served as $B_1$ to estimate $\hat{W}$, while data from the remaining block served as $B_2$ to estimate $\hat{C}_2$. This procedure ensures the independence of the training and test sets. The entire analysis was repeated until every block had been held out as a test set. The observed channel responses were then circularly shifted to a common center (0°) in reference to the location/color-of-interest in each trial, and averaged across trials for further analysis.
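The training, inversion and cross-validation steps can be sketched as below, for a single time point. The sketch assumes the standard ordinary least-squares estimates of the IEM (weights from the training set, channel responses from the pseudo-inverse of the weights); all function names and the block-label vector are hypothetical.

```python
import numpy as np


def iem_fit(B1, C1):
    """Least-squares weights: W_hat = B1 C1^T (C1 C1^T)^-1."""
    return B1 @ C1.T @ np.linalg.inv(C1 @ C1.T)


def iem_invert(W_hat, B2):
    """Estimated channel responses: C2_hat = (W^T W)^-1 W^T B2."""
    return np.linalg.inv(W_hat.T @ W_hat) @ W_hat.T @ B2


def leave_one_block_out(B, C, blocks):
    """Hold out each block in turn; B is sensors x trials, C is channels x trials."""
    blocks = np.asarray(blocks)
    C_hat = np.zeros(C.shape, dtype=float)
    for b in np.unique(blocks):
        test = blocks == b
        W_hat = iem_fit(B[:, ~test], C[:, ~test])
        C_hat[:, test] = iem_invert(W_hat, B[:, test])
    return C_hat


def recenter(C_hat, target_idx):
    """Circularly shift each trial so that its target channel sits at the middle index."""
    k = C_hat.shape[0]
    out = np.empty_like(C_hat)
    for t, idx in enumerate(target_idx):
        out[:, t] = np.roll(C_hat[:, t], k // 2 - int(idx))
    return out
```

In a time-resolved analysis these functions would be applied separately at each time point, and the recentered profiles averaged across trials before computing a summary measure.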

Consistent with previous studies (J. J. Foster et al., 2017; Huang et al., 2021), decoding performance was characterized by the slope of the estimated channel responses at each time point, obtained by flipping the reconstructed curves across the center, averaging both sides, and performing linear regression. We further smoothed the slope time courses with a Gaussian kernel (s.d. = 40 ms) (Huang et al., 2021; Wolff et al., 2017).
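The slope measure can be sketched as follows, assuming a recentered channel profile with an odd number of channels (target channel in the middle) and data sampled at 100 Hz, so that a 40 ms s.d. corresponds to 4 samples. The kernel truncation at ±3 s.d. and the function names are illustrative choices, not details given in the text.

```python
import numpy as np


def channel_slope(profile):
    """Slope of a centered channel profile after folding it around the center."""
    profile = np.asarray(profile, dtype=float)
    mid = len(profile) // 2
    left = profile[:mid + 1]        # far left ... center
    right = profile[mid:][::-1]     # far right ... center (flipped)
    folded = (left + right) / 2.0
    x = np.arange(len(folded))
    return np.polyfit(x, folded, 1)[0]  # linear-regression slope


def gaussian_smooth(x, sd_samples=4.0):
    """Smooth a time course with a normalized Gaussian kernel.

    Assumes len(x) exceeds the kernel length; edges are attenuated by
    zero padding.
    """
    half = int(np.ceil(3 * sd_samples))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sd_samples) ** 2)
    kernel /= kernel.sum()
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="same")
```

A profile that peaks at the center yields a positive slope, whereas a flat profile (no feature information) yields a slope of zero, which is why the slope is tested against 0.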

Forward sequence measure

Following previous studies (Huang et al., 2018; Kurth-Nelson et al., 2016; Liu et al., 2019), cross-correlation was applied to examine whether the color reactivation pattern tended to follow a certain order, e.g., a forward (1st-2nd-3rd) or reverse order (3rd-2nd-1st). For a forward sequence, the decoding performance of the 1st item at time T should be correlated with the decoding performance of the 2nd item at time T + Δt, and with the decoding performance of the 3rd item at time T + 2*Δt, where Δt defines the lag between the neural representations of two consecutive items. We first calculated the cross-correlation between the 1st and 2nd items and between the 2nd and 3rd items at each time lag. Then we subtracted the reverse direction (2nd-1st; 3rd-2nd) from the forward direction (1st-2nd; 2nd-3rd), respectively, at each time lag, in order to exclude the autocorrelation effect (Kurth-Nelson et al., 2016). The resulting cross-correlation time courses were then averaged to determine the time lag between two consecutive items, defined here as the time point of the peak of the averaged cross-correlation time course. As mentioned above, for a forward replay pattern, at time lag Δt, we would expect to observe a significant transition from the 1st to the 2nd item and from the 2nd to the 3rd item, but nonsignificant transitions/correlations for the remaining pairs, as characterized by the theoretical forward transition pattern in Figure 4E (left panel). Meanwhile, the actual cross-correlation matrix can be estimated by computing the correlation coefficients for every pair (1st-1st, 1st-2nd, 1st-3rd; 2nd-1st, 2nd-2nd, 2nd-3rd; 3rd-1st, 3rd-2nd, 3rd-3rd) at the defined time lag (Figure 4D). Finally, we quantified the similarity between the observed transition pattern and the theoretical forward transition pattern.
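The forward-minus-reverse measure and the transition-matrix comparison can be sketched as below. This is a simplified illustration that assumes Pearson correlation between lagged decoding time courses and a Pearson correlation between the empirical and theoretical 3 × 3 matrices; the function names and the binary template are hypothetical stand-ins for the procedure described above.

```python
import numpy as np


def lagged_corr(x, y, lag):
    """Pearson correlation between x(t) and y(t + lag), with lag >= 0 in samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag == 0:
        return np.corrcoef(x, y)[0, 1]
    return np.corrcoef(x[:-lag], y[lag:])[0, 1]


def sequenceness(d1, d2, d3, max_lag):
    """Forward minus reverse cross-correlation, averaged over the two item pairs."""
    lags = np.arange(1, max_lag + 1)
    out = np.zeros(len(lags))
    for i, lag in enumerate(lags):
        fwd = lagged_corr(d1, d2, lag) + lagged_corr(d2, d3, lag)
        rev = lagged_corr(d2, d1, lag) + lagged_corr(d3, d2, lag)
        out[i] = (fwd - rev) / 2.0
    return lags, out


def transition_matrix(items, lag):
    """Empirical matrix: entry (i, j) is the correlation of item i with item j at `lag`."""
    n = len(items)
    return np.array([[lagged_corr(items[i], items[j], lag) for j in range(n)]
                     for i in range(n)])


# Theoretical forward pattern: 1st -> 2nd and 2nd -> 3rd transitions only.
FORWARD_TEMPLATE = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float)


def pattern_similarity(matrix, template=FORWARD_TEMPLATE):
    """Correlation between the empirical and theoretical transition patterns."""
    return np.corrcoef(matrix.ravel(), template.ravel())[0, 1]
```

In use, the lag at which the averaged forward-minus-reverse curve peaks would be passed to transition_matrix, and the resulting matrix compared with the template.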

Time-resolved trajectory decoding

We implemented a linear support vector machine (SVM) to decode trajectory distance. Because there were eight possible distances between every two items, i.e., ±160°, ±120°, ±80°, ±40° (chance level is 1/8), an eight-way decoder (one-vs-rest multiclass classifier) was used to decode the trajectory. A 5-fold cross-validation scheme was used, and the classification accuracy was averaged across the folds. We repeated this process 10 times, each time with a new random partition of the data into 5 folds, and then computed the mean accuracy. Note that the color and location sequences shared the same trajectory distances in the AT condition, while the MAT condition involved different trajectories for location and color. The trajectory decoding in the MAT condition was based on the trajectory distance in the location domain, considering that location information showed a much stronger representation than color information (Figure 2).
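The class labels and the repeated cross-validation partitions can be sketched as follows. The classifier itself is deliberately left out; any linear multiclass SVM could be trained on the returned index sets. The function names and the random seed are hypothetical.

```python
import numpy as np


def signed_distance(start_deg, end_deg):
    """Signed circular distance end - start, wrapped to [-180, 180)."""
    return (np.asarray(end_deg) - np.asarray(start_deg) + 180) % 360 - 180


# Enumerate all trajectory classes for two distinct items on the 40-deg ring.
RING = np.arange(0, 360, 40)
labels = sorted({int(signed_distance(a, b)) for a in RING for b in RING if a != b})
chance = 1 / len(labels)
print(labels, chance)


def repeated_kfold(n_trials, n_folds=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for repeated random k-fold partitions."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        order = rng.permutation(n_trials)
        for fold in np.array_split(order, n_folds):
            yield np.setdiff1d(order, fold), fold
```

Averaging the classifier accuracy over the 5 folds of each repeat, and then over the 10 repeats, gives the mean accuracy that is compared against the 1/8 chance level.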

Statistical analysis

To determine the statistical significance of the decoding performance time courses, we performed a cluster-based permutation test (FieldTrip, 1000 permutations) (Maris & Oostenveld, 2007). We first identified clusters of contiguous significant time points (p < 0.05, or p < 0.001 during the encoding period, two-tailed) from the calculated statistics, i.e., one-sample t-tests against 0 (the slope value of the reconstructed channel response) for location/color decoding, or against 0.125 (the classifier chance level) for trajectory distance decoding. Cluster-level statistics were calculated by computing the size of the clusters. Next, a Monte-Carlo randomization procedure was conducted to estimate the significance probability of each cluster. Specifically, a surrogate data set of 0 (for location/color decoding) or 0.125 (for trajectory decoding) with the same sample size was generated and shuffled with the original data 1000 times, and the cluster-level statistics were then calculated from the surrogate data to estimate the significance probability of each original cluster.
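The logic of this test can be illustrated with a simplified NumPy sketch; it is not the FieldTrip implementation. Shuffling each participant's data with a constant reference value is implemented here as a random sign flip around that value, clusters are formed separately for positive and negative t-values, and the cluster-defining threshold t_crit = 2.037 is a hard-coded assumption (two-tailed p < 0.05 with 32 degrees of freedom, i.e., 33 participants).

```python
import numpy as np


def tvals(data, popmean=0.0):
    """One-sample t statistic per time point; data is subjects x time."""
    d = data - popmean
    n = d.shape[0]
    return d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))


def run_lengths(mask):
    """Lengths of runs of consecutive True values."""
    sizes, run = [], 0
    for m in mask:
        if m:
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes


def cluster_sizes(t, t_crit):
    """Sizes of positive and negative supra-threshold clusters."""
    return run_lengths(t > t_crit) + run_lengths(t < -t_crit)


def cluster_permutation_test(data, popmean=0.0, t_crit=2.037, n_perm=1000, seed=0):
    """Return (cluster size, Monte-Carlo p-value) for each observed cluster."""
    rng = np.random.default_rng(seed)
    centered = data - popmean
    observed = cluster_sizes(tvals(centered), t_crit)
    null_max = np.zeros(n_perm)
    for p in range(n_perm):
        # Randomly exchange each subject's data with the reference value.
        signs = rng.choice([-1.0, 1.0], size=(data.shape[0], 1))
        null_max[p] = max(cluster_sizes(tvals(centered * signs), t_crit), default=0)
    return [(size, float(np.mean(null_max >= size))) for size in observed]
```

For trajectory decoding, popmean would be set to 0.125 so that accuracies are tested against the classifier chance level rather than against zero.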

To determine the statistical significance of the cross-correlation coefficient (forward direction minus reverse direction), we performed a permutation test by shuffling color labels across participants 1000 times and following the same procedure to calculate the cross-correlation time courses of the surrogate data, from which the 0.05 threshold level was estimated.

Acknowledgements

This work was supported by the National Science and Technology Innovation STI2030-Major Project (2021ZD0204103 to H.L.), National Natural Science Foundation of China (31930052 to H.L.), and Humboldt Research Fellowship for Postdocs to Q.H. We thank Christian F. Doeller and Muzhi Wang for their helpful comments.

Supplementary

(A) Scatterplot of 1st-to-2nd trajectory memory error for location sequence (X-axis) and color sequence (Y-axis) under MAT condition. Note that the trajectory errors of all trials within each subject were divided into 4 bins according to the location trajectory error, resulting in 33 (subject number) × 4 (bins) dots in the plot. The brown line represents the best linear fit. (B) Same as A, but for 2nd-to-3rd trajectory. (C) Same as A, but for 1st-to-3rd trajectory. (D) Grand average (mean ± SEM) neural decoding of 1st-to-2nd trajectory as a function of time during the encoding period, for MAT condition. (E) Same as D, but for 2nd-to-3rd trajectory. (F) Same as D, but for 1st-to-3rd trajectory.

(A) Left panel: grand average (mean ± SEM) time courses of the decoding performance for the 1st (purple), 2nd (turquoise) and 3rd (blue) WM colors during color recalling period. Right panel: grand average (mean ± SEM) time courses of the decoding performance in MAT condition. (B) Left panel: grand average (mean ± SEM) time courses of the decoding performance for the 1st, 2nd and 3rd WM locations during color recalling period. Right panel: grand average (mean ± SEM) time courses of the decoding performance in MAT condition. (Horizontal solid line: cluster-based permutation test, cluster-defining threshold p < 0.05, corrected significance level p < 0.05; Horizontal dashed line: marginal significance, cluster-defining threshold p < 0.1, 0.05 < cluster p < 0.1)

Participants were divided into two groups based on the average of trajectory decoding performance within the respective significant time range. (A) Grand average (mean ± SEM) time courses of the decoding performance for the 1st, 2nd and 3rd colors during location recalling period for high trajectory representation group. (B) Left panel: theoretical transition pattern for 3-item forward replay. Right panel: empirical transition pattern (actual cross-correlation matrix) at the 130 ms time lag defined in Figure 4D. A significant correlation was found between the two matrices (r = 0.715, p = 0.030), further confirming the forward replay of color sequence for high trajectory representation group. (C) Same as A, but for low trajectory representation group. (D) Same as B, but for low trajectory representation group; a nonsignificant correlation between the two matrices was observed (r = 0.230, p = 0.553). (Horizontal solid line: time window with significant activation (t-test, p < 0.05, without correction across time))