How does curriculum learning influence cognitive map formation?

Leonie Glitz; Zilu Liang; Helen C Barron; Christopher Summerfield

doi:10.7554/eLife.111035.1

Introduction

Cognitive maps are mental representations that encode how states of the world relate to each other (Behrens et al., 2018). They support goal-directed behaviours by allowing us to mentally explore the connections between places (Bellmund, Deuker, Navarro Schröder, & Doeller, 2016; Tolman, 1948), objects (Constantinescu, O’Reilly, & Behrens, 2016) and people (Park, Miller, & Boorman, 2021), and thereby make inferences about potentially new combinations of stimuli or events (Bellmund, Gärdenfors, Moser, & Doeller, 2018). How cognitive maps are used to make decisions has been the subject of intense research over recent years, with a focus on the neurophysiology of the mammalian hippocampal-entorhinal system (Momennejad, 2020; Stachenfeld, Botvinick, & Gershman, 2017). However, we know little about how cognitive maps are formed by experience, as people learn about the relations among stimuli. Here, we studied the impact of the sequence of training stimuli (or curriculum) on how cognitive maps are learned.

For almost all learners, the sequence of training stimuli (or curriculum) is an important driver of how knowledge is acquired. One salient feature of a curriculum is whether related materials are grouped or separated in time. Very often, temporal grouping is found to help learning. In educational settings, materials are naturally organised into thematically related lessons, so that (for example) students are not required to learn French and Spanish at once. Temporal grouping has also been found to facilitate learning in laboratory studies. During rote memorisation, a technique by which to-be-learned stimuli are organised into small groups (‘chunking’) can benefit both short- and long-term retention (Mitchell & Martin, 1997; Xu & Padilla, 2013). When learning different categorisation tasks in succession, grouping the tasks in temporally correlated blocks (‘blocking’) facilitates learning compared to a regime in which trials are randomly intermixed (Flesch, Balaguer, Dekker, Nili, & Summerfield, 2018; Noh, Yan, Bjork, & Maddox, 2016). Similarly, in a paradigm where multiple visual features of an object predict a corresponding spatial location, generalisation is improved when participants learn separately about the predictive value of each feature (e.g. colour separately from shape), compared to when they are intermixed in time (Dekker, Otto, & Summerfield, 2022).

Grouping information in time may help learning by allowing the student to ‘divide and conquer’: to decompose a task into parts which are more manageable to learn (Krueger & Dayan, 2009; Mi & Summerfield, 2025). In biological systems, separation in time may help the learning system partition new knowledge, so that learning of one part does not interfere with another. Place cell codes in the hippocampus, for example, have been shown to overlap more for distinct contexts experienced on the same day than for contexts experienced on subsequent days. Likewise, conditioned fear responses are transferred across contexts experienced on the same day, even if the actual fear conditioning only occurred in one of the contexts (Cai et al., 2016). Both findings support the idea that separation in time can prevent interference.

However, it remains unclear how different learning curricula might support the formation of an integrated cognitive map to permit adaptive behaviour. Here, to test the effect of learning curriculum on cognitive map formation, participants learned to navigate grid or torus environments. Participants were always trained on 1-step transitions but were later tested on their ability to perform 2-step navigation which provided a readout for integration across the cognitive map. Across multiple experiments, we found that training on 1-step transitions was more successful at facilitating 2-step navigation when transitions were sampled randomly (disjoint condition), rather than being drawn from rows or columns in a temporally correlated fashion (grouped condition; Figure 1). More fine-grained examination of the behavioural data hinted that relatively impaired inference following grouped training may have occurred because grouped training encouraged participants to learn rows and columns as a series of unconnected ‘fragments’, and thus to learn a partially fragmented version of the cognitive map.

Illustration of the study design.
a Illustration of three successive example trials within a block of trials in the grouped (right, blue) and disjoint (left, orange) training curriculum. Numbers represent nodes on the map; coloured boxes represent transitions between adjacent nodes learned. Notably, the successive trials in grouped training are overlapping (pseudo-random walk), whereas the ones in disjoint training are not. b *Top.* Illustration of the mixed study design. Note that background colours for the design graphic correspond to plot marker colours in subsequent figures (Figure 2 and Figure 3). *Bottom.* Examples of the two stimulus categories; each was randomly assigned to one of the two within-subjects maps for each participant. c *Top.* Illustration of the between-subjects manipulation; filled squares correspond to blocks of training trials. *Bottom.* Illustration of the second between-subjects factor, map shape.

Results

Our experiments used maps in the form of quadrilateral lattices arranged into either a grid (5×5) or a torus (4×4; manipulated between-subjects), in which each state corresponded to a unique image of either a household item (map 1) or a fruit / vegetable (map 2), assigned randomly between participants (Figure 1b). Participants never viewed these full maps at once (e.g. they were never shown a bird’s eye view). To test the effect of learning curriculum on participants’ ability to perform multi-step inference, over three days we trained participants on an exhaustive set of one-step associations between states before then testing their ability to complete two-step inference. During training, they navigated from one state (e.g. a washing machine) to one of its neighbours (e.g. a newspaper) by selecting whether the newspaper was up, down, left or right of the washing machine and receiving feedback as to whether their choice was correct or not (Figure 2b). On the second day, they repeated the learning task for the other map. On the third day, following a short refresher on both maps, they were tested on their ability to navigate between states that were 2 steps apart, either along a single axis (straight or I trials) or crossing the horizontal and vertical axes (diagonal or L trials, Figure 2c). They were also tested on their ability to make proximity judgements between different states. Both tasks required participants to have integrated the one-step adjacency associations they had learned on the preceding two days into a mental map. We tested each of the two maps they had learned separately on day 3.
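The distinction between straight (I) and diagonal (L) two-step trials can be made concrete with a short sketch. It assumes a bounded 5×5 grid with nodes addressed as (row, column); the function name `two_step_goals` and the coordinate scheme are illustrative and are not taken from the experiment code.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (row, column); hypothetical addressing scheme


def two_step_goals(start: Node, size: int = 5) -> Dict[str, List[Node]]:
    """Goals exactly two steps from `start` on a bounded grid, split by trial type.

    'straight' goals lie two steps along one axis (I trials);
    'diagonal' goals need one horizontal and one vertical step (L trials).
    """
    r0, c0 = start
    goals: Dict[str, List[Node]] = {"straight": [], "diagonal": []}
    for r in range(size):
        for c in range(size):
            dr, dc = abs(r - r0), abs(c - c0)
            if dr + dc != 2:
                continue  # not exactly two steps away
            kind = "diagonal" if (dr == 1 and dc == 1) else "straight"
            goals[kind].append((r, c))
    return goals


# From the centre of the grid there are four goals of each type;
# from a corner, edges remove some of them.
print(two_step_goals((2, 2)))
print(two_step_goals((0, 0)))
```

On a torus the same classification would apply after wrapping coordinates, which this sketch does not do.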

Task sequence and training accuracy.
a Multi-day task schedule; text in boxes symbolises the sequence of events (from left to right), with *test* events depicted with a grey background and *training* events with a white background. Day 1, 2 and 3 refer to the day of the experiment on which the task in question occurred. b Trial sequence for a select-action training trial (for a select-state trial example, see Supplementary Results 1), including illustrations of feedback for a correct and an incorrect choice. The washing machine is the starting node; the two shapes (here: yellow square and green triangle) indicate the two positions the adjacent node (here: newspaper) could be in relative to the starting node. Shapes at the top indicate which key participants need to press to select either of the options. c Training accuracy (y-axis) averaged across initial training and refresher (left); diamond/hexagon markers indicate average accuracy for the different conditions, black vertical lines are standard error of the mean bars, grey horizontal lines are accuracies for individual participants and the red horizontal line is chance accuracy; *** p<.0001. d Trial sequence of a two-step navigation trial; as participants make a choice, the central node (current location) changes accordingly and the number of steps remaining decreases. e Grid map layout with examples of the spatial relations between two-step starts and goals; an example straight two-step pair is indicated in purple and a diagonal pair in green. f Illustration of the layout of a proximity test trial. g Types of proximity comparisons; central image examples are shown in black and options for proximity judgements in red. h Illustration of the task layout of the reconstruction task employed for the grid groups.

To compare curricula for cognitive map learning, during training we manipulated the order in which participants experienced the 1-step transitions, as shown in Figure 1. We varied both the content of the individual blocks of one-step transitions (within-subjects) and the order in which participants experienced the blocks of transitions (between-subjects). For one map, the one-step transitions learned in a block of training trials encompassed all transitions within a single row or column of the lattice (grouped condition) in the form of a semi-random walk, whereas for the other map, the trials in a single block of training were drawn randomly from across the lattice structure (disjoint condition; Figure 1a), thereby manipulating the temporal proximity with which participants learned about adjacent nodes. Across participants, we introduced a blocking manipulation to further temporally separate trials that may cause interference, more specifically the rows and columns of the lattice on the grouped curriculum map. To do this, we manipulated the order of the blocks of trials, with some participants experiencing half of the transitions twice (in grouped: e.g. all row blocks twice), followed by the second half of the transitions twice (in grouped: e.g. all column blocks twice; blocked condition) and other participants experiencing the blocks of trials in an interleaved way (semi-blocked condition; Figure 1c).

In total, we conducted 4 experiments (two groups per map shape; one blocked and one semi-blocked) involving a total of 182 participants, of whom 92 learned grid maps (49 blocked, 43 semi-blocked) and 90 learned torus maps (46 blocked, 44 semi-blocked). Each cohort started with 50 participants and final group numbers depended on the participant attrition rate across the three days of the experiment.

We modelled data from all four studies in a single statistical model, entering map shape (grid vs torus; between-subjects, Figure 1c), our main between-subjects factor (fully blocked or semi-blocked, Figure 1c) and our within-subjects factor (disjoint or grouped, Figure 1a) as predictors in linear mixed effects models. For a table of all model results, as well as all results split up by map shape and group, please see Supplementary Tables 1–3. Note that participants in the torus groups had lower accuracy on almost all cognitive map learning measures than participants in the grid groups, even when the necessarily lower training accuracy due to the lack of edges was taken into account (Supplementary Table 1).

Training accuracy was higher in the grouped condition

During one-step training, participants were significantly more accurate on the map with the grouped curriculum than the disjoint curriculum (see Figure 1, Figure 2 and Methods). Overall training accuracy for all 1-step transitions on days 1, 2 and 3 (refresher) was higher for the map learned via a grouped than a disjoint curriculum (β = -.08, z = -9.32, SE = .01, p_perm <.0001; Figure 2e; M_grouped = .91 (SD .10), M_disjoint = .83 (SD .11)). The same effect was observed when the data were split into initial training (days 1 or 2: β = -.09, z = -9.69, SE = .01, p_perm <.0001, M_grouped = .91 (SD .10), M_disjoint = .82 (SD .11)) and refresher (day 3: β = -.05, z = -5.72, SE = .01, p_perm <.0001, M_grouped = .93 (SD .09), M_disjoint = .87 (SD .12)). This accuracy advantage for the grouped curriculum is perhaps not surprising, as learning spatially proximal transitions (within rows and columns) together may be mutually reinforcing.

One possibility is that higher training accuracy on the grouped map occurred because subsequent trials were presented as a semi-random walk along a row or column, meaning that participants would frequently have to repeat the same action (e.g. go right) multiple times in a row (see Methods). To test whether the higher training accuracy could be attributed purely to response repetition, which requires less attention from the participant, we conducted a supplementary control experiment. In this experiment, we compared grouped curricula in which rows and columns were learned strictly in order (e.g., A-B; B-C; C-D) or in which transitions were shuffled within a row or column (e.g., B-C; A-B; D-C). We observed no difference between the ordered and shuffled grouped curriculum for training or test accuracy, suggesting that the grouping rather than response repetition was driving the effect (Supplementary Results 2).

There was no effect of blocking (blocked vs. semi-blocked condition) on training accuracy and no significant interaction (both p > .05, see Supplementary Table 1 for details), suggesting that one-step training accuracy did not vary as a function of the order of the blocks of training trials and that the effect of our within-subjects manipulation (disjoint vs grouped) on training accuracy was not dependent on the order of the blocks of training trials.

Multi-step navigation was more accurate after training in the disjoint condition

On day 3, following the refresher on the one-step training, participants navigated to a goal state that was 2 steps away from the start state, so that two directions had to be chosen in succession to successfully complete the journey. During the two-step navigation, participants were given feedback on their choices in the form of state-transitions, i.e. after they had made their first choice, the current node would change. They were not able to make any more than two choices, however, meaning participants needed to make two correct successive choices to successfully reach the goal node. This test phase was identical for both maps, so any differences must be attributable to the curriculum used to learn each map. Two-step navigation was significantly more accurate for the map learned using the disjoint curriculum, relative to the grouped curriculum (β = .09, z = 4.39, SE = .02, p <.0001; M_grouped = .56 (.28), M_disjoint = .64 (.29); Figure 3a). This effect was driven by a difference in accuracy on both the first (β = .04, z = 3.94, SE = .01, p <.0001, M_grouped = .74 (.19), M_disjoint = .78 (.19)) and second (conditional on first choice correct: β = .06, z = 4.43, SE = .01, p <.0001, M_grouped = .71 (.22), M_disjoint = .76 (.24)) step of two-step trials. As in training, we observed no effect of blocking on two-step accuracy (fully vs. semi-blocked, β = -.05, z = -1.10, SE = .04, p = .272, M_blocked = .57 (.29), M_semi-blocked = .62 (.28)) and no interaction between grouping and blocking (β = -.02, z = -0.80, SE = .03, p = .426).

Participants are significantly better at multi-step planning and inference after disjoint compared to grouped training.
a Average two-step navigation accuracy across both types of two-step trials (y-axis); markers represent average accuracies, vertical black lines are standard error of the mean (SEM) bars, horizontal grey lines are accuracies for individual participants and red horizontal lines represent accuracy at chance. N.B. the same applies to the graphs in b, d, f, g and h. b Participants’ accuracy on diagonal two-step trials. c Illustration of the two types of two-step trials (*top*) as well as the direction of difference for both types (higher accuracy after disjoint than grouped training, *bottom*). d Average two-step navigation accuracy for straight two-step trials. e Illustration of the significance of the difference in accuracy on the different proximity judgement types; *** p<.0001, ** p<.01, * p<.05, ~ n.s. The black frame on the grid indicates an example central node; the red frames on the grid indicate two example comparisons for proximity. These icons are repeated in f and g. f Accuracy on the proximity judgement task in the grid groups; grid icons above the average markers indicate the type of proximity trial. g Accuracy on the proximity judgement task in the torus groups; grid icons above the average markers indicate the type of proximity trial. h Map reconstruction accuracy at the end of testing in the grid groups; the y-axis shows the Spearman correlation between actual grid maps and reconstructed maps. *Red line indicates accuracy when performing at chance.*

This advantage for the map trained with the disjoint curriculum was observed for diagonal (or “L”) trials, where the goal was diagonal to the start and navigation involved integrating knowledge from temporally separate row and column training (β = .13, z = 5.07, SE = .03, p <.0001, M_grouped = .50 (.32), M_disjoint = .61 (.32), Figure 3c). Perhaps surprisingly, it was also observed for straight (or “I”) trials, where the two steps required were in the same row or column (β = .05, z = 2.20, SE = .02, p =.03, M_grouped = .62 (.27), M_disjoint = .66 (.29); Figure 3d; see also Supplementary Table 1). However, using an expanded linear mixed effects model in which type of two-step trial (diagonal vs. straight) was an additional predictor, we found a significant interaction between two-step trial type and our within-subjects manipulation on two-step navigation accuracy (β = -.08, z = 5.03, SE = .02, p =.02), with the difference between disjoint and grouped map accuracy being greater for diagonal than straight trials during 2-step navigation (see Supplementary Table 1 for full model results).

To further investigate why the disjoint training curriculum resulted in better multi-step navigation, we also analysed the nature of the errors made. Specifically, we investigated whether participants were more likely to perform two steps in the same direction (in line with staying within a row or column) or take a corner on trials where they failed to reach the intended goal. We observed a weak trend for participants’ errors to consist of a greater proportion of ‘straight line errors’ after grouped than disjoint training (β = -.11, z = -1.95, SE = .06, p = .05, M_grouped = .37 (.25), M_disjoint = .32 (.25)). Coupled with the interaction between curriculum (grouped vs. disjoint) and trial type (diagonal vs. straight) on accuracy, this may imply that after grouped training, participants struggle to integrate information across rows and columns, i.e. across groups of transitions that were trained separately in time.
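The error analysis above can be sketched as a small classifier over the two choices made on a trial. This is an illustration only: the category labels, the function names and the treatment of a reversal (for example left then right) as a separate "backtrack" category are assumptions, since the text only distinguishes two steps in the same direction from taking a corner.

```python
from typing import Sequence, Tuple

Node = Tuple[int, int]  # (row, column); hypothetical addressing scheme

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def end_node(start: Node, actions: Sequence[str], size: int, wrap: bool) -> Node:
    """Node reached after applying the actions; wraps around on a torus."""
    r, c = start
    for action in actions:
        dr, dc = MOVES[action]
        r, c = r + dr, c + dc
        if wrap:
            r, c = r % size, c % size
    return (r, c)


def classify_response(start: Node, goal: Node, actions: Sequence[str],
                      size: int = 5, wrap: bool = False) -> str:
    """Label a two-step response as correct or by error type (assumed labels)."""
    first, second = actions
    if end_node(start, actions, size, wrap) == goal:
        return "correct"
    if first == second:
        return "straight-line error"  # two steps in the same direction
    horizontal = {"left", "right"}
    if (first in horizontal) != (second in horizontal):
        return "corner error"         # one horizontal and one vertical step
    return "backtrack error"          # reversal along the same axis


# A diagonal goal missed by staying within the column:
print(classify_response((2, 2), (3, 3), ("down", "down")))
```

The proportion of straight-line errors for a map is then the count of that label divided by the count of all non-correct responses. The sketch does not check whether a move leaves the bounded grid.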

Although we did not provide overt feedback during 2-step navigation, participants were shown the state that resulted from their first transition, and so could in theory have learned during the test phase. However, we observed no significant differences in accuracy between the first and second block of 2-step navigation, and no significant interaction between block and either our grouping or blocking curriculum interventions (all p > .05).

Proximity judgements were more accurate after training in the disjoint condition

We also measured how well participants were able to mentally reconstruct the physical layout of the map after each of the training curricula. After having completed the navigation part of the test phase, we asked participants to perform an additional task that involved making proximity judgements. Participants were shown an image (corresponding to a node) centrally on the screen and then asked to choose which of two other images (nodes) was closer to the central image on the map they had learned (Figure 3e). As with the two-step navigation, participants were more accurate at proximity judgements for the map learned with the disjoint (rather than grouped) curriculum (β = .05, z = 3.79, SE = .01, p_perm <.0001; M_grouped = .68 (.15), M_disjoint = .74 (.15)).

Between proximity trials, we varied the distance of the two nodes participants chose between relative to the starting node. There was a significant advantage of disjoint over grouped training for the 2-away (diagonal) vs 3-away (diagonal) judgements (β = .04, z = 2.09, SE = .02, p = .04, M_grouped = .65 (.19), M_disjoint = .70 (.18)) and the 2-away (straight) vs 1-away judgements (β = .08, z = 3.98, SE = .02, p_perm <.0001, M_grouped = .69 (.18), M_disjoint = .77 (.17)), but not for the 2-away (diagonal) vs 1-away judgements (p >.05, M_grouped = .71 (.18), M_disjoint = .76 (.16); all Figure 3e-g). There was no effect of whether participants learned each map with disjoint or grouped training on their preference for judging straight or diagonal two-steps as closer (2 (diagonal) vs 2 (straight); β = .001, z = 0.03, SE = .02, p = .98, M_grouped = .47 (.18), M_disjoint = .47 (.19), see also Supplementary Table 1). Overall, this is consistent with the results from multi-step planning, suggesting that participants had integrated the learned paired associations into a cognitive map more successfully after disjoint than grouped training.

To test participants’ ability to reconstruct the maps based on different training curricula, we also conducted an additional test in which we asked participants to drag and drop the items into an arena in a way that matched the organisation of the map (in the grid cohorts). On average, participants were able to perform this task above chance for both maps and in all groups, as assessed by comparing the Fisher z-transformed correlation of their reconstructed map to the actual map to zero (blocked: M_grouped = .74 (.36), t(48) = 11.96, p<.0001, M_disjoint = .79 (.37), t(48) = 13.53, p<.0001; semi-blocked: M_grouped = .77 (.29), t(42) = 12.29, p<.0001, M_disjoint = .80 (.31), t(42) = 13.14, p<.0001). However, while participants were numerically better at reconstructing the maps learned with disjoint training (M_diff = .04, SD_diff = .19), this difference was not statistically significant (β = .02, z = .75, SE = .03, p = .46).
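The above-chance test can be illustrated with a minimal sketch of a Fisher z-transform followed by a one-sample t-test against zero. The correlation values below are made up for illustration and are not data from the study, and the helper names are hypothetical; how the per-participant Spearman correlation itself was computed is not specified here and is not modelled.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def fisher_z(r: float) -> float:
    """Fisher z-transform, z = atanh(r); r must lie strictly between -1 and 1."""
    return math.atanh(r)


def one_sample_t(values: Sequence[float], mu: float = 0.0) -> Tuple[float, int]:
    """t statistic and degrees of freedom for a test of mean(values) against mu."""
    n = len(values)
    standard_error = stdev(values) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(values) - mu) / standard_error, n - 1


# Made-up per-participant correlations between reconstructed and actual maps.
example_rhos = [0.90, 0.80, 0.95, 0.60, 0.85]
z_values = [fisher_z(r) for r in example_rhos]
t_stat, df = one_sample_t(z_values)
print(f"t({df}) = {t_stat:.2f}")
```

The degrees of freedom equal the number of participants minus one, which is how the reported t(48) and t(42) relate to group sizes. A perfect reconstruction (correlation of exactly 1) would need special handling, because the transform is undefined there.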

Discussion

How humans and other animals learn mental maps has been intensively studied. However, training conditions that might facilitate the acquisition and use of relational knowledge are not well understood. We tested how curricula that grouped adjacent transitions in time (grouped vs. disjoint curriculum) impact human learning, as indexed by subsequent multi-step navigation performance. We found that disjoint training was most effective. This is perhaps surprising, because other recent studies have emphasised that correlating information in time can be helpful for the acquisition of associative and categorical information.

The most interesting explanation is that grouped training is detrimental because it encourages participants to learn “fragments” of knowledge that are encoded separately from one another (corresponding to individual rows and columns of the map), whereas disjoint training encourages the learning of a more integrated relational map. Importantly, we controlled how trained associations varied within grouped and disjoint conditions, with disjoint conditions similarly organised into mini-blocks in which a small set of transitions were repeated in temporal proximity. Thus, the only difference between the conditions was how the states that were associated during training were arranged geometrically in the map. For example, in the grouped condition, participants might learn associations A-B and B-C in block 1, and F-G and G-H in block 2, encouraging them to form two “fragments” (A-B-C and F-G-H). By contrast, in the disjoint condition, participants could in theory learn A-B and F-G in block 1, and B-C and G-H in block 2. In other words, by integrating across blocks participants could form a “big picture” (A-B-C) indicating how the map was organised. In particular, it is widely accepted that paired associate inference can occur over time when a new association (e.g. B paired with C) elicits memory for a previous association (A-B), and the recall process provides a window for neural plasticity to allow A to be paired with C via B (Ghandour et al., 2019; Josselyn & Tonegawa, 2020; Liu et al., 2012; Schacter & Loftus, 2013; Yokose et al., 2017). In our disjoint condition, this process could have allowed for associative links to be formed between temporally distant memories, facilitating an integrated map representation and ultimately leading to superior 2-step navigation.

That being said, we report two observations that might be considered incompatible with this explanation. Firstly, if grouped training encouraged the formation of disconnected “islands” of knowledge corresponding to grouped states within rows and columns, then we might expect performance on test trials that required navigation within a single row or column to be better after grouped training. Surprisingly, however, even the straight navigation trials (which involve movement along two successive states of a single row or column) were performed better in the disjoint condition, even if this effect was significantly weaker than for diagonal trials. Secondly, if rows and columns forming unconnected fragments is the culprit, then we might expect performance after grouped training to be disproportionately worse in the blocked condition (where rows and columns are learned in two entirely distinct halves of the training phase) relative to the semi-blocked condition (where they could in theory be encountered in adjacent mini-blocks). However, we saw no interaction between curriculum and blocking schedule in our study. Thus, we advance the “fragments” hypothesis somewhat tentatively.

A less interesting explanation for the advantage of disjoint training is that training in the grouped condition was simply easier because responses on successive trials built on each other, causing participants to try less hard or disengage during initial learning. We cannot rule out this explanation entirely. However, in a control experiment we compared versions of the grouped curriculum in which response repetition was possible versus impossible (while keeping adjacent transitions close together in time; Supplementary Results 3) and observed no differences in training or test accuracy, contrary to what this explanation would predict if it were acting in isolation.

The results reported here, in which grouping in space and time seems to hurt (rather than help) learning, are consistent with a long tradition showing that in applied domains (e.g. educational settings, memorisation or motor learning), mixing different exemplars over time (“interleaving”) is beneficial to knowledge or skill acquisition (Goode & Magill, 1986; Rohrer & Taylor, 2007; Simon & Bjork, 2001; Taylor & Rohrer, 2010). These reports are consistent with established theories, such as complementary learning systems theory, which propose that mixing associations through mental rehearsal (or replay) allows us to form integrated knowledge structures. This benefit of mixing is in the spirit of the explanation provided above. However, the findings do contradict recent reports from the category learning literature, where blocking seems to help learning and generalisation (Dekker et al., 2022; Flesch et al., 2018; Noh et al., 2016). It may be that where the goal is not to learn a complex knowledge structure – like a map – but simply to compress exemplars by mapping them onto a smaller number of labels – the benefits of blocking emerge. Indeed, one recent paper has argued (in simulation) that blocking should help learn generalisation more than memorisation (Russin, Pavlick, & Frank, 2025).

We did not find any evidence for a significant effect of curriculum on participants’ ability to reconstruct the 2D maps using a drag and drop interface. This may have been because seeing all stimuli on screen allowed participants to use their knowledge of directly trained associations to integrate stimuli into a unified map based on logical reasoning and knowledge of the desired map shape. Unfortunately, we did not record the process of map reconstruction, otherwise the sequence in which participants reconstructed the map by moving the icons may have allowed us to test this directly.

In this paper, we have studied the impact that different orderings of training stimuli have on how cognitive maps are learned and found that training curriculum seems to have a large impact on the resulting cognitive map. Unlike in other learning tasks such as categorisation, not factorising associations into components such as rows or columns during initial exposure seems to be beneficial for subsequent map integration and adaptive behaviour. Our findings have important implications for both the design of future research and for how we interpret existing research. When learning new concepts, maps, or relations, we need to think carefully about how we can optimise our learning and how our chosen curriculum might affect what we - or our participants - actually learn rather than what we want them to learn.

Methods

Design

In this section, we provide an overview of the task design and manipulations. More detailed information such as trial numbers, trial sequences and task details is provided in the task and procedure section below. We conducted a three-day experiment, over the course of which each participant had to learn two novel maps of image associations (Figure 1b, Figure 2a). The maps were grids (5×5) or tori (4×4) of states in which each state corresponded to a unique image, with transitions between states in the up, down, left and right directions. Each of the two maps had a unique set of images assigned to it, which consisted of either household items or fruit and vegetables (assigned randomly, Figure 1b). Participants learned these maps through trial and error from paired associations between images on the first two days of the experiment (one map per day). On the third day, they completed a refresher phase on both maps before being tested on their map knowledge. To assess the cognitive maps they had formed, we asked them to perform one- and two-step navigation on the maps and make proximity judgements about sets of images. Both of these tasks required mental integration of the learned paired associates into a spatial cognitive map.

We conducted our experiment using both a 5×5 grid and a 4×4 torus to test whether any effects of our training manipulation (outlined below) held true across different geometric shapes and were not dependent on border or wall effects (see Figure 1c for an illustration of how the border concern does not apply to the torus maps). Note that the number of transitions in each row or column is identical for a 5×5 grid and a 4×4 torus, as the torus wraps around.
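The equal number of transitions per row or column can be checked with a short sketch that enumerates one-step transitions on both lattices. Counting each direction of travel as a separate transition is an assumption made here; it is consistent with the blocks of 8 transitions described below, but the names and the (row, column) addressing are illustrative.

```python
from typing import List, Tuple

Node = Tuple[int, int]            # (row, column); hypothetical addressing scheme
Transition = Tuple[Node, Node]    # directed: (start, neighbour)

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def directed_transitions(size: int, wrap: bool) -> List[Transition]:
    """All directed one-step transitions on a size x size lattice.

    wrap=False gives a bounded grid; wrap=True gives a torus.
    """
    out: List[Transition] = []
    for r in range(size):
        for c in range(size):
            for dr, dc in MOVES.values():
                nr, nc = r + dr, c + dc
                if wrap:
                    nr, nc = nr % size, nc % size
                elif not (0 <= nr < size and 0 <= nc < size):
                    continue  # stepping off the grid edge is not a transition
                out.append(((r, c), (nr, nc)))
    return out


def row_transitions(transitions: List[Transition], row: int) -> List[Transition]:
    """Directed transitions whose start and end both lie in the given row."""
    return [(a, b) for a, b in transitions if a[0] == row and b[0] == row]


grid = directed_transitions(5, wrap=False)
torus = directed_transitions(4, wrap=True)
print(len(row_transitions(grid, 0)), len(row_transitions(torus, 0)))
print(len(grid), len(torus))
```

A row of five grid nodes has four neighbouring pairs, and a row of four torus nodes also has four because the last node connects back to the first, so both yield the same count per row.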

We manipulated the order in which participants experienced the one-step transitions that make up the new cognitive maps to investigate how this affected their map learning and retention. For each of the map shapes, we employed a mixed 2×2 design (Figure 1b).

Our within-subjects manipulation pertains to the spatial sampling of transitions during training. Each participant learned one map in what we call the grouped curriculum and one map in the disjoint curriculum. In the grouped curriculum, the transitions experienced within each block of training were a pseudo-random walk along one row or one column of the map space (Figure 1a). The other map was learned using the disjoint curriculum, in which all transitions that made up the map were randomly shuffled and then divided into blocks of 8 transitions, meaning that each block of training subsequently consisted of 8 randomly sampled unique transitions from across the map (Figure 1a). Once assigned, which 8 transitions made up each unique block of disjoint training trials stayed consistent throughout the experiment to match the grouped training.
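As a rough sketch of how the two curricula partition the same set of transitions into blocks, the code below builds grouped and disjoint blocks for the 5×5 grid. It assumes directed transitions (each direction of travel counted separately) and models only which transitions belong to each block; the pseudo-random walk that orders trials within a grouped block is not modelled, and all names and the seed are illustrative.

```python
import random
from typing import List, Tuple

Node = Tuple[int, int]            # (row, column); hypothetical addressing scheme
Transition = Tuple[Node, Node]    # directed: (start, neighbour)


def lattice_transitions(size: int, wrap: bool) -> List[Transition]:
    """Directed one-step transitions on a grid (wrap=False) or torus (wrap=True)."""
    out: List[Transition] = []
    for r in range(size):
        for c in range(size):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if wrap:
                    nr, nc = nr % size, nc % size
                elif not (0 <= nr < size and 0 <= nc < size):
                    continue
                out.append(((r, c), (nr, nc)))
    return out


def grouped_blocks(transitions: List[Transition], size: int) -> List[List[Transition]]:
    """One block per row, then one per column (block content only)."""
    rows = [[t for t in transitions if t[0][0] == r and t[1][0] == r]
            for r in range(size)]
    cols = [[t for t in transitions if t[0][1] == c and t[1][1] == c]
            for c in range(size)]
    return rows + cols


def disjoint_blocks(transitions: List[Transition], block_size: int = 8,
                    seed: int = 0) -> List[List[Transition]]:
    """Shuffle all transitions once, then cut them into fixed blocks."""
    rng = random.Random(seed)  # fixed seed so block membership stays constant
    shuffled = list(transitions)
    rng.shuffle(shuffled)
    return [shuffled[i:i + block_size] for i in range(0, len(shuffled), block_size)]


grid = lattice_transitions(5, wrap=False)
for name, blocks in (("grouped", grouped_blocks(grid, 5)),
                     ("disjoint", disjoint_blocks(grid))):
    print(name, len(blocks), {len(block) for block in blocks})
```

Under these assumptions both curricula yield ten blocks of eight transitions on the grid, which matches the two sets of five blocks described for the blocking manipulation.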

The between-subjects manipulation pertained to the temporal spacing of repetitions of transitions across blocks of training (Figure 1c). One group of participants per shape (grid, torus) experienced the blocks of training for both maps in a blocked fashion. In the grouped condition, this meant that all row or column blocks were experienced twice over before the blocks of the other dimension were encountered twice over. This blocked regime was introduced to further separate in time blocks of trials that may cause interference (see Results). In the disjoint condition, it meant seeing the first five (torus: 4) blocks twice over before experiencing the second five (torus: 4) blocks twice over. The other group of participants experienced the blocks of training in an interleaved fashion: in the grouped condition, row and column blocks were interleaved until all blocks had been repeated twice, and in the disjoint condition, the first five (torus: 4) and last five (torus: 4) blocks were interleaved and presented twice.
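One possible reading of the two block schedules is sketched below. The exact ordering is not fully specified in the text, so two assumptions are made and should be treated as such: "twice over" is taken to mean that the set of blocks is run through in sequence twice, and "interleaved" is taken to mean strict alternation between the two halves. The function names and block labels are hypothetical.

```python
def blocked_order(first_half, second_half, repeats=2):
    """All blocks of one half, repeated, before any block of the other half."""
    return first_half * repeats + second_half * repeats

def interleaved_order(first_half, second_half, repeats=2):
    """Alternate between the two halves; the whole alternation is repeated."""
    one_pass = [b for pair in zip(first_half, second_half) for b in pair]
    return one_pass * repeats

rows = [f"row{i}" for i in range(5)]
cols = [f"col{i}" for i in range(5)]

print(blocked_order(rows, cols)[:6])
# ['row0', 'row1', 'row2', 'row3', 'row4', 'row0']
print(interleaved_order(rows, cols)[:6])
# ['row0', 'col0', 'row1', 'col1', 'row2', 'col2']
```

In both schedules every block appears exactly twice, giving 20 blocks per training day for the grid; only the temporal distance between row and column blocks changes.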

Task and procedure

We will explain the tasks making up the three-day experiment chronologically. The first two days and the first half of the third day consisted of the training phase and the second half of the third day was the test phase. Before starting the experiment on day 1, participants familiarised themselves with the controls of the task. Instead of using arrow keys, we asked participants to use the ASDF keys to indicate their responses. In this pre-experiment phase, participants had to learn their unique mapping of the ASDF keys onto the four directions left, right, up and down. This mapping stayed consistent throughout the task and was indicated using geometric shapes that remained on-screen throughout the task. In the familiarisation phase, one of the geometric shapes would appear on its associated square (left, right, up or down) and participants had to press the correct key out of ASDF to select the indicated direction. Participants performed this familiarisation phase until they had reached 90% accuracy on at least 26 trials.

Training phase

After completing the familiarisation phase, participants started the training phase. Participants learned one map on day 1 and the other map on day 2. The learning order of the grouped and disjoint maps was randomized across participants. On each day of training, participants in the grid groups completed 640 (torus: 512) training trials, split into 20 blocks of 32 trials (torus: 16 blocks of 32 trials). Each block of 32 trials consisted of 8 unique paired associates, repeated four times each. Each block of training trials was repeated twice over the course of training.

There were two different types of training trials. The layout was similar across these trials, but participants were required to make different decisions. In both cases, one stimulus (referred to as the first stimulus hereafter) from a given paired associate was presented at the centre of the screen, surrounded by four boxes corresponding to the four possible adjacent locations (Figure 2b).

In the first type of trial, referred to as select-action trials (Figure 2b), participants were shown the other stimulus (referred to as the second stimulus hereafter) from the associated pair at the bottom of the screen and were required to indicate its relative position with respect to the centrally presented stimulus (left vs. right or up vs. down). These trials comprised three out of four repetitions of each paired associate within a block. Feedback was provided via a coloured frame: if the response was correct, the second stimulus appeared in the selected location with a green frame; if the response was incorrect, the second stimulus appeared in the correct location, while the incorrectly selected location was highlighted with a red frame (Figure 2b).

The second type of trial, referred to as the select-stimulus trials (Supplementary Results 1), was introduced to increase participant engagement. Here, one location adjacent to the first stimulus (centrally presented) was highlighted, and participants were required to select which of two stimuli belonged in that location. If the response was correct, the correct stimulus appeared in the highlighted location with a green frame; if the response was incorrect, the correct stimulus appeared in the location, while the incorrectly selected stimulus was highlighted with a red frame (Supplementary Figure 1). These trials comprised the remaining one out of four repetitions per paired associate within each block.

Participants had up to 10s to indicate their choice on both types of training trial. This long timeout was mostly implemented to ensure people did not leave their desk for prolonged periods of time whilst completing the task at home. Positive feedback was shown for 2s and negative feedback was shown for 2.7s. Participants were shown their accuracy on the previous block in-between blocks and were allowed to take breaks in-between blocks. In the grouped condition, the second stimulus on trial t-1 became the central stimulus on trial t. In the disjoint condition, the central stimulus on trial t and second stimulus on trial t-1 tended to be non-adjacent/unrelated. Days 1 and 2 took approximately 90 minutes each.

After having learned the two maps on days 1 and 2 respectively, participants completed a shorter refresher of both maps on day 3. The refresher was identical to day 1 and 2 training, but instead of experiencing each block twice as during training (e.g. all row blocks twice, followed by all column blocks twice in the blocked condition), participants only experienced each block once (e.g. all row blocks once, followed by all column blocks once in the blocked condition), meaning each participant completed 10 (torus: 8) blocks of 32 trials per map. This refresher also took approximately 90 minutes (45 minutes per map). The order in which the maps were experienced in the refresher was the reverse of the initial training order, i.e. the map learned on day 2 was revisited first followed by the map learned on day 1.
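The trial counts for training and refresher follow from the block structure and can be verified with a few lines of arithmetic. The helper name `n_trials` is our own; the numbers are those reported above.

```python
def n_trials(n_blocks, pairs_per_block=8, reps_per_pair=4):
    """Total trials for a phase made of equally sized blocks."""
    return n_blocks * pairs_per_block * reps_per_pair

# Training: each unique block is presented twice per day.
grid_training = n_trials(10 * 2)    # 640
torus_training = n_trials(8 * 2)    # 512

# Refresher on day 3: each unique block is presented once per map.
grid_refresher = n_trials(10)       # 320
torus_refresher = n_trials(8)       # 256

# Within a block, 3 of the 4 repetitions of each pair are select-action trials.
select_action_per_block = 8 * 3     # 24
select_stimulus_per_block = 8 * 1   # 8
```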

Test phase

For the second half of day 3, which we consider the test phase, we assessed participants’ map knowledge in three (torus: two) ways (Figure 2c-e): a navigation test, a proximity test and a map reconstruction test. Participants completed these tests in the order listed. The torus groups only completed the navigation test and the proximity test.

Navigation test

Participants were asked to navigate to goals that were either one or two steps away from a starting location. The one-step trials served as retention checks, as they resembled the select-action trials during training except that participants chose among all possible actions instead of two. There were two types of two-step trials: straight trials, in which the start and goal locations came from the same row or column, and diagonal trials, in which they did not. It was not communicated to participants whether a two-step trial was a straight or diagonal trial. Each navigation block consisted of 25 two-step trials (12 straight and 13 diagonal) and 5 one-step trials presented in random order. Participants completed six navigation blocks in total, alternating between maps (60 trials of navigation per map, split into 2 blocks of 30).

On each navigation trial, participants would first see the goal stimulus with text underneath it indicating how many steps away from the start it was. After 500ms, the starting location would appear in the centre of the screen, surrounded by the four choice options. In the grid groups, if the starting location was by the edge of the grid, a wall icon would appear in the location(s) where the starting stimulus was wall-adjacent, and the affected locations could not be selected. Participants then had to take either one or two steps to reach the goal. They had 12s to make a choice and used the same controls as during select-action trials in training. As during training, they also had to first select and then confirm a choice (repeat keypress). As participants made choices, the central stimulus (current location) changed according to their choices and the counter with choices made and choices remaining updated accordingly. The trial terminated once the minimum number of steps required had been taken and participants could see whether they had reached the goal correctly or not by comparing the final location to the goal. Their final choice remained on-screen for 1.5s.
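The movement rules of a navigation trial (walls on the grid, wrap-around on the torus) can be illustrated with a small sketch. The (row, column) coordinates, the direction encoding and the function names are assumptions for illustration; they are not taken from the task code.

```python
# Assumption: "up" decreases the row index and "left" decreases the column index.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action, size, torus):
    """Return the next (row, col), or None if a wall blocks the move on a grid."""
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if torus:
        return (r % size, c % size)        # wrap around the edges
    if 0 <= r < size and 0 <= c < size:
        return (r, c)
    return None                            # wall-adjacent: move not available

def reaches_goal(start, actions, goal, size, torus):
    """Follow a sequence of actions and check whether it ends on the goal."""
    state = start
    for action in actions:
        state = step(state, action, size, torus)
        if state is None:
            return False
    return state == goal

# A diagonal two-step trial on the 5x5 grid.
print(reaches_goal((2, 2), ["up", "right"], (1, 3), size=5, torus=False))  # True
```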

Proximity test

After completing the navigation test, participants commenced the second part of the test phase. In this part, they were asked to make proximity judgements to assess how well they had learned the two maps. On each trial, they were first shown a central stimulus (target) and then asked to indicate which of two other stimuli (probes) was closer to the central stimulus on the currently visible map (Figure 2f). Each probe had a defined distance (i.e., number of steps) from the target, resulting in a comparison between two distances on every trial. Participants were allowed a maximum of 12s to respond and no feedback was provided. Trials were split into four types according to the specific distance comparison involved, and each block had:

- 9 trials of 2-away (straight) vs 1-away
- 10 trials of 2-away (diagonal) vs 1-away
- 10 trials of 2-away (diagonal) vs 3-away (diagonal)
- 11 trials of 2-away (diagonal) vs 2-away (straight)

In the 2-away (straight) vs 1-away trials, for example, one probe was within the same row or column as the target and was two steps away, while the other was one step away from the target. All trials were sampled randomly from all possible comparisons and presented in random order.
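The distance comparisons listed above can be made concrete with a short sketch that computes the number of steps between two states and labels a probe as straight or diagonal. States are represented as (row, column) tuples and the function names are hypothetical; on a four-connected map the number of steps is the city-block distance, with wrap-around on the torus.

```python
def axis_distance(a, b, size, torus):
    """Steps along one axis, taking the shorter way round on a torus."""
    d = abs(a - b)
    return min(d, size - d) if torus else d

def map_distance(s, t, size, torus):
    """Minimum number of one-step transitions between two states."""
    return (axis_distance(s[0], t[0], size, torus)
            + axis_distance(s[1], t[1], size, torus))

def is_straight(s, t):
    """True if the two states share a row or a column."""
    return s[0] == t[0] or s[1] == t[1]

target = (2, 2)
probe_a, probe_b = (2, 4), (1, 2)   # a "2-away (straight) vs 1-away" trial
d_a = map_distance(target, probe_a, size=5, torus=False)   # 2
d_b = map_distance(target, probe_b, size=5, torus=False)   # 1
closer = "b" if d_b < d_a else "a"

trials_per_block = 9 + 10 + 10 + 11   # 40 proximity trials per block
```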

Map reconstruction test

The grid groups were asked to reconstruct the two maps by using their mouse and a drag and drop interface and we calculated a reconstruction accuracy score for each map.

Participants were incentivised with a bonus of 2.5p for each goal reached during the navigation phase and 2.5p for each correct proximity judgement (apart from the 2vs2 trials where there was no objectively correct answer), capped at a maximum bonus payment of £6.
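The bonus rule can be written as a one-line calculation. The function and argument names are hypothetical, and the sketch assumes that 2vs2 proximity trials have already been excluded from the count of correct judgements.

```python
def bonus_pence(goals_reached, correct_proximity, rate=2.5, cap=600.0):
    """Bonus in pence: 2.5p per success, capped at 600p (i.e. 6 pounds)."""
    return min(cap, rate * (goals_reached + correct_proximity))

print(bonus_pence(40, 50))  # 225.0
```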

Participants

A total of 182 participants (grid: 49 blocked, 43 interleaved; torus: 46 blocked, 44 interleaved) were recruited using Prolific Academic (https://www.prolific.com/) in 2022 and early 2023. All participants self-reported that they were between 19 and 40 years of age and resided in the United Kingdom. The mean age of all participants was 30.03 years (SD: 5.88) and 80 identified as female. To ensure that only motivated individuals completed our main task, we only invited participants who had previously passed a performance criterion on a brief attention task. The attention task was a short 6-minute 2-back task in which participants were instructed to press the space bar whenever the symbol presented two trials ago matched the symbol currently on the screen. Participants who detected at least 21/30 repetitions and had fewer than 10 false alarms on the 2-back task were invited to participate in the multi-day map learning experiment. Participants were compensated for their time at a rate of £10/hour, for both the attention task and the main task. They were able to earn a performance-based bonus of up to £6 on the main task. All online experiments were approved by the Medical Sciences Research Ethics Committee of the University of Oxford.
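The screening criterion can be expressed as a simple check. The sketch below is illustrative only: the function names and the symbol sequence are made up, and only the thresholds (at least 21 of 30 repetitions detected, fewer than 10 false alarms) come from the text.

```python
def two_back_targets(symbols):
    """Indices at which the current symbol matches the one two trials ago."""
    return [i for i in range(2, len(symbols)) if symbols[i] == symbols[i - 2]]

def passes_screening(hits, false_alarms, min_hits=21, max_false_alarms=10):
    """Inclusion rule: enough detected repetitions and few enough false alarms."""
    return hits >= min_hits and false_alarms < max_false_alarms

print(two_back_targets(list("ABABCAC")))  # [2, 3, 6]
print(passes_screening(hits=21, false_alarms=9))  # True
```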

Materials

Twenty-five clipart images of fruit and vegetables and 25 clipart images of common household items and furniture available under the common use license were taken from Google Images and edited to have the same dimensions and background using Microsoft PowerPoint. Sixteen images were randomly selected from each category for each participant in the torus groups, and all 25 images from each category were used for each participant in the grid groups. All stimuli used are displayed in Supplementary Figure 2; examples are displayed in Figure 1b.

Software

All experiments in this paper were implemented in JavaScript (supported by PHP and HTML code) by modifying a custom lab-internal toolbox for running online experiments. The training curricula were pre-generated using MATLAB 2019b. All statistical analyses were carried out in Python using the Pandas 1.2.3, NumPy 1.19, SciPy 1.6.0, Statsmodels 0.13 and Scikit-Learn 0.24.1 packages. All data figures were generated using Matplotlib 3.3.2 and Seaborn 0.12.

Statistical analyses

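The majority of analyses presented in this paper were conducted using linear mixed models fitted with the MixedLM class of the statsmodels package for Python. All models had participants as the grouping variable. Most models had the following format, with Y being the outcome variable of interest (usually accuracy), *cohort* the between-subjects manipulation, *condition* the within-subjects manipulation and *shape* the map shape (torus or grid):

Y ~ *cohort* * *condition* + *shape* + (1|*participant*)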
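As an illustration of how a model of the form Y ~ cohort * condition + shape with a random intercept per participant might be fitted, the sketch below uses the statsmodels formula interface on synthetic data. The data, effect sizes and variable coding are invented for the example (the 0/1 coding follows the supplementary tables), and this is not the authors' analysis script.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
rows = []
for p in range(80):                       # synthetic participants
    cohort = p % 2                        # blocked (1) vs interleaved (0)
    shape = (p // 2) % 2                  # torus (1) vs grid (0)
    intercept = rng.normal(0.0, 0.05)     # participant-specific offset
    for condition in (0, 1):              # grouped (0) vs disjoint (1)
        accuracy = (0.6 + 0.1 * condition + intercept
                    + rng.normal(0.0, 0.05))
        rows.append({"participant": p, "cohort": cohort, "shape": shape,
                     "condition": condition, "accuracy": accuracy})
data = pd.DataFrame(rows)

# The random intercept (1|participant) is expressed through `groups`.
model = smf.mixedlm("accuracy ~ cohort * condition + shape",
                    data, groups=data["participant"])
result = model.fit()
print(result.summary())
```

In this formula syntax, `cohort * condition` expands to both main effects plus their interaction, which appears in the fitted parameters as `cohort:condition`.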

Exceptions to this format were the following analyses.

The analysis conducted to determine whether there was an interaction between initial training or refresher (train feature) and our spatial manipulation was specified as

The analysis querying whether participants’ overall test accuracy increased from the first to the second test block was specified as

*accuracy* ~ *cohort* * *condition* * *test block* + *shape* + (1|*participant*)

The normality of residuals was assessed for all analyses using the scipy.stats.normaltest function and if the residuals were non-normal at an alpha level of .01, a permutation test was conducted to determine the appropriate p-value. Where this is the case, p-values were denoted as p_perm. We used 1000 permutations and ran separate permutations for the different factors in the affected models. Where interaction effects were significant, the permutation was only applied to one of the two factors making up the interaction.
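The two-stage logic (test residuals for normality, then fall back on a permutation p-value) can be sketched as follows. The exact permutation scheme is not described in the text, so the sketch substitutes a simple sign-flip permutation test on within-participant differences as a stand-in; the function name, the synthetic data and the add-one correction are our assumptions.

```python
import numpy as np
from scipy import stats

def sign_flip_permutation_p(differences, n_permutations=1000, seed=0):
    """Two-sided p-value for the mean paired difference under sign flips."""
    rng = np.random.default_rng(seed)
    differences = np.asarray(differences, dtype=float)
    observed = abs(differences.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_permutations, differences.size))
    null = np.abs((signs * differences).mean(axis=1))
    # Add-one correction keeps the estimated p-value above zero.
    return (np.sum(null >= observed) + 1) / (n_permutations + 1)

rng = np.random.default_rng(1)
residuals = rng.exponential(size=200)        # deliberately skewed residuals
_, p_normal = stats.normaltest(residuals)

diffs = rng.normal(0.1, 0.1, size=60)        # made-up paired differences
# Only resort to the permutation p-value when normality is rejected at .01.
p_perm = sign_flip_permutation_p(diffs) if p_normal < 0.01 else None
```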

Map reconstruction scores for the grid groups were computed by first calculating a representational dissimilarity matrix (RDM) consisting of the Euclidean distance of each stimulus to each other stimulus in the reconstruction and then computing the Spearman rank correlation between the participant RDM and the veridical distance RDM. Fisher z-transformed correlation scores were entered into a mixed effects model specified as
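The scoring steps for the map reconstruction (RDM, Spearman correlation, Fisher z-transform) can be sketched as below. The sketch assumes that the veridical RDM uses Euclidean distances between the true grid coordinates and that only the upper triangle of each RDM enters the correlation; the function name and the simulated drag-and-drop coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def reconstruction_rho(placed_xy, true_xy):
    """Spearman correlation between reconstructed and true pairwise distances."""
    # pdist returns the condensed upper triangle of the distance matrix.
    placed_rdm = pdist(np.asarray(placed_xy, dtype=float), metric="euclidean")
    true_rdm = pdist(np.asarray(true_xy, dtype=float), metric="euclidean")
    rho, _ = spearmanr(placed_rdm, true_rdm)
    return rho

# True positions of the 25 stimuli on the 5x5 grid.
true_xy = np.array([(r, c) for r in range(5) for c in range(5)], dtype=float)

# Simulated reconstruction: correct layout in pixels plus placement noise.
rng = np.random.default_rng(0)
placed_xy = 100.0 * true_xy + rng.normal(0.0, 20.0, size=true_xy.shape)

rho = reconstruction_rho(placed_xy, true_xy)
z = np.arctanh(rho)   # Fisher z-transform of the correlation
```

Because the rank correlation is invariant to uniform scaling, a reconstruction that preserves the layout at any overall size yields the same score.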

Additional information

Instructions

On days 1 and 2 and for the refresher on day 3, participants were instructed that they were travellers who had landed on an alien planet. To prove their goodwill, the aliens had tasked them with learning the layout of two of their planet’s moons (doughnut shaped moons in the torus groups and square fields lined with walls in the grid groups). The atmosphere on the moons was foggy, and thus participants would have to learn the position of objects relative to each other. Participants were informed that they would have to remember the learned information as they would be tested on it by the aliens on day 3 (but not told how) and that good performance on day 3 would be rewarded. They were shown an example trial sequence for training and asked some control questions about a small example torus in the torus groups to ensure they understood the concept of a torus shaped space. Before commencing the test phase on day 3, participants were informed that the aliens now wanted to test their knowledge and shown example trial sequences (with different sets of stimuli) for both test tasks as well as informed of the bonus payment that could be gained.

Post-task questionnaire

After completing each day of the task, participants were asked to indicate whether they had used any visual aids or written down any of the answers as well as which map(s) they had seen.

Curriculum sampling

Training curricula were pre-generated to ensure equal coverage of all paired associates in a row or column in the grouped condition. For ease of implementation, disjoint curricula were also pre-generated. For each participant, a pre-generated trial sequence for training and refresher was selected randomly out of 50 pre-generated sequences in the grid groups and 100 pre-generated sequences in the torus groups.

Supplementary Materials

Supplementary Results 1: select action training trials

Example of a select-stimulus trial sequence in which the incorrect option was selected. Had the correct option been selected, the washing machine would have appeared with a green frame around it and the kettle would have disappeared at the time of feedback.

The variable dividing training trials into select-state and select-action trials was not saved due to a technical error for 17 of the torus group participants, which means that these results are based on observations from 165 participants.

Select-state trials

Participants’ training accuracy was significantly greater on the map that they learned using grouped than that learned with disjoint training (β = -.09, p < .0001). There was neither a significant effect of the blocking manipulation on training accuracy nor a significant interaction of the blocking and spatial manipulation (both p > .05). When splitting this up into initial training and refresher, we found the same pattern of results, with participants’ training accuracy being significantly higher on their grouped than disjoint map (train: β = -.10, p < .0001; refresh: β = -.06, p < .0001) and no significant effect of the blocked manipulation nor an interaction (all ps > .05).

Select-action trials

Participants’ training accuracy was significantly greater on the map that they learned using grouped than that learned with disjoint training (β = -.08, p <.0001). There was neither a significant effect of the temporal manipulation on training accuracy nor a significant interaction of the temporal and spatial manipulation (both p>.05). When splitting this up into initial training and refresher, we found the same pattern of results, with participants’ training accuracy being significantly higher on their grouped than disjoint map (train: β = -.09, p <.0001; refresh: β = -.05, p <.0001) and no significant effect of our temporal manipulation nor an interaction (all ps >.05).

Supplementary Results 2: Results of the control experiment shuffled grouped vs continuous grouped

This control experiment was preregistered on OSF under this link https://osf.io/x8ju3/?view_only=df19ad57d03f40f098b5724bbfafb7fa. In brief, we preregistered and conducted a control experiment to assess whether differences in training accuracy (and the inference abilities) were indeed simply due to a reduced need for attention. Participants (N = 63) in the control condition also learned two maps, but this time both of them were spatially grouped. The difference between the two maps was that in one condition the transitions in a row or column were experienced as a pseudo-random walk (like in the main experiment), and in the other they were experienced in a shuffled manner. The latter meant that temporally adjacent transitions were not necessarily spatially adjacent. We hypothesised that we should see lower training accuracy in the shuffled grouped compared to the pseudo-random walk grouped map if the reason for the high training accuracy was the reduced need for attention rather than our grouping manipulation.

We did not find evidence for this: there was no difference in training accuracy between the two curricula (β = -.01, SE = .01, z = -1.34, p = .18). To make sure that this was not due to cohort differences, we compared the training accuracy on both shuffled and continuous grouped to the grouped map training accuracy in the original blocked grouped cohort (N = 49). There was no difference between the continuous grouped condition accuracy in the control experiment and the original blocked grouped grid training accuracy (β = -.29, z = -1.39, SE = .20, p = .16). When compared to the whole behaviour-only cohort, there was also a significant interaction, with the difference between disjoint and grouped being significantly greater than between grouped continuous and grouped shuffled (β = .07, z = 5.92, SE = .01, p < .001).

If the disadvantage of the original grouped cohort was due to a reduced need for attention arising from the trajectory-like structure of the condition, we would also expect to see higher inference accuracy in the shuffled condition. We do not see evidence for this in either type of 2-step navigation trial (diagonal: β = .03, z = 1.5, SE = .02, p = .14; straight: β = -.00, z = -.17, SE = .02, p = .87), nor was 2-step navigation accuracy significantly different from that of the matched blocked grouped grid-shape cohort (straight: β = -.23, z = -.85, SE = .27, p = .40; diag: β = -.22, z = -.78, SE = .29, p = .44). Comparing to the whole original cohort, we also find a significant interaction for diagonal two-steps, with the difference between disjoint and grouped accuracy being significantly greater than between shuffled grouped and grouped accuracy (β = -.08, z = -2.63, SE = .03, p = .008). We also do not find evidence for a significant difference between the shuffled grouped and original grouped condition in terms of proximity judgements (2v1 straight: β = .02, z = .73, SE = .02, p = .46; 2v1 diagonal: β = .01, z = .28, SE = .02, p = .78; 2v3: β = .03, z = 1.19, SE = .23, p = .23), nor are the continuous grouped proximity judgement accuracies significantly different from the original blocked grouped grid accuracy (2v1 straight: β = -.28, z = -1.11, SE = .25, p = .27; 2v1 diagonal: β = -.27, z = -1.06, SE = .26, p = .29; 2v3: β = -.24, z = -.90, SE = .26, p = .37). When comparing overall average proximity judgement accuracy in the control experiment to proximity judgement accuracy in the whole behaviour-only dataset, we also see a significant interaction, with the difference between disjoint and grouped being significantly greater than between shuffled grouped and original grouped (β = -.05, z = -2.67, SE = .02, p = .008).

While it is challenging to draw inferences based on null results, the combination of no difference between the shuffled and original grouped curriculum (with its accuracy being the same as the matched original data) and a significant interaction effect with our main experiment curriculum effect suggests that a lack of a need to pay attention is not the main driver behind the worse inference performance in the grouped than disjoint condition.

Full table of results for mixed effects model results reported in the main text + training accuracy split into initial training and refresher
All models in this table were specified as follows, with Y being the outcome variable of interest, cohort our between-subjects manipulation (blocked (1) vs semi-blocked (0)), condition the within-subjects manipulation (grouped (0) vs disjoint (1)) and shape being torus (1) or grid (0):

Y ~ *cohort* * *condition* + *shape* + (1|*participant*)

Main text results split by torus and grid cohorts
Y~ *cohort* * *condition* + (1|*participant*)

Additional analyses reported in the main text
The analysis conducted to determine whether there was an interaction between type of two-step navigation trial and our spatial manipulation was specified as

*accuracy* ~ *cohort* * *condition* * *trial type* + *shape* + (1|*participant*)

The analysis querying whether participants’ overall test accuracy increased from the first to the second test block was specified as

*accuracy* ~ *cohort* * *condition* * *test block* + *shape* + (1|*participant*)

The results from these analyses are reported in the table below.

Pairwise comparisons (grouped vs disjoint) for the four experimental groups for the main text results

Data availability

Code for the analyses presented in this paper is available at https://github.com/LeonieGli/Glitz_et_al_2026 . Data is available at https://osf.io/592dj/overview .

Acknowledgements

This research was funded by a Wellcome Trust Discovery Award (https://wellcome.org/grant-funding/schemes/discovery-awards) (227928/Z/23/Z) to C.S., also supporting Z.L. It was also funded by UKRI (MR/W008939/1 to H.C.B.) and MRC (MR/W01971X/1 supporting L.G.). The Medical Research Council Centre of Research Excellence in Restorative Neural Dynamics is supported by MRC (UKRI 936). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. For the purpose of open access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

Additional information

Author contributions

LG and CS designed research; LG collected data; LG analysed and visualised data; LG wrote the initial draft; CS, HCB, ZL and LG edited and revised the draft; CS and HCB supervised the project

Funding

Wellcome Trust (WT)

https://doi.org/10.35802/227928

Christopher Summerfield

UK Research and Innovation (UKRI) (MR/W008939/1)

Helen C Barron

UKRI | Medical Research Council (MRC) (MR/W01971X/1)

Helen C Barron
