The hippocampus has long been implicated in both episodic and spatial memory; however, these mnemonic functions have traditionally been investigated in separate research strands. Theoretical accounts and rodent data suggest a common mechanism for spatial and episodic memory in the hippocampus: the provision of an abstract and flexible representation of the external world. Here, we monitor the de novo formation of such a representation of space and time in humans using fMRI. After learning spatio-temporal trajectories in a large-scale virtual city, subject-specific neural similarity in the hippocampus scaled with the remembered proximity of events in space and time. Crucially, the structure of the entire spatio-temporal network was reflected in neural patterns. Our results provide evidence for a common coding mechanism underlying spatial and temporal aspects of episodic memory in the hippocampus and shed new light on its role in interleaving multiple episodes in a neural event map of memory space.
https://doi.org/10.7554/eLife.16534.001
The hippocampus is one of the most extensively studied regions in the brain. However, two of its core functions, spatial navigation and episodic memory, have mostly been investigated in separate research lines (Eichenbaum, 2014). It has been suggested that the answer to the apparent duality in hippocampal function resides in a common mechanism that is required for both spatial navigation and episodic memory (Eichenbaum, 2014): the formation of an abstract representation of the external world, a memory space (Eichenbaum et al., 1999). While it is clear that such a map-like representation would be necessary for spatial navigation, it might be less obvious for episodic memory. Yet, episodic memory has been defined as the ability to recall events from one’s own life (Tulving, 1983) in a specific mode of retrieval that has been referred to as recollection (Eichenbaum et al., 2007) or 'mental time travel' (Tulving, 2002). This specific mode of retrieval makes it necessary that humans can, in their minds, re-create and re-experience episodes of their past by mentally navigating to the point when and where the episode happened, thereby retrieving the time and the place of past events. Notably, this implies that humans must be able to convert relationships between events, for example along the physical dimensions of space and time, into a mental representation so that the arrangement of events is appropriately reflected. In line with this idea, recent discoveries in rodent electrophysiology indicate that cells in the hippocampus code for events in space and time simultaneously (Kraus et al., 2013, 2015; Mankin et al., 2012) and provide evidence for the notion that memories are, in fact, stored in a multi-dimensional memory space (Eichenbaum et al., 1999; McKenzie et al., 2014). 
Findings from fMRI studies in humans also suggest that memories are dynamically integrated into mnemonic network representations along different dimensions (Collin et al., 2015; Horner et al., 2015; Kumaran and Maguire, 2006; Milivojevic et al., 2015; Preston and Eichenbaum, 2013; Schlichting et al., 2015; Shohamy and Wagner, 2008; Zeithamova et al., 2012). However, it remains elusive how inter-event relationships along multiple dimensions, such as space and time, are combined and converted into a multi-dimensional mnemonic event map, which might potentially support episodic memory.
In studies with rodents, the hippocampus has traditionally been associated with the representation of space (Burgess et al., 2002; Moser et al., 2008; O’Keefe and Nadel, 1978), i.e. knowing where events occurred. This spatial code is supported by specific neurons in the hippocampal formation such as place cells (O’Keefe and Dostrovsky, 1971), which increase firing rate when a specific place in an environment is traversed, and grid cells, which increase firing rate in multiple locations, organized in a hexagonal grid pattern (Hafting et al., 2005). Although much of this research has employed electrophysiology in rodents, recent studies using functional magnetic resonance imaging have pointed to similar mechanisms in humans (Doeller et al., 2010; Hassabis et al., 2009; Howard et al., 2014; Kyle et al., 2015; Vass and Epstein, 2013; Wolbers et al., 2007).
In addition to its role in spatial representation, the hippocampus is also known to be crucial for episodic memory in humans (Eichenbaum, 2014; Norman and O’Reilly, 2003; Scoville and Milner, 1957; Stark and Squire, 2000). While it has been acknowledged that episodic memory is inherently structured in a temporal manner (Tulving, 1985), our understanding of the role of the hippocampus in remembering temporal structure has long been limited (Howard and Eichenbaum, 2015). Recently, descriptions of situation-specific cell-assembly firing sequences (Pastalkova et al., 2008) and the discovery of time cells in the rodent hippocampus (MacDonald et al., 2011) have sparked renewed interest in the role of the hippocampus in temporal memory. Time cells have been shown to fire selectively at specific time points after a cue was presented, while animals waited to respond to an odor (MacDonald et al., 2011). Interestingly, the cells exhibited this temporally coordinated firing pattern only in specific contexts. In a new context, e.g. when the delay was prolonged, some of the previously responding cells ceased to fire or fired at different time points, while previously unresponsive cells suddenly began to code for elapsed time, a response pattern akin to spatial remapping of place cells (‘retiming’). Thus, rather than displaying a simple counting function or a generic delay signal, these cells seem to represent the specific temporal context of an episode, consistent with the temporal context model (Howard and Kahana, 2002; Howard et al., 2005). These findings have led to a re-examination of the role of the hippocampus in temporal memory in rodents and humans (Eichenbaum, 2014; DuBrow and Davachi, 2015; Ranganath and Hsieh, 2016) and to several recent neuroimaging studies in humans.
For example, one study found that, in predictable as opposed to random sequences of items, items that are closer together in the sequence elicit increased neural pattern similarity (Hsieh et al., 2014), an effect that depends on the conjunction of item identity and order position rather than order position alone. Another study showed that items are represented differently within events than across event boundaries (Ezzyat and Davachi, 2014). Interestingly, participants’ judgment of the temporal distance between a pair of items was systematically higher when the pair was separated by an event boundary than when both items fell within the same event (even though the actual temporal distance was the same for the two types of item pairs). Further, a subsequent behavioral judgment of across-boundary item pairs as being temporally close was associated with higher pattern similarity during the learning task than for across-boundary pairs judged to be far apart. However, the temporal sequences in these studies were investigated independently of the spatial relationships between the elements. Another recent report showed that spatial and temporal aspects of autobiographical experiences are coded within the hippocampus across various scales of magnitude, up to one month in time and 30 km in space (Nielson et al., 2015). Participants wore a camera over the course of four weeks, which automatically took pictures throughout the day. Later, participants were scanned with fMRI while reviewing these pictures and trying to recall what they depicted. While providing interesting insight into real-life autobiographical memory, this approach afforded little experimental control over the stimulus material with regard to visual properties and the degree of familiarity participants had with the locations. More importantly, none of the studies mentioned above compared changes in neural pattern similarity from before the acquisition of the spatial and temporal structure to after it.
The present study tests the overarching idea that the hippocampus represents events within a multi-dimensional event map of memory space (Eichenbaum et al., 1999). More specifically, we will use the term 'event map' to refer to the mental representation of a complete set of interrelationships between events. In order to support an efficient and adaptive memory system, such an event map should be abstract, flexible and relational (Eichenbaum, 2014; Eichenbaum et al., 1999): abstract in the sense that it represents aspects of the experience that go beyond a direct record of events, such as being able to extrapolate that taking a specific turn in a city will be a shortcut even before this has actually been experienced. An event map should further be flexible, to allow for the representation of sudden changes in the world, such as roadblocks. Thirdly, it should be able to represent relationships along different dimensions concurrently and conjunctively, while still allowing one to focus on a single dimension depending on the task situation, for example, knowing that the spatial distance between two bus stops is short but that it could take a long time to get to the destination during rush hour.
The goal of this study was to investigate whether experiencing multiple events within a spatio-temporal structure leads to the acquisition of a neural event map that is abstract, flexible and relational. We used a highly realistic first-person virtual navigation paradigm that led participants through a complex virtual city ('Donderstown', see http://www.doellerlab.com/donderstown/). The purpose of this task was to provide a learning experience for participants in which 16 objects were arranged consistently in a spatial and temporal structure, defined through the complex network of inter-object relations. We dissociated the dimensions of time and space through the use of teleporters, requiring a high level of flexibility in memory (see Figure 1A for details of the task). To ascertain maximal experimental control, the objects were shown repeatedly in random order before and after the learning task and all fMRI analyses were performed on data acquired during these independent scanning sessions. Knowledge of the spatio-temporal structure acquired during the learning task was assessed with a memory test after fMRI acquisition (see Figure 1B). Participants’ abstract representation of the event structure was estimated through all possible pairwise spatial and temporal distance judgments, including judgments which required higher-order inference because events were separated by multiple intervening events. The participant-specific spatial and temporal distance ratings from the memory test were then used to investigate changes in neural pattern similarity from before to after the learning task leveraging representational similarity analysis (RSA; Kriegeskorte et al., 2008). More specifically, we investigated whether increases in hippocampal pattern similarity co-varied with the remembered spatial and temporal event structure during the learning task.
Participants had to learn the temporal and spatial relationships between 16 objects placed in boxes along a route (see Figure 1 and Figure 1—figure supplement 1 for details) by repeatedly navigating along the specific route in the virtual city environment Donderstown. In order to ensure sufficient learning of the spatio-temporal trajectory, participants had to complete 14 rounds of the route. Proficiency in this virtual-navigation task (see Figure 1A) was assessed by analyzing improvements during the task as well as by investigating performance in the subsequent memory tests. Both the spatio-temporal learning task and the subsequent memory tests were done outside of the MR scanner, so neural activity during these tasks cannot be assessed.
Participants required on average 71.63 ± 13.75 (mean ± std) minutes for the task (range 52.67–113.06 min), showing high variability in navigation speed. When looking at the time it took participants to get from one object to the next (Figure 1—figure supplement 2) across the 14 route repetitions, we observed a rapid decrease of navigation times over the first 3–4 repetitions, with navigation duration roughly converging to the time it takes to walk from one box to the next (raw walking time). Navigation times during the last repetition were significantly shorter than during the first repetition (first: 21.55 min; last: 11.60 min; T25 = 6.70, p<0.0001), see Figure 1—figure supplement 2 for more details. In sum, these data indicate that participants were able to learn the virtual route.
We assessed participants’ memory of the spatio-temporal structure of objects with three different memory tests after MRI acquisition. Firstly, participants were asked to freely recall all objects they encountered during the task. Secondly, they had to indicate the spatial and temporal distance between every possible pair of objects. Thirdly, participants were given a schematic map of Donderstown on the computer screen, shown the image of every object and asked to indicate the location of the box that contained this object by moving the mouse, see Materials and methods for more details on the memory tasks.
Participants recalled on average 13.08 ± 3.06 (mean ± std) of the 16 items. The order of free recall was more influenced by the temporal order during the task than by the spatial arrangement, as assessed by correlating the spatial and temporal distances between items during the task with the distance in order during free recall (mean R for spatial distance: −0.01 ± 0.16 [mean ± std]; temporal distance: 0.50 ± 0.40 [mean ± std]; T25 = 4.32, p<0.0003).
In this memory test, we asked participants to judge for every possible pair of objects how close together or far apart they had been presented during the learning task, both in space and in time. This yielded a participant-specific distance estimate for both the spatial and the temporal domain, effectively probing the participant’s mnemonic event map. Crucially, by asking participants to make distance judgments for every possible pair of items, we required them to infer the spatial and temporal distances of items that had not been directly experienced together in the task. For every participant, we compared the subjective distance judgments with the objective distance during the task. Because memory distance judgments were given on a scale from 'close together' to 'far apart' rather than in absolute terms (see Figure 1B), accuracy was tested by the goodness of fit between the actual spatial and temporal distances in the learning task and the estimated spatial and temporal distances. For temporal judgments, memory distances were significantly correlated with actual temporal distances in 24 of the 26 participants (p<0.05; R = 0.64 ± 0.29 [mean ± std], see Figure 2A for the correlation coefficients across participants and Figure 2—figure supplement 1 for participant-specific scatter plots). For spatial judgments, memory distances were significantly correlated with actual spatial distances in 21 of the 26 participants (p<0.05; R = 0.49 ± 0.29 [mean ± std]; Figure 2A). Thus, correspondence between actual and reproduced distances was very high for both space and time and slightly better for the temporal than for the spatial condition (T25 = −2.52, p = 0.019). We also examined the relationship between participants’ spatial and temporal distance judgments for a given pair of objects (even though the two factors were independent in the task, see Materials and methods).
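The goodness-of-fit procedure described above can be sketched in a few lines (a minimal illustration; the array layout and the toy data are our own assumptions, not the authors' analysis code):

```python
import numpy as np
from scipy import stats

def judgment_accuracy(actual_dist, judged_dist):
    """Pearson correlation between actual and judged pairwise distances.

    Both inputs are 1-D arrays over all unique object pairs
    (for 16 objects: 16 * 15 / 2 = 120 pairs). Because judgments were
    given on a relative 'close'-to-'far' scale, only the goodness of
    fit matters, not the absolute values.
    """
    return stats.pearsonr(actual_dist, judged_dist)

# Toy data: judged distances as a noisy, rescaled copy of the actual ones.
rng = np.random.default_rng(0)
actual = rng.uniform(0.0, 1.0, 120)
judged = 0.8 * actual + 0.2 * rng.uniform(0.0, 1.0, 120)
r, p = judgment_accuracy(actual, judged)
```

This fit is computed once per participant and per domain (space, time), yielding the participant-specific R values reported above.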
Indeed, we found that in 14 of the 26 participants, there was a significant correlation between their spatial and temporal distance judgments (p<0.05, R = 0.31 ± 0.29 [mean ± std]). Therefore, the two factors were not independent in most participants’ memory judgments and we addressed this in our fMRI analysis (see below).
To further investigate the relationship between spatial and temporal distance judgments, we set up two GLMs which model the impact of actual spatial and actual temporal distances on (a) spatial distance ratings and (b) temporal distance ratings, respectively (see Figure 2B). Across participants, both actual spatial distances and actual temporal distances explained variance in spatial distance ratings (beta for the factor actual spatial distance: 0.50 ± 0.28 [mean ± std], significantly different from zero across participants: T25 = 9.08, p<0.0001; beta for the factor actual temporal distance 0.17 ± 0.19 [mean ± std]; significantly different from zero across participants: T25 = 4.48, p<0.001). However, the factor actual spatial distance had a significantly greater impact on spatial distance ratings than the factor actual temporal distance (t-test between betas across participants for space > time: T25 = 4.06, p<0.001). Similarly, both actual spatial distances and actual temporal distances explained variance in temporal distance ratings (beta for the factor actual spatial distance: 0.16 ± 0.22 [mean ± std], significantly different from zero across participants: T25 = 3.58, p<0.01; beta for the factor actual temporal distance 0.65 ± 0.27 [mean ± std]; significantly different from zero across participants: T25 = 12.16, p<0.0001). The factor actual temporal distance had a much bigger impact on temporal distance ratings than the factor actual spatial distance (t-test between betas across participants for time > space T25 = 5.33, p<0.0001). Thus, while there was some 'cross-over' between the domain that should be rated and the respective other domain, the domain that should be rated had a greater impact on the judgments both for space and for time.
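A two-predictor GLM of this kind could be set up as below (our own sketch; z-scoring the predictors and ratings, as assumed here, makes the two betas directly comparable within a participant):

```python
import numpy as np

def distance_glm(actual_spatial, actual_temporal, ratings):
    """Ordinary-least-squares betas for actual spatial and temporal
    distances predicting one participant's distance ratings.

    All inputs are 1-D arrays over object pairs; everything is z-scored
    so the two betas can be compared directly. Returns the betas for
    (spatial, temporal); the intercept is discarded.
    """
    z = lambda x: (x - x.mean()) / x.std()
    X = np.column_stack([z(actual_spatial), z(actual_temporal),
                         np.ones(len(ratings))])
    betas, *_ = np.linalg.lstsq(X, z(ratings), rcond=None)
    return betas[0], betas[1]

# Toy participant whose spatial ratings are driven mostly by actual
# spatial distance, with a weak 'cross-over' from temporal distance.
rng = np.random.default_rng(1)
spatial = rng.normal(size=120)
temporal = rng.normal(size=120)
spatial_ratings = 0.7 * spatial + 0.2 * temporal + 0.1 * rng.normal(size=120)
beta_space, beta_time = distance_glm(spatial, temporal, spatial_ratings)
```

The per-participant betas are then tested against zero, and against each other, across the group.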
We also investigated whether errors in judging spatial and temporal distances (i.e. the difference between z-scored actual distance and z-scored remembered distance) were systematically related to the distance in the other dimension (see Figure 2C). Indeed, we found that errors in spatial distance ratings were correlated both with actual temporal distances and with remembered temporal distances (Fisher z-transformed correlation coefficients tested against zero across participants; actual temporal distance: T25 = −5.39, p<0.0001; remembered temporal distance: T25 = −5.33, p<0.0001). Similarly, errors in temporal distance ratings were correlated with both actual spatial distances and remembered spatial distances (Fisher z-transformed correlation coefficients tested against zero across participants; actual spatial distance: T25 = −4.82, p<0.0001; remembered spatial distance: T25 = −5.04, p<0.0001). It is conceivable that the bias the opposite domain exerts on a distance rating depends on which domain is tested first; e.g., it could be hypothesized that actual spatial distance has a higher impact on errors in temporal distance ratings when spatial distance was probed first within a participant. Therefore, we repeated the error analysis described above, but this time split trials depending on whether space or time was tested first for a given pair of items, and then correlated the errors with the distance in the other domain separately. Neither the spatial domain nor the temporal domain was differentially affected by a bias from the other domain (neither actual nor remembered) depending on test order (all p>0.09).
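The error-bias analysis can be sketched as follows (our own illustration; the variable names and toy data are assumptions):

```python
import numpy as np
from scipy import stats

def cross_domain_error_bias(actual, remembered, other_domain_dist):
    """Correlate rating errors in one domain with distances in the
    other domain. Errors are defined, as in the text, as the difference
    between z-scored actual and z-scored remembered distances.
    """
    z = lambda x: (x - x.mean()) / x.std()
    errors = z(actual) - z(remembered)
    return stats.pearsonr(errors, other_domain_dist)

# Toy data: temporal judgments inflated for pairs that are far apart in
# space, yielding a negative error-vs-space correlation as in Figure 2C.
rng = np.random.default_rng(2)
actual_time = rng.normal(size=120)
spatial_dist = rng.normal(size=120)
remembered_time = actual_time + 0.5 * spatial_dist + 0.1 * rng.normal(size=120)
r, p = cross_domain_error_bias(actual_time, remembered_time, spatial_dist)
```

The per-participant correlation coefficients are Fisher z-transformed and tested against zero across the group.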
This means that the order in which spatial or temporal distances were probed did not have an impact on biases in the distance judgments. Taken together, these behavioral results show that participants were mostly accurate in reproducing the spatial and temporal structure between objects that had not been directly experienced together, indicating that they successfully formed an abstract, relational event map. While spatial and temporal distance ratings were correlated with one another in some participants, neither domain biased the other more strongly than vice versa, and the degree to which one domain was biased by the other did not depend on which domain was tested first.
On average, participants positioned the items with a distance error (expressed here as the ratio between the displacement error and the side length of the city map) of 0.193 ± 0.16 to the actual item location and performance was variable across participants (range 0.017–0.480, see Figure 2—figure supplement 2).
To assess the representational change as a consequence of the de novo acquisition of the spatial and temporal structure of events during the learning task, two independent picture-viewing tasks (PVT) inside the fMRI scanner preceded (PVT pre) and followed (PVT post) the spatio-temporal learning task (see Figure 3 for an outline of the experimental sessions). In these two PVT fMRI blocks, participants saw pictures of the same 16 objects that were also presented during the spatio-temporal learning task (see Figure 1—figure supplement 1). Objects were presented multiple times on a black background, in random order, and participants were asked to press a button whenever they saw a target object (see Materials and methods).
The rationale behind the picture-viewing task was to assess the neural pattern similarity between pairs of objects without possible confounds of stimulus presentation during the learning task. If we had assessed pattern similarity for objects while they were presented in Donderstown during the learning task, analyses would have been susceptible to visual confounds for spatially close items (sharing more similar views of the environment) and possible auto-correlation confounds for temporally close items due to the slow hemodynamic response (i.e. temporally closer volumes will always be more similar to one another) or effects related to head movements. Analyzing pattern similarity in this independent task, when the objects were shown out of context and in random order, gave us high experimental control. Furthermore, both sessions, PVT pre and PVT post, were identical with respect to stimulus order and timing. Any changes in pattern similarity from PVT pre to PVT post (PS’) are thus due to a changed neural representation of objects as a result of the spatio-temporal learning task and the newly formed memories. Therefore, we related the difference in pattern similarity from PVT pre to PVT post (PS’) to the remembered temporal and spatial distances, both in a region of interest (ROI) analysis and a searchlight analysis (see Figure 4 and Materials and methods for details on analyses and nonparametric statistical procedures). We pursued these approaches in parallel because they offer complementary advantages: the ROI approach allows for rigorous testing of a clear a priori hypothesis, while the searchlight approach allows us to identify possible regions outside of hippocampus that show the same effect, as well as to pinpoint any effect more locally within hippocampus.
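The core PS' computation can be sketched as follows (assuming object-wise voxel patterns have already been estimated for each picture-viewing session; this is our own simplified illustration, not the authors' pipeline):

```python
import numpy as np

def pattern_similarity_change(patterns_pre, patterns_post):
    """PS': change in pairwise pattern similarity from PVT pre to post.

    patterns_pre/post: (n_objects, n_voxels) arrays of object-evoked
    activity patterns from the two picture-viewing tasks. Similarity is
    the Pearson correlation across voxels; the result is an
    (n_objects, n_objects) matrix of post-minus-pre differences.
    """
    return np.corrcoef(patterns_post) - np.corrcoef(patterns_pre)

# Toy data: 16 objects, 100 hippocampal grey-matter voxels.
rng = np.random.default_rng(3)
pre = rng.normal(size=(16, 100))
post = rng.normal(size=(16, 100))
ps_change = pattern_similarity_change(pre, post)
```

The lower triangle of this matrix is then related to the participant-specific remembered distances, as in representational similarity analysis.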
We hypothesized that the hippocampus would support spatial and temporal event memory and, importantly, the combination of both. Therefore, in a first step, we investigated whether the change in pattern similarity across all hippocampal grey-matter voxels was related to participants’ spatial and temporal distance judgments for pairs of items (see Materials and methods for details). We found that pattern similarity changes in bilateral hippocampus reflected participants’ spatial distance judgments (Z = −3.719, pFDR = 0.0005; FDR correction for 15 multiple comparisons, see Materials and methods), as well as their temporal distance judgments (Z = −2.597, pFDR = 0.0078), see Figure 5. More specifically, pairs of objects which were recalled as being close together either in space or time during the task showed greater pattern similarity increases across all hippocampal grey-matter voxels.
However, behavioral analyses had revealed that some participants’ spatial and temporal distance judgments were correlated, even though spatial and temporal distances between items were designed to be independent in the task. It could be that the similar effects we find for spatial and temporal memory are only due to the correlation of the two in participants’ ratings, i.e. that one of the domains has no unique contribution to the pattern similarity increase. Therefore, we investigated in an additional analysis whether there were separate contributions of the two factors: First, we removed variance explained by spatial distance judgments from the pattern similarity changes in pairs of items in a GLM, and correlated the residuals from this model (i.e. what could not be explained by spatial distance judgments) with temporal distance judgments. We found that these residual pattern similarity changes still correlated with temporal distance judgments in bilateral hippocampus (Z = −1.805, pFDR = 0.041). Similarly, when we removed the influence of temporal distance judgments first, the residuals still correlated with spatial distance judgments (Z = −3.719, pFDR = 0.0005). These results suggest that both the dimensions of space and of time contribute to the observed pattern similarity increases.
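The residualization step can be sketched as below (our own illustration; vectorized over object pairs, with toy data in which the two distance measures are deliberately correlated):

```python
import numpy as np
from scipy import stats

def partial_distance_effect(ps_change, primary_dist, nuisance_dist):
    """Correlate PS' with one distance measure after removing the other.

    All inputs are 1-D vectors over object pairs (the lower triangle of
    the respective matrices). The nuisance distance is regressed out of
    PS' with a one-regressor GLM plus intercept, and the residuals are
    then correlated with the primary distance.
    """
    X = np.column_stack([nuisance_dist, np.ones(len(ps_change))])
    betas, *_ = np.linalg.lstsq(X, ps_change, rcond=None)
    residuals = ps_change - X @ betas
    return stats.spearmanr(residuals, primary_dist)

# Toy data: spatial and temporal judgments are correlated, but each
# still has a unique (negative) relationship with PS'.
rng = np.random.default_rng(4)
space_d = rng.normal(size=120)
time_d = 0.5 * space_d + 0.9 * rng.normal(size=120)
ps = -space_d - time_d + 0.2 * rng.normal(size=120)
rho, p = partial_distance_effect(ps, time_d, space_d)
```

A surviving (negative) correlation of the residuals with the primary distance indicates a unique contribution of that dimension.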
As outlined above, the main goal of this study was to investigate how the spatial and the temporal aspects of an experience are combined to form a common multi-dimensional event map, into which events can be integrated. Therefore, we correlated the combination of spatial and temporal distance judgments with pattern similarity changes in the hippocampus (i.e., we took the product of the two distance ratings, with the lowest values reflecting proximity in both dimensions and the highest values reflecting high distance in both dimensions). We found that the combination of spatial and temporal distance judgments was indeed associated with hippocampal pattern similarity changes (Z = −3.719, pFDR = 0.0005). Thus, the spatial and temporal event structures are not only represented separately, but they are also combined in the hippocampus, providing evidence that these two dimensions are flexibly integrated to form a spatio-temporal event map.
Next, we investigated whether right and left hippocampus were differentially involved in representing time and space. Therefore, we computed the five models described above (spatial distance, temporal distance, spatial x temporal combination, temporal distance with effects of spatial distances removed, and spatial distance with effects of temporal distances removed) separately for voxels in the right and left hippocampus, respectively. We found that pattern similarity across voxels in right hippocampus was significantly correlated with all five factors (all pFDR<0.041, significant after FDR correction for 15 multiple comparisons, see Materials and methods). However, for left hippocampus, only spatial distance judgments (with and without effects of temporal distance judgments removed) and the combination of spatial and temporal distance judgments were significantly correlated with pattern similarity changes (all pFDR<0.011), while temporal distance judgments were not (neither with nor without effects of spatial distance judgments removed).
So far, we have provided evidence that the hippocampus is involved in representing and integrating spatial and temporal relationships of multiple events. But are there regions in hippocampus that are more involved in this effect, and are there any other brain regions that show the same pattern? To address this question, we performed a searchlight analysis over all voxels in our field of view (see Materials and methods). In this approach, a 9 mm sphere is formed around a center voxel and pattern similarity is assessed for all voxels included in that sphere. Moving the center of this sphere consecutively over all possible voxels yields information about fine-grained local effects (see Materials and methods for details).
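The sphere construction at the heart of a searchlight can be sketched as follows (our own simplified illustration; the actual sphere radius, voxel size and brain mask are those described in the Materials and methods):

```python
import numpy as np

def searchlight_spheres(mask, radius_mm, voxel_size_mm):
    """Yield (center, neighbours) for every in-mask voxel, where
    neighbours are the in-mask voxel coordinates falling within a
    sphere of the given radius around the center. Pattern similarity
    would then be computed on the voxels of each sphere in turn.
    """
    radius_vox = radius_mm / voxel_size_mm
    coords = np.array(np.nonzero(mask)).T          # (n_voxels, 3)
    for center in coords:
        dist = np.linalg.norm(coords - center, axis=1)
        yield tuple(center), coords[dist <= radius_vox]

# Tiny toy mask: a 3 x 3 x 3 block of voxels with 3 mm isotropic voxels
# and a sphere radius of one voxel.
mask = np.ones((3, 3, 3), dtype=bool)
spheres = dict(searchlight_spheres(mask, radius_mm=3.0, voxel_size_mm=3.0))
```

Mapping the sphere-wise statistic back to each center voxel produces the whole-volume searchlight map.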
As can be seen in Figure 6, a cluster in right medial to anterior hippocampus showed increased pattern similarity effects for objects that were remembered as being temporally close together in the task (peak MNI: 26/−18/−22, T25 = 4.05, pcorr = 0.0178, small volume correction, see Materials and methods). This peak was the global maximum in our acquisition volume and no other effects survived correction for multiple comparisons at a threshold of pcorr <0.05. There were no significant effects observed for the opposite contrast (at pcorr <0.05). Taken together, these results show that temporal relationships between events in episodic memory are reflected in pattern similarity changes in a cluster in right hippocampus extending from the medial to the anterior part.
Next, we performed the same analysis for remembered spatial distances. Again, a cluster in right medial to anterior hippocampus showed the highest pattern similarity increases for objects that had the smallest spatial distance in the learning task (see Figure 6, peak MNI: 34/−17/−22, T25 = 5.16, pcorr = 0.0046, small volume correction). This peak was again the global maximum. No other effect survived correction for multiple comparisons at a threshold of pcorr <0.05. There were no significant effects observed for the opposite contrast (at pcorr <0.05). The only observed cluster was thus again located in medial to anterior right hippocampus.
Our results so far show that partly overlapping regions in right medial to anterior hippocampus represent the spatial and temporal distance between pairs of items. Again, we were interested in whether this might be caused by the fact that the two dimensions were correlated in participants’ responses during the memory task. As in the ROI analysis described above, we removed the influence of the second factor in an additional analysis: Before correlating our PS’ matrix with the temporal distance matrix, we removed the effects of spatial distance with a GLM and continued the analysis with the residuals. Conversely, before we correlated our PS’ matrix with the spatial distance matrix, we removed the effects of temporal distance with a GLM. We investigated how this procedure affected our searchlight findings in the hippocampus (for peak voxels: MNI 26/−18/−22 for time and 34/−17/−22 for space, respectively). For the temporal structure analysis, the effect was still significant, albeit slightly weaker, after removing the influence of spatial distance (T25 = 3.22, puncorr = 0.0018). The same was true for the peak voxel of the spatial structure analysis after removing the influence of temporal distance (T25 = 3.87, puncorr <0.0004). Thus, part of the overlap in peak regions for space and time may be explained by the two factors being correlated in participants’ memory judgments, but there are also significant effects of space and time after statistically removing the influence of the other factor, suggesting that the two dimensions both contribute to pattern similarity increases in right medial to anterior hippocampus.
Again, we investigated how space and time are integrated to form an event map and assessed correlations between pattern similarity increases and the combined remembered spatial and temporal distances between items by using the product of spatial and temporal distance judgments. We found a significant effect in right medial to anterior hippocampus (see Figure 6B, peak MNI: 32/−17/−22, T25 = 6.07; pcorr < 0.0001, small volume correction). This indicates that pattern similarity in this region increased strongly when items were close together both in the spatial and temporal domain.
So far, we investigated how the remembered spatial and temporal relationships between items are reflected in pattern similarity increases in the brain. However, in our task there are also objective spatial and temporal relationships between items, independent of how participants remembered them. We defined the objective spatial distance between items as the Euclidean distance between their locations, and the objective temporal distance as the median walking time from one item to the next across all repetitions of the route during the navigation task. We then tested the impact of objective spatial distance, objective temporal distance and the combination of the two in an ROI analysis. We found that objective spatial distances between pairs of items were associated with an increase in pattern similarity across all gray-matter voxels in bilateral hippocampus (Z = −2.85, pFDR = 0.02, corrected for 9 comparisons: 3 ROI × 3 conditions). Interestingly, no significant effects were found for objective temporal distance or for the combination of spatial and temporal distance.
As objective spatial and temporal distances were designed to be independent of one another, we did not need to control for the other factor as we did for the remembered distances. Instead, we made use of the full factorial setup of the objective distances and tested space (high vs low) against time (high vs low) with a 2-way repeated measures ANOVA. Only the factor ‘space’ was significant, in bilateral hippocampus (F1,25 = 8.29, p=0.008) and in left hippocampus (F1,25 = 8.84, p=0.006), while neither the factor ‘time’ nor the interaction was significant in any of the three ROIs.
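For a two-level within-subject factor, the main effect in such a repeated measures ANOVA is equivalent to a paired t-test on the condition means (F(1, n−1) = t²), so the test can be sketched with standard tools; the per-subject cell means below are synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subj = 26

# Hypothetical per-subject pattern-similarity change for the four cells of
# the 2x2 factorial (spatial distance low/high x temporal distance low/high)
cells = {
    ("low", "low"):   rng.normal(0.10, 0.05, n_subj),
    ("low", "high"):  rng.normal(0.10, 0.05, n_subj),
    ("high", "low"):  rng.normal(0.05, 0.05, n_subj),
    ("high", "high"): rng.normal(0.05, 0.05, n_subj),
}

# Main effect of space: collapse over the time factor, then a paired t-test
space_low = (cells[("low", "low")] + cells[("low", "high")]) / 2
space_high = (cells[("high", "low")] + cells[("high", "high")]) / 2
t, p = stats.ttest_rel(space_low, space_high)
F = t ** 2  # equals the F(1, 25) statistic for the factor 'space'
```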
These results suggest that there might be a different pattern of results for objective spatial and temporal distances as compared to remembered spatial and temporal distances. While we think that the remembered distances more accurately reflect the notion of an event map, it would certainly be very interesting to investigate possible differences in the representation of objective distances in future studies, maybe by systematically increasing divergence between objective distances and remembered distances through experimental manipulation.
In this study, we investigated the neural mechanisms underlying the formation of a de novo representation of multiple events embedded in a spatial and temporal context. We used a realistic virtual reality task to induce spatial and temporal interrelations between events and assessed the ensuing change in neural pattern similarity with fMRI. We found that neural similarity in the hippocampus after learning the spatio-temporal event structure scaled with the proximity of event memories in space and time, providing evidence for a mnemonic event map in the hippocampus.
The hippocampal formation is the key region for coding memory for space and time in rodents (MacDonald et al., 2011; Moser et al., 2008; O’Keefe and Nadel, 1978; Pastalkova et al., 2008) and humans (Chadwick et al., 2015; Doeller et al., 2010; Ezzyat and Davachi, 2014; Hsieh et al., 2014; Kyle et al., 2015; Miller et al., 2013; Nielson et al., 2015; Schapiro et al., 2016). It is also essential for forming episodic memory, a hallmark of which is vivid recollection, or 'mental time travel' (Tulving, 2002). However, the two mnemonic functions have hitherto mostly been investigated in isolation, even though it has been suggested that the hippocampus supports these different functions via a common underlying mechanism, namely, the representation of multi-dimensional inter-event relationships in memory space (Eichenbaum, 2014; Eichenbaum et al., 1999). Here, we take a first step towards demonstrating such a common coding mechanism in the human hippocampus by showing that both spatial and temporal relationships between events might be represented by a similar mechanism.
Furthermore, our study goes beyond previous findings in several important ways: Firstly, we investigate the de novo formation of a spatio-temporal event map by tracking the task-induced changes in pattern similarity from a pre-task baseline scan to the post-task scan. While this approach enables us to control for the potential effects of stimulus and task confounds (see below), most crucially it allows us to investigate representational changes (through comparison of post- vs pre-acquisition effects) as a consequence of encountering a complex event structure during the learning phase.
Secondly, we directly relate the specific neural changes we observe to the interrelations of the memories that have been formed. We achieve this by mapping out the participant-specific mnemonic event-map for the newly acquired spatio-temporal structure in an extensive post-scanning memory test. Previous studies have related the strength of neural effects to markers of overall memory performance across participants (Hsieh et al., 2014), or have restricted their analyses to trials with self-reported recollection success (Nielson et al., 2015). One study reports higher pattern similarity during task trials for items which were later judged as ‘close’ as compared to items which were later judged as ‘far’ (Ezzyat and Davachi, 2014). However, all of the item pairs which entered the analyses were, in fact, separated by the same number of intervening trials in the task (i.e., two trials), limiting the complexity of the probed memory. Here, we probed all possible interrelations between the encountered events across spatial and temporal dimensions simultaneously. This allowed us to reconstruct, from these pairwise ratings, the full participant-specific temporal and spatial distance maps which were then used in the representational change analysis of our fMRI data.
Thirdly, we combine spatial and temporal aspects in our learning task and use teleporters to reduce overlap between spatial and temporal distances. This makes the task more complex and increases the level of abstraction required to accurately represent the events. Notably, all item pairs are defined by a specific spatial as well as a temporal distance. Thus, for solving the memory task, it is not sufficient to have a notion of two items belonging together or being close, but one needs to retrieve their spatial and temporal position in the task and estimate how close they are in the respective domain separately. It should be noted that space and time were intertwined to some degree in participants’ memory reports. However, the generally good fit between responses and actual distances and results from the additional analyses in which we statistically control for the influence of the other factor indicate that participants were able to represent the two dimensions separately, at least to a certain degree.
In this study, we introduce a novel experimental paradigm that allows us to investigate both spatial and temporal aspects of memory and combines a complex episode-like learning task with rigorous experimental control. So far, most studies in the field have investigated the neural underpinnings of memory either for space (Bellmund et al., 2016; Doeller et al., 2010; Iglói et al., 2010; Kyle et al., 2015; Vass and Epstein, 2013; Wolbers et al., 2007) or for time (Ezzyat and Davachi, 2014; Hsieh et al., 2014). However, in real life, episodes are always embedded in both spatial and temporal context, as recently demonstrated by a study using GPS coordinates and camera timestamps from snapshots of participants’ real-life experiences to show that pattern similarity in the hippocampus is sensitive to spatial and temporal distances across large real-world scales (Nielson et al., 2015). We include both spatial and temporal aspects in our learning task to test, in a laboratory setting, how the two dimensions are represented after the de novo formation of an event map.
Another aspect of our task is that it induces spatio-temporal memories by exposing participants to a realistic, 3D virtual environment. This paradigm is well suited to mimic episodic memory formation, due to both the richness of experience and the active nature of the task, in which participants have control and agency over the to-be-encoded events. Volitional control is a crucial aspect of the hippocampal role in memory encoding (Voss et al., 2011) and a sense of self, in turn, may be an essential prerequisite for mental time travel (Tulving, 2002).
However, in a complex, realistic learning task, analyses might be prone to potential confounds. For example, items that are spatially proximal in the task (or in real life) probably also share a similar view, especially in the seconds walking up to the item, which would inflate pattern similarity. Likewise, items that are temporally proximal would automatically show higher neural pattern similarity due to autocorrelations in the slow BOLD signal. To counter these potential confounds, we limit our fMRI analysis to the difference in pattern similarity between two separate blocks scanned before and after the task – an approach taken in several recent fMRI studies (Collin et al., 2015; Milivojevic et al., 2015; Schapiro et al., 2012; Schlichting et al., 2015). One strength of this approach is that by strictly focusing on the change in pattern similarity from PVT pre to PVT post we can exclude effects of temporal proximity between items in the PVT tasks and of a priori differences in neural pattern similarity that some pairs of items might elicit in individual participants, an aspect that cannot easily be excluded when using pairs of autobiographical photographs, for example.
Taken together, the combination of these advances allows us to study the spatio-temporal organization of memory in humans – spanning a triad between an experimentally created objective 'external world', as simulated with our realistic, life-like task; a subjective representation of this external world in participants’ minds, as assessed with the extensive memory testing; and the way this subjective representation is reflected in the brain, as expressed in changes in neural patterns associated with an event map.
Are similar or different neural mechanisms supporting spatial and temporal memory? For space, the existence of place cells (O’Keefe and Dostrovsky, 1971) and grid cells (Hafting et al., 2005) has suggested that space is represented in an abstract manner, potentially in the form of a 'cognitive map' (O’Keefe and Nadel, 1978). In contrast, the representation of temporal structure has been discussed more in terms of passive mechanisms: chaining models (Axmacher et al., 2010; Jensen and Lisman, 2005) argue that serial events are linked through pairwise binding between succeeding items (through LTP-like mechanisms) and that the recall of one item triggers recall of the subsequent item. The temporal context model (Howard and Kahana, 2002; Howard et al., 2005) suggests that an episodic element is 'tagged' to slowly changing, random neuronal background activity present at the time of encoding; this temporal context is then reinstated during recall and provides information about how long ago the episode was experienced by assessing the degree of disparity between the reinstated and the present neuronal background (i.e. the greater the disparity, the more time has passed). In both of these models, temporal structure in memory can be seen as a mere by-product of basic neuronal processing. However, the recent findings of internally generated sequential firing of neuronal ensembles (Pastalkova et al., 2008) and context-specific time cells (MacDonald et al., 2011, 2013) are consistent with the notion of a more active mechanism in temporal memory, which might, in fact, be very similar to mechanisms in spatial memory (Howard and Eichenbaum, 2015).
In humans, it has also been shown that hippocampal damage leads to impairments in both spatial and temporal memory tasks (Spiers et al., 2001; Konkel et al., 2008) and that the hippocampus is engaged during active retrieval of temporal sequences as well as spatial layouts (Ekstrom et al., 2011), even though dissociable networks for the two retrieval domains were observed outside of the hippocampus. In seeming contrast with our results, one study investigating pattern similarity during retrieval of spatially near versus far intervals and temporally near versus far intervals found an interaction effect in right hippocampus, with increased pattern similarity for spatially far compared to spatially near retrieval trials and the opposite effect for temporally near versus far retrieval trials (Kyle et al., 2015), whereas we find a pattern similarity increase both for spatially close and for temporally close pairs of items. However, this discrepancy can probably be explained by methodological differences: we recorded our data not during active retrieval, but during two independent tasks in which participants passively viewed items, and then related the pattern similarity difference between these two tasks to an external behavioral marker, i.e. participants’ subjective distance ratings, collected at a later time point. Another study found decreased pattern similarity in hippocampal subfields CA2/CA3/DG between trials in a spatial retrieval condition and a temporal retrieval condition when both domains were correctly retrieved compared to trials when only one of the domains was correctly retrieved, and the opposite effect in parahippocampal cortex (Copara et al., 2014). Again, it is difficult to directly relate this study to our results, since the pattern similarity differences were found in a retrieval task and in hippocampal subfields, which we did not investigate in this study.
However, it would be interesting to also acquire data during the memory test to examine whether actively retrieving spatial and temporal relationships affects neural pattern similarity.

Our data support the notion of a common hippocampal coding mechanism in space and time: neural similarity scales with the proximity of event memories in both dimensions. Notably, while we found that spatial and temporal distance are to some degree correlated in participants’ memory, the observed effect is still present for each domain after statistically controlling for the effect of the other domain, suggesting that both space and time contribute to the observed pattern similarity increase, possibly in an additive manner. The observed strong effect for the combination of space and time further suggests that the two dimensions might be integrated in a common dimension in the hippocampus, i.e. spatio-temporal proximity, supporting the formation of hierarchical structures in a memory space (Collin et al., 2015; Eichenbaum et al., 1999; McKenzie et al., 2014). Here we provide evidence for a mapping of the entire event structure in the hippocampus. The hippocampal event-coding patterns are thus not restricted to representing event relationships per se, but rather scale with mnemonic distance in a spatio-temporal event map.

One alternative explanation for the increased pattern similarity could be that in the second PVT participants covertly retrieve the environment in which they encountered the object in the learning task and that the effect might be partly related to the higher visual similarity in the imagined scene. Several points argue against this interpretation. Firstly, the views associated with nearby boxes are not necessarily very similar, for example due to rotations during navigation between the boxes or large buildings obstructing the view at one location but not the other.
Secondly, we observe the increased similarity for temporally close items as well, which due to the teleporters are not necessarily spatially close. Thirdly, if the effects relied solely on similarity in visual scenes, we would expect to see very prominent pattern similarity increases in visual areas. However, no cluster survived correction for multiple comparisons outside of hippocampus. Therefore, we believe that our findings reflect memory for spatial and temporal relationships, rather than visual similarity.
One interesting finding here is that we observed a different pattern of results for the neural representation of objective spatial distances compared to remembered distances. It is conceivable that the spatial and temporal distances as they are remembered are more indicative of the event map participants have formed, but the different pattern of results for the objective distances raises the interesting question of how objective distances are translated into subjectively remembered distances, and how this translation is reflected in the neural representation. In our behavioral analyses, we found that memory judgments in one domain were biased by the distances in the other domain, but neither domain seemed to have a stronger impact than the other. It is very likely that factors beyond objective spatial and temporal distance influence how a spatio-temporal event map is constructed and remembered, and it will be very interesting for future studies to identify these factors.
By showing that both the temporal and the spatial relationships between multiple events are represented in the hippocampus, we took a first step towards unraveling the link between the multi-faceted external world, participants’ memories of it and the neural coding mechanisms supporting the formation of a multi-dimensional mnemonic structure. Such event maps are likely not restricted to the physical dimensions of space and time. Elements in memory could be arranged according to a variety of factors, for example social aspects (Kumaran and Maguire, 2005; Kumaran et al., 2012; Tavares et al., 2015) or abstract concepts (Milivojevic and Doeller, 2013). It would be interesting to investigate in further studies whether nuanced differences in these dimensions can be read out in hippocampal patterns as well. Another exciting future avenue for research could be that – if mnemonic relatedness in participants’ minds is reflected in pattern similarity – one can reverse the logic and use participant-specific similarity maps to make inferences about their internal record of experience (Morris and Frey, 1997; Wood et al., 1999). In summary, the present study sheds light on the neural mechanisms supporting the formation of a spatio-temporal event map in memory by leveraging a novel, life-like learning task in combination with rigorous experimental control. More broadly, it may be a first step towards mapping out the representation of the external world in the human mind – here along physical dimensions, but potentially also along more abstract ones.
Based on an effect size of d = 1.03 found in hippocampus in a previous, similar study from our lab (Milivojevic et al., 2015), an alpha level of 0.001 (necessary for corrections for multiple comparisons in fMRI data) and a power of 0.95, a sample size of N = 26 was calculated using G*Power (http://www.gpower.hhu.de/). Twenty-six participants signed up for the study through a University-wide online recruitment system. The mean age of the group was 24.88 ± 2.21 (mean ± std) years and 11 participants were female. All participants underwent a familiarization phase in Donderstown, so they had good knowledge of the city (see below). All participants gave written informed consent, filled in a screening form to ensure they did not meet any exclusion criteria for fMRI and were compensated for their time. The study was approved by the local ethics committee (CMO Regio Arnhem-Nijmegen).
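The reported sample size can be reproduced, approximately, from the noncentral t distribution; this sketch assumes a one-sample, one-tailed test (the exact G*Power settings are not stated above, so the result may differ by a participant or so):

```python
import numpy as np
from scipy import stats

def required_n(d, alpha, power, two_sided=False):
    """Smallest n for a one-sample t-test to reach the requested power,
    computed from the noncentral t distribution."""
    a = alpha / 2 if two_sided else alpha
    for n in range(3, 200):
        df = n - 1
        t_crit = stats.t.ppf(1 - a, df)          # critical t under the null
        nc = d * np.sqrt(n)                      # noncentrality parameter
        achieved = 1 - stats.nct.cdf(t_crit, df, nc)
        if achieved >= power:
            return n
    raise ValueError("no n below 200 reaches the requested power")

n = required_n(d=1.03, alpha=0.001, power=0.95)  # close to the reported N = 26
```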
For the purpose of providing a realistic, life-like episodic learning experience, we developed a 3D virtual city environment using the Unreal Development Kit for Unreal Engine 3 (https://www.unrealengine.com/previous-versions). The city consists of a complex network of streets and features residential as well as commercial areas. Distances in the VR city are difficult to translate to real-world settings, because they are based on arbitrary units that depend on the exact scaling of the 3D meshes. Relating the eye-level height of the first-person player (assumed to be at 1.60 m) to these arbitrary units, one side length of the square city roughly translates to 390 m. Walking this side length takes approximately 36 s, putting the walking speed, at roughly 39 km/h, well above a normal walking pace. However, this was necessary to achieve a sufficient number of repetitions in a reasonable time. See Figure 1 for a top-down overview of Donderstown and http://www.doellerlab.com/donderstown/ for further images. In the experiment, participants had to navigate through this complex virtual environment and judge distances in it (see below for details on the task). During piloting of this study we observed that it was difficult for most participants to accurately estimate Euclidean distances if the city environment was novel to them. Some participants showed signs of disorientation, especially with regard to those parts of the route in which they were teleported through the city (see below). As our hypotheses depended crucially on the use of these teleporters, we decided to only include participants with extensive prior knowledge of the city. Thus, all participants were required to have taken part in another study from our lab, which pre-exposed them to the virtual city, Donderstown.
In this previous study, which took place on a different day (1–21 days prior to participation in the current study), participants had to learn the names and locations of specific houses in this city and estimate directions between these houses. The task was unrelated to the current task but exposed participants to Donderstown for approximately 2 hr and thereby ensured that they had formed a robust spatial representation of the city. This experimental session was crucial to ensure successful learning of the spatio-temporal trajectories. Notably, participants did not acquire any knowledge about the position of the wooden boxes or teleporters (see Figure 1 and below), as neither was present in the familiarization task.
The experiment consisted of five parts (see Figure 3), two of which took place in the MRI scanner. First, participants were asked to freely navigate through Donderstown for 10 min to refresh their knowledge of the city. This session was performed in front of a computer screen outside of the scanner. Secondly, participants were taken inside the scanner and performed the picture viewing task ('PVT pre'), in which 17 objects were presented 12 times in random order. Thirdly, they were taken out of the scanner and performed the route learning task in front of a computer screen in a behavioral lab. In this learning task, 16 objects were arranged at specific locations along a route and the spatial and temporal distances between them were varied independently in a 2-by-2 design. After the learning task, participants were taken inside the scanner again for the picture viewing task ('PVT post'), in which 17 objects were presented 12 times in the same random order as before. Lastly, participants performed three different memory tests outside of the scanner: a free recall test, a distance judgment test and a map test. All of the tasks are described in detail below.
The task is similar to playing a computer game and involves first-person navigation through a 3D virtual city environment, Donderstown. In the learning task, participants started at a specific point in the city and then had to follow a predefined route through the city. At the beginning, participants were unfamiliar with the route, therefore it was marked by presenting orange traffic cones at regular intervals. Participants’ task was to follow the traffic cones until they arrived at a wooden box; then they were required to touch the box. The box then opened and its content was revealed by presenting a single object on a black background. After 2 s, the black screen with the object disappeared and participants continued to follow the route marked by the traffic cones. Participants encountered 16 different objects along the route, always hidden in a wooden box. Pictures depicted various everyday objects (e.g. a football or an apple), the requirement being that they would reasonably fit inside the wooden box. The next box (and the traffic cones leading up to it) would only appear after the previous box had been opened (i.e. touched).
Crucially, at specific points during the route, participants encountered a teleporter after opening a box. When they touched this teleporter, they were transported immediately to a completely different part of the city, where they would encounter the next box to be opened. These teleporters were always in the same position along the route and created experimental situations in which the next box was opened after only a small temporal delay while maintaining a long spatial distance between the two boxes. This was necessary for rendering the two factors of time and space independent of each other (see Figure 1—figure supplement 3 for a comparison of spatial and temporal distances; also note that in some participants’ memory, the two factors were not independent of one another, but correlated). After participants opened the last box in the route, a black screen appeared for 15 s and participants found themselves back at the start of the route, where they again followed the orange traffic cones until they found the first box of the route again.
A wooden box at a particular position during the route always contained the same object for a given participant (between participants, the content of the wooden boxes was randomized). Thus, an object was associated with a particular position in Donderstown, and it was always encountered in the same temporal order. The route was taken by participants 14 times in total.
After 6 repetitions, the orange traffic cones were no longer shown and participants had to find the route on their own. During piloting, it had become evident that participants tended to underestimate their dependence on the traffic cones and would sometimes be surprised by the difficulty of navigating in the absence of the cones. Therefore, we included an 'emergency help' procedure that was active for 5 repetitions after we removed the traffic cones. When participants felt they had got lost, they were instructed to return to the position of the last box or the last place they were certain was on the correct route and press the button 'H' on the keyboard. Upon this press, all of the traffic cones leading up to the next box appeared at once, giving participants an opportunity to find their way again. As this was a free navigation task, the duration of the task varied considerably between participants, lasting 71.63 ± 13.75 (mean ± std) minutes. In summary, the learning task in this study was developed to induce a spatio-temporal structure between different objects by presenting them repeatedly and consistently at a specific place and in a specific temporal order.
Memory was assessed with three different tasks, (1) a free recall task, (2) a spatial and temporal distance judgment task and finally a (3) map task:
During the free recall task, participants named, using a microphone, all objects they had encountered during the learning task. They were given two minutes and were instructed to name the objects in the order in which they came to mind.
In two distance judgment tasks, participants were asked to rate the spatial and temporal distance between pairs of objects (see Figure 1B). This task comprised 240 trials and lasted 45–70 min (depending on participants’ speed). There were two conditions: in one, participants rated the distance between objects with regard to their Euclidean spatial distance (spatial memory task); in the other, they rated the distance with regard to the time it had taken to walk from one object to the other (temporal memory task). During every trial, participants were shown two objects that they had seen in the learning task and then made their distance judgment by sliding a bar with the mouse on a range from 'close together' to 'far apart'. Participants were instructed to base their rating on the smallest and largest distances present in the task, i.e. the smallest distance in the task would correspond to the bar position closest to the 'close together' end of the scale whereas the largest distance would correspond to the 'far apart' end of the scale. To make the task easier for participants, conditions were sampled in 8 blocks: 30 trials of one condition were shown in one block, then there was a break of 20 s, and then the next block would start with 30 trials from the other condition. Before each block, the condition of the next block was shown and participants had to press a button to continue. In addition, either the cue 'space' or 'time' was displayed in every trial above the pair of objects to be rated. Whether the test started with the 'time' or the 'space' condition was counterbalanced across subjects. Participants were explicitly instructed that the spatial and temporal distance ratings were in some cases quite different from one another. The instruction explicitly mentioned that items could still be close together in space even when they were at different ends of the route (i.e.
far apart in time) due to the route leading back and forth through the city, and it was also pointed out that items could be far apart in space but close together in time due to the teleporters. Notably, this instruction was only given to participants immediately before the distance judgment task, i.e. after the imaging part was already concluded, so the effects we find in the fMRI data cannot be explained by this explicit instruction.
The final map test lasted approximately 5 min and involved 16 trials. In every trial, participants were shown one of the 16 objects encountered in the city. Then, they saw a schematic aerial view of Donderstown and had to indicate where in the city the object had been located by moving the mouse to the memorized location. To make sure participants could translate their first-person perspective of the city during the learning task to a top-down view, certain prominent landmarks in the city were marked with symbols and pointed out to them before the start of the memory task.
Before (picture viewing task, PVT pre) and after (PVT post) the learning task, pictures of 17 different objects (see Figure 1—figure supplement 1) were presented repeatedly on a black background. 16 of these objects were used in the learning task, while 1 object served only as a target and was not shown during learning. Whenever participants saw this target, they were asked to press a button, ensuring that they attended the stimuli throughout the PVT pre and post blocks. In 25 participants, the target was detected in 98.33 ± 4.17 percent of cases across both PVT pre and PVT post blocks. Due to malfunctioning of the button box in the PVT post block, target detection data from one participant could not be recorded.
During each picture viewing task, every object was shown 12 times, once in each of 12 blocks in pseudo-random order (see below). Between blocks, there was a 30 s break. In total, each of the two picture viewing tasks took 23 min and included 204 trials. In every trial, participants saw the picture for 2500 ms, followed by a fixation cross until the next trial started (intertrial interval, ITI). The next trial commenced either two or three TRs after the start of the previous trial (2 TRs in 50% of trials; 3 TRs in the other 50% of trials, with both types of inter-trial intervals, ITIs, randomly assigned to trials).
Since we were interested in the changes in pattern similarity that were the result of the learning task (and not due to spurious timing differences between the PVT pre and PVT post), we used identical PVT pre and PVT post blocks for every participant. More specifically, we created one 'recipe' for every participant. This recipe described which object was shown in which trial and also defined the distribution of the different ITI types. While the recipe for each participant was created to be semi-random (see below) and was, in fact, different for every participant, we re-used the recipe from PVT pre for PVT post within a participant, thereby ensuring that the two tasks were absolutely identical. In theory, participants could have realized that the order in the second task was repeated from the first, but as there were 204 trials, it seems unlikely that they remembered sequences of multiple items. In fact, none of the participants reported having noticed a specific order.
For creating the recipe, we took into account that every object was supposed to be shown only once in every block. Therefore, we shuffled the order of the 17 objects 12 times and concatenated the resulting vector. Then, we assigned an ITI of either 2 or 3 TRs to every trial, balancing the occurrence across the entire experiment. Lastly, we performed a one-factor ANOVA (with the 17 different objects as groups) on the positions of every object in this recipe. If this ANOVA was significant (meaning that the positions of at least one of the objects were consistently early or late in the blocks), we discarded the recipe and created a new one. This measure was taken to prevent huge imbalances in the ordering of the objects. As we always compared pattern similarity from the pre and the post blocks in our analysis (see below), any effects that are solely due to spurious order effects should be present in both sessions and therefore cannot explain differences in neural pattern similarity.
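A sketch of this recipe-generation procedure, using scipy's one-way ANOVA on the within-block positions of each object (the helper name and parameterization are ours):

```python
import numpy as np
from scipy import stats

def make_recipe(n_objects=17, n_blocks=12, alpha=0.05, seed=None):
    """Create a trial order in which every object appears once per block and
    no object's within-block positions are systematically early or late."""
    rng = np.random.default_rng(seed)
    positions = np.tile(np.arange(n_objects), n_blocks)  # position in block
    while True:
        # shuffle the objects independently within each block, concatenate
        order = np.concatenate(
            [rng.permutation(n_objects) for _ in range(n_blocks)])
        # one-factor ANOVA: do within-block positions differ between objects?
        groups = [positions[order == obj] for obj in range(n_objects)]
        _, p = stats.f_oneway(*groups)
        if p > alpha:        # discard recipes with significant order effects
            break
    # assign ITIs of 2 or 3 TRs, balanced across the whole experiment
    itis = np.array([2, 3] * (n_objects * n_blocks // 2))
    rng.shuffle(itis)
    return order, itis

order, itis = make_recipe(seed=0)  # 204 trials, re-used for PVT pre and post
```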
MRI data were collected on a 3T Siemens Skyra scanner (Siemens, Erlangen, Germany). A high-resolution 2D EPI sequence was used for functional scanning (TR = 2270 ms, TE = 24 ms, 40 slices, distance factor 13%, flip angle 85°, field of view (FOV) 210 × 210 × 68 mm³, voxel size 1.5 mm isotropic). The FOV was aligned to fully cover the medial temporal lobe, parts of ventral frontal cortex and (if possible) the calcarine sulcus. Functional images for PVT pre and PVT post were acquired in separate sessions. In addition to these partial-volume acquisitions, 10 scans of a functional wholebrain sequence were also acquired (usually in both sessions, but due to time pressure sometimes only in one session) to improve registration during preprocessing. The sequence settings were identical to the functional sequence above, but with 120 instead of 40 slices, leading to a longer TR (6804.1 ms). A 0.8 mm structural scan was acquired for every participant (TR = 2300 ms; TE = 315 ms; flip angle = 8°; in-plane resolution = 256 × 256 mm; number of slices = 224; voxel resolution = 0.8 × 0.8 × 0.8 mm³). Lastly, a gradient field map was acquired (for N = 21 participants) with a gradient echo sequence (TR = 1020 ms; TE1 = 10 ms; TE2 = 12.46 ms; flip angle = 90°; volume resolution = 3.5 × 3.5 × 2 mm; FOV = 224 × 224 mm).
Preprocessing of functional images was performed with FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/). Both the main functional scans and the short wholebrain functional scan were submitted to motion correction and high-pass filtering at 100 s. For those participants with a fieldmap scan, distortion correction was applied to both functional data sets. No spatial smoothing was performed. The two functional datasets (PVT pre and PVT post) were then both registered to the preprocessed mean image of one wholebrain scan (if a wholebrain scan was acquired in both sessions, the first wholebrain scan was used for both). This was done to ensure that voxels from these two separate sessions corresponded to the same anatomical location. The two brain masks from the PVT pre and PVT post blocks were also registered to the wholebrain space and intersected: only voxels covered in both sessions were analyzed in the next step. The wholebrain functional images were registered to the individual structural scans. The structural scans were in turn normalized to the MNI template (at 1 mm resolution). Grey-matter segmentation was performed on the structural images and the results were mapped back to the space of the wholebrain functional scan for later use in the analysis.
Representational similarity analysis (RSA; Kriegeskorte et al., 2008) was carried out separately for the PVT pre and PVT post blocks. The preprocessed scans were loaded into Matlab as 4D matrices. For every voxel, movement correction parameters were used as predictors in a GLM with the voxel time series as dependent variable. The residuals from this GLM (i.e. what could not be explained by motion) were then taken to the next analysis step. As the presentation of images in the PVT pre and post blocks was locked to the onset of a new volume (see above), the third volume after image onset was selected for every trial (effectively covering the time between 4540–6810 ms after stimulus onset). Only data for the 16 objects that were shown in the city were analyzed, discarding data for the target object. Data were then sorted according to object identity and repetition, yielding a 16 × 12 matrix for every voxel (16 objects, 12 repetitions). Resulting data were then subjected to two different types of analyses, (1) a region of interest (ROI) based analysis and (2) a searchlight analysis.
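The two per-voxel steps described above, regressing out motion parameters and selecting the third volume after each image onset, can be sketched as follows. This is an illustrative Python/NumPy version under our own naming (the original pipeline used Matlab); note that with TR = 2270 ms, the volume starting two TRs after onset covers 4540–6810 ms post-stimulus:

```python
import numpy as np

def motion_residuals(voxel_ts, motion_params):
    """Regress the motion parameters (plus an intercept) out of a single
    voxel's time series; return the residuals, i.e. the signal that
    cannot be explained by head motion."""
    X = np.column_stack([np.ones(len(voxel_ts)), motion_params])
    beta, *_ = np.linalg.lstsq(X, voxel_ts, rcond=None)
    return voxel_ts - X @ beta

def trial_volumes(residuals, onsets_tr, offset=2):
    """Select one volume per trial: the volume starting `offset` TRs after
    the onset volume (offset=2 covers 4540-6810 ms at TR = 2270 ms)."""
    return residuals[np.asarray(onsets_tr) + offset]
```

Applying `motion_residuals` voxel-wise and then `trial_volumes` per trial yields the trial-wise patterns that are sorted into the 16 × 12 (object × repetition) arrangement.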
Hippocampal masks were created using the hippocampal ROIs of the probabilistic Harvard-Oxford atlas provided by FSL (Desikan et al., 2006; Makris et al., 2006), thresholded at a probability level of 0.25. We generated one mask for the bilateral hippocampus, one for only the left, and one for only the right hippocampus. These masks were then coregistered to the subject-specific functional space. Then, the masks were intersected with the subject-specific grey-matter masks, leaving only grey-matter voxels in the hippocampus. The trial-wise values for every voxel within this mask were then extracted and the voxel pattern for every object in every repetition was correlated with the voxel pattern of itself and every other object in every other repetition. Thus, every trial was correlated with every other trial, except combinations of trials within the same one of the 12 blocks. Mean correlation coefficients for every possible pair of objects across repetitions were calculated, yielding a 16-by-16 cross-correlation matrix for every ROI and every PVT block. Subsequent analysis of this cross-correlation matrix (see below) was identical for the ROI and searchlight approach.
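The construction of the 16-by-16 cross-correlation matrix can be sketched as below, a hedged Python/NumPy illustration (names and the exact data layout are our own; the original analysis was implemented in Matlab). The key detail is that trial pairs from the same repetition block are excluded before averaging:

```python
import numpy as np

def cross_correlation_matrix(patterns, n_objects=16, n_reps=12):
    """patterns: (n_objects, n_reps, n_voxels) array of trial-wise voxel
    patterns. Correlate every trial with every other trial, excluding
    pairs from the same block, and average the correlations per object
    pair into an n_objects x n_objects matrix."""
    flat = patterns.reshape(n_objects * n_reps, -1)
    corr = np.corrcoef(flat)                       # all trial-pair correlations
    obj = np.repeat(np.arange(n_objects), n_reps)  # object label per trial
    rep = np.tile(np.arange(n_reps), n_objects)    # block label per trial
    out = np.full((n_objects, n_objects), np.nan)
    for i in range(n_objects):
        for j in range(n_objects):
            # pairs of trials showing objects i and j in *different* blocks
            mask = (obj[:, None] == i) & (obj[None, :] == j) \
                   & (rep[:, None] != rep[None, :])
            out[i, j] = corr[mask].mean()
    return out
```

The diagonal of the result contains each object's similarity to itself across repetitions; the off-diagonal cells contain the pairwise object similarities used in the subsequent analyses.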
Instead of including all voxels within an anatomically defined region, the searchlight analysis considered all voxels in a sphere around a given voxel, allowing a more regionally specific analysis across the entire field of view. Around every voxel of the subject-specific combined brain mask, a sphere was formed with a radius of 6 voxels (9 mm). Within this sphere, only grey-matter voxels were considered. If fewer than 30 voxels remained, the searchlight was not analysed further. Within every valid searchlight, the approach was analogous to the ROI analysis: the voxel pattern for every object in every repetition was correlated with the voxel pattern of itself and every other object in every other repetition. So, again, every trial was correlated with every other trial, except combinations of trials within the same block. Mean correlation coefficients for every possible pair of objects across repetitions were calculated, yielding a 16-by-16 cross-correlation matrix for every searchlight.
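The searchlight voxel selection, a 6-voxel-radius sphere restricted to grey matter with a 30-voxel minimum, could be sketched like this (an illustrative Python/NumPy version with our own function names; the original code was in Matlab):

```python
import numpy as np

def sphere_offsets(radius=6):
    """Integer voxel offsets forming a sphere of the given radius (voxels)."""
    r = np.arange(-radius, radius + 1)
    dx, dy, dz = np.meshgrid(r, r, r, indexing='ij')
    keep = dx**2 + dy**2 + dz**2 <= radius**2
    return np.column_stack([dx[keep], dy[keep], dz[keep]])

def searchlight_voxels(center, grey_mask, radius=6, min_voxels=30):
    """Grey-matter voxel coordinates within a sphere around `center`.
    Returns None if fewer than `min_voxels` survive (searchlight skipped)."""
    coords = sphere_offsets(radius) + np.asarray(center)
    # drop coordinates falling outside the imaging volume
    inside = np.all((coords >= 0) & (coords < grey_mask.shape), axis=1)
    coords = coords[inside]
    grey = coords[grey_mask[coords[:, 0], coords[:, 1], coords[:, 2]]]
    return grey if len(grey) >= min_voxels else None
```

Each valid searchlight's voxel set would then be fed into the same cross-correlation procedure as the ROI analysis.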
The 16-by-16 cross-correlation matrices for every ROI and every searchlight reflected the mean pattern similarity between pairs of objects. We calculated similarity matrices separately for the PVT pre and the PVT post block, one reflecting the neural response to the objects before participants had seen them in the virtual city context, the other reflecting the neural response to the objects after the spatio-temporal relationships between objects had been learned. Subtracting the PVT pre similarity matrix from the PVT post matrix resulted in a matrix that reflected the change in pattern similarity for all pairwise comparisons of items that is due to the learning task. In a second step, the pattern similarity difference matrix (PS’ matrix) for every ROI and every searchlight was related to an external variable derived from the behavioral tasks, for example the participant-specific remembered spatial distances for all pairs of items, which were derived from the distance judgment performed outside of the scanner at the end of the experiment. In our main analyses, we correlated the PS’ matrix with the remembered spatial and temporal distances as derived from participants’ responses in the memory test, as well as with the combination of the spatial and temporal distance, using Spearman correlation. The magnitude of the resulting correlation coefficient describes the fit between the PS’ matrix, i.e. the pattern similarity increase from PVT pre to PVT post, and a given variable. For the searchlight analysis, the correlation coefficient describes the relationship only for the given searchlight, and it is assigned to the center voxel of the searchlight. Iterating through all searchlights in the field of view, this results in a brain map of correlation coefficients for a participant that can then be taken to second-level testing (see Figure 4—figure supplement 1 for an illustration of this).
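The core model-fit step, subtracting the pre from the post similarity matrix and Spearman-correlating the result with a behavioral distance matrix, can be sketched as follows. This is an illustrative Python/SciPy version; restricting to the lower triangle (unique object pairs) is our assumption about the implementation:

```python
import numpy as np
from scipy import stats

def model_fit(cc_pre, cc_post, distance_matrix):
    """Spearman correlation between the pattern-similarity change
    (post minus pre) and a behavioral distance matrix, using the
    unique object pairs (lower triangle, excluding the diagonal)."""
    ps_change = cc_post - cc_pre            # the PS' matrix
    tri = np.tril_indices_from(ps_change, k=-1)
    rho, _ = stats.spearmanr(ps_change[tri], distance_matrix[tri])
    return rho
```

A negative `rho` indicates that objects remembered as closer in space or time show a larger learning-related increase in pattern similarity, the direction predicted by the study.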
For the ROI analyses, the correlation coefficient reflects the relationship between pattern similarity increase and the external variable for the entire ROI. Here, we applied an additional bootstrapping procedure to estimate the strength of the relationship (for the searchlight approach, this would have been computationally too demanding). For this, we shuffled the two matrices that were used to compute the correlation coefficient (PS’ and the external variable) against one another, so that the relationship between them was random. We then calculated a Spearman correlation for these shuffled data and repeated the procedure 10,000 times. Thus, for every participant, we obtained a surrogate distribution of Spearman correlation coefficients based on shuffled data, and compared our real correlation coefficient for a given ROI against this random distribution (see Figure 4 for an illustration of this procedure). In this way, we obtained a z-statistic for every participant, reflecting how typical or extreme the real correlation coefficient was when compared to the random distribution. The resulting z-statistic was then tested across participants for the ROI analyses.
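The shuffling procedure amounts to a permutation test on the pairwise vectors; a minimal Python/SciPy sketch (our own naming; `n_perm` would be 10,000 in the actual analysis) might look like this:

```python
import numpy as np
from scipy import stats

def permutation_z(ps_change_vec, distance_vec, n_perm=10000, seed=0):
    """z-statistic of the observed Spearman correlation relative to a
    surrogate distribution obtained by shuffling one vector against
    the other n_perm times."""
    rng = np.random.default_rng(seed)
    real, _ = stats.spearmanr(ps_change_vec, distance_vec)
    surrogate = np.empty(n_perm)
    for i in range(n_perm):
        # break the pairing between the two vectors at random
        surrogate[i], _ = stats.spearmanr(
            ps_change_vec, rng.permutation(distance_vec))
    # how many surrogate standard deviations the real value lies from chance
    return (real - surrogate.mean()) / surrogate.std()
```

The participant-wise z-statistics produced this way are what enter the second-level sign-flip test described below.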
Because temporal and spatial distance estimates were correlated in some participants’ behavioral ratings, it could be that effects we find for one domain can be explained by the correlation with the other domain, and that the domain has no unique contribution to the effect. Therefore, we performed a control analysis (both for the ROI analysis and for the searchlight analysis) in which we investigated whether there were unique contributions of the two domains: Before correlating pattern similarity increases with the remembered temporal distances, we performed a GLM in which we entered remembered spatial distance as the predictor and pattern similarity increases as the criterion. Then, we took the residuals of this GLM (i.e. variance not explained by remembered spatial distance) and correlated these residuals with the remembered temporal distances. Conversely, before we correlated pattern similarity increases with the remembered spatial distances, we performed a GLM modeling the effects of remembered temporal distances and calculated the correlation with remembered spatial distances on the residuals of this analysis. We then took these correlation coefficients to the same second-level analysis as described below for the other analyses.
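This residualization control is essentially a semi-partial correlation; a hedged Python/NumPy sketch (function name is ours) of one direction of the analysis:

```python
import numpy as np
from scipy import stats

def partial_spearman(ps_change, target_dist, covariate_dist):
    """Correlate pattern-similarity change with one distance measure after
    regressing out the other: fit a GLM with the covariate (plus intercept)
    as predictor and the similarity change as criterion, then Spearman-
    correlate the residuals with the target distances."""
    X = np.column_stack([np.ones(len(ps_change)), covariate_dist])
    beta, *_ = np.linalg.lstsq(X, ps_change, rcond=None)
    residuals = ps_change - X @ beta   # variance not explained by covariate
    rho, _ = stats.spearmanr(residuals, target_dist)
    return rho
```

Calling it once with spatial distance as covariate (testing time) and once with temporal distance as covariate (testing space) reproduces both directions of the control analysis.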
For the ROI analysis, the z-statistics were averaged across participants and tested for significant deviation from zero using a non-parametric approach: the observed mean z-statistic was compared to a distribution derived by performing random sign-flips on the participant-specific values and averaging them. This was repeated 10,000 times, yielding a null distribution of average z-statistics with random signs. Then, the real average was tested against this distribution of random averages, and the resulting z-statistic reflects how much the real average deviates from the random distribution. This procedure closely follows methods applied in other studies using representational similarity analysis (Schlichting et al., 2015; Stelzer et al., 2013). For the ROI approach, we also corrected for 15 multiple comparisons (3 ROIs × 5 effects) by applying false discovery rate correction (Benjamini and Yekutieli, 2001) to the probability values from the non-parametric second-level test; reported p-values are FDR-corrected.
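The second-level sign-flip test can be sketched in a few lines (an illustrative Python/NumPy version; `n_perm` would be 10,000 in the actual analysis):

```python
import numpy as np

def sign_flip_test(z_per_subject, n_perm=10000, seed=0):
    """Second-level test: compare the observed mean z-statistic to a null
    distribution built by randomly sign-flipping each participant's value
    before averaging; returns the z of the real mean under that null."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z_per_subject, dtype=float)
    real_mean = z.mean()
    # n_perm random sign assignments across participants
    signs = rng.choice([-1.0, 1.0], size=(n_perm, len(z)))
    null_means = (signs * z).mean(axis=1)
    return (real_mean - null_means.mean()) / null_means.std()
```

A strongly negative output indicates that participants' correlation coefficients are consistently below chance, the direction expected if closer remembered distances go with larger pattern-similarity increases.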
For the searchlight analysis, the resulting correlation coefficient brain maps were tested in a second-level analysis to identify regions in which correlation coefficients consistently differed from zero across participants, using a non-parametric equivalent of the one-sample t-test implemented in the randomise package provided by FSL (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Randomise; Winkler et al., 2014), with 5000 random sign-flips and threshold-free cluster enhancement. Results reported above and denoted with pcorr are corrected with FSL-randomise using a small volume correction for the bilateral hippocampus, based on the same hippocampus mask that was used in the ROI approach. Note that in both the ROI and the searchlight approach, we tested whether correlation coefficients were consistently negative, as we expected low spatial or temporal distance to be associated with high pattern similarity in the brain.
The control of the false discovery rate in multiple testing under dependency. The Annals of Statistics 29:1165–1188.
Memory hierarchies map onto the hippocampal long axis in humans. Nature Neuroscience 18:1562–1564. https://doi.org/10.1038/nn.4138
Complementary roles of human hippocampal subregions during retrieval of spatiotemporal context. Journal of Neuroscience 34:6834–6842. https://doi.org/10.1523/JNEUROSCI.5341-13.2014
How the hippocampus preserves order: the role of prediction and context. Trends in Cognitive Sciences 19:92–99. https://doi.org/10.1016/j.tics.2014.12.004
The medial temporal lobe and recognition memory. Annual Review of Neuroscience 30:123–152. https://doi.org/10.1146/annurev.neuro.30.051606.094328
Time cells in the hippocampus: a new dimension for mapping memories. Nature Reviews Neuroscience 15:732–744. https://doi.org/10.1038/nrn3827
Decoding neuronal ensembles in the human hippocampus. Current Biology 19:546–554. https://doi.org/10.1016/j.cub.2009.02.033
Evidence for holistic episodic recollection via hippocampal pattern completion. Nature Communications 6:7462. https://doi.org/10.1038/ncomms8462
Hippocampal sequence-encoding driven by a cortical multi-item working memory buffer. Trends in Neurosciences 28:67–72. https://doi.org/10.1016/j.tins.2004.12.001
Hippocampal amnesia impairs all manner of relational memory. Frontiers in Human Neuroscience 2:e15. https://doi.org/10.3389/neuro.09.015.2008
The human hippocampus: cognitive maps or relational memory? Journal of Neuroscience 25:7254–7259. https://doi.org/10.1523/JNEUROSCI.1103-05.2005
Roles of human hippocampal subfields in retrieval of spatial and temporal context. Behavioural Brain Research 278:549–558. https://doi.org/10.1016/j.bbr.2014.10.034
Distinct hippocampal time cell sequences represent odor memories in immobilized rats. Journal of Neuroscience 33:14607–14616. https://doi.org/10.1523/JNEUROSCI.1537-13.2013
Decreased volume of left and total anterior insular lobule in schizophrenia. Schizophrenia Research 83:155–171. https://doi.org/10.1016/j.schres.2005.11.020
Mnemonic networks in the hippocampal formation: From spatial maps to temporal and conceptual codes. Journal of Experimental Psychology: General 142:1231–1241. https://doi.org/10.1037/a0033746
Insight reconfigures hippocampal-prefrontal memories. Current Biology 25:821–830. https://doi.org/10.1016/j.cub.2015.01.033
Hippocampal synaptic plasticity: role in spatial learning or the automatic recording of attended experience? Philosophical Transactions of the Royal Society 352:1489–1503. https://doi.org/10.1098/rstb.1997.0136
Place cells, grid cells, and the brain's spatial representation system. Annual Review of Neuroscience 31:69–89. https://doi.org/10.1146/annurev.neuro.31.061307.090723
The Hippocampus as a Cognitive Map. Oxford: Clarendon Press.
Interplay of hippocampus and prefrontal cortex in memory. Current Biology 23:R764–R773. https://doi.org/10.1016/j.cub.2013.05.041
Learning-related representational changes reveal dissociable integration and separation signatures in the hippocampus and prefrontal cortex. Nature Communications 6:8151. https://doi.org/10.1038/ncomms9151
Loss of recent memory after bilateral hippocampal lesions. Journal of Neurology, Neurosurgery, and Psychiatry 20:11–21. https://doi.org/10.1136/jnnp.20.1.11
Episodic memory: from mind to brain. Annual Review of Psychology 53:1–25. https://doi.org/10.1146/annurev.psych.53.100901.135114
Elements of Episodic Memory. Oxford: Clarendon Press.
Abstract representations of location and facing direction in the human brain. Journal of Neuroscience 33:6133–6142. https://doi.org/10.1523/JNEUROSCI.3873-12.2013
Lila Davachi, Reviewing Editor; New York University, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "An event map of memory space in the hippocampus" for consideration by eLife. Your article has been reviewed by two peer reviewers, one of whom, Kenneth A. Norman (Reviewer #2), has agreed to reveal his identity, and the evaluation has been overseen by a Reviewing Editor and Timothy Behrens as the Senior Editor.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
All the reviewers agreed that the research question addresses an important and timely issue regarding how space and time might be represented in the brain. The reviewers also agree that the learning paradigm is clever, especially with respect to the use of teleporters to disentangle actual temporal distance from spatial distance. Indeed, this is the most compelling aspect of the design.
However, the reviewers all agree that new analyses and descriptions of current data need to be included for the paper to be suitable for publication. The relevant issues are presented in more detail below but are summarized here. (1) The data analyses and discussion with respect to whether temporal and spatial memory are independent require more attention. (2) Analyzing the data based on actual (objective) spatial and temporal distance is required, either way the data work out, as that will help to address whether the hippocampus is building an accurate map or only a memorial (less accurate) one. (3) Several other analytic details should be addressed; please see below for details on the requested data analyses and on reporting regarding how independent the memory representations are and whether the hippocampus also represents objective spatial and temporal distance.
Independence of space and time (reviewer comments):
1) The authors suggest in the Discussion that the spatial and temporal dimensions are represented separately. However, it is not clear that they show this, nor am I certain that this is what the authors believe. Although they show that the two dimensions both contribute to neural similarity, this does not mean they are separate representations. For example, they could be additive within a single dimension and still separately contribute to the effect. The results, as presented, appear to be compatible with this "single dimension" idea. If the authors are comfortable with this idea, they can just state this in the paper as a possible explanation, and I would be totally fine with that. If the authors want to show that space and time are truly represented separately in the anterior hippocampus, they would need to show that some regions are significantly more affected by time than space, and vice-versa (currently, they just show that the regions showing a significant time effect do not completely overlap with the regions showing a significant space effect).
2) It's interesting that temporal and spatial distance judgments were correlated in memory, particularly since object pairs were assessed for spatial and temporal distance on separate trials. I don't think such an effect has been shown before and it has interesting implications for overlap in their representation. However, it could also be an inferential bias, since spatial and temporal distances tend to be correlated in the real world. Were participants ever told that the two distances were not related? Also, a couple of additional analyses might help unpack this data. In particular, you should be able to get a sense for which memory is biasing the other by examining when the direction of error is towards the other distance type (e.g., a spatially close trial judged as further when the temporal distance is longer), or whether the source of the bias is whichever was tested first in the distance test (was this counterbalanced across pairs?).
3) The authors try to remove the shared variance in the MPS analysis. However, I am not sure this will totally solve the issue. Since the two are collinear, I am not sure how you can remove variance from a different task with fMRI and MPS, which is based on the change in object correlations before and after. I guess I would have preferred to see a more detailed consideration, behaviorally, of exactly how and in what manner the two are correlated vs. independent.
4) I think the spatial and temporal combination finding is based on partial overlap of the searchlight clusters. But this doesn't prove that the combined information is stored independently and together. It merely shows the clusters have some functional overlap; there could be different neurons responding based on different voxel patterns. I think this analysis needs more work-up and consideration. If the clusters overlap, are the correlations themselves the same? The variance in the patterns? Again, more details and consideration are needed here to reach this conclusion.
Objective Distance Analysis:
1) One nice feature of this study is the independent manipulation of spatial and temporal (or sequential) distance that could allow for the same analysis to be performed on objective distances rather than mnemonic, or subjective, distances. From a theoretical standpoint, it would be interesting to know which is more predictive of the neural representation. In addition, as the memory measures of distance are correlated, using the objective measures may allow better dissociation between temporal and spatial distance. One challenge in doing this analysis is the absolute temporal distance is not fixed because subjects could move at different speeds. However, since that data is recorded, it should be possible to get the true experimental distance. Alternatively, since the authors designed the study as a full factorial (fully crossing high and low distance in space and time), they could analyze the similarity data in this discretized way rather than using a continuous measure of distance. This addition would help with interpretation of the data and also could add significant novelty, as a dissociation of objective versus subjective distance representations has not been shown before.
1) Sphere Size: It is unclear why the authors chose the sphere size they did for the searchlight analysis. It seems that a 9 mm radius is quite large for looking within the hippocampus. In particular, the posterior hippocampus can be thin and not spherical in shape. Thus, one possibility is that the specificity of the results to the anterior hippocampus could be due to the choice of sphere shape and size. I would recommend iterating through various searchlight sizes to address this concern or adopting the anatomical ROI procedure (splitting the hippocampus into thirds along the long axis) as in the Nielson study. Relatedly, given that the authors have high-resolution data, hippocampal subfield segmentation would be a complementary and perhaps more principled way of investigating within-hippocampus specificity of distance effects.
2) Several details of the analyses were not clear to me and need to be explained in significantly more detail. In Figure 4, it is explained that the correlation coefficients between the RSA change from pre to post navigation, were correlated with the spatial and temporal distance judgments, and then assigned to the center of a search light sphere. I am at a loss in terms of understanding what was done here. If such a correlation exists in the data, the authors really need to show the different distances and RSA values on a scatter plot. It is also unclear to me why some analyses were done for the whole hippocampus ROI and why others were done with a searchlight. Will these methodologies converge? Along these lines, the authors should also show plots with RSA values (ideally, the raw averaged correlations) for near vs. far spatial and temporal distances and their change. It is very difficult to follow exactly how the z values were obtained in the analysis and this needs more explication. Overall, the figures should display more of the raw data (at least, data in an early form) to walk the reader through what was obtained and in what manner. I suggest adding as many new figures to the actual main figures to make these points more transparent and convince the readers of the important findings here.
3) The authors make a strong statement about anterior vs. posterior hippocampus but at least one of the clusters spans both anterior and medial hippocampus (Figure 6B). Given that none of the analyses were done in native space (as far as I can understand, please see Yassa and Stark 2009 Neuroimage for a detailed discussion of this issue in the hippocampus) and the slice thickness was around 1.5, can the authors be confident they are in anterior hippocampus? I don't think the anterior argument adds much to the paper here and the authors might consider deleting it.
1) In the Introduction, the authors briefly mention two fMRI studies investigating temporal coding in humans. This seems to be an under-representation of a literature that has been growing rapidly, as indicated by recent reviews of both human and cross-species work (e.g., Eichenbaum, 2014; Davachi and DuBrow, 2015; Ranganath & Hsieh, 2016). However, the bigger issue is that the main findings of the two highlighted papers are inaccurately described. The Hsieh paper doesn't show only that items closer together have greater pattern similarity (in fact, this would be meaningless due to autocorrelation). Rather, they show that hippocampal pattern similarity is sensitive to the conjunction of item identity and temporal position (by showing increased similarity across sequences only for items that share both features). The authors also left out the main (and most relevant) finding of the Ezzyat study, which is that hippocampal pattern similarity tracked subjective temporal distance across boundaries. Without this, it's not clear what the study has to do with temporal coding at all. Also, as this latter study investigated patterns at encoding of trial-unique sequences, the last sentence of that paragraph is not technically correct.
2) A previous paper by Kyle et al. 2015 Behavioral Brain Research showed similar findings for near vs. far temporal distances but the opposite pattern for spatial distances (far RSA > near). This paper should be discussed and overstatements about the novelty here should be avoided in light of this paper although I think differences in the paradigms can probably account for the differences in findings for near vs. far space. Similarly, a recent paper by Copara et al. 2014 J Neurosci showed pattern separation of independent spatial and temporal information during correct memory retrieval following navigation, as is also shown (in part here). This paper should also be properly contextualized.
[Editors' note: further revisions were requested prior to acceptance, as described below.]
Thank you for resubmitting your work entitled "An event map of memory space in the hippocampus" for further consideration at eLife. Your revised article has been favorably evaluated by Timothy Behrens (Senior editor), a Reviewing editor, and two reviewers.
The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below.
While the revision was very responsive, there remain some further adjustments of the data presentation and tempering of claims that should be addressed before we can proceed with the manuscript. The specific requested changes are appended here.
Reviewer comments, edited by the reviewing editor to highlight remaining issues:
1) The single subject pattern similarity plots are overkill and also a bit hard to decipher. If the question the authors are trying to address is about the correspondence between the patterns for ROI (left vs. right hippocampus) and representation type (space vs. time), it might make more sense to plot the correlations between those things across subjects rather than to essentially ask readers to perform that correlation by eye. Having said this, I would be fine with the authors just leaving out these plots entirely.
2) I found the objective distance analysis somewhat unconvincing. Only the space factor leads to significant effects and objective time apparently does not have effects. There were also no additional analyses to really compare with the subjective ratings and given that the two are somewhat correlated anyway, I am not sure what new information is gained here. This analysis also leads to the confusing conclusion that right hippocampus coded remembered distances in both space and time while left hippocampus coded objective and remembered distances in space. This doesn't quite fit with their results in some places (as shown in Figure 5 for the ROI analysis) and has no real precedent in the literature. Laterality findings are also notoriously difficult to replicate with fMRI and overall have not told a coherent story in the literature. I would suggest deleting these speculations and simply focusing on the role of the hippocampus in remembered spatial and temporal distances.
3) I still have concerns about the argument that effects are "localized" to anterior hippocampus. I realize this may fit with some of the arguments from past papers from this lab but it doesn't particularly fit given the data here. The combination of space and time analysis (subsection “Neural changes are modulated strongly by the combination of space and time”) shows a cluster that spans anterior and medial hippocampus. Also, Figure 5—figure supplement 1 clearly shows that the effects are trending in medial and posterior portions and partly drive the effect overall for the hippocampal ROI. Thus, it is not really correct to focus on anterior here because the authors haven't shown a double dissociation (as they did in their recent Nature Neuroscience paper). It could simply be that the effect is stronger in anterior, perhaps due to features of the scan acquisition. Without an interaction effect (dissociation), subject-specific ROIs (rather than using atlases, as is stated in the Methods), and with at least one cluster that spans both anterior and medial, references to anterior specialization should be deleted. Another reason for this is that the majority of spatial effects tend to be in posterior; it is not clear why the spatial distance cluster is in anterior (Figure 5 and Figure 5—figure supplement 1), and this does not fit with the literature overall. I therefore suggest much more caution with speculations about anterior/posterior specialization when the methods and data can’t really support this.
4) There continue to be instances of overstatement ("highly" significant: note that an effect either crosses the stated threshold or not). I think another example of this is the argument that this paper somehow tests an issue neglected in past work, which is the connection between episodic memory and navigation. Technically, there is no neural analysis of data during encoding or recollection and thus the paper doesn't exactly address episodic memory specifically, more how representation of spatial metrics change as a function of navigation. While the authors do relate to later memory performance, I think there should be a little more caution in overstating the novelty of their findings. I consider this point though relative minor.
5) Finally, the authors should be careful not to overstate the dissociation between space and time in their paradigm. The subsequent analyses did tease these out although the two variables were in fact correlated in a significant number of subjects. Some restatement of this point in the Discussion is probably warranted. https://doi.org/10.7554/eLife.16534.017
- Christian F Doeller
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
This work was supported by the European Research Council (ERC-StG 261177) and the Netherlands Organisation for Scientific Research (NWO-Vidi 452-12-009 and NWO-MaGW 406-14-114). The authors would like to thank A Vicente-Grabovetsky and B Steemers for help with programming the virtual navigation task and B Milivojevic and S Collin for helpful suggestions on the manuscript.
Human subjects: All participants gave written informed consent, filled in a screening form to ensure they did not meet any exclusion criteria for fMRI and were compensated for their time. The study was approved by the local ethics committee (CMO Regio Arnhem-Nijmegen).
- Lila Davachi, Reviewing Editor, New York University, United States
- Received: March 31, 2016
- Accepted: September 7, 2016
- Version of Record published: October 6, 2016 (version 1)
© 2016, Deuker et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.