Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Thorsten Kahnt, National Institute on Drug Abuse Intramural Research Program, Baltimore, United States of America
- Senior Editor: Michael Frank, Brown University, Providence, United States of America
Reviewer #1 (Public review):
Summary:
In this paper, Qiu et al. developed a novel spatial navigation task to investigate the formation of multi-scale representations in the human brain. Over multiple sessions and diverse tasks, participants learned the location of 32 objects distributed across 4 different rooms. The key task was a "judgement of relative direction" task delivered in the scanner, which was designed to assess whether object representations reflect local (within-room) or global (across-room) similarity structures. In between the two scanning sessions, participants received extensive further training. The goal of this manipulation was to test how spatial representations change with learning.
Strengths:
The authors designed a very comprehensive set of tasks in virtual reality to teach participants a novel spatial map. The spatial layout is well-designed to address the question of interest in principle. Participants were trained in a multi-day procedure, and representations were assessed twice, allowing the authors to investigate changes in the representation over multiple days.
Weaknesses:
Unfortunately, I see multiple problems with the experimental design that make it difficult to draw conclusions from the results.
(1) In the JRD task (the key task in this paper), participants were instructed to imagine standing in front of the reference object and judge whether the second object was to their left or right. The authors assume that participants solve this task by retrieving the corresponding object locations from memory, rotating their imagined viewpoint and computing the target object's relative orientation. This is a challenging task, so it is not surprising that participants do not perform particularly well after the initial training (performance between 60-70% accuracy). Notably, the authors report that after extensive training, participants reached more than 90% accuracy.
However, I wonder whether participants indeed perform the task as intended by the authors, especially after the second training session. A much simpler behavioural strategy is memorising the mapping between a reference object and an associated button press, irrespective of the specific target object. This basic strategy should lead to quite high success rates, since the same direction is always correct for four of the eight objects (the two objects located at the door and the two opposite the door). For the four remaining objects, the correct button press is still the same for four of the six target objects that are not located opposite to the reference object. Simply memorising the button press associated with each reference object should therefore lead to a high overall task accuracy without the necessity to mentally simulate the spatial geometry of the object relations at all.
I also wonder whether the random effect coefficients might reflect interindividual differences in such a strategy shift - someone who learnt this relationship between objects and buttons might show larger increases in RTs compared to someone who did not.
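To make the arithmetic behind the memorisation strategy in point (1) concrete, the following is a minimal sketch of the accuracy it would yield. It rests on assumptions that are not stated in the review: every reference object is tested equally often, target objects are sampled uniformly, and trials with a target directly opposite the reference are left out. The variable names are illustrative only.

```python
from fractions import Fraction

# Counts taken from the reviewer's description of one room (8 objects).
n_reference_objects = 8
always_same_button = 4   # references for which one button is always correct
mixed_references = n_reference_objects - always_same_button

# For the remaining references, 4 of the 6 non-opposite targets share a side.
# ASSUMPTION: targets are sampled uniformly and opposite targets are excluded.
mixed_hit_rate = Fraction(4, 6)

# ASSUMPTION: every reference object is tested equally often.
expected_accuracy = (
    always_same_button * Fraction(1) + mixed_references * mixed_hit_rate
) / n_reference_objects

print(f"Expected accuracy of a fixed button per reference: {float(expected_accuracy):.3f}")
```

Under these assumptions the strategy is correct on five of every six trials, well above chance without any spatial computation. A different sampling scheme would change the exact figure.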
(2) On a related note, the neural effect that appears to reflect the emergence of a global representation might be more parsimoniously explained by the formation of pairwise associations between reference and target objects. Since both objects always came from the same room, an RDM reflecting how many times an object pair acted as a reference-target pair will correlate with the categorical RDM reflecting the rooms corresponding to each object. Since the categorical RDM is highly correlated with the global RDM, this means that what the authors measure here might not reflect the formation of a global spatial map, but simply the formation of pairwise associations between objects presented jointly.
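The confound raised in point (2) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 2 x 2 room arrangement, the object coordinates and the randomly drawn pair counts are all hypothetical and are not taken from the manuscript. It builds three model RDMs (room identity, global Euclidean distance, and reference-target pairing frequency) and correlates their upper triangles.

```python
import numpy as np

N_ROOMS, PER_ROOM = 4, 8
n = N_ROOMS * PER_ROOM
room = np.repeat(np.arange(N_ROOMS), PER_ROOM)
same_room = room[:, None] == room[None, :]

# HYPOTHETICAL layout: four 4x4 rooms on a 2x2 grid, objects on the walls.
offsets = np.array([(-2, -2), (0, -2), (2, -2), (-2, 0),
                    (2, 0), (-2, 2), (0, 2), (2, 2)], dtype=float)
centers = np.array([(0, 0), (10, 0), (0, 10), (10, 10)], dtype=float)
positions = (centers[:, None, :] + offsets[None, :, :]).reshape(-1, 2)

# Model RDMs: global Euclidean distance and categorical room identity.
global_rdm = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
room_rdm = (~same_room).astype(float)

# HYPOTHETICAL pairing counts: pairs are only ever sampled within a room.
rng = np.random.default_rng(0)
counts = np.triu(rng.integers(0, 4, size=(n, n)), k=1)
counts = counts + counts.T
counts[~same_room] = 0
pair_rdm = (counts.max() - counts).astype(float)  # frequent pairing = low dissimilarity

iu = np.triu_indices(n, k=1)

def rdm_corr(a, b):
    """Pearson correlation between the upper triangles of two RDMs."""
    return float(np.corrcoef(a[iu], b[iu])[0, 1])

r_room_global = rdm_corr(room_rdm, global_rdm)
r_pair_room = rdm_corr(pair_rdm, room_rdm)
r_pair_global = rdm_corr(pair_rdm, global_rdm)
print(r_room_global, r_pair_room, r_pair_global)
```

In this toy layout all three models are positively correlated, so a fit to the global model alone cannot separate a global map from room membership or pairing history. The size of the overlap in the real design depends on the actual coordinates and trial list.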
(3) In general, the authors attribute changes in neural effects to new learning. But of course, many things can change between sessions (expectancy, fatigue, change in strategy, but also physiological factors...). Baseline physiological effects are less likely to influence patterns of activity, so the RSA analyses should be less sensitive to this problem. However, the basic differences in activation for the contrast of post-learning > pre-learning stages in the judgment of relative direction (JRD) task could in theory just reflect baseline differences in blood oxygenation, possibly due to differences in time of day, caffeine intake, sleep, etc. To really infer that any change in activity or representation is due to learning, an active control would have been great.
(4) RSA typically compares voxel patterns associated with specific stimuli. However, the authors always presented two objects on the screen simultaneously. From what I understand, this is not considered in the analysis ("The β-maps for each reference object were averaged across trials to create an overall β-map for that object."). Furthermore, participants were asked to perform a complex mental operation on each trial ("imagine standing at A, looking at B, then perform the corresponding motor response"). Assuming that participants did this (although see points 1 and 2 above), this means that the resulting neural representation likely reflects a mixture of the two object representations, the mental transformation and the corresponding motor command, and possibly additionally the semantic and perceptual similarity between the two presented words. This means that the βs taken to reflect the reference object representation must be very noisy.
This problem is aggravated by two additional points. Firstly, not all object pairs occurred equally often, because only a fraction of all potential pairs were sampled. If the selection of the object pairs is not carefully balanced, this could easily lead to sampling biases, which RSA is highly sensitive to.
Secondly, the events in the scanner are not jittered. Instead, they are phase-locked to the TR (1.2 sec TR, 1.2 sec fixation, 4.8 sec stimulus presentation). This means that every object onset starts at the same phase of the image acquisition, making HRF sampling inefficient and hurting trial-wise estimation of betas used for the RSA. This likely significantly weakens the strength of the neural inferences that are possible using this dataset.
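The phase-locking concern can be checked with simple arithmetic on the timings quoted above (1.2 s TR, 1.2 s fixation, 4.8 s stimulus). The sketch below assumes that trials run back to back and that the first trial starts with the first volume. The jittered fixation durations are hypothetical and serve only as a contrast. Times are in integer milliseconds to avoid floating-point remainders.

```python
TR_MS = 1200
FIXATION_MS = 1200
STIMULUS_MS = 4800

def stimulus_phases(fixation_durations_ms):
    """Return each stimulus onset's offset (in ms) within its TR."""
    phases = []
    t = 0  # trial start in ms; ASSUMPTION: first trial starts with the first volume
    for fixation in fixation_durations_ms:
        onset = t + fixation
        phases.append(onset % TR_MS)
        t = onset + STIMULUS_MS  # ASSUMPTION: trials follow each other without gaps
    return phases

n_trials = 8

# Design as described in the review: constant fixation, so each trial spans 5 TRs.
fixed_phases = stimulus_phases([FIXATION_MS] * n_trials)

# HYPOTHETICAL jittered alternative: fixation cycles through four durations.
jitter_set_ms = [900, 1200, 1500, 1800]
jittered_phases = stimulus_phases(
    [jitter_set_ms[i % len(jitter_set_ms)] for i in range(n_trials)]
)

print("fixed:   ", fixed_phases)
print("jittered:", jittered_phases)
```

With a constant 6 s trial, every onset falls at the same point in the acquisition cycle, so the haemodynamic response is always sampled at the same delays. Varying the fixation duration spreads the onsets across the TR.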
(5) It is not clear why the authors focus their report of the results in the main manuscript on the preselected ROIs instead of showing whole-brain results. This can be misleading, as it provides the false impression that the neural effects are highly specific to those regions.
(6) I am missing behavioural support for the authors' claims.
Overall, I am not convinced that the main conclusion that global spatial representations emerge during learning is supported by the data. Unfortunately, I think there are some fundamental problems in the experimental design that might make it difficult to address the concerns.
However, if the authors can provide convincing evidence for their claims, I think the paper will have an impact on the field. The question of how multi-scale representations are represented in the human brain is a timely and important one.
Reviewer #2 (Public review):
Summary:
Qiu and colleagues studied human participants who learned the locations of 32 different objects distributed across 4 different rooms in a common spatial environment. Participants were extensively trained on the object locations, and fMRI scans were acquired during a relative direction judgement task in a pre- and a post-learning session. Using RSA, the authors report that the hippocampus increased global relative to local representations with learning; the RSC showed a similar pattern, but also increased effects of both global and local information with time.
Strengths:
(1) The manuscript asks a generally interesting question concerning the learning of global versus local spatial information.
(2) The virtual environment task provides a rich and naturalistic spatial setting for participants, and the setup with 32 objects across 4 rooms is interesting.
(3) The within-subject design and use of verbal cues for spatial retrieval is elegant.
Weaknesses:
(1) My main concern is that the global Euclidean distances and room identity are confounded. I fear this means that all neural effects in the RSA could be alternatively explained by associations to the visual features of the rooms that build up over time.
(2) The direction judgement task is not very informative about cognitive changes, as only objects in a room are compared. The setup also discourages global learning, and leaves unclear whether participants focussed on learning the left/right relationships required by the task.
(3) With N = 23, the power is low, and the effects are weak.
(4) It appears that no real multiple comparisons correction is done for the ROI-based approach, and significance across ROIs is not tested directly.
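As an illustration of the kind of correction point (4) asks for, here is a minimal sketch of the Holm-Bonferroni step-down procedure applied to one p-value per ROI. The ROI labels and p-values are made-up placeholders, not results from the study, and Holm is only one of several corrections the authors could choose.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# PLACEHOLDER values for illustration only; not taken from the manuscript.
roi_p = {"ROI_1": 0.01, "ROI_2": 0.04, "ROI_3": 0.03, "ROI_4": 0.20}
roi_p_adjusted = dict(zip(roi_p, holm_adjust(list(roi_p.values()))))

for name, p_adj in roi_p_adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"{name}: adjusted p = {p_adj:.3f} ({verdict})")
```

With these placeholder inputs, only the smallest p-value stays below 0.05 after correction, which shows how uncorrected ROI tests can overstate the number of reliable effects.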
Reviewer #3 (Public review):
Summary:
The manuscript by Qiu et al. explores the issue of spatial learning in both local (rooms) and global (connected rooms) environments. The authors use a pointing task in which participants press either the right or the left button in the scanner to indicate where an object is located relative to another object. Participants are repeatedly exposed to rooms over sessions of learning, with one "pre" and one "post" learning session. The authors report that the hippocampus shifted from lower to higher RSA for the global but not the local environment after learning. RSC and OFC showed higher RSA for global object pointing. Other brain regions also showed effects, including ACC, which seemed to show a similar pattern as the hippocampus, as well as other regions shown in Figure S5. The authors attempt to tie their results in with local vs. global spatial representations.
Strengths:
Extensive testing of subjects before and after learning a spatial environment, with data suggesting that there may be fMRI codes sensitive to both global and local codes. Behavioral data suggest that subjects are performing well at the task and learning both global and local object locations, although see further comments.
Weaknesses:
(1) The authors frame the entire introduction around confirming the presence of the cognitive map either locally or globally. There are some significant issues with this framing. For one, the introduction appears to be confirmatory and not testing specific hypotheses that can be falsified. What exactly are the hypotheses being tested? I believe that this relates to testing whether neural representations are global and/or local. However, this is not clear. Given that a previous paper (Marchette et al. 2014 Nature Neuro, which bears many similarities) showed only local coding in RSC, this paper needs to be discussed in far more depth in terms of its similarities and differences. This paper looked at both position and direction, while the current paper looks at direction. Even here, direction in the current study is somewhat impoverished: it involves either pointing right or left to an object, and much of this could be categorical or even lucky guesses. From what I could tell, all behavioral inferences are based on reaction time and not accuracy, and therefore, it is difficult to determine if the subject's behavior actually reflects knowledge gained or simply faster reaction time, either due to motor learning or a speed-accuracy trade-off. The pointing task is largely egocentric: it can be solved by remembering a facing direction and an object relative to that. It is not the JRD task as has been used in other studies (e.g., Huffman et al. 2019 Neuron), which is continuous and has an allocentric component. This "version" of the task would be largely egocentric. In this way, the pointing task used does not test the core tenets of the cognitive map during navigation, which is defined as allocentric and Euclidean (please see O'Keefe and Nadel 1978, The Hippocampus as a Cognitive Map). Since neither of these assumptions appears valid, the paper should be reframed to reflect spatial representations more broadly or even egocentric spatial representations.
(2) The fMRI data workup is insufficient. What do the authors mean by "deactivations" in Figure 3b? Does this mean the object task showed more activation than the spatial task in HSC? Given that HSC is one of these regions, this would seem to suggest that the hippocampus is more involved in object than spatial processing, although it is difficult to tell from how things are written. The RSA is more helpful, but now a concern is that the analysis focuses on small clusters that are based on analyses determined previously. This appears to be the case for the correlations shown in Figure 3e as well. The issues here are several-fold. For one, it has been shown in previous work that basing secondary analyses on related first analyses can inflate the risk of false positives (e.g., Kriegeskorte et al. 2009 Nature Neuro). The authors should perform secondary analyses in ways that are unbiased by the first analyses, preferably selecting cluster centers (if they choose to go this route) from previous papers rather than their own analyses. Another option would be to perform analyses at the level of the entire ROI, meaning that the results would generalize more readily. The authors should also perform permutation tests to ensure that the RSA results are reliable, as these can run the risk of false positives (e.g., Nolan et al. 2018 eNeuro). If these results hold, the authors should perform post-hoc (corrected) t-tests for global vs. local before and after learning to ensure these differences are robust and not simply rely on the interaction effect. The figures were difficult to follow in this regard, and an interaction effect does not necessarily mean that the critical differences (global higher than local after learning) are significant. The end part of the results was hard to follow. If ACC showed a similar effect to HC and RSC, why is it not being considered? Many other areas that seemed to show local vs. global effects were dismissed, but these should instead be discussed in terms of whether they are consistent or inconsistent with the hypotheses.
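A permutation test of the kind requested in point (2) could look roughly like the sketch below. It is not the authors' pipeline: the data are simulated, the function name is illustrative, and Pearson correlation is used for brevity where a rank correlation is often preferred in RSA. Object labels of the neural RDM are shuffled (rows and columns together) to build a null distribution for the model fit.

```python
import numpy as np

def rsa_permutation_test(neural_rdm, model_rdm, n_perm=999, seed=0):
    """Correlate two RDMs and estimate a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    n = neural_rdm.shape[0]
    iu = np.triu_indices(n, k=1)
    model_vec = model_rdm[iu]
    observed = np.corrcoef(neural_rdm[iu], model_vec)[0, 1]
    null = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(n)
        # Shuffle rows and columns together so the RDM stays symmetric.
        shuffled = neural_rdm[np.ix_(perm, perm)]
        null[k] = np.corrcoef(shuffled[iu], model_vec)[0, 1]
    # Add one to numerator and denominator so p is never exactly zero.
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return float(observed), float(p_value)

# SIMULATED data: 12 objects at random 2D positions, neural RDM = model + noise.
rng = np.random.default_rng(1)
positions = rng.uniform(0, 10, size=(12, 2))
model_rdm = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
noise = rng.normal(0, 0.05, size=model_rdm.shape)
neural_rdm = model_rdm + (noise + noise.T) / 2
np.fill_diagonal(neural_rdm, 0.0)

observed_r, p_value = rsa_permutation_test(neural_rdm, model_rdm)
print(f"r = {observed_r:.3f}, permutation p = {p_value:.4f}")
```

In a group analysis the same logic would typically be applied per participant, or by sign-flipping at the group level. Which scheme is appropriate depends on the design and is left open here.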
(3) Concerns about the discussion: there are areas involving reverse inference about brain areas rather than connecting the findings with hypotheses (see Poldrack et al. 2006 Trends in Cognitive Science). The authors also argue for "transfer" of information (for example, from ACC to OFC), but did not perform any connectivity analyses, so these conclusions are not based on any results. Instead, the authors should carefully compare what can be concluded from the reaction time findings and the fMRI data. What is consistent vs. inconsistent with the hypotheses? The authors should also provide a much more detailed comparison with past work. The Marchette et al. paper comes to different conclusions regarding RSC and involves more detailed analyses than those done here, including position. What is different in the current paper that might explain the differences in results? Another previous paper came to a different conclusion (hippocampus local, retrosplenial global) and should be carefully considered and compared, as it also involved learning of environments and comparisons at different phases (e.g., Wolbers & Buchel 2005 J Neuro). Other papers that have used the JRD task have demonstrated similar, although not identical, networks (e.g., Huffman et al. 2019 Neuron), and the results here should be more carefully compared, as the current task is largely egocentric while the Huffman et al. paper involves a continuous and allocentric version of the JRD task.
(4) The authors cite rodent papers involving single neuron recordings. These are quite different experiments, however: they involve rodents, the rodents are freely moving, and single neurons are recorded. Here, the study involves humans who are supine and an indirect vascular measure of neural activity. Citations should be to studies of spatial memory and navigation in humans using fMRI: over-reliance on rodent studies should be avoided for the reasons mentioned above.