Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Editors
- Reviewing Editor: Lila Davachi, Columbia University, New York, United States of America
- Senior Editor: Christian Büchel, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
Reviewer #1 (Public review):
Summary:
The question of how or whether "extensive memory training affects neocortical memory engrams" (to use the words of the authors) is an interesting question and an area where I think there is room for advancing current knowledge. That said, I do not think the current paper succeeds in meaningfully addressing this question. At a conceptual level, I really struggled with the predictions and interpretations of the findings. There are also several elements of the experimental paradigm and analysis decisions that feel incompatible with the claims that are made. While the manuscript does demonstrate that several measures of neural pattern similarity differ between the various groups of individuals, the issue is that it is difficult to draw clear conclusions from these findings.
Strengths:
(1) This is a unique dataset. Being able to recruit and enroll high-level memory athletes is impressive.
(2) In principle, comparing memory athletes to control subjects, active control subjects (who received working memory training), and trained subjects (who received method of loci training) is very appealing.
(3) In several ways, the authors were rigorous in their analyses.
(4) In principle, the question of how memory training influences neural similarity vs. dissimilarity is of potential interest.
Weaknesses:
(1) As far as I can tell, the training manipulation is fully confounded with instructions. That is, subjects were only instructed to use the method of loci if they had completed method of loci training (or if they were the memory athletes). For the training group, in the pre-training session, there was no strategy instruction (subjects could do whatever they wanted), but post-training, they were told to use the method of loci. I understand the argument, of course, that naïve subjects might not be very good at using the method of loci if they had no experience with it. But, it does seem entirely possible that some (or even many) of the observed fMRI results that are attributed to "extensive training" are better explained by strategy use. That is, maybe the effects can be explained by TRYING to use the method of loci as opposed to actual proficiency with the method of loci. It seems impossible to address this, given the design of the experiments. As such, any claims about the effects of memory training, per se, feel inappropriate. It feels equally plausible that the effects are due to the strategy instruction. If the same results could be obtained through a simple strategy manipulation without ANY training at all, that would radically alter the interpretation of the effects. I think the strategy use account is, in fact, quite viable because it is very easy to improve subjects' memories with a method of loci instruction (relative to no strategy instruction) without ANY practice at all. Obviously, practice does improve memory performance with the method of loci, but my point is that even without any meaningful practice, there is likely to be SOME immediate benefit to adopting the method of loci as a strategy. There is also the question of why the effects for the memory athletes weren't obviously stronger than for the trained group, given that the memory athletes have much more experience with the method of loci. 
Ultimately, the problem with the current design is that I don't see how one can tease apart the role of training, per se, vs. strategy use.
(2) There is no clear theoretical framework for the predictions or interpretations. The Results section is mostly a list of lots of different permutations of analyses (similarity within a group, between groups, between trials, across trials between subjects, during encoding vs. retrieval, frontal vs. hippocampal vs. parietal ROIs, etc). For each analysis, I did not have an intuition for what the prediction should be (e.g., should athletes have higher or lower pattern similarity?), and even after seeing all the results, I still do not have an intuition for how to interpret them. For the main results related to dissimilarity in prefrontal cortex, I would have, if anything, predicted the opposite: that when individuals are trained to use a common strategy, there would be MORE similarity between them. The Discussion acknowledges a very wide range of possible factors that might contribute to measures of similarity/dissimilarity, but I am ultimately left feeling that I have no idea how to interpret the results because the design and analyses were not structured such that any of these interpretations could be teased apart.
(3) Same theme: the analyses shift from frontal regions (when looking at encoding) to hippocampus and precuneus (when looking at temporal recency). This shift in ROIs is confusing. The analyses (encoding vs. recognition) are essentially confounded with the ROIs (frontal vs. hippocampal/precuneus), so it's hard to know whether different analyses yielded different patterns or different ROIs yielded different patterns. Why were the frontal regions that were important for encoding ignored for the temporal recency judgments? And the fact that medial temporal lobe regions showed opposite effects to the frontal regions during encoding did not get much attention. Given that there were opposing patterns (dissimilarity vs. similarity) across different brain regions, the framing of the paper (that "the method of loci may bolster uniqueness") feels like a very selective representation of the data.
(4) One of the more surprising aspects of the analyses (or at least one of the analyses) is that representational similarity analyses (RSA) are used to compare the average activity pattern (averaged across all trials) between different individuals. At a conceptual level, this really just reduces to a univariate analysis. It is not standard (or intuitive) to think about RSA that is essentially blind to the actual representational content. In other words, averaging across trials obviously washes out the content, and what is left are process-level effects. For process-level analyses, univariate analyses are far more common and seem more straightforward. However, these 'RSA' analyses are described as reflecting the "uniqueness of each word-location association" (an account which strongly implies content-level effects). This feels like an inappropriate description of what the analyses actually reflect.
(5) I think the analysis looking at trial-by-trial similarity during word encoding (showing greater dissimilarity among the experienced individuals) is a somewhat interesting result, but again, I think the interpretation is very difficult. It is hard (or, impossible, I think) to get a clear sense of what is driving those differences. Is it the association of a unique spatial context? Is it somehow a product of better encoding, per se (as opposed to distinct spatial contexts)? These things could be tested by actually manipulating the spatial contexts in a more controlled way. For example, the paper by Liu et al. that is cited several times - and also a just-published paper by Christopher Baldassano (Nature Human Behaviour) - each used a very controlled paradigm where the (imagined) spatial location associated with each item was known/manipulated. However, the design of the current study does not allow for these things to be teased apart.
(6) Relatedly, the training group seemed to receive instruction on a common spatial route, but, surprisingly, "Participants were free to choose which route and how many they would use to anchor the 72 items." Thus, if I understand correctly, we don't know whether the trained individuals were using common or distinct locations. And the fact that they learned a 50-location route but then studied a 72-word list is also a bit strange. Not having control or knowledge of the location that was associated with each word (sequence position) is a major limitation and also a major difference between the current study and other recent studies. For that matter, the word order was also randomized, so there was no control over whether the words and/or locations matched. These issues really complicate interpretation.
(7) Again, same theme: for the result showing lower trial-by-trial similarity (within-subject similarity), the question is why, exactly, training/experience is associated with lower trial-by-trial similarity. Does training specifically or preferentially lead to greater differentiation between temporally-adjacent trials (as in Liu et al)? Does it lead to greater differentiation IF subjects associate each word with a unique location? Or maybe there is a more abstract effect of sequence/position that is independent of spatial location? Importantly, each of these three possibilities that I mention here has a precedent in prior studies that were more tightly controlled. But here, there is no way to tease these apart because of the experimental design, limiting the conclusions.
(8) The ISC analysis described on p. 9 (line 328) is confusing. If I understand correctly, correlations between different trials were not computed (e.g., subject 1 trial 1 was not correlated with subject 2 trial 2). Rather, trial 1 was always correlated with trial 1 (in other subjects). Thus, it is not clear whether trial-level alignment matters at all. Maybe the same results would be obtained if there were no correspondence across subjects in trial number. Or if the trial order was shuffled within the subject. Given this, I simply don't know how to think about the data. And why did memory athletes show higher pattern similarity in this analysis as opposed to lower pattern similarity (as in some other analyses)? And why was this analysis performed by comparing memory athletes to each other as opposed to memory athletes to non-athletes? And, conceptually, why was this selective to the memory athletes or to the precuneus? And why was it selective to the temporal order test and not encoding? I am not asking the authors to answer each of these questions; rather, the point I am trying to make is that this analysis, and many of the analyses, seem to raise more questions than they answer.
(9) The ISC analyses are interpreted in terms of scene construction and context reinstatement, but these conclusions go (very) far beyond what the data actually show. Again, I don't see how this analysis lends itself to a meaningful conclusion. And this general critique applies to many of the analyses reported in this paper.
(10) The fact that words were in random order per subject also makes the ISC analysis even more confusing to think about. The memory athletes had unique spatial routes (that they used for the method of loci) and unique word lists. So, why would it make sense to look at trial-level ISC? At a conceptual level, I simply don't understand what this is intended to capture.
(11) Differences in the pattern of results between the encoding and temporal memory recognition task are hard to make sense of and are not addressed in much detail. Why would it make more sense to have across-trial similarity during recognition than during encoding? I think any account of this is very speculative.
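The shuffling control raised in point (8) can be made concrete with a small simulation. The following is a minimal sketch, not the authors' pipeline: the function names (`trial_matched_isc`, `shuffled_null`, `simulate`), the data layout (`data[s][t]` as a list of voxel values), and all simulation parameters are assumptions made for illustration. It uses only the Python standard library; a real analysis would more likely use NumPy or a neuroimaging toolbox. The idea is that if trial-matched ISC is no higher than ISC computed after shuffling trial order within each subject, then trial-level alignment is not contributing, and the effect reflects a process-level pattern shared across all trials.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def trial_matched_isc(data):
    """Mean correlation of same-numbered trials over all subject pairs.

    data[s][t] is the voxel pattern (list of floats) for subject s, trial t.
    """
    rs = [
        pearson(trial_a, trial_b)
        for subj_a, subj_b in combinations(data, 2)
        for trial_a, trial_b in zip(subj_a, subj_b)
    ]
    return sum(rs) / len(rs)


def shuffled_null(data, n_perm=200, seed=0):
    """ISC values after independently shuffling trial order within each subject."""
    rng = random.Random(seed)
    null = []
    for _ in range(n_perm):
        # random.sample with k == len(subj) returns a shuffled copy
        shuffled = [rng.sample(subj, len(subj)) for subj in data]
        null.append(trial_matched_isc(shuffled))
    return null


def simulate(n_subj, n_trials, n_vox, trial_specific, seed=1):
    """Toy data (assumed generative model, unit-variance signal and noise).

    trial_specific=True: each trial number has its own pattern shared by all subjects.
    trial_specific=False: one process-level pattern is shared by all trials.
    """
    rng = random.Random(seed)
    process = [rng.gauss(0, 1) for _ in range(n_vox)]
    per_trial = [[rng.gauss(0, 1) for _ in range(n_vox)] for _ in range(n_trials)]
    data = []
    for _ in range(n_subj):
        subj = []
        for t in range(n_trials):
            signal = per_trial[t] if trial_specific else process
            subj.append([v + rng.gauss(0, 1) for v in signal])
        data.append(subj)
    return data


if __name__ == "__main__":
    for label, flag in [("trial-specific signal", True), ("process-level signal", False)]:
        data = simulate(n_subj=4, n_trials=8, n_vox=50, trial_specific=flag)
        null = shuffled_null(data, n_perm=100)
        print(label, "observed:", trial_matched_isc(data), "null mean:", sum(null) / len(null))
```

In this toy setup, the observed ISC clearly exceeds the shuffled null only when the simulated signal is trial-specific. When the shared signal is identical on every trial, observed and shuffled values are close, which is the scenario the reviewer describes as impossible to rule out from the reported analysis.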
Reviewer #2 (Public review):
The authors aim to understand how intensive training with the method of loci changes the brain systems that support memory in both elite "memory athletes" and previously untrained adults. They combine a cross-sectional comparison of athletes and matched controls with a longitudinal training study including mnemonic training, active working-memory training, and passive control groups, and use fMRI pattern-similarity analyses to characterise how brain activity patterns during learning and temporal-order judgments become more distinct or more shared within and across individuals.
The dual design is a major strength. It combines findings from both real-world expertise and experimentally induced training and adds well-matched control groups. The representational similarity analyses are appropriate and reveal a clear, internally consistent picture in which learning with the method of loci leads to more idiosyncratic prefrontal and posterior cortical patterns during encoding, and more shared hippocampal-precuneus patterns during temporal-order retrieval, observed in both athletes and trained novices.
However, the study is complex and the manuscript dense, and some secondary analyses feel less central or are difficult to interpret. More importantly, while the neural evidence for training-related changes in representational format is compelling, the behavioural relevance of these changes is less clearly supported. The key per-group brain-behaviour correlations are weak and inconsistent, and the direct association between neural and behavioural change across all subjects is not clearly presented.
Overall, the work convincingly shows that extensive mnemonic practice reorganises neural representations in specific networks, but the strength and specificity of the claimed link to long-term memory improvements should be viewed as more tentative.
Reviewer #3 (Public review):
Summary:
This study sought to explore how neural representations during encoding change with expertise or proficiency in the method of loci (MoL). To do this, the authors compared three groups: memory athletes (experts in MoL), naive controls, and naive participants before and after 6 weeks of MoL training, and analyzed how similar their encoding-related activity patterns were across groups and training. They found that in lateral prefrontal, inferior temporal, and posterior parietal regions, pattern similarity decreased with expertise and training. They also found that changes in similarity between pre- and post-training were associated with improvements in memory performance measured 4 months later. Additionally, in a follow-up exploratory analysis on the temporal order recognition task, neural patterns were more similar for those proficient in MoL - a contrast to the decrease seen at encoding. Taken together, the authors interpret these findings as evidence that proficiency with the method of loci is associated with distinct encoding representations. Broadly, the findings suggest that greater representational differentiation at encoding may be associated with better memory.
Strengths:
(1) The manuscript is impressively rich with analyses. Their general claim that neural differentiation increases between individuals with MoL experience is thus addressed in this work. Specifically, the authors effectively explore different levels of granularity to tackle the question of whether a participant's neural representation (with MoL experience) looks more similar to that of another (with less experience) during encoding.
(2) The authors connect their hypotheses about neural representational differences caused by training to behavioral data (and 4 months later at that).
(3) Although exploratory, they not only look at encoding-related differences, but also retrieval-related differences.
(4) The authors provide many supplementary figures with complementary and interesting findings. As I read, I found myself curious about exploratory analyses, which were then addressed in supplementary figures.
Weaknesses:
(1) The manuscript is impressively rich, but the number of analyses and levels of comparison (and how they are presented) made it difficult to follow. The paper would benefit from an anticipatory introductory paragraph (or an introductory Results paragraph) that explicitly states the hypotheses and which sections of the results address them. Additionally, given that this is a Methods-last paper, the manuscript would benefit from a few introductory sentences at each Results section describing the methodology.
(2) One of the motivations needs strengthening. Given the introduction, the manuscript seems to be motivated by two complementary questions: (i) whether neural differentiation effects reported with short-term MoL training (as done in Liu et al., 2022) extend with longer-term training and expertise and (ii) whether training might lead individuals towards a canonical "expert" representation that can only be acquired through training as has been previously shown in other work (e.g., Meshulam et al., 2021).
The first motivation is clear and compelling. The second one, however, does not feel as well grounded. In studies like Meshulam et al., alignment is expected because participants are exposed to the same stimulus or concept. In contrast, as the authors note, a user of the method of loci is encouraged to create unique, vivid representations of their loci and to-be-remembered items - here, neural alignment is at odds with the premise of the technique. As such, the described tension between increased pattern similarity across the studies cited in the second paragraph of the introduction and individuals proficient with MoL feels underdeveloped (despite the reference-rich second paragraph).
The authors would benefit from articulating why the counterfactual of "increased neural alignment" might be expected, specifically, in the method of loci. In other words, why should we expect trainees to become more similar to experts when the strategy itself promotes idiosyncratic representations? Perhaps, the authors could distinguish between alignment at the level of knowledge representations vs the process of encoding (e.g., the act of placing items into loci).
(3) Relatedly, terminology referencing the employed methodology is a bit unclear. In some of the papers cited that look at pattern similarity across people (like Meshulam et al., or Koch et al.), the spatial patterns of individuals are compared with 'template' patterns that reflect the canonical representation of a concept or episode. However, the manuscript does not include this type of template-based comparison. This is understandable because there may not be a representative canonical pattern when each participant has their own idiosyncratic palace. In this case, a pairwise comparison may be more fitting as it focuses on the distances between people's representations instead of the distances between them and a group template. Although both comparisons (pairwise and template-based similarities) are related, they have different interpretations. Stating early on why pairwise similarity, rather than template-based similarity (as in many of the cited papers), is the more appropriate metric in this paradigm would add to the clarity of the work. Additionally, this slight difference in methodology was confusing because some portions of the text (including the figures) say "group average", but in others, we see "pairwise".
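The distinction between pairwise and template-based similarity in point (3) can be illustrated with a short sketch. This is an illustrative example under stated assumptions, not the method used in the manuscript or in the cited papers: the function names (`pairwise_similarity`, `template_similarity`, `simulate_group`) are hypothetical, the template is assumed to be a leave-one-out group average, and the toy data assume one shared pattern plus independent unit-variance noise per participant. Only the Python standard library is used.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def pairwise_similarity(patterns):
    """Mean correlation over all pairs of participants."""
    rs = [pearson(a, b) for a, b in combinations(patterns, 2)]
    return sum(rs) / len(rs)


def template_similarity(patterns):
    """Mean correlation of each participant with the average of all others.

    The leave-one-out average plays the role of the group 'template'.
    """
    rs = []
    for i, own in enumerate(patterns):
        others = [p for j, p in enumerate(patterns) if j != i]
        template = [sum(vals) / len(others) for vals in zip(*others)]
        rs.append(pearson(own, template))
    return sum(rs) / len(rs)


def simulate_group(n_subj, n_vox, seed=0):
    """Toy group: one shared pattern plus independent noise per participant."""
    rng = random.Random(seed)
    shared = [rng.gauss(0, 1) for _ in range(n_vox)]
    return [[v + rng.gauss(0, 1) for v in shared] for _ in range(n_subj)]


if __name__ == "__main__":
    group = simulate_group(n_subj=10, n_vox=200)
    print("pairwise:", pairwise_similarity(group))
    print("template:", template_similarity(group))
```

In this toy case the template-based value comes out higher than the pairwise value, because averaging across participants suppresses individual noise in the template. The two metrics are therefore not interchangeable even on identical data, which is why labelling an analysis "group average" in one place and "pairwise" in another can mislead.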
Minor Comments:
A recent paper (Masis-Obando et al., 2026, Nat Hum Behav) shows that stable and distinctive spatial representations can support later reinstatement of items placed within those contexts. Their conclusions seem to support your hypotheses and results here. In parallel, prior work (like Robin et al., 2018, J Neurosci) emphasizes the importance of spatial contexts for the representation of events. Given that MoL encoding relies on vivid context-item binding, including these perspectives in the Introduction (and/or Discussion) may help frame the current findings within the broader memory literature.
Overall, this work provides rich and valuable contributions to the field.