Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Margaret Schlichting, University of Toronto, Toronto, Canada
- Senior Editor: Laura Colgin, University of Texas at Austin, Austin, United States of America
Reviewer #1 (Public Review):
In this paper, Scholz and colleagues introduce a new paradigm aimed at bridging the gap between two domains that rely on hierarchical processing: language and memory. They find that, generally in line with their hypotheses, hierarchical processing is associated with activation in the hippocampus (especially its anterior portion), medial prefrontal cortex (mPFC), posterior superior temporal sulcus (pSTS), and inferior frontal gyrus (IFG). They also report that the effects in IFG are particularly strong late in the task, once participants have had extensive experience and processing is presumably more automatic.
This work has many strengths. The goal of bridging these literatures by developing a new task is commendable. I also appreciate that the authors separately validated their new task behaviorally by comparing it to another task accepted as tapping hierarchical processing. Further, I liked that the authors were transparent about their hypotheses and about certain analyses, such as the planned grid-coding analysis that did not work out. I do, however, have a number of concerns about the interpretation of the findings, such as whether some patterns are ambiguous as to the true underlying effects. I also have a number of clarification questions. All concerns are described below.
1. Broadly, I would like to see the authors provide more information and logic on why hierarchical processing should be associated with a large reduction in univariate activation from P1 to P2. Why would this signify item-in-context binding? How does this relate to existing work using other methods (e.g., animal studies, which seem to make predictions more about representational structure)?
2. There are many differences in the kind of information participants process at Position 1 versus Position 2 in the HIER but not the ITER condition, and these may not be related to the hierarchical structure specifically. Related to, but I think distinct from, some of the limitations mentioned in the Discussion is the fact that in the HIER condition, the cognitive operations at Position 1 and Position 2 are more distinct (attending to color at Position 1 and to shape at Position 2), whereas the two positions are equivalent in the ITER condition. This is somewhat different from the authors' intended manipulation of hierarchy, because it involves a specific stimulus dimension. A stronger design might have been to flip the dimensions with respect to position, so that shape was sometimes relevant at Position 1 and color at Position 2 (perhaps by counterbalancing across subjects, so that half would see the current P1=color and P2=shape rules, and the other half P1=shape and P2=color rules). Another important difference between color and shape is that color is a simple binary distinction that participants can make based on their preexisting knowledge of red versus green, and to which they can assign a verbal label, whereas the shape distinction was novel, acquired during the experiment, had no real-world validity or meaning, and would presumably rely more on visuospatial processing. The shape dimension was also much more variable, I believe. I should say that I do find comfort in two things: (1) behavior on this task is correlated with performance on another task that also indexes hierarchical processing, and (2) the results show a regional specificity that is at least not easily explained by this distinction.
However, I do think future work will be needed to ask whether it is hierarchical processing per se, or rather the particular cognitive states engaged during each phase of this task, that elicits activation in this set of regions. It would strengthen the paper to discuss this issue directly so that readers are alerted to the caveat.
3. I did not understand what data went into creating the schematic in Figure 2E. First, I think this depiction of a gradient could easily be misinterpreted, because it seems to imply a higher-resolution analysis than the authors actually performed. I believe the data were analyzed in just three subregions of the hippocampus: head, body, and tail. Variability within each subregion (as seems to be implied by certain parts of a region being more grey and others more red/orange) is not something that could be assessed in this analysis. For example, why does the medial part of the head seem more "unspecific," whereas lateral regions look more HIER Pos1 specific? To my mind, this type of depiction would only make sense if the authors had performed something like a voxelwise analysis to determine where specifically the interaction "peaks." I would recommend that this visualization be cut or substantially changed to remove the gradient.
4. I believe the authors have not reported enough information for us to know that hippocampal involvement indeed does not change with experience. Interestingly, in the task x experience ROI analysis, the hippocampus shows, if anything, numerically greater differentiation between the two tasks in the late trials. This seems to go against the authors' hypothesis, and a lot of existing data, that the hippocampus is preferentially involved in early (vs. late) learning. Given, though, that the key signature in this region is that it differentiates between Position 1 and Position 2 in HIER but not ITER, without a large difference in magnitude between the two tasks, I wonder whether a task x experience interaction that collapses across the two positions makes sense for this region. Did the authors consider a similar task x experience interaction within the hippocampus that additionally takes position into account? I think there are multiple ways to look at this question (e.g., a task x experience x position interaction, a task x experience interaction within Position 1, a task x position interaction computed separately for the early and late portions of the task, or even a position x experience interaction within the HIER task only), and I'm sure the authors are in a better position to decide on a specific path forward. The same logic might apply to mPFC, which shows an interaction but no main effect of task. This also bears on claims in the Discussion, such as that "hippocampus was equally active in early and late trials." Because this analysis collapses across the dimension to which hippocampus (and mPFC) seem to be sensitive (position), it could be masking an underlying effect in which hippocampus/mPFC are still differentially involved early vs. late (i.e., they might show the task x position interaction preferentially during some task phases).
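To make the suggested contrasts concrete, here is a minimal sketch that computes, for one subject, the task x position interaction within each experience phase and the resulting task x experience x position contrast. It assumes a hypothetical layout in which each subject's ROI betas are stored in a dictionary keyed by (task, phase, position); these labels are illustrative and are not the authors' variable names. A positive three-way contrast would indicate that the HIER-specific position effect is larger late than early.

```python
# Hypothetical layout: one subject's ROI betas keyed by (task, phase, position),
# e.g. ("HIER", "early", 1). The labels are illustrative, not the authors' own.
Betas = dict[tuple[str, str, int], float]


def task_by_position(betas: Betas, phase: str) -> float:
    """(HIER P1 - HIER P2) - (ITER P1 - ITER P2) within one experience phase."""
    hier = betas[("HIER", phase, 1)] - betas[("HIER", phase, 2)]
    iter_ = betas[("ITER", phase, 1)] - betas[("ITER", phase, 2)]
    return hier - iter_


def task_by_experience_by_position(betas: Betas) -> float:
    """Task x position interaction in late trials minus that in early trials."""
    return task_by_position(betas, "late") - task_by_position(betas, "early")
```

Each subject's contrast would then be tested against zero across subjects (for example with `scipy.stats.ttest_1samp(contrasts, 0.0)`); calling `task_by_position` for each phase separately gives the early-vs.-late split of the task x position interaction.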
5. For the IFG regions, the task x experience interaction seems to be driven mainly by a change (decrease in activation) for ITER rather than a change for HIER. The authors are at times careful to describe this as "sustained" activity in IFG, which I appreciated, but at other times they refer to a "relative increase." I am not sure how I feel about that. I see compelling evidence that there are task differences by experience, and that there is a reduction for ITER that is, interestingly, not present for HIER, but I am still uncomfortable with the term "increase," or even "relative increase," for HIER. For example, couldn't it simply be that the ITER task requires less processing with experience, whereas HIER does not (perhaps because it requires more processing to begin with)? That is, we do not know whether the reduction for ITER is simply a property of the neural signal (i.e., activations diminish over time/experience) or a cognitive effect specific to the ITER task. I think the authors want to interpret the reductions as the former, but it would be more powerful to demonstrate this with a baseline task that also showed reductions but for which little cognitive change would be expected. Can the authors provide more justification for their choice of terminology (through either more logic or analyses), or, if not, simply describe it as sustained activity for HIER, which is especially interesting in the face of the reductions for the ITER task?
6. Please define what is meant by the term "automaticity" in the Introduction. A clearer definition of the concept would make the paper easier to follow in general, and it would also help foreshadow the hypotheses about mPFC activity in the Introduction. To this end, it could be useful to elaborate on how learning takes place in this task, how it could foster increasing automaticity, and how automaticity maps onto behavior (e.g., is it an RT decrease alone, which occurs for both conditions in this task?) and onto the brain regions discussed.
7. There was no association between brain and behavior, which the authors interpret positively (because task-difficulty differences therefore could not explain the effects). However, in light of these null findings, it is, on the flip side, hard to know whether this neural engagement carries any behavioral significance. It seems to me that the authors' framework makes predictions about brain-behavior correlations that were not tested in the manuscript. For example, I believe the authors asked whether overall behavior was correlated with activation. However, wouldn't the IFG automaticity account, for example, predict that greater engagement, or an increase in engagement from early to late, should be associated with faster RTs, rather than necessarily an overall relationship?
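One way to test the prediction raised here is to correlate, across subjects, the early-to-late change in IFG engagement with the early-to-late change in RT. The sketch below assumes that per-subject mean IFG betas and mean RTs for each phase are already available (hypothetical inputs; the function names are illustrative). Under the automaticity account as described, one might expect a negative correlation, with larger increases in engagement accompanying larger RT decreases.

```python
import math


def change_scores(early: list[float], late: list[float]) -> list[float]:
    """Per-subject late-minus-early difference."""
    if len(early) != len(late):
        raise ValueError("early and late must have one value per subject")
    return [late_v - early_v for early_v, late_v in zip(early, late)]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

The same change scores could be computed per task, or for the HIER-minus-ITER difference, depending on which version of the account is being tested.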
8. On p. 8, it is stated that "In the hippocampus, this effect is driven by higher betas for the presentation of the first object (H1 > I1) and lower betas for the second object (H2 < I2) when comparing across tasks." Can the authors confirm whether the pairwise comparisons following up on the interaction here are significant, or whether they refer only to a numerical difference in the betas? The same seemed (numerically) to be true for mPFC; is there a reason why this information is not included for the mPFC ROI? Also, could the authors speculate further on why one might see enhanced activation for P1 and reduced activation for P2?
9. I was expecting some discussion of the fact that the hippocampus does not seem to show preferential involvement early, given that the possibility that its role is restricted to early learning (i.e., to acquisition only) was one of the primary motivations for using this task. As noted in comment #4 above, I am not sure there is evidence that the hippocampal role remains constant over this task, given the analyses provided (i.e., the position effect was not examined for early vs. late trials). However, if further analysis does suggest that it is stable, and/or that it even increases with experience, the authors might want to address that in the Discussion.
10. The fact that the hierarchies in this paradigm unfolded over time makes them distinct, on some level, from the hierarchies present in the VRT task that was used to validate the hierarchical processing demands of the HIER task. For example, additional computations might be required to process these temporally ordered structures, to support online maintenance, and so on. It may be worth considering this aspect of the task in the paper, including whether and to what extent the results could be related to it.
11. I also have many methodological and analytic clarification questions, which I detail in the recommendations for authors.
Reviewer #2 (Public Review):
In this manuscript, Scholz et al. adopt a set of tasks to study how brain regions are differentially activated by temporal context cues. In one task, the first item in a two-item sequence dictates the value of the second. In the other task, there is no hierarchy in temporal order, although subjects must still maintain information across the delay to add the values of the two presented items. Using univariate analyses, the authors found many regions that showed an interaction between item position and task, including the mPFC, anterior hippocampus, and left prefrontal and posterior temporal cortices. The results are interpreted as evidence for a dedicated system for understanding hierarchical relationships across domains as diverse as spatial cognition, music, and language.
The question raised by the authors is important, and fMRI may be an appropriate means of studying the neural basis of hierarchical computations. The main limitation of the manuscript, and one that is briefly mentioned and dismissed in the Discussion, is the task design, which confounds whether or not a hierarchical relationship must be formed with the content of the information that must be held in working memory (color in the hierarchy task and number in the iterative task).
The authors also report an interesting difference between the activation observed in the head and tail of the hippocampus during the different tasks. However, the authors test each region independently, show that the effect is significant in one and not in the other, and then conclude "the effect of hierarchical context representation in the hippocampus is specific to its anterior regions." Such a conclusion requires a direct comparison of the regions, for example a test of whether the task x position interaction differs between head and tail.
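A minimal sketch of such a direct comparison, assuming per-subject betas for the four task-by-position cells (H1, H2, I1, I2) in each subregion: compute each subject's task x position interaction in the head and in the tail, take the head-minus-tail difference, and test that difference against zero. The cell ordering and function names are illustrative assumptions.

```python
import math
import statistics

# Hypothetical per-subject cell betas for one subregion, ordered (H1, H2, I1, I2).
Cells = tuple[float, float, float, float]


def interaction(cells: Cells) -> float:
    """Task x position interaction: (H1 - H2) - (I1 - I2)."""
    h1, h2, i1, i2 = cells
    return (h1 - h2) - (i1 - i2)


def head_vs_tail_t(head: list[Cells], tail: list[Cells]) -> tuple[float, int]:
    """Paired t statistic and df for the per-subject head-minus-tail interaction."""
    if len(head) != len(tail):
        raise ValueError("head and tail need one entry per subject")
    diffs = [interaction(hd) - interaction(tl) for hd, tl in zip(head, tail)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1
```

With two levels per factor, this paired contrast is equivalent to the region x task x position term of a repeated-measures ANOVA restricted to head and tail (F = t²); the paired form is shown only because it is compact.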
Finally, it isn't clear whether the motivating prior work makes a simple univariate prediction. A strong prediction, however, is that the representations of objects in the first versus second position should be highly dissimilar in the hierarchy task, and much less so in the iterative task. Such a representational similarity analysis would better connect this study to prior research and to the hypothesis that hierarchical processing affects the coding of items in sequence.
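As a minimal illustration of this prediction (a simple correlation-based summary, not a full RSA pipeline with model RDMs or cross-validation), the sketch below compares the mean similarity of voxel patterns for objects shown at the same position with their mean similarity across positions. The input format, one list of pattern vectors per position, and the function names are assumptions. Under the prediction described above, this position-separation index would be larger in the hierarchy task than in the iterative task.

```python
import math
from itertools import combinations


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two voxel patterns of equal length."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def position_separation(pos1: list[list[float]], pos2: list[list[float]]) -> float:
    """Mean within-position minus mean between-position pattern similarity.

    Each position should contribute at least two patterns so that
    within-position pairs exist.
    """
    within = [pearson(a, b) for group in (pos1, pos2) for a, b in combinations(group, 2)]
    between = [pearson(a, b) for a in pos1 for b in pos2]
    return sum(within) / len(within) - sum(between) / len(between)
```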
Reviewer #3 (Public Review):
My biggest concern is that I am not convinced that the HIER task is indeed hierarchical. Based on Figure 1B, it seems that the rules of the task can be listed as "Green and same = 2", "Green and different = 4", "Red and same = 1", "Red and different = 3". If so, the hierarchical organisation intended by the authors could be bypassed by simply memorising these four options. The rote-memory explanation is even more likely given that the other task, ITER, clearly required rote memory. Hence, the two tasks may differ simply in difficulty and/or working memory (WM) involvement.
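The concern can be illustrated with a small sketch that takes the four rules exactly as listed above (the reviewer's reading of Figure 1B, treated here as an assumption; "same"/"different" follow the reviewer's wording for the Position 2 judgment). A flat four-entry lookup table and a two-step hierarchical evaluation, in which the Position 1 color selects the sub-rule applied at Position 2, produce identical responses for every input, so response accuracy alone cannot show which strategy participants used. All names are illustrative.

```python
# Rules as listed by Reviewer #3 (assumed reading of Figure 1B).
FLAT_RULES = {
    ("green", "same"): 2,
    ("green", "different"): 4,
    ("red", "same"): 1,
    ("red", "different"): 3,
}

# The same mapping organized hierarchically: color first selects a sub-rule.
HIERARCHICAL_RULES = {
    "green": {"same": 2, "different": 4},
    "red": {"same": 1, "different": 3},
}


def flat_response(color: str, relation: str) -> int:
    """Rote strategy: a single lookup on the conjunction of both features."""
    return FLAT_RULES[(color, relation)]


def hierarchical_response(color: str, relation: str) -> int:
    """Hierarchical strategy: the Position 1 color picks the rule for Position 2."""
    sub_rule = HIERARCHICAL_RULES[color]
    return sub_rule[relation]


def strategies_agree() -> bool:
    """True if both strategies give the same response for every input."""
    return all(
        flat_response(c, r) == hierarchical_response(c, r) for c, r in FLAT_RULES
    )
```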