Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife’s peer review process.

Editors
- Reviewing Editor: Roberto Bottini, University of Trento, Trento, Italy
- Senior Editor: Michael Frank, Brown University, Providence, United States of America
Reviewer #1 (Public review):
Summary of the paper:
The paper presents an elegant task designed to investigate humans' ability to generalize knowledge of learned graph structures to new experiences that share the same structure but are built from different stimuli. Using behavior and MEG recordings, the authors test evidence for neural representation and application of structural knowledge.
Review overview:
While the task design is elegant, it isn't clear to me that the data support all the claims made in the paper. I have detailed my concerns below.
Major concerns
(1) The authors claim that their findings reveal "striking learning and generalization abilities based on factorization of complex experiences into underlying structural elements, parsing these into distinct subprocesses derived from past experience, and forming a representation of the dynamical roles these features play within distinct subprocesses." And "neural dynamics that support compositional generalisation, consistent with a structural scaffolding mechanism that facilitates efficient adaption within new contexts".
a. First, terms used in these example quotes (but also throughout the paper) do not seem to be well supported by data or the task design. For example, terms such as 'compositional generalisation' and 'building blocks' have important relevance in other papers by (some of) the same authors (e.g., Schwartenbeck et al., 2023), but in the context of this experiment, what is 'composition'? Can the authors demonstrate clear behavioural or neural evidence for compositional use of multiple graph structures, or alternatively remove reference to these terms? In the current paper, it seems to me that the authors are investigating abstract knowledge for singular graph structures (together with the influence of prior learning), as opposed to knowledge for the compound, more complex graph formed from the product of two simpler graphs.
b. While I would like to be convinced that this data provides evidence for the transfer of abstract, structural knowledge, I think the authors either need to provide more convincing evidence or tone down their claims.
Specifically:
(i) Can the increase in neural similarity between stimuli mapping to the same abstract structural sub-process not be explained by temporal proximity in experiencing the transitions (e.g., Cai et al., 2016)? Indeed, behavior seems to be dominated by direct experience of the structure as opposed to applying abstract knowledge of equivalent structures (and, as a result, there is little difference in behavioural performance between experience and inference probes).
(ii) The strongest evidence for neural representation of abstract task structures seems to be the increase in similarity by transition type. But this common code for 'transition type' is only observed for 6-bridge graphs and only for experienced transitions. There was no significant effect in inference probes. Therefore, there doesn't seem to be evidence for the application of a knowledge scaffold to facilitate transfer learning. Instead, the data reflects learning from direct experience and not generalisation.
(iii) The authors frequently suggest that they are providing insight into temporal dynamics, but there is no mention of particular oscillations or particular temporal sequences of neural representation that support task performance.
(2) Regardless of point (b), can the authors provide more convincing evidence for a graph structure being represented per se (regardless of whether this representation is directly experienced or inferred)? From Figure 3C, it seems that the model RDM doesn't account for relative distance within the graph. Do they see evidence for distance coding? Can they reconstruct the graph from representational patterns using MDS?
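To make the suggested check concrete, a classical (Torgerson) MDS reconstruction from a representational dissimilarity matrix could be sketched as follows (a minimal illustration with simulated data; the ring-graph distances and all variable names are hypothetical, not taken from the paper):

```python
import numpy as np

def classical_mds(rdm, n_dims=2):
    """Classical (Torgerson) MDS: embed items so that pairwise Euclidean
    distances in the embedding approximate the dissimilarity matrix."""
    n = rdm.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (rdm ** 2) @ J
    # Top eigenvectors (scaled by sqrt of eigenvalues) give coordinates
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:n_dims]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0))

# Hypothetical example: geodesic distances on a 4-node ring graph
rdm = np.array([[0, 1, 2, 1],
                [1, 0, 1, 2],
                [2, 1, 0, 1],
                [1, 2, 1, 0]], dtype=float)
xy = classical_mds(rdm, n_dims=2)
# If neural patterns carry distance coding, applying the same procedure to
# an empirical RDM should recover the graph's layout up to rotation.
```

Applied to an empirical neural RDM, a ring- or lattice-like arrangement in the embedding would constitute the kind of graph-reconstruction evidence requested here.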
(3) In general, the figures are not very clear, and the outcome from statistical tests is not graphically shown. The paper would be easier to digest if, for example, Figures 1-2 were made clearer and statistical significance relative to chance were indicated throughout. To give two examples: (i) Figure 1 should clearly indicate what is meant by observed and held-out transitions and whether it is just the transition or also the compound that is new to the participant. (ii) Figure 2D-E could be shown with relevant comparisons and simpler statistical comparisons. Currently, it is hard to follow without carefully reading the legend.
Reviewer #2 (Public review):
Summary:
The authors aimed to investigate the temporal dynamics of how prior experiences shape learning in new complex environments by examining whether the brain reuses abstract structural components from those experiences. They employed a sequence learning task based on graph factorization and recorded neural activity using magnetoencephalography (MEG) to investigate how the underlying graph factors are reused to support learning and inference in a new graph. MEG data was derived from passive stimulus presentation trials, and behavior was assessed through a small number of probe trials testing either experienced or inferred successions in the graph. Representational similarity analysis of the MEG data was performed at a quite aggregated level (the principal components explaining 80% of the variance). The authors report (1) enhanced neural similarity among stimuli that belong to the same graph-factor as well as (2) a correlation between abstract role representations, corresponding to particular positions in the graph, and performance in experience-probes but not in inference-probes.
Strengths & Weaknesses:
(1) The first finding is considered evidence for representational alignment of the graph factors. However, alignment seems to be just one possible arrangement underlying the increased similarity between stimuli of the same vs different graph factors. For instance, a simple categorical grouping of stimuli belonging to the same graph, rather than their structural alignment, could also underlie the reported effect. The wording should be adjusted to avoid overinterpretation.
(2) The second finding of abstract role representations is indeed expected for structural generalisation. While the data presents an interesting indication, its interpretability is constrained by a lack of testing for generalization of the effect to other graph structures (e.g., to rule out graph-specific strategies) as well as the absence of a link to transfer performance in inference-probes. The authors argue that the experienced transitions the classifier was trained on might be more similar in process to the experience-probes than the inference-probes. However, as inference-probes are the key measure of transfer, one could argue that if abstract role representations truly underlie transfer learning, they should be evident in the common neural signal.
(3) The authors write, "we observed a qualitative pattern indicative of increased neural similarity between stimuli that adhered to the same underlying subprocess across task phases. (...) There was a statistically significant interaction effect of condition x graph factor spanning approximately 300 - 680 ms post-stimulus onset". I conclude there was no significant main effect of graph factor, but the relevant statistics are not reported. The authors should report and discuss the complete statistics.
(4) The RSA is performed on highly aggregated data (the PCs that explained 80% of the variance). Could the authors include their rationale for this choice (e.g., why it was preferred over analysis of sensor-level data)? In case sensor-level analyses have been conducted as well, maybe there are comparisons or implications of the chosen approach that are useful to mention in the discussion. The authors should provide the average and distribution of the number of PCs underlying their analyses.
(5) While the paper is well-written overall, it would benefit from more explicitly identifying the concrete research question and advancing it through the results. The authors state their aim as understanding the "temporal dynamics of compositional generalisation", revealing "at which moment during neural information processing are they assembled". They conclude with "providing evidence for temporally resolved neural dynamics that support compositional generalization" and "we show the neural dynamics (...) presented across different task phases...". It remains somewhat vague what specific insight about the process is provided through the temporal resolution (e.g., is the time window itself meaningful, and if so, it should be contextualized; is the temporal resolution critical to dissociate subprocesses?). The different task phases (initial learning and transfer) are the necessary conditions for investigating transfer learning, but do not by themselves offer a particularly resolved depiction of the process.
Overall, the findings are congruent with prior research on neural correlates of structural abstraction. They offer an elegant, well-suited task design to study compositional representations, replicating the authors' earlier finding and providing temporal information on structural generalisation in a sequence learning task.
Reviewer #3 (Public review):
Summary
This study investigates how task components can be learned and transferred across different task contexts. The authors designed two consecutive sequence learning tasks, in which complex image sequences were generated from the combination of two graph-based structural "building blocks". One of these components was shared between the prior and transfer task environments, allowing the authors to test compositional transfer. Behavioral analyses using generalized linear models (GLMs) assessed participants' sensitivity to the underlying structure. MEG data were recorded and analyzed using classifications and feature representational similarity analysis (RSA) to examine whether neural similarity increased for stimuli sharing the same relational structure. The paper aims to uncover the neural dynamics that support compositional transfer during learning.
Strengths and weaknesses
I found the methods and task design of this paper difficult to follow, particularly the way stimuli were constructed and how the experimental sequences were generated from the graph structures. These aspects would be hard to replicate without some clarification. I appreciate the integration of behavioral and neuroimaging data. The overall approach, especially the use of compositional graph structures in sequence learning, is interesting and could be adopted and refined in future studies of compositionality and transfer learning. I appreciated the authors' careful interpretation of their findings in the discussion. However, I would have liked a similar level of caution in the abstract, which currently overstates some claims.
Major Comments:
(1) While the introduction mentions brain areas implicated in the low-dimensional representation of task knowledge, the current study uses M/EEG and does not include source reconstruction. As a result, the focus is primarily on the temporal dynamics of the signal rather than its spatial origins. Although I am not suggesting that the authors should perform source reconstruction in this study, it would strengthen the paper to introduce the broader M/EEG literature on task-relevant representations and transfer. The same applies to behavioral studies looking at structural similarities and transfer learning. I encourage the authors to integrate relevant literature to better contextualize their results.
Duan, Y., Zhan, J., Gross, J., Ince, R. A. & Schyns, P. G. Pre-frontal cortex guides dimension-reducing transformations in the occipito-ventral pathway for categorization behaviors. Current Biology 34, 3392-3404 (2024).
Luyckx, F., Nili, H., Spitzer, B. & Summerfield, C. Neural structure mapping in human probabilistic reward learning. eLife 8, e42816 (2019). (This is in the references but not in the text).
Zhang, M. & Yu, Q. The representation of abstract goals in working memory is supported by task-congruent neural geometry. PLoS biology 22, e3002461 (2024).
Teichmann, L., Grootswagers, T., Carlson, T. & Rich, A. N. Decoding digits and dice with magnetoencephalography: Evidence for a shared representation of magnitude. Journal of Cognitive Neuroscience 30, 999-1010 (2018).
Garner, K., Lynch, C. R. & Dux, P. E. Transfer of training benefits requires rules we cannot see (or hear). Journal of Experimental Psychology: Human Perception and Performance 42, 1148 (2016).
Holton, E., Braun, L., Thompson, J., Grohn, J. & Summerfield, C. Humans and neural networks show similar patterns of transfer and interference during continual learning (2025).
(2) I found it interesting that the authors chose to perform PCA for dimensionality reduction prior to conducting RSA; however, I haven't seen such an approach in the literature before. It would be helpful to either cite prior studies that have employed a similar method or to include a comparison with more standard approaches, such as sensor-level RSA or sensor-searchlight analysis.
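For reference, the PCA-then-RSA pipeline as described (retaining the components that explain 80% of the variance before computing a dissimilarity matrix) might look roughly like the following minimal sketch; all data, shapes, and the trial ordering are simulated and hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical MEG data: trials x sensors (one time point, for illustration)
X = rng.standard_normal((120, 272))

# PCA via SVD on mean-centered data
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_explained = (s ** 2) / (s ** 2).sum()
# Smallest number of PCs whose cumulative variance reaches 80%
n_pc = int(np.searchsorted(np.cumsum(var_explained), 0.80) + 1)
scores = Xc @ Vt[:n_pc].T          # trials projected onto retained PCs

# Condition-mean patterns, then an RDM using absolute (city-block) distance
n_conditions, reps = 12, 10
patterns = scores.reshape(n_conditions, reps, n_pc).mean(axis=1)
rdm_abs = np.abs(patterns[:, None, :] - patterns[None, :, :]).sum(axis=-1)
```

Reporting `n_pc` per participant (mean and distribution), as requested above, would make the degree of aggregation in this step transparent.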
(3) Connected to the previous point, the choice to use absolute distance as a dissimilarity measure is not justified. How does it compare to standard metrics such as correlation distance or Mahalanobis distance? The same applies to the use of Kendall's tau.
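To illustrate what such a metric comparison might involve, the following sketch computes the same hypothetical patterns' RDMs under absolute and correlation distance and quantifies their rank agreement with a simple Kendall's tau-a (all helper functions and data are mine, for illustration only):

```python
import numpy as np

def abs_distance(patterns):
    """Summed absolute (city-block) distance between condition patterns."""
    return np.abs(patterns[:, None, :] - patterns[None, :, :]).sum(axis=-1)

def correlation_distance(patterns):
    """1 - Pearson correlation between each pair of condition patterns."""
    return 1.0 - np.corrcoef(patterns)

def kendall_tau_a(x, y):
    """O(n^2) Kendall's tau-a: agreement in pairwise rank order."""
    n = len(x)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return 2.0 * s / (n * (n - 1))

rng = np.random.default_rng(1)
patterns = rng.standard_normal((8, 20))   # hypothetical condition x feature data
iu = np.triu_indices(8, k=1)              # vectorized upper triangles
tau = kendall_tau_a(abs_distance(patterns)[iu], correlation_distance(patterns)[iu])
# tau indicates how strongly the metric choice reorders the dissimilarities
```

If the reported effects are robust, they should survive under correlation or (cross-validated) Mahalanobis distance as well; showing such a comparison would address this concern.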
(4) The analysis described in the "Abstract representation of dynamical roles in subprocesses" does not appear to convincingly test the stated prediction of a structural scaffolding account. The authors hypothesize that if structure and dynamics from prior experiences are repurposed, then stimuli occupying the same "dynamical roles" across different sequences should exhibit enhanced neural similarity. However, the analysis seems to focus on decoding transitions rather than directly assessing representational similarity. This approach may instead reflect shared temporal representation in the sequences without necessarily indicating that the neural system generalizes the abstract function or position of a stimulus within the graph. To truly demonstrate that the brain captures the dynamical role across different stimuli, it would be more appropriate to directly assess whether neural patterns evoked by stimuli in the same temporal part of the sequence with shared roles (but different visual identities) are more similar to each other than to those from different roles.
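The direct test proposed here could be sketched as a within-role versus between-role similarity contrast (a minimal illustration with simulated data; the role assignment and all names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical evoked patterns for 12 stimuli; each stimulus assigned one
# of 4 dynamical roles (three stimuli per role), with different identities
patterns = rng.standard_normal((12, 30))
roles = np.repeat(np.arange(4), 3)

sim = np.corrcoef(patterns)               # pattern-similarity matrix
iu = np.triu_indices(12, k=1)             # unique stimulus pairs
same_role = roles[iu[0]] == roles[iu[1]]
within = sim[iu][same_role].mean()        # same role, different identity
between = sim[iu][~same_role].mean()      # different roles
# Abstract role coding would predict within > between (tested statistically
# across participants, e.g., with a permutation test on the role labels)
```

A contrast of this form would speak to role abstraction per se, rather than to shared temporal structure in the decoded transitions.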
(5) In the following section, the authors correlate decoding accuracy with participants' behavioral performance across different conditions. However, out of the four reported correlations and the additional comparison of differences between conditions, only one correlation and one correlation difference reach significance, and only marginally so. The interpretation of this finding should therefore be more cautious, especially if it is used to support a link between neural representations and behavior. Additionally, it is possible that correlation with a more clearly defined or targeted neural signature, more directly tied to the hypothesized representational content, could yield stronger or more interpretable correlations.
Minor Comments:
During preprocessing, sensors were excluded based on an identified noise level. However, the authors do not specify the threshold used to define this noise level, nor do they report how many sensors were excluded per participant. It would be helpful to have these details. Additionally, it is unclear why the authors opted to exclude sensors rather than removing noise with MaxFiltering or interpolating bad sensors. Finally, the authors should report how many trials were discarded on average (and standard deviation) per participant.