Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.

Editors
- Reviewing Editor: Huan Luo, Peking University, Beijing, China
- Senior Editor: Huan Luo, Peking University, Beijing, China
Reviewer #1 (Public review):
The manuscript analyzes previously published MEG and ECoG datasets to examine pre-onset neural encoding effects during language processing, replicating effects that have been reported in earlier work and demonstrating that they persist even after controlling for correlations in the stimulus sequence. Replication of these effects across recording modalities and datasets is a valuable contribution, as it strengthens confidence in the robustness of anticipatory neural activity related to upcoming linguistic input. However, I have significant concerns regarding the interpretation of these findings, particularly the conclusion that the absence of temporal generalization between pre- and post-onset activity implies that pre-onset activity does not reflect predictive pre-activation of the upcoming word.
The central inferential step in this argument relies on an implicit assumption: that if the brain were predicting an upcoming word, the neural representation prior to word onset should resemble, or generalize to, the representation observed after word onset. This assumption is not theoretically necessary and is not supported by a substantial body of work on predictive processing. Many contemporary models posit that predictions are represented in abstract, compressed, or probabilistic formats that differ from sensory-evoked representations, particularly in hierarchical systems such as language (e.g., Rao & Ballard, 1999; Friston, 2005; Federmeier, 2007; Kuperberg & Jaeger, 2016; de Lange et al., 2018). Under such accounts, predictive representations may encode expectations over latent semantic features or probability distributions rather than reinstating the neural code associated with perceptual input.
In this context, the temporal generalization analyses presented here convincingly demonstrate that pre-onset and post-onset activity do not share a stable representational code. However, this result does not rule out predictive processing per se. Rather, it rules out a specific and relatively strong hypothesis: that prediction takes the form of early reinstatement of the same neural representation used during post-onset word processing. The data are equally consistent with the interpretation that pre-onset activity reflects predictive information expressed in a different representational format that is transformed upon stimulus onset.
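To make the inferential logic concrete, the temporal generalization test at issue can be sketched as follows. This is an illustrative simulation with made-up shapes and variable names, not the authors' pipeline: a linear decoder trained at one time point is evaluated at every other time point, and sustained off-diagonal generalization between pre- and post-onset windows is what a strict reinstatement account would predict.

```python
# Illustrative sketch of a temporal generalization analysis on simulated
# data. Shapes and names are hypothetical, not taken from the manuscript.
import numpy as np

rng = np.random.default_rng(0)
n_words, n_channels, n_times, n_dims = 200, 32, 20, 10

# Simulated word embeddings and neural data (words x channels x time).
X = rng.standard_normal((n_words, n_dims))
Y = rng.standard_normal((n_words, n_channels, n_times))

# Fit a separate linear decoder (neural -> embedding) at each time point,
# then evaluate it at all other time points.
gen = np.zeros((n_times, n_times))
for t_train in range(n_times):
    W, *_ = np.linalg.lstsq(Y[:, :, t_train], X, rcond=None)
    for t_test in range(n_times):
        X_hat = Y[:, :, t_test] @ W
        # Mean correlation between predicted and true embedding dimensions.
        r = [np.corrcoef(X_hat[:, d], X[:, d])[0, 1] for d in range(n_dims)]
        gen[t_train, t_test] = np.mean(r)

# A diagonal-only pattern (high gen[t, t], low gen[t, t']) indicates a code
# that changes over time; generalization between pre- and post-onset windows
# would indicate a shared (reinstated) representational format.
```

The point of the sketch is that a flat off-diagonal only rules out a *shared linear code* across time; a prediction carried in a different format would also produce a diagonal-dominant matrix.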
I therefore recommend that the authors substantially soften and clarify their conclusions regarding prediction. Statements suggesting that pre-onset activity does not reflect prediction should be revised to more precisely reflect what is directly supported by the analyses, namely, the absence of representational identity or stable overlap between pre- and post-onset activity. Explicit acknowledgement of alternative interpretations grounded in established predictive processing frameworks would improve theoretical alignment and avoid overstating the implications of the temporal generalization results.
Overall, the empirical analyses are carefully executed, and the replication across datasets is a strength. However, the current framing risks over-interpreting what the data can rule out about prediction. A clearer distinction between representational equivalence and predictive processing would significantly strengthen the manuscript's theoretical contribution.
Reviewer #2 (Public review):
Summary:
The authors show that pre-onset neural encoding is likely not a product of predictive processing. They demonstrate this primarily through two analyses:
(1) They decorrelate the neural responses between pre- and post-word onset and show that this does not eliminate pre-onset neural encoding. This suggests that this pre-onset neural encoding is not a result of pre-activation driven by an underlying predictive process.
(2) They show that the improvement in encoding performance from future-word embeddings reported by Caucheteux et al. is likely an artifact of the low temporal resolution of fMRI, as it does not reproduce in MEG or ECoG data, modalities whose higher temporal resolution is better suited to this kind of analysis.
Strengths:
Both of the paper's arguments are overall very compelling and point to potential problems in the underlying literature that may require reevaluation. The paper does not make any unreasonable claims. The limitations of the study are appropriately addressed. The paper is well-reasoned and well-written. Overall, I believe the paper is a worthy addition to the literature on this subject.
Weaknesses:
One concern is the degree to which the residualization/decorrelation that the authors employ in Figure 4 truly forces the model to unlearn all the interactions between pre- and post-word-onset neural activity. This point is explicitly noted in Schonmann et al. (which the authors cite): "While residualised word embeddings no longer contain temporal stimulus dependencies, these dependencies are still represented in the neural data, and can hence be 're-learned' when fitting the regression model." I imagine the inverse could be true here: the dependencies are still represented in the stimulus and so can be re-learned when mapping to the neural data. It is possible that the small positive onset correlation that remains after decorrelation is entirely explained by this. This is not a problem per se (it aligns with the overall point of the article), but it is a potential methodological oversight. A clear description of the decorrelation process is necessary in the Methods section.
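To make the concern concrete, here is a minimal sketch of residualization by linear regression (hypothetical names, simulated data; not the authors' implementation). The residuals are orthogonal to the confound by construction, yet nothing prevents an encoding model from re-learning the removed dependency from structure still present on the neural side:

```python
# Minimal sketch of residualizing one set of embeddings against another.
# All names and data are hypothetical.
import numpy as np

def residualize(target, confound):
    """Remove the component of `target` linearly predictable from `confound`."""
    beta, *_ = np.linalg.lstsq(confound, target, rcond=None)
    return target - confound @ beta

rng = np.random.default_rng(1)
next_word = rng.standard_normal((500, 16))            # embedding of word w+1
# Current-word embedding correlated with the next word's embedding.
current_word = 0.5 * (next_word @ rng.standard_normal((16, 16))) \
    + rng.standard_normal((500, 16))

resid = residualize(next_word, current_word)
# By OLS orthogonality, residuals carry no linear trace of the confound:
print(np.abs(current_word.T @ resid).max())  # numerically near zero
# But if the same dependency is also expressed in the neural recordings,
# a regression onto the neural data can in principle re-learn it.
```

This is why a precise description of which side (stimulus or neural) was residualized, and at which stage, matters for interpreting the surviving pre-onset effect.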
The paper correctly notes that their removal of bigram/n-gram information does not entirely exclude all stimulus dependencies. However, removing this fully would be extremely difficult, and the small reduction in performance of the bigram-ablated model does not point to this being a major issue.
Separately, some of the figures are a little rough. Suggestions have been provided to the authors.
Reviewer #3 (Public review):
Previous studies have shown that language model embeddings of future words can predict brain responses to language. This has been interpreted as evidence for predictive representations in the brain. The primary finding of the present study is that this index of predictive processing is not consistent with a pre-activation account, but instead suggests continuously evolving representations. A strength of the manuscript is that it uses methods that build on previous studies and shows that previous results replicate in the current datasets before testing new hypotheses. Addressing some minor weaknesses would further strengthen the results and ensure that the conclusions are justified:
(1) When analyzing neural data, "words with multiple tokens assigned by the model were excluded" (11). I wonder whether this could have influenced the results. Restricting the dataset to single-token words would likely bias it towards semantically light, high-frequency, and function words. Pre-activation for such words may differ from that for longer, more semantically rich words.
(2) The study only used a context window of 50 tokens for language model predictions (11). This is shorter than in previous studies and may constitute a confound when comparing results across studies. It may be particularly relevant for the comparison with Caucheteux et al. (2023), whose results suggested more extensive predictions (9), which may require longer context.
(3) The manuscript is largely missing data on the reliability of the results. Some form of significance testing, along with an indication of variability and/or the noise floor in the figures, would be helpful.
A primary concern when analyzing naturalistic speech data is that different speech features are highly correlated across linguistic levels and across time. The manuscript makes a reasonable effort to control for stimulus autocorrelations. It is encouraging that the effect survived this correction. As the manuscript concedes, control is not perfect and controlling for "all regularities inherent to natural speech" remains a challenge (9). This should be kept in mind when interpreting the results.
Finally, the manuscript also argues that "we observed clear signatures of postdiction, with neural activity reflecting persistent encoding of prior words" (abstract). I did not follow this reasoning. The ostensible evidence for this is that "including the previous word ... improves encoding even after the current word's onset" (Figure 5). However, this is not particularly surprising, because the previous word can often only be recognized around the end of the word, corresponding to the time of the current word onset. Language model embeddings reflect a contextual semantic interpretation of the word, which likely requires further processing after word recognition. I would thus expect that the initial contextual interpretation of a word occurs during presentation of the subsequent word. Evidence for "persistent encoding" should include encoding beyond this point, i.e., over the course of several subsequent words. Contrary to this, Figure 5a (left) suggests that the predictive effect of the previous word (d-1) stops around the offset of the current word (d). This suggests to me that, once controlling for subsequent embeddings, the embedding of a word disappears from the neural activity soon after word recognition.