Stimulus dependencies—rather than next-word prediction—can explain pre-onset brain encoding during natural listening

  1. Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands
  2. Amsterdam Brain and Cognition, University of Amsterdam, Amsterdam, Netherlands

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Nai Ding
    Zhejiang University, Hangzhou, China
  • Senior Editor
    Huan Luo
    Peking University, Beijing, China

Reviewer #1 (Public review):

Summary:

This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

Strengths:

The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

Weaknesses:

I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

Reviewer #2 (Public review):

Summary:

At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word-onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

Strengths:

The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

Weaknesses:

The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

Reviewer #3 (Public review):

Summary:

The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies (specifically, the auto-correlation in the stimuli) rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

Strengths:

(1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

(2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings, an approach used in prior work.

(3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

Weaknesses:

(1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

(2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

(3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

Author response:

Reviewer #1 (Public review):

Summary:

This paper tackles an important question: What drives the predictability of pre-stimulus brain activity? The authors challenge the claim that "pre-onset" encoding effects in naturalistic language data have to reflect the brain predicting the upcoming word. They lay out an alternative explanation: because language has statistical structure and dependencies, the "pre-onset" effect might arise from these dependencies, instead of active prediction. The authors analyze two MEG datasets with naturalistic data.

Strengths:

The paper proposes a very reasonable alternative hypothesis for claims in prior work. Two independent datasets are analyzed. The analyses with the most and least predictive words are clever, and nicely complement the more naturalistic analyses.

Weaknesses:

I have to admit that I have a hard time understanding one conceptual aspect of the work, and a few technical aspects of the analyses are unclear to me. Conceptually, I am not clear on why stimulus dependencies need to be different from those of prediction. Yes, it is true that actively predicting an upcoming word is different from just letting the regression model pick up on stimulus dependencies, but given that humans are statistical learners, we also just pick up on stimulus dependencies, and is that different from prediction? Isn't that in some way, the definition of prediction (sensitivity to stimulus dependencies, and anticipating the most likely upcoming input(s))?

This brings me to some of the technical points: If the encoding regression model is learning one set of regression weights, how can those reflect stimulus dependencies (or am I misunderstanding which weights are learned)? Would it help to fit regression models on for instance, every second word or something (that should get rid of stimulus dependencies, but still allow to test whether the model predicts brain activity associated with words)? Or does that miss the point? I am a bit unclear as to what the actual "problem" with the encoding model analyses is, and how the stimulus dependency bias would be evident. It would be very helpful if the authors could spell out, more explicitly, the precise predictions of how the bias would be present in the encoding model.

We thank the reviewer for their comments and address both points.

Conceptually, there is a key difference between encoding predictions, i.e. pre-activations of future words, versus encoding stimulus dependencies. The speech acoustics provide a useful control case: they encode the stimulus (and therefore stimulus dependencies) but do not predict. When we apply the encoding analysis to the acoustics (i.e. when we estimate the acoustics pre-onset from post-onset words), we observe the “hallmarks of prediction” – yet, clearly, the acoustics aren't "predicting" the next word.

This reveals the methodological issue: if the brain were just passively filtering the stimulus (akin to a speech spectrogram), these "prediction hallmarks" would still appear in the acoustics encoding results, despite no actual prediction taking place. Therefore, one necessary criterion for concluding pre-activation from pre-stimulus neural encoding is that pre-stimulus encoding performance is at least better on neural data than on the stimulus itself. This would show that the pre-onset neural signal contains additional predictive information about the next word, beyond that carried by the stimulus (e.g. the acoustics) itself. We will make this point more prominent in the revision.
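The logic of this control can be sketched in a toy simulation (not the actual analysis pipeline; all variables and parameters here are hypothetical). A temporally dependent stimulus sequence, "encoded" only as a record of past input, still yields above-chance pre-onset encoding:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical word-feature sequence with temporal dependencies:
# an AR(1) process standing in for correlated word embeddings.
n_words, rho = 2000, 0.6
x = np.empty(n_words)
x[0] = rng.standard_normal()
for t in range(1, n_words):
    x[t] = rho * x[t - 1] + np.sqrt(1 - rho**2) * rng.standard_normal()

# The "signal" here is the stimulus itself: what was heard before word
# t's onset (the previous word's feature). It is a pure record of past
# input and cannot predict anything.
signal_pre_onset = x[:-1]
upcoming_word = x[1:].reshape(-1, 1)

# Encoding analysis: estimate the pre-onset signal from the upcoming
# word's feature. A positive cross-validated score would be read as a
# "hallmark of prediction", yet it reflects only the stimulus dependency.
r2 = cross_val_score(Ridge(alpha=1.0), upcoming_word,
                     signal_pre_onset, cv=5, scoring="r2").mean()
print(f"pre-onset encoding r^2: {r2:.2f}")
```

Because x[t] carries information about x[t-1] by construction, the score is well above zero even though nothing in the "signal" anticipates the future.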

Regarding the regression: in a time-resolved regression, a different set of weights is estimated at each time point relative to word onset. This allows the model to capture responses unfolding over time, but it also allows the weights to pick up stimulus dependencies.
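A minimal sketch of what "different weights per time point" means (shapes and data below are simulated stand-ins, not the actual datasets):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical epoched data: n_words trials, n_feat word features,
# n_times samples per epoch spanning pre- and post-onset time.
n_words, n_feat, n_times = 500, 10, 60
X = rng.standard_normal((n_words, n_feat))   # word features (e.g. embeddings)
Y = rng.standard_normal((n_words, n_times))  # signal epochs around word onset

# Closed-form ridge regression, solved independently at every time point:
# W[:, t] maps word features to the signal at time t relative to onset.
alpha = 1.0
W = np.linalg.solve(X.T @ X + alpha * np.eye(n_feat), X.T @ Y)
print(W.shape)  # one weight vector per time point
```

Because each pre-onset column of W is fit independently, nothing stops it from exploiting covariance between the upcoming word's features and the signal left by preceding words.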

To sum up, the difference between encoding dependencies and predictions is at the core of our work. We appreciate this was not clear in the initial version and we will make this much clearer in the revision, conceptually and methodologically.

Reviewer #2 (Public review):

Summary:

At a high level, the authors demonstrate that there is an explanation for pre-word-onset predictivity in neural responses that does not invoke a theory of predictive coding or processing. The paper does this by demonstrating that this predictivity can be explained solely as a property of the local mutual information statistics of natural language. That is, the reason that pre-word-onset predictivity exists could simply boil down to the common prevalence of redundant bigram or skip-gram information in natural language.

Strengths:

The paper addresses a problem of significance and uses methods from modern NeuroAI encoding model literature to do so. The arguments, both around stimulus dependencies and the problems of residualization, are compellingly motivated and point out major holes in the reasoning behind several influential papers in the field, most notably Goldstein et al. This result, together with other papers that have pointed out other serious problems in this body of work, should provoke a reconsideration of papers from encoding model literature that have promoted predictive coding. The paper also brings to the forefront issues in extremely common methods like residualization that are good to raise for those who might be tempted to use or interpret these methods incorrectly.

Weaknesses:

The authors don't completely settle the problem of whether pre-word onset predictivity is entirely explainable by stimulus dependencies, instead opting to show why naive attempts at resolving this problem (like residualization) don't work. The paper could certainly be better if the authors had managed to fully punch a hole in this.

We thank the reviewer for their assessment.

We believe the limitation we highlight extends beyond the specific method of residualizing features. Rather, it points to a fundamental problem: adjusting the features (X matrix) alone cannot address stimulus dependencies that persist in the signal (y matrix), as we demonstrate by using a different signal (acoustics) that encodes no predictions. While removing dependencies from the signal would be more thorough, this would also eliminate the effect of interest. We view this as a fundamental limitation of the encoding analysis approach combined with the experimental design, rather than something that can be resolved analytically. We will perform additional analyses to test this premise and elaborate on this point in our revision.
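The point can be illustrated with a toy example (all variables hypothetical): residualizing the current word's feature against a noisy stand-in for the previous word's embedding does not remove the dependency that the pre-onset signal inherits from other aspects of the previous word, such as its acoustics:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Latent identity of the previous word, reflected in two correlated but
# distinct aspects: its embedding and its acoustics (both hypothetical).
z = rng.standard_normal(n)
prev_emb = z + rng.standard_normal(n)              # noisy embedding
prev_acoustics = z + 0.5 * rng.standard_normal(n)  # correlated acoustics

# Stimulus dependency: the current word's feature also correlates with z.
curr = 0.6 * z + 0.8 * rng.standard_normal(n)

# Pre-onset "signal": a pure trace of the previous word's acoustics
# (no prediction anywhere in this toy).
y = prev_acoustics

# Residualize the current word's feature against the previous embedding,
# in the spirit of the prior-work control analysis.
beta = (prev_emb @ curr) / (prev_emb @ prev_emb)
curr_resid = curr - beta * prev_emb

corr_resid_emb = np.corrcoef(curr_resid, prev_emb)[0, 1]
corr_resid_y = np.corrcoef(curr_resid, y)[0, 1]
print(f"resid vs prev embedding:   {corr_resid_emb:.3f}")  # ~0 by construction
print(f"resid vs pre-onset signal: {corr_resid_y:.3f}")    # clearly nonzero
```

The residualized feature is orthogonal to what was regressed out, yet it still predicts the pre-onset signal, because the dependency lives in y. Removing it there would mean altering the signal itself, which is precisely why we see this as a limitation of the design rather than something fixable on the feature side.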

Reviewer #3 (Public review):

Summary:

The study by Schönmann et al. presents compelling analyses based on two MEG datasets, offering strong evidence that the pre-onset response observed in a highly influential study (Goldstein et al., 2022) can be attributed to stimulus dependencies (specifically, the auto-correlation in the stimuli) rather than to predictive processing in the brain. Given that both the pre-onset response and the encoding model are central to the landmark study, and that similar approaches have been adopted in several influential works, this manuscript is likely to be of high interest to the field. Overall, this study encourages more cautious interpretation of pre-onset responses in neural data, and the paper is well written and clearly structured.

Strengths:

(1) The authors provide clear and convincing evidence that inherent dependencies in word embeddings can lead to pre-activation of upcoming words, previously interpreted as neural predictive processing in many influential studies.

(2) They demonstrate that dependencies across representational domains (word embeddings and acoustic features) can explain the pre-onset response, and that these effects are not eliminated by regressing out neighboring word embeddings, an approach used in prior work.

(3) The study is based on two large MEG datasets, showing that results previously observed in ECoG data can be replicated in MEG. Moreover, the stimulus dependencies appear to be consistent across the two datasets.

Weaknesses:

(1) To allow a more direct comparison with Goldstein et al., the authors could consider using their publicly available dataset.

(2) Goldstein et al. already addressed embedding dependencies and showed that their main results hold after regressing out the embedding dependencies. This may lessen the impact of the concerns about self-dependency raised here.

(3) While this study shows that stimulus dependency can account for pre-onset responses, it remains unclear whether this fully explains them, or whether predictive processing still plays a role. The more important question is whether pre-activation remains after accounting for these confounds.

We thank the reviewer for their comments.

We want to address a key point of confusion regarding the procedure of regressing out embedding dependencies. While Goldstein et al. showed that the neural encoding results persist after their control analysis (as we also did, in our supplementary Figure S3), this does not lessen the concern surrounding stimulus dependencies. Our analyses demonstrate that, even after such residualization, the "hallmarks of prediction" remain encodable in the speech acoustics, a control system that, by definition, cannot predict upcoming words. The hallmarks of prediction can therefore be fully explained by stimulus dependencies. This persistence in the acoustics strengthens rather than lessens our concerns about dependencies.

This connects to a broader methodological point: our key evidence comes from analyzing the stimulus material itself as a control system. By comparing encoding results on neural responses with those on a system that encodes the stimulus, and therefore its dependencies, but cannot predict the upcoming input (such as the acoustics), we can establish proper criteria for concluding that the brain engages in prediction. Notably, the Goldstein dataset was not available when we conducted this research. However, for the revision we will perform additional analyses to allow a more direct comparison.

Finally, our focus was not to definitively test whether the brain predicts upcoming words, but rather to establish rigorous methodological and epistemological criteria for making such claims. We will elaborate on this crucial distinction in our revision and more prominently feature our central argument about the limitations of current evidence for neural prediction.
