Reviewer #2 (Public review):

Summary:

This study investigates the influence of prior stimuli over multiple time scales in a position discrimination task, using pupillometry data and a reanalysis of EEG data from an existing dataset. The authors report consistent history-dependent effects across task-related, task-unrelated, and stimulus-related dimensions, observed across different time scales. These effects are interpreted as reflecting a unified mechanism operating at multiple temporal levels, framed within predictive coding theory.

Strengths:

The goal of assessing history biases over multiple time scales is interesting and resonates with both classic (Treisman & Williams, 1984) and recent work (Fritsche et al., 2020; Gekas et al., 2019). The manipulations used to distinguish task-related, unrelated, and stimulus-related reference frames are original and promising.

Weaknesses:

I have several concerns regarding the text, interpretation, and consistency of the results, outlined below:

(1) The abstract should state more explicitly that the conclusions about feedforward mechanisms were derived from a reanalysis of an existing EEG dataset. As written, it seems to present behavioral data only.

(2) The EEG task seems quite different from the others: if I understand correctly, it involved location and color changes on streaks of consecutive stimuli shown every 100 ms, and participants counted the number of target events. Different mechanisms and functions might therefore be involved, compared to the behavioral experiments reported.

(3) How is the arbitrary choice of restricting EEG decoding to a small subset of parieto-occipital electrodes justified? Blinks and other artifacts could have been corrected with appropriate algorithms (e.g., ICA) (Zhang & Luck, 2025) or even left in, as decoders are not necessarily affected by noise. Moreover, trials with blinks occurring at the time of the stimulus would be better removed: selecting a subset of electrodes reduces the information available to the decoder but does not account for trials in which a stimulus was missed (e.g., due to blinks).

(4) The artifact that appears in many of the decoding results is puzzling, and I'm not fully convinced by the speculative explanation involving slow fluctuations. I wonder if a different high-pass filter (e.g., 1 Hz) might have helped. In general, the nature of this artifact requires better clarification and disambiguation.

(5) Given the relatively early decoding results and surprisingly early differences in decoding peaks, it would be useful to visualize ERPs across conditions to better understand the latencies and ERP components involved in the task.

(6) It is unclear why the precision derived from IEM results is considered reliable while the accuracy is dismissed due to the artifact, given that both seem to be computed from the same set of decoding error angles (equations 8-9).

(7) What is the rationale for selecting five past events as the meso-scale? Prior history effects have been shown to extend much further back in time (Fritsche et al., 2020).

(8) The decoding bias results, particularly the sequence of attraction and repulsion, appear to run counter to the temporal dynamics reported in recent studies (Fischer et al., 2024; Luo et al., 2025; Sheehan & Serences, 2022).

(9) The repulsive component in the decoding results (e.g., Figure 3h) seems implausibly large, with orientation differences exceeding what is typically observed in behavior.

(10) The pattern of accuracy, response times, and precision reported in Figure 3 (also line 188) resembles results reported in earlier work (Stewart, 2007) and in recent studies suggesting that integration may lead to interference at intermediate stimulus differences rather than improvement for similar stimuli (Ozkirli et al., 2025).

(11) Some figures show larger group-level variability in specific conditions but not others (e.g., Figures 2b-c and 5b-c). I suggest reporting effect sizes for all statistical tests to provide a clearer sense of the strength of the observed effects.

(12) The statement that "serial dependence is associated with sensory stimuli being perceived as more similar" appears inconsistent with much of the literature suggesting that these effects occur at post-perceptual stages (Barbosa et al., 2020; Bliss et al., 2017; Ceylan et al., 2021; Fischer et al., 2024; Fritsche et al., 2017; Sheehan & Serences, 2022).

(13) If I understand correctly, the reproduction bias (i.e., serial dependence) is estimated on a small subset of the data (10%). Were the data analyzed by pooling across subjects?

(14) I'm also not convinced that biases observed in forced-choice and reproduction tasks should be interpreted as arising from the same process or mechanism. Some of the effects described here could instead be consistent with classic priming.
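
To make the request in point (11) concrete, a standardized effect size for a within-subject comparison could be reported alongside each test. The sketch below computes Cohen's d_z (mean of the paired differences divided by their standard deviation). It assumes a paired design; the function name `cohens_dz` and the per-participant values are hypothetical and are not taken from the manuscript. A different effect-size measure may be more appropriate depending on the tests the authors actually used.

```python
import statistics


def cohens_dz(cond_a, cond_b):
    """Cohen's d_z for paired samples: mean difference / SD of differences.

    cond_a and cond_b hold one value per participant, in the same order.
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    if len(diffs) < 2:
        raise ValueError("need at least two participants")
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; d_z is undefined")
    return statistics.mean(diffs) / sd


# Hypothetical per-participant accuracies in two conditions (illustration only)
condition_a = [0.82, 0.79, 0.88, 0.91, 0.76]
condition_b = [0.78, 0.77, 0.80, 0.85, 0.75]
print(f"d_z = {cohens_dz(condition_a, condition_b):.2f}")
```

Reporting such a value for every condition would make it easier to judge whether the conditions with larger group-level variability also show weaker effects.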

Figures and data

Multiple scales of temporal context.

The influence of micro-scale temporal context on visual processing.

The influence of serial dependence on visual processing.

The influence of meso-scale temporal context on visual processing.

The influence of macro temporal context on visual processing.

Decoding stimulus location and Δ location from EEG recordings.

Removal of general micro temporal dependencies in EEG responses.

Decoding accuracy, but not precision, is confounded by general serial dependencies.

Meso-scale temporal context effects in Experiment 2.

Removal of general meso temporal dependencies in EEG responses.

Macro-scale temporal context effects in Experiment 3.