Introduction

Influential theories of perception and learning suggest a key role of prediction in both processes (1-3). However, it remains unclear whether the influence of prediction on neural activity across sources of prediction, time scales and learning stages can be explained by a single mechanism (4-7). Repetition suppression (RS), or the attenuation of neural activity in response to repeated (and therefore predictable) stimuli, has been well documented across a range of sensory modalities, species, and experimental paradigms (8,9). Predictive processing theories explain RS as a manifestation of minimising prediction errors (PEs) through adaptive changes in predictions about the content and precision of sensory inputs (8). Expectation suppression (ES), or the attenuation of neural activity in response to expected stimuli (regardless of their frequency), appears to be a related phenomenon, but its underlying mechanisms are more controversial. An earlier study attempting to dissociate the two phenomena showed that RS and ES affect neural activity in separable time windows (10). More recently, however, Feuerriegel et al. (11) pointed out major challenges to the evidence underlying ES in the visual system, such as the conflation of ES with RS, surprise effects, attention effects, and stimulus novelty. While empirical evidence for ES is more fragmentary (11), it is in principle a more convincing neural signature of predictive processing, as it is less readily explained than RS by mere neural adaptation (12), which does not necessarily reflect cognitive processes.

At least two types of neural activity modulation have been linked to the effects of expectation. Sharpening models propose that expectations suppress neurons that are not tuned to the expected stimulus, thus inhibiting information inconsistent with the expectation (13,14). Depending on the level of analysis, this may result in sharper tuning curves (i.e., increased signal-to-noise ratio for expected stimuli) for a particular neuron or small population, amounting to relative expectation enhancement; and/or in sparse responses (i.e., net activity decrease) in a larger population (15). In contrast, dampening models posit that expectations suppress neurons that are tuned to the expected stimuli, effectively “explaining away” the prediction error responses (16). By cancelling information in line with expectations, uninformative activity in the sensory stream is prevented while favouring novel information (17,18). Studies attempting to settle the debate have produced mixed results on both accounts. Kok et al. (13) and Garlichs & Blank (19) both found evidence for neural sharpening in the visual domain using fMRI and multivariate pattern analysis (MVPA). Another study using 7T fMRI found both suppression of irrelevant neural responses in visual areas and selective enhancement of relevant neural responses in higher-order frontoparietal cortices (20). In contrast, Richter et al. (18,21) demonstrated evidence in favour of dampening of neural responses in the visual domain. In the latter study, after training participants on a statistical learning task, stimulus features were decoded from fMRI signals separately in expected vs unexpected trials. 
The rationale of the analysis was that, if expectation leads to dampening, the response amplitudes will be lower, and as a result stimulus decoding should be lower in expected trials; conversely, if expectation leads to sharpening, the signal-to-noise ratio will increase, and as a result stimulus decoding should increase in expected trials. Both studies found lower decoding for expected trials, consistent with suppressed activity of neuronal populations tuned towards expected stimuli. However, electrophysiological research in the area is scarce (17,22), despite its potential for parsing out temporal differences which may go unnoticed in fMRI studies with poor temporal resolution.

Taking temporal differences into account, the recently proposed Opposing Process Theory (OPT) suggests that sharpening and dampening may both occur, but at different time points during the predictive process (23). This model seeks to reconcile contrasting findings: some studies show that expected events are perceived with greater intensity 50 ms after presentation, but that this bias reverses by around 200 ms such that unexpected events are perceived with greater intensity (24,25), while classically reported ES effects most often take place ∼150 ms after presentation (26). OPT suggests that within each trial, initial processing relies on prior knowledge to sharpen sensory representations towards the expected stimuli, and that a later processing stage follows, dampening the neural representations of the expected stimulus. While this theory is largely untested thus far, it provides a useful benchmark for future research. In particular, implementing decoding analyses in a time-resolved manner allows the temporal dynamics proposed by this model to be tested directly.

Theoretical work has proposed that, similar to the effects of prediction within a trial, learning should also be linked to neural activity suppression, albeit at hierarchically higher levels of neural processing (27). However, empirical work has yielded more nuanced results. Previous research has focussed on the mechanisms of ES after learning has taken place, while the effects of expectation across trials (i.e., during learning) have been largely overlooked. An earlier fMRI study (28) showed that repetition effects are qualitatively different for initial vs. later stimulus repetitions, gradually enhancing vs. suppressing BOLD activity, respectively. In contrast, EEG work has shown that neural responses are more strongly modulated by initially formed repetition-based predictions than by their subsequent revisions (29). Furthermore, in an MEG study differentiating the effects of expectation and familiarity, both factors were linked to suppressed neural activity, but expectation (previously seen stimuli with expected vs. unexpected structure) attenuated activity in the lateral occipital cortex, whereas familiarity (previously seen vs. unseen stimuli with no predictable structure) additionally led to temporal sharpening of evoked responses in early visual regions (30). However, that study only characterised these effects after learning, averaging across many trials. While theoretical accounts such as the reverse hierarchy theory of perceptual learning (31) and the hierarchical Bayesian framework of statistical learning (32) suggest that perceptual learning effects progress from higher to lower levels of the visual system, in classical predictive coding hierarchies the time scales increase along the processing hierarchy in an ascending manner (27). Thus, it is unclear whether dampening vs. sharpening effects (previously found to occur in anterior vs. posterior regions) (20) should be identified at earlier vs. later learning stages.

Here, we assessed the effects of expectation on EEG-based stimulus decoding during statistical learning while controlling for repetition effects, which may otherwise render the expectation effects impossible to parse out given the similarity between the two phenomena. We focused on empirically testing the Opposing Process Theory on visually evoked responses. We also explored the dynamics of expectation effects across trials. In brief, our results support OPT, albeit with different dynamics within trials than across learning stages.

Results

Participants (N=31) were exposed to a sequence of visual scenes (Fig. 1A) while their neural activity was recorded using EEG (see Methods for more details). The sequence was manipulated with respect to the expectation of image categories. In each trial, participants were presented with two images from nine different categories in quick succession, with five ‘Leading’ categories, and four ‘Trailing’ categories. Category pairs and the transitional probabilities between them were determined by the transitional probability matrix depicted in Fig. 1B. Each participant viewed 1728 trials without prior training and each image was only presented once.

Paradigm overview. A. A single trial, with two example images. Images were presented for 100 ms each with an 800 ms interstimulus interval and an intertrial interval of 1300-2200 ms. Participants were required to respond to upside-down images with a button press; all upside-down images were trailing. B. Transitional probability matrix determining category pairs. Five leading categories and four trailing categories were used. Orange represents the 2:1 condition, and green represents the 1:2 (control) condition. Cells with dots represent the valid pairs, cells with Xs represent the invalid pairs, and empty cells represent non-existent pairs. C. Reaction times across Valid, Invalid, and Neutral conditions.

Behavioural results

After the EEG recording, a majority of participants (N=21) were asked to perform a categorisation task on images drawn from the same categories. Transitional probabilities in this task were kept the same as in the EEG session, and participants performed a speeded indoor/outdoor categorisation task on the ‘Trailing’ images. A paired t-test was conducted to compare reaction times (RTs) in the valid and invalid trials of the categorisation task, in order to assess implicit learning. Participants reacted significantly faster in valid trials (mean = 598.9 ms, SEM = 7.2 ms) than in invalid trials (mean = 623.6 ms, SEM = 7.2 ms; t20 = -2.18, p = 0.041, two-tailed), demonstrating that the associations between pairings were implicitly learned (Fig. 1C).

Univariate EEG analyses

In order to assess the impact of expected vs unexpected trials on visually evoked ERPs, a paired t-test was conducted on the epoched amplitude at channel Oz. Unexpected trials resulted in a significantly increased neural response 150 ms after image onset (t(1,26) = 3.89, p = 0.022, FWE-corrected) (Fig. 2A). The group-level topography of the amplitude differences between valid and invalid trials is shown in Fig. 2B.

Results of data analysis. A. Grand average ERP statistical analyses at Channel Oz. Significant difference between valid and invalid trial amplitudes. B. Topography plot of the t-values resulting from paired t-tests comparing Valid and Invalid EEG data at 150 ms. C. SVM decoding results for Leading image sensory decoding. Dashed vertical line at 0 s indicates stimulus onset. D. Decoding results for Trailing image sensory decoding. Dashed rectangles denote latencies of significant effects (pFWE < 0.05). Dashed vertical line at 0 s indicates stimulus onset. E. Decoding results for Leading image prediction decoding. Dashed vertical line at 0 s indicates stimulus onset. F. Decoding results for Trailing image memory decoding. Dashed rectangle denotes latencies of significant effects (pFWE < 0.05). Dashed vertical line at 0 s indicates stimulus onset.

Decoding analyses

To test the hypothesis derived from the OPT that expectation should have opposing effects at earlier (sharpening) vs. later (dampening) response latencies, we used multivariate decoding analysis. Following previous work (21,33), we reasoned that sharpening should lead to increased decoding accuracy for expected stimuli, while dampening should lead to increased decoding accuracy for unexpected stimuli. To this end, we calculated linear SVM-based category decoding accuracy at each time point based on EEG amplitudes, separately for valid vs. invalid trials (33). Four separate decoding analyses were conducted: “sensory decoding” (decoding the visual category of the presented image, done separately for the leading and trailing images), “prediction decoding” (decoding the predictable trailing images based on the leading images), and “memory decoding” (decoding the preceding visual category based on the trailing images). The above analyses were all subject to cluster-based family-wise error (FWE) correction for multiple comparisons across time points.

In the sensory decoding analysis based on leading images, decoding accuracy was above chance for both valid (Tmax= 2.76, pFWE < 0.001) and invalid trials (Tmax= 2.76, pFWE < 0.001) from 100 ms, with no significant difference between them (Tmax= 2.76, pFWE > 0.05) (Fig. 2C). This initial analysis largely served as a sanity check, since no difference between decoding the category of valid and invalid images should be observed as the leading image cannot be invalid.

In contrast, in sensory decoding of trailing images, decoding accuracy for the category of valid images was significantly higher than for invalid images 123-180 ms after image onset (cluster-level statistics: Tmax = 4.14, pFWE < 0.001), whereas decoding accuracy for the category of invalid images was significantly higher than for valid images 280-296 ms after image onset (Tmax = 3.89, pFWE < 0.001) (Fig. 2D). Therefore, stimulus category expectation had opposing effects on trailing category decoding based on EEG signals early vs. late within a trial. The decoding accuracy of both valid (Tmax = 2.76, pFWE < 0.001) and invalid trials (Tmax = 14.903, pFWE < 0.001) was also above chance in this condition from 100 ms.

Prediction decoding produced a similar effect to the sensory decoding of leading images, where no difference between valid and invalid leading images was observed (T29= 2.76, pFWE > 0.05) (Fig. 2E), but both valid (T29 = 2.76, pFWE < 0.001) and invalid (T29 = 2.76, pFWE < 0.001) were above chance from 100 ms. A control analysis was added here to compare L1/L4 vs L2/L5 (Tmax= 2.76, pFWE > 0.05) and L1/L5 vs L2/L4 (Tmax= 2.76, pFWE > 0.05). The decoding accuracy in these control analyses was slightly above chance (Tmax = 2.76, pFWE < 0.001; Tmax = 2.76, pFWE < 0.05) between 110 and 400 ms, likely due to differences in physical characteristics between the categories. These arbitrary pairings (see Fig. 1B) make the relationship between the valid/invalid pairs clearer, as there should be no relationship between them according to the statistical learning paradigm. In order to emphasise the difference between the real vs arbitrary pairing, we subjected the prediction decoding accuracy of the ‘valid’ leading images in ‘valid’ and arbitrary pairing L1/L4 to a paired t-test, showing that decoding accuracy of the actual pairings is significantly higher between 100 and 300 ms (Tmax = 2.76, pFWE < 0.001).

Finally, as with sensory decoding of trailing images, memory decoding accuracy based on valid trailing images was significantly higher than invalid trailing images 123-180 ms after image onset (Tmax = 3.96, pFWE < 0.001) (Fig. 2F). Both valid (Tmax = 2.76, pFWE < 0.001; from 100 ms) and invalid (Tmax = 2.76, pFWE < 0.001; from 200 ms) trailing images yielded above-chance decoding.

In order to better assess how the difference in prediction and memory decoding between valid and invalid pairs changes during learning, we separated each data set into four trial bins. Decoding accuracy scores were tested for normality prior to performing t-tests to compare conditions. Data were then fit to a linear model and outliers larger than 3 standard deviations based on Cook’s distance were removed. Given the apparent consistency of bins 2-4, we focused our analyses on bins 1-2 in order to directly assess the effects of learning. We separately analysed the early (123-180 ms) and late (280-296 ms) within-trial time window identified in the main analysis of sensory decoding based on the trailing images (Fig. 2D) as well as the early (123-180 ms) time window identified in the memory decoding analysis (Fig. 2F). We conducted a repeated-measures ANOVA on each of these effects with Holm-Bonferroni correction for multiple comparisons across bins.

In the early time window of sensory decoding, the main effects of validity (F1,79 = 0.84141, p > 0.05) and bin (F1,79 = 1.9242, p = 0.058) were not significant. The interaction between validity and bin was significant, however (F1,79 = -2.1679, p = 0.033). Post-hoc analyses of the effects of each bin showed no significant differences between valid and invalid trials in the first bin (t28 = -1.2036, p = 0.239) but significant differences in the second bin (t23=-3.8704, p<0.001) (Fig. 3A).

Analysis of validity effects on decoding over trial bins. A. Learning over time at 123-180 ms in sensory decoding. B. Learning over time at 280-296 ms in sensory decoding. C. Learning over time at 123-180 ms in memory decoding.

Analyses of the later peak in sensory decoding found the opposite effect. The main effect of validity (F1,77 = 3.5402, p < 0.001) and the main effect of bin (F1,77 = 2.2399, p = 0.028) were both significant, as was the interaction between them (F1,77 = -2.6854, p = 0.009). In contrast to the early peak, post-hoc analyses showed significant differences between valid and invalid trials in the first bin (t24 = 4.293, p < 0.001) but not in the second bin (t24 = 0.2673, p = 0.791) (Fig. 3B).

Finally, in memory decoding, neither the main effects of validity and bin nor the interaction between them was significant after removal of outliers (p > 0.05) (Fig. 3C).

Discussion

We set out to investigate the effects of expectation on stimulus decoding during statistical learning and demonstrated dynamic differences in decoding expected vs. unexpected stimulus categories, both within and across trials. EEG-based decoding of expected vs. unexpected visual scene categories showed that within trials, a relative decoding boost for expected stimuli was followed by a relative decoding boost for unexpected stimuli, in line with the recent Opposing Process Theory (OPT) (23). However, across trials, the decoding boost for unexpected stimuli was observed earlier than the decoding boost for expected stimuli, possibly due to a dynamic relationship between cortical processing hierarchies (31,34,35).

The OPT (23) combines both sharpening and dampening models of stimulus expectation, which have previously been mapped onto a relative decoding boost for expected vs. unexpected stimuli, respectively (33). In this model, hypothesis units (units representing the brain’s ‘best guess’ about the outside world) are initially strongly weighted towards the expected; however, when agents encounter events that elicit surprise, increased gain on surprising inputs leads to high-fidelity representations of unexpected events across hypothesis units. In simple terms, early sharpening (which favours processing of expected information) is followed by later dampening (which in turn allows novel information to be processed) within trials. This is supported by our within-trial results: decoding of expected images was significantly higher than that of unexpected images early during the trial, while the opposite effect was found later during the trial. Differences in early vs late prediction error detection have previously been observed using fMRI (5). That study used a visual detection task to demonstrate that immediate implicit (early) detection of PEs enables fast but partial comparison of bottom-up sensory input with top-down predictive information, whereas effortful explicit (late) processing permits a more comprehensive assessment of the type of mismatch between expected and actual input. While the authors did not interpret their results in terms of dampening or sharpening models, it is possible that the differences observed are a result of changing temporal dynamics in the neural response which are difficult to demonstrate using fMRI. In this context, higher-level processes would modulate detection of prediction error through top-down processes and regulate adaptation of predictive models as a result.

Interestingly, the early (∼150 ms) decoding accuracy boost found for expected trials coincided with the univariate ERP amplitude increase found for unexpected trials. Such seemingly opposing effects are commonly explained by the notion that sharpening leads to fewer units responding to the expected stimulus, with the population response becoming sparser and more selective (15). According to Friston’s original theoretical model of predictive coding (1), sharpening and dampening occur in parallel but in separate prediction and error neurons, which would reside in deep and superficial cortical layers, respectively. As such, studies that have argued for sharpening models (13,14) have focussed primarily on the response to expectations in the occipital cortex and V1, while studies that have favoured dampening models (18,21,33) have centred more on the neural response to violated expectations in the ventral visual stream. Thomas et al. (29) found that expected and unexpected events are represented across the cortical column in V1. However, while expected events were represented similarly across layers, unexpected events were only well represented in superficial layers. One recent EEG study has attempted to directly elucidate the mechanisms of the OPT. Xu et al. (36) used auditory cues followed by a predictable series of flashes to investigate the mechanisms of ES. In line with OPT, they found that the N1 was augmented, while the N2 was attenuated. However, this study focussed entirely on ERP analyses (rather than multivariate decoding or gain/tuning models), which limits the interpretability of the results in terms of sharpening vs. dampening. Additionally, the above studies have all used repeated, identical stimuli, and thus did not account for the confounding effects of RS, which may mask the ES effects being examined. 
The present study controls for this confound and, thanks to the high temporal resolution of EEG, provides a unique insight into the dynamics of these phenomena.

In contrast to our within-trial results, when we examined how this process developed over multiple trials, we found that dampening occurred within the first ∼15 minutes of recording and was only later followed by sharpening. We hypothesise that this is a result of a dynamic relationship between hierarchical levels of cortical processing. Previous MEG work (37) has shown that early-latency responses typically reflect early cortical stages, while late-latency responses typically reflect higher cortical stages. Our results are in line with this in that, across trials, late-latency decoding differences arise within the initial trials, suggesting that higher-order regions quickly process statistical regularities during learning. Similarly, early-latency decoding differences occur later during the experiment, consistent with the theory that lower-order regions need more time to accumulate evidence. The distinction between higher-order dampening and lower-order sharpening can also be observed in rapid forms of learning, such as one-shot perceptual learning (20). In short, ‘invalid’ trials are more salient early in the experiment as priors are still being formed, whereas later in the experiment, once priors are established, expectation effects arise earlier within each trial. Earlier studies have pointed to a dissociation between early and late ventral stream effects, which may also fit into this framework (18,21). Other studies using statistical learning have indicated slower dynamics, in that some regions, such as the hippocampus, begin preferentially representing PEs early during learning, and predictions are preferentially represented later during learning (38). This is consistent with our finding that early dampening is followed by later sharpening over time. 
Another recent fMRI study found that in the presence of predictions, early stages of the processing hierarchy exhibit well separable and high-dimensional neural geometries resembling those at the top of the hierarchy, which is accompanied by a systematic shift in tuning properties from higher to lower areas, enriching lower areas with higher-order, invariant representations in place of their feedforward tuning properties (39). In the context of these studies, our results could be explained by higher-order regions quickly detecting stimulus statistics and showing evidence of predictive dampening to expected inputs, and then gradually delegating those predictions to lower-order regions, which start showing evidence of predictive sharpening.

In addition to the dynamic effects of expectation on sensory decoding of the trailing image, we also found that valid expectation increases decoding accuracy of the memory of the leading image. Within trials, this effect was found at similar latencies as the positive effect of expectation on sensory decoding; however, across trials, the effect on memory decoding started earlier and persisted throughout most of the experiment. A memory trace linking the trailing and leading image is a key component of associative learning (40). Furthermore, the finding that, within a trial, sensory and memory decoding occur at similar latencies and are subject to similar effects of expectation is broadly consistent with work showing simultaneous sensory, mnemonic, and predictive representations within the same neocortical regions (41,42). However, the differential dynamics of these effects across trials suggest that memory and prediction encoding are also partly dissociable, not only at the level of the hypothesised circuit mechanisms, likely involving the neocortex and the hippocampus (43), but also over time. This suggestion is consistent with a neuroimaging study showing that during associative learning, the hippocampus gradually switches from signalling PEs (i.e., mismatch between memory trace and actual stimulus) to signalling predictions (i.e., stimulus expectation independent of the actual stimulus) (38).

In summary, our study is the first to show dynamic effects of expectation on stimulus processing during learning, both within and across trials. Using EEG and decoding analyses, we have shown that across trials early dampening is followed by later sharpening, whereas within trials early sharpening is followed by later dampening. This provides direct evidence for the OPT, while also indicating that sharpening and dampening effects emerge at different learning stages.

Methods

Participants

Thirty-one healthy participants (25 female, 1 non-binary, 5 male) aged 18-35 (mean = 22.7) were recruited from the student population at Freie Universität Berlin. One participant was left-handed. Participants had no history of neurological illness, were not currently taking psychotropic medication, and had normal or corrected-to-normal vision. The study was approved by the Ethics Committee of the Department of Education and Psychology at Freie Universität Berlin. Participants were compensated with either €30 or research participation credits. Data from one participant were excluded due to a technical issue which resulted in an absence of triggers in the EEG data.

Experimental Design and Statistical Analysis

Stimuli and experimental paradigm

Main task

Approximately 3500 colour images across nine categories were generated using the artificial intelligence software Craiyon (Craiyon LLC, 2022). Images were visually inspected to ensure they appeared realistic, and physically implausible examples were discarded. Stimuli were presented at 2.55° of visual angle on a 38 cm LCD monitor with 1280 x 1024 resolution and a 60 Hz refresh rate, at a viewing distance of 62 cm. The experimental paradigm was adapted from Richter et al. (2018) with several modifications, as described below. Participants were exposed to pairs of visual stimuli, comprising artificially generated natural scenes drawn from different scene categories. In each trial, participants were presented with two images from different categories in quick succession, each presented for 100 ms with an 800 ms interstimulus interval and a 1300-2200 ms intertrial interval (Fig. 1A). A fixation cross was presented throughout the experiment. Each image belonged to one of nine scene categories, with five ‘Leading’ categories (‘barn’, ‘beach’, ‘library’, ‘restaurant’, ‘cave’) and four ‘Trailing’ categories (‘church’, ‘conference room’, ‘castle’, ‘forest’). Category pairs and the transitional probabilities between them were determined by the transitional probability matrix depicted in Fig. 1B. As noted in the Introduction, many prior studies have overlooked the confounding effects of RS in their experimental paradigms. To control for RS, categorised images were used to avoid the repetition of stimuli. As such, each individual image was only presented once. In the experimental condition, categories were paired in a 2:1 manner, where two different ‘Leading’ categories could result in one ‘Trailing’ category with 75% validity. For example, both ‘beach’ and ‘barn’ as ‘Leading’ categories would result in ‘church’ as a ‘Trailing’ category with 75% validity. In the control condition, one ‘Leading’ category led to two ‘Trailing’ categories with 50%/50% probability. 
Participants were not informed of the associations between categories, and were instead instructed to respond by button press when images appeared upside down, which occurred in ∼5% of trials. These served as catch trials to ensure participants maintained attention on the images. The occurrence of these catch trials was randomised, as was trial order. Participants did not undergo a training session, so that the neural signatures of early learning processes could be investigated using EEG. Each participant completed eight blocks of 216 trials each, totalling ∼90 min of testing.
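The transitional structure described above can be illustrated with a minimal sketch of how leading/trailing category pairs might be sampled. This is not the presentation code used in the study. The full matrix is only given in Fig. 1B, so the specific pairings below are assumptions made for illustration; only the ‘barn’/‘beach’ to ‘church’ pairing with 75% validity and the 50%/50% control condition are stated in the text. The choice of ‘library’ as the control category, the destination of the 25% invalid trials, and the uniform sampling of leading categories are all hypothetical, as are the names `TRANSITIONS` and `sample_trials`.

```python
import random

# Hypothetical transitional probabilities (leading -> trailing).
# Only 'barn'/'beach' -> 'church' at 0.75 is stated in the text;
# all other entries are assumptions for illustration.
TRANSITIONS = {
    "barn":       {"church": 0.75, "conference room": 0.25},
    "beach":      {"church": 0.75, "conference room": 0.25},
    "restaurant": {"conference room": 0.75, "church": 0.25},
    "cave":       {"conference room": 0.75, "church": 0.25},
    "library":    {"castle": 0.50, "forest": 0.50},  # assumed control (1:2) row
}


def sample_trials(n_trials, seed=0):
    """Draw (leading, trailing) category pairs from TRANSITIONS.

    Leading categories are drawn uniformly (an assumption); the trailing
    category is drawn according to the row of the transition matrix.
    """
    rng = random.Random(seed)
    leading_categories = list(TRANSITIONS)
    trials = []
    for _ in range(n_trials):
        lead = rng.choice(leading_categories)
        row = TRANSITIONS[lead]
        trail = rng.choices(list(row), weights=list(row.values()))[0]
        trials.append((lead, trail))
    return trials
```

Under these assumptions, a trial is ‘valid’ when the trailing category is the high-probability (0.75) successor of its leading category and ‘invalid’ otherwise; trials from the control row are neither.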

Categorisation task

After completion of the main task, the majority of participants (n=21) performed a categorisation task to ensure learning had taken place. In this task, participants were presented with the same images as before and instructed to indicate via button press whether the ‘Trailing’ image depicted an indoor or outdoor scene. This task aimed to assess implicit learning via reaction times (RTs), as the statistical regularities learned in the main task could be used to predict what category the ‘Trailing’ image would belong to. Participants were not informed of the intent behind this task or instructed to make use of what they had learned in the main task.

Data Analysis

Categorisation task analysis

Behavioural data from the categorisation task were analysed in terms of RTs. All RTs more than 2 SD above or below the mean across subjects were excluded as outliers. RTs for valid and invalid trials were log-transformed before being averaged separately per participant and subjected to a paired t-test.
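The steps of this RT analysis can be sketched as follows. This is a minimal illustration using only the Python standard library, not the analysis code used in the study; the function name `paired_t_on_log_rts` and the input format (a list of subject, validity, RT tuples) are assumptions. The sketch returns the t statistic and degrees of freedom only; a p-value would additionally require the t distribution (for example from a statistics package).

```python
import math
import statistics


def paired_t_on_log_rts(trials, n_sd=2.0):
    """Paired t-test on per-subject mean log RTs (valid vs. invalid).

    trials: list of (subject_id, is_valid, rt_ms) tuples (assumed format).
    Returns (t statistic, degrees of freedom, number of trials kept).
    """
    # 1. Exclude RTs more than n_sd SDs from the mean across all subjects.
    rts = [rt for _, _, rt in trials]
    grand_mean = statistics.mean(rts)
    grand_sd = statistics.stdev(rts)
    kept = [(s, v, rt) for s, v, rt in trials
            if abs(rt - grand_mean) <= n_sd * grand_sd]

    # 2. Log-transform and collect RTs per subject and condition.
    per_subject = {}
    for s, v, rt in kept:
        cells = per_subject.setdefault(s, {True: [], False: []})
        cells[bool(v)].append(math.log(rt))

    # 3. Per-subject difference of mean log RTs (valid minus invalid).
    diffs = [statistics.mean(c[True]) - statistics.mean(c[False])
             for c in per_subject.values()]

    # 4. Paired t statistic: mean difference over its standard error.
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1, len(kept)
```

A negative t statistic here indicates faster responses in valid than in invalid trials.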

EEG data preprocessing

EEG data were acquired at 2048 Hz from 64 electrodes using a BioSemi ActiveTwo system. Electrodes were arranged according to the international 10/20 system. Preprocessing was conducted using SPM12 (Wellcome Trust Centre for Neuroimaging, University College London; RRID: SCR_007037) for Matlab (The MathWorks; RRID:SCR_001622) as well as custom Matlab scripts. Continuous EEG data were high-pass filtered at 0.1 Hz and notch-filtered between 48 and 52 Hz using 5th-order zero-phase Butterworth filters. Data were then downsampled to 300 Hz. Blinks were detected using two horizontal and two vertical electro-oculogram (EOG) electrodes placed above and below the left eye, and to the left and right of the eyes. An epoch of -200 ms to 400 ms relative to artefact onset was applied to detected eye blinks, and spatiotemporal confounds were calculated using Singular Value Decomposition (SVD). The top two principal components were removed from the segments of data associated with each eye blink (Ille et al., 2002). Data were then further denoised using Denoising Source Separation (de Cheveigné and Simon, 2008), which maximises the reproducibility of stimulus-evoked responses across trials in order to increase the signal-to-noise ratio of ERPs. During this process, data were re-referenced offline to the average of all channels. Continuous data were epoched from 200 ms before leading stimulus onset to 2600 ms after stimulus onset. Each epoch was baseline-corrected to the mean of the pre-stimulus period (from -200 ms to 0 ms relative to stimulus onset). Epochs with an average root mean square (RMS) amplitude exceeding the median by two standard deviations (SDs) were excluded from analysis in order to remove contamination by transient artefacts. This resulted in the rejection of 11.2% of trials on average (± 1.03% standard error of the mean (SEM) across participants). Data were then averaged across trials and a low-pass filter of 48 Hz was applied.
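Two of the steps above, baseline correction and RMS-based epoch rejection, can be illustrated with a minimal NumPy sketch. This is not the SPM12/Matlab pipeline used in the study and omits filtering, blink correction and denoising. The array layout (epochs x channels x time points) and the function names `baseline_correct` and `rms_keep_mask` are assumptions, as is the choice to compute RMS over all channels and time points of an epoch.

```python
import numpy as np


def baseline_correct(epochs, times, t_start=-0.2, t_end=0.0):
    """Subtract the mean of the pre-stimulus window from each epoch and channel.

    epochs: array of shape (n_epochs, n_channels, n_times) (assumed layout).
    times:  array of shape (n_times,), in seconds relative to stimulus onset.
    """
    in_baseline = (times >= t_start) & (times < t_end)
    baseline = epochs[:, :, in_baseline].mean(axis=-1, keepdims=True)
    return epochs - baseline


def rms_keep_mask(epochs, n_sd=2.0):
    """Boolean mask of epochs to keep.

    An epoch is rejected when its RMS amplitude (over channels and time)
    exceeds the median RMS across epochs by more than n_sd standard deviations.
    """
    rms = np.sqrt((epochs ** 2).mean(axis=(1, 2)))
    threshold = np.median(rms) + n_sd * rms.std()
    return rms <= threshold
```

The mask can then be used to index the epoch array (e.g. `clean = epochs[rms_keep_mask(epochs)]`) and to compute the proportion of rejected trials per participant.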

Decoding analyses

Single-trial data underwent decoding based on linear support vector machines (SVM) implemented using custom MATLAB code. As part of feature selection, Principal Component Analysis (PCA) was conducted to reduce the dimensionality of sensor-level EEG data, and components were subsequently selected based on signal-to-noise ratio (SNR) with a cutoff threshold of 8 dB. Four separate decoding analyses were conducted: decoding the visual category of the leading and of the trailing images, decoding the predictable trailing images based on the leading images, and memory decoding for trailing images. Valid and invalid trials were analysed separately at each step. In each analysis, the decoder was trained and tested using leave-8-out cross-validation, such that in each test set, single trials corresponding to all 8 unique combinations of 4 leading and 2 trailing images were included (see Fig. 1B). Visual category decoding was used to differentiate neural responses to leading vs. trailing categories, while memory decoding aimed to assess memory of the leading category based on the trailing category. Prediction decoding was used to assess prediction of the trailing category based on the leading category. As a control analysis, we also decoded the arbitrary pairings of L1/L4 vs. L2/L5 and L1/L5 vs. L2/L4. This serves to further demonstrate that significant decoding results occurred as a result of statistical learning, rather than any confound or alternative basis. The initial analysis aimed to quantify decoding as a function of peristimulus time. To this end, we calculated the decoding accuracy per time point, using all trials. The resulting decoding time series were converted to one-dimensional images, entered into a GLM, and subjected to statistical inference amounting to a paired t-test between valid and invalid categories. Given that EEG time-series are autocorrelated over time, statistical tests were corrected for multiple comparisons over time using cluster-based FWE correction. 
A follow-up analysis aimed to quantify changes in decoding over learning. To this end, based on the time windows in which we found significant decoding differences between valid and invalid trials (see Results), we averaged the decoding accuracy estimates within each time window, separately for four bins of trials. Given the apparent consistency of bins 2-4, we focused our analyses on bins 1-2 in order to directly assess the effects of learning. Each bin encompassed approximately 15 minutes of testing or 432 trials on average.
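The logic of time-resolved decoding and of averaging accuracy within a time window per trial bin can be illustrated with a minimal NumPy sketch. It is not the custom MATLAB implementation used in the study and simplifies it in several labelled ways: the classifier is a small hand-written binary linear SVM trained by sub-gradient descent on the hinge loss, cross-validation is k-fold rather than leave-8-out, and PCA/SNR-based feature selection is omitted. The function names, hyperparameters (`lam`, `lr`, `n_iter`) and the array layout (trials x channels x time points) are all assumptions.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, n_iter=200):
    """Binary linear SVM (labels -1/+1) fitted by sub-gradient descent
    on the L2-regularised hinge loss. Returns weights and bias."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        viol = y * (X @ w + b) < 1            # trials violating the margin
        grad_w = lam * w - X[viol].T @ y[viol] / n
        grad_b = -y[viol].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def decode_timecourse(epochs, labels, n_folds=5, seed=0):
    """Cross-validated decoding accuracy at each time point.

    epochs: (n_trials, n_channels, n_times); labels: array of -1/+1.
    """
    n_trials, _, n_times = epochs.shape
    order = np.random.default_rng(seed).permutation(n_trials)
    folds = np.array_split(order, n_folds)
    accuracy = np.zeros(n_times)
    for t in range(n_times):
        X = epochs[:, :, t]                   # channel pattern at time t
        n_correct = 0
        for test_idx in folds:
            train_idx = np.setdiff1d(order, test_idx)
            w, b = train_linear_svm(X[train_idx], labels[train_idx])
            pred = np.where(X[test_idx] @ w + b >= 0, 1, -1)
            n_correct += np.sum(pred == labels[test_idx])
        accuracy[t] = n_correct / n_trials
    return accuracy


def window_accuracy_by_bin(epochs, labels, times, window, n_bins=4):
    """Mean decoding accuracy inside `window` (start, end in seconds),
    computed separately for consecutive bins of trials."""
    in_window = (times >= window[0]) & (times <= window[1])
    result = []
    for idx in np.array_split(np.arange(len(labels)), n_bins):
        acc = decode_timecourse(epochs[idx][:, :, in_window], labels[idx])
        result.append(acc.mean())
    return np.array(result)
```

In the analysis described above, such a procedure would be run separately for valid and invalid trials, with the resulting accuracy time series or window averages then compared between conditions.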