Introduction

Perceptual learning (PL) and serial dependence effects (SDEs) are two fundamental processes that shape how we perceive and interpret sensory information. Although both rely on perceptual memory traces, they operate over distinct timescales and have been traditionally studied separately. PL leads to long-lasting improvements in sensory discrimination following repeated training (Sagi, 2011). However, in many cases, improvements remain limited to the trained stimulus or location, while in others, learning generalizes to new contexts. What determines whether learning stays local or generalizes is still not fully understood (Cheng et al., 2025; Lu & Dosher, 2022).

SDEs reflect short-term biases that pull current perceptual judgments toward what was recently seen or chosen (Abrahamyan et al., 2016; Braun et al., 2018; Fischer & Whitney, 2014; Fornaciai & Park, 2018; Liberman et al., 2014; Suárez-Pinilla et al., 2018; Urai et al., 2019). These biases are often interpreted as reflecting a Bayesian integration process that combines prior information with current input to enhance perceptual stability and efficiency, particularly under uncertainty, based on the assumption that the natural environment is typically stable (Cicchini et al., 2018; Fischer & Whitney, 2014; Manassi & Whitney, 2024). However, in tasks where stimuli vary randomly across trials, such biases can actually impair performance, yet they persist. This suggests they may serve an additional role beyond what was previously proposed. One possibility is that they reflect ongoing trial-by-trial updates to internal decision templates that persist over time and may bridge short-term memory and long-term learning (Fritsche et al., 2017; Pascucci et al., 2019).

In this work, we examine how serial dependence interacts with perceptual learning using the texture discrimination task (TDT; Karni & Sagi, 1991). The role of serial dependence in the TDT is likely complex, potentially influencing both immediate perceptual judgments and long-term learning dynamics. In the short term, biases toward the orientation of prior targets, randomly varying in typical experimental setups, and thus irrelevant to the current trial, may impair performance by introducing decision noise. At the same time, temporal integration of stimulus representations across trials may help reduce uncertainty and extract structure from noisy input. Normative analyses predict that the benefit of engaging costlier, memory-dependent integration should peak at intermediate uncertainty, while being limited in highly certain conditions (Tavoni et al., 2022). Over extended training, serial dependence may evolve within and across daily sessions as learning progresses. Specifically, we ask whether serial dependence can serve as a marker of short-term temporal integration that contributes to long-term generalization of learning. This idea builds on the hypothesis that broader temporal smoothing may reduce overfitting to local stimulus features and allow learning to generalize more broadly.

To test this, we reanalyzed data from a large-scale perceptual learning study in which observers practiced the TDT under three training conditions designed to modulate the degree of learning generalization (Harris et al., 2012). The study showed that learning generalized to a new, untrained location when targets appeared randomly across two locations or were intermixed with target-less (‘dummy’) trials. In contrast, learning became location-specific when targets consistently appeared in a single location in all trials, likely due to increased sensory adaptation (Karni & Sagi, 1991). These findings were attributed to differences in adaptation state: unadapted networks supported spatial transfer, whereas adaptation induced localized plasticity that constrained generalization. However, the memory mechanisms underlying this flexibility remain unclear. By examining serial dependence within this paradigm, we aim to gain further insight into these mechanisms and better understand how recent visual experiences influence both immediate perceptual reports and long-term learning outcomes. Specifically, we ask whether training conditions that promote generalization are associated with stronger or longer-lasting SDEs, which would suggest a new role for serial dependence in supporting broader learning transfer.

Methods

We reanalyzed data from 50 observers who participated in the texture discrimination task (TDT) as described by Harris, Gliksberg, and Sagi (2012). In this dual-task experiment, observers identified the orientation (vertical or horizontal) of a target composed of three peripheral diagonal lines embedded within a uniform background of horizontal lines while simultaneously performing a forced-choice letter discrimination task (T vs L at the center of the stimulus) to maintain fixation (Figure 1A). The experiment consisted of four daily sessions with the target presented at one location (or two fixed locations in the 2loc condition), followed by four additional daily sessions at a second location (or a second pair of fixed locations in the 2loc condition). Performance on the TDT was quantified by the TDT threshold, defined as the stimulus onset asynchrony between target and mask (SOA; see Figure 1) that yields 78% correct discrimination (SOAthreshold).

Texture Discrimination Task (TDT) and Serial Dependence Effect (SDE)

(A) TDT Trial Sequence: Observers identified the orientation (vertical or horizontal) of a target composed of three diagonal lines embedded within a background of horizontal lines while simultaneously performing a forced-choice letter discrimination task (T vs L) to maintain fixation. The 10 ms target frame was followed by a 100 ms patterned mask, with the stimulus onset asynchrony (SOA) between target and mask (10 to 300 ms) randomized across trials. (B) SDE via history sequence: Correct identifications (‘Hits’) increased by approximately 20% when the current target orientation matched the preceding three orientations (e.g., ‘1’ preceded by ‘111’) and decreased by about 20% with mismatching orientations (e.g., ‘1’ preceded by ‘222’), relative to average performance (dotted light blue line). ‘1’ and ‘2’ represent the two possible orientations, either vertical and horizontal, or vice versa. The fixation task (red) showed much smaller biases, likely due to its higher overall performance. (C) SDE via Linear Mixed Effects (LME) Weights (W-bias, %): Influence of 1–10 back trials on current report. Summing the W-bias values (%) from the 1st, 2nd, and 3rd prior trials corresponds to the ±20% bias for 1–3 back trials shown in panel B. Panels B and C include data from the 1loc condition (N=14) pooled across all training days, including only trials with low-visibility current targets (SOA < SOAthreshold + 20 ms). Blue represents the texture discrimination task, and red indicates the letter discrimination (fixation control) task.

Participants were assigned to one of three experimental conditions:

  • 1loc condition: The target consistently appeared at the same fixed location across all trials.

  • 2loc condition: The target appeared at one of two diagonally opposite locations with equal eccentricity, with the location varying randomly from trial to trial.

  • Dummy condition: The target appeared at a fixed location, but genuine target trials were randomly interleaved with ‘dummy’ trials, where no target was present (replaced by background elements).

For detailed specifications of the groups assigned to each condition, see Supplementary Table S1.

We quantified serial dependence effects (SDE) to examine how prior visual experiences influence current perceptual reports (Figure 1B), and whether these dependencies affect learning specificity across experimental conditions. To estimate SDE, we fit a linear mixed-effects (LME) model (Figure 1C) that evaluated the influence of prior target stimuli (up to 10 trials back) on current reports. For each lag n, the model estimates a coefficient Wn; we refer to this coefficient simply as W-bias. These coefficients represent the magnitude of the serial-dependence effect for each n-back stimulus. We modeled the report probability as

$$P(T_0 \mid T_0) = P_0 + \sum_{n=1}^{10} W_n \, s_{T_n}$$

where:

  • P(T0 | T0): probability of reporting target T0 ∈ {+1, −1} when the target is T0.

  • P0: history-independent baseline report probability.

  • sTn: +1 if the n-back target’s orientation matches the current target (e.g., both vertical or both horizontal), −1 otherwise.

  • Wn (W-bias): change in report probability (bias) associated with the n-back target.

Fixed effects accounted for the orientations of these stimuli, while a random intercept captured individual differences across observers. Interaction terms were excluded, as statistically significant interactions were small and inconsistent.
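The structure of this model can be illustrated with a minimal sketch. It builds the ±1 match regressors for lags 1 to 10 and estimates the lag weights by ordinary least squares on simulated trials. This is a simplification under stated assumptions, not the analysis code of the study: the random intercept across observers is omitted, the data are simulated, and the names history_regressors and fit_w_bias are hypothetical.

```python
import numpy as np

N_BACK = 10  # lags 1..10, as in the text


def history_regressors(targets, n_back=N_BACK):
    """Column n-1 is +1 where the n-back target matches the current one, else -1.

    The first n_back trials are dropped because their history is incomplete.
    """
    targets = np.asarray(targets)
    n_trials = targets.size
    current = targets[n_back:]
    X = np.empty((n_trials - n_back, n_back))
    for n in range(1, n_back + 1):
        previous = targets[n_back - n:n_trials - n]
        X[:, n - 1] = np.where(current == previous, 1.0, -1.0)
    return X


def fit_w_bias(targets, hits, n_back=N_BACK):
    """Least-squares estimate of the baseline P0 and the lag weights W_1..W_n.

    Assumption: a single observer, so no random intercept is modeled.
    """
    X = history_regressors(targets, n_back)
    y = np.asarray(hits, dtype=float)[n_back:]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1:]


# Simulated observer (illustrative values only): baseline 0.70,
# a 1-back weight of 0.10 and a 2-back weight of 0.05.
rng = np.random.default_rng(0)
targets = rng.choice([-1, 1], size=20000)
X_sim = history_regressors(targets)
p_hit = 0.70 + 0.10 * X_sim[:, 0] + 0.05 * X_sim[:, 1]
hits = np.concatenate([np.zeros(N_BACK), rng.random(len(p_hit)) < p_hit])

p0, w = fit_w_bias(targets, hits)
```

With random target sequences the lag regressors are nearly uncorrelated, so each recovered weight can be read as the change in hit probability attributable to that lag alone.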

To systematically quantify serial dependence across different temporal scales, we defined three summary measures:

  • SDE-all: The cumulative bias from the 10 preceding trials (1–10 back), capturing the total influence of recent history on current perception.

  • SDE-recent: The cumulative bias from trials 1–3 back, reflecting the effect of very recent stimuli.

  • SDE-distant: The cumulative bias from trials 4–6 back, representing the influence of more distant past trials.

All measures are expressed as percentage changes in report probability.
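As a small worked example of these definitions, the sketch below sums a vector of lag weights into the three summary measures. The weights are made-up illustrative numbers, not values from the data, and the function name sde_summaries is hypothetical.

```python
def sde_summaries(w_bias):
    """Sum lag weights W_1..W_10 (in percent) into the three summary measures."""
    if len(w_bias) < 10:
        raise ValueError("expected weights for lags 1 to 10")
    return {
        "SDE-all": sum(w_bias[0:10]),     # lags 1-10
        "SDE-recent": sum(w_bias[0:3]),   # lags 1-3
        "SDE-distant": sum(w_bias[3:6]),  # lags 4-6
    }


# Hypothetical weights for illustration only.
example_w = [12, 6, 4, 3, 2, 1, 1, 0, 0, 0]
summary = sde_summaries(example_w)
```

Note that SDE-all also includes lags 7 to 10, so it can exceed the sum of SDE-recent and SDE-distant.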

For most analyses and figures, we filtered trials to include those that produced stronger SDEs, specifically, trials in which current targets were barely visible (SOA < SOAthreshold + 20 ms), prior targets were highly visible (SOA > SOAthreshold), and both appeared at the same location across trials (Figure 2).
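The three filtering criteria can be sketched as a simple trial-selection routine. This is only an illustration: the Trial record and the function names are hypothetical, and applying the prior-trial criteria separately for each lag is an assumption, since the text does not specify how the filter was combined across lags.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    soa_ms: float   # target-to-mask SOA on this trial
    location: int   # index of the target location


def current_is_low_visibility(trial, threshold_ms, margin_ms=20.0):
    """Current-trial criterion: SOA < SOAthreshold + 20 ms."""
    return trial.soa_ms < threshold_ms + margin_ms


def prior_is_usable(prior, current, threshold_ms):
    """Prior-trial criteria: SOA > SOAthreshold and same location as current."""
    return prior.soa_ms > threshold_ms and prior.location == current.location


def usable_trials(trials, threshold_ms, n_back=1):
    """Indices t for which trial t and trial t - n_back both pass the filters."""
    return [
        t
        for t in range(n_back, len(trials))
        if current_is_low_visibility(trials[t], threshold_ms)
        and prior_is_usable(trials[t - n_back], trials[t], threshold_ms)
    ]


# Hypothetical session fragment with SOAthreshold = 100 ms.
session = [
    Trial(60, 0), Trial(200, 0), Trial(110, 0), Trial(250, 1),
    Trial(90, 0), Trial(150, 0), Trial(130, 0),
]
one_back = usable_trials(session, threshold_ms=100, n_back=1)
two_back = usable_trials(session, threshold_ms=100, n_back=2)
```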

Serial dependence effects across trial history

(A) Target visibility: SDEs (W-bias) were strongest when current targets had low visibility (high uncertainty) and prior targets had high visibility. Blue lines show target (T) histories and red lines show key (K) histories, with solid vs. dotted lines indicating prior high- vs. low-visibility trials. Visibility had a much weaker effect on key histories compared to target histories, with key-driven SDEs remaining high even when prior targets were invisible. The yellow line shows target history with high visibility in both history and current trials, demonstrating that SDEs are strongly reduced when current targets are highly visible (high certainty) (N = 50). (B) Location specificity: SDEs were larger when history originated from the same location as the current target compared to a diagonally opposite location (2loc condition, N = 14). Error bars represent the standard error of the mean. The horizontal dashed line indicates zero bias.

The exception was the SDE derived from the history sequence analysis (Figure 1B), where prior-target visibility could not be filtered. Therefore, when comparing this with the LME-based estimate (Figure 1C), only the current-target visibility filter was applied. For individual-level analyses assessing the relationship between SDE strength and TDT learning and generalization, we accounted for individual differences in SOAthreshold by equalizing the number of highly visible prior trials and restricting trials to SOAs between SOAthreshold and SOAthreshold + 140 ms.

Because filtering on highly visible prior targets means observers usually reported them correctly, the key-history sequence closely tracked the target-history sequence, producing highly similar SDEs (Figure 2A, high-vis history). Accordingly, we present only Target-history results and omit Key-history plots for brevity.

Ethics

This manuscript presents a secondary analysis of previously published human behavioural data from Harris, Gliksberg & Sagi (2012). Original procedures received ethics approval, were conducted in accordance with the Declaration of Helsinki, and all participants provided written informed consent.

Results

Serial Dependence Effects (SDEs)

Observers’ reports showed a significant 15% bias toward the orientation of the immediately preceding target (1-back; W1), indicating serial dependence. When the influence of the full range of past trials was considered (SDE-all), the bias increased to about 40% (Figure 2A). These values were measured under filtering conditions that enhanced the expression of serial dependence (detailed in the next section, Conditions enhancing SDE: influence of target visibility and location). Importantly, these biases were not attributable to motor responses, as they persisted in a dual-task setup and were modulated by the spatial location and visibility of target stimuli, independent of motor actions (Figure 2).

Conditions enhancing SDE: influence of target visibility and location

Biases were most pronounced under conditions where the current targets were barely visible (SOA < SOAthreshold + 20 ms) and prior targets were clearly visible (SOA > SOAthreshold) with SDE-all reaching 40 ± 3% (Figure 2A; Supplementary Figure S2A). Biases were significantly reduced when the current targets were highly visible (SDE-all = 5 ± 1%; reduction: 35 ± 3%; t(49) = 11.9, p < 0.0001, Cohen’s d = 1.7) or when prior targets had low visibility (SDE-all = 5 ± 1%; reduction: 35 ± 3%; t(49) = 13.5, p < 0.0001, Cohen’s d = 1.9). In the 2loc condition, biases were predominantly location-selective, with significantly stronger effects when trial history originated from the same location (SDE-all = 41 ± 3%) compared to a diagonally opposite location (SDE-all = 15 ± 3%; reduction: 26 ± 3%; t(13) = 7.8, p < 0.0001, Cohen’s d = 2.1; Figure 2B; Supplementary Figure S2B).

Decay of SDE over trials and with longer RT

Biases gradually decayed across successive trials but remained substantial, extending far into trial history (Figure 2 and Figure 6A). In the 1loc condition, biases were significant up to 4 trials back (p < 0.001 for N ≤ 4). In the 2loc condition, biases persisted longer, remaining significant up to 8 trials back (p < 0.001 for N ≤ 7; p < 0.05 for N = 8). When the history originated from the opposite location in the 2loc condition (contra), biases were significant up to 5 trials back (p < 0.001 for N ≤ 2; p < 0.01 for N = 3, 4; p < 0.05 for N = 5). The dummy condition showed the most prolonged biases, with significant effects extending up to 9 trials back (p < 0.001 for N ≤ 6; p ≤ 0.01 for N = 7, 9), although the effect at 8-back was not significant (p = 0.77) and the effect at 10-back was borderline (p = 0.05). Our analysis focused on decay across trials rather than elapsed time, because doubling the inter-trial interval had no impact on the short-history effects and only attenuated the long-history effects (as observed when comparing the two groups in the 1loc condition; see Supplementary Table S1 for group details).

Motivated by a recently reported mechanism-dependent relationship between response bias and reaction time (RT) (Dekel & Sagi, 2020), we examined the dependence of SDE on RT. We compared SDEs in trials with the fastest and slowest RTs. For each observer, trials were divided into quartiles based on RTs calculated separately for each training day, to account for overall reductions in RT with practice. SDEs were then computed using all trials from the fastest quartile (lowest 25%) and the slowest quartile (highest 25%) (Figure 3A). Recent SDEs were significantly stronger for fast RT (SDE-recent = 29 ± 2%) compared to slow RT (SDE-recent = 22 ± 2%), yielding a reduction of 7 ± 2% (t(49) = 3, p < 0.01, Cohen’s d = 0.4; Figure 3B, left panel; Supplementary Figure S2C), whereas distant SDEs did not significantly differ between RT conditions (SDE-distant = 9 ± 1% for both fast and slow RT; t(49) = 0.5, p = 0.64; Figure 3B, right panel; Supplementary Figure S2C). This difference between recent and distant SDEs suggests that they may arise from distinct underlying mechanisms. Observers from all conditions were combined in this analysis to increase statistical power, as separate analyses revealed a similar qualitative pattern across conditions: recent SDEs were stronger for fast compared to slow RTs, reaching significance in the 1loc and 2loc conditions but not in the dummy condition, likely due to smaller sample size. In contrast, distant SDEs showed no significant RT-related change in any condition.
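The per-day quartile split described above might be implemented along the following lines. This is a sketch under assumptions: trials are assigned to quartiles by RT rank (the handling of ties and of trial counts not divisible by four is not stated in the text), and the function names are hypothetical.

```python
def split_fast_slow(rts):
    """Return (fast, slow): trial indices of the fastest and slowest 25% by RT rank."""
    order = sorted(range(len(rts)), key=lambda i: rts[i])
    k = len(rts) // 4  # quartile size, rounded down (assumption)
    fast = sorted(order[:k])
    slow = sorted(order[len(order) - k:])
    return fast, slow


def split_by_day(rts_by_day):
    """Apply the split separately to each training day of one observer."""
    return {day: split_fast_slow(rts) for day, rts in rts_by_day.items()}


# Hypothetical RTs (ms) for two days of one observer.
observer_rts = {
    1: [500, 420, 610, 380, 450, 700, 390, 520],
    2: [300, 350, 280, 400],
}
quartiles = split_by_day(observer_rts)
```

Splitting within each day keeps the fast and slow sets balanced across days, so the overall speed-up with practice does not push early sessions into the slow set and late sessions into the fast set.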

Effect of reaction time on SDEs.

(A) SDEs (W-bias) as a function of N-back trial, calculated separately for the fastest (first 25%) and slowest (last 25%) RT quartiles, defined per day within each observer. SDEs were then computed on the corresponding fast and slow trials and averaged across observers. Biases were stronger for fast RTs, particularly at recent lags. (B) Paired comparisons of recent and distant SDEs for fast vs. slow RTs. Recent SDEs were significantly higher for fast RTs (left panel), whereas distant SDEs did not differ between RT conditions (right panel). Gray bars indicate group means; dots and connecting lines represent individual observers (N = 50).

Dynamics of SDE across days and locations

Despite randomized target orientations across trials, rendering past orientations irrelevant to current judgments, serial dependence biases remained strong and highly significant across all training days (p < 0.0001 for each day; Figure 4A). A two-factor repeated-measures ANOVA with Location (first vs. second retinotopic site) and Day-in-block (1–4) as within-subject factors revealed a significant main effect of Day-in-block, F(3, 147) = 4.7, p < 0.01, indicating a modest decrease in SDE magnitude across training days. The main effect of Location was not significant (F(1, 49) = 0.08, p = 0.78), with comparable SDE-all values at the first trained location (Days 1–4: 40 ± 3%) and the second trained location (Days 5–8: 41 ± 3%). There was no Location × Day-in-block interaction (F(3, 147) = 0.47, p = 0.71), indicating similar temporal dynamics across locations. Post-hoc comparisons (Bonferroni-corrected) revealed a modest but statistically reliable decrease in SDE from Day 7 to Day 8 (the last two sessions at the second location; p < 0.01), whereas no other day-to-day differences reached significance. No significant correlation was found between biases and SOA thresholds across observers (r = -0.13, p = 0.37, average across Days 1–8), nor between biases and improvements in performance at the first location (r = -0.09, p = 0.54, average across Days 1–4), suggesting that the magnitude of serial dependence does not predict the overall amount of perceptual learning (Supplementary Figure S1).

Dynamics of SDE across days and locations

(A) SDEs and TDT thresholds across days: Serial dependence (SDE-all, red) remained strong and consistent throughout the eight training days, despite large improvements in TDT thresholds (blue). A small reduction in SDE was observed across days and reached significance only between Days 7–8. The target location was changed after Day 4. (B) SDEs across locations: Correlation of SDE-all between the first (Days 1–4) and second (Days 5–8) trained locations across observers (N = 50). The strong correlation indicates that the magnitude of serial dependence is a stable observer-specific trait, consistent across retinotopic locations.

Across the eight days of training, SDEs thus remained robust even as texture-discrimination thresholds improved markedly from Day 1 (126 ± 6 ms) to Day 8 (80 ± 2 ms; F(7, 343) = 41.77, p < 0.0001; Figure 4A). Biases were highly correlated across locations (r = 0.51, p < 0.001; Figure 4B), suggesting that the magnitude of serial dependence reflects a stable observer-specific trait consistent across retinotopic locations.

Within-session SDE dynamics

Within-session analyses (averaged across all training days and conditions) showed that serial dependence biases remained significant throughout sessions but decreased by approximately 17% from the beginning to the end. To track such bias changes, while having a sufficient number of trials for bias analysis, each session was divided into three parts. Biases in the first third of trials (SDE-all = 43 ± 3%) were significantly higher than in the final third (SDE-all = 36 ± 4%), yielding an 8 ± 3% reduction (t(49) = 2.4, p < 0.05, Cohen’s d = 0.3; Figure 5A). This decrease may reflect sensory adaptation developing over the course of the session, diminishing serial dependence. The reduction was selective to distant lag history: SDE-distant decreased significantly from 11 ± 1% to 7 ± 1% (t(49) = 2.9, p < 0.01, Cohen’s d = 0.4; Figure 5B, right; Supplementary Figure S2D), whereas SDE-recent remained stable (28 ± 2% in both segments; t(49) = 0.31, p = 0.76; Figure 5B, left). Notably, the 1-back bias (W1) showed a slight increase from start to end (2 ± 1%; t(49) = 2.2, p < 0.05, Cohen’s d = 0.3; Figure 5A), indicating that within-session adaptation primarily attenuates the influence of distant SDEs while leaving immediate history effects intact, or even slightly enhanced.

Within-session dynamics of SDE

(A) SDEs (W-bias) as a function of N-back trial, computed separately for the first (blue) and last (red) third of each session. Biases showed an overall decrease across the session, while the 1-back bias (W1) increased slightly. (B) Recent and distant SDE components. Recent SDEs remained stable across the session (left panel), whereas distant SDEs showed a significant reduction (right panel). This pattern is consistent with sensory adaptation developing over the course of the session, selectively attenuating serial dependence from more temporally distant trials (N = 50).

Observers from all conditions were combined in this analysis to increase statistical power, as separate condition-level analyses revealed the same trend (no change in SDE-recent and >30% reduction in SDE-distant), but these did not reach significance, likely due to smaller sample sizes.

SDE differences between conditions and learning generalization

A comparison across the three experimental conditions revealed similar magnitudes of SDE-recent (dummy: 28 ± 3%; 1loc: 26 ± 2%; 2loc: 25 ± 2%; F(2,47) = 0.26, p = 0.77; Figure 6B left). In contrast, SDE-distant differed significantly between conditions (dummy: 12 ± 2%; 2loc: 10 ± 1%; 1loc: 2 ± 2%; F(2,47) = 12.47, p < 0.001; Figure 6B right). Post-hoc Tukey tests confirmed that SDE-distant was significantly lower in the 1loc condition compared to both the dummy (p < 0.001) and 2loc (p < 0.01) conditions, likely due to stronger sensory adaptation caused by repeated stimulation at a fixed location in the 1loc setup. In the letter discrimination (fixation control) task, which involved identical foveal stimuli across all conditions, no significant differences were observed for either SDE-recent (dummy: 6 ± 2%; 1loc: 8 ± 2%; 2loc: 8 ± 2%; F(2,47) = 0.40, p = 0.674) or SDE-distant (dummy: 2 ± 1%; 1loc: 2 ± 1%; 2loc: 3 ± 1%; F(2,47) = 0.59, p = 0.56). The smaller biases in this task likely resulted from higher overall performance levels and ceiling effects.

SDEs across experimental conditions and trial history

(A) Mean SDEs (W-bias) as a function of trial history (N-back) for the three experimental conditions: dummy (blue, N = 22), 1loc (red, N = 14), and 2loc (yellow, N = 14). In the 1loc condition, biases decayed more rapidly, while in the 2loc and dummy conditions they persisted further back in trial history. (B) SDEs across individual observers (N = 50), shown separately for recent lags (1–3 back; left panel) and distant lags (4–6 back; right panel). Each dot represents one observer; gray bars indicate group means. Recent SDEs were consistent across conditions, whereas distant SDEs were significantly stronger in the 2loc and dummy conditions compared to the 1loc condition (***p ≤ 0.001, **p ≤ 0.01).

We next examined whether the greater learning generalization observed in the 2loc and dummy conditions (Harris et al., 2012) is linked to the stronger distant serial dependence found in those conditions (Figure 7A).

Relationship between SDE and learning transfer

(A) Group-level comparison of learning transfer (blue), SDE-distant (solid red), and SDE-recent (dotted red) across the three experimental conditions (dummy, 1loc, 2loc). Transfer and SDE values are plotted on separate axes, with SDE measures normalized by subtracting the mean of the 1loc condition. Conditions showing greater learning generalization (dummy, 2loc) also exhibited stronger SDE-distant effects. In contrast, SDE-recent was relatively constant across conditions, suggesting that generalization was primarily linked to distant serial dependence. (B) Across observers, learning transfer correlated positively with SDE-distant (r = 0.37, p < 0.01, N = 50), indicating that stronger distant serial dependence predicted greater generalization. (C) SDE-distant values were normalized to the 1-back effect to estimate the temporal decay constant of SDE, reflecting how long biases persisted across trials. Observers with longer decay constants showed greater learning transfer (r = 0.50, p < 0.001, N = 48; two outliers >10 SD excluded), indicating that extended temporal integration supports generalization. (D) No significant correlation was found between SDE-recent and learning transfer (r = -0.22, p = 0.12, N = 50), suggesting that recent serial dependence does not predict generalization. Learning transfer was defined as the change in TDT threshold between Day 4 (final day at the first location) and Day 5 (initial day at the second location), with negative values indicating performance loss.

Supporting this, we found that learning transfer correlated positively with SDE-distant across observers (r = 0.37, p < 0.01; Figure 7B) and with SDE-distant values normalized to the 1-back effect to estimate the temporal decay constant of SDE (r = 0.50, p < 0.001; Figure 7C). In contrast, SDE-recent was not positively correlated with learning transfer (r = -0.22, p = 0.12; Figure 7D); the correlation became significantly negative when one outlier (>3 SD) was excluded (r = -0.32, p < 0.05, N = 49), suggesting that recent-trial biases, being more closely tied to the current stimulus, may have a weaker and less consistent relationship with learning generalization (Figure 7A).
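To illustrate how a ratio of SDE-distant to the 1-back effect can be mapped onto a decay constant, the sketch below assumes a single-exponential decay of the lag weights, W_n = W_1 · exp(−(n − 1)/τ), and solves for τ (in trials) by bisection. The exponential form is an assumption made here for illustration only; the text does not state which functional form was used, and the function names are hypothetical.

```python
import math


def distant_ratio(tau):
    """Model ratio (W_4 + W_5 + W_6) / W_1 under W_n = W_1 * exp(-(n - 1) / tau)."""
    return sum(math.exp(-(n - 1) / tau) for n in (4, 5, 6))


def decay_constant(sde_distant, w1, lo=0.05, hi=200.0, tol=1e-9):
    """Solve distant_ratio(tau) = sde_distant / w1 for tau by bisection.

    distant_ratio increases monotonically with tau, so a bracketing
    interval [lo, hi] is sufficient to locate the solution.
    """
    ratio = sde_distant / w1
    if not distant_ratio(lo) < ratio < distant_ratio(hi):
        raise ValueError("ratio cannot be represented by the assumed model")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if distant_ratio(mid) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round-trip check on synthetic numbers: generate a ratio from tau = 3 trials
# with a hypothetical 1-back weight of 15%, then recover tau.
tau_true = 3.0
w1_example = 15.0
sde_distant_example = w1_example * distant_ratio(tau_true)
tau_est = decay_constant(sde_distant_example, w1_example)
```

Under this assumed form the ratio is bounded between 0 and 3, so observers whose distant weights exceed three times the 1-back weight, or are negative, fall outside what the model can represent.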

Discussion

Our investigation of serial dependence in the texture discrimination task (TDT) reveals robust perceptual biases toward the orientation of previously presented targets, extending up to 10 trials back under certain conditions. Notably, these biases persisted despite randomized target orientations and significant improvements in performance across training days, suggesting that serial dependence is a fundamental feature of visual processing, largely unaffected by task demands or learning. Considering the universality of learning mechanisms in the brain (Censor et al., 2012), we suggest that the link between serial dependence and learning established here is not limited to visual perception but is rather a general property of human behavior.

While previous studies typically report a 3-back limit for serial dependence (Fischer & Whitney, 2014; John-Saaltink et al., 2016; Lau & Maus, 2019; Manassi et al., 2019, 2023), our results reveal exceptionally long memory traces, extending up to 8-back in the 2loc condition and up to 9-back in the dummy condition. Consistent with earlier work, we found that the reliability of both current and prior target stimuli affected bias magnitude. Biases increased when current targets were less visible (Ceylan et al., 2021; Cicchini et al., 2017, 2018; Manassi et al., 2018) and when prior targets were more visible (Pascucci et al., 2019; Van Bergen & Jehee, 2019). This pattern aligns with Bayesian models of perception (Kersten et al., 2004; Knill & Pouget, 2004) which propose that perceptual estimates under uncertainty integrate current sensory evidence with prior information. Additionally, we observed spatial selectivity in the 2loc condition, with stronger biases when prior and current targets appeared at the same location, in agreement with previous reports (Collins, 2019; Fischer & Whitney, 2014; Fornaciai & Park, 2018; John-Saaltink et al., 2016; Manassi et al., 2019).

Importantly, our findings reveal a functional link between serial dependence and perceptual learning. Serial dependence extends beyond transient perceptual biases and influences long-term learning, specifically, the extent to which learning generalizes to new, untrained locations. Observers trained under conditions that promote generalization (‘2loc’, ‘dummy’; Harris et al., 2012) exhibited significantly stronger and more temporally extended serial dependence from distant trial histories (4–6 back). Conversely, consistent stimulus repetition in the ‘1loc’ condition, which promotes location-specific learning, was associated with a shorter temporal span of serial dependence, likely due to the stronger sensory adaptation (Censor & Sagi, 2009; Harris et al., 2012; Ofen et al., 2007). Across individuals in all conditions, greater distant SDEs predicted greater learning transfer.

These results suggest a unified mechanism in which short-term memory traces, as reflected in serial dependence, can either accumulate to support generalization or be truncated, possibly by adaptation, limiting learning to the trained context. Limited generalization is often attributed to smaller or less variable training sets, both in machine learning (Ying, 2019) and in perceptual learning (Sagi, 2011), which can lead to overfitting. A similar principle may apply here: the shorter integration window in 1loc limits the accumulation of informative variability, promoting overfitting and thus reducing generalization. In such highly certain conditions, a longer history may add little value (Tavoni et al., 2022).

To our knowledge, no previous study has experimentally linked serial dependence to long-term perceptual learning. However, a theoretical framework proposed by Pascucci et al. (2019) connects short-term history biases to learning mechanisms, suggesting that what appears as bias in serial dependence tasks may actually reflect the process by which the visual system updates its decision templates, the same mechanism thought to underlie perceptual learning (Dosher & Lu, 1998; Kuai et al., 2013). Specifically, they argue that serial dependence arises from the reinstatement of previously informative sensory channels, effectively reusing feature weights that were beneficial in previous trials. Similarly, Talluri et al. (2018) found that observers selectively overweight evidence consistent with prior choices, and Murai & Whitney (2021), using classification-image analysis, demonstrated that serial dependence reshapes the perceptual templates applied to upcoming stimuli. Supporting this view, Urai et al. (2019) fitted bounded-accumulation models and showed that choice history is best explained by a history-dependent change in evidence accumulation (implemented as a drift bias), rather than merely shifting the starting point of the decision process (criterion bias). This result is in agreement with our RT analysis showing that distant SDEs are RT-independent, a marker of drift bias (Dekel & Sagi, 2020). In addition, the recent trials seem to introduce criterion shifts (starting point bias in the drift diffusion model), indicated by the larger biases found for fast RTs. Together, these findings suggest that serial dependence directly alters how sensory information is weighted and interpreted. We suggest that these updated decision templates subserve perceptual learning.

The persistence of serial dependence during eight days of training with random stimulus sequences, where it does not contribute to online performance, but rather increases decision noise, suggests that the assumption of environment stability is hardwired into the brain, or that these biases may serve a broader function beyond optimizing immediate performance. Consistent with our approach, recent models reframe serial dependence as a memory-driven phenomenon, not as an optimal inference about the external world (Barbosa & Compte, 2020), but as a consequence of internal mechanisms shaped by how recent perceptual states are encoded and maintained over time (Kalm & Norris, 2018).

Our findings reveal a functional and mechanistic dissociation between short- and long-range serial dependence. Only recent SDEs were modulated by reaction time, with stronger biases for faster responses, suggesting that these biases are due to shifts in decision criteria (Dekel & Sagi, 2020). In contrast, distant SDEs were RT-independent, suggesting that these biases result from neuronal reweighting (Dekel & Sagi, 2020). Importantly, only distant SDEs predicted learning transfer, while recent SDEs remained stable across conditions and were unrelated to generalization. Within sessions, distant SDEs, but not recent SDEs, declined as training progressed, effectively reducing the SDE range, consistent with the increased learning specificity observed in perceptual learning with extensive training (Sagi, 2011). This pattern suggests a functional distinction: recent biases may relate more to prior stimulus statistics, whereas temporally extended biases may support the integration of sensory evidence required for efficient perceptual learning. Previous studies also point to distinct timescales in serial dependence. For example, Lieder et al. (2019) showed that perceptual biases reflect processes operating over different timescales that vary across clinical populations: individuals with ASD rely less on recent trials but show intact long-term integration, whereas individuals with dyslexia exhibit the opposite pattern. Thus, in the absence of adaptation, we expect learning in ASD to generalize, as indeed was recently found (Harris et al., 2015). Fritsche et al. (2020) proposed a model in which perceptual history influences current biases through both short-term Bayesian decoding and longer-term efficient encoding, aligning with our observed dissociation between recent and distant SDEs.

Our findings offer new insight into the mechanisms of perceptual learning. While traditional theories explain learning specificity through local changes at the site of target encoding (Karni & Sagi, 1991), the formation of location-specific decision templates (Dosher & Lu, 1999), or both (Karni & Sagi, 1993; Watanabe & Sasaki, 2015), we propose a unified mechanism. Specifically, we suggest an account based on a single decision template that learns the discrimination task by classifying neuronal response features as signaling vertical or horizontal texture targets. Such a template generalizes across retinal locations of equal eccentricity but not across locations with different eccentricities (Harris & Sagi, 2018). However, when training uses targets at a fixed location, the decision template may become overfitted to features that are specific to that location, limiting generalization (Sagi, 2011). For learning to generalize, multiple samples (trials) must be integrated over time to filter out local noise. Our results show that decision biases are integrated linearly over trials, suggesting efficient temporal integration over many trials in conditions that support learning generalization. In contrast, reduced integration, due to adaptation or limited temporal windows, may produce classifiers that rely on spurious, location-specific features (Mollon & Danilova, 1996). We suggest that previous reports of learning generalization can be explained by a modulation of temporal integration. These include short training phases that are stopped before adaptation takes over and that show generalization to other retinal locations (Censor & Sagi, 2009; Karni & Sagi, 1993), short pre-training phases that enable generalization across visual tasks (Zhang et al., 2010), and other paradigms that effectively reduce sensory adaptation (reviewed in Sagi, 2011) and thereby allow serial dependence to accumulate. Our findings thus provide empirical support for a unified mechanism that governs both specific and generalized learning through modulation of temporal integration.

In summary, we show that long-range serial dependence predicts learning transfer, supporting the view that short-term memory contributes directly to long-term learning. By connecting serial dependence with learning, our findings bridge a key theoretical gap and suggest that the integration of past experience plays a crucial role in determining the specificity or generalization of learning.

Data availability

The data that support the findings of this study will be available from the corresponding author upon request.

Additional files

Supplementary Information