Introduction

Human learning and decision-making often take place in dynamic, uncertain environments. A key computational challenge is to infer latent properties of the environment—such as means and variances—from noisy observations, and to adapt to potential changes in these properties over time.

For example, consider a commuter noticing that their travel time is longer than expected. To decide whether to leave earlier or take a different route on subsequent days, they must first judge whether this fluctuation is simply part of the typical day-to-day variability (the variance), or whether there has been a genuine change in travel conditions. If a change is likely, they must further consider the nature of the change: is it a gradual increase in traffic (e.g., due to a growing local population), or a sudden disruption (e.g., new roadworks)? Effective decision-making therefore requires inferring both the mean and the variance of travel times, and adapting based on whether these properties are stable, gradually shifting, or abruptly changing.

Two canonical classes of dynamics are commonly used to describe how latent variables evolve over time: change-point dynamics, in which the latent variable is assumed to change abruptly and discretely (Adams & MacKay, 2007; Behrens et al., 2007; Nassar et al., 2010), and random-walk dynamics, in which the latent variable drifts gradually and continuously (Kalman, 1960; Mathys et al., 2011; Piray & Daw, 2021; Pulcu & Browning, 2025). Both frameworks capture essential aspects of environmental dynamics but fundamentally differ in how they represent environmental changes. These differences, in turn, imply distinct statistical and computational principles for learning.

Are humans able to learn appropriately given the environment they are placed in? A large body of work has examined human learning in either change-point environments (e.g., Behrens et al., 2007; Glaze et al., 2015; Vaghi et al., 2017) or random-walk environments (e.g., Daw et al., 2006; Lee et al., 2020; Piray & Daw, 2024; Pulcu & Browning, 2025) (for an overview of major studies and their environmental dynamics, see also Table 1 in Foucault, 2023). However, these two types of environments have largely been investigated separately—very few studies have compared them within the same experimental framework. This separation makes it unclear whether the behavioral signatures reported in each literature reflect general properties of human learning that hold across different environmental dynamics, or whether they are specific to the particular dynamics under study.

It remains an open question whether participants apply the appropriate model to the appropriate environment. This gap in our understanding of human learning is perhaps due to a lack of consistent study designs, task structures, analysis methods, and computational models across the two largely parallel lines of research—one anchored in change-point models and the other in random-walk models. Studies in change-point environments typically adopt a change-point modeling framework, analyzing modulations in learning rate in terms of variables such as the estimated change-point probability (Foucault & Meyniel, 2024; McGuire et al., 2014; Murphy et al., 2021; Nassar et al., 2012; Vaghi et al., 2017). Conversely, studies in random-walk environments typically adopt a random-walk modeling framework, relating human learning rates to model-derived estimates of volatility, that is, the diffusion noise of the random walk (Mathys et al., 2011; Piray & Daw, 2021; Pulcu & Browning, 2025). Although some studies have analyzed human data from both types of environments, they have generally relied on only a single model class (typically a random-walk model) to interpret behavior (Mathys et al., 2011; Piray & Daw, 2020, 2021). As a result, it is unclear whether findings attributed to one model class might alternatively be explained by the other. This has important implications beyond basic learning research: work in computational psychiatry has shown that reanalyzing existing datasets under alternative generative-model assumptions can yield different interpretations of the computational mechanisms underlying psychiatric traits and symptoms (Mikus et al., 2025; Piray & Daw, 2021). Theoretical work on simulated data has shown that change-point and random-walk models can be disambiguated (Marković & Kiebel, 2016), but how humans behave across these dynamics, and whether they align more closely with one model or the other, remains to be determined.

A further outstanding question concerns whether and how humans infer and respond to changes in variance (stochasticity) over time within a sequence, alongside changes in the mean. Most previous studies either did not manipulate variance, or if they did, varied it across experimental blocks—keeping it fixed within blocks (Lee et al., 2020; McGuire et al., 2014; Nassar et al., 2010, 2012; Piray & Daw, 2024). Yet in many natural environments, both the central tendency and the variability—which determines the reliability of individual observations—can change, creating the added challenge of distinguishing changes in mean from changes in variance. This is non-trivial because both types of changes can produce a similar increase in the dispersion of recent observations and in prediction errors. Thus, an open question is whether humans can infer a changing variance alongside a changing mean, distinguish between change points in mean versus variance, and explicitly report when they believe the variance has changed. Being able to report such changes matters: communicating uncertainty, for example, can substantially improve decision-making (Bahrami et al., 2010; Van Der Bles et al., 2019). Moreover, the way in which learners infer variance may interact with how they infer the mean. These interactions may differ depending on whether the underlying dynamics follow a change-point or random-walk process.

Overall, the influence of environmental dynamics on human learning remains poorly understood. In particular, it is unclear whether humans learn differently and whether they deploy the appropriate model in environments governed by change points versus random walks, and how these differences may be captured by computational models. A direct comparison of the two types of environments and their corresponding models—under conditions where both mean and variance can change—is currently lacking.

Here, we directly compare human learning with Bayesian modeling under change-point and random-walk dynamics within a single task that requires joint inference of mean and variance. Across two experiments, participants performed a learning task in which the generative mean changed through either change points (Experiment 1) or a random walk (Experiment 2), while the generative variance switched between two states. We analyze behavior using model-agnostic measures derived directly from participants’ responses and compare it with predictions from a unified family of Bayesian models. Our results show that, in line with normative predictions, human learning rates adapt differently under change-point and random-walk dynamics, indicating that humans use the appropriate model for each environment. Moreover, human reports of variance (paddle-width choices) track the generative variance online, and importantly, dissociate changes in variance from changes in mean (i.e., dissociate changes due to stochasticity versus volatility). Together, these findings demonstrate flexible human inference in dynamic environments, correctly attributing variability to latent factors and adapting to their dynamics.

Results

Task structure and performance

We designed a learning task and conducted two experiments to examine how environmental dynamics influence human learning (Figure 1). In both experiments, the environment contained a generative mean and a generative variance that changed independently over time, and the task directly tested participants’ ability to infer both latent variables online and simultaneously.

Capture-the-beams prediction task.

(a) Task structure. On each trial, participants positioned a paddle on a circular ring and selected its width (small or large) to predict the angular location of the next beam. After adjusting the paddle, they observed the next beam and received feedback. Capturing a beam with the small paddle yielded more points (+10) than with the large paddle (+5), but carried a higher risk of misses, which incurred a loss (–10). Beam locations were generated from a latent mean and variance that both changed over time. In Experiment 1, the generative mean changed abruptly at unpredictable change points, whereas in Experiment 2, it changed gradually and continuously from trial to trial according to a random walk. In both experiments, the generative variance alternated probabilistically between low- and high-variance states. Participants: n = 30 in each of the two experiments. (b, c) Example sequences from Experiment 1 (change-point environment) and Experiment 2 (random-walk environment), and an example participant, whose paddle locations and widths (orange) tracked the generative mean and variance (purple), respectively.

On each trial, participants predicted the next observation in a sequence—corresponding in this task to the beam emission angle around a central point (Figure 1a). Instead of a single-point prediction, participants provided a prediction interval by adjusting a paddle, whose angular extent defined the interval. They could rotate the paddle around a circular ring and choose between two available paddle sizes (small or large). Using the small paddle resulted in higher gains if the next beam was captured but carried a higher risk of missing the beam, which induced a loss. Once they confirmed their choice, the next beam appeared, the resulting gains or losses were displayed, and the next trial began.

Accurate performance required learning, through inference, two latent variables: the generative mean and the generative variance, which governed the distribution from which the next beam emission angle was drawn. Importantly, the optimal paddle location and size directly reflected these inferences. To maximize their score, participants needed to center the paddle on the current generative mean, and choose the large paddle when the generative variance was high or the small paddle when the generative variance was low (see Methods). This mapping provided a trial-by-trial behavioral readout of participants’ beliefs about both latent variables.
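This mapping from beliefs to reward-maximizing responses can be made concrete with a small sketch. The paddle half-widths, the Gaussian approximation of the predictive distribution, and the parameter values below are our own illustrative assumptions; only the payoffs (+10 small hit, +5 large hit, −10 miss) come from the task description.

```python
import math

def capture_prob(half_width, pred_sd):
    """P(|beam - paddle centre| <= half_width) for a paddle centred on the
    predictive mean, under a Gaussian approximation of the predictive
    distribution (circular wrapping ignored for moderate spreads)."""
    z = half_width / (pred_sd * math.sqrt(2.0))
    return math.erf(z)

def best_width(pred_sd, half_widths=(10.0, 25.0)):
    """Pick the paddle ('small' or 'large') that maximizes expected points.
    Payoffs follow the task (+10 small hit, +5 large hit, -10 miss);
    the half-widths are hypothetical, not the actual task values."""
    gains = (10.0, 5.0)
    expected = []
    for hw, gain in zip(half_widths, gains):
        p = capture_prob(hw, pred_sd)
        expected.append(p * gain + (1.0 - p) * (-10.0))
    return ("small", "large")[expected.index(max(expected))]
```

Under these assumptions, a confident (low predictive spread) agent should choose the small paddle and an uncertain one the large paddle, reproducing the normative mapping described above.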

To make the task intuitive, we embedded it in a space-explorer cover story in which participants captured energy from beams emitted by a distant star to power their colony. We explained the key components of the generative process using analogies. We first introduced the generative mean and variance using a flashlight analogy: the star’s average emission direction represented the mean, and the spread of the beams the variance. We then introduced the two variance levels as two distinct states of the star—a focused state (low variance) and a diffuse state (high variance). Lastly, we described changes in mean and variance as unpredictable forces that independently caused directional shifts or changes in the star’s state. Before the task began, we administered a short comprehension quiz to ensure that participants had understood these concepts and their implications for maximizing score.

Two experiments were conducted with 30 participants each: in Experiment 1, the dynamics of the generative mean followed a change-point process (Figure 1b), and in Experiment 2, they followed a random-walk process (Figure 1c). Apart from the dynamics of the generative mean, all other task parameters were identical between the two experiments. Participants performed the tasks accurately: Experiment 1, percentage of beams captured 81 ± 1%, proportion of trials using the small/large paddle 66/34% ± 3%, score 586 ± 9 points per 100-trial block; Experiment 2, beams captured 77 ± 1%, small/large paddle 54/46% ± 5%, score 447 ± 11 points (values are mean ± s.e.m.).

Computational modeling framework

To derive theoretical predictions about behavior in our task, we developed and simulated a set of computational models (Figure 2). These models formalize specific hypotheses about the inference processes humans might use, allowing us to predict how behavior should unfold under different assumptions and, by comparison with human data, gain insights into the computational processes underlying human learning and decision-making.

Modeling framework.

(a) Structure of the models. On each trial, models update their beliefs via Bayesian inference over the assumed generative process, compute the predictive distribution for the next beam location, determine the response (paddle location and width) that maximizes expected reward under this distribution, and generate it probabilistically according to the associated response probability. (b) Model family. Models were defined by their assumptions about the generative process of beam locations. We systematically varied assumptions about the dynamics of the latent mean (change-point vs. random-walk) and about whether the variance was fixed or changing (and thus inferred as a latent variable). (c) Example inference of the change-point model in the change-point environment. The posterior distribution (heat map) accurately follows the true generative mean (top) and variance (bottom). For reference, we overlay the posterior mean and mode; both align closely with the ground truth.

Our model set was defined in a systematic way to isolate the role of key features of the assumed generative process (Figure 2b). In particular, we varied (i) the type of dynamics (abrupt change points versus gradual random walks) and (ii) whether the generative variance was assumed to be fixed or changing. This yielded a family of models that can be compared and contrasted to provide theoretical insights. Several of these models correspond directly to proposals from prior work, but our formulation unifies them within a common Bayesian framework: all models rely on inference over an assumed generative process, differing only in the process they posit (e.g., the change-point model without variance inference makes the same assumptions as the model of Nassar et al. (2010); the random-walk model with variance inference has the same latent variable structure as the model of Piray and Daw (2024) but with different variance dynamics).

The computations performed by each model can be decomposed into three steps (Figure 2a).

  1. Inference: The model computes the posterior distribution over the assumed latent generative state—when the assumed generative process matches the environment, the posterior optimally recovers the true latent state (see example in Figure 2c).

  2. Prediction: From the posterior, the model derives the predictive distribution over the expected location of the next beam.

  3. Response generation: The model determines the paddle location and width that maximize the expected reward (in points) under its predictive distribution, and updates the paddle to this computed location and width with a probability (referred to as ‘response probability’) that increases with the distance between the computed and current location. This response-probability mechanism captures common motor inertia and action initiation costs that make participants unwilling to move the paddle for very small updates. In other words, there are trials where no movement is performed, and these are most likely to occur when only a small update is required.
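The response-generation step can be sketched as follows. The exponential form and slope of the response probability are illustrative assumptions; the text specifies only that the probability of moving increases with the distance between the computed and current paddle locations.

```python
import math
import random

def p_move(dist, slope=0.2):
    """Probability of adjusting the paddle, increasing with the required
    update size (illustrative exponential form and slope)."""
    return 1.0 - math.exp(-slope * dist)

def generate_response(current, target, rng=random.random):
    """Move the paddle to the reward-maximizing target only with
    probability p_move; otherwise leave it in place, which yields an
    apparent learning rate of zero on that trial."""
    if rng() < p_move(abs(target - current)):
        return target
    return current
```

Because p_move vanishes as the required update shrinks, small updates are often not executed, capturing the motor inertia and action-initiation costs described above.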

Model predictions: adaptation to environmental dynamics and stochasticity

Having developed this family of models, we next examine their predictions about behavior under different environmental dynamics (change points vs. random walks). To characterize learning behavior in a model-agnostic manner, we quantified an apparent learning rate—defined on each trial as the signed update in paddle location divided by the signed prediction error (Foucault & Meyniel, 2024; Nassar et al., 2010). This measure is not a fitted model parameter but a descriptive index of how strongly an agent updates its response to new evidence. For simplicity, we refer to it below as the learning rate.
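This descriptive measure can be computed directly from raw responses. A minimal sketch, assuming angles are measured in degrees on the circular ring and differences are wrapped to (−180°, 180°]:

```python
def ang_diff(a, b):
    """Signed angular difference a - b in degrees, wrapped to (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def apparent_learning_rate(prev_paddle, new_paddle, outcome):
    """Trial-wise apparent learning rate: signed paddle update divided by
    signed prediction error (Nassar et al., 2010). Undefined (NaN) when
    the prediction error is exactly zero."""
    error = ang_diff(outcome, prev_paddle)
    update = ang_diff(new_paddle, prev_paddle)
    return update / error if error != 0 else float("nan")
```

A value of 0 corresponds to ignoring the new observation (no paddle movement), and a value of 1 to resetting the paddle onto the observed beam location.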

Model simulations show that the environmental dynamics dictate how this learning rate should vary with the observed prediction error: the model adapts its learning rate markedly differently when assuming change-point dynamics compared to random-walk dynamics (Figure 3). In the change-point model, the learning rate increases sharply with prediction error, reaching values close to 1 for large errors (Figure 3a). By contrast, in the random-walk model, the learning rate shows little or no modulation once beyond the low-error regime: increases in the magnitude of the prediction error do not translate into larger learning rates (Figure 3b). This qualitative difference reflects the underlying assumptions about environmental dynamics. In a change-point process, a large prediction error provides evidence that a change point in the latent mean may have occurred, in which case it is optimal to reset the belief about the latent mean and weight the new observation heavily—hence the steep rise in learning rate for large errors. Conversely, when prediction errors are small to moderate (e.g. < 50°), a change point is unlikely, implying that the latent mean has remained stable; in this case, the model exhibits a very low learning rate. In a random-walk process, however, the latent mean is assumed to drift gradually on every trial rather than change abruptly; belief updates should therefore be applied consistently, even when prediction errors are only moderate in size. As a result, the random-walk model exhibits a relatively stable learning rate across different error magnitudes, while the change-point model flexibly suppresses or amplifies updates depending on the inferred probability of a change point.
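This qualitative contrast can be illustrated with two reduced one-dimensional learners. The hazard rate, noise scale, and the uniform (1/360) change likelihood below are illustrative parameter choices, not fitted values, and both learners are deliberate simplifications of the full Bayesian models.

```python
import math

def cp_learning_rate(error, run_length, hazard=0.1, noise_sd=10.0):
    """Reduced change-point learner in the spirit of Nassar et al. (2010):
    learning rate = omega + (1 - omega) / (run_length + 1), where omega is
    the posterior probability that a change point just occurred given the
    current prediction error."""
    # likelihood of the error under "no change" vs "change" (uniform on the ring)
    no_change = math.exp(-error**2 / (2.0 * noise_sd**2)) / (noise_sd * math.sqrt(2.0 * math.pi))
    change = 1.0 / 360.0
    omega = change * hazard / (change * hazard + no_change * (1.0 - hazard))
    return omega + (1.0 - omega) / (run_length + 1.0)

def rw_learning_rate(prior_var, drift_var=4.0, noise_var=100.0):
    """Kalman gain for a random-walk environment: determined by the
    variances alone, independent of the current prediction error."""
    pred_var = prior_var + drift_var
    return pred_var / (pred_var + noise_var)
```

The change-point learner's rate jumps toward 1 for errors that are implausible under the current belief, while the Kalman gain does not even take the error as an argument, mirroring the flat profile in Figure 3b.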

Normative model predictions for the effects of environmental dynamics.

Environmental dynamics (change points vs. random walk) should fundamentally alter how learning rates adjust to prediction errors. (a,b) Trial-wise apparent learning rate of the normative models (y-axis) as a function of absolute prediction error (x-axis), shown separately for trials when the generative variance was low (dark) or high (light), under change-point dynamics (a) and random-walk dynamics (b). Each panel includes, as an inset, the corresponding distribution of absolute prediction errors for low- and high-variance trials, illustrating that higher stochasticity produces larger errors. The change-point model predicts a sharp increase in learning rate for large errors, reflecting inference about change points in the latent mean, whereas the random-walk model exhibits relatively stable learning rates across error magnitudes (beyond very small ones). Both models show a reduction in apparent learning rate for very small errors (≤10°) due to a lower response probability. (c,d) Average apparent learning rate in low- versus high-variance trials for the change-point (c) and random-walk (d) environments. Contrary to the simple expectation that higher stochasticity should reduce learning rates, the models reveal two opposing effects. Holding prediction errors constant, higher stochasticity decreases learning rates (a,b); however, because higher stochasticity also generates larger errors (insets), it indirectly elicits higher learning rates—resulting here in a net positive effect of stochasticity on learning rates (c, d). Simulation results were obtained using the experimental sequences from the change-point environment (Experiment 1) for the change-point model (panels a,c), and from the random-walk environment (Experiment 2) for the random-walk model (panels b,d).

When prediction errors are very small (≤ 10°), both models show a large decrease in learning rate (Figures 3a and 3b). This suppression arises not from the inference process itself—beliefs continue to be updated—but from the response generation. When prediction errors are minimal, the paddle is already close to the model’s computed ideal location, and thus the model has a low response probability (i.e., a low probability of adjusting the paddle). Therefore, many of these very-low-error trials produce no paddle movement, yielding a learning rate of zero, and lowering the average learning rate in this regime.

Beyond absolute prediction errors, the models also adjust their learning rate based on the inferred current generative variance—that is, the inferred stochasticity of the environment. Surprisingly, and in contrast to previous findings (Piray & Daw, 2024; Pulcu & Browning, 2025), the average learning rate in our study is not lower under higher stochasticity, even for models that perform Bayes-optimal inference (Figures 3c and 3d). This seeming contradiction arises because stochasticity influences the learning rate in two opposing ways. Holding prediction error constant, higher stochasticity reduces the informativeness of observations and should lower the learning rate. At the same time, greater stochasticity increases the typical size of prediction errors (insets in Figures 3a and 3b), which increases the learning rate through the mechanisms described above. The observed net effect on the average learning rate therefore depends on the balance of these two influences.

In the change-point model, the increase in prediction errors tends to dominate, yielding a higher average learning rate under high stochasticity (Figure 3c). In the random-walk model, the two influences more closely offset one another, producing only a weak difference in average learning rates between stochasticity conditions (Figure 3d), and this difference reverses when the response probability mechanism is removed (Supplementary Figure 1).

These results provide an important insight: the effects of stochasticity cannot be interpreted in isolation, but only in interaction with prediction errors. When prediction errors are held constant, the learning rate is lower under higher stochasticity across both dynamics (Figures 3a and 3b).

Human learning-rate adjustments reflect use of the appropriate generative model across environments

We next examined how human behavior compared to the model predictions (Figure 4), and crucially, whether humans deployed the appropriate model for the appropriate environment. In line with our theoretical predictions, we found that humans adapted their learning rate differently across the two environments, in a manner consistent with the appropriate model: learning rate adjustments in the change-point environment were more similar to the change-point model (Figure 4a), whereas those in the random-walk environment were more similar to the random-walk model (Figure 4b). Statistically, participants were significantly closer to the matched model than to the mismatched one (paired-samples t tests comparing the two models on their Pearson correlation with human trial-wise learning rates, Experiment 1: t₂₉ = 8.4, p = 10⁻⁹, Cohen’s d = 1.5; Experiment 2: t₂₉ = −5.5, p = 10⁻⁷, d = −1.0). Consistently, 100% of participants (95% CI = [88–100%]) were better matched by the change-point model in Experiment 1, compared to only 10% (CI = [2–27%]) in Experiment 2 (CIs estimated from binomial tests).
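The per-participant comparison reduces to a pair of Pearson correlations between human and model trial-wise learning rates. A minimal sketch (function and variable names are our own, hypothetical ones):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def matched_model(human_lr, cp_lr, rw_lr):
    """Label a participant by the model whose trial-wise learning rates
    correlate more strongly with theirs."""
    if pearson(human_lr, cp_lr) > pearson(human_lr, rw_lr):
        return "change-point"
    return "random-walk"
```

The paired t tests reported above then compare the two correlations within participants, and the binomial confidence intervals summarize how many participants each model labels.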

Human learning-rate adaptation reflects environmental dynamics, consistent with normative model predictions.

(a,b) Participants’ apparent learning rate (y-axis) as a function of absolute prediction error (x-axis), shown separately for trials with low (blue) and high (orange) generative variance. Shaded areas indicate ± s.e.m. across participants. Panel a corresponds to Experiment 1 (change-point environment) and panel b to Experiment 2 (random-walk environment). Insets show model simulation results within each environment for comparison: human behavior aligns more closely with the normative model matched to the true environmental dynamics (green check mark) than with the mismatched model. In the change-point environment (a), the change-point model captures the sharp increase in learning rates for large errors, whereas the random-walk model fails to reproduce this pattern. In the random-walk environment (b), the random-walk model captures the relatively stable learning rates across error magnitudes, while the change-point model does not. Across both environments, learning rates were reduced at very small prediction errors, and for equal error magnitudes, were lower under high variance than low variance, consistent with model predictions.

The qualitative patterns observed in humans echoed those in our simulations. In the change-point environment, learning rates increased sharply with prediction errors, consistent with inference about the occurrence of change points. By contrast, in the random-walk environment, beyond the very-low-error regime, learning rates remained more stable, mirroring the continuous gradual change in the latent mean. Importantly, these differences arise from the inference process, since all other model components (prediction and response generation) were held constant across models.

Further analyses supported this interpretation. In the change-point environment, participants exhibited a characteristic sharp increase in learning rate immediately following change points, closely matching the learning-rate response profile predicted by the change-point model but not the random-walk model (Supplementary Figure 2). Moreover, a consistent correspondence with model predictions was revealed after decomposing the learning rate into two components: (i) the proportion of trials on which participants moved their paddle (Supplementary Figure 3), and (ii) the learning rate conditional on movement, that is, excluding no-movement trials (Supplementary Figure 4).

Specifically, participants were more likely to move their paddle in response to low prediction errors in the random-walk environment than in the change-point environment. This pattern, also observed in models, reflects the different environmental dynamics: in the random-walk environment, even small prediction errors provide meaningful evidence for updating beliefs because the latent mean changes on every trial, whereas in the change-point environment, such errors elicit only marginal belief updates because the latent mean has likely not changed. These differences in belief update size translate into differences in the frequency of paddle movements because larger belief updates are more likely to trigger movements. Independently of these differences in movement frequency, when participants did move the paddle, the magnitude of their updates increased more steeply with prediction errors in the change-point environment than in the random-walk environment, again mirroring model predictions (Supplementary Figure 4).

In summary, we found that model predictions for learning rate differed between environmental dynamics, and were best described as a joint function of absolute prediction error and stochasticity. These model predictions were closely matched by human behavior, indicating appropriate use of the correct generative model in each environment.

Paddle-width adjustments demonstrate variance inference and disambiguation from mean changes

We now turn to humans’ ability to track the current generative variance. This ability is already partly reflected in their learning-rate adjustments, which, like in the models, were appropriately calibrated to the current variance (Figure 4: holding absolute error constant, learning rates were lower under high variance than low variance in both environments, mirroring the normative prediction in Figure 3). However, this ability can be demonstrated more directly by examining their paddle-width choices, which provide an explicit measure of participants’ variance estimates.

Humans’ paddle-width choices appropriately tracked the current generative variance: they more often chose the large paddle when variance was high and the small paddle when variance was low (Experiment 1, change-point environment: Figure 5c; average proportion of trials using the large paddle equal to 18% vs. 49% in low- vs. high-variance periods, paired-samples t test t₂₉ = −11.0, p = 10⁻¹¹). The magnitude of this effect was comparable to that of the normative change-point model, which optimally infers variance (Figure 5b, average proportion 12% vs. 63% in low- vs. high-variance periods). By contrast, when variance inference was removed from the model, its paddle-width choices no longer accurately tracked the current variance (Figures 5a, 6a). The same pattern of results was also observed in the random-walk environment (Experiment 2: Supplementary Figure 5).

Human paddle-width adjustments reflect their inference of the current generative variance.

(a–c) Mean proportion of trials in which the large paddle was selected when the generative variance was low versus high, in Experiment 1 for models (a,b) and humans (c). (a) Models without variance inference, which assume a fixed low or high variance, do not flexibly adjust paddle width, selecting the paddle size that matches their assumption rather than the true variance level. (b) Models with variance inference, which assume that the generative variance switches probabilistically, predict adaptive paddle use, with greater use of the large paddle under high variance and of the small paddle under low variance. (c) Human behavior shows the same pattern: participants significantly increased their use of the large paddle when variance was high compared to when it was low (p < 10⁻¹¹). Bars show group means (error bars ± s.e.m.); grey dots indicate individual participants, with lines connecting paired values from the same participant. For equivalent analyses of Experiment 2, see Supplementary Figure 5.

A more compelling demonstration of variance inference comes from examining how humans adjusted paddle width over time in response to changes in variance, showing that they tracked variance online within a sequence. As humans accumulated observations following an increase in variance, they became increasingly likely to select the large paddle, whereas following a decrease in variance, they became increasingly likely to select the small paddle (Figure 6d). Their adjustment time course closely matched that of the normative change-point model (Figure 6c), with a detection time—defined as the point at which the increase and decrease curves crossed—of about three observations.
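Such online tracking of a two-state variance can be sketched as forward filtering over which state generated each centred observation error. The two state standard deviations and the switch probability below are illustrative assumptions, not the task's actual parameters.

```python
import math

def filter_variance_state(obs_errors, sds=(10.0, 45.0), switch_p=0.05):
    """Forward filtering of the probability that the high-variance state
    (sds[1]) generated each centred observation error, assuming the state
    switches between trials with probability switch_p."""
    def lik(e, sd):
        # Gaussian likelihood of a centred error under a given state sd
        return math.exp(-e**2 / (2.0 * sd**2)) / (sd * math.sqrt(2.0 * math.pi))
    p_high = 0.5  # uninformative initial belief
    trace = []
    for e in obs_errors:
        # transition: the state may have switched before this observation
        p_high = p_high * (1.0 - switch_p) + (1.0 - p_high) * switch_p
        # Bayes update with the two state likelihoods
        num = p_high * lik(e, sds[1])
        p_high = num / (num + (1.0 - p_high) * lik(e, sds[0]))
        trace.append(p_high)
    return trace
```

A few large errors quickly push the filtered belief toward the high-variance state, while a run of small errors pulls it back, consistent with a detection time of only a handful of observations.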

Disambiguating mean and variance changes: In change-point environments, humans distinguish change points in mean from change points in variance, consistent with the normative change-point model that jointly infers mean and variance.

(a–d) Proportion of large-paddle choices in Experiment 1 following the different types of latent changes (change point in mean, increase in variance, or decrease in variance). (a) The change-point model without variance inference fails to adjust its paddle width after variance changes. (b) The random-walk model with variance inference correctly adjusts its paddle width after variance increases and decreases, but conflates mean change points with variance increases: mean change points elicit an even stronger switch to the large paddle than variance increases for 7 trials. (c,d) The normative change-point model and humans both correctly dissociate variance changes from mean change points. While mean change points can trigger a transient increase in large-paddle use for 1 to 3 trials, only variance changes produce a sustained switch in paddle choice. Shaded areas denote ± s.e.m. across participants.

A key question, however, is whether humans confuse changes in variance with changes in mean. This distinction is non-trivial, as both types of change result in increased variability in the observations—the only data available to the observer. Previous work has suggested that humans may misattribute high noise (i.e., high variance in our terminology) to high volatility (i.e., larger changes in mean) (Pulcu & Browning, 2025). To address this question, we focus on the change-point environment, where mean and variance both follow a change-point process, allowing us to make a direct comparison between the two types of change points.

In our study, humans’ paddle-width adjustments provided clear evidence that they correctly attributed observed variability to the appropriate source. Following a change point in mean, participants’ adjustments were only transient—consistent with a brief rise in uncertainty about the latent mean’s location—whereas variance increases produced a sustained switch to the large paddle (Figure 6d). The average duration of these variance-triggered adjustments (9.56 ± 0.56 observations) greatly exceeded those after mean change points (2.26 ± 0.17 observations), a difference confirmed by a Wilcoxon signed-rank test (W = 0, p = 10⁻⁵), with all participants (100%; CI = [87–100%], binomial test p = 10⁻⁸) showing a longer duration for variance- than for mean-triggered adjustments (see Methods for analysis details). Humans’ responses closely resembled those of the normative change-point model, which also exhibited a modest transient paddle-width increase after a change point in mean and a sustained increase after an increase in variance (Figure 6c). This pattern contrasts with models lacking variance inference, which showed no paddle-width adjustment (Figure 6a), as well as with the random-walk model with variance inference, which incorrectly responded to mean change points with a sustained increase in paddle width (Figure 6b).

Taken together, these findings demonstrate humans’ ability to perform online joint inference of mean and variance. Their behavior was in line with the normative model, both in their paddle-width adjustments (Figures 5 and 6) and in their learning-rate adjustments (Figure 4 and Supplementary Figure 2).

Discussion

Humans constantly face environments that change in different ways—sometimes abruptly, sometimes gradually. In this study, we asked whether and how humans adapt their learning to such distinct forms of environmental dynamics. We found that participants adjusted their learning rates in systematically different ways across change-point and random-walk environments, in close alignment with a normative Bayesian learner that reasons about the environment using the correct generative model. We further asked whether participants could distinguish different types of latent environmental changes—either in mean or in variance. This was evident in their simultaneous adjustments of paddle location and paddle width, which jointly tracked and clearly dissociated the two latent variables, and again mirrored the predictions of the normative model. Together, these results advance previous work by providing a direct comparison of change-point and random-walk dynamics within a unified framework, and demonstrating that humans flexibly deploy the appropriate world model for the environment at hand and its changing properties.

Notably, the same prediction error elicited different adjustments under change-point versus random-walk structure. In the change-point environment, large prediction errors (PEs) triggered sharp, transient increases in apparent learning rate, consistent with detecting discrete changes in the latent mean. In contrast, in the random-walk environment, the function mapping from PEs to learning rates was flatter, with more even weighting of small and large PEs, reflecting steadier, continuous integration of new evidence appropriate for gradual drift. By directly contrasting these regimes within the same paradigm, the present findings extend and unify two largely independent literatures on adaptive learning under change points (e.g., Meyniel et al., 2015; Nassar et al., 2010) and random walks (e.g., Mathys et al., 2011; Piray & Daw, 2024). Critically, they show that learners do not rely on a fixed, error-driven update rule, but instead condition their interpretation of prediction errors on the environment’s dynamics—a hallmark of world-model accounts of learning (Bruckner, Heekeren, et al., 2025).

Our results resolve apparent contradictions in the literature on learning by revealing that stochasticity does not exert a unitary effect on the learning rate. Instead, its influence depends on the assumed dynamics and interacts with prediction error (PE). Higher stochasticity produces more frequent occurrences of large PEs, which, under a change-point interpretation, yield transient bursts in learning rates. This can lead to an overall increase in average learning rate with higher stochasticity, even though the normative model predicts a decrease when controlling for PE magnitude (Figure 3). This account offers a normative reinterpretation that helps reconcile the opposite effects of stochasticity reported by Pulcu and Browning (2025) and Piray and Daw (2024)—the latter finding that higher stochasticity decreased learning rates, whereas the former found that it increased them. Although both studies examined human learning under random-walk dynamics, a key difference is that participants in Pulcu and Browning (2025) were not informed about the task’s underlying structure, whereas those in Piray and Daw (2024) were instructed that the latent mean moved noisily on a trial-by-trial basis (i.e., followed random-walk-like dynamics rather than discrete change points). Without explicit instructions, participants in Pulcu and Browning (2025) may have treated the environment as one that could contain change points, and implicitly applied a change-point model when confronted with large PEs. This would produce an increase in learning rate with higher stochasticity. This mechanism may coexist with the authors’ proposed stochasticity–volatility misattribution explanation.

Beyond inferring the mean, participants were able to track latent variance online, detect sudden within-block changes in variance, and dissociate them from changes in mean. This was evidenced by coordinated adjustments in paddle location (mean) and paddle width (uncertainty). Following variance change points, participants rapidly recalibrated their uncertainty reports to the new variance—often within a few trials—and then maintained this calibration thereafter. By contrast, mean change points elicited only a brief uptick in reported uncertainty and, critically, a step-like shift in the mean estimate, manifested as a transient spike in the apparent learning rate—consistent with the normative model (Figure 6 and Supplementary Figure 2). These findings extend prior work on uncertainty and confidence during learning (e.g., Boldt et al., 2019; Meyniel et al., 2015; Nassar et al., 2010; Vaghi et al., 2017) and complement recent studies on stochasticity inference (e.g., Piray & Daw, 2024; Pulcu & Browning, 2025) by demonstrating within-block, online adaptation supported by explicit uncertainty reports.

Human behavior reflects a model-based, probabilistic-inference account of learning

World-model accounts of learning argue that learners use an internal generative model of the environment to interpret prediction errors and guide belief updates (Bruckner, Heekeren, et al., 2025). Prediction errors are not treated as uniform learning signals, as in model-free algorithms, but are instead evaluated in light of the learner’s internal generative model of the environment’s dynamics. This context-dependent interpretation of error explains the dynamics-specific learning-rate patterns described above, and situates our findings within broader debates contrasting model-free and model-based mechanisms of learning (Collins & Cockburn, 2020; Daw et al., 2005; Dolan & Dayan, 2013; Doody et al., 2022; Feher da Silva & Hare, 2020; Gläscher et al., 2010). While both forms of learning likely coexist in the brain, with one or the other or both playing a central role depending on the conditions, our results indicate that in our predictive task the model-based component plays the primary role. Moreover, participants were able to deploy the correct generative model for the environment at hand immediately, without environment-specific training or feedback, by relying on abstract structural knowledge acquired through instructions. This capacity to apply an appropriate inference or learning strategy without prior task exposure connects to work on theory-based learning in humans (Lake et al., 2015; Tsividis et al., 2021) and recent demonstrations of zero-shot learning in artificial systems (Brown et al., 2020; Kojima et al., 2022).

The convergence between human and model predictions highlights the importance of aligning computational models with the environment’s generative structure. Only models embodying the correct assumptions about latent dynamics could reproduce the distinct learning patterns observed across environments, manifested in trial-by-trial adjustments in learning rate and reported uncertainty. This alignment demonstrates that differences in observed learning behavior—often interpreted as changes in learning mechanisms (Eckstein et al., 2022)—can arise naturally from normative inference under different world models. Fixed-dynamics models (such as the Kalman filter, the Hierarchical Gaussian Filter and its extensions (Diaconescu et al., 2014; Kalman, 1960; Mathys et al., 2011; Weber et al., 2023) or change-point models (Adams & MacKay, 2007; Nassar et al., 2010)) capture important aspects of learning but cannot by themselves explain humans’ ability to flexibly adjust computations to match the environment’s dynamics, and risk misattributing behavior when model and environment dynamics are mismatched. Our framework instead offers a unified lens for capturing behavior across environments by explicitly modeling assumptions about environmental structure within a common probabilistic-inference account. Given that exact Bayesian inference is in general computationally intractable, an open question concerns how humans approximate and implement it (Lieder & Griffiths, 2019; Van Rooij, 2008). 
Future work could compare normative predictions with process-level approximations, such as small recurrent neural networks (Foucault & Meyniel, 2021; Ji-An et al., 2025), hybrid neural-cognitive models (Eckstein et al., 2024), or other resource-bounded models (Bhui et al., 2021; Bruckner, Heekeren, et al., 2025; Bruckner, Nassar, et al., 2025; Drevet et al., 2022; Findling et al., 2019; Franklin & Frank, 2015; Gershman et al., 2015; Murphy et al., 2021; Zhang et al., 2023), to uncover the mechanisms that enable human brains to approximate Bayesian inference under resource constraints. Relatedly, here we used ideal Bayesian observer models given that our goal was to isolate the normative signatures implied by the underlying environmental dynamics and to test whether humans express these signatures in their behavior. Work aimed at capturing inter-individual differences in learning may instead fit process-level models to individual subjects (Mikus et al., 2025; Piray & Daw, 2021; Weber et al., 2023).

A future direction that remains to be investigated is how humans infer environmental structure, including its dynamics, when this is not scaffolded by instruction. In our study, participants were explicitly informed about the task’s generative structure, as our study was focused on estimating latent states (namely, the generative mean and variance) rather than on discovering which structural form governed the environment. The present findings should therefore be interpreted within this instructed-context boundary condition. An open question is whether the dynamics-specific learning signatures observed here re-emerge once learners have discovered the underlying structure. This broader problem, known as structure learning (Gershman & Niv, 2010), involves identifying which generative process best explains incoming observations and when to switch between alternative world models, and likely engages additional inference mechanisms beyond those studied here. Recent work has begun to reveal such structure-learning mechanisms (Gershman et al., 2017; Tomov et al., 2018, 2023). Extending the present framework to situations where environmental dynamics must themselves be inferred would help unify adaptive learning and structure discovery under a common computational account.

Implications and outlook

Our findings establish a computational account of how environmental dynamics shape human learning. Theoretically, they demonstrate that humans flexibly deploy computational strategies tailored to the structure of their environment, providing strong evidence for adaptive, model-based cognition. Methodologically, they underscore the importance of aligning computational models with the generative structure assumed or inferred by the learner and illustrate how our framework provides a general approach for unifying analyses across different forms of environmental change. In addition, they generate concrete neural predictions for future work. Specifically, changes in mean versus variance should produce dissociable neural signatures, reflecting distinct forms of latent-state updating. Moreover, change-point environments should evoke transient, large neural responses to surprising events (e.g., pupil dilation, P3 amplitudes, and prefrontal and anterior cingulate cortex activity), whereas random-walk environments should elicit steadier, graded error-related responses. While prior work has investigated these neural signatures separately (Chakroun et al., 2020; Daw et al., 2006; Findling et al., 2019; Jepma et al., 2016; Kao et al., 2020; McGuire et al., 2014; Murphy et al., 2021; Nassar et al., 2012; O’Reilly et al., 2013), neural responses to change points versus random walks have not yet been compared directly.

Methods

Participants

Participants were recruited via Prolific, an online participant recruitment platform for academic research studies. The study was approved by the University of Oxford’s Central University Research Ethics Committee (CUREC Approval Reference: R96268/RE001). All participants provided informed consent prior to participation.

A total of 60 participants completed the study (30 in each of the two experiments), with a mean age of 35.0 years (SD = 11.9); fourteen identified as female and one did not report their gender. All participants reported normal or corrected-to-normal vision and fluency in English. All participants completed a comprehension quiz (repeating instructions if necessary until passing) and went on to complete all 15 blocks of the behavioral task. Participants received a base payment of £8.40 plus a performance-based bonus of up to £4.20. No participants were excluded from analyses.

Task and experimental design

Task instructions

Participants completed a behavioral task presented as a “capture-the-beams” space exploration game. Before starting, they received instructions explaining the task through a cover story, describing the task’s goal, mechanics, and underlying generative process. Their objective was to maximize points, which in the cover story corresponded to energy collected for their colony, awarded for capturing beams emitted from a star.

The key concepts of generative mean and variance were introduced using simple analogies (a flashlight emitting beams with a certain average direction and spread). The two variance levels were described as two distinct states of the star (a focused state vs. a diffuse state). Participants were then introduced to the paddle and its two possible sizes, along with the point system. They had the opportunity to practice moving the paddle and switching sizes.

The amount of points awarded per observed beam depended on capture outcome and paddle size (+10 points for capturing with the small paddle, +5 points for capturing with the large paddle, -10 points for a miss). This scoring scheme created a trade-off between the higher likelihood of catching a beam with the large paddle and the greater reward per beam captured with the small paddle. Normatively, this incentivized participants to use the large paddle when the generative variance was high and the small paddle when variance was low.
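This trade-off can be made concrete with a back-of-the-envelope expected-reward computation, using the paddle widths (60° and 120°) and variance levels (SD 10° and 30°) specified later in the Methods. This is an illustrative sketch only: it assumes the paddle is centered on the generative mean and approximates the wrapped Gaussian by an ordinary Gaussian (reasonable for these SDs), unlike the full model, which also accounts for uncertainty about the mean.

```python
from scipy.stats import norm

def expected_reward(paddle_width, sigma):
    """Expected points per beam for a paddle centered on the generative mean.

    Capture occurs when the beam lands within +/- paddle_width / 2 of the
    paddle center; scoring is +10 (small paddle catch), +5 (large paddle
    catch), -10 (miss). Wrapping is ignored (valid for small sigma).
    """
    p_capture = 2 * norm.cdf(paddle_width / 2, scale=sigma) - 1
    catch_points = 10 if paddle_width == 60 else 5
    return p_capture * catch_points + (1 - p_capture) * (-10)

# Low-variance state (SD = 10 deg): the small paddle is worth more.
small_low, large_low = expected_reward(60, 10), expected_reward(120, 10)
# High-variance state (SD = 30 deg): the large paddle is worth more.
small_high, large_high = expected_reward(60, 30), expected_reward(120, 30)
```

Under these assumptions the small paddle yields close to the full 10 points per beam at low variance, while at high variance its miss rate makes the large paddle the better choice, matching the normative incentive described above.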

The dynamics of the generative mean and variance were then explained. In Experiment 1, participants were told that the star’s average direction could change abruptly from time to time; in Experiment 2, they were told it changed gradually over time. In both experiments, they were told that the star’s state could change occasionally (switching between the focused, low-variance state and the diffuse, high-variance state). Participants were not informed about the frequency or magnitude of changes, nor about the precise variance levels.

At the end of the instructions, participants completed a short comprehension quiz that tested their understanding of how beam direction and spread changed and how paddle size influenced scoring. Those who did not pass the quiz reviewed the instructions until they succeeded, after which they proceeded to the task.

Task structure

Each participant completed 15 task blocks of 100 trials each. On each trial, participants could freely adjust the angular location of the paddle on a circular ring (0–360°) and select between the small paddle (60° wide) or the large paddle (120° wide). The paddle could be controlled using either the keyboard (left/right arrows to rotate, up/down arrows to change paddle size) or the mouse or touchpad (cursor movement for rotation, and scroll to change size). Once satisfied, participants confirmed their choice using the space bar or a mouse click.

After participants’ confirmation, the beam was revealed, and points were awarded according to whether it was captured and with which paddle size (as per the +10/+5/-10 scheme described above). Visual feedback showed whether the beam was captured or missed and how many points were gained or lost. The next trial then began.

At the start of each block, participants began with an initial score of 110 points. This starting score incentivized them to avoid losses from the outset, since points translated to a monetary bonus that could only be positive. At the end of each block, participants were shown their total score for that block and could take a break if they wished. Once they confirmed (by pressing the space bar), the next block began.

Participants were free to take as much time as they wished to make their choices. On average, they spent 2 min 30 s (± 4 s) per block in Experiment 1, and 2 min 34 s (± 5 s) per block in Experiment 2, corresponding to approximately 1.5 s per trial in both cases.

Generative process and parameters

The observation sequences (beam angles) presented in each block were generated using a stochastic process, identical across the two experiments except for the dynamics of the generative mean. On each trial, the current generative mean and variance were updated, and a new observation (beam angle) was drawn. The generative process is fully described as follows.

  • Observation. Each observation was drawn from a wrapped Gaussian distribution with the current generative mean and variance parameters.

  • Dynamics of the generative mean.

    • Experiment 1: The mean followed a change-point process, with probability p = 1/16 of a change point (a jump to a new value, sampled uniformly from 0–360°) occurring on each trial.

    • Experiment 2: The mean followed a wrapped Gaussian random walk with volatility (process noise variance) of 15².

  • Dynamics of the generative variance. The generative variance switched between two levels, low (10²) and high (30²), with probability p = 1/16 of a switch occurring on each trial.

  • Initialization. The generative mean was initially sampled uniformly from 0–360°, and the generative variance was initially set randomly to low or high with equal probability.

To optimize experimental efficiency, we imposed additional constraints on the sequences used in the experiment. Run lengths (intervals between change points) were constrained to 5–40 trials (avoiding excessively long or short stable periods) and change points in mean were required to elicit a change of at least 90° (avoiding excessively subtle change points). These constraints were not disclosed to participants or computational models.

Parameters such as paddle widths, variance levels, and reward values were selected based on simulations of Bayesian ideal observers. Simulations verified that the normative model would use both paddle sizes meaningfully (favoring the large paddle in high-variance periods and the small paddle in low-variance periods) and that change points in mean rarely caused the normative model to change its paddle size. This ensured that paddle size choices primarily reflected variance inference rather than mean uncertainty.
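For reference, one block of the generative process can be sketched as follows. This is a minimal sketch of the unconstrained process; the run-length and change-magnitude constraints described above, which were applied when generating the experimental sequences, are omitted.

```python
import numpy as np

def generate_block(n_trials=100, experiment=1, p_change=1/16,
                   volatility_sd=15, sd_levels=(10, 30), seed=0):
    """Simulate one block: change-point (experiment=1) or random-walk
    (experiment=2) mean dynamics, with variance switching between two levels.

    Returns arrays of generative means, generative SDs, and beam angles.
    """
    rng = np.random.default_rng(seed)
    mean = rng.uniform(0, 360)                  # initial mean: uniform
    sd = sd_levels[rng.integers(2)]             # initial variance: low or high
    means, sds, obs = [], [], []
    for _ in range(n_trials):
        if experiment == 1:
            if rng.random() < p_change:         # change point: resample mean
                mean = rng.uniform(0, 360)
        else:
            # Random walk: wrapped Gaussian drift of the mean
            mean = (mean + rng.normal(0, volatility_sd)) % 360
        if rng.random() < p_change:             # switch variance level
            sd = sd_levels[0] if sd == sd_levels[1] else sd_levels[1]
        means.append(mean)
        sds.append(sd)
        # Wrapped Gaussian observation around the current mean
        obs.append(rng.normal(mean, sd) % 360)
    return np.array(means), np.array(sds), np.array(obs)

# Example: simulate one change-point block (Experiment 1)
means, sds, obs = generate_block(n_trials=100, experiment=1, seed=0)
```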

Implementation

The task was implemented in JavaScript and ran in participants’ web browsers. Participants accessed the task through a dedicated URL. The task was hosted on Pavlovia servers, ensuring compliance with the EU General Data Protection Regulation (GDPR).

The full experiment code, which can be used to run and test the task, is available at our GitHub repository (see Code and data availability for access details).

Computational Models

We developed a family of Bayesian models. All models shared the same computational structure but differed in the assumptions they made about the generative process. In those assumptions, we varied two main factors of interest:

  • Mean dynamics: Whether the model assumed that the generative mean followed change-point dynamics or random-walk dynamics (the former are referred to as change-point models, the latter as random-walk models).

  • Variance inference: Whether the model assumed a fixed generative variance of a given level (models without variance inference) or a changing variance to be inferred from the observations (models with variance inference).

Additional generative-model structures and parameterizations can easily be explored using our online code (see Code and data availability for access).

Model computations

Each model performed three computations on every trial (Figure 2a):

  1. Inference: Updating the posterior over the latent state (corresponding to the assumed generative state) given the new observation.

  2. Prediction: Deriving the predictive distribution for the next beam’s angle.

  3. Response generation: Selecting the paddle location and width that maximizes expected reward under the predictive distribution, and updating the paddle to the selected choice probabilistically based on how far it was from the current one. This probabilistic rule captured the fact that participants did not always update their paddle, and were less likely to do so if the current choice was already “good enough” (close to the optimal one).

Mathematical Details

Inference

Posterior updates were computed using Bayesian filtering (Foucault, 2023; Särkkä & Svensson, 2023). This computation can be described with two equations:

𝑝(𝑠𝑡 | 𝑥1:𝑡) ∝ 𝑝(𝑥𝑡 | 𝑠𝑡) 𝑝(𝑠𝑡 | 𝑥1:𝑡−1)   (1)

𝑝(𝑠𝑡+1 | 𝑥1:𝑡) = ∫ 𝑝(𝑠𝑡+1 | 𝑠𝑡) 𝑝(𝑠𝑡 | 𝑥1:𝑡) d𝑠𝑡   (2)

where 𝑠𝑡 is the latent state at time step 𝑡 and 𝑥1:𝑡 is the sequence of observations up to time step 𝑡.

Equation (1) updates the posterior given the new observation, while Equation (2) propagates this posterior forward to predict the next latent state.

Note that the Bayesian filtering algorithm performs recursive updates: the posterior at each time step is computed using only the current observation and the posterior from the previous step. This allows the algorithm to perform online inference without storing the entire history of observations.
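As a concrete illustration of these recursive updates (a sketch under simplifying assumptions, not the exact implementation used in this study), a grid-based filter for change-point mean dynamics with a known observation SD can be written as:

```python
import numpy as np

def grid_filter_step(posterior, x, grid, p_change=1/16, sd=10):
    """One recursive Bayesian-filtering step on a discretized circle.

    posterior: belief over candidate means (one value per grid point,
    summing to 1) from the previous step. The step first propagates the
    belief through the change-point transition (with prob. p_change the
    mean jumps to a uniformly random value), then applies Bayes' rule
    with a Gaussian likelihood of the new observation x. Only the
    previous posterior and the current observation are needed.
    """
    # Transition: mixture of "no change" and a uniform redraw of the mean
    prior = (1 - p_change) * posterior + p_change / len(grid)
    # Likelihood of x under each candidate mean (nearest-wrap Gaussian,
    # a good approximation of the wrapped Gaussian for small sd)
    d = np.abs((x - grid + 180) % 360 - 180)    # circular distance
    lik = np.exp(-0.5 * (d / sd) ** 2)
    post = prior * lik
    return post / post.sum()                    # normalize

grid = np.arange(0, 360)                        # 1-degree grid on the circle
posterior = np.full(grid.size, 1 / grid.size)   # uniform initial belief
for x in [90.0, 92.0, 88.0]:                    # noisy beam angles near 90
    posterior = grid_filter_step(posterior, x, grid)
estimate = grid[np.argmax(posterior)]           # posterior mode near 90
```

Because each call uses only the previous posterior and the current observation, memory cost is constant in the number of trials, which is what makes online inference possible.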

Prediction

The predictive distribution over the next observation was computed as described by the following equation:

𝑝(𝑥𝑡+1 | 𝑥1:𝑡) = ∫ 𝑝(𝑥𝑡+1 | 𝑠𝑡+1) 𝑝(𝑠𝑡+1 | 𝑥1:𝑡) d𝑠𝑡+1

where 𝑝(𝑠𝑡+1 | 𝑥1:𝑡) corresponds to the posterior computed at the inference step above, and 𝑝(𝑥𝑡+1 | 𝑠𝑡+1) to the generative process’s observation distribution.

Response generation

The expected reward under the predictive distribution was computed for each possible choice of paddle location and size, and the one that maximized this expected reward was selected:

𝑦* = argmax𝑦 𝔼𝑥[𝑟(𝑥, 𝑦)]

where 𝔼𝑥 is the expectation taken over the next observation 𝑥 under the above predictive distribution, 𝑦 is a possible choice of paddle location and size, 𝑟(𝑥, 𝑦) is the reward function describing the task’s scoring system (see above), and 𝑦* is the selected choice.

The paddle was updated to the selected choice probabilistically. The probability (which we refer to as ‘response probability’) was computed as a function of the circular distance (in degrees) between the current and selected paddle locations. We used the cumulative distribution function (CDF) of a truncated normal distribution with fixed parameters (mean = 0°, SD = 8°) chosen to approximate the observed human behavior.
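This response rule can be sketched as follows. The mean (0°) and SD (8°) are as specified above; the truncation to non-negative distances is our assumption (distances are non-negative), so the CDF rises from 0 toward 1 as the circular distance grows.

```python
import numpy as np
from scipy.stats import truncnorm

def response_probability(distance_deg, sd=8.0):
    """Probability of updating the paddle to the selected choice, as a
    function of the circular distance (degrees) between the current and
    selected paddle locations.

    Sketch using a normal with mean 0 and SD 8 deg, truncated to
    [0, inf): the paddle is rarely moved when the selected location is
    already close to the current one.
    """
    # truncnorm takes standardized bounds: a = (lower - loc) / scale
    dist = truncnorm(a=0.0, b=np.inf, loc=0.0, scale=sd)
    return dist.cdf(distance_deg)
```

Under this rule the response probability is near 0 for distances close to 0°, about 0.68 at 8° (one SD), and essentially 1 beyond ~24°, capturing the "good enough" behavior described above.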

Model variants

The model set included the following models.

  • Change-point model with variance inference: The normative model for Experiment 1; it assumes the generative process is the one described above for Experiment 1.

  • Random-walk model with variance inference: The normative model for Experiment 2; it assumes the generative process is the one described above for Experiment 2 and also infers the volatility level (rather than knowing it in advance).

  • Change-point models without variance inference: Assume the same mean dynamics as the change-point model with variance inference, but a fixed generative variance. The fixed low variance model assumes that the generative variance is always low, the fixed high variance model that the generative variance is always high.

  • Random-walk models without variance inference: Assume the same mean dynamics as the random-walk model with variance inference, but a fixed generative variance, with fixed-low and fixed-high variants as for the change-point models.

Implementation

Models were implemented in Python using NumPy and SciPy. The modeling framework developed for this study will be made publicly available as a standalone, reusable repository designed to facilitate application to other tasks and environments beyond those examined here (see Code and data availability for details).

Data Analyses

Apparent learning rate

We quantified learning behavior using the apparent learning rate, a model-agnostic measure that has been widely used to characterize human learning (Foucault & Meyniel, 2024; McGuire et al., 2014; Nassar et al., 2010, 2012; Piray & Daw, 2024; Vaghi et al., 2017). The paddle location and observation values in each trial were used to define the trial-wise update (in paddle location), prediction error, and apparent learning rate. The update was calculated as the circular difference between the paddle location preceding and following the observation received at the given trial. The prediction error was calculated as the circular difference between the observation and the paddle location (i.e., prediction) chosen before receiving that observation. The apparent learning rate was then calculated as the update divided by the prediction error. As in previous studies (Foucault & Meyniel, 2024; Piray & Daw, 2024; Vaghi et al., 2017), to avoid outliers, the per-trial apparent learning rate was bounded between −0.6 and +1.3, following Foucault and Meyniel (2024).
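These trial-wise quantities can be computed as in the following minimal sketch, which uses signed circular differences and assumes a nonzero prediction error on the trial:

```python
import numpy as np

def circ_diff(a, b):
    """Signed circular difference a - b, mapped to (-180, 180] degrees."""
    return (a - b + 180) % 360 - 180

def apparent_learning_rate(loc_before, loc_after, observation,
                           bounds=(-0.6, 1.3)):
    """Model-agnostic apparent learning rate for one trial.

    update: circular difference between the paddle locations after vs.
    before the observation. prediction error: circular difference between
    the observation and the pre-observation paddle location (the
    prediction). The ratio is bounded to [-0.6, +1.3] to avoid outliers.
    Assumes the prediction error is nonzero.
    """
    update = circ_diff(loc_after, loc_before)
    prediction_error = circ_diff(observation, loc_before)
    return float(np.clip(update / prediction_error, *bounds))

# Example: paddle at 100 deg, beam at 140 deg, paddle moved to 130 deg
lr = apparent_learning_rate(100.0, 130.0, 140.0)  # update 30 / PE 40 = 0.75
```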

Analysis procedure

All our behavioral analyses followed a two-step procedure: first computing subject-level measures, then aggregating those measures across participants for the group-level analysis. The same procedure was applied to model simulations: models were run per participant using the exact observation sequences shown in the experiment, enabling the same two-step analysis.

At the subject level, participant measures were computed as follows. In Figures 3a, 3b, 4, and the corresponding Supplementary Figures, the trial-wise apparent learning rates were aggregated and averaged per bin of absolute prediction error and per variance level; in Figures 3c and 3d they were averaged per variance level. (Note that the variance level corresponds to the true generative variance, which could differ from the variance level currently inferred by the participant or model at the given trial.) In Figure 5, paddle size choices (coded 0 for small, 1 for large) were averaged per variance level.

Event-related analyses (Figure 6 and related Supplementary Figures) were conducted by epoching and averaging the data (paddle size choices, apparent learning rates) in time windows around each event type (change point in mean or variance). For the paddle-width-increase duration analysis (reported in the main text, related to Figure 6), we computed a duration per subject and event type. The duration was defined as the number of consecutive observations, starting at the post-event peak of the subject’s paddle-width curve, during which the curve remained above the value halfway between its pre-event baseline (the mean at the two pre-event lags) and its post-event peak (the maximum after the event). A duration of 1 indicates that the curve fell below this halfway value immediately after the peak.
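This duration computation can be sketched as follows, assuming an event-locked curve whose first two entries are the pre-event lags (the epoch layout is our assumption here):

```python
import numpy as np

def adjustment_duration(curve, n_pre=2):
    """Duration of an event-triggered paddle-width adjustment.

    curve: event-locked paddle-width (or large-paddle-proportion) values,
    with the first n_pre entries preceding the event. Counts consecutive
    observations, starting at the post-event peak, during which the curve
    stays above the value halfway between the pre-event baseline and the
    post-event peak. A return value of 1 means the curve fell below this
    halfway value immediately after the peak.
    """
    baseline = np.mean(curve[:n_pre])            # mean of the pre-event lags
    post = np.asarray(curve[n_pre:], dtype=float)
    peak_idx = int(np.argmax(post))              # post-event peak
    halfway = (baseline + post[peak_idx]) / 2
    duration = 0
    for value in post[peak_idx:]:
        if value <= halfway:
            break
        duration += 1
    return duration
```

For example, a sustained variance-triggered adjustment yields a long duration, whereas a transient spike after a mean change point yields a duration of 1.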

At the group level, all statistical tests we conducted were two-sided unless specified otherwise. Details of each test are given where the results are reported.

Implementation

All analyses were implemented in Python using standard scientific libraries (NumPy, SciPy, pandas, Matplotlib). See the Code and data availability section for code access details.

Data availability

The complete code and data supporting the findings of this study are available at the following GitHub repository: https://github.com/cedricfoucault/infer-pred_early-access. This repository includes everything required to reproduce all results reported in the manuscript, and the accompanying README provides step-by-step instructions for doing so. Upon publication, all code and anonymized behavioral data will be made publicly available on GitHub; an early-access version of this repository is provided here for peer review. In addition, we will make the Bayesian modeling framework developed for this study publicly available as a standalone repository, accompanied by documentation and example notebooks demonstrating its use across different tasks and environments to facilitate reuse within the wider research community.

Model predictions without the response-probability mechanism.

Analysis details as in Figure 3, except that the analysis was performed on modified versions of the change-point model (panels a, c) and of the random-walk model (panels b, d) in which the response-probability mechanism was removed (i.e., response probability set to 1).

In change-point environments, as in the normative change-point model, human learning rates selectively peak in response to mean but not variance changes.

Apparent learning rate of human participants (left), the change-point model (top right), and the random-walk model (bottom right) following different types of latent changes in Experiment 1 (change point in mean, increase in variance, or decrease in variance). Both human participants and the normative change-point model show a large transient increase in apparent learning rate at the first observation following a change point in mean, but not following a change in variance. Shaded areas denote ± s.e.m. across participants.

Proportion of trials in which the paddle was moved as a function of absolute prediction error.

Panel a corresponds to Experiment 1 (change-point environment) and panel b to Experiment 2 (random-walk environment). Main plots show human behavior and insets show model predictions for comparison. Shaded areas indicate ± s.e.m. across participants. Participants were more likely to move the paddle in response to small prediction errors in the random-walk environment than in the change-point environment, consistent with the predictions of the environment-matched normative model.

Apparent learning rate conditional on paddle movement (i.e. excluding no-movement trials) as a function of absolute prediction error.

Details as in Figure 4, except that trials in which the agent (human participant or model) did not move the paddle—i.e., where the apparent learning rate was 0—were excluded. Beyond differences in response probability, when participants chose to move the paddle, they made larger updates in response to large prediction errors in the change-point environment than in the random-walk environment, consistent with the predictions of the environment-matched normative model.

Human paddle-width adjustments in the random-walk environment (Experiment 2).

Same analyses as in Figure 5, but using the data collected in the random-walk environment. As in the change-point environment (Experiment 1), participants significantly increased their use of the large paddle when the generative variance was high compared to when it was low (p < 10⁻⁷), consistent with the predictions of models that infer the current variance level.

Acknowledgements

We thank Amy X. Li for her assistance with the ethics approval process. Data collection for this study was funded by a Wellcome/Royal Society Sir Henry Dale Fellowship (208789/Z/17/Z) awarded to L.H., who was also supported by a strategic Longer and Larger award from BBSRC (BB/W003392/1). C.F. was supported by a Junior Research Fellowship from Christ Church, University of Oxford.

Additional information

Funding

Wellcome Trust (WT)

https://doi.org/10.35802/208789

  • Laurence T Hunt

UKRI | Biotechnology and Biological Sciences Research Council (AFRC) (BB/W003392/1)

  • Laurence T Hunt