Capture-the-beams prediction task.

(a) Task structure. On each trial, participants positioned a paddle on a circular ring and selected its width (small or large) to predict the angular location of the next beam. After adjusting the paddle, they observed the next beam and received feedback. Capturing a beam with the small paddle yielded more points (+10) than with the large paddle (+5), but carried a higher risk of misses, which incurred a loss (–10). Beam locations were generated from a latent mean and variance that both changed over time. In Experiment 1, the generative mean changed abruptly at unpredictable change points, whereas in Experiment 2, it changed gradually and continuously from trial to trial according to a random walk. In both experiments, the generative variance alternated probabilistically between low- and high-variance states. Participants: n = 30 in each of the two experiments. (b, c) Example sequences from Experiment 1 (change-point environment) and Experiment 2 (random-walk environment), each shown with an example participant whose paddle locations and widths (orange) tracked the generative mean and variance (purple), respectively.
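The generative process and payoff rule described in (a) can be sketched as follows. This is only an illustration under stated assumptions: the hazard rates, random-walk step size, noise standard deviations, and paddle widths are hypothetical placeholders that the caption does not specify, and the sketch assumes that a miss costs 10 points regardless of paddle width.

```python
import numpy as np


def circ_diff(a, b):
    """Signed angular difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def simulate_beams(n_trials, dynamics, rng,
                   mean_hazard=0.1, var_hazard=0.05,
                   walk_sd=5.0, noise_sds=(10.0, 25.0)):
    """Simulate beam angles in degrees.

    All numeric defaults are illustrative guesses, not the experimental values.
    """
    mean = rng.uniform(0.0, 360.0)
    state = 0  # 0 = low-variance state, 1 = high-variance state
    means, states, beams = [], [], []
    for _ in range(n_trials):
        if dynamics == "change_point":
            # Experiment 1: abrupt jump of the mean at unpredictable times.
            if rng.random() < mean_hazard:
                mean = rng.uniform(0.0, 360.0)
        elif dynamics == "random_walk":
            # Experiment 2: small drift of the mean on every trial.
            mean = (mean + rng.normal(0.0, walk_sd)) % 360.0
        else:
            raise ValueError(f"unknown dynamics: {dynamics!r}")
        # In both experiments the variance state switches probabilistically.
        if rng.random() < var_hazard:
            state = 1 - state
        beam = (mean + rng.normal(0.0, noise_sds[state])) % 360.0
        means.append(mean)
        states.append(state)
        beams.append(beam)
    return np.array(means), np.array(states), np.array(beams)


def score(paddle_center, paddle_width, beam, large):
    """Points for one trial: +5 (large) or +10 (small) on capture, -10 on a miss."""
    hit = abs(circ_diff(beam, paddle_center)) <= paddle_width / 2.0
    if not hit:
        return -10
    return 5 if large else 10
```

For example, `simulate_beams(200, "random_walk", np.random.default_rng(0))` returns the latent means, variance states, and beam angles for a 200-trial sequence.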

Modeling framework.

(a) Structure of the models. On each trial, models update their beliefs via Bayesian inference over the assumed generative process, compute the predictive distribution for the next beam location, determine the response (paddle location and width) that maximizes expected reward under this distribution, and generate it probabilistically according to the associated response probability. (b) Model family. Models were defined by their assumptions about the generative process of beam locations. We systematically varied assumptions about the dynamics of the latent mean (change-point vs. random-walk) and about whether the variance was fixed or changing (and thus inferred as a latent variable). (c) Example inference of the change-point model in the change-point environment. The posterior distribution (heat map) accurately follows the true generative mean (top) and variance (bottom). For reference, we overlay the posterior mean and mode; both align closely with the ground truth.
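The reward-maximization step in (a) can be illustrated with a minimal sketch. It assumes a Gaussian predictive distribution whose spread is small relative to the ring (so wrap-around is ignored), a paddle centered on the predictive mean, and hypothetical paddle half-widths of 10° and 30°. None of these values come from the caption, and the response-probability step is omitted.

```python
import math


def capture_prob(half_width, pred_sd):
    """P(|beam - paddle center| <= half_width) for a Gaussian predictive
    distribution centered on the paddle, ignoring circular wrap-around."""
    return math.erf(half_width / (pred_sd * math.sqrt(2.0)))


def expected_reward(half_width, gain, pred_sd, loss=10.0):
    """Expected points: gain on capture, -loss on a miss."""
    p = capture_prob(half_width, pred_sd)
    return p * gain - (1.0 - p) * loss


def best_width(pred_sd, small_half=10.0, large_half=30.0):
    """Return the paddle width with the higher expected reward.

    The half-widths (in degrees) are hypothetical placeholders.
    """
    er_small = expected_reward(small_half, 10.0, pred_sd)
    er_large = expected_reward(large_half, 5.0, pred_sd)
    return "small" if er_small >= er_large else "large"
```

Under these assumptions, a narrow predictive distribution favors the small paddle (higher payoff, little added risk), whereas a broad one favors the large paddle, which is the qualitative link between inferred variance and paddle width.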

Normative model predictions for the effects of environmental dynamics.

Environmental dynamics (change points vs. random walk) should fundamentally alter how learning rates adjust to prediction errors. (a,b) Trial-wise apparent learning rate of the normative models (y-axis) as a function of absolute prediction error (x-axis), shown separately for trials when the generative variance was low (dark) or high (light), under change-point dynamics (a) and random-walk dynamics (b). Each panel includes, as an inset, the corresponding distribution of absolute prediction errors for low- and high-variance trials, illustrating that higher stochasticity produces larger errors. The change-point model predicts a sharp increase in learning rate for large errors, reflecting inference about change points in the latent mean, whereas the random-walk model exhibits relatively stable learning rates across error magnitudes (beyond very small ones). Both models show a reduction in apparent learning rate for very small errors (≤10°) due to a lower response probability. (c,d) Average apparent learning rate in low- versus high-variance trials for the change-point (c) and random-walk (d) environments. Contrary to the simple expectation that higher stochasticity should reduce learning rates, the models reveal two opposing effects. Holding prediction errors constant, higher stochasticity decreases learning rates (a,b); however, because higher stochasticity also generates larger errors (insets), it indirectly elicits higher learning rates—resulting here in a net positive effect of stochasticity on learning rates (c, d). Simulation results were obtained using the experimental sequences from the change-point environment (Experiment 1) for the change-point model (panels a,c), and from the random-walk environment (Experiment 2) for the random-walk model (panels b,d).
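As a rough illustration of the quantity plotted in panels a and b, the sketch below computes a trial-wise apparent learning rate. It assumes the common definition (paddle update divided by the preceding prediction error, both taken as signed circular differences in degrees), which the caption does not state explicitly. The function and variable names are hypothetical.

```python
import numpy as np


def circ_diff(a, b):
    """Signed angular difference a - b in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def apparent_learning_rate(paddle, beams):
    """Return (absolute prediction error, apparent learning rate) per trial.

    paddle[t] is the prediction made before beams[t] is observed. The learning
    rate on trial t is (paddle[t+1] - paddle[t]) / (beams[t] - paddle[t]);
    it is NaN when the prediction error is exactly zero.
    """
    paddle = np.asarray(paddle, dtype=float)
    beams = np.asarray(beams, dtype=float)
    pred_error = circ_diff(beams[:-1], paddle[:-1])
    update = circ_diff(paddle[1:], paddle[:-1])
    lr = np.full_like(pred_error, np.nan)
    np.divide(update, pred_error, out=lr, where=pred_error != 0)
    return np.abs(pred_error), lr
```

With this definition, a trial on which the paddle is not moved has an apparent learning rate of 0, and a trial on which the paddle jumps to the last beam has a learning rate of 1.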

Human learning-rate adaptation reflects environmental dynamics, consistent with normative model predictions.

(a,b) Participants’ apparent learning rate (y-axis) as a function of absolute prediction error (x-axis), shown separately for trials with low (blue) and high (orange) generative variance. Shaded areas indicate ± s.e.m. across participants. Panel a corresponds to Experiment 1 (change-point environment) and panel b to Experiment 2 (random-walk environment). Insets show model simulation results within each environment for comparison: human behavior aligns more closely with the normative model matched to the true environmental dynamics (green check mark) than with the mismatched model. In the change-point environment (a), the change-point model captures the sharp increase in learning rates for large errors, whereas the random-walk model fails to reproduce this pattern. In the random-walk environment (b), the random-walk model captures the relatively stable learning rates across error magnitudes, while the change-point model does not. Across both environments, learning rates were reduced at very small prediction errors, and for equal error magnitudes, were lower under high variance than low variance, consistent with model predictions.

Human paddle-width adjustments reflect their inference of the current generative variance.

(a–c) Mean proportion of trials in which the large paddle was selected when the generative variance was low versus high, in Experiment 1 for models (a,b) and humans (c). (a) Models without variance inference, which assume a fixed low or high variance, do not flexibly adjust paddle width, selecting the paddle size that matches their assumption rather than the true variance level. (b) Models with variance inference, which assume that the generative variance switches probabilistically, predict adaptive paddle use, with greater use of the large paddle under high variance and of the small paddle under low variance. (c) Human behavior shows the same pattern: participants significantly increased their use of the large paddle when variance was high compared to when it was low (p < 10⁻¹¹). Bars show group means (error bars ± s.e.m.); grey dots indicate individual participants, with lines connecting paired values from the same participant. For equivalent analyses of Experiment 2, see Supplementary Figure 5.

Disambiguating mean and variance changes: In change-point environments, humans distinguish change points in mean from change points in variance, consistent with the normative change-point model that jointly infers mean and variance.

(a–d) Proportion of large-paddle choices in Experiment 1 following the different types of latent changes (change point in mean, increase in variance, or decrease in variance). (a) The change-point model without variance inference fails to adjust its paddle width after variance changes. (b) The random-walk model with variance inference correctly adjusts its paddle width after variance increases and decreases, but conflates mean change points with variance increases: for 7 trials, mean change points elicit an even stronger switch to the large paddle than variance increases do. (c,d) The normative change-point model and humans both correctly dissociate variance changes from mean change points. While mean change points can trigger a transient increase in large-paddle use for 1 to 3 trials, only variance changes produce a sustained switch in paddle choice. Shaded areas denote ± s.e.m. across participants.

Model predictions without the response-probability mechanism.

Analysis details as in Figure 3, except that the analysis was performed on modified versions of the change-point model (panels a, c) and of the random-walk model (panels b, d) in which the response-probability mechanism was removed (i.e., response probability set to 1).

In change-point environments, as in the normative change-point model, human learning rates selectively peak in response to mean but not variance changes.

Apparent learning rate of human participants (left), the change-point model (top right), and the random-walk model (bottom right) following different types of latent changes in Experiment 1 (change point in mean, increase in variance, or decrease in variance). Both human participants and the normative change-point model show a large transient increase in apparent learning rate at the first observation following a change point in mean, but not following a change in variance. Shaded areas denote ± s.e.m. across participants.

Proportion of trials in which the paddle was moved as a function of absolute prediction error.

Panel a corresponds to Experiment 1 (change-point environment) and panel b to Experiment 2 (random-walk environment). Main plots show human behavior and insets show model predictions for comparison. Shaded areas indicate ± s.e.m. across participants. Participants were more likely to move the paddle in response to small prediction errors in the random-walk environment than in the change-point environment, consistent with the predictions of the environment-matched normative model.

Apparent learning rate conditional on paddle movement (i.e., excluding no-movement trials) as a function of absolute prediction error.

Details as in Figure 4, except that trials in which the agent (human participant or model) did not move the paddle—i.e., where the apparent learning rate was 0—were excluded. Beyond differences in response probability, when participants chose to move the paddle, they made larger updates in response to large prediction errors in the change-point environment than in the random-walk environment, consistent with the predictions of the environment-matched normative model.
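The exclusion of no-movement trials can be sketched as a binning step over absolute prediction error. The bin edges and function name below are hypothetical, and the inputs are assumed to be per-trial arrays of absolute prediction error and apparent learning rate. This is an illustration of the idea, not the analysis code behind the figure.

```python
import numpy as np


def binned_learning_rate(abs_pe, lr, edges, exclude_no_move=False):
    """Mean apparent learning rate per absolute-prediction-error bin.

    Bins are [edges[i], edges[i+1]). If exclude_no_move is True, trials with
    a learning rate of exactly 0 (paddle not moved) are dropped first.
    Empty bins are returned as NaN.
    """
    abs_pe = np.asarray(abs_pe, dtype=float)
    lr = np.asarray(lr, dtype=float)
    keep = np.isfinite(lr)
    if exclude_no_move:
        keep &= (lr != 0)
    bin_idx = np.digitize(abs_pe[keep], edges) - 1
    kept_lr = lr[keep]
    out = np.full(len(edges) - 1, np.nan)
    for b in range(len(edges) - 1):
        vals = kept_lr[bin_idx == b]
        if vals.size:
            out[b] = vals.mean()
    return out
```

Comparing the output with and without `exclude_no_move=True` separates how often the paddle is moved from how far it is moved when it is moved.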

Human paddle-width adjustments in the random-walk environment (Experiment 2).

Same analyses as in Figure 5, but using the data collected in the random-walk environment. As in the change-point environment (Experiment 1), participants significantly increased their use of the large paddle when the generative variance was high compared to when it was low (p < 10⁻⁷), consistent with the predictions of models that infer the current variance level.