Environmental dynamics shape human learning: change points versus random walks

  1. Department of Experimental Psychology, University of Oxford, Oxford, United Kingdom
  2. Centre for Integrative Neuroimaging, University of Oxford, Oxford, United Kingdom
  3. Department of Psychiatry, University of Oxford, Oxford, United Kingdom

Peer review process

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.


Editors

  • Reviewing Editor
    Tobias Donner
    University Medical Center Hamburg-Eppendorf, Hamburg, Germany
  • Senior Editor
    Michael Frank
    Brown University, Providence, United States of America

Reviewer #1 (Public review):

Foucault and colleagues examine how people's belief updating in a predictive inference task depends on qualitative differences in generative structure, in particular focusing on two generative structures frequently employed in learning and belief updating tasks (change points and random walks). While behavior and normative predictions for these structures have been explored many times in different tasks and settings, these exact structures have, to the best of my knowledge, never been explored in the same study and modeling framework for direct comparison. The authors use ideal observer models coupled with a response bias module to predict what structure-appropriate adaptive learning would look like across the two conditions, and then run an experiment to test these behavioral predictions under different levels of stochasticity. The authors present evidence that stochasticity shapes learning for two qualitatively different reasons and that, depending on which of these factors dominates, it can have different effects on learning. They show that human participants exhibited qualitative trends consistent with adjusting both their structural assumptions about the task and their assessments of its stochasticity in order to guide learning.

The experiment was well designed and executed, and the paper was well written. The findings from the study are largely consistent with other work in the field, but there are a few advances that go beyond previously established findings, most notably a nuanced examination of how stochasticity affects learning behavior, which may explain a notable discrepancy in the field (Pulcu and Browning, 2025; Piray and Daw, 2024). The paper has notable strengths in its use of computational models to generate qualitative predictions that are evaluated in empirical behavioral data.

The current paper has a few weaknesses. It makes strong claims regarding the impacts of stochasticity on optimal learning that were difficult to evaluate, given a lack of clarity about the exact modeling that was implemented, and that are incompletely supported by the existing analyses. The paper also lacks statistical support for some of its claims and evaluates models only through their ability to reproduce summary measures, rather than through direct model fitting.

Reviewer #2 (Public review):

Summary:

The manuscript by Foucault, Weber, and Hunt examines human learning behavior across change-point and continuously changing environments. The authors suggest that humans normatively adjust their learning dynamics to the current environmental dynamics. Moreover, they argue that humans not only track the means of the outcome-generating process, but also the variance, which extends recent work in this domain. The present results suggest that human learners are well able to distinguish the two moments and adjust their behavior accordingly.

Strengths:

(1) The paper is clearly written, and the figures demonstrate the results well. The authors clearly explain the two key results and their implications for the field.

(2) The paper uses a common modeling framework for the two environments. This makes it less likely that differences in learning behavior between the two environments are driven by general model properties rather than the specific learning mechanisms.

Weaknesses:

(1) Interpretation in terms of normative learning

(1.1) Perseveration and paddle movement

The model presented in the main manuscript is equipped with a response-probability mechanism that controls whether the paddle is updated. Especially for smaller prediction errors, the paddle is often not updated (perseveration). I wonder whether this mechanism truly reflects normative updating behavior or rather a heuristic strategy. Not moving the paddle is non-normative: a fully Bayesian model would hardly ever show a learning rate of exactly zero (one could argue only when the error is itself zero, or after a massive number of trials). This is partly apparent in Supplementary Figure 1, where the lowest learning rates are around alpha = 0.2 (change-point environment) and 0.5 (random walk).

Supplementary Figure 1 shows the learning rate for the normative model without the response-probability mechanism. Primarily in the random-walk environment, but to some extent also in the change-point condition, the shape of the learning rate changes quite dramatically compared to Figure 4. In the random-walk environment, the learning rate appears relatively stable, with a value slightly larger than 0.5. In the change-point case, the learning rate is somewhat higher in the range of smaller prediction errors. Doesn't this speak against the interpretation that the model in the main manuscript is really behaving in a purely normative fashion? The tendency to perseverate might reflect a simplified strategy, which is sometimes described as "satisficing". That is, in line with the authors' description of the mechanism, perseveration occurs when it seems "good enough" (Simon, 1956), which has been demonstrated in a belief updating context before (Bruckner et al., 2025; Gershman, 2020; Nassar et al., 2021).

Supplementary Figure 3 suggests that humans show quite a lot of this type of behavior. It indicates that in the change-point condition, participants update their prediction in only 20% of the trials in the smallest prediction-error range (i.e., in 80% of these trials, they perseverate on the previous prediction). This update probability increases as a function of the prediction error. In the random-walk condition, update probabilities are higher, starting at around 40% and also increasing as a function of the error.

Indeed, Supplementary Figure 4 suggests that the shape of the learning rate for true update trials is much shallower for humans and the "perseverative" model compared to the model in Supplementary Figure 1. This suggests that the curve in Figure 4 (main manuscript), hinting at a continuous increase in the learning rate, could be the result of a mixture of perseveration (alpha = 0) and higher learning rates compared to the normative model without the response-probability mechanism.

(1.2) Control models

One might reply that the response-probability mechanism just adds noise, while the actual learning mechanism is still normative. However, a standard Rescorla-Wagner model with the same response-probability mechanism might also show increasing apparent learning rates as a function of prediction error (when perseveration trials and regular update trials are averaged as a function of the prediction error).

Therefore, I suggest adding a control analysis with a Rescorla-Wagner model. One version with the same response mechanism yielding perseveration, and one standard Rescorla-Wagner model without this mechanism. This should help identify how well the present analyses can distinguish true learning-rate dynamics from averaging artifacts due to perseveration.
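The suggested control model could be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the logistic update gate, its `slope` and `offset` parameters, and the fixed learning rate are all illustrative assumptions chosen to show how perseveration can produce apparent learning rates that rise with prediction-error magnitude even when the underlying learning rate is constant.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_rw(outcomes, alpha=0.5, perseverate=True, slope=0.05, offset=20.0):
    """Rescorla-Wagner learner with an optional response-probability gate:
    on each trial the paddle is updated only with a probability that grows
    with the absolute prediction error (small errors -> perseveration)."""
    prediction = float(outcomes[0])
    predictions = []
    for outcome in outcomes:
        predictions.append(prediction)
        error = outcome - prediction
        # Logistic gate on |error|: small errors are often left unanswered.
        p_update = 1.0 / (1.0 + np.exp(-slope * (abs(error) - offset)))
        if not perseverate or rng.random() < p_update:
            prediction += alpha * error  # standard delta-rule update
    return np.array(predictions)
```

Binning the trial-wise ratio update/error by |error| for this model, with perseveration trials (update = 0) included in the average, would show whether the increasing apparent learning rate in Figure 4 can arise from a fixed-alpha learner plus a response gate, which is exactly the averaging artifact the control analysis is meant to rule out.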

(1.3) Discussion of the possibility of non-normative learning mechanisms

Given the considerations above, I suggest a more balanced discussion of potential non-normative influences on learning, in particular, perseveration. Several previous papers have similarly shown that perseveration prominently characterizes human learning and decision-making (Bruckner et al., 2025; Gershman, 2020; Nassar et al., 2021), and in my opinion, it would be relevant to discuss how normative and non-normative mechanisms might jointly shape learning.

(2) Model description

The Bayesian model is quite central to the paper. However, the mathematical details are sparse, and I did not fully understand the differences between the model variants and how they were implemented. In particular, what approximations were used to make the model tractable? And how does the variance inference work? Is the learning rate directly computed, similar to the Nassar model, or is it derived from updates and prediction errors?

(3) Apparent learning rates in humans

The main learning-rate analyses compute the ratio of updates to prediction errors. For quality assurance, it would be useful to see a few supplementary histograms of the apparent learning rates. It would be great to have one plot across all participants and a few example plots for single participants. These analyses will reveal the distribution of learning rates and the proportion at the boundaries, which can sometimes be a source of bias.

References:

Bruckner, R., Nassar, M. R., Li, S.-C., & Eppinger, B. (2025). Differences in learning across the lifespan emerge via resource-rational computations. Psychological Review, 132(3), 556-580. https://doi.org/10.1037/rev0000526.

Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394. https://doi.org/10.1016/j.cognition.2020.104394.

Nassar, M. R., Waltz, J. A., Albrecht, M. A., Gold, J. M., & Frank, M. J. (2021). All or nothing belief updating in patients with schizophrenia reduces precision and flexibility of beliefs. Brain, 144(3), 1013-1029. https://doi.org/10.1093/brain/awaa453.

Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129-138. https://doi.org/10.1037/h0042769.

Reviewer #3 (Public review):

Summary:

This paper uses a single Bayesian modelling framework to derive specific predictions for inference under either a change-point structure or a gradually changing structure across tasks.

Strengths:

The paper nicely summarizes the slightly different subliteratures that have studied human behavior with models that only assume a single underlying task structure. The diagnostic predictions from the models are presented clearly, and the human data are nicely consistent with the model predictions.

As the authors discuss themselves, this work opens the door to many questions about how people learn or infer (from experience or verbal instructions) which meta-model is most appropriate to use.

Weaknesses:

Alignment between models and human behavior is mostly qualitative; the models are not fit to individual data (which could, for instance, uncover interesting differences between individuals).

There is no consideration of the possibility that individuals may not fully use one or the other meta-model (gradual change vs. change points), but instead a hybrid of the two. Fitting the models to data may help uncover whether some people (e.g., the 10% in Experiment 2 that were best matched by the CP model?) use a slightly different mix of strategies than the one suggested by the verbal instructions they received (which may explain the pattern in Figure 6d, which looks to feature both models).

Author response:

We thank the reviewers for their constructive feedback and careful evaluation of our manuscript. We are encouraged that the study was viewed as well designed and clearly presented, that its computational modeling approach was recognized as a strength, and that the key findings were appreciated. We agree that some claims would benefit from additional support and clarification. Below, we outline the main revisions we will undertake to address the points raised in the reviews; these revisions are intended to strengthen the evidential support for our conclusions and to clarify aspects of the results and modeling.

(1) Statistical support.

Some claims were judged to lack sufficient statistical support [Reviewer 1]. In the revised manuscript, we will carefully review all inferential claims and ensure that they are supported by appropriate statistical analyses. Where necessary, we will implement additional statistical tests and expand statistical reporting to ensure that differences between conditions, models, or behavioral measures are formally evaluated and that key aspects of the data are appropriately described.

(2) Modeling clarification.

Some aspects of the modeling were considered insufficiently clear, particularly regarding how the models were implemented [Reviewers 1 and 2]. We will expand the Methods section to provide a clearer and more complete description of the Bayesian models and their implementation. In particular, we will clarify that full probability distributions were computed (without reduced approximations such as those used in simplified Bayesian variants), and that the only approximation concerns numerical discretization of continuous state spaces at fine resolution. We will clarify that variance is part of the joint multidimensional state space and is inferred jointly with the mean. We will also explicitly state that apparent learning rates are derived from predicted paddle responses in the same way as for participants, and are not directly computed within the Bayesian inference process.
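The clarified implementation, i.e., full Bayesian filtering over a discretized joint (mean, variance) state space, could look roughly like the following sketch. This is a hypothetical illustration, not the authors' code: the grid resolutions, the hazard rate, and the uniform redraw of the mean at change points are all illustrative assumptions, and keeping the outcome SD fixed across the transition is a simplification.

```python
import numpy as np

# Discretized joint state space: candidate means and candidate outcome SDs.
means = np.linspace(0.0, 300.0, 301)
sds = np.linspace(2.0, 40.0, 20)
M, S = np.meshgrid(means, sds, indexing="ij")  # shape (n_means, n_sds)

def update_posterior(log_post, outcome, hazard=0.1):
    """One step of grid-based Bayesian filtering for a change-point
    environment: with probability `hazard` the mean is redrawn uniformly,
    otherwise it stays put; the outcome SD persists across the transition.
    Inference over (mean, sd) is joint, since the likelihood depends on both."""
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # Transition prior: mixture of 'no change' and 'change' (uniform over means).
    prior = (1 - hazard) * post + hazard * post.sum(axis=0, keepdims=True) / len(means)
    # Gaussian log-likelihood of the outcome under each (mean, sd) state.
    loglik = -0.5 * ((outcome - M) / S) ** 2 - np.log(S)
    return np.log(prior + 1e-300) + loglik

log_post = np.zeros_like(M)  # flat prior over the joint grid
for y in [150.0, 152.0, 148.0]:
    log_post = update_posterior(log_post, y)
post = np.exp(log_post - log_post.max())
post /= post.sum()
estimate = (post * M).sum()  # posterior mean over candidate means
```

The only approximation here is the numerical discretization itself; the apparent learning rate would then be read out from the model's predicted paddle responses (via the response module), not computed inside the inference, matching the clarification above.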

(3) Model fitting.

The absence of direct model fitting to individual participants was identified as a limitation [Reviewers 1 and 3]. In response, we will implement individual-level model fitting (to the extent feasible in practice) and conduct formal model comparison based on the fitted models. We will further validate the fitted models by examining whether they reproduce the main behavioral signatures observed in the data.

(4) Normative interpretation and control analyses.

The interpretation of the models as normative was questioned in light of the response-probability mechanism [Reviewer 2]. In the revision, we will clarify the distinction between the normative inference component of the model and the response-level mechanism. We will revise the framing of the results accordingly and ensure that normative claims are restricted to the inference component. We will also expand the discussion to integrate relevant literature on perseveration and satisficing, and clarify how normative and non-normative mechanisms may jointly shape behavior. In addition, following the reviewer’s suggestion, we will include control analyses using standard Rescorla–Wagner models, with and without the response-probability mechanism, to evaluate whether the observed signatures can be accounted for by simpler learning rules.

(5) Additional points.

We will also address the additional points raised in the reviews. Specifically, we will include supplementary histograms of apparent learning rates [Reviewer 2]. We will provide additional clarification and analyses regarding the effects of stochasticity on learning [Reviewer 1]. Finally, we will explore hybrid or mixture models and strategies and expand the discussion of this possibility [Reviewer 3].

We believe that these revisions will substantially strengthen the support for our claims and address the concerns raised in the current assessment. We are grateful for the reviewers’ engagement with our work and for their comments, which will allow us to significantly improve the clarity and strength of the manuscript.
