Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, public reviews, and a provisional response from the authors.
Read more about eLife’s peer review process.
Editors
- Reviewing Editor: Tobias Donner, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- Senior Editor: Michael Frank, Brown University, Providence, United States of America
Reviewer #1 (Public review):
Foucault and colleagues examine how people's belief updating in a predictive inference task depends on qualitative differences in generative structure, focusing in particular on two generative structures frequently employed in learning and belief-updating tasks (changepoints and random walks). While behavior and normative predictions for these structures have been explored many times in different tasks and settings, these exact structures have, to the best of my knowledge, never been explored in the same study and modeling framework for direct comparison. The authors use ideal observer models coupled with a response bias module to predict what structure-appropriate adaptive learning would look like across the two conditions, and then ran an experiment to test behavioral predictions for the two structures under different levels of stochasticity. The authors present evidence that stochasticity shapes learning for two qualitatively different reasons and that, depending on which of these factors dominates, it can have different effects on learning. They show that human participants exhibited qualitative trends consistent with adjusting both their structural assumptions about the task and their assessments of stochasticity to guide learning.
The experiment was well designed and executed, and the paper was well written. The findings from the study are largely consistent with other work in the field, but there are a few advances that go beyond previously established findings, most notably a nuanced examination of how stochasticity affects learning behavior, which has the potential to explain a notable discrepancy in the field (Pulcu and Browning 2025; Piray and Daw 2024). The paper has notable strengths in its use of computational models to generate qualitative predictions that are evaluated in empirical behavioral data.
The current paper has a few weaknesses. It makes strong claims regarding the impacts of stochasticity on optimal learning that were difficult to evaluate given a lack of clarity about the exact modeling that was implemented, and that are incompletely supported by the existing analyses. The paper also lacks statistical support for some of its claims and evaluates models only through their ability to reproduce summary measures, rather than through direct model fitting.
Reviewer #2 (Public review):
Summary:
The manuscript by Foucault, Weber, and Hunt examines human learning behavior across change-point and continuously changing environments. The authors suggest that humans normatively adjust their learning dynamics to the current environmental dynamics. Moreover, they argue that humans not only track the means of the outcome-generating process, but also the variance, which extends recent work in this domain. The present results suggest that human learners are well able to distinguish the two moments and adjust their behavior accordingly.
Strengths:
(1) The paper is clearly written, and the figures demonstrate the results well. The authors clearly explain the two key results and their implications for the field.
(2) The paper uses a common modeling framework for the two environments. This makes it less likely that differences in learning behavior between the two environments are driven by general model properties rather than the specific learning mechanisms.
Weaknesses:
(1) Interpretation in terms of normative learning
(1.1) Perseveration and paddle movement
The model presented in the main manuscript is equipped with a response-probability mechanism that controls whether the paddle is updated. Especially for smaller prediction errors, the paddle is often not updated (perseveration). I wonder whether this mechanism truly reflects normative updating behavior or rather a heuristic strategy. Not moving the paddle is non-normative: a fully Bayesian model would hardly ever show a learning rate of exactly zero (arguably only when the error is itself zero, or after a massive number of trials). This is partly apparent in Supplementary Figure 1, where the lowest learning rates are around alpha = 0.2 (change-point environment) and 0.5 (random walk).
Supplementary Figure 1 shows the learning rate for the normative model without the response-probability mechanism. Primarily in the random-walk environment, but to some extent also in the change-point condition, the shape of the learning rate changes quite dramatically compared to Figure 4. In the random-walk environment, the learning rate appears relatively stable, with a value slightly larger than 0.5. In the change-point case, the learning rate is somewhat higher in the range of smaller prediction errors. Doesn't this speak against the interpretation that the model in the main manuscript is really behaving in a purely normative fashion? The tendency to perseverate might reflect a simplified strategy, which is sometimes described as "satisficing". That is, in line with the authors' description of the mechanism, perseveration occurs when it seems "good enough" (Simon, 1956), which has been demonstrated in a belief updating context before (Bruckner et al., 2025; Gershman, 2020; Nassar et al., 2021).
Supplementary Figure 3 suggests that humans show quite a lot of this type of behavior. It indicates that in the change-point condition, participants update their prediction in only 20% of the trials in the minimal prediction-error range (i.e., on 80% of these trials, they perseverate on the previous prediction). This update probability increases as a function of the prediction error. In the random-walk condition, update probabilities are higher, starting at around 40% and also increasing as a function of the error.
Indeed, Supplementary Figure 4 suggests that the shape of the learning rate for true update trials is much shallower for humans and the "perseverative" model compared to the model in Supplementary Figure 1. This suggests that the curve in Figure 4 (main manuscript), hinting at a continuous increase in the learning rate, could be the result of a mixture of perseveration (alpha = 0) and higher learning rates compared to the normative model without the response-probability mechanism.
(1.2) Control models
One might reply that the response-probability mechanism just adds noise, while the actual learning mechanism is still normative. However, a standard Rescorla-Wagner model with the same response-probability mechanism might also show increasing apparent learning rates as a function of prediction error (when perseveration trials and regular update trials are averaged as a function of the prediction error).
Therefore, I suggest adding a control analysis with a Rescorla-Wagner model. One version with the same response mechanism yielding perseveration, and one standard Rescorla-Wagner model without this mechanism. This should help identify how well the present analyses can distinguish true learning-rate dynamics from averaging artifacts due to perseveration.
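To make this control concrete, here is a minimal sketch of a Rescorla-Wagner model with and without such a gating mechanism (the logistic gate and all parameter values are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def rescorla_wagner(outcomes, alpha=0.3, p_update=None, rng=None):
    """Fixed-learning-rate delta rule; optionally gate each update through
    a response-probability mechanism so the model sometimes perseverates."""
    rng = np.random.default_rng(0) if rng is None else rng
    pred = np.zeros(len(outcomes) + 1)
    for t, x in enumerate(outcomes):
        delta = x - pred[t]                         # prediction error
        if p_update is None or rng.random() < p_update(abs(delta)):
            pred[t + 1] = pred[t] + alpha * delta   # regular RW update
        else:
            pred[t + 1] = pred[t]                   # perseverate (alpha = 0)
    return pred[1:]

# Illustrative gate: update probability increases with the absolute error
# (the logistic form and its parameters are assumptions, not from the paper)
gate = lambda err: 1.0 / (1.0 + np.exp(-(err - 10.0) / 5.0))
```

Averaging update/error ratios from the gated variant, pooling perseveration and update trials as in the paper's Figure 4 analysis, would show whether an apparently increasing learning-rate curve can emerge from a fixed alpha plus error-dependent perseveration alone.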
(1.3) Discussion of the possibility of non-normative learning mechanisms
Given the considerations above, I suggest a more balanced discussion of potential non-normative influences on learning, in particular, perseveration. Several previous papers have similarly shown that perseveration prominently characterizes human learning and decision-making (Bruckner et al., 2025; Gershman, 2020; Nassar et al., 2021), and in my opinion, it would be relevant to discuss how normative and non-normative mechanisms might jointly shape learning.
(2) Model description
The Bayesian model is quite central to the paper. However, the mathematical details are sparse, and I did not fully understand the differences between the model variants and how they were implemented. In particular, what approximations were used to make the model tractable? And how does the variance inference work? Is the learning rate directly computed, similar to the Nassar model, or is it derived from updates and prediction errors?
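For reference, my reading of the Nassar-style "direct" computation (notation as in Nassar et al., 2021 and earlier work by that group, not necessarily the present authors' notation) is:

```latex
\alpha_t = \Omega_t + (1 - \Omega_t)\,\tau_t
```

where Omega_t is the inferred change-point probability and tau_t the relative uncertainty. Clarifying whether the present model computes the learning rate directly in this way, or only post hoc from updates and prediction errors, would help.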
(3) Apparent learning rates in humans
The main learning-rate analyses compute the ratio of updates to prediction errors. For quality assurance, it would be useful to see a few supplementary histograms of the apparent learning rates: one plot pooled across all participants and a few example plots for single participants. These analyses would reveal the distribution of learning rates and the proportion at the boundaries, which can sometimes be a source of bias.
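For concreteness, a minimal sketch of the apparent-learning-rate computation and the suggested histograms (function and variable names are hypothetical, not from the paper):

```python
import numpy as np

def apparent_learning_rates(predictions, outcomes, min_error=1e-6):
    """alpha_t = update_t / prediction_error_t, excluding near-zero errors."""
    predictions = np.asarray(predictions, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    pe = outcomes[:-1] - predictions[:-1]        # prediction error on trial t
    update = predictions[1:] - predictions[:-1]  # change in the prediction
    valid = np.abs(pe) > min_error               # avoid dividing by ~0
    return update[valid] / pe[valid]

# Suggested plots (matplotlib assumed):
#   alphas = np.concatenate([apparent_learning_rates(p, o) for p, o in subjects])
#   plt.hist(alphas, bins=np.linspace(-0.5, 1.5, 41))  # pooled across subjects
# plus the same histogram per participant, to inspect mass at alpha = 0 and 1.
```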
References:
Bruckner, R., Nassar, M. R., Li, S.-C., & Eppinger, B. (2025). Differences in learning across the lifespan emerge via resource-rational computations. Psychological Review, 132(3), 556-580. https://doi.org/10.1037/rev0000526.
Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394. https://doi.org/10.1016/j.cognition.2020.104394.
Nassar, M. R., Waltz, J. A., Albrecht, M. A., Gold, J. M., & Frank, M. J. (2021). All or nothing belief updating in patients with schizophrenia reduces precision and flexibility of beliefs. Brain, 144(3), 1013-1029. https://doi.org/10.1093/brain/awaa453.
Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review, 63(2), 129-138. https://doi.org/10.1037/h0042769.
Reviewer #3 (Public review):
Summary:
This paper uses a single Bayesian modelling framework to derive specific predictions for making inference, either with assumptions of a change-point structure or a gradually changing structure across tasks.
Strengths:
The paper nicely summarizes the slightly different subliteratures that have studied human behavior with models that only assume a single underlying task structure. The diagnostic predictions from the models are presented clearly, and the human data are nicely consistent with the model predictions.
As the authors discuss themselves, this work opens the door to many questions about structure learning: how people infer (from experience or verbal instructions) which meta-model is most appropriate to use.
Weaknesses:
Alignment between models and human behavior is mostly qualitative; the models are not fit to individual data (which could, for instance, uncover interesting differences between individuals).
There is no consideration of the possibility that individuals may not fully use one or the other meta-model (gradual change vs. changepoints), but instead a hybrid. Fitting the models to data may help uncover whether some people (e.g., the 10% in experiment 2 who were best matched by the CP model?) use a slightly different mix of strategies than the one suggested by the verbal instructions they received (which may explain the pattern in Figure 6d, which looks to have featured both models).