Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife's peer review process.
Editors
- Reviewing Editor: Andreea Diaconescu, University of Toronto, Toronto, Canada
- Senior Editor: Michael Frank, Brown University, Providence, United States of America
Reviewer #1 (Public review):
Summary:
The study investigates how uncertainty and heuristic strategies influence reward-based decision-making, using a novel two-armed bandit task combined with computational modeling. It aims to disentangle uncertainty-driven behavior from heuristic strategies such as repetition bias and win-stay-lose-shift tendencies, while also exploring individual differences in these processes.
Strengths:
The paper is methodologically sound, and the inclusion of subjective reports enhances the validity of the model testing. The findings on the use of heuristics under specific uncertainty conditions are particularly intriguing.
Weaknesses:
(1) Unclear how the findings significantly diverge from previous work:
At the start of the introduction, the authors propose a working hypothesis of "heterogeneity in the uncertainty effects." However, this concept is already well-established in the field. Foundational work by Yu and Dayan (2005) and more recent studies by Gershman and colleagues on total and relative uncertainty have provided substantial evidence supporting this idea. Additionally, the notion that such heterogeneity could explain mixed findings has been discussed in studies like Wilson (2014). What specific problem are the authors addressing here, and how does their work significantly differ from previous research?
Later on, however, it seems that the authors' hypothesis is to test the role of multiple factors in driving participants' decisions in the context they consider. First, why is it important to solve this puzzle? Second, this too has been investigated previously; see, for example, Dubois (2022, eLife). What novel contribution, then, does this paper make? I do see that the task is novel - mostly combining experimental strategies previously adopted separately - and that the model includes both heuristics and uncertainty-based strategies, which can account for their shared variance ... but are the authors really answering a novel question? It is also not very clear which question the authors are answering; see point (3) below.
(2) The sample size appears to be quite small, and the results would be more convincing if supported by a replication study.
(3) The results section can be somewhat unclear at times, as it introduces novel aspects (e.g., the fMRI session) or questions that were not previously explained within the framework outlined in the introduction. While the findings related to psychopathology are interesting, their relevance to the research question posed in the introduction is not immediately clear. If these findings have significant added value, it would be helpful for the authors to highlight this earlier in the manuscript. Similarly, the results on individual differences in uncertainty (Section 3.6), though intriguing, appear tangential to the primary research question regarding the role of multiple factors in driving participants' decisions. Overall, it would strengthen the manuscript to clarify the main research question and ensure the results are more directly aligned with it.
Reviewer #2 (Public review):
Summary:
This paper addresses mixed findings regarding levels of uncertainty-seeking/avoidance in past reinforcement learning studies. Using computational modelling and a novel variant of a bandit task performed across two sessions, the authors investigate the extent to which uncertainty-driven behaviour can be distinguished from heuristic-like behaviours (e.g., repetition, win-stay/lose-switch). They demonstrate that heuristics account for a significant and stable portion of the variance in choice behaviour, which might otherwise be misattributed to uncertainty-driven parameters. Additionally, they find that relative uncertainty explains additional variance and provides some evidence of stability across sessions.
Strengths:
The task is well-designed to tease apart multiple factors contributing to choice during a bandit task, separating those tied to uncertainty per se from other policies. The authors validate a Bayesian model to account for learning and choice behaviour, as well as subjective estimates of learned value and confidence in these values. The work employs comprehensive model comparison to characterise behaviour in this task, and points to important risks in research on uncertainty preferences that uses bandit-like tasks without fully accounting for heuristic-like drivers of such behaviour.
Weaknesses:
Part of this work seeks to relate individual differences in various choice parameters across sessions and to relate those to self-report scales. The estimates of cross-session reliability are valuable, particularly when comparing across the different parameters (e.g., heuristic ones being most robust), but the uncertainty-related parameters are interpreted too liberally (i.e., as being stable across sessions when both estimates were weak and one was not significant). Moreover, the correlations with external scales are very hard to interpret given the number of comparisons that were run without correction. The findings overall will have value for people interested in modelling uncertainty preferences in learning tasks -- some of whom have considered heuristic factors less than others -- but may have more moderate impact beyond this group.
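To illustrate the kind of correction the reviewer is asking for, here is a minimal sketch (the p-values are made up for illustration, not taken from the manuscript) of applying Benjamini-Hochberg false discovery rate control across a family of parameter-scale correlations:

```python
from statsmodels.stats.multitest import multipletests

# Illustrative (made-up) p-values from correlating model parameters with self-report scales
p_values = [0.01, 0.04, 0.03, 0.20, 0.45, 0.002, 0.07, 0.60]

# Benjamini-Hochberg FDR control across the full set of tests
reject, p_corrected, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print(reject)        # which correlations survive correction
print(p_corrected)   # adjusted p-values
```

Reporting corrected p-values (or at least the size of the tested family) would make the scale correlations considerably easier to interpret.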
Reviewer #3 (Public review):
Summary:
This work investigated how uncertainty, repetition bias, and win-stay-lose-shift processes influence reward-based decision-making. Using a modified two-armed bandit task and computational models, the authors provide evidence for individual variation in how uncertainty is integrated into choice behaviour that remains somewhat stable across two experiment sessions. The novel task and models also allow the authors to disentangle components of this decision-making process, yielding several interesting results. Specifically, they find that higher total uncertainty leads people to use more heuristic-based strategies, such as making repetitive choices or engaging in win-stay-lose-shift behaviour. They also find individual differences in how people use uncertainty to guide their choices, and that these differences are consistent within individuals across multiple experiment sessions. This finding can help explain prior inconsistencies in the literature, where researchers have found evidence for both uncertainty-seeking and uncertainty-avoidance tendencies. Overall, this research adds to our understanding of the mechanisms of uncertainty-modulated learning and decision-making.
Strengths:
One of the primary strengths of this research is that it supports the idea that mixed and null results in the prior literature could be due to individual differences in uncertainty preferences, and that this individual variation is somewhat stable within subjects across multiple experiment sessions. The authors cleverly disentangle expected reward and uncertainty by interleaving free and forced choice trials in their behavioural task, allowing the distinct influences of reward and uncertainty on this decision process to be examined. However, it should be noted that this behavioural decorrelation does not persist beyond the first few trials after a forced choice period, so whether the decorrelation is truly robust remains unclear.
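As a concrete illustration of this design logic, the toy simulation below (parameter values are invented, not taken from the authors' task) shows how forced sampling of both arms equalises their sample counts - and hence a simple uncertainty proxy - while the estimated values still differ between arms:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.7, 0.3])          # arm 0 pays off more on average
counts = np.zeros(2)
sums = np.zeros(2)

# Forced-choice block: both arms sampled equally often, regardless of value
for trial in range(20):
    arm = trial % 2
    reward = rng.normal(true_means[arm], 0.1)
    counts[arm] += 1
    sums[arm] += reward

means = sums / counts
uncertainty = 1.0 / np.sqrt(counts)        # crude proxy: shrinks with sample count

print("estimated values:", means.round(2))        # differ between arms
print("uncertainty proxy:", uncertainty.round(2)) # roughly equal across arms
```

Under free choice, by contrast, the less-chosen arm is usually both lower in estimated value and higher in uncertainty, which is exactly the confound the interleaved design breaks - at least until free sampling re-establishes it.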
The authors also use computational modelling to further probe the influence of uncertainty on reward-based choices. Specifically, they compare a Bayesian ideal observer learning model and a variation on a standard Rescorla-Wagner model, finding that a version of the Bayesian model fits the participants' behaviour best. The model descriptions and analyses are clearly explained and methodologically rigorous.
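For readers less familiar with the contrast between these model classes, a minimal sketch of the two update rules is given below (illustrative parameterisation, not the authors' exact implementation):

```python
def rescorla_wagner_update(value, reward, alpha=0.1):
    """Delta-rule update with a fixed learning rate."""
    return value + alpha * (reward - value)

def bayesian_update(mean, variance, reward, outcome_noise=1.0):
    """Kalman-style update: the learning rate (gain) scales with current uncertainty."""
    gain = variance / (variance + outcome_noise)
    new_mean = mean + gain * (reward - mean)
    new_variance = (1.0 - gain) * variance
    return new_mean, new_variance
```

The key difference is that the Bayesian learner carries an explicit variance term, which both adapts the effective learning rate and supplies the per-option uncertainty estimate that can enter the choice rule.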
Interestingly, the authors find that both repetition bias and model parameters that capture a win-stay-lose-shift strategy (signed and unsigned previous prediction error) significantly improve their model fits. They also make an important point: if win-stay-lose-shift behaviour is not controlled for, then switch behaviour (for example, switching to a lower expected reward option after receiving a large loss) may appear to be uncertainty-seeking when it is not. This speaks to a larger point that future research should be careful not to conflate "exploration" with "uncertainty-seeking."
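A minimal sketch of how such heuristic terms might enter a softmax choice rule (parameter names and values are illustrative, not the authors' exact model) makes the conflation risk concrete:

```python
import numpy as np

def choice_probabilities(values, uncertainties, prev_choice, prev_pe,
                         beta=3.0, phi=0.5, rho=0.3, kappa=1.0):
    """Softmax over two options with illustrative extra terms:
    phi   - uncertainty weight (positive = uncertainty-seeking)
    rho   - repetition bonus for the previously chosen option
    kappa - win-stay/lose-shift term driven by the previous prediction error
    """
    utilities = values + phi * uncertainties
    utilities[prev_choice] += rho + kappa * prev_pe  # stay after wins, shift after losses
    exp_u = np.exp(beta * (utilities - utilities.max()))
    return exp_u / exp_u.sum()

# A large negative prediction error on the previous trial outweighs the stay bonus,
# so the model switches to the lower-value (and higher-uncertainty) option.
print(choice_probabilities(values=np.array([0.7, 0.4]),
                           uncertainties=np.array([0.05, 0.2]),
                           prev_choice=0, prev_pe=-0.6))
```

Because the previously unchosen option here is both lower in value and higher in uncertainty, the lose-shift-driven switch would be indistinguishable from uncertainty seeking in a model that lacks the prediction-error terms.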
Weaknesses:
This research has some weaknesses regarding the correlations between the psychopathology measures and the computational model parameters. First, the choice of self-report measures is not supported by any specific hypotheses. Relatedly, the authors do not provide sufficient rationale for including only results from the anxiety and impulsivity measures in the main text while leaving out significant findings for a number of correlations between other measures and parameter coefficients. It is also not clear how the model parameters are derived for use in each of these correlational analyses. In sum, the manuscript as it stands contains inconsistent and/or confusing reporting of correlation results that requires further clarification.