Peer review process
Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.
Read more about eLife's peer review process.

Editors
- Reviewing Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
- Senior Editor: Joshua Gold, University of Pennsylvania, Philadelphia, United States of America
Reviewer #1 (Public review):
This study examined the effects of uncertainty over states (i.e., stimuli) and uncertainty over rewards (i.e., reward probability) on human learning and decision-making in a simple reinforcement learning task. The authors proposed two hypotheses: (1) high uncertainty over states reduces the learning rate, and (2) visual salience drives decision-making. A Bayesian learner is proposed to support the first hypothesis, and several regression analyses confirm this finding. Furthermore, the analysis of salience bias also supports the second hypothesis.
Strengths:
(1) The experiment is simple and solid.
(2) The experimental design is clever and consistent with several well-established paradigms.
Weaknesses:
(1) One of my main concerns is that the first conclusion, "high uncertainty over states reduces the learning rate", is not new and was shown recently in Yoo et al. (2023). In that study, a slower learning rate was found when stimuli were perceptually similar. It seems to me that the only difference here is that simple Gabor patches are used instead of, e.g., the green vegetable images in that study. The conclusion is exactly the same.
(2) The second hypothesis should be more explicit. Instead of claiming "A drives B", can the authors show specific predictions for the direction of this influence? For example, given the same expected value, do human learners prefer the high-contrast stimulus, and why?
(3) The analyses of salience bias support the second hypothesis. However, if I understand correctly, there is no salience parameter (i.e., the absolute contrast of each stimulus) in the decision process, according to Eqs. 4, 5, and 6 in the Methods. In other words, the Bayesian learner should not exhibit a salience bias. The question then becomes: why do human learners have such a bias? What are the underlying mechanisms of the salience bias? One way to make this concrete is sketched below.
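As a minimal illustration (a hypothetical sketch, not the paper's Eqs. 4-6; the weight `omega`, the inverse temperature `beta`, and the function name are placeholders of mine), a salience bonus could enter the softmax choice rule directly:

```python
import numpy as np

def softmax_choice_probs(q_values, contrasts, beta=5.0, omega=0.0):
    """Hypothetical choice rule: decision values are Q-values plus a
    salience bonus proportional to each stimulus's absolute contrast;
    omega = 0 recovers a standard softmax over values alone."""
    v = np.asarray(q_values, dtype=float) + omega * np.asarray(contrasts, dtype=float)
    z = beta * v
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

# With equal expected values, omega > 0 biases choice toward the
# higher-contrast option; fitting omega per participant would quantify
# the salience bias directly.
print(softmax_choice_probs([0.5, 0.5], [0.8, 0.2], omega=1.0))
```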
(4) If high perceptual uncertainty reduces the learning rate, why does the normative agent, which takes perceptual uncertainty into account, learn faster than the categorical agent, which has no perceptual uncertainty at all? Did I miss something?
(5) The learning algorithm differs from the standard Q-learning modeling approach. It would be better to include more explanation of why this type of learning algorithm is Bayes-optimal; the sketch below shows the kind of intuition I have in mind.
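For instance, if rewards are tracked with a Kalman filter, a standard way to motivate Bayes-optimal learning (whether this matches the paper's actual derivation is for the authors to confirm), the Bayesian update takes the same delta-rule form as Q-learning but with an uncertainty-dependent gain in place of a fixed learning rate:

```python
def q_learning_update(q, reward, alpha=0.1):
    """Standard Q-learning: a fixed learning rate alpha."""
    return q + alpha * (reward - q)

def kalman_update(mu, var, reward, obs_noise=1.0):
    """Bayesian (Kalman-filter) update of a reward estimate: the Kalman
    gain k plays the role of a learning rate that shrinks as the estimate
    becomes certain and grows with prior uncertainty."""
    k = var / (var + obs_noise)      # effective, uncertainty-dependent learning rate
    mu_new = mu + k * (reward - mu)  # same delta-rule form as Q-learning
    var_new = (1.0 - k) * var        # posterior uncertainty after the observation
    return mu_new, var_new
```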
(6) Similar to the above, the Bayesian modeling here only confirms that high perceptual uncertainty reduces the learning rate in an optimal Bayesian learner. Two questions remain open: (a) whether human learners are close to the Bayesian learner (i.e., near-optimal). This seems unlikely given several suboptimal heuristics (e.g., confirmation bias) found in humans. The question then is (b) how optimal learning and suboptimal heuristics are combined in the human learning process. One of the major shortcomings of this study is that no new model is proposed to fit trial-by-trial human choices. I believe that building formal process models is the key to improving this study; one candidate is sketched below.
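To make the suggestion concrete, one candidate process model (entirely hypothetical; all parameter and function names are placeholders of mine) scales updates by the belief state while allowing asymmetric learning rates that implement a confirmation bias:

```python
def hybrid_update(q, reward, belief_state, alpha_conf=0.2, alpha_disconf=0.1):
    """Hypothetical process model combining the two ingredients discussed
    above: updates are scaled by the belief state (the probability of being
    in the relevant perceptual state, so high state uncertainty lowers the
    effective learning rate), and prediction errors that confirm the current
    value estimate are weighted more strongly than disconfirming ones."""
    delta = reward - q
    alpha = alpha_conf if delta >= 0 else alpha_disconf
    return q + belief_state * alpha * delta
```

Fitting alpha_conf and alpha_disconf to trial-by-trial data, and comparing against a symmetric variant, would quantify how far participants deviate from the Bayesian learner.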
(7) The writing should be substantially improved. The main concern here is that the authors use several seemingly related but ambiguous terms for the same concept. For example, "perceptual uncertainty" in Figures 1 & 2 refers to the contrast difference between the two patches, but page 5 line 9 mentions "belief-state uncertainty". Are these the same concept? Moreover, on page 18 line 17, if I understand correctly, "perceptual uncertainty" refers to sensory noise, not contrast differences. Please carefully check all terminology and use a single, concrete term for each concept throughout the paper.
(8) Similarly, is the "task state" on page 17 the same as the "perceptual state" in Figures 1 & 2?
(9) The Methods section could also be improved. For example, I am not sure how Eq. 5 is derived. Also, page 18 line 16 states that "in our simulations, we manipulated...", but I did not find any information about the simulation. How was the simulation performed? Did I miss something?
Reviewer #2 (Public review):
Summary:
The authors addressed the question of how perceptual uncertainty and reward uncertainty jointly shape value-based decision-making. They sought to test two main hypotheses: (H1) perceptual uncertainty modulates learning rates, and (H2) perceptual salience is integrated into value computation. Through a series of analyses, including regression models and normative computational modeling, they showed that learning rates were modulated by perceptual uncertainty (reflected by differences in contrast), supporting H1, and that the update was indeed biased toward high-contrast (i.e., salient) stimuli, supporting H2.
Strengths:
This is a timely and interesting study, with a strong theory-driven focus, reflected by the sophisticated experimental design that systematically tests both perceptual and reward uncertainty. The paper is also well written, with relevant examples (the bakery) that draw an analogy to explain the main research question. The main response by participants is a reward-probability estimate (on a slider), which goes beyond commonly used binary choices and offers rich data that were eventually used in the regression analyses. This work may also open new directions for testing the interaction between perceptual decision-making and value-based decision-making.
Weaknesses:
Despite the strengths, several points need clarification to make this paper stronger.
(1) Experimental design:
(1a) The authors stated (page 6) that "The systematic manipulation of uncertainty resulted in three experimental conditions." If this were truly systematic, wouldn't there be a low-low condition, in a factorial-design fashion? Essentially, the current study has H(perceptual uncertainty)-H(reward uncertainty), L(perceptual uncertainty)-H(reward uncertainty), and H(perceptual uncertainty)-L(reward uncertainty), but naturally one would anticipate an L-L condition. It could be argued that the L-L condition may seem too easy, causing a ceiling effect, but it would nonetheless provide a benchmark for baseline learning when nothing is ambiguous. I am not asking the authors to run additional experiments to include all four conditions (unless they would like to), but it would be helpful to justify why an L-L condition was not included.
(1b) I feel there is a certain degree of imbalance in the levels of uncertainty. For reward uncertainty, {0.9, 0.1} is low uncertainty and {0.7, 0.3} is high uncertainty, whereas for perceptual uncertainty the range of contrast differences between the Gabor stimuli is much wider. This means the design appears more sensitive to detecting effects of perceptual uncertainty (where there is ample variation) than of reward uncertainty. Again, I am not asking the authors to run additional experiments, but it would be very helpful if they could explain/justify this choice of experimental setup.
(2) Statistical Analysis:
(2a) There is some inconsistency in the statistics used. For the comparisons across the three conditions, sometimes an F-test is used followed by a series of t-tests (e.g., page 6), but elsewhere only pairwise t-tests are reported without an F-test (e.g., page 12). It would be helpful, in all cases, to run an F-test first and then the three t-tests. For the F-test, I assume it was a one-way ANOVA? This information is not explicit in the Methods. Also, what multiple-comparison correction was used, if any? A sketch of the pipeline I have in mind follows below.
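For concreteness, here is a minimal version of that pipeline, with simulated data and placeholder condition names (and noting that a repeated-measures ANOVA should replace the omnibus test if the design is within-subject):

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Toy data: one score per participant per condition (hypothetical names).
rng = np.random.default_rng(0)
cond = {c: rng.normal(loc=m, scale=1.0, size=30)
        for c, m in zip(["HH", "LH", "HL"], [0.0, 0.5, 1.0])}

# Omnibus one-way ANOVA across the three conditions...
F, p_anova = stats.f_oneway(*cond.values())

# ...followed by the three pairwise comparisons. If the same participants
# completed all conditions, ttest_rel is appropriate; otherwise ttest_ind.
pairs = [("HH", "LH"), ("HH", "HL"), ("LH", "HL")]
pvals = [stats.ttest_rel(cond[a], cond[b]).pvalue for a, b in pairs]

# Holm correction for the three pairwise tests.
reject, p_corr, _, _ = multipletests(pvals, method="holm")
print(F, p_anova, list(zip(pairs, p_corr, reject)))
```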
(2b) Regarding the normative modeling, I am aware that this is a pure simulation without model fitting, but without fitting it loses the close relationship between the data and the model. I wonder whether model fitting can be done at all. As it stands, there is not even qualitative evidence of how well the model explains the data (e.g., by adding real data to Figure 3e). In other words, given that it is a normative model, it is no surprise that it works, but it is not known whether it accounts for human data. As a side note, I appreciate that certain groups of researchers tend not to run model estimation; instead, model simulations are used to qualitatively compare model and data. This is particularly true for "normative models". But at least in the current case, I believe model estimation can be implemented and would provide more insights; a minimal sketch follows.
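As an illustration of feasibility (purely a sketch with simulated data; the delta-rule stand-in and all names are mine, not the paper's model), the trial-by-trial slider reports could be fit by maximum likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(params, rewards, estimates):
    """Gaussian likelihood of trial-by-trial slider reports under a simple
    delta-rule learner with free learning rate alpha and report noise sigma."""
    alpha, sigma = params
    q, nll = 0.5, 0.0
    for r, est in zip(rewards, estimates):
        nll -= norm.logpdf(est, loc=q, scale=sigma)
        q += alpha * (r - q)  # update after the report on each trial
    return nll

# Hypothetical data: 100 trials with p(reward) = 0.7 and noisy reports.
rng = np.random.default_rng(1)
rewards = rng.binomial(1, 0.7, size=100)
estimates = np.clip(0.7 + rng.normal(0, 0.1, size=100), 0, 1)

fit = minimize(neg_log_lik, x0=[0.2, 0.2], args=(rewards, estimates),
               bounds=[(1e-3, 1), (1e-3, 1)])
print(fit.x)  # fitted alpha and sigma; candidate models compared via AIC/BIC
```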
(2c) Relatedly, regarding the specific results shown in Figure 4b: the normative agent has a near-zero effect on the fixed learning rate. I do not find these results surprising, because the normative agent "knows" what is going to happen and which state it is in, so there is no need to update the prediction error in the classic Q-learning fashion. Humans, on the other hand, do NOT know the environment, hence they do not know what they are supposed to do the way the model does. In essence, the model knows more than humans in the task know. This can be left open to debate, but I believe most cognitive modelers would agree that a model should not know more than humans know. It would be helpful if the authors could discuss the advantages and disadvantages of using normative models in this case.
(2d) I find the results in Figure 5 interesting. But given that the dependent variable is identical across the three correlations (i.e., absolute estimation error), I would suggest the authors put all three predictors into a single multiple regression. This way, any shared variance could also be taken into account by the model; see the sketch below.
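For example (with simulated data and placeholder column names standing in for the three predictors correlated with absolute estimation error in Figure 5):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical trial-level data; the column names are placeholders.
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "pred1": rng.normal(size=200),
    "pred2": rng.normal(size=200),
    "pred3": rng.normal(size=200),
})
df["abs_estimation_error"] = (0.3 * df.pred1 + 0.2 * df.pred2
                              + rng.normal(scale=0.5, size=200)).abs()

# One multiple regression instead of three separate correlations, so each
# coefficient reflects a predictor's unique contribution after accounting
# for variance shared with the other two.
X = sm.add_constant(df[["pred1", "pred2", "pred3"]])
model = sm.OLS(df["abs_estimation_error"], X).fit()
print(model.summary())
```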
(2e) I feel the focus is somewhat too much on H1 and too little on H2. The authors ran a series of analyses testing and supporting H1, but treated H2 only briefly. On first reading, I wondered why the normative model did not also test the effect of salience; in fact, salience is included in the model (buried in the Methods). I am curious whether analyzing the salience-related parameter (beta_4) would also support H2.