Dynamic integration of visual and reward information under uncertainty.

a| Learning requires assigning experienced rewards (e.g., taste experience) to the stimuli or states of the environment (e.g., type of bread). In this example, the person can clearly distinguish the two states (pretzel and baguette). When choosing an option (e.g., eating the baguette), they can easily learn an association between reward and state (corresponding to the stimulus “baguette” in this case). b| However, when states cannot be clearly dissociated based on sensory information, the person experiences perceptual uncertainty (e.g., two very similar types of bread). In this case, they can compute a belief about the state (belief state), quantifying how confidently the states can be distinguished (e.g., 40% baguette, 60% ciabatta). This leads to a credit-assignment problem: it is unclear which association between state and reward should be updated, which creates a risk of learning the incorrect association. c| Our first hypothesis concerns learning under different degrees of uncertainty of belief states. Learning behavior can be quantified using the learning rate (LR; illustrated by the slope of the line), which describes how strongly updates of reward expectations scale with the prediction error. A learning rate of 1 indicates that the expectation is updated by the full prediction error. In contrast, a learning rate of 0 indicates that the prediction error is ignored altogether. We hypothesized that the learning rate tends to be higher, leading to larger updates for a given prediction error, when belief states are certain (e.g., 99% baguette, 1% pretzel; dark green line). In contrast, under higher belief-state uncertainty (e.g., 40% baguette, 60% ciabatta; light green line), learning rates are lower. d| Our second hypothesis concerns the integration of learned reward expectations (expected value) and visual salience during decision-making. In addition to different expected values, options often have distinct perceptual features such as salience (e.g., one type of bread captures one’s attention). We hypothesized that both visual salience and expected value govern economic decision-making.
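
The learning-rate relationship in panel c can be illustrated with a simple delta rule. The following is a minimal sketch rather than the authors' implementation; the function name `update_expectation` and the two learning-rate values are assumptions chosen for illustration.

```python
def update_expectation(expectation, reward, learning_rate):
    """Delta-rule update: shift the expectation by learning_rate * prediction error."""
    prediction_error = reward - expectation
    return expectation + learning_rate * prediction_error


# Same prediction error (1.0 - 0.5 = 0.5) under two hypothetical learning rates.
certain_update = update_expectation(0.5, 1.0, learning_rate=0.8)    # certain belief state
uncertain_update = update_expectation(0.5, 1.0, learning_rate=0.2)  # uncertain belief state
print(certain_update, uncertain_update)
```

With a learning rate of 0 the expectation stays where it was, and with a learning rate of 1 it jumps to the obtained reward, matching the two limiting cases described in the caption.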

Uncertainty-augmented Gabor-Bandit task, choice performance, and learning behavior.

a| Subjects were asked to make an economic decision between two Gabor patches. Based on their choice, an outcome was presented. Finally, participants were required to report their subjective value expectation for a hypothetical choice using a slider. Inset plot| Experimental conditions. In the “both-uncertainties” condition, participants faced high levels of perceptual uncertainty, where Gabor patches were harder to distinguish, and reward uncertainty, which led to the “correct” option being rewarded with 70 % probability. In the perceptual-uncertainty condition, high levels of perceptual uncertainty were accompanied by low levels of reward uncertainty, which led to the “correct” option being rewarded with 90 % probability. In the reward-uncertainty condition, low levels of perceptual uncertainty, i.e., Gabor patches were easily distinguishable, were combined with high levels of reward uncertainty. b| Task contingency. The main aim of the task was to maximize rewards by learning the underlying task contingency between the action and reward, given the state of a trial. Each trial could potentially belong to state 0 or 1. The state determined the location of the high-contrast patch. In state 0, the right patch had a stronger contrast than the left patch and vice versa for state 1. The contingency parameter µ determined the reward probability given the action of the participant and the task state. In this example, in state 0, the probability of a reward is higher when choosing the left patch. In state 1, the reward probability is higher when choosing the right patch. Please note that in other blocks, this pattern was reversed, and participants were instructed to relearn the underlying contingency. c| Mean ± standard error of the mean (SEM) economic performance, defined as the frequency of choosing the more rewarding or correct option. d| Mean ± SEM subjective estimate of the reward probability based on the slider responses. 
e| Mean ± SEM subjective estimate of reward probability based on the slider responses plotted across trials. f| Relationship between accuracy in learning (absolute estimation error reflecting absolute difference between true reward probability and slider response) and choice behavior. Lower average estimation errors signal better learning and are moderately correlated with higher levels of economic performance.
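
The task contingency in panel b can be sketched as a small reward-probability function. This is an illustrative reading of the caption, not the task code: the action coding (`LEFT`, `RIGHT`), the function names, and the choice of µ = 0.9 in the example are assumptions.

```python
import random

LEFT, RIGHT = 0, 1  # assumed action coding


def reward_probability(state, action, mu):
    """P(reward | state, action) for the example contingency in panel b.

    In state 0 the left patch is the more rewarding choice, in state 1 the right patch.
    """
    correct_action = LEFT if state == 0 else RIGHT
    return mu if action == correct_action else 1.0 - mu


def sample_reward(state, action, mu, rng):
    """Draw a binary reward (1 or 0) for one trial."""
    return 1 if rng.random() < reward_probability(state, action, mu) else 0


rng = random.Random(0)
rewards = [sample_reward(0, LEFT, 0.9, rng) for _ in range(1000)]
reward_rate = sum(rewards) / len(rewards)  # close to mu for the correct action
```

In blocks where the pattern is reversed, the same function applies with the roles of the two actions swapped.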

Normative agent.

a| Contrast-difference observation. A trial can assume one of two hidden task states. The state determines the contrast difference between the high- and low-contrast patches (state s_t = 0 indicates that the right patch has a stronger contrast, and state s_t = 1 indicates that the left patch has a stronger contrast). Due to sensory noise (perceptual uncertainty), the agent cannot perceive the objectively presented contrast difference but only a subjective observation that is sampled from a Gaussian observation distribution. Within this distribution, higher perceptual uncertainty is reflected in higher variance over possible observations. b| Belief state. The agent computes the probability of being in a given state (belief state) given the subjective observation. Larger contrast differences are translated into more distinct belief states. Subsequently, the agent considers the belief state for economic decision-making and learning. c| Uncertainty-weighted expected value. During decision-making, the agent combines the belief state and the learned reward probabilities to compute the expected value. The expected values for the two options are less distinct when belief states are more similar. d| Uncertainty-weighted learning. When receiving reward feedback after an economic choice, the agent takes into account the belief state during reward-based learning. The agent uses the belief states to determine how much the prediction error modulates the current trial’s update in the estimate of the contingency parameter. When there is less uncertainty regarding belief states, the agent uses a higher learning rate and, thus, engages in faster learning from prediction errors. However, to deal with the credit-assignment problem arising from highly uncertain belief states (i.e., due to uncertainty, it is unclear what association between stimulus and reward should be updated), the agent dynamically adjusts the learning rate to avoid incorrect assignment of obtained rewards to alternatives. 
e| When learning from multiple outcomes, the estimated contingency parameter approaches the actual contingency parameter with the passage of trials in a block. In contrast, an agent who ignores perceptual uncertainty and represents “categorical” belief states (i.e., assuming that it can perfectly perceive contrast differences and infer the hidden task state) shows reduced learning performance. In this case, the agent often updates the wrong association between stimuli and rewards, thereby leading to an underestimation of the contingency parameter.
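
The steps in panels a to d can be sketched with a grid-based Bayesian learner. This is a simplified illustration under stated assumptions, not the authors' agent: the flat prior over contrast differences, the sign convention (positive observation means the left patch looks stronger), the grid over µ, and all function names are assumptions, and the original model may use a different update scheme.

```python
import math


def belief_state_1(observation, sigma):
    """P(state = 1 | observation) under a flat prior over contrast differences.

    Assumed sign convention: observation > 0 means the left patch looks stronger.
    """
    return 0.5 * (1.0 + math.erf(observation / (sigma * math.sqrt(2.0))))


def expected_values(pi_1, mu_hat):
    """Belief-weighted expected value of choosing the left and the right patch."""
    pi_0 = 1.0 - pi_1
    ev_left = pi_0 * mu_hat + pi_1 * (1.0 - mu_hat)
    ev_right = pi_0 * (1.0 - mu_hat) + pi_1 * mu_hat
    return ev_left, ev_right


def update_posterior(grid, posterior, pi_1, action, reward):
    """Update p(mu) after one outcome, marginalizing over the hidden state."""
    pi_0 = 1.0 - pi_1
    updated = []
    for mu, p in zip(grid, posterior):
        # Left is the rewarding action in state 0, right in state 1.
        p_reward_s0 = mu if action == "left" else 1.0 - mu
        p_reward_s1 = mu if action == "right" else 1.0 - mu
        p_reward = pi_0 * p_reward_s0 + pi_1 * p_reward_s1
        likelihood = p_reward if reward == 1 else 1.0 - p_reward
        updated.append(p * likelihood)
    total = sum(updated)
    return [u / total for u in updated]


def posterior_mean(grid, posterior):
    return sum(mu * p for mu, p in zip(grid, posterior))


grid = [i / 100 for i in range(101)]
flat_prior = [1.0 / len(grid)] * len(grid)

# One rewarded "left" choice under a fairly certain vs. an uncertain belief state.
pi_certain = belief_state_1(-0.08, sigma=0.02)
pi_uncertain = belief_state_1(-0.005, sigma=0.02)
mean_certain = posterior_mean(grid, update_posterior(grid, flat_prior, pi_certain, "left", 1))
mean_uncertain = posterior_mean(grid, update_posterior(grid, flat_prior, pi_uncertain, "left", 1))
```

In this sketch the same outcome moves the estimate of µ further when the belief state is certain, and a belief state of exactly 0.5 produces no update at all, which is one way to see how belief-state uncertainty acts like a reduced learning rate.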

Decomposing learning rates.

a| We computed single-trial learning rates reflecting the extent to which prediction errors (difference between obtained reward and subjectively reported reward probability) drive slider updates (difference in reported reward probability between current and previous trial). To examine whether learning rates were dynamically adjusted to how well subjects could discriminate the choice options, we divided the data into 10 contrast-difference bins, where lower bins correspond to more uncertain belief states. The plot shows the mean ± standard error of the mean (SEM) learning rate for each bin. The increase as a function of contrast difference (Pearson’s r08 = 0.87, p = 0.001) suggests that subjects use higher learning rates when belief states are more clearly distinct. b| To decompose the influences of different factors on the learning rate, we developed a regression model (see inset equation on top of plot, where δ denotes the prediction error). Mean ± SEM coefficients for key regressors from the linear regression model are shown here. Positive fixed-LR coefficients indicate participants’ average tendency to learn from prediction errors (Cohen’s d = 0.68). c| The belief-state-adapted-LR coefficients reflect the adjustment of the learning rate conditional on the contrast difference (Cohen’s d = 0.53). d| This subplot shows an example participant illustrating the extent to which prediction errors weighted by contrast difference (belief-state-adapted LR) drive the update. In line with (a), this suggests a down-regulation of the learning rate when belief states are more uncertain. e| Across three levels of contrast-difference values, regression fits for a range of prediction errors of an example participant suggest that belief states modulated the learning rate. Higher contrast differences (i.e., on average, more distinct belief states) led to larger updates as compared to lower contrast differences.
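
The decomposition in panels a to c can be sketched as follows. The inset equation is not reproduced in this caption, so the two-regressor form used here (update ≈ β_fixed · δ + β_adapted · δ · contrast difference) is an assumption that omits the confirmation-bias and control regressors; the function names and the synthetic data are likewise hypothetical.

```python
import random


def single_trial_learning_rate(update, prediction_error):
    """Ratio of slider update to prediction error; undefined when the error is zero."""
    if prediction_error == 0:
        return None
    return update / prediction_error


def fit_two_regressors(x1, x2, y):
    """Least squares for y ~ b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Noise-free synthetic trials generated with known coefficients (illustration only).
rng = random.Random(1)
deltas = [rng.uniform(-1.0, 1.0) for _ in range(200)]
contrasts = [rng.uniform(0.0, 1.0) for _ in range(200)]
updates = [(0.3 + 0.4 * c) * d for d, c in zip(deltas, contrasts)]

weighted = [d * c for d, c in zip(deltas, contrasts)]
b_fixed, b_adapted = fit_two_regressors(deltas, weighted, updates)
```

A positive `b_adapted` in this sketch means that the effective learning rate, `b_fixed + b_adapted * contrast`, grows with the contrast difference, which is the pattern the caption describes.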

Influence of learning on belief accuracy.

We examined the relationship between absolute estimation errors (difference between actual reward probability and subjective estimate of the probability) reflecting belief accuracy and several predictors of the regression model (fixed learning rates, belief states, confirmation bias). a| Larger fixed learning rates were associated with larger absolute estimation errors, suggesting that learning too much from a prediction error negatively impacts learning. We did not find a systematic effect of b| belief-state-adapted learning rates and c| the confirmation bias.

Salience bias.

We examined whether choice performance was governed by perceptual salience. In high-contrast blocks, the more salient option had a higher reward probability and vice versa for low-contrast blocks. Therefore, higher choice performance on high-contrast than low-contrast blocks reflects a positive choice bias towards the more salient option. The plot shows the mean ± standard error of the mean (SEM) salience bias (difference in economic performance between high- and low-contrast blocks) for the different types of uncertainty, which is significant in the condition with both perceptual and reward uncertainty (both-uncertainties condition) and the reward-uncertainty condition but not in the perceptual-uncertainty condition.
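
The salience-bias measure can be sketched as a per-participant difference score. This is a minimal illustration; the function name and the example performance values are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def salience_bias_summary(perf_high_contrast, perf_low_contrast):
    """Mean and SEM of the per-participant difference in choice performance
    between high-contrast and low-contrast blocks."""
    biases = [h - l for h, l in zip(perf_high_contrast, perf_low_contrast)]
    sem = stdev(biases) / math.sqrt(len(biases))
    return mean(biases), sem


# Hypothetical performance of three participants in each block type.
bias_mean, bias_sem = salience_bias_summary([0.8, 0.9, 0.7], [0.7, 0.7, 0.6])
```

A positive mean indicates better performance when the more salient option is also the more rewarding one.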

Absolute learning-rate analysis.

a| Mean ± standard error of the mean (SEM) absolute single-trial updates grouped across 10 absolute single-trial prediction-error bins (Pearson’s r08 = 0.97, p < 0.001). Participants’ slider updates were larger for larger prediction errors. b| Mean ± SEM absolute single-trial updates grouped across 10 contrast-difference bins (Pearson’s r08 = 0.66, p = 0.038). c| Mean ± SEM coefficients for key regressors from the linear regression model fit to absolute single-trial updates. Positive fixed-LR coefficients indicate participants’ proclivity to show larger updates for larger absolute prediction errors (Cohen’s d = 0.64). Similarly, belief-state-adapted-LR coefficients convey a contrast-difference-contingent update magnitude (Cohen’s d = 0.35). The confirmation-bias coefficient also revealed higher absolute learning from confirming outcomes (Cohen’s d = 1.07). d| Across three levels of contrast-difference values, regression fits for a range of absolute prediction-error values show contrast-difference-modulated flexible learning. Higher contrast differences led to larger updates, presumably driven by more distinct belief states, as compared to lower contrast differences, for a given prediction error. e| Relationship between absolute and signed belief-state-adapted LR across participants shows that both approaches to analyzing the data corroborate the presence of flexible learning (Pearson’s r08 = 0.92, p = 0.001). f| Larger updates were made on trials where the participant learned from outcomes that confirmed the participant’s belief estimate, across different values of prediction errors.

Full regression model and multi-collinearity check.

a| Mean ± standard error of the mean (SEM) coefficients for all key and control regressors from the signed and absolute linear regression model. b| Heat-map showing correlation coefficients between coefficient values for all regressors from the signed learning-rate analysis. c| Heat-map showing correlation coefficients between coefficient values for all regressors from the absolute learning-rate analysis.

Full signed and absolute learning-rate analyses.

a| Mean ± standard error of the mean (SEM) coefficients for all key and control regressors, including risk-adapted learning-rate coefficients from the signed and absolute linear regression model. b| Mean ± SEM coefficients for the salience-bias coefficient.

Model-fit assessment.

a| A visual representation of the goodness of fit, as illustrated by the model-predicted posterior updates using estimated parameters and single-trial regression data. b| R² values show the regression model was moderately effective in capturing and explaining learning data despite heterogeneity across participants.

Example participant diagnostics.

a-c| Added variable plots assessing the relationship between all model regressors and updates. The evident linear relationship (dark blue line) suggests that the model regressors made impactful contributions in capturing the general trend of single-trial updates for individual participants. d-f| Single-subject posterior updates predicted by the model efficiently capture single-subject updates.

Split-half reliability correlation coefficients between odd and even trials.

Correlation between signed-analysis coefficients for a| fixed LR, b| belief-state-adapted LR, and c| confirmation bias. Correlation between absolute-analysis coefficients for d| fixed LR, e| belief-state-adapted LR, and f| confirmation bias.
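
A split-half reliability check of this kind can be sketched as below. The helper names and the example coefficients are hypothetical; in the actual analysis the coefficients would come from fitting the regression model separately to odd and even trials.

```python
import math


def split_odd_even(trials):
    """Return (odd-numbered, even-numbered) trials, counting the first trial as trial 1."""
    return trials[0::2], trials[1::2]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-participant coefficients estimated from each half of the trials.
odd_half = [0.20, 0.50, 0.35, 0.60]
even_half = [0.25, 0.45, 0.30, 0.65]
reliability = pearson_r(odd_half, even_half)
```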

Confirmation bias and signed estimation error.

a| Illustration of the hypothetical role of the confirmation bias during learning under uncertainty. The confirmation bias reflects stronger learning from choice-confirming than disconfirming outcomes. In some situations, it might boost learned value representations. In this example, the confirmation bias helps the learner estimate the value more accurately (lower underestimation of the true value) compared to an unbiased learner (higher underestimation of the true value). b| We tested this idea based on the experimental data. To do so, we relied on signed estimation errors indicating the degree of under- versus overestimation of the true but unknown reward probability. Most subjects tended to underestimate the reward probability (average estimation error across all blocks). The confirmation bias and the signed estimation error were associated in that stronger confirmation biases statistically predicted more accurate reward probabilities (less underestimation of the true probabilities). This result is consistent with the idea that under some circumstances, the confirmation bias can be adaptive.

Influence of learning on belief accuracy.

Relationship between absolute estimation error and a| salience and b| congruence.

Influence of absolute learning rates on belief accuracy.

Relationship between absolute estimation error and coefficients for a| fixed LR, b| belief-state-adapted LR, c| confirmation bias, d| salience, and e| congruence.

Choice analysis of pilot experiment.

a| Mean ± standard error of the mean (SEM) salience bias for the different types of uncertainty. Positive salience bias indicates participants’ preference for the high-salience option. b| Mean ± SEM coefficients for key regressors after fitting a linear regression model to block-level choice accuracy. Positive coefficients for the main effect of high contrast indicate participants’ proclivity to choose the high-salience option. Negative coefficients for high levels of perceptual and reward uncertainty capture the decrease in participants’ performance with increasing uncertainty in the environment. c| Mean ± SEM choice performance across levels of perceptual uncertainty, showing that high perceptual uncertainty leads to worse performance.

Normative agent’s simulated choice and learning behavior.

a| Averaged across 100 simulations, choice performance is impaired by higher perceptual uncertainty (“both” and “perceptual” conditions). b| The learned contingency parameter (µ) converges towards the actual contingency parameter. Simulated learning curves are noisier in the blocks with higher reward uncertainty because the agent has to learn from riskier outcomes. Compared with blocks with low reward uncertainty, the agent learns systematically more slowly in the blocks with higher reward uncertainty (“both”, “reward”, and “high uncertainty” conditions). Additionally, learning is slower under extremely high perceptual uncertainty, which, in conjunction with reward uncertainty, primarily dictates the agent’s learning patterns (see learning curve for high-uncertainty blocks).