Task and model

A. On each trial, subjects choose between two options and observe the outcome of the selected option. On forced trials, only the circled option can be selected (here, the purple one). B. Every 16 trials on average (SD = 4.85), subjects are asked to report their estimate of option values (black outline). Example of reward level estimate (left) and confidence report (right); only one option is shown here, but both options are queried sequentially. C. Example block. The latent mean reward levels (solid lines) shifted abruptly between 3 reward levels at uncued changepoints, independently for each option. Outcomes (dots) were drawn from Gaussians around the mean level with either low or high standard deviation. Each block was split into low-noise and high-noise halves corresponding to the low/high standard deviation of sampled outcomes. Subjects were instructed about noise levels, which were cued using a single or double circle around the fixation dot. Periods of 8 free trials alternated with periods of 4 forced trials (denoted by white and gray rectangles, circles and crosses, respectively). D. Posterior probability of each reward level for an option following a single observation, computed by the Bayesian ideal observer model. Model-based quantities were derived from the posterior: Expected reward (ER) is the sum of the reward levels weighted by their probabilities. Estimation uncertainty (EU) is 1 minus the probability of the maximum a posteriori (MAP) reward level.
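The definitions of ER and EU in panel D can be illustrated with a minimal sketch. The reward level values, the flat prior, and the function names below are assumptions made for illustration only; the ideal observer described in the caption also tracks changepoints, which this single-observation example ignores.

```python
import numpy as np

def posterior_after_one_observation(outcome, levels, sd, prior=None):
    """Posterior over discrete reward levels after one Gaussian outcome.

    Assumes a flat prior unless one is given (an illustrative simplification).
    """
    levels = np.asarray(levels, dtype=float)
    if prior is None:
        prior = np.full(levels.shape, 1.0 / levels.size)
    # Gaussian log-likelihood of the outcome under each candidate level
    log_lik = -0.5 * ((outcome - levels) / sd) ** 2
    unnorm = np.asarray(prior, dtype=float) * np.exp(log_lik - log_lik.max())
    return unnorm / unnorm.sum()

def expected_reward(posterior, levels):
    """ER: reward levels weighted by their posterior probabilities."""
    return float(np.dot(posterior, levels))

def estimation_uncertainty(posterior):
    """EU: 1 minus the probability of the MAP reward level."""
    return float(1.0 - np.max(posterior))

# Hypothetical reward levels and noise; not the values used in the task.
levels = np.array([30.0, 50.0, 70.0])
post = posterior_after_one_observation(outcome=52.0, levels=levels, sd=10.0)
er = expected_reward(post, levels)
eu = estimation_uncertainty(post)

# Hand-checkable example with a fixed posterior
er_fixed = expected_reward(np.array([0.2, 0.5, 0.3]), levels)
eu_fixed = estimation_uncertainty(np.array([0.2, 0.5, 0.3]))
```

Subject confidence in the reports is compared against 1-EU, which in this sketch is simply the MAP probability.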

Choice performance and reports.

A. Average performance (i.e., payoff) across sessions. Oracle performance is the expected payoff when selecting the option with the highest (unknown) latent value on each trial. Chance performance is the expected payoff when selecting between the two options at random with equal probability. B. Fraction of reports matching the Bayesian model's estimate of the reward level, or matching the (unknown) latent reward level. C. Average subject confidence for bins of Bayesian model confidence (with volatility fitted to subjects’ choices) about the latent reward level (1-EU). In all panels: dots correspond to subjects, error bars to SEM.
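The oracle and chance benchmarks in panel A follow directly from the latent means. The sketch below assumes the latent mean of each option is available per trial as an array of shape (n_trials, 2); the function name and the example values are hypothetical.

```python
import numpy as np

def oracle_and_chance(latent_means):
    """Expected payoff per trial for an oracle and for random choice.

    latent_means: array of shape (n_trials, 2), one column per option.
    """
    latent_means = np.asarray(latent_means, dtype=float)
    # Oracle always picks the option with the higher latent mean
    oracle = latent_means.max(axis=1).mean()
    # Random choice with equal probability yields the average of both options
    chance = latent_means.mean(axis=1).mean()
    return float(oracle), float(chance)

# Hypothetical latent means for three trials
oracle, chance = oracle_and_chance([[30, 70], [50, 50], [70, 30]])
```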

Multiple factors contribute to choices.

A. Model selection of main effects. All considered decision factors contribute to choices. Adding 1-4 factors consecutively to a base model of repetition bias + ΔER improves the cross-validated model fit (average choice likelihood). Colors represent effect sizes (Cohen’s d) of model fit difference. Highlighted squares (in contrast to semi-transparent ones) are minimal model pairs with a one-factor difference, showing reliable improvements from adding each term. The bottom row and four rightmost columns (Ablation tests) show unique factor contributions controlling for all other factors; *: p<0.05, **: p<0.01, ***: p<0.001. B. The selected (full) model coefficient estimates (all p < 0.05; dots are individual subjects) show expected reward (ΔER; red) and repetition bias (blue) to be the dominant drivers of choices, with heterogeneous uncertainty effects (ΔEU and EUt; green) and an additional contribution of the signed and unsigned previous prediction error (PE and UPE, yellow), capturing a win-stay-lose-shift (WSLS) heuristic. Uncertainty interacts with the WSLS heuristic (light green bars), such that when relative uncertainty is higher on the previously unchosen option or total uncertainty is higher, subjects relied more on the heuristic.

Effects of uncertainty on choices are subject-specific.

A. Top. Scatterplots of regression coefficients estimated per session (behavioral and fMRI) show stability of uncertainty estimates. Dots are subjects. Bottom. Other model factors are also correlated across sessions. B. Uncertainty effects (ΔEU) correlate with psychological measures of anxiety and impulsivity. *p < 0.05; ^0.05 < p < 0.1.

Parameter recovery.

Spearman’s correlations between generative and recovered parameter estimates across 1000 iterations of simulated choices for the main effects model (left) and the full model, including interactions (right). *p < 0.05, **p < 0.01, ***p < 0.001.

Model recovery.

Confusion matrix of proportion of simulated choice sequences (out of 100 iterations) generated with a given model (x-axis) that are best fitted by each model (y-axis) (including only main effects models; n=16). Note that the larger proportion of matches in the lower-left triangle compared to the upper-right triangle of the matrix indicates that behavior generated by a given model can be well accounted for by a more complex model, but not by a simpler model. In particular, behavior generated by the most complex model with all main effects (which best accounted for our subjects’ data) is not well fitted by any of the simpler models.
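A confusion matrix of this kind can be tabulated as in the following sketch, which assumes that each simulated sequence is labeled with the index of its generating model and the index of its best-fitting model. The function name and the toy labels are hypothetical; columns correspond to the generating model (x-axis) and rows to the best-fitting model (y-axis), as in the caption.

```python
import numpy as np

def recovery_confusion(generating, best_fit, n_models):
    """Proportion of simulations from each generating model (columns)
    that are best fitted by each model (rows)."""
    generating = np.asarray(generating, dtype=int)
    best_fit = np.asarray(best_fit, dtype=int)
    counts = np.zeros((n_models, n_models))
    # Accumulate one count per simulation at (best-fit row, generating column)
    np.add.at(counts, (best_fit, generating), 1)
    col_totals = counts.sum(axis=0, keepdims=True)
    # Avoid division by zero for models that generated no simulations
    return counts / np.where(col_totals == 0, 1, col_totals)

# Toy example with two models and four simulations
conf = recovery_confusion(generating=[0, 0, 1, 1], best_fit=[0, 1, 1, 1], n_models=2)
```

Each column sums to 1 when at least one simulation was generated by that model, so the diagonal gives the recovery rate.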

Counts of best fitting model across subjects.

Number of subjects who are best fitted by each of the main effects models. Note that the full model and the models excluding only either ΔEU or EUt are the best-fitting models for 24 of 56 subjects (43%), and models that include one or more uncertainty terms are the best-fitting models for 44 of 56 subjects (79%).

ΔEU parameter misestimation without heuristics.

Omitting either the repetition bias (“no rep. bias”) or both repetition bias and win-stay-lose-shift heuristic terms (PE and UPE) (“no heuristics”) results in significantly different group-level ΔEU effects (see also Fig. S5). This suggests ΔEU can be misestimated if the heuristics are not accounted for.

Main effects model.

Parameter estimates for the model with main effects only (no interactions). * indicates p < 0.05.

Parameter stability of the model including interactions.

Spearman’s rank correlations of parameter estimates across testing sessions for the full model, including interactions. *p < 0.05; ^p < 0.1.

Psychometric scales correlations with parameter estimates from the models.

Pearson correlations of all included psychometric scales and subscales with A: parameters of the main effects-only model and B: parameters of the full model including interactions. ^p < 0.10; *p < 0.05, **p < 0.01. Scale acronyms: WFRIS: Weiss Functional Impairment Rating Scale; STAI-Y: State-Trait Anxiety Inventory; LOT-R: Life Orientation Test - Revised; BIS-11: Barratt Impulsiveness Scale; SHAPS: Snaith–Hamilton Pleasure Scale; AQ: Autism quotient; SPQ: Schizotypal personality questionnaire; BIG 5: Big five personality questionnaire (extroversion, agreeableness, conscientiousness, neuroticism, openness to experience).

Asymmetry of prediction error effect.

Test of the interpretation of the unsigned prediction error term (UPE) as capturing an asymmetry effect, as used in the model presented in the main text. The previous prediction error (PE, capturing the win-stay-lose-shift heuristic) in the main-effects model is parameterized into “sign” (indicator 0/1 variable), “magnitude” (i.e., unsigned prediction error), and an interaction term (sign × magnitude). The interaction term captures the asymmetry of the PE effect in the positive and negative domains (beta = 0.137, SE = 0.034, t(55) = 4.037, p = 0.0002, Cohen’s d = 0.54). *p < 0.05. In the main text, this asymmetry is captured by the UPE (unsigned prediction error). The advantages of a model with PE and UPE over one with PE sign, PE magnitude and their interaction are model simplicity and easier modeling of interactions with uncertainty.
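The sign/magnitude parameterization can be sketched as follows. The caption does not state which sign is coded as 1; this example assumes the indicator equals 1 for positive prediction errors, and the function name is hypothetical.

```python
import numpy as np

def decompose_pe(pe):
    """Split a signed prediction error into sign, magnitude and their product.

    Assumption: the sign indicator is 1 for positive PE and 0 otherwise.
    """
    pe = np.asarray(pe, dtype=float)
    sign = (pe > 0).astype(float)   # 0/1 indicator
    magnitude = np.abs(pe)          # unsigned prediction error
    interaction = sign * magnitude  # nonzero only for positive PE
    return np.column_stack([sign, magnitude, interaction])

# Columns: sign, magnitude, sign x magnitude
regressors = decompose_pe([-2.0, 0.0, 3.0])
```

Under this coding, the interaction coefficient is the difference between the slopes on magnitude for positive and negative prediction errors, which is the asymmetry the caption refers to.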

Model selection with interactions.

Model fit improvements from adding interaction terms to the selected main effects model. The most complex model is better than any simpler model. Model numbers correspond to:

1. repeat ∼ 1+ΔER+ΔEU+EUt+PE+UPE (base model without interactions)

2. repeat ∼ 1+ΔER+ΔEU+EUt+PE+UPE+ΔER*ΔEU

3. repeat ∼ 1+ΔER+ΔEU+EUt+PE+UPE+ΔER*EUt

4. repeat ∼ 1+ΔER+ΔEU+EUt+PE+UPE+PE*ΔEU+UPE*ΔEU

5. repeat ∼ 1+ΔER+ΔEU+EUt+PE+UPE+PE*EUt+UPE*EUt

6. repeat ∼ 1+ΔER+ΔEU+EUt+PE+UPE+PE*ΔEU+PE*EUt+UPE*ΔEU+UPE*EUt

*p < 0.05, **p < 0.01, ***p < 0.001.
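As an illustration of how the formula for model 6 maps onto regressors, the sketch below builds its design matrix. It assumes that each `*` term denotes the elementwise product of the two regressors (the main effects are already listed separately); whether regressors were standardized and how the model was fitted are not specified here, so neither is shown. The function name and input values are hypothetical.

```python
import numpy as np

def design_matrix_model6(d_er, d_eu, eu_t, pe, upe):
    """Columns: 1, dER, dEU, EUt, PE, UPE, PE*dEU, PE*EUt, UPE*dEU, UPE*EUt."""
    d_er, d_eu, eu_t, pe, upe = (
        np.asarray(c, dtype=float) for c in (d_er, d_eu, eu_t, pe, upe)
    )
    intercept = np.ones_like(d_er)  # the "1" term, i.e. the repetition bias
    return np.column_stack([
        intercept, d_er, d_eu, eu_t, pe, upe,
        pe * d_eu, pe * eu_t, upe * d_eu, upe * eu_t,
    ])

# Two hypothetical trials
X = design_matrix_model6(
    d_er=[1.0, 2.0], d_eu=[0.5, -0.5], eu_t=[1.0, 2.0],
    pe=[2.0, -4.0], upe=[2.0, 4.0],
)
```

Models 2 to 5 correspond to subsets of the interaction columns, with ΔER*ΔEU or ΔER*EUt products in place of the PE and UPE products for models 2 and 3.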

Decorrelation of reward and uncertainty by forced choices.

Anticorrelation (Pearson’s r) of ΔER and ΔEU on each free trial following a forced choice period. This analysis reveals the benefit of interleaving periods of free and forced choices: a negative correlation between ΔER and ΔEU builds up within periods of free trials, which forced choices (temporarily) abolish.
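The per-trial correlation described here can be computed as in this sketch, which assumes that each free trial is tagged with its position since the end of the last forced-choice period. The function name, the position coding, and the toy values are hypothetical.

```python
import numpy as np

def correlation_by_position(delta_er, delta_eu, position):
    """Pearson's r between dER and dEU, separately for each free-trial position."""
    delta_er = np.asarray(delta_er, dtype=float)
    delta_eu = np.asarray(delta_eu, dtype=float)
    position = np.asarray(position, dtype=int)
    result = {}
    for p in np.unique(position):
        mask = position == p
        # Off-diagonal entry of the 2x2 correlation matrix is Pearson's r
        result[int(p)] = float(np.corrcoef(delta_er[mask], delta_eu[mask])[0, 1])
    return result

# Toy data: position 1 weakly positive, position 2 perfectly anticorrelated
r = correlation_by_position(
    delta_er=[1, 2, 3, 4, 1, 2, 3, 4],
    delta_eu=[2, 1, 4, 3, -1, -2, -3, -4],
    position=[1, 1, 1, 1, 2, 2, 2, 2],
)
```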

Ablation tests with full vs partial cross-validation.

Decrements in model fit relative to the model with all main effects (repeat ∼ 1+ΔER+ΔEU+EUt+PE+UPE) from removing each factor in turn (UPE, PE, EUt, ΔEU, left to right), for the “Main models” reported in the text (excluding volatility) and with “Full crossvalidation” (including the volatility parameter). This analysis shows that fitting all parameters, including volatility, in each fold of the crossvalidation leads to similar conclusions as the simpler version adopted in the main text, where volatility is not fitted in each fold.

Instructions for illustrated version of the task with back story (translated from French to English).