Schematic diagram of the experimental task in Gagne et al. (2020).

A. On each trial, participants were shown two stimuli, each presented with its potential reward, and were instructed to choose one of them to receive feedback. Only one stimulus results in a reward.

B. Each task is evenly divided into two blocks: a stable and a volatile block. During the stable block, the environmental probability does not change, while in the volatile block, the probability flips every 20 trials.

Four experimental contexts
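The block structure described in panel B can be sketched as a per-trial probability schedule. This is only an illustration: the flip interval of 20 trials comes from the caption, while the number of trials per block (`n_per_block`), the probability value (`p_high`), and the ordering of the stable block before the volatile block are placeholder assumptions rather than the task's actual settings.

```python
def feedback_schedule(n_per_block=90, p_high=0.8, flip_every=20):
    """Per-trial probability that one designated stimulus yields feedback.

    n_per_block and p_high are placeholder assumptions; only the flip
    interval of 20 trials comes from the caption.
    """
    p_low = 1.0 - p_high
    # Stable block: the probability stays fixed on every trial.
    stable = [p_high] * n_per_block
    # Volatile block: the probability flips after every `flip_every` trials.
    volatile = [
        p_low if (t // flip_every) % 2 == 1 else p_high
        for t in range(n_per_block)
    ]
    # Assumed ordering: stable block first, then the volatile block.
    return stable + volatile


schedule = feedback_schedule(n_per_block=40)
print(schedule[0], schedule[40], schedule[60])
```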

Model comparison results. Red indicates the target models.

A-C. Average negative log-likelihood (NLL), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC), each expressed as the increase relative to the lowest value. Lower scores indicate better models. Error bars indicate the standard deviation of the estimated mean across 86 participants.

D. Protected exceedance probability (PXP) for the group-level Bayesian model selection.
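For readers who want to reproduce the kind of scores shown in panels A-C, the sketch below uses the standard definitions of AIC and BIC and then subtracts the lowest score so that the best model sits at zero. The model names and numbers are hypothetical, and the helper names (`aic`, `bic`, `relative_to_best`) are introduced here for illustration; the figure additionally averages such scores across participants, which is not shown.

```python
import math


def aic(nll, n_params):
    """Akaike Information Criterion from a negative log-likelihood."""
    return 2.0 * n_params + 2.0 * nll


def bic(nll, n_params, n_trials):
    """Bayesian Information Criterion from a negative log-likelihood."""
    return n_params * math.log(n_trials) + 2.0 * nll


def relative_to_best(scores):
    """Subtract the lowest score so the best model has a value of 0."""
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}


# Hypothetical (NLL, number of parameters) pairs for one participant.
fits = {"model_a": (100.0, 3), "model_b": (98.0, 6)}
aic_scores = {name: aic(nll, k) for name, (nll, k) in fits.items()}
print(relative_to_best(aic_scores))
```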

Parameter analyses of the MOS6 model.

A. The weighting parameters for the healthy control participants (HC, purple) and the patients (PAT, pink) diagnosed with MDD and GAD. The y-axis shows the preference averaged over volatility levels (volatile and stable) and feedback types (reward and aversive). Error bars reflect the standard deviation across 86 participants. Significance symbol conventions are: *: p < 0.05; **: p < 0.01; ***: p < 0.001; n.s.: non-significant.

B. Decision preferences predict participants’ general factor score (g score) in the bifactor analysis reported in Gagne et al. (2020). The y-axis shows the preference averaged over volatility levels (volatile and stable) and feedback types (reward and aversive). This averaging is permitted here because the logit of the weight is normally distributed. The shaded areas reflect 95% confidence intervals of the regression prediction.

C. The simulated learning curve for each strategy.

Simulated learning behavior of the two groups and the three strategies. The black dashed lines indicate the ground truth feedback probability. The simulated curves in B-C were generated by averaging 500 simulations.

A. The human learning curves of the two groups were produced using the data in the aversive context (see Fig. S3 for the reward context) and smoothed by a Gaussian kernel with a window size of 5 trials and a standard deviation of 2 trials.

B. Simulated learning curves of the two groups, generated using their fitted parameters in the MOS model.
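The smoothing described for panel A can be sketched as a weighted moving average with Gaussian weights. The window of 5 trials and the standard deviation of 2 trials come from the caption; the treatment of the first and last trials (renormalizing the weights that fall inside the series) is an assumption, since the caption does not say how edges were handled.

```python
import math


def gaussian_kernel(window=5, std=2.0):
    """Normalized Gaussian weights centered on the current trial."""
    half = window // 2
    weights = [math.exp(-0.5 * (k / std) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(series, window=5, std=2.0):
    """Smooth a per-trial series with a Gaussian-weighted moving average."""
    kernel = gaussian_kernel(window, std)
    half = window // 2
    smoothed = []
    for t in range(len(series)):
        weighted_sum = 0.0
        weight_total = 0.0
        for offset, w in zip(range(-half, half + 1), kernel):
            # Assumed edge handling: skip neighbors outside the series
            # and renormalize by the weights that were actually used.
            if 0 <= t + offset < len(series):
                weighted_sum += w * series[t + offset]
                weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Hypothetical binary choice record (1 = chose the designated stimulus).
choices = [1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
print(smooth(choices))
```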

Fitted learning rates and the learning rate differences between the stable and volatile conditions. The simulated data were generated by the MOS model and fitted by the FLR (A) and the RS (B) models.

Parameter and model recovery analyses

A. Parameter recovery for the MOS6 model. Each recovered parameter is averaged over ten samples.

B-C. Model recovery analyses. The MOS6 model remains the best-fitting model on synthetic data generated by MOS6 itself, according to the AIC (B) and BIC (C) comparisons. Error bars indicate the standard deviation of the mean value across 79 synthetic data points.

The reparametrized priors for parameters.

A. For parameters with a range of (0, 1), the raw values were sampled from N(0, 1.55) and passed through the sigmoid function.

B. For parameters with a range of (0, ∞), the raw values were sampled from N(2, 1) and passed through the exponential function.

C. For parameters with a range of (−∞, ∞), the raw values were sampled from N(0, 10).
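The three reparametrizations in panels A-C can be sketched as sampling a raw value from a normal distribution and passing it through a link function. One assumption is made loudly here: the second argument of N(·, ·) is read as a standard deviation, which the caption does not state. The function names are introduced for illustration only.

```python
import math
import random


def sigmoid(x):
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_unit_interval(rng):
    # Panel A: raw ~ N(0, 1.55), then sigmoid. 1.55 is assumed to be a std.
    return sigmoid(rng.gauss(0.0, 1.55))


def sample_positive(rng):
    # Panel B: raw ~ N(2, 1), then exponential. 1 is assumed to be a std.
    return math.exp(rng.gauss(2.0, 1.0))


def sample_unbounded(rng):
    # Panel C: raw ~ N(0, 10), no transform. 10 is assumed to be a std.
    return rng.gauss(0.0, 10.0)


rng = random.Random(0)
print(sample_unit_interval(rng), sample_positive(rng), sample_unbounded(rng))
```

Sampling in the unconstrained space and transforming afterwards keeps every draw inside the parameter's valid range without any clipping.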

Parameter analyses of MOS22.

A. The weighting parameters for the healthy control participants (HC) and the patients (PAT) diagnosed with MDD and GAD. The y-axis shows the preference averaged over volatility levels (volatile and stable) and feedback types (reward and aversive). Error bars reflect the standard error of the mean value across participants × experimental conditions × feedback types.

B. Decision preferences predict participants’ general factor score (g score) in the bifactor analysis reported in Gagne et al. (2020). The y-axis shows the preference averaged over volatility levels (volatile and stable) and feedback types (reward and aversive). This averaging is permitted here because the logit of the weight is normally distributed. The shaded areas reflect 95% confidence intervals of the regression prediction.

The human learning behaviors of the two groups in the reward condition.