Task design and performance.

a, Illustration of agent-feedback. Each selected bandit generated a true outcome, either a reward or a non-reward. Participants did not see this true outcome directly; instead, they were informed about it via a computerised feedback agent (reward: dollar sign; non-reward: sad emoji). Agents told the truth on most trials (left panel). However, on a random minority of trials they lied, reporting a reward when the true outcome was a non-reward, or vice versa (right panel). b, Participants received feedback from 3 distinct feedback agents of variable credibility (i.e., truth-telling probability). Credibility was represented using a star-based system: a 3-star agent always reported the truth (and never lied), a 2-star agent reported the truth on 75% of trials (lying on the remaining 25%), and a 1-star agent reported the truth on half of the trials (lying on the other half). Participants were explicitly instructed, and quizzed, about the credibility of each agent prior to the task. c, Trial structure. On each trial, participants were first presented with the feedback agent for that trial (here, the 2-star agent) and were next offered a choice between a pair of bandits (represented by identicons) for 2 s. Finally, choice feedback was provided by the agent. d, Learning curves. Average choice accuracy as a function of trial number (within a bandit pair). Thin lines: individual participants; thick line: group mean, with thickness representing the group standard error of the mean at each trial.

Computational models and cross-fitting method.

a, Summary of the two model families. Bayesian models (top panel) represent a benchmark for normative learning. In these models, the observer maintains a belief distribution over the probability that a bandit is truly rewarding (denoted r). On each trial, this distribution is updated for the selected bandit according to Bayes' rule, based on the valence (i.e., rewarding/non-rewarding; denoted f) and the credibility (denoted c) of the trial's reward feedback. Credit-assignment models (bottom panel) are used to test deviations from Bayesian learning. Here, the observer maintains a subjective point value (denoted Q) for each bandit. On each trial, the value of the chosen bandit is updated based on a free CA parameter, quantifying the extent of the value increase/decrease following positive/negative feedback. CA parameters can be modulated by the valence and credibility of feedback. b,c, Model selection between the credibility-CA model and the two variants of the Bayesian model. Most participants were best fitted by a credibility-CA model, compared with the instructed-credibility Bayesian model (b) or the free-credibility Bayesian model (c). d, Cross-fitting method. First, we fit a Bayesian model to empirical data to estimate its (ML) parameters. This yields the Bayesian learner that comes closest to accounting for a participant's choices. Second, we simulate synthetic data from the Bayesian model, using its ML parameters, to obtain instances of how a Bayesian learner would behave in our task. Third, we fit these synthetic data with a CA model, thus estimating "Bayesian CA parameters", i.e., CA parameters capturing the performance of a Bayesian model. Finally, we fit the CA model directly to empirical data to obtain "empirical CA parameters".
Comparing Bayesian and empirical CA parameters allows us to identify which aspects of behaviour are consistent with Bayesian belief updating, and to characterise biases in behaviour that deviate from normative Bayesian learning.
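The two update rules summarised in panel a can be sketched in a few lines. This is a minimal illustration under our reading of the legend, not the paper's implementation; function and variable names are ours.

```python
import numpy as np

def bayes_update(belief, grid, f, c):
    """One Bayesian update of a discretised belief over a bandit's true
    reward probability r. f is the feedback valence (1 = reward reported,
    0 = non-reward reported); c is the agent's credibility, P(truthful)."""
    # A 'reward' report arises from a truthful reward or a lie about a non-reward.
    p_reward_report = c * grid + (1 - c) * (1 - grid)
    likelihood = p_reward_report if f == 1 else 1 - p_reward_report
    posterior = belief * likelihood
    return posterior / posterior.sum()

def ca_update(q, f, ca):
    """One credit-assignment update of a point value Q: increase by CA
    after positive feedback, decrease by CA after negative feedback."""
    return q + ca if f == 1 else q - ca

# A fully credible reward report shifts belief towards high r; a 1-star
# (50% credible) report is uninformative and leaves the belief unchanged.
grid = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(grid) / grid.size
post_3star = bayes_update(prior, grid, f=1, c=1.0)
post_1star = bayes_update(prior, grid, f=1, c=0.5)
```

Note that with c = 0.5 the likelihood is flat, which is why fully uninformative feedback should, normatively, produce no learning at all.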

Learning adaptations to credibility.

a, Probability of repeating a choice as a function of feedback valence and agent credibility on the previous trial for the same bandit pair. The effect of feedback valence on repetition increases with feedback credibility, indicating that more credible feedback has a greater effect on behaviour. b, Same analysis as in panel a, but for synthetic data obtained by simulating the main models. Simulations were computed using the ML parameters of participants for each model. The null model (bottom left) attributed a single CA to all credibility levels; hence feedback exerted a constant effect on repetition, independent of its credibility. The credibility-CA model (bottom right) allowed credit assignment to change as a function of source credibility, predicting effects of feedback that vary across credibility levels. The instructed-credibility Bayesian model (top left) updated beliefs normatively based on the true credibility of the feedback, and therefore predicted an increasing effect of feedback on repetition as credibility increased. Finally, the free-credibility Bayesian model (top right) allowed for the possibility that participants use distorted credibilities for the 1-star and 2-star agents while following a Bayesian strategy, also predicting an increasing effect of feedback as credibility increased. c, ML credit-assignment parameters for the credibility-CA model. Participants showed a CA increase as a function of agent credibility, as predicted by Bayesian-CA parameters for both the instructed-credibility and free-credibility Bayesian models. Moreover, participants showed a positive CA for the 1-star agent (whose feedback is essentially uninformative), which is predicted only by cross-fitting parameters for the free-credibility Bayesian model. d, ML credibility parameters for a free-credibility Bayesian model attributing credibility 1 to the 3-star agent but estimating the credibilities of the two lying agents as free parameters.
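The panel-a analysis (repetition probability by previous-trial valence and credibility) can be sketched as a simple conditional mean over a trial log. The toy data and column names below are illustrative, not from the paper.

```python
import pandas as pd

# Toy trial log; column names are illustrative, not from the paper.
trials = pd.DataFrame({
    "stars":    [1, 1, 2, 2, 3, 3],  # credibility of the previous-trial agent
    "valence":  [1, 0, 1, 0, 1, 0],  # previous-trial feedback (1 = reward)
    "repeated": [1, 1, 1, 0, 1, 0],  # was the previous choice repeated?
})

# P(repeat) by credibility and valence, as in panel a.
p_repeat = trials.groupby(["stars", "valence"])["repeated"].mean()

# Feedback effect per credibility level:
# P(repeat | reward) - P(repeat | non-reward).
feedback_effect = (p_repeat.xs(1, level="valence")
                   - p_repeat.xs(0, level="valence"))
```

The per-credibility difference score is the quantity the legend calls the "effect of feedback valence on repetition".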
Small dots represent results for individual participants/simulations; big circles represent the group mean (a,b,d) or median (c) of participants' behaviour. Results of the synthetic model simulations are represented by diamonds (instructed-credibility Bayesian model), squares (free-credibility Bayesian model), upward-pointing triangles (null-CA model) and downward-pointing triangles (credibility-CA model). Error bars show the standard error of the mean. (*) p < 0.05, (**) p < 0.01, (***) p < 0.001.

Contextual effects and learning.

a, Trials contributing to the analysis of the effects of credibility context on learning from the fully credible agent. We included only "current trials (n)" for which: 1) the last trial (trial n-k) offering the same bandit pair was associated with the 3-star agent, and 2) the immediately preceding context trial (n-k-1) featured a different bandit pair (providing a learning context irrelevant to the current choice). We examined how choice repetition (from n-k to n) was modulated by the feedback valence on the last same-pair trial and by the feedback agent on the context trial (i.e., the credibility context). Note that the greyed-out star rating on the current trial indicates the identity of the current agent and was not included in the analysis. b, Difference in the probability of repeating a choice after receiving positive vs. negative feedback (i.e., the feedback effect) from the 3-star agent, as a function of the credibility context. The 3-star agent feedback effect was greater when preceded by a low-credibility context (i.e., the 1-star agent on the preceding trial) than when preceded by a higher-credibility context (i.e., the 2-star or 3-star agent on the preceding trial). Big circles represent the group mean, and error bars show the standard error of the mean. (*) p < 0.05, (**) p < 0.01.
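The two inclusion criteria in panel a amount to a simple filter over the trial sequence. The sketch below is our illustrative reading of those criteria; argument names and the example sequence are hypothetical.

```python
def context_trials(pairs, stars):
    """Return indices of 'current trials' n satisfying the two inclusion
    criteria: the most recent earlier trial with the same bandit pair (n-k)
    featured the 3-star agent, and the trial just before it (n-k-1) featured
    a different bandit pair. Argument names are illustrative."""
    selected = []
    for n in range(len(pairs)):
        same_pair = [t for t in range(n) if pairs[t] == pairs[n]]
        if not same_pair:
            continue
        nk = same_pair[-1]  # last earlier trial offering the same pair
        if nk >= 1 and stars[nk] == 3 and pairs[nk - 1] != pairs[n]:
            selected.append(n)
    return selected

# Trial 3 qualifies: its last same-pair trial (index 1) used the 3-star
# agent and was preceded (index 0) by a different bandit pair.
example = context_trials(pairs=["B", "A", "C", "A"], stars=[1, 3, 2, 3])
```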

Positivity bias as a function of agent-credibility.

a, ML parameters from the credibility-valence-CA model. CA+ and CA− are free parameters representing credit assignment for positive and negative feedback, respectively (for each credibility level). Our data revealed a positivity bias (CA+ > CA−) for all credibility levels. b, Absolute valence bias index (defined as CA+ − CA−) based on the ML parameters from the credibility-valence-CA model. Positive values indicate a positivity bias, while negative values indicate a negativity bias. c, Relative valence bias index (defined as (CA+ − CA−)/(|CA+| + |CA−|)) based on the ML parameters from the credibility-valence-CA model. Positive values indicate a positivity bias, while negative values indicate a negativity bias. Small dots represent fitted parameters for individual participants and big circles represent the group median (a,b) or mean (c) (both of participants' behaviour), while squares are the median or mean of the fitted parameters of the free-credibility Bayesian model simulations. Error bars show the standard error of the mean. (***) p < 0.001 for ML fits of participants' behaviour.
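The two bias indices defined in panels b and c are direct arithmetic on the fitted CA parameters; a minimal sketch (function names are ours):

```python
def absolute_bias(ca_pos, ca_neg):
    """Absolute valence bias index: CA+ - CA- (positive => positivity bias)."""
    return ca_pos - ca_neg

def relative_bias(ca_pos, ca_neg):
    """Relative valence bias index: (CA+ - CA-) / (|CA+| + |CA-|),
    bounded in [-1, 1]."""
    return (ca_pos - ca_neg) / (abs(ca_pos) + abs(ca_neg))
```

The relative index normalises the raw difference by the overall magnitude of credit assignment, making the bias comparable across participants who learn at different overall rates.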

Credit assignment is higher on true-feedback trials.

a, Posterior belief that feedback is true (y-axis) as a function of the prior belief (i.e., during choice and before feedback receipt) that the selected bandit is rewarding (x-axis), feedback valence (dashed vs. solid lines), and agent credibility (different colours). b, Distribution of the posterior belief probability that feedback is true, calculated separately for each agent (1-star or 2-star) and for objective feedback truthfulness (truth or lie). These probabilities were computed from the trial sequences and feedback that participants experienced, and show that belief probabilities that feedback is true are higher on truth than on lie trials. For illustration, plotted distributions pool trials across participants. The black line within each box represents the median; the upper and lower bounds represent the third and first quartiles, respectively. The width of each half-violin plot corresponds to the density of each posterior belief value among all trials for a given condition. c, ML parameters for the "Truth-CA" model. Credit-assignment parameters (y-axes) are shown as a function of agent credibility and feedback truthfulness (x-axes). These data show that credit assignment was enhanced for true compared with false feedback (CAtrue > CAlie). Small dots represent fitted parameters for individual participants, big circles represent the group median, and error bars show the standard error of the mean. d, As in c, but here CA parameters were obtained by fitting the Truth-CA model not to empirical data but to synthetic data generated from simulations of our alternative models (based on participants' best-fitting parameters). e, Effect of feedback truthfulness on empirical Truth-CA parameters and on Truth-CA parameters based on synthetic simulations of our alternative models (obtained as in d). Effects were estimated by regressing CA parameters from the Truth-CA model on the agent (1-star or 2-star) and on feedback truthfulness.
None of the alternative models predicted higher credit assignment for true compared with false feedback. Lines represent 95% confidence intervals around the estimated effect coefficient. Small dots represent fitted parameters for individual simulations, diamonds represent the median value, and error bars show the standard error of the mean. (*) p < 0.05, (***) p < 0.001.
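The posterior in panel a presumably follows Bayes' rule applied to the truthfulness of a single report; a sketch under that assumption (the function name is ours):

```python
def p_feedback_true(prior_reward, f, c):
    """Posterior probability that the feedback is truthful, given the
    prior belief that the chosen bandit is rewarding (prior_reward),
    the feedback valence f (1 = reward reported, 0 = non-reward
    reported), and the agent's credibility c = P(truthful)."""
    if f == 1:
        truth = c * prior_reward            # truthful report of a reward
        lie = (1 - c) * (1 - prior_reward)  # lie about a non-reward
    else:
        truth = c * (1 - prior_reward)      # truthful report of a non-reward
        lie = (1 - c) * prior_reward        # lie about a reward
    return truth / (truth + lie)
```

Under this formulation, reports that match the prior (e.g., a reward report for a bandit already believed to be rewarding) are judged more likely to be true, which is the dependence on prior belief shown on the x-axis of panel a.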

Summary of free parameters for each of the CA models.