Task design and performance.

a, Illustration of agent feedback. Each selected bandit generated a true outcome, either a reward or a non-reward. Participants did not see this true outcome but instead were informed about it via a computerised feedback agent (reward: dollar sign; non-reward: sad emoji). Agents told the truth on most trials (left panel). However, on a random minority of trials they lied, reporting a reward when the true outcome was a non-reward or vice versa (right panel). b, Participants received feedback from 3 distinct feedback agents of variable credibility (i.e., truth-telling probability). Credibility was represented using a star-based system: a 3-star agent always reported the truth (and never lied), a 2-star agent reported the truth on 75% of trials (lying on the remaining 25%), and a 1-star agent reported the truth half of the time (lying on the other half). Participants were explicitly instructed and quizzed about the credibility of each agent prior to the task. c, Trial structure: on each trial, participants were first presented with the feedback agent for that trial (here, the 2-star agent) and were then offered a choice between a pair of bandits (represented by identicons) for 2 s. Next, choice feedback was provided by the agent. d, Learning curves. Average choice accuracy as a function of trial number (within a bandit-pair). Thin lines: individual participants; thick line: group mean, with thickness representing the group standard error of the mean for each trial.

Computational models and cross-fitting method.

a, Summary of the two model families. In our Bayesian models (top panel), the observer maintains a belief distribution over the probability that a bandit is truly rewarding (denoted r). On each trial, this distribution is updated for the selected bandit according to Bayes' rule, based on the valence (i.e., rewarding/non-rewarding; denoted f) and credibility of the trial's reward feedback (denoted c). In credit-assignment models (bottom panel), the observer maintains a subjective point value (denoted Q) reflecting a choice propensity for each bandit. On each trial, the propensity of the chosen bandit is updated based on a free CA parameter, quantifying the extent of value increase/decrease following positive/negative feedback. CA parameters can be modulated by the valence and credibility of feedback. b,c, Model selection between the credibility-CA model (without perseveration) and the two variants of Bayesian models. Most participants were best fitted by a credibility-CA model, compared to the instructed-credibility Bayesian (b) or free-credibility Bayesian (c) models. d, Cross-fitting method: firstly, we fit a Bayesian model to empirical data to estimate its maximum likelihood (ML) parameters. This yields the Bayesian learner that comes closest to accounting for a participant's choices. Secondly, we simulate synthetic data from the Bayesian model, using its ML parameters, to obtain instances of how a Bayesian learner would behave in our task. Thirdly, we fit these synthetic data with a CA model, thus estimating "Bayesian CA parameters", i.e., CA parameters capturing the performance of a Bayesian model. Finally, we fit the CA model directly to empirical data to obtain "empirical CA parameters". Comparing Bayesian and empirical CA parameters allows us to identify which aspects of behaviour are consistent with our Bayesian models, as well as characterize biases in behaviour that deviate from our Bayesian learning models.
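
To make the two update rules concrete, here is a minimal sketch in Python (not the authors' code; the grid resolution and the names bayes_update and ca_update are illustrative assumptions). The Bayesian update treats the agent's report as a noisy channel that tells the truth with probability c; the CA update simply shifts a point value up or down by a credibility-dependent amount.

```python
import numpy as np

def bayes_update(belief, grid, f, c):
    """Grid-based Bayes update of the belief over a bandit's true reward
    probability r, given feedback valence f (1 = reward, 0 = non-reward)
    and agent credibility c (the probability the agent tells the truth)."""
    # A reward is reported if the outcome was a reward and the agent tells
    # the truth, or if the outcome was a non-reward and the agent lies.
    p_reward_report = c * grid + (1 - c) * (1 - grid)
    likelihood = p_reward_report if f == 1 else 1 - p_reward_report
    posterior = belief * likelihood
    return posterior / posterior.sum()

def ca_update(q, f, ca):
    """Credit-assignment update of the point value Q for the chosen bandit:
    increase after positive feedback, decrease after negative feedback,
    by a (possibly credibility- and valence-dependent) CA parameter."""
    return q + ca if f == 1 else q - ca

# Example: a reward report from the 2-star agent (c = 0.75).
grid = np.linspace(0.01, 0.99, 99)        # candidate values of r
belief = np.ones_like(grid) / grid.size   # flat prior over r
belief = bayes_update(belief, grid, f=1, c=0.75)
q = ca_update(q=0.0, f=1, ca=1.2)
```

The cross-fitting pipeline in panel d then chains these pieces: fit the Bayesian model to a participant, simulate it forward with its ML parameters, and refit those simulations with the CA model to obtain the "Bayesian CA parameters".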

Learning adaptations to credibility.

a, Probability of repeating a choice as a function of feedback-valence and agent-credibility on the previous trial for the same bandit pair. The effect of feedback-valence on repetition increases as the feedback credibility increases, indicating that more credible feedback has a greater effect on behaviour. b, Same analysis as in panel a, but for synthetic data obtained by simulating the main models. Simulations were computed using the ML parameters of participants for each model. The null model (bottom left) attributes a single CA to all credibility levels, hence feedback exerts a constant effect on repetition, independently of its credibility. The credibility-CA model (bottom right) allowed credit assignment to change as a function of source credibility, predicting effects of feedback that vary with credibility level. The instructed-credibility Bayesian model (top left) updated beliefs based on the true credibility of the feedback, and therefore predicted an increasing effect of feedback on repetition as credibility increased. Finally, the free-credibility Bayesian model (top right) allowed for the possibility that participants used distorted credibilities for the 1-star and 2-star agents while following a Bayesian strategy, also predicting an increasing effect of feedback as credibility increased. c, ML credit assignment parameters for the credibility-CA model. Participants show a CA increase as a function of agent-credibility, as predicted by Bayesian-CA parameters for both the instructed-credibility and free-credibility Bayesian models. Moreover, participants showed a positive CA for the 1-star agent (which essentially provides random feedback), which is only predicted by cross-fitting parameters for the free-credibility Bayesian model. d, ML credibility parameters for a free-credibility Bayesian model attributing credibility 1 to the 3-star agent but estimating credibility for the two lying agents as free parameters. Small dots represent results for individual participants/simulations; big circles represent the group mean (a,b,d) or median (c) of participants' behaviour. Results of the synthetic model simulations are represented by diamonds (instructed-credibility Bayesian model), squares (free-credibility Bayesian model), upward-pointing triangles (null-CA model) and downward-pointing triangles (credibility-CA model). Error bars show the standard error of the mean. (*) p<.05, (**) p<.01, (***) p<.001.

Contextual effects and learning.

a, Trials contributing to the analysis of effects of credibility-context on learning from the fully credible agent. We included only "current trials (n)" for which: 1) the last trial (trial n−k) offering the same bandit pair (i.e., the learning trial) was associated with the 3-star agent, and 2) the immediately preceding context trial (n−k−1) featured the same bandit pair. We examined how choice-repetition (from trial n−k to trial n) was modulated by feedback valence on the learning trial, and by the feedback agent on the context trial. Note the greyed-out star-rating on the current trial indicates the identity of the current agent and was not included in the analysis. b, Difference in probability of repeating a choice after receiving positive vs negative feedback (i.e., feedback effect) from the 3-star agent, as a function of the credibility context. The 3-star agent feedback effect is greater when preceded by a lower-credibility context than by a higher-credibility context. Big circles represent the group mean, and error bars show the standard error of the mean. (*) p<.05, (**) p<.01. c, We ran the same mixed-effects model (regressing choice repetition on learning-trial feedback valence and on contextual credibility) on simulated data (see Methods: Model-agnostic analysis of contextual credibility effects on choice-repetition). The panels show contrasts in the feedback effect (from the 3-star agent on the learning trial) on choice-repetition between contextual-credibility agent pairs. None of our models predicted the contrast effects observed in participants. Histograms represent the distribution of regression coefficients based on 101 group-level synthetic datasets, simulated from each model. The label to the right of each histogram represents the proportion of simulated datasets that predict an equal or stronger effect than the one observed in participants.

Positivity bias as a function of agent-credibility.

a, ML parameters from the credibility-valence-CA model. CA+ and CA− are free parameters representing credit assignment for positive and negative feedback, respectively (for each credibility level). Our data revealed a positivity bias (CA+ > CA−) for all credibility levels. b, Absolute valence bias index (aVBI, defined as CA+ − CA−) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. c, Relative valence bias index (rVBI, defined as (CA+ − CA−)/(|CA+| + |CA−|)) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. Small dots represent fitted parameters for individual participants and big circles represent the group median (a,b) or mean (c) (both of participants' behavior), while squares are the median or mean of the fitted parameters of the free-credibility Bayesian model simulations. Error bars show the standard error of the mean. (***) p<.001 for ML fits of participants' behavior.
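
Written out, the two indices in panels b and c are (restating the caption's definitions in standard notation; note that the rVBI is bounded between −1 and 1):

```latex
\mathrm{aVBI} = \mathrm{CA}^{+} - \mathrm{CA}^{-}, \qquad
\mathrm{rVBI} = \frac{\mathrm{CA}^{+} - \mathrm{CA}^{-}}{\lvert \mathrm{CA}^{+} \rvert + \lvert \mathrm{CA}^{-} \rvert} \in [-1, 1]
```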

Credit assignment is enhanced for feedback that is more likely to be true.

a, The posterior belief that the received feedback is truthful (y-axis) is plotted against the prior belief (held before receiving feedback) that the chosen bandit would be rewarding (x-axis). The plot illustrates how this posterior belief is influenced by the valence of the feedback (reward indicated by solid lines, no reward by dashed lines) and the credibility of the feedback agent (represented by different colors). b, Distribution of the posterior belief probability that feedback is true, calculated separately for each agent (1- or 2-star) and objective feedback-truthfulness (true or lie). These probabilities were computed based on the trial sequences and feedback participants experienced, indicating that belief probabilities that feedback is true are higher on truth trials than on lie trials. For illustration, plotted distributions pool trials across participants. The black line within each box represents the median; upper and lower bounds represent the third and first quartile, respectively. The width of each half-violin plot corresponds to the density of each posterior belief value among all trials for a given condition. c, Maximum likelihood (ML) estimate of the "truth-bonus" parameter derived from the "Truth-CA" model. The significantly positive truth bonus indicates that participants increased credit assignment as a function of the likelihood that feedback was true (after controlling for the credibility of this feedback). Each small dot represents the fitted truth-bonus parameter for an individual participant, the large circle indicates the group mean, and the error bars represent the standard error of the mean. d, Distribution of truth-bonus parameters predicted by synthetic simulations of our alternative computational models. For each alternative model, we generated 101 synthetic group-level datasets based on the maximum likelihood parameters fitted to the participants' actual behavior. Each of these datasets was then independently fitted with the "Truth-CA" model. Each histogram represents the distribution of the mean truth bonus across the 101 simulated group-level datasets for a specific alternative model. Notably, the truth bonus observed in our participants was significantly higher than the truth bonus predicted by any of these alternative models (proportion of datasets predicting a higher truth bonus: instructed-credibility Bayesian < 0.01, free-credibility Bayesian = 0, credibility-CA = 0, credibility-valence CA = 0). (**) p<.01
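
The posterior in panel a follows directly from Bayes' rule; a minimal sketch, assuming credibility c is the probability the agent tells the truth (the function name p_feedback_true is illustrative, not from the paper):

```python
def p_feedback_true(prior_r, c, f):
    """Posterior probability that the received feedback is truthful, given
    the prior belief prior_r that the chosen bandit is rewarding, agent
    credibility c, and feedback valence f (1 = reward reported)."""
    if f == 1:
        truthful = c * prior_r              # true reward, reported honestly
        lying = (1 - c) * (1 - prior_r)     # true non-reward, reported as reward
    else:
        truthful = c * (1 - prior_r)        # true non-reward, reported honestly
        lying = (1 - c) * prior_r           # true reward, reported as non-reward
    return truthful / (truthful + lying)

# With the 1-star agent (c = 0.5), the posterior that a reward report is
# truthful simply equals the prior belief that the bandit is rewarding:
assert abs(p_feedback_true(prior_r=0.7, c=0.5, f=1) - 0.7) < 1e-12
```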

Summary of free parameters for each of the CA models.

Task design, performance and model selection in discovery study.

a, During the task participants received feedback from 4 feedback agents, varying in their credibility (i.e., truth-telling probability). The credibility of the agents was represented using a star-based system: the 4-star agent always reported the truth (and never lied), whereas the 3-star agent reported the truth on 85% of the trials (lying on the remaining 15%), the 2-star agent reported the truth on 70% of the trials (lying on the remaining 30%), and the 1-star agent reported the truth half of the time (lying on the other half). Participants were explicitly instructed and quizzed about the credibility of each agent prior to the task. b, Learning curve. Average choice accuracy as a function of trial number (within a bandit-pair). Thin lines: individual participants; thick line: group mean with thickness representing the group standard error of the mean for each trial. c,d, Model selection between the credibility-CA model and the two variants of Bayesian models. Most participants were best fitted by the credibility-CA model, compared to the instructed-credibility Bayesian (c) or free-credibility Bayesian (d) models.

Learning adaptations to credibility in discovery study.

a, Probability of repeating a choice as a function of feedback-valence and agent-credibility on the previous trial with the same bandit pair. As in the main study, the effect of feedback-valence on repetition increases as feedback credibility increases, indicating that more credible feedback has a greater effect on behavior. b, Same analysis as in panel a, but for synthetic data obtained by simulating the main models. Simulations were computed using the ML parameters of participants for each model. c, ML credit assignment parameters for the credibility-CA model. Consistent with the main study, participants show a CA increase as a function of agent-credibility, as predicted by cross-fitting parameters from instructed-credibility Bayesian and free-credibility Bayesian model simulations. However, we find no evidence for a positive CA for the 1-star agent. d, ML credibility parameters for a free-credibility Bayesian model attributing credibility 1 to the 4-star agent but estimating credibility for the three lying agents as free parameters. Small dots represent results for individual participants/simulations; big circles represent the group mean (a,b,d) or median (c) of participants' behavior. Results of the synthetic model simulations are represented by diamonds (instructed-credibility Bayesian model), squares (free-credibility Bayesian model), upward-pointing triangles (null-CA model) and downward-pointing triangles (credibility-CA model). Error bars show the standard error of the mean. (*) p<.05, (**) p<.01, (***) p<.001.

Positivity bias as a function of agent-credibility in the discovery study.

a, Maximum likelihood parameters from the credibility-valence-CA model. CA+ and CA− are free parameters representing credit assignment for positive and negative feedback, respectively (for each credibility level). The data revealed a positivity bias (CA+ > CA−) for feedback of low credibility, but not for fully credible feedback. b, Absolute valence bias index (aVBI, defined as CA+ − CA−) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. c, Relative valence bias index (rVBI, defined as (CA+ − CA−)/(|CA+| + |CA−|)) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. Small dots represent fitted parameters for individual participants and big circles represent the group median (a,b) or mean (c) (both of participants' behavior), while squares are the median or mean of the fitted parameters of the free-credibility Bayesian model simulations. Error bars show the standard error of the mean. (**) p<.01, (***) p<.001 for ML fits of participants' behavior.

Credit assignment is enhanced for feedback inferred to be true in discovery study.

a, Maximum likelihood (ML) estimate of the "truth-bonus" parameter derived from the "Truth-CA" model. The significantly positive truth bonus indicates that participants increased the degree to which they updated their value estimates (credit assignment) when they inferred a higher probability that the feedback they received was true. Each small dot represents the fitted truth-bonus parameter for an individual participant, the large circle indicates the group mean, and the error bars represent the standard error of the mean. b, Distribution of truth-bonus parameters predicted by synthetic simulations of our alternative computational models. For each alternative model, we generated 101 group-level synthetic datasets based on the maximum likelihood parameters fitted to the participants' actual behavior. Each of these synthetic datasets was then independently fitted with the "Truth-CA" model. Each histogram represents the distribution of the mean truth bonus across the 101 simulated datasets for a specific alternative model. Notably, the truth bonus observed in our participants was significantly higher than the truth bonus predicted by any of these alternative models (proportion of datasets predicting a higher truth bonus = 0 for all models). (***) p<.001

Illustration of computational models over 15 trials with an example bandit-pair.

a, Example block with a bandit pair. The top bandit provided true rewards 75% of the time, while the bottom bandit did so 25% of the time. The agent providing feedback on each trial is shown above the plot, while the feedback is depicted on the horizontal line of the bandit selected on that trial. Dollar signs represent positive feedback, while sad emojis represent negative feedback. b, Values of the two bandits computed with the credibility-CA model. The values are represented as point estimates (i.e., Q-values). On each trial, the Q-value of the selected bandit is updated based on the feedback valence and credibility. Values correspond to the end of each trial, following credit assignment. c, Posterior beliefs about the true reward probabilities of the bandits, computed using the instructed-credibility Bayesian model. The x-axis in each subplot represents the probability of a true reward (p), while the y-axis represents the density of that true reward probability (g(p)). On each trial, the density g(p) of the selected bandit is updated based on the feedback valence and credibility. Both b and c were generated with the mean ML parameters from participants fitted with the credibility-CA and instructed-credibility Bayesian models, respectively.

Model comparison between Bayesian models and credibility-CA model for main study.

a, Histograms of log-likelihood improvements for three example participants. For each participant, we generated 201 simulations based on their ML parameters for each model variant (i.e., the Bayesian model and the CA model). We fitted each dataset with the two models and calculated the log-likelihood difference between the two fits for each dataset (CA fit − Bayesian fit), resulting in two log-likelihood difference distributions: one for the datasets based on Bayesian simulations (grey) and another for the datasets based on CA simulations (blue). A greater value along the x-axis indicates that a dataset was better fitted by the CA model than by the Bayesian model. We determined the log-likelihood difference threshold that yields the best model classification (i.e., maximizing the average of true positives (TP) and true negatives (TN)), represented by a red line in the plots. Finally, we fitted the empirical data of each participant with the two model variants, calculating an empirical log-likelihood difference, represented as a black dashed line in the plots. The three example plots correspond to participants with different model classification accuracies (i.e., proportions of true positives and true negatives) and different model classifications, both stated above each subplot. b, Distribution of model classification accuracy for the model comparison between the instructed-credibility Bayesian and credibility-CA models. A greater average of TP and TN represents a better discrimination between the models. c, Distribution of model classification accuracy for the model comparison between the free-credibility Bayesian and credibility-CA models. Vertical orange lines represent the mean classification accuracy for each model comparison.
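
A minimal sketch of the classification step (illustrative names; ll_diff_bayes and ll_diff_ca would hold the per-dataset log-likelihood differences, CA fit minus Bayesian fit, for the Bayesian- and CA-simulated datasets respectively):

```python
import numpy as np

def best_threshold(ll_diff_bayes, ll_diff_ca):
    """Choose the log-likelihood-difference cutoff that maximizes the average
    of true positives (CA-generated datasets classified as CA) and true
    negatives (Bayesian-generated datasets classified as Bayesian)."""
    candidates = np.sort(np.concatenate([ll_diff_bayes, ll_diff_ca]))
    accuracy = [(np.mean(ll_diff_ca > thr) + np.mean(ll_diff_bayes <= thr)) / 2
                for thr in candidates]
    return candidates[int(np.argmax(accuracy))]

# A participant's empirical difference above the threshold classifies them
# as "CA"; at or below it, as "Bayesian". Toy distributions for illustration:
rng = np.random.default_rng(0)
thr = best_threshold(rng.normal(-2, 1, 201), rng.normal(2, 1, 201))
```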

Effects of CA for the 1-star and 3-star agents on accuracy.

a, Scatter plot of the absolute CA parameter for the 1-star agent in the main task (x-axis) against the predicted drop in accuracy due to CA based on random feedback (y-axis). For each participant, we generated 5000 synthetic simulations based on their ML parameters from the credibility-CA model, and another 5000 simulations using the same ML parameters but ablating CA for the 1-star agent by fixing it to 0. The difference in mean accuracy between the two datasets represents the estimated drop in accuracy for each participant due to learning from random feedback. The negative Pearson correlation illustrates that the accuracy drop increases as a function of (absolute) credit assignment based on random feedback. Circles represent the datapoints of individual participants; lines represent the prediction from a linear regression on the data, with shaded areas representing the 99% confidence interval. b, Mean learning curves for different 1-star CA parameter values. We generated synthetic simulations based on participants' credibility-CA model ML parameters but fixing the 1-star CA to different values between −2 and 5. We generated 100 synthetic simulations per participant for each 1-star CA value and averaged across participants. As the (absolute) CA attributed to random feedback diverges from 0 (with all other parameters held invariant), accuracy decreases. c, For comparison with b, we show the mean learning curves for different 3-star CA values, calculated in the same way but fixing the 3-star CA to different values. Here, accuracy increases with the CA attributed to fully credible feedback.
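
The ablation in panel a amounts to re-simulating each participant with one parameter zeroed out; a schematic sketch (the task simulator is passed in as a callable because it is not reproduced here, and the parameter name ca_1star is hypothetical):

```python
def accuracy_drop(simulate, params, n_sims=5000):
    """Estimated accuracy cost of learning from random feedback: mean
    accuracy under the fitted parameters minus mean accuracy with the
    1-star CA ablated (fixed to 0), all other parameters unchanged.
    `simulate(params, n_sims)` should return mean choice accuracy."""
    ablated = dict(params, ca_1star=0.0)
    return simulate(params, n_sims) - simulate(ablated, n_sims)
```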

Positivity bias as a function of agent-credibility compared with instructed-credibility Bayesian-CA parameters in the main study.

a, ML parameters from the credibility-valence-CA model. CA+ and CA− are free parameters representing credit assignment for positive and negative feedback, respectively (for each credibility level). Empirical-CA parameters revealed a positivity bias (CA+ > CA−) for all credibility levels, while instructed-credibility Bayesian-CA parameters revealed a negativity bias. b, Absolute valence bias index (aVBI, defined as CA+ − CA−) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. c, Relative valence bias index (rVBI, defined as (CA+ − CA−)/(|CA+| + |CA−|)) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. Small dots represent fitted parameters for individual participants and big circles represent the group median (a,b) or mean (c) (both of participants' behaviour), while diamonds are the median or mean of the fitted parameters of the instructed-credibility Bayesian model simulations. Error bars show the standard error of the mean. (***) p<.001 for ML fits of participants' behaviour.

Posterior-truthfulness belief as a function of objective feedback truthfulness based on distorted credibilities from free-credibility Bayesian model fits.

a, Main-study distributions of the posterior belief probability that feedback is true, calculated separately for each agent (1- or 2-star) and objective feedback-truthfulness (true or lie). These probabilities are based on the ML credibilities from the free-credibility Bayesian model fits for each participant. The probabilities are computed based on the trial sequences and feedback participants experienced, revealing that belief probabilities that feedback is true are higher on truth trials than on lie trials, even if participants attribute distorted feedback-credibilities. For illustration, plotted distributions pool trials across participants. The black line within each box represents the median; upper and lower bounds represent the third and first quartile, respectively. The width of each half-violin plot corresponds to the frequency of each posterior belief value among all trials for a given condition. b, Same as in a, but for the discovery study.

Mixed-effects binomial regression model regressing choice-repetition on feedback-valence, agent-credibility and better/worse choice from previous trial featuring the same bandit pair.

Based on participants’ data. Feedback effect increased as a function of agent-credibility (3-star vs. 2-star: b=0.91, F(1,2436)=351.17; 3-star vs. 1-star: b=1.15, t(2436)=24.02; and 2-star vs. 1-star: b=0.24, t(2436)=5.34, all p’s<0.001). Feedback valence exerted a positive effect for the 1-star agent (b=0.25, t(2436)=8.05, p<0.001).
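
As a rough illustration of the model class named in these captions (not the authors' pipeline: the column names, random-effects structure, and use of statsmodels' variational Bayes mixed GLM are all assumptions; an equivalent model could be fitted with lme4's glmer in R):

```python
import numpy as np
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

# Toy trial-level data standing in for the real dataset.
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "repeat_choice": rng.integers(0, 2, n),          # repeated previous choice?
    "valence": rng.integers(0, 2, n),                # positive vs. negative feedback
    "credibility": rng.choice([0.5, 0.75, 1.0], n),  # 1-, 2-, 3-star agent
    "better_choice": rng.integers(0, 2, n),          # previously chose better bandit?
    "participant": rng.integers(0, 30, n).astype(str),
})

# Binomial mixed model: repetition ~ valence x credibility + better/worse,
# with a participant-level random intercept (assumed structure).
model = BinomialBayesMixedGLM.from_formula(
    "repeat_choice ~ valence * C(credibility) + better_choice",
    {"participant": "0 + C(participant)"},
    df,
)
result = model.fit_vb()  # variational Bayes approximation
print(result.summary())
```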

Mixed-effects binomial regression model regressing choice-repetition on feedback-valence, agent-credibility and better/worse choice from previous trial featuring the same bandit pair.

Based on instructed-credibility Bayesian model simulations. Feedback effect increased as a function of agent-credibility (3-star vs. 2-star: b=0.47, F(1,2436)=581.93; 3-star vs. 1-star: b=0.86, t(2436)=44.52; and 2-star vs. 1-star: b=0.39, t(2436)=21.22, all p’s<0.001). Feedback valence did not exert a positive effect for the 1-star agent (b=−0.01, t(2436)=−0.41, p=0.68).

Mixed-effects binomial regression model regressing choice-repetition on feedback-valence, agent-credibility and better/worse choice from previous trial featuring the same bandit pair.

Based on free-credibility Bayesian model simulations. Feedback effect increased as a function of agent-credibility (3-star vs. 2-star: b=0.70, F(1,2436)=1268.1; 3-star vs. 1-star: b=0.85, t(2436)=43.63; and 2-star vs. 1-star: b=0.15, t(2436)=7.99, all p's<0.001). Feedback valence exerted a positive effect for the 1-star agent (b=0.12, t(2436)=9.48, p<0.001).

Mixed-effects binomial regression model regressing choice-repetition on feedback-valence, agent-credibility and better/worse choice from previous trial featuring the same bandit pair.

Based on null-CA model simulations. The feedback effect did not interact with agent-credibility (F(2,2436)=0.11, p=0.89).

Mixed-effects binomial regression model regressing choice-repetition on feedback-valence, agent-credibility and better/worse choice from previous trial featuring the same bandit pair.

Based on credibility-CA model simulations. Feedback effect increased as a function of agent-credibility (3-star vs. 2-star: b=0.96, F(1,2436)=2009.5; 3-star vs. 1-star: b=1.15, t(2436)=54.5; and 2-star vs. 1-star: b=0.19, t(2436)=9.79, all p's<0.001). Feedback valence exerted a positive effect for the 1-star agent (b=0.25, t(2436)=18.31, p<0.001).

Mixed-effects linear regression model regressing CA on agent-credibility, based on credibility-CA fits of participants’ data.

CA increased as a function of agent-credibility (3-star vs. 2-star: b=1.02, F(1,609)=253.73; 3-star vs. 1-star: b=1.24, t(609)=19.31; and 2-star vs. 1-star: b=0.22, t(609)=3.38, all p's<0.001). We found a positive CA for the 1-star agent (b=0.23, t(609)=4.54, p<0.001).

Mixed-effects linear regression model regressing CA on agent-credibility, based on credibility-CA fits of instructed-credibility Bayesian model simulations.

CA increased as a function of agent-credibility (3-star vs. 2-star: b=0.33, F(1,609)=233.17; 3-star vs. 1-star: b=0.61, t(609)=28.55; and 2-star vs. 1-star: b=0.28, t(609)=13.28, all p's<0.001). There was no evidence that CA for the 1-star agent differed from 0 (b=−0.01, t(609)=−0.31, p=0.76).

Mixed-effects linear regression model regressing CA on agent-credibility, based on credibility-CA fits of free-credibility Bayesian model simulations.

CA increased as a function of agent-credibility (3-star vs. 2-star: b=0.5, F(1,609)=272.63; 3-star vs. 1-star: b=0.61, t(609)=20.05; and 2-star vs. 1-star: b=0.11, t(609)=3.54, all p's<0.001). We detected a positive CA for the 1-star agent (b=0.08, t(609)=3.32, p<0.001).

Mixed-effects linear regression model regressing CA on feedback-valence and agent-credibility, based on credibility-valence-CA fits of participants’ data.

We found an overall positive valence effect (b=0.64, F(1,1218)=37.39, p<0.001), with no interactions with agent credibility (F(2,1218)=0.12, p=0.88).

Mixed-effects linear regression model regressing CA on feedback-valence and agent-credibility, based on credibility-valence-CA fits of instructed-credibility Bayesian model simulations.

We found an overall negative valence effect (b=−0.54, F(1,1218)=101.87, p<0.001), with no interactions with agent credibility (F(2,1218)=0.02, p=0.98).

Mixed-effects linear regression model regressing CA on feedback-valence and agent-credibility, based on credibility-valence-CA fits of free-credibility Bayesian model simulations.

We found an overall negative valence effect (b=−0.54, F(1,1218)=98.91, p<0.001), with no interactions with agent credibility (F(2,1218)=0.06, p=0.94).

Mixed-effects binomial regression model regressing choice-repetition on feedback-valence, agent-credibility and better/worse choice from previous trial featuring the same bandit pair.

Based on participants’ data. Feedback effect increased as a function of agent-credibility, and we found no significant feedback valence effect for the 1-star agent.

Mixed-effects linear regression model regressing CA on agent-credibility, based on credibility-CA fits of participants’ data.

CA increased as a function of agent-credibility. We found no evidence for significant CA for the 1-star agent.

Mixed-effects linear regression model regressing CA on agent-credibility, based on credibility-CA fits of instructed-credibility Bayesian model simulations.

CA increased as a function of agent-credibility. Instructed-credibility Bayesian simulations do not predict significant CA for the 1-star agent.

Mixed-effects linear regression model regressing CA on agent-credibility, based on credibility-CA fits of free-credibility Bayesian model simulations.

CA increased as a function of agent-credibility. Free-credibility Bayesian simulations do not predict significant CA for the 1-star agent.

Mixed-effects linear regression model regressing CA on feedback-valence and agent-credibility, based on credibility-valence-CA fits of participants' data.

We found an overall positive valence effect, which interacted with agent credibility, such that the valence effect was greater for the 2-star agent than for the 4-star agent. Moreover, we found a positive valence effect for the 1-star, 2-star and 3-star agents, but not for the 4-star agent. Finally, negative feedback from the 1-star agent had a significant negative effect on CA, while positive feedback from the same agent had a significant positive effect.

Mixed-effects linear regression model regressing CA on feedback-valence and agent-credibility, based on credibility-valence-CA fits of instructed-credibility Bayesian model simulations.

We found an overall negative valence effect, with no interactions with agent credibility.

Mixed-effects linear regression model regressing CA on feedback-valence and agent-credibility, based on credibility-valence-CA fits of free-credibility Bayesian model simulations.

We found an overall negative valence effect, with no interactions with agent credibility.

Statistics summarizing Wilcoxon test results comparing aVBI from participants with the one from instructed-credibility Bayesian simulations.

Participants showed a greater aVBI than predicted by the instructed-credibility Bayesian model, for all levels of credibility.

Statistics summarizing Wilcoxon test results comparing aVBI from participants with the one from free-credibility Bayesian simulations.

Participants showed a greater aVBI than predicted by the free-credibility Bayesian model, for all levels of credibility.

Mixed-effects linear regression model regressing rVBIs on agent-credibility, based on credibility-valence-CA fits of instructed-credibility Bayesian model simulations.

The instructed-credibility Bayesian model predicted a negative rVBI for all credibility levels, with an increase in rVBI for higher credibility-levels.

Mixed-effects linear regression model regressing rVBIs on agent-credibility, based on credibility-valence-CA fits of free-credibility Bayesian model simulations.

We found a negative relative valence effect, with no interactions with agent credibility.

Statistics summarizing Wilcoxon test results comparing aVBI from participants with the one from instructed-credibility Bayesian simulations.

Participants showed a greater aVBI than predicted by the instructed-credibility Bayesian model, for all levels of credibility.

Statistics summarizing Wilcoxon test results comparing aVBI from participants with the one from free-credibility Bayesian simulations.

Participants showed a greater aVBI than predicted by the free-credibility Bayesian model, for all levels of credibility.

Mixed-effects linear regression model regressing rVBIs on agent-credibility, based on credibility-valence-CA fits of instructed-credibility Bayesian model simulations.

We found no relative valence effect, with no interactions with agent credibility.

Mixed-effects linear regression model regressing rVBIs on agent-credibility, based on credibility-valence-CA fits of free-credibility Bayesian model simulations.

We found no relative valence effect, with no interactions with agent credibility.

Fitted parameters from participants for our main CA models.

Values represent the group mean (sd).

Distribution of ML parameters from participants fitted with the Credibility-CA model.

Distribution of ML parameters from participants fitted with the Credibility-Valence CA model.

Distribution of ML parameters from participants fitted with the Truth-CA model.

Fitted parameters from participants for our main Bayesian models.

Values represent the group mean (sd).

Distribution of ML parameters from participants fitted with (a) the instructed-credibility Bayesian model, and (b) the free-credibility Bayesian model.

Fitted parameters from participants for our main CA models.

Values represent the group mean (sd).

Distribution of ML parameters from participants fitted with the Credibility-CA model.

Distribution of ML parameters from participants fitted with the Credibility-Valence CA model.

Distribution of ML parameters from participants fitted with the Truth-CA model.

Fitted parameters from participants for our main Bayesian models.

Values represent the group mean (sd).

Distribution of ML parameters from participants fitted with (a) the instructed-credibility Bayesian model, and (b) the free-credibility Bayesian model.

Parameter recovery for parameters of interest from free-credibility Bayesian model.

a, Recovery for the credibility parameter of the 1-star agent. b, Recovery for the credibility parameter of the 2-star agent. Recoverability is represented by scatter plots between the generative credibility parameters used to create the synthetic datasets (x-axis) and the corresponding credibility parameters fitted from those datasets (y-axis). Circles represent the datapoints of individual simulations; the denoted metric "r" corresponds to the Spearman correlation between the generative and fitted parameters.
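
Recoverability in these scatter plots reduces to the rank correlation between the generating and re-fitted parameters; schematically (toy arrays in place of real simulation output):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
generative = rng.uniform(0.5, 1.0, 200)                        # parameters used to simulate
fitted = np.clip(generative + rng.normal(0, 0.05, 200), 0, 1)  # toy re-fitted values
r, p = spearmanr(generative, fitted)                           # the "r" shown in the plots
```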

Parameter recovery for parameters of interest from credibility-CA model.

Recovery for the CA parameter of the 1-star agent (a), 2-star agent (b), and 3-star agent (c). Recoverability is represented by scatter plots between the generative CA parameters used to create the synthetic datasets (x-axis) and the corresponding CA parameters fitted from those datasets (y-axis). Circles represent the datapoints of individual simulations; the denoted metric "r" corresponds to the Spearman correlation between the generative and fitted parameters.

Parameter recovery for parameters and metrics of interest from credibility-valence-CA model.

Recovery for the CA− and CA+ parameters (top two rows) and their associated absolute valence bias (aVBI) and relative valence bias (rVBI) (bottom two rows) for the 1-star agent (a, d, g, j), 2-star agent (b, e, h, k), and 3-star agent (c, f, i, l). Recoverability is represented by scatter plots between the generative parameters/metrics used to create the synthetic datasets (x-axis) and the corresponding parameters/metrics fitted from those datasets (y-axis). Circles represent the datapoints of individual simulations; the denoted metric "r" corresponds to the Spearman correlation between the generative and fitted parameters.

Model recovery assessment using Parametric Bootstrap Cross-fitting Method (PBCM), AIC, and BIC.

We ran a model recovery analysis to assess how well different model selection methods identify the generative models. We report confusion matrices for comparisons between each Bayesian model variant (instructed-credibility: top; free-credibility: bottom matrices) and the credibility-CA model (without perseveration) for 3 model-comparison methods (PBCM: left; AIC: middle; BIC: right matrices). Each matrix cell displays the percentage of simulated datasets generated by the "row model" for which the "column model" provided a better fit (e.g., the top right cell describes the proportion of datasets generated by a Bayesian variant that were better fitted by the credibility-CA model). These proportions were calculated according to the following method (applied separately for each compared model pair). For each model under comparison (Bayesian or credibility-CA), we created 100 synthetic datasets per participant using their empirical maximum likelihood (ML) parameter estimates. Each of these synthetic datasets was then fitted with both models. Next, we calculated, for each participant (and each model-comparison method), an individual-level 2×2 confusion matrix displaying the proportion of that individual's "row model" generated datasets (out of 100) that were best fitted by each of the two models. Finally, the matrices shown in the figure represent the across-participant average of these individual-level confusion matrices. The results clearly demonstrate that the PBCM provides an unbiased recovery of both the Bayesian and the credibility-CA models, while achieving the highest rate of correct classifications (the average of the main-diagonal matrix cells). In contrast, both the AIC and BIC methods exhibit a significant bias towards selecting the Bayesian models, with a substantial (for BIC, a majority) selection of a Bayesian model as the winning model for data generated by the CA model.
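
A minimal sketch of the aggregation step (hypothetical boolean arrays of shape (participants, datasets), True where the credibility-CA model won):

```python
import numpy as np

def mean_confusion(ca_wins_on_bayes_data, ca_wins_on_ca_data):
    """Across-participant average 2x2 confusion matrix. Rows: generating
    model (Bayesian, CA); columns: winning model (Bayesian, CA). Inputs are
    (n_participants, n_datasets) boolean arrays marking CA-model wins."""
    row_bayes = np.stack([1 - ca_wins_on_bayes_data.mean(1),
                          ca_wins_on_bayes_data.mean(1)], axis=1)
    row_ca = np.stack([1 - ca_wins_on_ca_data.mean(1),
                       ca_wins_on_ca_data.mean(1)], axis=1)
    return np.stack([row_bayes.mean(0), row_ca.mean(0)])
```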

Contextual effects and learning when the context trial features a different bandit pair than the learning trial.

a, Trials contributing to the analysis of effects of credibility-context on learning from the fully credible agent. We included the same trials as in our main analysis, with one key difference: we only included context trials featuring a different bandit pair than the learning and current trials. We examined how choice-repetition (from trial n−k to trial n) was modulated by feedback valence on the learning trial, and by the feedback agent on the context trial. Note the greyed-out star-rating on the current trial indicates the identity of the current agent and was not included in the analysis. b, Difference in probability of repeating a choice after receiving positive vs negative feedback (i.e., feedback effect) from the 3-star agent, as a function of the credibility context. We found no significant effect of the credibility context on learning from the 3-star agent. c, Difference in feedback-valence effect on choice-repetition between contextual-credibility pairs, based on synthetic simulations of our alternative models. Histograms represent the distribution of regression coefficients based on 101 group-level synthetic datasets simulated from each model. Participants' results were within the range of effects predicted by our main models (more than 5% of group-level simulations predicted an equal or stronger effect). Big circles represent the group mean, and error bars show the standard error of the mean. (*) p<.05, (**) p<.01.

Predicted positivity bias as a function of agent-credibility based on Bayesian account including perseveration.

a, ML parameters from fitting simulations of the instructed-credibility Bayesian model (with perseveration) with the credibility-valence-CA model. Simulations predict a negativity bias (CA+ < CA−) for all credibility levels. b, Absolute valence bias index (aVBI, defined as CA+ − CA−) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. c, Relative valence bias index (rVBI, defined as (CA+ − CA−)/(|CA+| + |CA−|)) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. d-f, Same as a-c, but for simulations based on the extended version of the free-credibility Bayesian model (including perseveration). Small dots represent fitted parameters for individual participants and big diamonds/squares represent the group median (a,b,d,e) or mean (c,f) for the instructed/free-credibility Bayesian model simulations. Error bars show the standard error of the mean.

Predicted positivity bias results for participants and for simulations of the Credibility-CA model (including perseveration, but no valence-bias component).

a, Valence bias results measured in absolute terms (by regressing the ML CA parameters on their associated valence and credibility). b, Difference in positivity bias (measured in absolute terms) across credibility levels. On the x-axis, the hyphen (-) represents subtraction, such that a label of '0.5–1' indicates the difference in the measurement between the 0.5 and 1.0 credibility conditions. These differences are based on the same mixed-effects model as in plot a. The inflation of aVBI for lower-credibility agents is larger than that predicted by a pure perseveration account. c, Valence bias results measured in relative terms (by regressing the rVBIs on their associated credibility). Participants show a higher rVBI than would be predicted by a perseveration account (except for the completely credible agent). d, Difference in rVBI across credibility levels. These differences are based on the same mixed-effects model as in plot c. The inflation of rVBI for lower-credibility agents is larger than that predicted by a pure perseveration account. Histograms depict the distribution of coefficients from 101 simulated group-level datasets generated by the Credibility-CA model and fitted with the Credibility-Valence CA model. Gray circles represent the mean coefficient from these simulations, while black/green circles show the actual regression coefficients from participant behaviour (green for significant effects in participants, black for non-significant). Significance markers (* p<.05, ** p<.01) indicate that fewer than 5% or 1% of simulated datasets, respectively, predicted an effect as strong as or stronger than that observed in participants, and in the same direction as the participant effect.

Predicted positivity bias in discovery study as a function of agent-credibility based on Bayesian account including perseveration.

a, ML parameters from fitting simulations of the instructed-credibility Bayesian model (with perseveration) with the credibility-valence-CA model. Simulations predict a positivity bias (CA+ > CA−) for all credibility levels. b, Absolute valence bias index (aVBI, defined as CA+ − CA−) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. c, Relative valence bias index (rVBI, defined as (CA+ − CA−)/(|CA+| + |CA−|)) based on the ML parameters from the credibility-valence CA model. Positive values indicate a positivity bias, while negative values represent a negativity bias. d-f, Same as a-c, but for simulations based on the extended version of the free-credibility Bayesian model (including perseveration). Small dots represent fitted parameters for individual participants and big diamonds/squares represent the group median (a,b,d,e) or mean (c,f) for the instructed/free-credibility Bayesian model simulations. Error bars show the standard error of the mean.

Predicted positivity bias results for participants and for simulations of the Credibility-CA model (including perseveration, but no valence-bias component) in discovery study.

a, Valence bias results measured in absolute terms (by regressing the ML CA parameters on their associated valence and credibility). b, Difference in positivity bias (measured in absolute terms) across credibility levels. On the x-axis, the hyphen (-) represents subtraction, such that a label of '0.5–1' indicates the difference in the measurement between the 0.5 and 1.0 credibility conditions. These differences are based on the same mixed-effects model as in plot a. The inflation of aVBI for lower-credibility agents is larger than that predicted by a pure perseveration account. c, Valence bias results measured in relative terms (by regressing the rVBIs on their associated credibility). Participants show a higher rVBI than would be predicted by a perseveration account (except for the completely credible agent). d, Difference in rVBI across credibility levels. These differences are based on the same mixed-effects model as in plot c. The inflation of rVBI for lower-credibility agents is larger than that predicted by a pure perseveration account. Histograms depict the distribution of coefficients from 101 simulated group-level datasets generated by the Credibility-CA model and fitted with the Credibility-Valence CA model. Gray circles represent the mean coefficient from these simulations, while black/green circles show the actual regression coefficients from participant behavior (green for significant effects in participants, black for non-significant). Significance markers (* p<.05, ** p<.01) indicate that fewer than 5% or 1% of simulated datasets, respectively, predicted an effect as strong as or stronger than that observed in participants, and in the same direction as the participant effect.

Credit assignment is enhanced for feedback inferred to be true, even when controlling for positivity bias.

a, Maximum likelihood (ML) estimate of the "truth-bonus" parameter derived from the "Truth-CA" model with valence bias in the main study. b, Distribution of truth-bonus parameters predicted by synthetic simulations of our alternative computational models in the main study. For each alternative model, we generated 101 group-level synthetic datasets based on the maximum likelihood parameters fitted to the participants' actual behaviour. Each of these synthetic datasets was then independently fitted with the "Truth-CA" model with valence bias. Each histogram represents the distribution of the mean truth bonus across the 101 simulated datasets for a specific alternative model. Notably, the truth bonus observed in our participants was significantly higher than the truth bonus predicted by all alternative models, with the exception of the instructed-credibility Bayesian model. c-d, Same as a-b, but for the discovery study. (*) p<.05, (***) p<.001