(A) Left: Animals' choices were simulated using standard reinforcement learning (RL) models (see Figure 8—figure supplements 1 and 2 and Materials and methods). Dotted lines show the performance of the model in predicting monkeys' choices. Solid lines show monkeys' choice behaviour (identical to Figure 5B). The parameters of the RL model were optimized separately for each behavioural session (Supplementary file 2). Right: The RL model's session-by-session probability of choosing the novel cue, estimated using the model's optimized parameters, versus monkeys' session-by-session probability of choosing the novel cue. (B) Upper panel: Regression of neuronal population responses to cues onto trial-by-trial chosen values estimated from the RL model fitted to monkeys' choice data. Lower panel: Regression of neuronal population responses to cues onto trial-by-trial unchosen values estimated from the RL model fitted to the choice data. (C) Regression of neuronal population responses to cues onto trial-by-trial relative chosen values (i.e. chosen value − unchosen value) estimated from the RL model fitted to the choice data. Importantly, the chosen and unchosen value variables were not, on average, strongly correlated (r = −0.039, Pearson's correlation), and we excluded from this analysis sessions in which the absolute value of the correlation coefficient between the chosen and unchosen value variables was larger than 0.25. In B and C, the neuronal responses were measured 0.4–0.65 s after cue onset (i.e. dopamine value signals) and are regressed against value estimates of the superior model. In explaining the neuronal responses, relative chosen value outperformed the other variables in all six models tested. See Figure 8—figure supplement 2B for regression of responses measured 0.1–0.2 s after cue onset (i.e. dopamine novelty responses) onto model-derived novelty estimates.
Regression of the whole neuronal response (0.1–0.65 s after cue onset) against value estimates of the RL model further confirmed relative chosen value as the best explanatory variable (R² = 0.57, 0.61 and 0.83 for unchosen, chosen and relative chosen values, respectively). In all plots, all trials of the learning blocks are included (regression results are similar after excluding the first five trials of each session).
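The value variables and the session-exclusion rule described in this legend can be illustrated with a minimal sketch. It is not the fitted models themselves (those are specified in Materials and methods): it assumes a two-cue task, a simple delta-rule value update, and a fixed learning rate `alpha`. The function names `simulate_values`, `keep_session` and `r_squared` are hypothetical and introduced here for illustration only.

```python
import numpy as np

def simulate_values(choices, rewards, alpha=0.2):
    """Return trial-by-trial chosen and unchosen values.

    Assumption: two cues (coded 0 and 1) and a delta-rule update,
    used as a stand-in for the RL models fitted in the paper.
    """
    q = np.zeros(2)  # current value estimate of each cue
    chosen, unchosen = [], []
    for c, r in zip(choices, rewards):
        # Values are read out before the outcome of the trial is known.
        chosen.append(q[c])
        unchosen.append(q[1 - c])
        # Update only the chosen cue with the reward prediction error.
        q[c] += alpha * (r - q[c])
    return np.array(chosen), np.array(unchosen)

def keep_session(chosen, unchosen, max_abs_r=0.25):
    """Keep a session only if |Pearson r| between the chosen and
    unchosen values does not exceed the threshold (0.25 in the legend)."""
    if np.std(chosen) == 0 or np.std(unchosen) == 0:
        return False  # correlation undefined; excluded in this sketch
    r = np.corrcoef(chosen, unchosen)[0, 1]
    return bool(abs(r) <= max_abs_r)

def r_squared(x, y):
    """R^2 of a simple least-squares linear regression of y on x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.dot(residuals) / np.sum((y - y.mean()) ** 2)

# Toy session: cue index chosen on each trial and the reward obtained.
choices = [0, 0, 1, 0]
rewards = [1.0, 0.0, 1.0, 1.0]
chosen, unchosen = simulate_values(choices, rewards, alpha=0.5)
relative_chosen = chosen - unchosen  # chosen value minus unchosen value
```

In this sketch, `relative_chosen` would be the regressor corresponding to panel C, and `r_squared(relative_chosen, responses)` would give the variance explained for a vector of neuronal responses aligned to the same trials.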