(a) Participants chose freely between four available options on each trial. A total of 8 options were used in the experiment but only four were used in each run of the task (indicated by option …
Participants controlled a mouse seeking cheese or apples, counterbalanced across participants. Participants could select any of four available locations, denoted in this case by blue circles, again …
T-statistic map for the [1 -1] contrast in GLM1. Positive activation (hotter colours) denotes regions more active during exploitation than exploration, and vice versa for negative activation. Medial …
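The [1 -1] contrast compares the exploitation regressor with the exploration regressor. The sketch below assumes an ordinary least-squares GLM with one indicator column per trial type. For a contrast vector c, the t-statistic is c'β divided by its standard error. The function name `contrast_t` and the simulated data are hypothetical; this is not the authors' fMRI pipeline.

```python
import numpy as np

def contrast_t(y, X, c):
    """t-statistic for contrast c in an OLS GLM y = X @ beta + noise (hypothetical helper)."""
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - rank                      # residual degrees of freedom
    sigma2 = resid @ resid / dof                 # residual variance estimate
    c = np.asarray(c, dtype=float)
    var_c = sigma2 * c @ np.linalg.pinv(X.T @ X) @ c
    return (c @ beta) / np.sqrt(var_c)

# Simulated example: 30 exploit and 30 explore trials, exploit signal higher.
rng = np.random.default_rng(0)
is_exploit = np.array([1] * 30 + [0] * 30)
X = np.column_stack([is_exploit, 1 - is_exploit])   # one indicator per trial type
y = 2.0 * is_exploit + 1.0 * (1 - is_exploit) + rng.normal(0, 0.5, 60)
t = contrast_t(y, X, [1, -1])                        # exploit > explore
```

Reversing the contrast to [-1 1] flips the sign of t. This is why negative values in such a map indicate regions that are more active during exploration.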
T-statistic maps for the entropy (a), whether the participant changed response (b) and reward (c) regressors in GLM2. These variables explain variance in very similar brain regions to those in which variance is …
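As an illustration of an entropy regressor, the sketch below computes the Shannon entropy H = -Σ p log p of a probability distribution over the four available options. It assumes that the distribution is the model's belief about which option is the high-reward location and that entropy is measured in bits. Both are assumptions, and `belief_entropy` is a hypothetical name.

```python
import numpy as np

def belief_entropy(p, base=2.0):
    """Shannon entropy of a (renormalised) distribution; 0 * log 0 is treated as 0."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]                                # drop zero-probability options
    return float(-(nz * np.log(nz)).sum() / np.log(base))

h_uniform = belief_entropy([0.25, 0.25, 0.25, 0.25])   # maximal uncertainty
h_certain = belief_entropy([1.0, 0.0, 0.0, 0.0])       # no uncertainty
```

With four options, entropy ranges from 0 (certain about the best option) to 2 bits (uniform belief).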
T-statistic map for the [1 -1] contrast in GLM3, testing for differences between exploitation and exploration having regressed out effects of the task factors. (a) Image thresholded voxelwise at p<0.…
(a) The currently selected option can be decoded above chance, as expected, in motor and visual cortex (t score map for above chance decoding of the chosen option across subjects; thresholded at p<0.…
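Testing for above-chance decoding across subjects amounts to a one-sample t-test of per-subject accuracy against chance, applied at each voxel or searchlight. The sketch below is minimal and assumes that chance is 1/4 because four options were available in each run. The function name and example accuracies are illustrative only.

```python
import numpy as np

def above_chance_t(accuracies, chance=0.25):
    """One-sample t of per-subject decoding accuracy minus chance (hypothetical helper)."""
    d = np.asarray(accuracies, dtype=float) - chance
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

acc = [0.30, 0.35, 0.28, 0.32]        # made-up accuracies for four subjects
t = above_chance_t(acc)
```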
(a) T-statistic map showing that model probability explains representation strength specifically in mOFC, as in GLM5. Image thresholded at p<0.001; corrected for multiple comparisons. (b) T-statistic map …
This demonstrates that the dimensionality reduction does not introduce bias into the result. The histogram is centred on 0, and the t score of our analysis (−6.8) lies far outside the distribution.
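One way to quantify how far an observed t score falls from a null histogram is an empirical two-sided p-value with the usual +1 correction. In the sketch below, the null t scores are drawn from a standard normal purely as a stand-in, because the procedure that produced the actual null distribution is not described here. `empirical_p` is a hypothetical name.

```python
import numpy as np

def empirical_p(observed, null_samples):
    """Two-sided empirical p-value of an observed statistic against null samples."""
    null = np.asarray(null_samples, dtype=float)
    extreme = np.sum(np.abs(null) >= abs(observed))   # null values at least as extreme
    return (extreme + 1) / (null.size + 1)

rng = np.random.default_rng(1)
null_t = rng.standard_normal(10_000)    # stand-in null distribution centred on 0
p_obs = empirical_p(-6.8, null_t)       # no null sample reaches |t| = 6.8
```

When no null sample is as extreme as the observed value, the p-value reaches its floor of 1 / (n + 1). In that case the resolution of the test is limited by the number of null samples.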
(a) Pupil size mean timecourses throughout a trial. Left: mean timecourses shown for all trials as well as trials split according to whether they were explore or exploit trials, revealing a larger …
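Splitting mean timecourses by trial type can be sketched as follows. The sketch assumes that pupil data are stored as a trials × timepoints array and that blink samples have already been set to NaN. Both are assumptions about preprocessing, and `mean_timecourses` is a hypothetical helper.

```python
import numpy as np

def mean_timecourses(pupil, is_explore):
    """Mean pupil timecourse over all, explore and exploit trials (NaNs ignored)."""
    pupil = np.asarray(pupil, dtype=float)           # shape: (n_trials, n_timepoints)
    is_explore = np.asarray(is_explore, dtype=bool)
    return {
        "all": np.nanmean(pupil, axis=0),
        "explore": np.nanmean(pupil[is_explore], axis=0),
        "exploit": np.nanmean(pupil[~is_explore], axis=0),
    }

tc = mean_timecourses([[1, 2], [3, 4], [5, 6]], [True, True, False])
```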
We checked that our results could not be explained by differences in luminosity between outcomes signalling reward vs. no reward. We counterbalanced the stimuli signalling rewarding vs. non-rewarding …
The figure has the same notation as Figure 3b in the main text. When performing this regression (GLM6) on explore trials, whether participants changed their response on the subsequent trial was …
T-statistic map showing where univariate brain activity explains changes in baseline pupil size, as in GLM8. The analysis was performed within a grey matter mask, hence the streak in the activation. Similar to F…
(a) T-statistic maps for the regressor for switching from exploitation to exploration, as in GLM2 (note this is variance explained by the regressor over and above that explained by the other …
At a reviewer's request, we show individual pupil effects here, since pupil effects are prone to strong individual differences. We present the mean timecourse for each participant for both of …
Figure notation is the same as in Figure 3—figure supplement 1, and the analyses presented are the same, except that trials on which central fixation was broken have been removed. We instructed participants to …
Top panel: an example schedule for one participant/run. Y-axis values 1–4 are the possible high-reward locations; x-axis values are trials. The dashed line shows the ground truth high reward …
Taken together, the model comparison suggests that the Bayesian model, when allowed to adopt different softmax policies in explore and exploit phases, performs very comparably to a model fitted …
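As an illustration of a model with different softmax policies in explore and exploit phases, the sketch below applies a standard softmax over per-option values. Its inverse temperature switches with the phase. The names `beta_exploit` and `beta_explore`, the use of per-option values as input, and the example numbers are hypothetical, and no fitted parameter values are implied.

```python
import numpy as np

def softmax_probs(values, beta):
    """Softmax choice probabilities with inverse temperature beta."""
    v = beta * np.asarray(values, dtype=float)
    v = v - v.max()                      # subtract max for numerical stability
    p = np.exp(v)
    return p / p.sum()

def phase_policy(values, phase, beta_exploit, beta_explore):
    """Choose the inverse temperature according to the current phase (hypothetical)."""
    beta = beta_exploit if phase == "exploit" else beta_explore
    return softmax_probs(values, beta)

vals = [0.7, 0.1, 0.1, 0.1]              # e.g. model belief per option (assumed input)
p_exploit = phase_policy(vals, "exploit", beta_exploit=10.0, beta_explore=1.0)
p_explore = phase_policy(vals, "explore", beta_exploit=10.0, beta_explore=1.0)
```

A lower inverse temperature in the explore phase flattens the choice distribution. A fitted model could, in principle, capture differences in choice stochasticity between phases in this way.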
The last exploit trial is coded as −1, the first explore trial as +1 (there is no trial zero in this plot). CPP peaks on the penultimate trial of the exploit block, or the trial before that.
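The coding scheme described above can be sketched in two steps. First, locate the exploit-to-explore switches. Second, label trials relative to each switch so that the last exploit trial is −1, the first explore trial is +1, and no trial is labelled zero. The helper names are hypothetical, and the sketch assumes a per-trial boolean explore label.

```python
import numpy as np

def switch_indices(is_explore):
    """Indices of the first explore trial after an exploit trial."""
    e = np.asarray(is_explore, dtype=bool)
    return np.flatnonzero(e[1:] & ~e[:-1]) + 1

def relative_trial_codes(n_trials, switch_idx):
    """Code trials relative to a switch: ..., -2, -1 (last exploit), +1 (first explore), +2, ..."""
    idx = np.arange(n_trials)
    return np.where(idx < switch_idx, idx - switch_idx, idx - switch_idx + 1)

switches = switch_indices([0, 0, 1, 1, 0, 1])
codes = relative_trial_codes(5, 2)
```

CPP values could then be averaged within each code across switches to produce a plot of this kind.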
Trials classified as ‘exploit’ (light grey) and ‘core exploit’ (dark grey) using our heuristic method (top) and the HMM method (bottom) for each of the 19 participants included in the main analysis. …
MATLAB code for the Bayesian model in Figure 1, and the behavioural datasets to which the model was fit.