Control of entropy in neural models of environmental state

  1. Timothy H Muller (corresponding author)
  2. Rogier B Mars (corresponding author)
  3. Timothy E Behrens (corresponding author)
  4. Jill X O'Reilly (corresponding author)
  1. University of Oxford, John Radcliffe Hospital, United Kingdom
  2. Radboud University, The Netherlands
  3. University College London, United Kingdom
  4. University of Oxford, United Kingdom
8 figures and 2 additional files

Figures

Figure 1 with 4 supplements
Task and behaviour.

(a) Participants chose freely between four available options on each trial. A total of 8 options were used in the experiment but only four were used in each run of the task (indicated by option …

https://doi.org/10.7554/eLife.39404.002
Figure 1—figure supplement 1
Implementation of the task described in the Main Text and Methods.

Participants controlled a mouse seeking cheese or apples, counterbalanced across participants. Participants could select any of four available locations, denoted in this case by blue circles, again …

https://doi.org/10.7554/eLife.39404.003
Figure 1—figure supplement 2
Exploitation and exploration engage different brain regions.

T-statistic map for the [1 -1] contrast in GLM1. Positive activation (hotter colours) denotes regions more active during exploitation than exploration, and vice versa for negative activation. Medial …

https://doi.org/10.7554/eLife.39404.004
Figure 1—figure supplement 3
Activation related to task factors.

T-statistic maps for the entropy (a), whether changing response (b) and reward (c) regressors in GLM2. These variables explain variance in very similar brain regions to those in which variance is …

https://doi.org/10.7554/eLife.39404.005
Figure 1—figure supplement 4
A large amount of the difference between exploitation and exploration is captured by task factors.

T-statistic map for the [1 -1] contrast in GLM3, testing for differences between exploitation and exploration having regressed out effects of the task factors. (a) Image thresholded voxelwise at p<0.…

https://doi.org/10.7554/eLife.39404.006
Figure 2 with 2 supplements
Probabilistic beliefs represented in mOFC.

(a) The currently selected option can be decoded above chance, as expected, in motor and visual cortex (t score map for above chance decoding of the chosen option across subjects; thresholded at p<0.…

https://doi.org/10.7554/eLife.39404.007
Figure 2—figure supplement 1
Representation strength in mOFC is explained by probability assigned to the currently selected option, as well as the difference between high and low reward exploit periods.

(a) T-statistic map showing model probability explains representation strength specifically in mOFC, as in GLM5. Image thresholded at p<0.001; corrected for multiple comparisons. (b) T-statistic map …

https://doi.org/10.7554/eLife.39404.008
Figure 2—figure supplement 2
Histogram of t scores for the effect of entropy on representation strength (GLM4) for null data produced by shuffling voxel identities prior to PCA.

This demonstrates that the dimensionality reduction does not introduce bias into the result. The histogram is centred about 0, and the t score of our analysis (−6.8) falls outside the distribution.

https://doi.org/10.7554/eLife.39404.009
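The shuffling control in the caption above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' pipeline: the data are synthetic, "representation strength" is replaced by a placeholder (the norm of each PCA-reduced pattern), the t score comes from a single regression rather than a test across participants, and all function names are hypothetical.

```python
import numpy as np

def shuffle_voxel_identities(patterns, rng):
    """Permute the voxel order independently within each trial (row)."""
    return rng.permuted(patterns, axis=1)

def pca_reduce(patterns, n_components):
    """Project mean-centred patterns onto their top principal components."""
    centred = patterns - patterns.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def slope_t_score(x, y):
    """t score for the slope of an ordinary least-squares fit of y on x."""
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = np.sum(xc ** 2)
    slope = np.sum(xc * yc) / sxx
    residuals = yc - slope * xc
    se = np.sqrt(np.sum(residuals ** 2) / (len(x) - 2) / sxx)
    return slope / se

def null_t_scores(patterns, entropy, n_components, n_shuffles, rng):
    """Recompute the t score on data whose voxels were shuffled before PCA."""
    scores = []
    for _ in range(n_shuffles):
        shuffled = shuffle_voxel_identities(patterns, rng)
        reduced = pca_reduce(shuffled, n_components)
        # Placeholder for representation strength (an assumption).
        strength = np.linalg.norm(reduced, axis=1)
        scores.append(slope_t_score(entropy, strength))
    return np.array(scores)

rng = np.random.default_rng(0)
patterns = rng.normal(size=(120, 50))               # synthetic trials x voxels
entropy = rng.uniform(0.0, np.log(4), size=120)     # synthetic per-trial entropy
null = null_t_scores(patterns, entropy, n_components=5, n_shuffles=200, rng=rng)
print(f"null mean = {null.mean():.2f}, null sd = {null.std():.2f}")
```

If the dimensionality reduction introduced no bias, a histogram of `null` would be centred near 0, and an observed t score could be compared against it.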
Figure 3 with 6 supplements
Neuromodulatory systems as a candidate mechanism for increasing flexibility of belief representations.

(a) Pupil size mean timecourses throughout a trial. Left: mean timecourses shown for all trials as well as trials split according to whether they were explore or exploit trials, revealing a larger …

https://doi.org/10.7554/eLife.39404.010
Figure 3—figure supplement 1
Task-related pupil size changes are not explained by outcome stimulus type.

We checked that our results could not be explained by differences in luminosity between outcomes signalling reward vs. no reward. We counterbalanced the stimuli signalling rewarding vs. non-rewarding …

https://doi.org/10.7554/eLife.39404.012
Figure 3—figure supplement 2
The relationship between change in baseline pupil diameter and change in representation strength in mOFC replicates in explore trials.

The figure has the same notation as Figure 3b in the main text. When performing this regression (GLM6) on explore trials, whether participants changed their response on the subsequent trial was …

https://doi.org/10.7554/eLife.39404.014
Figure 3—figure supplement 3
ACC activity explains changes in baseline pupil size across all trials.

T-statistic map showing where univariate brain activity explains changes in baseline pupil size, as in GLM8. Analysis performed in a grey matter mask, hence the streak in the activation. Similar to F…

https://doi.org/10.7554/eLife.39404.015
Figure 3—figure supplement 4
ACC is engaged when transitioning from exploitation to exploration.

(a) T-statistic maps for the regressor for switching from exploitation to exploration, as in GLM2 (note this is variance explained by the regressor over and above that explained by the other …

https://doi.org/10.7554/eLife.39404.016
Figure 3—figure supplement 5
Individual pupil effects.

Here we show – as requested by a reviewer – individual pupil effects, since pupil effects are prone to strong individual differences. We present the mean timecourse for each participant for both of …

https://doi.org/10.7554/eLife.39404.011
Figure 3—figure supplement 6
Breaking central fixation does not alter pupil effects.

Figure notation is the same as in Figure 3—figure supplement 1, and the analyses presented are the same, but with trials on which central fixation was broken removed. We instructed participants to …

https://doi.org/10.7554/eLife.39404.013
Appendix 1—figure 1
Following the same layout as Figure 1 in the main text.

Top panel – an example schedule for one participant/run. Y axis values 1–4 are the possible high reward locations, x axis values are trials. The dashed line shows the ground truth high reward …

https://doi.org/10.7554/eLife.39404.020
Appendix 1—figure 2
Model log likelihoods for each participant and each model, based on predicting participants’ choices across all trials.

Taken together, the model comparison suggests that the Bayesian model, when allowed to adopt different softmax policies in explore and exploit phases, performs very comparably to a model fitted …

https://doi.org/10.7554/eLife.39404.021
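As a rough illustration of the model comparison in the caption above, the sketch below scores a sequence of choices under a softmax policy whose inverse temperature differs between explore and exploit trials. It is a minimal sketch under stated assumptions: the belief values, the phase labels, the parameter names `beta_explore` and `beta_exploit`, and the choice to apply the softmax directly to belief probabilities are all hypothetical and are not taken from the authors' MATLAB code.

```python
import numpy as np

def softmax(values, beta):
    """Row-wise softmax with inverse temperature beta (scalar or per-row)."""
    scaled = beta * (values - values.max(axis=-1, keepdims=True))
    expd = np.exp(scaled)
    return expd / expd.sum(axis=-1, keepdims=True)

def choice_log_likelihood(beliefs, choices, is_explore, beta_explore, beta_exploit):
    """Sum of log probabilities of the observed choices.

    beliefs    : (n_trials, n_options) model probabilities per option
    choices    : (n_trials,) index of the chosen option
    is_explore : (n_trials,) True on explore trials, False on exploit trials
    """
    # One inverse temperature per trial, chosen by phase.
    betas = np.where(is_explore, beta_explore, beta_exploit)[:, None]
    probs = softmax(beliefs, betas)
    chosen = probs[np.arange(len(choices)), choices]
    return float(np.log(chosen).sum())

# Synthetic example with four options and three trials.
beliefs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.10, 0.70, 0.10],
])
choices = np.array([0, 1, 2])
is_explore = np.array([False, True, False])

ll_flat = choice_log_likelihood(beliefs, choices, is_explore, 0.0, 0.0)
ll_fit = choice_log_likelihood(beliefs, choices, is_explore, 1.0, 5.0)
print(ll_flat, ll_fit)
```

With both inverse temperatures at zero the policy is uniform, so the log likelihood equals the number of trials times log(1/4). A phase-specific policy scores higher here because the exploit-trial choices match the strongest beliefs.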
Appendix 1—figure 3
Change point probability (CPP) in the trials leading up to, and following, the transition from the exploit phase to the explore phase.

The last exploit trial is coded as −1, the first explore trial as +1 (there is no trial zero in this plot). CPP peaks on the penultimate trial of the exploit block, or the trial before that.

https://doi.org/10.7554/eLife.39404.022
Appendix 1—figure 4
Explore and Exploit phases extracted by the Ebitz model.

Trials classified as ‘exploit’ (light grey) and ‘core exploit’ (dark grey) using our heuristic method (top) and the HMM method (bottom) for each of the 19 participants included in the main analysis. …

https://doi.org/10.7554/eLife.39404.023
Author response image 1

Additional files

Source code 1

MATLAB code for the Bayesian model in Figure 1, and the behavioural datasets to which the model was fit.

https://doi.org/10.7554/eLife.39404.017
Transparent reporting form
https://doi.org/10.7554/eLife.39404.018
