A causal role for right frontopolar cortex in directed, but not random, exploration
Figures

The horizon task.
Participants make a series of decisions between two one-armed bandits that pay out probabilistic rewards with unknown means. At the start of each game, ‘forced-choice’ trials give participants partial information about the mean of each option. We use the forced-choice trials to set up one of two information conditions: (A) an unequal (or [1 3]) condition, in which participants see 1 play from one option and 3 plays from the other, and (B) an equal (or [2 2]) condition, in which participants see 2 plays from both options. A model-free measure of directed exploration is then defined as the change in information seeking with horizon in the unequal condition (A). Likewise, a model-free measure of random exploration is defined as the change in choosing the low-mean option with horizon in the equal condition (B).
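As an illustration of how the two model-free measures could be computed from first-free-choice data, the following Python sketch takes one entry per game. The function name, argument names and the string labels "unequal"/"equal" are hypothetical. Directed exploration is taken as p(high info) in horizon 6 minus horizon 1 in the [1 3] condition, and random exploration as the probability of choosing the low-mean option in horizon 6 minus horizon 1 in the [2 2] condition.

```python
import numpy as np


def model_free_measures(horizon, condition, chose_high_info, chose_low_mean):
    """Return (directed, random) model-free exploration measures.

    horizon:         1 or 6 for each game
    condition:       "unequal" ([1 3]) or "equal" ([2 2]) for each game
    chose_high_info: 1 if the first free choice was the less-played option
    chose_low_mean:  1 if the first free choice was the lower-mean option
    """
    horizon = np.asarray(horizon)
    condition = np.asarray(condition)
    chose_high_info = np.asarray(chose_high_info, dtype=float)
    chose_low_mean = np.asarray(chose_low_mean, dtype=float)

    def prob(choices, h, cond):
        # Proportion of games in this horizon/condition cell where the choice was made.
        mask = (horizon == h) & (condition == cond)
        return choices[mask].mean()

    # Directed: horizon-related increase in information seeking, [1 3] games only.
    directed = prob(chose_high_info, 6, "unequal") - prob(chose_high_info, 1, "unequal")
    # Random: horizon-related increase in choosing the low-mean option, [2 2] games only.
    random_expl = prob(chose_low_mean, 6, "equal") - prob(chose_low_mean, 1, "equal")
    return directed, random_expl
```

Restricting each measure to one information condition mirrors the design above: in [2 2] games neither option is more informative, so choosing the low-mean option cannot be explained by information seeking.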

The reward-information confound.
The y-axis corresponds to the correlation between the sign of the difference in mean reward between the two options and the sign of the difference in the number of times each option has been played. The forced trials are chosen such that this correlation is approximately zero on the first free-choice trial. After the first trial, however, a positive correlation quickly emerges as participants choose the more rewarding option more frequently. This strong confound between reward and information makes it difficult to dissociate directed and random exploration on later trials.
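A minimal sketch of the confound measure, assuming one value per game of (a) the difference in observed mean reward between the two options and (b) the difference in how often each has been played, both taken on the same trial. The function name is hypothetical; it correlates the signs of the two differences across games.

```python
import numpy as np


def sign_correlation(mean_diff, count_diff):
    """Pearson correlation between sign(mean difference) and sign(play-count difference).

    mean_diff:  observed mean reward of option A minus option B, one value per game
    count_diff: plays of option A minus plays of option B so far, one value per game
    """
    s_reward = np.sign(np.asarray(mean_diff, dtype=float))
    s_info = np.sign(np.asarray(count_diff, dtype=float))
    # Off-diagonal entry of the 2x2 correlation matrix.
    return np.corrcoef(s_reward, s_info)[0, 1]
```

If participants mostly pick the option with the higher observed mean, that option accumulates more plays, so the two signs agree more and more often and the correlation rises toward 1.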

Model-free analysis of the first free-choice trial shows that RFPC stimulation affects directed, but not random, exploration.
(A) In the control (vertex) condition, information seeking increases with horizon, consistent with directed exploration. When RFPC is stimulated, directed exploration is reduced, an effect driven entirely by changes in horizon 6 (* and ** mark statistically significant differences, with ** the stricter threshold; error bars are s.e.m.). (B) Random exploration increases with horizon but is not affected by RFPC stimulation.
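The within-subject TMS effect and its error bar can be illustrated with a short sketch that, assuming one value of a model-free measure per subject and stimulation condition, computes the mean and standard error of the RFPC-minus-vertex difference. The helper is hypothetical and is not the authors' analysis code.

```python
import numpy as np


def tms_effect(vertex, rfpc):
    """Mean and s.e.m. of the paired change (RFPC minus vertex) across subjects."""
    diff = np.asarray(rfpc, dtype=float) - np.asarray(vertex, dtype=float)
    mean = diff.mean()
    # Sample standard deviation (ddof=1) divided by sqrt(n) gives the s.e.m.
    sem = diff.std(ddof=1) / np.sqrt(diff.size)
    return mean, sem
```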

Model-based analysis of the first free-choice trial showing the effect of RFPC stimulation on each of the 13 parameters.
Left column: posterior distributions over each parameter value for the RFPC and vertex stimulation conditions. Right column: posterior distributions over the change in each parameter between stimulation conditions. Because information bonus, decision noise and spatial bias are all in units of points, we plot them on the same scale to facilitate comparison of effect sizes.
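Because each MCMC sample is a joint draw of all parameters, a posterior over the change in one parameter (right column) can be obtained by differencing RFPC and vertex samples draw by draw. The sketch below assumes the two arrays come from the same chain and are aligned by sample index; the function name and the 95% interval are illustrative choices.

```python
import numpy as np


def summarize_change(samples_rfpc, samples_vertex, level=0.95):
    """Summarize the posterior over (RFPC - vertex) for one parameter.

    Assumes both arrays are aligned draws from the same joint posterior.
    """
    delta = np.asarray(samples_rfpc, dtype=float) - np.asarray(samples_vertex, dtype=float)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(delta, [tail, 1.0 - tail])  # central credible interval
    return {
        "mean": float(delta.mean()),
        "ci": (float(lo), float(hi)),
        "p_below_zero": float((delta < 0).mean()),  # posterior P(change < 0)
    }
```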

Correlation between TMS-induced changes in information bonus and TMS-induced changes in the prior mean.
(A, B) Samples from the posterior distributions over the TMS-related change in prior mean and the TMS-related change in information bonus in horizon 1 (A) and horizon 6 (B). In both cases there is a negative correlation between the change in prior mean and the change in information bonus, consistent with a tradeoff between these variables in the model. (C) Samples from the posterior over the effect of TMS stimulation on the horizon-related change in information bonus, plotted against samples of the TMS-related change in prior mean. Here there is no correlation between the two variables, and the majority of samples lie below zero, consistent with an effect of RFPC stimulation on directed exploration.
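One way to see why panel C decorrelates: a shift in prior mean trades off against the information bonus in both horizons, so taking the horizon 6 minus horizon 1 difference of the TMS effects cancels much of that shared component. The function below is a sketch assuming aligned joint posterior samples for each quantity; its name and arguments are hypothetical.

```python
import numpy as np


def horizon_interaction(a1_rfpc, a1_vertex, a6_rfpc, a6_vertex, r0_rfpc, r0_vertex):
    """Per-sample TMS effects on information bonus and their correlation with prior-mean change."""
    d_a1 = np.asarray(a1_rfpc) - np.asarray(a1_vertex)  # TMS change in bonus, horizon 1
    d_a6 = np.asarray(a6_rfpc) - np.asarray(a6_vertex)  # TMS change in bonus, horizon 6
    d_r0 = np.asarray(r0_rfpc) - np.asarray(r0_vertex)  # TMS change in prior mean
    interaction = d_a6 - d_a1  # TMS effect on the horizon-related change in bonus
    return {
        "corr_h1": np.corrcoef(d_r0, d_a1)[0, 1],
        "corr_h6": np.corrcoef(d_r0, d_a6)[0, 1],
        "corr_interaction": np.corrcoef(d_r0, interaction)[0, 1],
        "p_interaction_below_zero": float((interaction < 0).mean()),
    }
```

With synthetic samples in which both bonus changes equal the negative prior-mean change plus small noise (and the horizon 6 change is shifted down by a constant), corr_h1 and corr_h6 come out strongly negative while corr_interaction is near zero.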

Model-free analysis of all trials.
(A, B) Model-free measures of directed (A) and random (B) exploration as a function of trial number suggest a reduction in both directed and random exploration over the course of the game. (C, D) TMS-induced change in the measures of directed and random exploration as a function of trial number. This suggests that the reduction in directed exploration on the first free-choice trial persists into the second trial of the game.

Correlation between individual differences in the levels of directed and random exploration in a sample of 277 people performing the Horizon Task.
https://doi.org/10.7554/eLife.27430.009
Graphical representation of the model.
Each variable is represented by a node, with edges denoting dependencies between variables. Shaded nodes correspond to observed variables, that is, the free choices, forced-trial rewards and forced-trial choices. Unshaded nodes correspond to unobserved variables whose values are inferred by the model.
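The observed forced-trial choices and rewards feed the learning part of the model, which the model-parameters table describes with a prior mean, an initial learning rate and an asymptotic learning rate. The exact update equation is not given in this section, so the sketch below uses a delta rule with a hypothetical schedule in which each option's learning rate moves halfway from its current value toward the asymptotic rate after every observation of that option. Treat the schedule and the function name as assumptions.

```python
import numpy as np


def learn_means(prior_mean, alpha_start, alpha_inf, forced_choices, forced_rewards, n_options=2):
    """Delta-rule estimates of each option's mean after the forced trials.

    Hypothetical schedule: after each observation of an option, its learning
    rate moves halfway from its current value toward alpha_inf.
    """
    estimates = np.full(n_options, float(prior_mean))
    alphas = np.full(n_options, float(alpha_start))
    for choice, reward in zip(forced_choices, forced_rewards):
        # Prediction-error update for the option that was played.
        estimates[choice] += alphas[choice] * (reward - estimates[choice])
        # Assumed decay of that option's learning rate toward the asymptote.
        alphas[choice] += 0.5 * (alpha_inf - alphas[choice])
    return estimates
```

For example, with a prior mean of 50, an initial rate of 1.0 and an asymptotic rate of 0.2, a [1 3] game with choices [0, 1, 1, 1] and rewards [60, 40, 44, 48] yields estimates of 60 and 44.64.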

No difference in effects between original and replication experiments.
In each panel we plot the model-free measures of directed and random exploration and how they change between stimulation conditions. For example, in panel A we plot p(high info) in horizon 1 for vertex stimulation (x-axis) against RFPC stimulation (y-axis). Each point is a single subject, and the diagonal line represents equality. Participants below the diagonal have a smaller value of p(high info) in the RFPC stimulation condition. From this we can clearly see that there is no effect of RFPC stimulation on directed exploration in horizon 1 (A) or on random exploration in either horizon (B, D). However, there is a strong effect of RFPC stimulation on directed exploration in horizon 6, with the majority of points lying below the diagonal (C). Moreover, the original and replication datasets point to the same conclusions in all four panels.
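One simple way to quantify "the majority of points lying below the diagonal" is an exact sign test on the paired per-subject values. The sketch below is an illustration rather than the statistic used in the paper; ties on the diagonal are discarded, and the function name is hypothetical.

```python
from math import comb


def sign_test(vertex, rfpc):
    """Count subjects below the diagonal (rfpc < vertex) and run an exact two-sided sign test."""
    below = sum(r < v for v, r in zip(vertex, rfpc))
    above = sum(r > v for v, r in zip(vertex, rfpc))
    n = below + above  # subjects exactly on the diagonal are dropped
    k = min(below, above)
    # Probability of a split at least this extreme under a fair coin, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p = min(1.0, 2 * tail)
    return below, n, p
```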

Effect of TMS on information bonus in a model in which the bonus is proportional to uncertainty.

Effect of TMS on decision noise in a model in which the bonus is a linear function of uncertainty.
Tables
Model parameters.
Each subject’s behavior on the first free choice of each session is described by 13 free parameters. Three of these parameters (prior mean, initial learning rate and asymptotic learning rate) describe the learning process and do not vary with horizon or uncertainty condition. The other ten (information bonus, spatial bias and decision noise in the different horizon and information conditions) describe the decision process. All parameters are estimated for each subject in each stimulation condition, and the key analysis asks whether parameters change between vertex and RFPC stimulation.
Parameter | Horizon dependent? | Uncertainty dependent? | TMS dependent? |
---|---|---|---|
prior mean | no | no | yes |
initial learning rate | no | no | yes |
asymptotic learning rate | no | no | yes |
information bonus | yes | n/a | yes |
spatial bias | yes | yes | yes |
decision noise | yes | yes | yes |
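The decision parameters in the table combine with the learned means to give a choice probability. A common form for this kind of model, assumed here rather than taken from this section, adds the information bonus (scaled by which option is more informative) and the spatial bias to the difference in estimated means, all in points, and passes the result through a logistic function with the decision noise as temperature. The +1/-1/0 coding of the information difference and the function name are assumptions.

```python
import math


def p_choose_right(mean_right, mean_left, delta_info, info_bonus, spatial_bias, noise):
    """Probability of choosing the right option under an assumed logistic choice rule.

    delta_info: +1 if the right option is more informative, -1 if the left one is,
                0 in the equal-information [2 2] condition (assumed coding).
    """
    dq = (mean_right - mean_left) + info_bonus * delta_info + spatial_bias
    z = dq / noise
    # Numerically stable logistic: avoid exp overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)
```

Under this form, a larger information bonus pushes choices toward the more informative option (directed exploration), while larger decision noise flattens the curve toward 0.5 (random exploration).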
Model parameters, priors, hyperparameters and hyperpriors.
https://doi.org/10.7554/eLife.27430.011
Parameter | Prior | Hyperpriors on the prior's two hyperparameters (in order) |
---|---|---|
prior mean | Gaussian | Gaussian(50, 14); Gamma(1, 0.001) |
initial learning rate | Beta | Uniform(0.1, 10); Uniform(0.5, 10) |
asymptotic learning rate | Beta | Uniform(0.1, 10); Uniform(0.1, 10) |
information bonus | Gaussian | Gaussian(0, 100); Gamma(1, 0.001) |
spatial bias | Gaussian | Gaussian(0, 100); Gamma(1, 0.001) |
decision noise | Gamma | Exp(0.1); Exp(10) |
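The learning-rate rows of the hyperprior table can be forward-sampled to see what the hierarchy implies. This sketch assumes Uniform(lo, hi) is uniform on [lo, hi], Beta(a, b) is the standard Beta distribution, and the two hyperpriors listed for each row apply to a and b in that order; the function name is hypothetical. The Gaussian and Gamma rows are left out because the table does not state whether their second argument is a standard deviation, a variance or a precision.

```python
import numpy as np


def sample_learning_rates(rng, n_subjects):
    """Forward-sample subject-level learning rates from the assumed hierarchy."""
    # Group-level hyperparameters for the initial learning rate.
    a_start = rng.uniform(0.1, 10)
    b_start = rng.uniform(0.5, 10)
    # Group-level hyperparameters for the asymptotic learning rate.
    a_inf = rng.uniform(0.1, 10)
    b_inf = rng.uniform(0.1, 10)
    # Subject-level draws share the group-level hyperparameters.
    alpha_start = rng.beta(a_start, b_start, size=n_subjects)
    alpha_inf = rng.beta(a_inf, b_inf, size=n_subjects)
    return alpha_start, alpha_inf
```

Usage: `alpha_start, alpha_inf = sample_learning_rates(np.random.default_rng(1), 100)` returns two arrays of 100 rates in [0, 1].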
Additional files
- Transparent reporting form: https://doi.org/10.7554/eLife.27430.012