Neural computations underlying inverse reinforcement learning in the human brain

  1. Sven Collette (corresponding author)
  2. Wolfgang M Pauli
  3. Peter Bossaerts
  4. John O'Doherty
  1. California Institute of Technology, United States
  2. The University of Melbourne, Australia
5 figures and 1 additional file

Figures

Figure 1 with 1 supplement
Observational slot machine task with hidden outcomes.

(A) Trial timeline. The first screen signals whether the agent or the participant has to make a choice. Subsequently, two slot machines are presented, along with, on agent trials, a (pseudo) video feed …

https://doi.org/10.7554/eLife.29718.002
Figure 1—figure supplement 1
Additional task information.

(A) Pre-MRI phase and agent preferences construction: Participants first made pairwise food comparisons before entering the MRI sessions. From their actual food rating, we then constructed one …

https://doi.org/10.7554/eLife.29718.003
Figure 2 with 1 supplement
Model comparison.

(A) Bar plots illustrating the results of the Bayesian Model Selection (BMS) for the two main model frameworks. The inverse RL algorithm performs best, across both conditions (similar and …

https://doi.org/10.7554/eLife.29718.004
Figure 2—figure supplement 1
Additional model information.

(A) Bayesian Model Selection with additional models: the imitation RL with counterfactuals as described in the main text, and the PRO (probabilistic rank order) model. In the PRO model, the observer …

https://doi.org/10.7554/eLife.29718.005
Figure 3
Outcome prediction signals in agent-referential preference space.

(A) Neural response to parametric changes in inverse RL outcome prediction in agent-referential space. Activity in dmPFC at the time of presumptive agent decision significantly correlated with …

https://doi.org/10.7554/eLife.29718.006
Figure 3—source data 1

Areas exhibiting significant changes in BOLD associated with predicted outcome in the similar and dissimilar conditions.

OFC: orbitofrontal cortex, dmPFC: dorsomedial prefrontal cortex. x y z in MNI coordinates.

https://doi.org/10.7554/eLife.29718.007
Figure 4
Learning signals during action feedback.

(A) Z-statistic map of the inverse RL entropy signals during agent choice revelation is presented, relating to the update of the food distributions within the chosen slot machine. From left to right …

https://doi.org/10.7554/eLife.29718.008
Figure 4—source data 1

Areas exhibiting significant changes in BOLD associated with entropy signals.

Pre-SMA: pre-supplementary motor area. TPJ: temporo-parietal junction. dlPFC: dorsolateral prefrontal cortex. x y z in MNI coordinates.

https://doi.org/10.7554/eLife.29718.009
Figure 5
dmPFC signal predicts performance in slot machine game.

(A) Scatter plot showing beta estimates of outcome prediction signals in the dmPFC ROI across participants, plotted against the social information integration index (SI index), which characterizes …

https://doi.org/10.7554/eLife.29718.010
