Neural signatures of model-based and model-free reinforcement learning across prefrontal cortex and striatum

Bruno Miranda; James L Butler; W M Nishantha Malalasekera; Timothy EJ Behrens; Peter Dayan; Steven W Kennerley

doi:10.7554/eLife.106032.1

Not revised: This Reviewed Preprint includes the authors’ original preprint (without revision), an eLife assessment, and public reviews.

Reviewing Editor
Thorsten Kahnt
National Institute on Drug Abuse Intramural Research Program, Baltimore, United States of America
Senior Editor
Michael Frank
Brown University, Providence, United States of America

Reviewer #1 (Public review):

Summary:

Using single-unit recordings from four regions of the non-human primate brain, the authors tested whether these regions encode computational variables related to model-based and model-free reinforcement learning strategies. While some of the variables appear to be encoded by all regions, there is clear evidence for stronger encoding of model-based information in the anterior cingulate cortex and caudate.

Strengths:

The analyses are thorough, the writing is clear, and the work is well-motivated by prior theory and empirical studies.

Weaknesses:

My comments here are quite minor.

The correlation between transition and reward coefficients is interesting, but I'm a little worried that it might be an artifact. I suspect that reward probability is higher after common transitions, because animals choose actions they expect to lead to higher reward. If so, the coefficients might be correlated simply by virtue of the task design and the fact that all regions are sensitive to reward. Can the authors rule out this possibility (e.g., by simulation)?
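The first link in this argument can be checked by simulation. A minimal sketch, assuming a Daw-style two-step task (0.7/0.3 transitions, reward probabilities drifting between 0.25 and 0.75) and an idealised, purely model-based agent; these settings are illustrative assumptions, not the parameters of the task recorded here. It tests only whether reward becomes more likely after common transitions; whether that in turn produces correlated transition and reward coefficients would further require fitting the paper's GLMs to simulated neurons.

```python
import numpy as np

def simulate_two_step(n_trials=20000, p_common=0.7, alpha=0.2, beta=20.0,
                      drift_sd=0.025, seed=0):
    """Simulate a purely model-based agent on a Daw-style two-step task.

    Returns per-trial arrays: `common` (True for a common transition) and
    `reward` (1.0 if rewarded, else 0.0). All parameter values are
    illustrative assumptions, not the recorded task's settings.
    """
    rng = np.random.default_rng(seed)
    p_reward = rng.uniform(0.25, 0.75, size=2)  # reward probability of each second-stage state
    q_stage2 = np.full(2, 0.5)                   # agent's learned second-stage values
    common = np.empty(n_trials, dtype=bool)
    reward = np.empty(n_trials)
    for t in range(n_trials):
        # Action a leads to state a on common transitions, so its model-based
        # value mixes the two second-stage values by the transition probabilities.
        q_mb = p_common * q_stage2 + (1 - p_common) * q_stage2[::-1]
        p_choose_1 = 1.0 / (1.0 + np.exp(-beta * (q_mb[1] - q_mb[0])))
        action = int(rng.random() < p_choose_1)
        common[t] = rng.random() < p_common
        state = action if common[t] else 1 - action
        reward[t] = float(rng.random() < p_reward[state])
        # Delta-rule update of the visited second-stage state only.
        q_stage2[state] += alpha * (reward[t] - q_stage2[state])
        # Slow Gaussian drift of reward probabilities, reflected at 0.25 and 0.75.
        p_reward = p_reward + rng.normal(0.0, drift_sd, size=2)
        p_reward = np.where(p_reward > 0.75, 1.5 - p_reward, p_reward)
        p_reward = np.where(p_reward < 0.25, 0.5 - p_reward, p_reward)
    return common, reward

common, reward = simulate_two_step()
p_reward_common = reward[common].mean()
p_reward_rare = reward[~common].mean()
corr = np.corrcoef(common.astype(float), reward)[0, 1]
print(f"P(reward | common) = {p_reward_common:.3f}")
print(f"P(reward | rare)   = {p_reward_rare:.3f}")
print(f"corr(common, reward) = {corr:.3f}")
```

If `P(reward | common)` exceeds `P(reward | rare)`, the transition and reward regressors are correlated by construction, which is the condition under which the suspected artifact could arise.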

The explore/exploit section seems somewhat randomly tacked on. Is this really relevant? If yes, then I think it needs to be integrated more coherently.

https://doi.org/10.7554/eLife.106032.1.sa1

Reviewer #2 (Public review):

Summary:

The authors investigate single-neuron activity in rhesus macaques during model-based (MB) and model-free (MF) reinforcement learning (RL). Using a well-established two-step choice task, they analyze neural correlates of MB and MF learning across four brain regions: the anterior cingulate cortex (ACC), dorsolateral PFC (DLPFC), caudate, and putamen. The study provides strong evidence that these regions encode distinct RL-related signals, with ACC playing a dominant role in MB learning and caudate updating value representations after rare transitions. The authors apply rigorous statistical analyses to characterize neural encoding at both population and single-neuron levels.

Strengths:

(1) The research fills a gap in the literature, which has rarely dissociated MB from MF learning directly at the single-unit level or across brain areas known to be involved in reinforcement learning. This study advances our understanding of how different brain regions contribute to RL computations.

(2) The study used a two-step choice task (Miranda et al., 2020) that was previously established for distinguishing MB and MF reinforcement learning strategies.

(3) Recording from multiple brain regions (ACC, DLPFC, caudate, and putamen) enabled comparisons across cortical and subcortical structures.

(4) The study used multiple GLMs, population-level encoding analyses, and decoding approaches. For each analysis, the authors applied appropriate corrections for multiple comparisons and described their methods clearly.

(5) They implemented control regressors to account for neural drift and temporal autocorrelation.

(6) The authors showed evidence for three main findings:
a) ACC is the strongest encoder of MB variables among the four areas, highlighting its role in tracking transition structure and reward-based learning. ACC also showed a sustained representation of feedback that persisted into the next trial.
b) ACC was the only area to encode both MB and MF values.
c) The caudate selectively updates value representations when rare transitions occur, supporting its role in MB updating.

(7) The findings support the idea that MB and MF reinforcement learning operate in parallel rather than strictly competing.

(8) The paper also discusses how MB computations could be an extension of sophisticated MF strategies.
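Strength (5) refers to control regressors for neural drift and temporal autocorrelation without specifying them. A minimal sketch of how such nuisance terms can enter a per-neuron GLM, assuming (hypothetically) a z-scored trial index for slow drift and the previous trial's firing rate for autocorrelation; the function name and the simulated neuron are illustrative, and the paper's actual regressors may differ.

```python
import numpy as np

def fit_glm_with_controls(rate, task_regressors):
    """OLS fit of trial-wise firing rate on task regressors plus nuisance terms.

    Hypothetical controls: a z-scored trial index for slow drift and the
    previous trial's firing rate for temporal autocorrelation. The first trial
    is dropped because it has no lagged rate.
    Returns coefficients ordered as [intercept, *task, drift, lagged rate].
    """
    n = len(rate)
    trial_index = np.arange(n, dtype=float)
    drift = (trial_index - trial_index.mean()) / trial_index.std()
    X = np.column_stack([
        np.ones(n - 1),        # intercept
        task_regressors[1:],   # task variables of interest, one column each
        drift[1:],             # slow drift across the session
        rate[:-1],             # previous trial's firing rate
    ])
    coefs, *_ = np.linalg.lstsq(X, rate[1:], rcond=None)
    return coefs

rng = np.random.default_rng(1)
n_trials = 2000
transition = rng.integers(0, 2, n_trials).astype(float)
reward = rng.integers(0, 2, n_trials).astype(float)
# Simulated neuron that encodes reward only, on top of linear drift and AR(1) noise.
noise = np.zeros(n_trials)
for t in range(1, n_trials):
    noise[t] = 0.5 * noise[t - 1] + rng.normal()
rate = 5.0 + 2.0 * reward + 0.002 * np.arange(n_trials) + noise
coefs = fit_glm_with_controls(rate, np.column_stack([transition, reward]))
names = ["intercept", "transition", "reward", "drift", "lagged_rate"]
print(dict(zip(names, coefs.round(2))))
```

Because the nuisance terms absorb slow trends and trial-to-trial carry-over, the task coefficients are estimated only from variance those terms cannot explain.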

Weaknesses:

(1) There is limited evidence for a causal relationship between neural activity and behavior. The authors cite previous lesion studies, but causality between neural encoding in ACC, caudate, and putamen and behavioral reliance on MB or MF learning is not established.

(2) There is a heavy emphasis on ACC versus other areas, but it is unclear how much of this signal drives behavior relative to the caudate.

(3) The role of the putamen is somewhat underexplored here.

(4) The authors mention that the monkeys were overtrained before recording, which might have biased them toward either an MB or an MF strategy.

(5) The GLM3 model combines MB and MF value estimates but does not clearly state how hyperparameters were optimized to prevent overfitting. Although the hybrid model explains behavior well, it remains unclear whether the MB/MF weighting changes dynamically over time.

(6) It was unclear from the task description whether the images changed periodically, and how the transition effect (e.g., in Figure 3) could be disambiguated from a visual response to the particular pair of cues.
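Weakness (5) concerns how MB and MF values are weighted. The review does not give GLM3's exact form; the sketch below assumes the weighted hybrid commonly used with two-step tasks, in which a single parameter `w` in [0, 1] mixes model-based values (computed from an assumed 0.7/0.3 transition matrix `T`) with cached model-free values before a softmax. All function names are hypothetical. With one fitted `w`, the weighting is static by construction; testing whether it changes over time would require, for example, fitting `w` separately per session or block.

```python
import numpy as np

P_COMMON = 0.7  # assumed probability of a common transition
# T[a, s]: probability that first-stage action a leads to second-stage state s.
T = np.array([[P_COMMON, 1 - P_COMMON],
              [1 - P_COMMON, P_COMMON]])

def model_based_values(q_stage2):
    """Expected value of each first-stage action under the transition model."""
    return T @ q_stage2

def model_free_update(q_mf, action, reward, alpha):
    """Cached first-stage value update that ignores which transition occurred."""
    q_mf = q_mf.copy()
    q_mf[action] += alpha * (reward - q_mf[action])
    return q_mf

def hybrid_choice_probs(q_stage2, q_mf, w, beta):
    """Softmax over Q_net = w * Q_MB + (1 - w) * Q_MF, with 0 <= w <= 1."""
    q_net = w * model_based_values(q_stage2) + (1 - w) * q_mf
    z = beta * (q_net - q_net.max())  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Trial: action 0 chosen, rare transition to state 1, reward delivered.
alpha = 0.5
q_stage2 = np.array([0.5, 0.5])
q_stage2[1] += alpha * (1.0 - q_stage2[1])  # state 1 value -> 0.75
q_mf = model_free_update(np.array([0.5, 0.5]), action=0, reward=1.0, alpha=alpha)
probs = {w: hybrid_choice_probs(q_stage2, q_mf, w, beta=5.0) for w in (0.0, 0.5, 1.0)}
for w, p in probs.items():
    print(f"w = {w}: P(action 0) = {p[0]:.2f}, P(action 1) = {p[1]:.2f}")
```

A pure MF agent (`w = 0`) repeats the rewarded action regardless of the rare transition, whereas a pure MB agent (`w = 1`) shifts toward action 1, which commonly leads to the state that was just rewarded; this divergence is what lets the two-step task separate the two strategies.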

https://doi.org/10.7554/eLife.106032.1.sa0
