On the normative advantages of dopamine and striatal opponency for learning and choice
Abstract
The basal ganglia (BG) contribute to reinforcement learning (RL) and decision making, but unlike artificial RL agents, they rely on complex circuitry and dynamic dopamine (DA) modulation of opponent striatal pathways to do so. We develop the OpAL* model to assess the normative advantages of this circuitry. In OpAL*, learning induces opponent pathways to differentially emphasize the history of positive or negative outcomes for each action. Dynamic DA modulation then amplifies the pathway most tuned for the task environment. This efficient coding mechanism avoids a vexing explore-exploit tradeoff that plagues traditional RL models in sparse reward environments. OpAL* exhibits robust advantages over alternative models, particularly in environments with sparse reward and large action spaces. These advantages depend on opponent and nonlinear Hebbian plasticity mechanisms previously thought to be pathological. Finally, OpAL* captures risky choice patterns arising from DA and environmental manipulations across species, suggesting that they result from a normative biological mechanism.
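The two mechanisms in the abstract (opponent pathways that accumulate positive versus negative outcome histories, and a DA level that reweights them at choice time) can be illustrated with a minimal Python sketch. It assumes OpAL-style rules: a critic prediction error δ = r − V drives weight-proportional (nonlinear Hebbian) updates of a "Go" weight G and a "NoGo" weight N per action, and a DA level ρ sets the pathway gains β_G = β(1 + ρ) and β_N = β(1 − ρ). The rule that maps a Beta posterior over the reward rate to ρ, the class `OpponentAgent`, the helper `run_bandit`, and the parameters `ALPHA_C`, `ALPHA_A`, `BETA`, `K` are hypothetical simplifications for illustration, not the authors' implementation, which is in the repository listed under Data availability.

```python
import math
import random

# Illustrative values chosen for this sketch (hypothetical, not fitted parameters).
ALPHA_C = 0.1  # critic learning rate
ALPHA_A = 0.1  # actor learning rate, shared by the G and N pathways here
BETA = 2.0     # overall softmax gain
K = 1.0        # hypothetical gain from reward-rate belief to DA level; keep <= 1


def softmax(values):
    """Numerically stable softmax over a list of action values."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


class OpponentAgent:
    """Single-state opponent actor-critic in the spirit of OpAL (simplified sketch)."""

    def __init__(self, n_actions):
        self.V = 0.0                 # critic estimate of expected reward
        self.G = [1.0] * n_actions   # direct ("Go") pathway weights
        self.N = [1.0] * n_actions   # indirect ("NoGo") pathway weights
        self.a, self.b = 1.0, 1.0    # Beta(a, b) belief about the reward rate

    def rho(self):
        """Assumed DA rule: posterior mean reward rate above 0.5 gives rho > 0."""
        mean = self.a / (self.a + self.b)
        return max(-1.0, min(1.0, K * (2.0 * mean - 1.0)))

    def choice_probs(self, rho=None):
        """Softmax over beta_G * G - beta_N * N; rho can be overridden for inspection."""
        r = self.rho() if rho is None else rho
        beta_g = BETA * (1.0 + r)  # positive rho boosts the G pathway...
        beta_n = BETA * (1.0 - r)  # ...and damps the N pathway (and vice versa)
        act = [beta_g * g - beta_n * n for g, n in zip(self.G, self.N)]
        return softmax(act)

    def choose(self, rng):
        probs = self.choice_probs()
        return rng.choices(range(len(probs)), weights=probs)[0]

    def update(self, action, reward):
        """Learn from one trial; reward is assumed binary (0 or 1)."""
        delta = reward - self.V
        self.V += ALPHA_C * delta
        # Nonlinear Hebbian updates: each change is proportional to the current weight,
        # so G accumulates positive prediction errors and N accumulates negative ones.
        # With ALPHA_A <= 1 and |delta| <= 1 the weights stay positive.
        self.G[action] += ALPHA_A * self.G[action] * delta
        self.N[action] += ALPHA_A * self.N[action] * (-delta)
        # Update the Beta belief about how rich the environment is.
        self.a += reward
        self.b += 1 - reward


def run_bandit(probs, n_trials=200, seed=0):
    """Train an agent on a Bernoulli bandit with the given reward probabilities."""
    rng = random.Random(seed)
    agent = OpponentAgent(len(probs))
    for _ in range(n_trials):
        action = agent.choose(rng)
        reward = 1 if rng.random() < probs[action] else 0
        agent.update(action, reward)
    return agent
```

For example, `run_bandit([0.8, 0.2], n_trials=500).choice_probs()` gives the trained agent's choice probabilities. Passing `rho` explicitly isolates the modulation effect: if action 0 has G = N = 2 and action 1 has G = N = 1, then ρ = 0 leaves them tied, ρ > 0 favors action 0 (its larger G weight is amplified), and ρ < 0 favors action 1 (action 0's larger N weight is amplified).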
Data availability
The current manuscript is a computational study, so no data have been generated for this manuscript. Simulation code is available in the authors' GitHub repository: https://github.com/amjaskir/opal-star
Article and author information
Funding
National Institute of Mental Health (P50MH119467)
- Michael J Frank
National Institute of Mental Health (R01 MH084840)
- Michael J Frank
National Institutes of Health (S10OD025181)
- Michael J Frank
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Copyright
© 2023, Jaskir & Frank
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
2,333 views, 328 downloads, and 26 citations. Views, downloads and citations are aggregated across all versions of this paper published by eLife.