Dopamine role in learning and action inference
Abstract
This paper describes a framework for modelling dopamine function in the mammalian brain. It proposes that both learning and action planning involve processes that minimize the prediction errors encoded by dopaminergic neurons. In this framework, dopaminergic neurons projecting to different parts of the striatum encode errors in the predictions made by the corresponding systems within the basal ganglia. The dopaminergic neurons encode differences between rewards and expectations in the goal-directed system, and differences between the chosen and habitual actions in the habit system. These prediction errors trigger learning about rewards and habit formation, respectively. Additionally, dopaminergic neurons in the goal-directed system play a key role in action planning: they compute the difference between an available reward and the reward expected from the current motor plan, and they facilitate action planning until this difference diminishes. The presented models account for dopaminergic responses during movements and for the effects of dopamine depletion on behaviour, and they make several experimental predictions.
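To make the planning mechanism concrete, the sketch below illustrates the core idea from the abstract: a dopaminergic error, the difference between the available reward and the reward expected under the current motor plan, gates iterative refinement of the plan until the error diminishes. This is a minimal illustration, not the paper's actual model; the Gaussian form of expected_reward and all names and parameters (target, eta, tol) are assumptions made for the example.

```python
import numpy as np

def expected_reward(plan, target):
    """Toy stand-in for the reward expected under the current motor plan:
    maximal (= 1) when the plan matches a hypothetical target action."""
    return np.exp(-np.sum((plan - target) ** 2))

def infer_action(target, r_available=1.0, eta=0.5, tol=1e-3, max_iter=2000):
    """Refine the motor plan until the planning error
    delta = r_available - expected_reward(plan) falls below tol."""
    plan = np.zeros_like(target)                    # initial motor plan
    for _ in range(max_iter):
        delta = r_available - expected_reward(plan, target)  # dopaminergic error
        if delta < tol:                             # error has diminished: stop
            break
        # Gradient of expected_reward with respect to the plan (analytic for
        # this toy Gaussian form); delta scales the planning step, so planning
        # slows as the expected reward approaches the available reward.
        grad = -2.0 * (plan - target) * expected_reward(plan, target)
        plan = plan + eta * delta * grad
    return plan, delta

plan, delta = infer_action(target=np.array([0.7, -0.3]))
print(plan, delta)  # the plan converges toward the target as delta shrinks
```

In the same spirit, the learning signal in the goal-directed system would be the familiar reward prediction error delta = r - V, with the expectation updated as V += alpha * delta; the sketch above illustrates only the planning role of the error.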
Data availability
MATLAB code for all simulations described in the paper is available from the MRC Brain Network Dynamics Unit Data Sharing Platform (https://data.mrc.ox.ac.uk/data-set/simulations-action-inference).
- Simulations of action inference. MRC Brain Network Dynamics Unit Data Sharing Platform.
Article and author information
Author details
- Rafal Bogacz, MRC Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
Funding
Medical Research Council (MC_UU_12024/5)
- Rafal Bogacz
Medical Research Council (MC_UU_00003/1)
- Rafal Bogacz
Biotechnology and Biological Sciences Research Council (BB/S006338/1)
- Rafal Bogacz
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Thorsten Kahnt, Northwestern University, United States
Publication history
- Received: November 1, 2019
- Accepted: July 6, 2020
- Accepted Manuscript published: July 7, 2020 (version 1)
- Version of Record published: July 30, 2020 (version 2)
Copyright
© 2020, Bogacz
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.