Dopamine role in learning and action inference

  1. Rafal Bogacz (corresponding author)
  1. University of Oxford, United Kingdom

Abstract

This paper describes a framework for modelling dopamine function in the mammalian brain. It proposes that both learning and action planning involve processes minimizing prediction errors encoded by dopaminergic neurons. In this framework, dopaminergic neurons projecting to different parts of the striatum encode errors in predictions made by the corresponding systems within the basal ganglia. The dopaminergic neurons encode differences between rewards and expectations in the goal-directed system, and differences between the chosen and habitual actions in the habit system. These prediction errors trigger learning about rewards and habit formation, respectively. Additionally, dopaminergic neurons in the goal-directed system play a key role in action planning: they compute the difference between an available reward and the reward expected from the current motor plan, and they facilitate action planning until this difference diminishes. The presented models account for dopaminergic responses during movements and for the effects of dopamine depletion on behaviour, and they make several experimental predictions.
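The two computations described in the abstract, prediction-error-driven learning of reward expectations and iterative adjustment of a motor plan until the error diminishes, can be sketched in a few lines. The following is a minimal toy illustration of that idea, not the paper's Matlab implementation; all function names, variable names, and parameter values (e.g. the learning rate `alpha`) are assumptions made for this sketch.

```python
def reward_prediction_error(reward, expectation):
    """Goal-directed system: dopamine is proposed to encode reward minus expectation."""
    return reward - expectation

def learn_expectation(expectation, reward, alpha=0.1):
    """Learning: nudge the reward expectation by a fraction of the prediction error."""
    delta = reward_prediction_error(reward, expectation)
    return expectation + alpha * delta

def plan_action(available_reward, predicted_reward_fn, plan,
                rate=0.5, tol=1e-3, max_iter=1000):
    """Action planning: adjust a scalar motor plan until the reward predicted
    from the current plan approaches the available reward, i.e. until the
    error computed by the goal-directed system diminishes."""
    for _ in range(max_iter):
        error = available_reward - predicted_reward_fn(plan)
        if abs(error) < tol:
            break
        plan += rate * error  # a nonzero error facilitates further planning
    return plan

# Learning: the expectation converges toward the delivered reward.
expectation = 0.0
for _ in range(100):
    expectation = learn_expectation(expectation, reward=1.0)

# Planning: with an identity mapping from plan to predicted reward,
# the plan converges until the prediction error is near zero.
plan = plan_action(available_reward=1.0,
                   predicted_reward_fn=lambda p: p, plan=0.0)
```

In both loops the same quantity, a prediction error, drives the update, which is the core claim of the framework: learning and action inference minimize errors of the same form.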

Data availability

Matlab code for all simulations described in the paper is available from the MRC Brain Network Dynamics Unit Data Sharing Platform (https://data.mrc.ox.ac.uk/data-set/simulations-action-inference).


Article and author information

Author details

  1. Rafal Bogacz

    Medical Research Council Brain Network Dynamics Unit, University of Oxford, Oxford, United Kingdom
    For correspondence
    rafal.bogacz@ndcn.ox.ac.uk
    Competing interests
    The authors declare that no competing interests exist.
    ORCID iD: 0000-0002-8994-1661

Funding

Medical Research Council (MC_UU_12024/5)

  • Rafal Bogacz

Medical Research Council (MC_UU_00003/1)

  • Rafal Bogacz

Biotechnology and Biological Sciences Research Council (BB/S006338/1)

  • Rafal Bogacz

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Reviewing Editor

  1. Thorsten Kahnt, Northwestern University, United States

Publication history

  1. Received: November 1, 2019
  2. Accepted: July 6, 2020
  3. Accepted Manuscript published: July 7, 2020 (version 1)
  4. Version of Record published: July 30, 2020 (version 2)

Copyright

© 2020, Bogacz

This article is distributed under the terms of the Creative Commons Attribution License, permitting unrestricted use and redistribution provided that the original author and source are credited.


Cite this article

Rafal Bogacz (2020) Dopamine role in learning and action inference. eLife 9:e53262. https://doi.org/10.7554/eLife.53262
