Offline replay supports planning in human reinforcement learning

  1. Ida Momennejad (corresponding author)
  2. A Ross Otto
  3. Nathaniel D Daw
  4. Kenneth A Norman
  1. Princeton University, United States
  2. McGill University, Canada

Abstract

Making decisions in sequentially structured tasks requires integrating distally acquired information. The extensive computational cost of such integration challenges planning methods that integrate online, at decision time. Furthermore, it remains unclear whether 'offline' integration during replay supports planning, and if so which memories should be replayed. Inspired by machine learning, we propose that (a) offline replay of trajectories facilitates integrating representations that guide decisions, and (b) unsigned prediction errors (uncertainty) trigger such integrative replay. We designed a 2-step revaluation task for fMRI, whereby participants needed to integrate changes in rewards with past knowledge to optimally replan decisions. As predicted, we found that (a) multi-voxel pattern evidence for off-task replay predicts subsequent replanning; (b) neural sensitivity to uncertainty predicts subsequent replay and replanning; (c) off-task hippocampus and anterior cingulate activity increase when revaluation is required. These findings elucidate how the brain leverages offline mechanisms in planning and goal-directed behavior under uncertainty.
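The computational proposal above can be illustrated with a toy example. The sketch below is not the authors' model or analysis code. It is a minimal tabular Q-learning agent, assuming a deterministic two-step task, in which offline replay of remembered transitions is prioritized by unsigned (absolute) prediction error, in the spirit of Dyna and prioritized sweeping from the machine-learning literature. The state and action names (`S0`, `S1`, `S2`, `left`, `right`, `go`), the reward values, the learning rate, and the surprise threshold that triggers replay are all hypothetical. When the rewards at the second-stage states change, online updates leave the start-state values stale. Replaying the remembered start-state transitions integrates the new rewards and flips the choice, even though the agent never revisits the start state.

```python
from collections import defaultdict


class ReplayAgent:
    """Tabular Q-learner with offline replay prioritized by |TD error| (illustrative sketch)."""

    def __init__(self, alpha=1.0, gamma=1.0, threshold=1e-6):
        self.q = defaultdict(float)      # (state, action) -> estimated value
        self.actions = defaultdict(set)  # state -> actions experienced there
        self.memory = {}                 # (state, action) -> (reward, next_state); None = terminal
        self.alpha, self.gamma, self.threshold = alpha, gamma, threshold

    def value(self, state):
        """Greedy state value; terminal (None) or unexplored states are worth 0."""
        if state is None:
            return 0.0
        return max((self.q[(state, a)] for a in self.actions[state]), default=0.0)

    def td_error(self, s, a):
        """Signed prediction error for a remembered transition, using current values."""
        r, s_next = self.memory[(s, a)]
        return r + self.gamma * self.value(s_next) - self.q[(s, a)]

    def experience(self, s, a, r, s_next):
        """Online step: store the transition, apply one TD update, return unsigned error."""
        self.actions[s].add(a)
        self.memory[(s, a)] = (r, s_next)
        delta = self.td_error(s, a)
        self.q[(s, a)] += self.alpha * delta
        return abs(delta)

    def replay(self, n_steps):
        """Offline replay: repeatedly update the memory with the largest unsigned error."""
        replayed = []
        for _ in range(n_steps):
            priorities = [(abs(self.td_error(s, a)), (s, a)) for (s, a) in self.memory]
            if not priorities:
                break
            best, (s, a) = max(priorities)
            if best < self.threshold:  # nothing left to integrate
                break
            self.q[(s, a)] += self.alpha * self.td_error(s, a)
            replayed.append((s, a))
        return replayed

    def greedy(self, s):
        return max(self.actions[s], key=lambda a: self.q[(s, a)])


agent = ReplayAgent()

# Learning phase: full trajectories starting from S0.
for _ in range(2):
    agent.experience("S0", "left", 0.0, "S1")
    agent.experience("S1", "go", 10.0, None)
    agent.experience("S0", "right", 0.0, "S2")
    agent.experience("S2", "go", 1.0, None)
before = agent.greedy("S0")

# Revaluation phase: second-stage states visited directly, with changed rewards.
surprise = agent.experience("S1", "go", 0.0, None) + agent.experience("S2", "go", 5.0, None)
stale = agent.greedy("S0")  # online TD never touched S0, so the choice is stale

# Hypothetical trigger: replay only when the summed unsigned prediction error is large.
replayed = []
if surprise > 1.0:
    replayed = agent.replay(n_steps=10)
after = agent.greedy("S0")
print(before, stale, after, replayed)
```

Recomputing every priority on each replay step keeps the sketch short; prioritized sweeping proper maintains a priority queue over predecessor states, which scales better to larger tasks.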

Data availability

Neural and behavioral data are available online at OpenNeuro (https://openneuro.org/datasets/ds001612/versions/1.0.0).


Article and author information

Author details

  1. Ida Momennejad

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    For correspondence
    idam@princeton.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-0830-3973
  2. A Ross Otto

    Department of Psychology, McGill University, Quebec, Canada
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-9997-1901
  3. Nathaniel D Daw

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-5029-1430
  4. Kenneth A Norman

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-5887-9682

Funding

John Templeton Foundation (57876)

  • Ida Momennejad
  • Kenneth A Norman

National Institute of Mental Health (R01MH109177)

  • Nathaniel D Daw

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Human subjects: The Princeton University Institutional Review Board approved the study (Protocol #6014). All participants gave informed consent to participate in the fMRI study and completed a screening form confirming that they had normal or corrected-to-normal vision, had no metal in their bodies, and had no history of psychiatric or neurological disorders.

Reviewing Editor

  1. David Badre, Brown University, United States

Publication history

  1. Received: October 5, 2017
  2. Accepted: December 4, 2018
  3. Accepted Manuscript published: December 14, 2018 (version 1)
  4. Version of Record published: December 21, 2018 (version 2)

Copyright

© 2018, Momennejad et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

Momennejad I, Otto AR, Daw ND, Norman KA (2018). Offline replay supports planning in human reinforcement learning. eLife 7:e32548. https://doi.org/10.7554/eLife.32548
