Offline replay supports planning in human reinforcement learning

  1. Ida Momennejad (corresponding author)
  2. A Ross Otto
  3. Nathaniel D Daw
  4. Kenneth A Norman
  1. Princeton University, United States
  2. McGill University, Canada

Abstract

Making decisions in sequentially structured tasks requires integrating distally acquired information. The extensive computational cost of such integration challenges planning methods that integrate online, at decision time. Furthermore, it remains unclear whether 'offline' integration during replay supports planning, and if so which memories should be replayed. Inspired by machine learning, we propose that (a) offline replay of trajectories facilitates integrating representations that guide decisions, and (b) unsigned prediction errors (uncertainty) trigger such integrative replay. We designed a 2-step revaluation task for fMRI, whereby participants needed to integrate changes in rewards with past knowledge to optimally replan decisions. As predicted, we found that (a) multi-voxel pattern evidence for off-task replay predicts subsequent replanning; (b) neural sensitivity to uncertainty predicts subsequent replay and replanning; (c) off-task hippocampus and anterior cingulate activity increase when revaluation is required. These findings elucidate how the brain leverages offline mechanisms in planning and goal-directed behavior under uncertainty.
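The proposal in (a) and (b) can be illustrated with a minimal sketch in the spirit of prioritized replay from machine learning. This is a toy example, not the authors' task or model. The state names (`s1` to `s4`), the reward values, the learning rate, and the helper names `td_update` and `run_sketch` are all hypothetical. It assumes a deterministic two-stage chain in which rewards change while only second-stage states are visited, so first-stage values go stale until stored transitions are replayed offline, with replay probability proportional to the accumulated unsigned prediction error.

```python
import random

# Hypothetical deterministic chain: stage-1 state -> stage-2 state.
TRANSITIONS = {"s1": "s3", "s2": "s4"}


def td_update(values, state, target, alpha):
    """Move values[state] toward target; return the signed prediction error."""
    delta = target - values[state]
    values[state] += alpha * delta
    return delta


def run_sketch(n_replays=50, alpha=0.5, seed=0):
    rng = random.Random(seed)
    values = {s: 0.0 for s in ("s1", "s2", "s3", "s4")}
    memory = list(TRANSITIONS.items())        # stored (state, successor) pairs
    priority = {s: 0.0 for s in TRANSITIONS}  # replay priority per stage-1 state

    # Learning phase: full trajectories are experienced (assumed rewards).
    rewards = {"s3": 10.0, "s4": 1.0}
    for _ in range(20):
        for state, successor in memory:
            td_update(values, successor, rewards[successor], alpha)
            td_update(values, state, values[successor], alpha)
    before = dict(values)

    # Revaluation phase: only stage-2 states are visited and rewards are swapped.
    rewards = {"s3": 1.0, "s4": 10.0}
    for _ in range(20):
        for state, successor in memory:
            delta = td_update(values, successor, rewards[successor], alpha)
            priority[state] += abs(delta)     # accumulate unsigned prediction error
    online_only = dict(values)                # stage-1 values are still stale here

    # Offline replay: sample stored transitions in proportion to priority and
    # propagate the updated stage-2 values back to the stage-1 states.
    states = list(priority)
    weights = [priority[s] for s in states]
    for _ in range(n_replays):
        state = rng.choices(states, weights=weights)[0]
        td_update(values, state, values[TRANSITIONS[state]], alpha)

    return before, online_only, values


if __name__ == "__main__":
    before, online_only, after = run_sketch()
    print("before revaluation:", before)
    print("after revaluation, no replay:", online_only)
    print("after offline replay:", after)
```

In this sketch the preference between `s1` and `s2` reverses only after the replay loop runs, which is the sense in which offline integration could support replanning without revisiting the first-stage states.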

Data availability

Neural and behavioral data are available online at OpenNeuro (https://openneuro.org/datasets/ds001612/versions/1.0.0).

Article and author information

Author details

  1. Ida Momennejad

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    For correspondence
    idam@princeton.edu
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0003-0830-3973
  2. A Ross Otto

    Department of Psychology, McGill University, Quebec, Canada
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-9997-1901
  3. Nathaniel D Daw

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-5029-1430
  4. Kenneth A Norman

    Princeton Neuroscience Institute, Princeton University, Princeton, United States
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-5887-9682

Funding

John Templeton Foundation (57876)

  • Ida Momennejad
  • Kenneth A Norman

National Institute of Mental Health (R01MH109177)

  • Nathaniel D Daw

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Human subjects: The Princeton University Institutional Review Board approved the study (Protocol #6014). All participants gave informed consent to participate in the fMRI study and signed a screening form confirming that they had normal or corrected-to-normal vision, had no metal in their body, and had no history of psychiatric or neurological disorders.

Copyright

© 2018, Momennejad et al.

This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.

Cite this article

Ida Momennejad, A Ross Otto, Nathaniel D Daw, Kenneth A Norman (2018)
Offline replay supports planning in human reinforcement learning
eLife 7:e32548.
https://doi.org/10.7554/eLife.32548
