One-shot learning and behavioral eligibility traces in sequential decision making
Abstract
In many daily tasks we make multiple decisions before reaching a goal. In order to learn such sequences of decisions, a mechanism to link earlier actions to later reward is necessary. Reinforcement learning theory suggests two classes of algorithms solving this credit assignment problem: In classic temporal-difference learning, earlier actions receive reward information only after multiple repetitions of the task, whereas models with eligibility traces reinforce entire sequences of actions from a single experience (one-shot). Here we show one-shot learning of sequences. We developed a novel paradigm to directly observe which actions and states along a multi-step sequence are reinforced after a single reward. By focusing our analysis on those states for which RL with and without eligibility trace make qualitatively distinct predictions, we find direct behavioral (choice probability) and physiological (pupil dilation) signatures of reinforcement learning with eligibility trace across multiple sensory modalities.
Data availability
The datasets generated during the current study are available on Dryad, at the following address https://dx.doi.org/10.5061/dryad.j7h6f69
-
Data from: One-shot learning and behavioral eligibility traces in sequential decision makingDryad Digital Repository, doi:10.5061/dryad.j7h6f69.
Article and author information
Author details
Funding
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (CRSII2 147636 (Sinergia))
- Marco P Lehmann
- He A Xu
- Vasiliki Liakoni
- Michael H Herzog
- Wulfram Gerstner
- Kerstin Preuschoff
Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (CRSII2 200020 165538)
- Marco P Lehmann
- Vasiliki Liakoni
- Wulfram Gerstner
Horizon 2020 Framework Programme (Human Brain Project (SGA2) 785907)
- Michael H Herzog
- Wulfram Gerstner
H2020 European Research Council (268 689 MultiRules)
- Wulfram Gerstner
Horizon 2020 Framework Programme (Human Brain Project (SGA1) 720270)
- Michael H Herzog
- Wulfram Gerstner
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Thorsten Kahnt, Northwestern University, United States
Ethics
Human subjects: Experiments were conducted in accordance with the Helsinki declaration and approved by the ethics commission of the Canton de Vaud (164/14 Titre: Aspects fondamentaux de la reconnaissance des objets : protocole général). All participants were informed about the general purpose of the experiment and provided written, informed consent. They were told that they could quit the experiment at any time they wish.
Version history
- Received: April 5, 2019
- Accepted: November 1, 2019
- Accepted Manuscript published: November 11, 2019 (version 1)
- Version of Record published: December 6, 2019 (version 2)
Copyright
© 2019, Lehmann et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
-
- 2,752
- views
-
- 398
- downloads
-
- 15
- citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.
Download links
Downloads (link to download the article as PDF)
Open citations (links to open the citations from this article in various online reference manager services)
Cite this article (links to download the citations from this article in formats compatible with various reference manager tools)
Further reading
-
- Neuroscience
Learning requires the ability to link actions to outcomes. How motivation facilitates learning is not well understood. We designed a behavioral task in which mice self-initiate trials to learn cue-reward contingencies and found that the anterior cingulate region of the prefrontal cortex (ACC) contains motivation-related signals to maximize rewards. In particular, we found that ACC neural activity was consistently tied to trial initiations where mice seek to leave unrewarded cues to reach reward-associated cues. Notably, this neural signal persisted over consecutive unrewarded cues until reward-associated cues were reached, and was required for learning. To determine how ACC inherits this motivational signal we performed projection-specific photometry recordings from several inputs to ACC during learning. In doing so, we identified a ramp in bulk neural activity in orbitofrontal cortex (OFC)-to-ACC projections as mice received unrewarded cues, which continued ramping across consecutive unrewarded cues, and finally peaked upon reaching a reward-associated cue, thus maintaining an extended motivational state. Cellular resolution imaging of OFC confirmed these neural correlates of motivation, and further delineated separate ensembles of neurons that sequentially tiled the ramp. Together, these results identify a mechanism by which OFC maps out task structure to convey an extended motivational state to ACC to facilitate goal-directed learning.
-
- Neuroscience
Hippocampal place cells in freely moving rodents display both theta phase precession and procession, which is thought to play important roles in cognition, but the neural mechanism for producing theta phase shift remains largely unknown. Here, we show that firing rate adaptation within a continuous attractor neural network causes the neural activity bump to oscillate around the external input, resembling theta sweeps of decoded position during locomotion. These forward and backward sweeps naturally account for theta phase precession and procession of individual neurons, respectively. By tuning the adaptation strength, our model explains the difference between ‘bimodal cells’ showing interleaved phase precession and procession, and ‘unimodal cells’ in which phase precession predominates. Our model also explains the constant cycling of theta sweeps along different arms in a T-maze environment, the speed modulation of place cells’ firing frequency, and the continued phase shift after transient silencing of the hippocampus. We hope that this study will aid an understanding of the neural mechanism supporting theta phase coding in the brain.