Confidence-guided updating of choice bias during perceptual decisions is a widespread behavioral phenomenon
Abstract
Learning from past successes and failures improves decisions to produce appropriate actions in each perceived situation. However, reinforcement learning is not thought to be engaged during well-trained perceptual decision tasks (that is, after task acquisition is complete and performance is stable), because choice accuracy in such tasks is limited by perception. We report a novel form of reinforcement learning during perceptual decisions: past rewards bias future perceptual choices specifically when the previous stimulus was difficult to judge and confidence in obtaining the reward was low. We identified this phenomenon in six datasets from four laboratories, spanning mice, rats and humans, and sensory modalities from olfaction and audition to vision. We show that reinforcement learning models incorporating decision confidence into their teaching signal explain this choice updating. Thus, reinforcement learning mechanisms are continually engaged to produce systematic adjustments of choices even in well-learned perceptual decisions, in order to optimize behavior in an uncertain world.
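The idea of a teaching signal that incorporates decision confidence can be illustrated with a short Python sketch. It rests on an illustrative assumption, not necessarily the authors' exact model: the expected reward for a choice equals confidence multiplied by the learned value of the chosen side. Under that assumption, the prediction error after a reward is larger when confidence was low. The names `confidence_from_percept` and `ConfidenceWeightedLearner` are hypothetical, and so are the Gaussian-noise confidence readout, the learning rate and the initial values.

```python
import math
from dataclasses import dataclass, field


def confidence_from_percept(percept: float, noise_sd: float = 1.0) -> float:
    """Hypothetical confidence readout: probability that the chosen side is
    correct, assuming the choice follows the sign of a Gaussian-noise percept
    and a flat prior over stimulus strength."""
    z = abs(percept) / noise_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


@dataclass
class ConfidenceWeightedLearner:
    """Hypothetical learner: one value per choice side, updated by a
    confidence-weighted reward prediction error (illustrative assumption)."""
    alpha: float = 0.5  # learning rate (assumed value)
    values: dict = field(default_factory=lambda: {"L": 1.0, "R": 1.0})

    def update(self, choice: str, confidence: float, reward: float) -> float:
        # Expected reward for the chosen side: the belief that the choice is
        # correct, multiplied by the learned value of that side.
        expected = confidence * self.values[choice]
        delta = reward - expected  # teaching signal (prediction error)
        self.values[choice] += self.alpha * delta
        return delta

    def bias(self) -> float:
        # Positive values favour "R" on the next trial; negative values favour "L".
        return self.values["R"] - self.values["L"]


# Same outcome (a rewarded rightward choice) at low vs high confidence.
hard = ConfidenceWeightedLearner()
easy = ConfidenceWeightedLearner()
hard.update("R", confidence_from_percept(0.1), reward=1.0)  # weak percept
easy.update("R", confidence_from_percept(2.5), reward=1.0)  # strong percept
print(f"bias after low-confidence reward:  {hard.bias():.3f}")
print(f"bias after high-confidence reward: {easy.bias():.3f}")
```

Both learners receive the same reward. The larger shift in `hard.bias()` therefore comes only from the lower confidence on that trial, which mirrors the qualitative pattern described above.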
Data availability
The data used in this study is available at http://dx.doi.org/10.6084/m9.figshare.4300043
Article and author information
Funding
Wellcome (106101)
- Armin Lak
Wellcome (213465)
- Armin Lak
National Institutes of Health (R01 MH110404)
- Naoshige Uchida
National Institutes of Health (R01MH097061 and R01DA038209)
- Naoshige Uchida
Wellcome (205093)
- Matteo Carandini
Deutsche Forschungsgemeinschaft (DO 1240/2-1 and DO 1240/3-1)
- Tobias H Donner
RIKEN-CBS
- Emily Hueske
- Susumu Tonegawa
JPB Foundation
- Emily Hueske
- Susumu Tonegawa
Howard Hughes Medical Institute
- Emily Hueske
- Susumu Tonegawa
German Academic Exchange Service
- Anne E Urai
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Reviewing Editor
- Emilio Salinas, Wake Forest School of Medicine, United States
Ethics
Animal experimentation: The experimental procedures were approved by institutional committees at Cold Spring Harbor Laboratory (for experiments on rats) and at MIT and Harvard University (for auditory experiments in mice), and were in accordance with National Institutes of Health standards (project ID: 18-14-11-08-1). Experiments on visual decisions in mice were approved by the Home Office of the United Kingdom (license 70/8021). Experiments in humans were approved by the ethics committee at the University of Amsterdam (project ID: 2014-BC-3376).
Human subjects: The ethics committee at the University of Amsterdam approved the study, and all observers gave their informed consent (project ID: 2014-BC-3376).
Version history
- Received: July 1, 2019
- Accepted: April 9, 2020
- Accepted Manuscript published: April 14, 2020 (version 1)
- Accepted Manuscript updated: April 15, 2020 (version 2)
- Version of Record published: May 11, 2020 (version 3)
Copyright
© 2020, Lak et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 7,312 views
- 1,120 downloads
- 75 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.