Reinforcement biases subsequent perceptual decisions when confidence is low: a widespread behavioral phenomenon
Abstract
Learning from successes and failures often improves the quality of subsequent decisions. Past outcomes, however, should not influence purely perceptual decisions after task acquisition is complete, since these are designed so that only sensory evidence determines the correct choice. Yet numerous studies report that outcomes can bias perceptual decisions, causing spurious changes in choice behavior without improving accuracy. Here we show that the effects of reward on perceptual decisions are principled: past rewards bias future choices specifically when the previous choice was difficult and hence decision confidence was low. We identified this phenomenon in six datasets from four laboratories, across mice, rats, and humans, and across sensory modalities from olfaction and audition to vision. We show that this choice-updating strategy can be explained by reinforcement learning models incorporating statistical decision confidence into their teaching signals. Thus, despite being suboptimal from the experimenter’s perspective, confidence-guided reinforcement learning optimizes behavior in uncertain, real-world situations.
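The idea of a teaching signal that incorporates decision confidence can be illustrated with a minimal sketch. This is not the authors' fitted model: the Gaussian noise assumption, the flat prior, the function names (`belief_right`, `confidence_guided_update`), and the parameters `sigma` and `alpha` are all hypothetical choices made for illustration. The sketch scales each side's stored value by the belief that this side is correct, so the reward prediction error after a rewarded choice is larger when confidence was low.

```python
import math


def belief_right(percept, sigma):
    """P(stimulus > 0 | percept), assuming Gaussian sensory noise and a flat prior."""
    return 0.5 * (1.0 + math.erf(percept / (sigma * math.sqrt(2.0))))


def confidence_guided_update(values, percept, stimulus, sigma=0.5, alpha=0.2):
    """Run one trial of a confidence-weighted value update (illustrative only).

    values   -- dict with stored values for "left" and "right"
    percept  -- noisy internal estimate of the stimulus (sign gives the side)
    stimulus -- true signed stimulus; the sign defines the rewarded side
    Returns (choice, confidence, prediction_error, new_values).
    """
    p_right = belief_right(percept, sigma)
    beliefs = {"left": 1.0 - p_right, "right": p_right}

    # Expected value of each action = belief that it is correct * stored value.
    q = {side: beliefs[side] * values[side] for side in values}
    choice = max(q, key=q.get)
    confidence = beliefs[choice]

    correct_side = "right" if stimulus > 0 else "left"
    reward = 1.0 if choice == correct_side else 0.0

    # The teaching signal depends on confidence through the predicted value q[choice].
    prediction_error = reward - q[choice]

    new_values = dict(values)
    new_values[choice] += alpha * prediction_error
    return choice, confidence, prediction_error, new_values


if __name__ == "__main__":
    start = {"left": 1.0, "right": 1.0}
    for label, percept in (("easy", 1.0), ("hard", 0.1)):
        choice, conf, delta, updated = confidence_guided_update(start, percept, percept)
        print(label, choice, round(conf, 3), round(delta, 3), updated)
```

Under these assumptions, a rewarded choice made on a near-threshold percept produces a larger prediction error than a rewarded choice made on an easy percept, so the value of the chosen side increases more and the next choice is biased more strongly toward it.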
Data availability
The data used in this study is available at http://dx.doi.org/10.6084/m9.figshare.4300043
Article and author information
Author details
Funding
Wellcome (106101)
- Armin Lak
Wellcome (213465)
- Armin Lak
National Institutes of Health (R01 MH110404)
- Naoshige Uchida
National Institutes of Health (R01MH097061 and R01DA038209)
- Naoshige Uchida
Wellcome (205093)
- Matteo Carandini
Deutsche Forschungsgemeinschaft (DO 1240/2-1 and DO 1240/3-1)
- Tobias H Donner
RIKEN-CBS
- Emily Hueske
- Susumu Tonegawa
JPB Foundation
- Emily Hueske
- Susumu Tonegawa
Howard Hughes Medical Institute
- Emily Hueske
- Susumu Tonegawa
German Academic Exchange Service
- Anne E Urai
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Ethics
Animal experimentation: The experimental procedures were approved by institutional committees at Cold Spring Harbor Laboratory (for experiments on rats) and at MIT and Harvard University (for mouse auditory experiments), and were in accordance with National Institutes of Health standards (project ID: 18-14-11-08-1). Experiments on mouse visual decisions were approved by the Home Office of the United Kingdom (license 70/8021). Experiments in humans were approved by the ethics committee at the University of Amsterdam (project ID: 2014-BC-3376).
Human subjects: The ethics committee at the University of Amsterdam approved the study (project ID: 2014-BC-3376), and all observers gave their informed consent.
Reviewing Editor
- Emilio Salinas, Wake Forest School of Medicine, United States
Version history
- Received: July 1, 2019
- Accepted: April 9, 2020
- Accepted Manuscript published: April 14, 2020 (version 1)
- Accepted Manuscript updated: April 15, 2020 (version 2)
- Version of Record published: May 11, 2020 (version 3)
Copyright
© 2020, Lak et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- Page views: 6,799
- Downloads: 1,054
- Citations: 42
Article citation count generated by polling the highest count across the following sources: Scopus, Crossref, PubMed Central.