Dynamic decision policy reconfiguration under outcome uncertainty
Abstract
In uncertain or unstable environments, sometimes the best decision is to change your mind. To shed light on this flexibility, we evaluated how the underlying decision policy adapts when the most rewarding action changes. Human participants performed a dynamic two-armed bandit task that manipulated the certainty in relative reward (conflict) and the reliability of action-outcomes (volatility). Continuous estimates of conflict and volatility contributed to shifts in exploratory states by changing the rate of evidence accumulation (drift rate) and the amount of evidence needed to make a decision (boundary height), respectively. At the trialwise level, following a switch in the optimal choice, the drift rate plummets and the boundary height weakly spikes, leading to a slow exploratory state. We find that the drift rate drives most of this response, with an unreliable contribution of boundary height across experiments. Surprisingly, we find no evidence that pupillary responses are associated with decision policy changes. We conclude that humans show a stereotypical shift in their decision policies in response to environmental changes.
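To make the two policy parameters named above concrete, the following is a minimal sketch of a generic two-boundary drift-diffusion simulation. It is not the authors' fitted model: the function name `simulate_ddm`, the unbiased starting point, the noise level, and every parameter value are assumptions chosen only to contrast a high-drift regime with a low-drift, slightly higher-boundary regime.

```python
import random


def simulate_ddm(drift_rate, boundary_height, n_trials=1000,
                 dt=0.002, noise_sd=1.0, max_time=10.0, seed=0):
    """Simulate a simple two-boundary drift-diffusion process.

    Evidence starts midway between a lower bound at 0 and an upper bound at
    boundary_height. Returns (mean decision time in seconds, fraction of
    trials ending at the upper bound). All defaults are illustrative.
    """
    rng = random.Random(seed)
    step_sd = noise_sd * dt ** 0.5  # diffusion noise scales with sqrt(dt)
    max_steps = int(max_time / dt)  # cap so every trial terminates
    total_time = 0.0
    upper_hits = 0
    for _ in range(n_trials):
        evidence = boundary_height / 2.0  # assumed unbiased starting point
        steps = 0
        while 0.0 < evidence < boundary_height and steps < max_steps:
            evidence += drift_rate * dt + rng.gauss(0.0, step_sd)
            steps += 1
        total_time += steps * dt
        if evidence >= boundary_height:
            upper_hits += 1
    return total_time / n_trials, upper_hits / n_trials


# Hypothetical parameter values, chosen only to contrast the two regimes.
exploit_time, exploit_upper = simulate_ddm(drift_rate=2.0, boundary_height=1.0)
explore_time, explore_upper = simulate_ddm(drift_rate=0.2, boundary_height=1.2)
print(f"exploit-like: mean time {exploit_time:.3f} s, upper-bound rate {exploit_upper:.2f}")
print(f"explore-like: mean time {explore_time:.3f} s, upper-bound rate {explore_upper:.2f}")
```

Under these assumed values, the low-drift, higher-boundary setting should produce slower and less directed decisions, which is the qualitative pattern the abstract describes as a slow exploratory state.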
Data availability
Behavioral data and their computational derivatives are available at https://github.com/kmbond/dynamic_decision_policy_reconfiguration. Code used to generate figures can be found here: https://github.com/kmbond/dynamic_decision_policy_reconfiguration/tree/master/revised_figure_nbs. Raw pupillometry data (DOI: 10.1184/R1/13543133), the features of the task-evoked pupillometry response (DOI: 10.1184/R1/13543067), and the principal components calculated from those features (DOI: 10.1184/R1/13543160) are available at https://kilthub.cmu.edu/projects/Dynamic_decision_policy_reconfiguration_under_outcome_uncertainty/96116.
Article and author information
Author details
Funding
Air Force Research Laboratory (FA9550-18-1-0251)
- Krista Bond
- Timothy Verstynen
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Ethics
Human subjects: Neurologically healthy adults were recruited from the local university population. All procedures were approved by the Carnegie Mellon University Institutional Review Board (Approval Code: 2018_00000195; Funding: Air Force Research Laboratory, Grant Office ID: 180119). All research participants provided informed consent to participate in the study and consent to publish any research findings based on their provided data.
Copyright
© 2021, Bond et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- 925 views
- 171 downloads
- 7 citations
Views, downloads and citations are aggregated across all versions of this paper published by eLife.