Dynamic decision policy reconfiguration under outcome uncertainty

  1. Krista Bond (corresponding author)
  2. Kyle Dunovan
  3. Alexis Porter
  4. Jonathan E Rubin
  5. Timothy Verstynen (corresponding author)
  1. Department of Psychology, Carnegie Mellon University, United States
  2. Center for the Neural Basis of Cognition, United States
  3. Carnegie Mellon Neuroscience Institute, United States
  4. Department of Psychology, Northwestern University, United States
  5. Department of Mathematics, University of Pittsburgh, United States
  6. Department of Biomedical Engineering, Carnegie Mellon University, United States
23 figures, 3 tables and 1 additional file

Figures

Dynamic decision policy reconfiguration.

(A) The degree of conflict and volatility shifts the optimal balance between exploration and exploitation. (B) The drift diffusion model. (C) Accuracy (probability that the left choice is selected; P(L)) as a function of coordinated changes in the rate of evidence accumulation (v) and the amount of information needed to make a decision, or the boundary height (a). (D) Reaction time as a function of changes in the rate of evidence accumulation and the boundary height. (E) Decision policy reconfiguration.
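Panels C and D follow the standard closed-form results for a symmetric drift diffusion process. The sketch below assumes an unbiased starting point (z = a/2), diffusion noise s = 1, the upper boundary corresponding to a left choice, and a hypothetical non-decision time t0; the function names are illustrative and are not taken from the authors' code. For v > 0, it shows why raising a increases both P(L) and mean RT, while raising v increases P(L) and shortens RT.

```python
import numpy as np

def ddm_accuracy(v, a, s=1.0):
    """P(L): probability of reaching the upper (assumed 'left') boundary.

    Assumes an unbiased start point z = a/2, boundaries at 0 and a,
    and diffusion noise s.
    """
    return 1.0 / (1.0 + np.exp(-np.asarray(v, float) * np.asarray(a, float) / s**2))

def ddm_mean_rt(v, a, t0=0.3, s=1.0):
    """Mean RT = hypothetical non-decision time t0 + mean decision time.

    For z = a/2 the mean decision time is (a / 2v) * tanh(v * a / (2 s^2)),
    whose limit as v -> 0 is a^2 / (4 s^2).
    """
    v = np.asarray(v, dtype=float)
    a = np.asarray(a, dtype=float)
    # Replace v ~ 0 with a tiny value; the formula is even in v, so sign does not matter.
    safe_v = np.where(np.abs(v) < 1e-12, 1e-12, v)
    decision_time = (a / (2 * safe_v)) * np.tanh(safe_v * a / (2 * s**2))
    return t0 + decision_time

# Grid over drift rate and boundary height, analogous to panels C and D.
v_grid, a_grid = np.meshgrid(np.linspace(0.0, 2.0, 5), np.linspace(0.5, 2.5, 5))
p_left = ddm_accuracy(v_grid, a_grid)
mean_rt = ddm_mean_rt(v_grid, a_grid)
```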

Task and uncertainty manipulation.

(A) In Experiment 1, participants were asked to choose one of two ‘mystery boxes’. The point value associated with a selection was displayed above the chosen mystery box. The sum of points earned across trials was shown to the left of a treasure box on the upper right portion of the screen. (B) In Experiment 2, participants were asked to choose one of two Greebles (one male, one female). The total number of points earned was displayed at the center of the screen. The stimulus display was rendered isoluminant throughout the task. (C) The manipulation of conflict and volatility for Experiments 1 (gray) and 2 (black). Each point represents a combination of degrees of conflict and volatility. Under high conflict, the reward probabilities for the optimal and suboptimal targets are relatively close. Under high volatility, switches in the identity of the optimal target are relatively frequent.
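Panel C describes the two manipulations in terms of volatility (λ, the average period of optimal choice stability) and conflict (the reward probability of the optimal target). A hypothetical generator for such a schedule is sketched below. Drawing epoch lengths from a Poisson distribution with mean λ, and rewarding the suboptimal target with probability 1 - p(r), are assumptions made for illustration, not details stated in this caption.

```python
import numpy as np

def make_schedule(n_trials, lam, p_reward, seed=0):
    """Hypothetical two-target reward schedule.

    lam: mean epoch length (smaller lam = more frequent switches = higher volatility).
    p_reward: reward probability of the currently optimal target (values near 0.5
      = higher conflict). The suboptimal target is assumed to pay with 1 - p_reward.
    Epoch lengths are assumed to be Poisson(lam), floored at 1 trial.
    """
    rng = np.random.default_rng(seed)
    optimal = np.empty(n_trials, dtype=int)   # 0 = left target, 1 = right target
    change_points = []
    t, current = 0, int(rng.integers(2))
    while t < n_trials:
        length = max(1, int(rng.poisson(lam)))
        optimal[t:t + length] = current
        if t > 0:
            change_points.append(t)           # first trial of each new epoch
        t += length
        current = 1 - current                 # the optimal target switches identity
    # Per-trial reward probability for each target, then sampled binary outcomes.
    p = np.where(np.arange(2)[None, :] == optimal[:, None], p_reward, 1 - p_reward)
    rewards = rng.random((n_trials, 2)) < p
    return optimal, np.array(change_points, dtype=int), rewards
```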

Behavior.

(A) Mean accuracy and reaction time for the manipulation of conflict in Experiment 1. (B) Mean accuracy and reaction time for the manipulation of volatility in Experiment 1. Each point represents the average for a single subject. The distribution to the right represents the bootstrapped uncertainty in the mean difference between conditions (high conflict or high volatility subtracted from low conflict or low volatility). Distributions with 95% CIs that do not encompass 0 are marked with an asterisk. (C) Mean accuracy for Experiment 2. Each purple line represents a subject. The black line represents the mean accuracy calculated across subjects. (D) Reaction time distributions for each subject for Experiment 2. The black line represents the mean reaction time calculated over subjects. Error bars indicate a bootstrapped 95% confidence interval. For panels C and D, λ values shown above each plot specify the average period of optimal choice stability and the probability of reward shown on the x-axis specifies the degree of conflict. Means are calculated over all trials.
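The bootstrapped uncertainty in panels A and B can be approximated with a percentile bootstrap over subjects. The sketch below assumes paired per-subject condition means (low minus high, as in the caption) and resamples subjects with replacement; the values in the usage lines are simulated placeholders, not the study's data.

```python
import numpy as np

def bootstrap_mean_diff(low, high, n_boot=10000, ci=95, seed=0):
    """Percentile bootstrap of the mean paired difference (low - high).

    low, high: per-subject means in the two conditions, in the same subject order.
    Returns the bootstrap distribution and the (lower, upper) CI bounds.
    """
    diffs = np.asarray(low, float) - np.asarray(high, float)
    rng = np.random.default_rng(seed)
    # Each row resamples the subjects with replacement.
    idx = rng.integers(0, len(diffs), size=(n_boot, len(diffs)))
    boot = diffs[idx].mean(axis=1)
    tail = (100 - ci) / 2
    lower, upper = np.percentile(boot, [tail, 100 - tail])
    return boot, (lower, upper)

# Simulated placeholder data for 12 hypothetical subjects.
rng = np.random.default_rng(1)
low_acc = rng.uniform(0.7, 0.9, size=12)
high_acc = low_acc - rng.normal(0.03, 0.02, size=12)
boot, (lo, hi) = bootstrap_mean_diff(low_acc, high_acc)
marked = not (lo <= 0 <= hi)   # asterisk rule: the 95% CI excludes 0
```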

Changes in ideal observer estimates as a function of condition for Experiment 1.

(A) Changes in the belief in the value of the optimal target (ΔB) as a function of conflict and volatility over time. (B) Belief in the value of the optimal choice by condition and averaged over all trials. (C) Changes in change point probability (Ω) as a function of conflict and volatility over time. (D) Change point probability by condition and averaged over all trials. Error bars represent 95% CIs.
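The captions do not spell out the ideal observer's equations. As a rough illustration only, the sketch below implements a simplified change-point observer for a single target's outcome stream: Ω is the posterior probability that the latest outcome came from a new, uniformly distributed generative mean rather than from a Gaussian around the current belief B, and B is updated with a learning rate that grows with Ω. The hazard rate, outcome range, noise level, and base learning rate are all assumed values. ΔB could then be formed by tracking one belief per target and taking their difference, which is also an assumption.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def change_point_observer(outcomes, hazard=0.05, sigma=10.0, lo=0.0, hi=100.0,
                          alpha_base=0.1, b0=50.0):
    """Simplified change-point observer (hypothetical parameters throughout).

    Returns B (belief held before each outcome) and omega (change point probability).
    """
    u = 1.0 / (hi - lo)                  # outcome density if a change point occurred
    B = np.empty(len(outcomes))
    omega = np.empty(len(outcomes))
    b = b0
    for t, x in enumerate(outcomes):
        B[t] = b
        p_change = hazard * u
        p_stay = (1 - hazard) * normal_pdf(x, b, sigma)
        omega[t] = p_change / (p_change + p_stay)
        # Surprising outcomes (high omega) push the learning rate toward 1.
        alpha = omega[t] + (1 - omega[t]) * alpha_base
        b = b + alpha * (x - b)
    return B, omega
```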

Change point sensitivity of underlying decision processes.

Posterior distributions for each decision parameter are shown for the trial prior to a change point to three trials after the change point. (A) The drift rate. (B) The boundary height. (C) Non-decision (onset) time. (D) Starting bias. (E) Drift criterion. (F) Degree of fit to observational data as information loss. The models that lost the least information are marked with an asterisk.
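Aligning trials to change points is the bookkeeping step behind these panels. The hypothetical helper below labels each trial with its offset from a change point (-1 for the trial before, 0 for the change-point trial, up to +3); this indexing convention is assumed here rather than taken from the paper.

```python
import numpy as np

def relative_time(n_trials, change_points, before=1, after=3):
    """Label trials by their offset from a change point.

    Trials from `before` trials before a change point through `after` trials
    after it get integer labels -before..after; other trials get NaN.
    If two windows overlap, the later change point's label wins.
    """
    labels = np.full(n_trials, np.nan)
    for cp in change_points:
        for k in range(-before, after + 1):
            t = cp + k
            if 0 <= t < n_trials:
                labels[t] = k
    return labels
```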

Change-point-evoked uncertainty.

(A) Changes in ideal observer estimates of uncertainty over time and their effect on the boundary height and the drift rate. Directly after a change point, the boundary height increases and the drift rate decreases. Over time, the boundary height returns to its baseline value and the drift rate increases again. (B) Fitted estimates of change-point-evoked drift rate and boundary height for both experiments with 95% CIs of the posterior distributions. Inset plots represent data from Experiment 2.
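Table 1 identifies the mapping of ΔB onto v and Ω onto a (model I) as the best fit in Experiment 1. A minimal sketch of how such linear links produce the pattern described in panel A; the coefficients and the example Ω and ΔB trajectories are hypothetical placeholders, not fitted values from either experiment.

```python
import numpy as np

def evoked_parameters(omega, delta_b, a0=1.5, beta_a=0.5, v0=0.2, beta_v=1.0):
    """Map ideal observer estimates onto DDM parameters with linear links.

    a_t = a0 + beta_a * omega_t    (boundary height tracks change point probability)
    v_t = v0 + beta_v * delta_b_t  (drift rate tracks the belief difference)
    All coefficients are hypothetical placeholders.
    """
    omega = np.asarray(omega, dtype=float)
    delta_b = np.asarray(delta_b, dtype=float)
    return a0 + beta_a * omega, v0 + beta_v * delta_b

# Hypothetical trajectory: trial before the change point, the change point, then recovery.
omega = [0.05, 0.9, 0.4, 0.2, 0.1]
delta_b = [0.6, -0.2, 0.1, 0.3, 0.5]
a, v = evoked_parameters(omega, delta_b)
# a rises at the change point and decays back; v drops and then recovers.
```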

The decision surface.

(A) Representing decision space in vector form. An angle (θ) was calculated between sequential values of (a,v) coordinates, beginning with the trial prior to the change point. This represents subject-averaged data from Experiment 1. Note that these trajectories are z-scored. (B) Distributions depicting the angle between drift rate and boundary height for both Experiments 1 and 2. Each subpanel shows the distribution of angles between (a, v) over sequential trials, beginning with the trial prior to the change point. The area of the shaded region is proportional to the density and the arrow represents the circular mean.
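The angle θ in panel A is the direction of the step between consecutive points in (a, v) space. A minimal sketch, assuming a on the horizontal axis, v on the vertical axis, and z-scoring within each trajectory: θ = atan2(Δv, Δa), so θ = 0 means the boundary rises with no change in drift and θ = π/2 means the drift rises with no change in boundary. The arrows in panel B summarize such angles with a circular mean, computed here from the average sine and cosine.

```python
import numpy as np

def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def shift_angles(a, v):
    """Angle (radians) of each step between sequential (a, v) points."""
    da = np.diff(zscore(a))   # change in z-scored boundary height
    dv = np.diff(zscore(v))   # change in z-scored drift rate
    return np.arctan2(dv, da)

def circular_mean(theta):
    """Mean direction of a set of angles (radians, in [-pi, pi])."""
    theta = np.asarray(theta, dtype=float)
    return np.arctan2(np.mean(np.sin(theta)), np.mean(np.cos(theta)))
```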

Model comparisons for the effect of volatility and conflict on the relationship between drift rate and boundary height.

(A) The posterior probability for models testing for an effect of volatility and conflict on the angle of shift in a and v, θ. (B) The Bayes Factor for the null model relative to the alternative models specifying either an effect of time relative to a change point alone or a conditional effect on this evoked response θ. (C) The Bayes Factor for the evoked response model relative to the surviving alternative models specifying a conditional effect on the evoked response, θ. Note that time refers to time relative to the onset of a change point. All models specifying an interaction also include main effects. Dotted horizontal lines refer to grades of evidence (Wagenmakers, 2007).
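Posterior model probabilities and Bayes factors can both be derived from an information criterion. The sketch below uses the BIC approximation described by Wagenmakers (2007), under equal prior model probabilities; whether the analysis here used this approximation or another estimate is an assumption, and the function names are illustrative.

```python
import numpy as np

def posterior_model_probs(bic):
    """Posterior model probabilities from BIC values, assuming equal priors."""
    bic = np.asarray(bic, dtype=float)
    w = np.exp(-0.5 * (bic - bic.min()))   # subtracting the minimum avoids underflow
    return w / w.sum()

def bayes_factor(bic_0, bic_1):
    """BIC approximation to BF01: evidence for model 0 over model 1."""
    return np.exp(0.5 * (bic_1 - bic_0))
```

Under equal priors the ratio of two posterior probabilities equals the corresponding Bayes factor, which is why the two panels carry consistent information.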

Method for analyzing pupil data.

(A) The evoked pupillary response was characterized according to seven metrics. (B) These pupillary features were submitted to a principal component analysis. The contribution of each feature to the variance explained for the first two components is plotted for each subject. Note that we also conducted a supplementary analysis of the task-evoked pupillary response using a more conventional method with similar results.
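The decomposition in panel B can be sketched with a singular value decomposition of a standardized trial-by-feature matrix. The seven pupil metrics are not named in this caption, so the input here is a generic array with one column per metric; standardizing each column before PCA is an assumption (it keeps metrics with large raw units from dominating).

```python
import numpy as np

def pca(features, n_components=2):
    """PCA via SVD of the column-standardized data matrix (trials x features).

    Returns component loadings (n_components x n_features) and the fraction
    of total variance explained by each returned component.
    A constant column would make the standardization divide by zero.
    """
    X = np.asarray(features, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return vt[:n_components], explained[:n_components]
```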

Model comparisons for the effect of change-point-evoked pupillary dynamics on the relationship between drift rate and boundary height (θ).

(A) The posterior probability for models testing for an effect of pupillary dynamics on θ. (B) The Bayes Factor for the evoked response model relative to the alternative models specifying an effect of pupillary dynamics on the evoked response, θ. Note that time refers to time relative to the onset of a change point. All models specifying an interaction also include main effects.

Appendix 1—figure 1
Reaction times.

Median reaction times as a function of volatility (epoch length; λ) and conflict (probability of reward; p(r)).

Appendix 1—figure 2
Change-point-evoked accuracy.

Change-point-evoked accuracy by subject.

Appendix 1—figure 3
Change-point-evoked reaction times.

Change-point-evoked reaction times by subject.

Appendix 1—figure 4
Ideal observer estimates for Experiment 2.

(A) The average belief in the value of the optimal target (ΔB) as a function of the probability of reward (conflict) and the average period of stability for the optimal choice (λ; volatility). (B) Average change point probability (Ω) as a function of conflict and volatility.

Appendix 1—figure 5
Initial window selection.

Analysis conducted on data from Experiment 1 to determine the timescale of the response that maximized the intersection between high volatility (λ=15) and low volatility (λ=35) data. The bold line represents the mean and the gray lines represent individual subjects. The dotted line indicates the initial window of nine trials used.

Appendix 1—figure 6
Stability analysis.

Analysis conducted on data from Experiment 1 to determine the timescale of the response to consider for Experiment 2. The estimated angle is plotted as a function of time within an epoch (estimated with circular regression).

Appendix 1—figure 7
Quantification of stability.

Probability that sequential posterior distributions for βΔt have equal means.

Appendix 1—figure 8
Blink timing.

Blink timing for a sample participant. For visibility, thirty trials were selected at random. The onset of the trial is marked as time 0 and the trial ends at 1500ms. Blinks are marked in black. Blink timing plots are available for all subjects and all conditions in the GitHub repository for this publication.

Appendix 1—figure 9
Raw pupil diameter.

Mean time course of pupil diameter by subject.

Appendix 1—figure 10
Evoked pupil diameter by condition.

Mean time course of pupil diameter by condition.

Appendix 1—figure 11
First temporal derivative of pupil diameter.

Mean time course of derivative of pupil diameter by subject.

Appendix 1—figure 12
Derivative of evoked pupil diameter by condition.

Mean time course of derivative of pupil diameter by condition.

Appendix 1—figure 13
Prestimulus pupillary response.

Mean prestimulus pupillary response by subject.

Tables

Table 1
Model comparison for Experiments 1 and 2.

Roman numerals refer to a given model, as defined by the mapping between the ideal observer estimates and decision parameters in the first two columns; model VII, with neither mapping, is the null, intercept-only model. The left panel shows the deviance information criterion (DIC) scores for the set of models considered during the model selection procedure for Experiment 1. The right panel shows the DIC scores for the equivalent model selection analysis for Experiment 2, with a separate model estimated for each of the four subjects. Values shown for Experiment 2 represent the mean and standard deviation computed over subjects. Note that the raw DIC values for each of the subjects in Experiment 2 are included in Appendix 3—table 1. The column labeled DIC gives the raw DIC score, ΔDICnull lists the change in model fit from an intercept-only model (the null-adjusted fit), and ΔDICbest provides the change in null-adjusted model fit from the best-fitting model. The best performing model is denoted by an asterisk, with equivocal best cases marked by a tilde.

Experiment 1
Model   ΔB   Ω    DIC         ΔDICnull   ΔDICbest
*I      v    a    –18643.9    –2698.0    0.0
II      a    v    –16265.6    –319.7     2378.3
III     -    v    –16180.5    –234.7     2463.3
IV      v    -    –18630.8    –2684.9    13.1
V       -    a    –15949.2    –3.4       2694.7
VI      a    -    –16032.8    –87.0      2611.1
VII     -    -    –15945.8    0.0        2698.0

Experiment 2 (mean ± SD over subjects)
Model   ΔB   Ω    ΔDICnull        ΔDICbest
*∼I     v    a    –90.3 ± 71.7    1.0 ± 0.8
II      a    v    –7.6 ± 13.1     83.8 ± 60.5
III     -    v    –8.5 ± 13.1     82.9 ± 61.4
*∼IV    v    -    –90.8 ± 71.0    0.5 ± 1.1
V       -    a    0.3 ± 2.5       91.6 ± 70.6
VI      a    -    0.95 ± 1.4      92.3 ± 70.9
VII     -    -    0 ± 0           91.3 ± 71.5
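The ΔDIC columns are simple differences: ΔDICnull is a model's DIC minus the DIC of the null model (VII), and ΔDICbest is a model's ΔDICnull minus the smallest ΔDICnull. Lower DIC (posterior mean deviance plus the effective number of parameters) indicates better fit. The sketch below recomputes both columns for Experiment 1 from the DIC values in the table; because those values are rounded to one decimal, the recomputed differences can deviate from the printed ones by about 0.1.

```python
# DIC values for Experiment 1, copied from Table 1 (model -> DIC).
dic = {
    "I": -18643.9, "II": -16265.6, "III": -16180.5, "IV": -18630.8,
    "V": -15949.2, "VI": -16032.8, "VII": -15945.8,   # VII = intercept-only null
}

def delta_dic(dic, null="VII"):
    """Return (null-adjusted DIC, distance from the best null-adjusted DIC)."""
    d_null = {m: value - dic[null] for m, value in dic.items()}
    best = min(d_null.values())
    d_best = {m: value - best for m, value in d_null.items()}
    return d_null, d_best

d_null, d_best = delta_dic(dic)
```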
Appendix 2—table 1
Power analysis for Experiment 1.

The results of the model comparison analysis using simulated data. Roman numerals refer to a given model, as defined by the mapping between the ideal observer estimates and decision parameters in the first two columns. The column labeled DIC gives the raw DIC score, ΔDICnull lists the change in model fit from an intercept-only model (the null-adjusted fit), and ΔDICbest provides the change in null-adjusted model fit from the best-fitting model. The last row represents the null, intercept-only regression model. The best performing model is denoted by an asterisk.

Model   ΔB   Ω    DIC          ΔDICnull    ΔDICbest
*I      v    a    –101886.4    –15477.5    0.0
II      a    v    –87486.7     –1077.8     14399.7
III     -    v    –87373.9     –965.0      14512.5
IV      v    -    –97634.7     –11225.8    4251.7
V       -    a    –90577.3     –4168.4     11309.0
VI      a    -    –86525.7     –116.7      15360.7
VII     -    -    –86408.9     0.0         15477.5
Appendix 3—table 1
Raw model selection results for Experiment 2.

The raw results of the model comparison analysis for Experiment 2 that is summarized in Table 1. Roman numerals refer to a given model, as defined by the mapping between the ideal observer estimates and decision parameters in the first two columns. The column labeled DIC gives the raw DIC score, ΔDICnull lists the change in model fit from an intercept-only model (the null-adjusted fit), and ΔDICbest provides the change in null-adjusted model fit from the best-fitting model. The last row for each subject represents the null, intercept-only regression model. Equivocal winning models are marked with an asterisk and a tilde.

Subject   ΔB   Ω    DIC      ΔDICnull   ΔDICbest
1         v    a    5286.0   –156.1     1.8 *∼
1         a    v    5430.2   –11.9      146.0
1         -    v    5431.3   –10.8      147.1
1         v    -    5284.1   –157.9     0.0 *∼
1         -    a    5444.0   2.0        159.9
1         a    -    5441.1   –0.9       157.0
1         -    -    5442.0   0.0        157.9
2         v    a    5162.9   –144.1     0.0 *∼
2         a    v    5283.0   –24.1      120.0
2         -    v    5281.0   –26.1      118.0
2         v    -    5165.1   –142.0     2.1 *∼
2         -    a    5303.7   –3.4       140.8
2         a    -    5309.2   2.1        146.2
2         -    -    5307.1   0.0        144.1
3         v    a    3034.8   –53.4      0.7 *∼
3         a    v    3090.1   1.9        56.0
3         -    v    3089.5   1.2        55.4
3         v    -    3034.1   –54.1      0.0 *∼
3         -    a    3089.2   0.9        55.1
3         a    -    3088.8   0.6        54.7
3         -    -    3088.2   0.0        54.1
4         v    a    5438.9   –7.7       1.4 *∼
4         a    v    5450.5   3.8        13.0
4         -    v    5448.5   1.8        11.0
4         v    -    5437.5   –9.1       0.0 *∼
4         -    a    5448.2   1.6        10.7
4         a    -    5448.7   2.0        11.2
4         -    -    5446.7   0.0        9.1

Cite this article

Bond K, Dunovan K, Porter A, Rubin JE, Verstynen T (2021) Dynamic decision policy reconfiguration under outcome uncertainty. eLife 10:e65540. https://doi.org/10.7554/eLife.65540