Figures and data in Human exploration strategically balances approaching and avoiding uncertainty

9 figures, 24 tables and 1 additional file

Figures

Figure 1

Examining exploration strategy in relation to uncertainty in an incremental learning task.

(a) Structure of the task. Participants explored four tables, each containing two decks with different proportions of blue/orange cards. The goal was to learn the difference in proportions of the decks on each table. (b) The two phases of the task: exploration and test. On a single exploration trial (left), participants chose between two tables, and then sampled a card from one of the decks on that table, observing its color. After a random number of exploration trials, participants were tested on their knowledge (right). A color was designated as rewarding, and participants then chose the deck with the higher proportion of the rewarding color on each table. They were rewarded for correct test-phase choices and received no reward during exploration. (c) Histogram of round lengths. Participants played 22 rounds. The length of exploration in each round followed a shifted geometric distribution, such that the test was equally likely to occur following any trial after the first 10. (d) We considered a hierarchy of strategies for choosing which table to explore. The normatively prescribed strategy is to choose the table affording maximal expected information gain. This is the table for which the next card is expected to maximally decrease uncertainty (measured as entropy $H$) about the value of the goal-relevant latent parameter θ, given observations thus far $x$. A simpler strategy is to choose the table with the maximum uncertainty, which does not require computing an expectation over the next observation. An even simpler heuristic is to equate previous exposure and choose the table with the fewest previous observations $n_{x}$. Even though these three strategies vary considerably in complexity, they are all uncertainty-approaching on average. Lastly, people may be random explorers.
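
The round-length rule in panel (c) can be sketched as follows. This is only an illustration: the per-trial stopping probability `p_stop` and the function name are hypothetical (the caption does not report the value used), and the sketch reads "any trial after the first 10" as meaning that the earliest possible test follows trial 11.

```python
import random

MIN_TRIALS = 10  # from the caption: no test can follow any of the first 10 trials


def sample_round_length(p_stop: float, rng: random.Random) -> int:
    """Draw an exploration length from a shifted geometric distribution.

    After the first MIN_TRIALS trials, each further trial is followed by the
    test with the same probability p_stop (a hypothetical value), so the
    chance of the test arriving does not depend on how long the round has run.
    """
    if not 0.0 < p_stop <= 1.0:
        raise ValueError("p_stop must be in (0, 1]")
    length = MIN_TRIALS
    while True:
        length += 1  # play one more exploration trial
        if rng.random() < p_stop:
            return length  # the test follows this trial
```

Because the stopping probability is constant, a participant cannot predict the end of exploration from the number of trials already played.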

Figure 2 with 1 supplement

Hypothetical strategies make differing predictions for exploratory choice behavior.

We computed the three quantities hypothesized to drive exploratory choices using a Bayesian observer model. To illustrate this process, we plot the derivation of Bayesian belief on a single trial (a) and across multiple trials (b, c). For visualization, we use a simplified version with two tables only. (a) Depicts the Bayesian observer’s belief about a single table on a single trial. Given a sequence of previously observed cards (left), the Bayesian observer forms posterior beliefs about the proportion of orange cards in each deck (center). These beliefs are expressed as Beta distributions. From these, it is possible to derive a belief about the difference in the proportion of orange cards between the two decks $π_{1} - π_{2}$ (right). The probability that $π_{1} > π_{2}$ is given by the proportional size of the area marked in gray (0.74 in this example). (b) Depicts the same process over a series of 20 trials. The observed card sequence for each table is presented at the top of each panel. The matching belief state about $π_{1} - π_{2}$ is plotted below it as an evolving posterior density in white (high) and black (low). The green arrows mark the true value of $π_{1} - π_{2}$ for that round. As the round progresses, belief converges towards the true value and becomes more certain. (c) The three choice strategies prescribe different table choices on most trials. The difference between table 1 and table 2 in each of the three quantities (EIG, uncertainty, and exposure) is plotted for each trial. This difference is the hypothesized decision variable for choosing between tables 1 and 2. A positive value indicates a preference for exploring table 1, and a negative value indicates a preference for table 2. The three variables are normalized to facilitate visual comparison.
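
The belief computation in panel (a) can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it assumes uniform Beta(1, 1) priors on each deck, it takes the latent parameter to be the binary question of which deck holds more orange cards (so that uncertainty is a binary entropy in nats), and all function names are hypothetical. The closed-form sum for $P(π_{1} > π_{2})$ is a standard identity for Beta distributions with integer parameters.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def prob_deck1_greater(a1: int, b1: int, a2: int, b2: int) -> float:
    """P(pi_1 > pi_2) for pi_1 ~ Beta(a1, b1), pi_2 ~ Beta(a2, b2).

    Closed-form sum, valid when a1 is a positive integer.
    """
    total = 0.0
    for i in range(a1):
        total += math.exp(
            log_beta(a2 + i, b1 + b2)
            - math.log(b1 + i)
            - log_beta(1 + i, b1)
            - log_beta(a2, b2)
        )
    return total


def binary_entropy(p: float) -> float:
    """Entropy in nats of a binary variable that is true with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def table_uncertainty(orange1: int, blue1: int, orange2: int, blue2: int) -> float:
    """Entropy about which deck has more orange cards (Beta(1, 1) priors assumed)."""
    p = prob_deck1_greater(orange1 + 1, blue1 + 1, orange2 + 1, blue2 + 1)
    return binary_entropy(p)


def eig_deck1(orange1: int, blue1: int, orange2: int, blue2: int) -> float:
    """Expected entropy reduction from drawing one more card from deck 1."""
    # Posterior predictive probability that the next card from deck 1 is orange.
    p_orange = (orange1 + 1) / (orange1 + blue1 + 2)
    h_now = table_uncertainty(orange1, blue1, orange2, blue2)
    h_if_orange = table_uncertainty(orange1 + 1, blue1, orange2, blue2)
    h_if_blue = table_uncertainty(orange1, blue1 + 1, orange2, blue2)
    return h_now - (p_orange * h_if_orange + (1.0 - p_orange) * h_if_blue)
```

Exposure needs no model: it is the count of cards seen on a table. How the per-deck information gain is combined into a table-level EIG is not specified in this caption, so the sketch stops at the deck level.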

Figure 2—figure supplement 1

Correlations between the three strategies.

Correlations between the decision variable value or the predicted choice of each strategy for each participant. Correlations with Δ-exposure are low, whereas correlations between Δ-uncertainty and Δ-EIG are moderate. To address these dependencies, we fit each strategy separately and used model comparison and posterior predictive checks to adjudicate among them. Sample mean plotted as a line.

Figure 3 with 1 supplement

The Bayesian observer model is validated by participants’ accuracy and confidence on the test phase.

(a) Participants were accurate when an exploration phase ended with low uncertainty and performed at chance level when the phase ended with high uncertainty. (b) Participants’ confidence on correct choices fell with rising uncertainty. Confidence on error trials did not depend as much on Bayesian observer uncertainty. When a test question was unsolvable because no evidence was observed on each deck during exploration, participants had very low confidence. Data presented as mean values ± 1 SE, n=194 participants.

Figure 3—figure supplement 1

Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: The Bayesian observer model is validated by participants’ accuracy and confidence on the test phase. (a) Participants were accurate when an exploration phase ended with low uncertainty and performed at chance level when the phase ended with high uncertainty. (b) Participants’ confidence in correct choices fell with rising uncertainty. Confidence in errors did not depend as much on Bayesian observer uncertainty. Data presented as mean values ± 1 SE, n=62 participants. Nats are the units of entropy, a mathematically convenient measure of uncertainty.

Figure 4 with 3 supplements

Uncertainty is the best predictor of choice.

(a) On each plot, the difference in the hypothesized quantity between the two tables presented on each trial is plotted against actual choices of the table presented on the right. For each plot, the relevant hypothesis predicts a positive smooth curve. Δ-uncertainty, plotted on the left, matches this prediction better than Δ-EIG (center). The relationship between Δ-exposure (right) and choice is negative, rather than the hypothesized positive correlation. (b) Quantitative model comparison confirms this observation. Out of the three hypothesized strategies, uncertainty has the highest approximate expected log predictive density (using PSIS LOO; see Methods). Data presented as mean values ± 1 SE, n=194 participants.

Figure 4—figure supplement 1

Fitting simulated data successfully recovers the underlying strategy.

Our analysis approach successfully recovers the strategy used by simulated agents. We compared the actual data (top) to simulated datasets, each comprising a group of agents operating according to one of the hypothesized strategies. We simulated using the effect size observed for uncertainty in the actual data. Each agent matched a single participant, choosing cards from the decks presented to the participant. The probability that an agent would choose the table on the right on a given trial is $f(a + b \times Δx)$, where $f$ is the logistic function, $b$ is the degree to which the agent's choices depend on $Δx$, the relevant decision variable, and $a$ is a general bias towards rightward or leftward choices. Coefficients $a$ and $b$ were extracted per participant from the uncertainty model described in Figure 4. We assumed the agents choose a random deck on the table of their choice. The simulated data for each group of agents were plotted against each of the three decision variables and were fit with the same models we used on the actual dataset. We tested whether the procedure for qualitative and quantitative model comparison used in Figure 4 can recover the true strategy generating the data. For easy comparison, the actual data are re-plotted on the first row. For each of the three simulated strategies, we observe successful recovery: the decision variable matching the true strategy shows the strongest positive correlation with choice, and the correct strategy is indicated as best fitting the data. Furthermore, a negative correlation between behavior and Δ-exposure, as observed in the true data, is only evident in the uncertainty-based group of agents (second row). Data plotted as means ± 1 SE, n=194 participants/agents.
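
The agent's choice rule described above can be sketched as follows. The function names and the example coefficient values in the test-style usage are hypothetical; only the form $f(a + b \times Δx)$ with a logistic $f$ comes from the caption.

```python
import math
import random
from typing import List, Sequence


def logistic(z: float) -> float:
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_choose_right(a: float, b: float, delta_x: float) -> float:
    """Probability of choosing the table on the right: f(a + b * delta_x)."""
    return logistic(a + b * delta_x)


def simulate_choices(
    a: float, b: float, delta_xs: Sequence[float], rng: random.Random
) -> List[int]:
    """Simulate one agent; returns 1 for a rightward choice and 0 otherwise."""
    return [1 if rng.random() < p_choose_right(a, b, dx) else 0 for dx in delta_xs]
```

With `b = 0` the agent ignores the decision variable and chooses according to its side bias `a` alone, which corresponds to the random-explorer baseline.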

Figure 4—figure supplement 2

Uncertainty is a sufficient predictor of choice.

Simulations confirm that uncertainty is a sufficient predictor of choice. We further confirmed our conclusion that uncertainty is the best predictor of participants' choices by plotting the posterior predictive distribution for each of the models predicting choice from a hypothesized strategy. We simulated 500 datasets for each of the three models and plotted the distribution of the simulated data (green lines with 50% and 95% posterior interval bands, n=500 iterations) against the observed dataset (means ± 1 SE plotted in black, n=194 participants). The simulation procedure was similar to that used in Figure 4—figure supplement 1, with the exception that coefficients were extracted from the posterior distribution of each model fitted to the actual data. We expect that the posterior predictive distribution for each model would capture the relationship between the relevant decision variable and choice well. The extent to which the posterior predictive distribution can recreate the association with the other two decision variables is a test of model fit. (a) The posterior predictive distribution for the EIG model does not match the observed data well: it does not reproduce the strong slope for Δ-uncertainty, nor the negative correlation with Δ-exposure. (b) The posterior predictive distribution for uncertainty captures the data very well, matching the particular shape of the correspondence between choices and Δ-EIG, and the negative correlation between choices and Δ-exposure. (c) The posterior predictive distribution of exposure does not match observed data well: it fails to recreate the positive correlations between choice and EIG and uncertainty.

Figure 4—figure supplement 3

Matching results in the preliminary sample.

Reproducing the analysis in Figure 4 using the preliminary sample: Uncertainty is the best predictor of choice. (a) On each plot, the difference in the hypothesized quantity between the two tables presented on each trial is plotted against actual choices of the table presented on the right. For each plot, the relevant hypothesis predicts a positive smooth curve. Δ-uncertainty, plotted on the left, matches this prediction better than Δ-EIG (center). The relationship between Δ-exposure (right) and choice is negative, rather than the hypothesized positive correlation. (b) Quantitative model comparison confirms this observation. Out of the three hypothesized strategies, uncertainty has the highest approximate expected log predictive density (PSIS LOO; see Methods). Data presented as mean values ±1 SE, n=62 participants.

Figure 5 with 2 supplements

Participants approach vs. avoid Δ-uncertainty as a function of overall uncertainty.

(a) While Δ-uncertainty is the decision variable identified above, overall uncertainty, defined as the sum of uncertainty for both tables, is a measure of decision difficulty. (b) The influence of Δ-uncertainty on choice differed markedly below and above a threshold of overall uncertainty. Below a certain threshold of overall uncertainty, estimated as a free parameter, Δ-uncertainty had a significant positive effect on choice. Above this threshold of overall uncertainty, the influence of Δ-uncertainty became strongly negative. Points denote mean posterior estimate from regression models fitted to binned data, error bars mark 50% PI. The solid line depicts the prediction from a piecewise regression model capturing the non-linear relationship and estimating the threshold, with darker ribbon marking 50% PI and light ribbon marking 95% PI. Data from three regions of overall uncertainty marked in color are plotted in (c). For low overall uncertainty (blue), participants tend to choose the table they are more uncertain about, as normatively prescribed. But that relationship is broken for medium levels of overall uncertainty (purple). For high overall uncertainty (red), participants strongly prefer to choose the table they are less uncertain about, thereby slowing down the rate of information intake. Data plotted as mean ± SE, n=194 participants.

Figure 5—figure supplement 1

Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: Participants approach vs. avoid Δ-uncertainty as a function of overall uncertainty. (a) The influence of Δ-uncertainty on choice differed markedly below and above a threshold of overall uncertainty. Below a certain estimated threshold of overall uncertainty, Δ-uncertainty had a significant positive effect on choice. Above this threshold of overall uncertainty, the influence of Δ-uncertainty decreased significantly. Points denote mean posterior estimate from regression models fitted to binned data, error bars mark 50% PI. The solid line depicts the prediction from a piecewise regression model capturing the non-linear relationship and estimating the threshold, with the darker ribbon marking 50% PI and the light ribbon marking 95% PI. Data from three regions of overall uncertainty marked in color are plotted in (b). For low overall uncertainty (blue), participants tend to choose the table they are more uncertain about, as normatively prescribed. But that relationship is broken for medium levels of overall uncertainty (purple). For high overall uncertainty (red), participants strongly prefer to choose the table they are less uncertain about, thereby slowing down the rate of information intake. Data plotted as mean ± SE, n=62 participants.

Figure 5—figure supplement 2

No correlation between overall uncertainty and Δ-uncertainty.

The values of overall uncertainty and Δ-uncertainty on each trial of the pre-registered sample. In any finite-information exploration task, overall uncertainty sets bounds on the possible values of Δ-uncertainty. However, the two measures are not correlated, which allows them to be included as independent predictors of behavior within the same regression model.

Figure 6 with 3 supplements

Learners benefit from approaching uncertainty, but are not penalized for avoiding it.

(a) We observe substantial individual differences in strategy. Replotting Figure 5b separately for each participant highlights variation in the baseline tendency to approach uncertainty, as well as in the degree of avoidance when overall uncertainty is high. (b) A stronger baseline tendency to approach uncertainty (left) predicts better test performance, such that participants unable to approach uncertainty also perform poorly. Test performance shows a weak positive correlation with avoidance when overall uncertainty is high (middle), since learners who approach uncertainty also tend to avoid it under high uncertainty (right). Uncertainty avoidance is quantified as the triangular area above the piecewise regression line in panel a.

Figure 6—figure supplement 1

Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: Learners benefit from approaching uncertainty, but are not penalized for avoiding it. (a) We observe substantial individual differences in strategy. Replotting Figure 5—figure supplement 1a separately for each participant highlights variation in the baseline tendency to approach uncertainty, as well as in the degree of avoidance when overall uncertainty is high. (b) A stronger baseline tendency to approach uncertainty (left) predicts better test performance, such that participants unable to approach uncertainty also perform poorly. Test performance shows a weak positive correlation with avoidance when overall uncertainty is high (middle), reflecting that learners who approach uncertainty also tend to avoid it under high uncertainty (right). Uncertainty avoidance is quantified as the triangular area above the piecewise regression line in panel a.

Figure 6—figure supplement 2

Individual differences in the use of Δ-EIG and Δ-exposure: pre-registered sample.

Variations in exploration strategy usage among individuals. (a) Individual differences in the support for uncertainty over EIG. The difference in predictive density for the uncertainty model versus the EIG model is plotted against the individually estimated coefficient for EIG. Uncertainty was supported for 125 participants (pink). EIG was supported for 58 participants, who had both a better model fit value for EIG and a positive EIG coefficient (gold). Eleven participants showed inconclusive results, with a better model fit value for EIG, but a negative EIG coefficient (gray). (b) Uncertainty was favored over exposure for most participants (pink). 96 participants had a better model fit value for uncertainty than exposure. An additional 72 participants had a negative exposure coefficient, a corollary of uncertainty-based choice (Figure 4—figure supplement 2a). Exposure was supported for 26 participants who had both a better model fit value for exposure and a positive exposure coefficient (brown). There was no relationship between test accuracy and the relative support for uncertainty over EIG (c), nor over exposure (d). (c) Test accuracy plotted as a function of the EIG coefficient. (d) Test accuracy plotted as a function of the exposure coefficient. See Appendix 3—tables 20–23 for models and model comparison. Error bars span the 50% PI, n=194 participants.

Figure 6—figure supplement 3

Individual differences in the use of Δ-EIG and Δ-exposure: preliminary sample.

Variations in exploration strategy usage among individuals. (a) Individual differences in the support for uncertainty over EIG. The difference in predictive density for the uncertainty model versus the EIG model is plotted against the individually estimated coefficient for EIG. Uncertainty was supported for 30 participants (pink). EIG was supported for 28 participants, who had both a better model fit value for EIG and a positive EIG coefficient (gold). Four participants showed inconclusive results, with a better model fit value for EIG, but a negative EIG coefficient (gray). (b) Uncertainty was favored over exposure for most participants (pink). 31 participants had a better model fit value for uncertainty than exposure. An additional 27 participants had a negative exposure coefficient, a corollary of uncertainty-based choice (Figure 4—figure supplement 2a). Exposure was supported for 4 participants who had both a better model fit value for exposure and a positive exposure coefficient (brown). There was no relationship between test accuracy and the relative support for uncertainty over EIG (c), nor over exposure (d). (c) Test accuracy plotted as a function of the EIG coefficient. (d) Test accuracy plotted as a function of the exposure coefficient. See Appendix 3—tables 20–23 for models and model comparison. Error bars span the 50% PI, n=62 participants.

Figure 7 with 1 supplement

Individuals who spend time deliberating during exploration make strategic choices and learn well.

Participants varied not only in the pattern of their choices, but also in their RTs. (a) Data from three example participants. The relationship of choice and RTs with Δ-uncertainty weakens from left to right. Data plotted as mean ± SE. (b) These individual differences were captured by a sequential sampling model, explaining choices and RTs as the interaction between participants’ efficacy of deliberating about Δ-uncertainty and their tendency to deliberate longer vs. make quick responses. Plotting model predictions, we observe a U-shaped dependence of RTs on Δ-uncertainty for participants whose performance at test was in the top accuracy tertile. This characteristic U-shape is indicative of decisions made by prolonged deliberation. This relationship is weaker for participants in the bottom two test accuracy tertiles. Such participants also exhibit shorter RTs overall. Lines mark mean predictions from a sequential sampling model fit by tertiles for visualization, ribbons denote 50% PIs. (c) Correlating the sequential sampling model parameters with test performance confirms these observations. Participants with a stronger dependence of RT on Δ-uncertainty perform better at test, as do participants who deliberate longer for the sake of accuracy. Example participants from (a) are marked in red. Lines are mean predictions from a logistic regression model.

Figure 7—figure supplement 1

Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: Individuals who spend time deliberating during exploration make strategic choices and learn well. Participants varied not only in the pattern of their choices, but also in their RTs. (a) Data from three example participants. The relationship of choice and RTs with Δ-uncertainty weakens from left to right. Data plotted as mean ± SE. (b) These individual differences were captured by a sequential sampling model, explaining choices and RTs as the interaction between participants’ efficacy of deliberating about Δ-uncertainty and their tendency to deliberate longer vs. make quick responses. Plotting model predictions, we observe a U-shaped dependence of RTs on Δ-uncertainty for participants whose performance at test was in the top accuracy tertile. This characteristic U-shape is indicative of decisions made by prolonged deliberation. This relationship is weaker for participants in the bottom two test accuracy tertiles. Such participants also exhibit shorter RTs overall. Lines mark mean predictions from a sequential sampling model fit by tertiles for visualization, ribbons denote 50% PIs. (c) Correlating the sequential sampling model parameters with test performance confirms these observations. Participants with a stronger dependence of RT on Δ-uncertainty perform better at test, as do participants who deliberate longer for the sake of accuracy. Example participants from (a) are marked in red. Lines are mean predictions from a logistic regression model.

Figure 8 with 1 supplement

Participants tend to repeat previous choices instead of deliberating over uncertainty.

(a) On a given trial, one table has been chosen more recently than the other (frames denote previous choices). In the example, the green table had been chosen more recently; hence, it is designated the repeat option and the other table the switch option. (b) Participants tend to choose the table displayed on the right more often when it is the repeat option than when it is the switch option. Data plotted as mean ± SE, n=194 participants. (c) When choosing a repeat option, participants’ RTs are shorter and less dependent on Δ-uncertainty. Lines mark mean predictions from a sequential sampling model, ribbons denote 50% PIs. (d) Participants who tended to repeat their previous choice also tended to perform better at test (left), were more likely to have a stronger baseline tendency to approach uncertainty (middle), and a stronger tendency to avoid uncertainty when overall uncertainty is high (right). Regression lines are plotted for visualization.
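
The repeat/switch designation in panel (a) amounts to a lookup over the choice history. The sketch below is illustrative only: the function name and the table labels are hypothetical, and returning `None` when neither presented table has been chosen yet is an assumption about an edge case the caption does not address.

```python
from typing import Optional, Sequence


def repeat_option(history: Sequence[str], left: str, right: str) -> Optional[str]:
    """Return whichever of the two presented tables was chosen more recently.

    `history` lists previously chosen tables, oldest first. The returned table
    is the repeat option and the other presented table is the switch option.
    Returns None if neither presented table appears in the history.
    """
    for table in reversed(history):
        if table in (left, right):
            return table
    return None
```

For example, with a history of green, red, blue and the green and red tables on offer, red is the repeat option because it was chosen after green.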

Figure 8—figure supplement 1

Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: participants tend to repeat previous choices instead of deliberating over uncertainty. (a) On a given trial, one table has been chosen more recently than the other (frames denote previous choices). In the example, the green table had been chosen more recently; hence, it is designated the repeat option and the other table the switch option. (b) Participants tend to choose the table displayed on the right more often when it is the repeat option than when it is the switch option. Data plotted as mean ± SE, n=62 participants. (c) When choosing a repeat option, participants’ RTs are shorter and less dependent on Δ-uncertainty. Lines mark mean predictions from a sequential sampling model, ribbons denote 50% PIs. (d) Participants who tended to repeat their previous choice also tended to perform better at test (left), were more likely to have a stronger baseline tendency to approach uncertainty (middle), and a stronger tendency to avoid uncertainty when overall uncertainty is high (right). Regression lines are plotted for visualization.

Figure 9 with 1 supplement

Forgetting is associated with random choice rather than a systematic bias.

(a) Memory lag, defined as trials since last choice, serves as a proxy for forgetting and contributes to the difficulty of making an exploratory choice. RTs rise with memory lag. The RT advantage for repeat choices disappears with higher memory lag. (b) With higher memory lag, choices become less dependent on Δ-uncertainty, as indicated by flatter curves. The tendency to repeat the last choice is also diminished with memory lag. Both effects amount to choice becoming more random due to forgetting. Data plotted as mean ± SE, n=194 participants.

Figure 9—figure supplement 1

Matching results in the preliminary sample.

Reproducing the analysis using the preliminary sample: Forgetting is associated with random choice rather than a systematic bias. (a) Memory lag, defined as trials since last choice, serves as a proxy for forgetting and contributes to the difficulty of making an exploratory choice. RTs rise with memory lag. The RT advantage for repeat choices disappears with higher memory lag. (b) With higher memory lag, choices become less dependent on Δ-uncertainty, as indicated by flatter curves. The tendency to repeat the last choice is also diminished with memory lag. Both effects amount to choice becoming more random due to forgetting. Data plotted as mean ± SE, n=62 participants.

Tables

Table 1

Regularizing priors used in regression models.

Type of coefficient	Prior for logistic and ordered-logistic regression	Prior for lognormal regression (RTs; following Schad et al., 2019)
Intercept	normal(0,1) (not applicable for ordered logistic models; Bürkner and Vuorre, 2019)	normal(–0.25, 0.5)
Group-level effects of predictors	normal(0,1)	normal(0, 0.5)
Scale of by-participant terms	normal(0,1)	normal(0, 0.01)
Correlation matrices for by-participant terms	LKJ(2)	LKJ(2)

Prior distributions are given in Stan syntax. All predictors used in models were centered and scaled prior to fitting, so that the same priors can apply to all parameters.

Appendix 3—table 1

Test accuracy as a function of the final uncertainty in the exploration phase.

	Pre-registered sample		Preliminary sample
	(12,379 trials, 194 participants)		(3482 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	0.86	[0.61, 1.11]	0.90	[0.47, 1.32]	logit
Final uncertainty	-5.59	[-6.25, -4.95]	-5.69	[-6.86, -4.68]	logit / nats
Participant-wise variability
SD of intercept	1.58	[1.38, 1.82]	1.45	[1.13, 1.89]
SD of final uncertainty	7.49	[6.54, 8.61]	6.83	[5.33, 8.88]
Correlation of intercept and final uncertainty	-0.68	[-0.84, -0.46]	-0.81	[-0.97, -0.40]

The model can be summarized with the following R syntax formula: $\mathrm{accuracy} \sim 0.5 + 0.5 * \mathrm{inv\_logit}(1 + \mathrm{final\_uncertainty} + (1 + \mathrm{final\_uncertainty} \mid \mathrm{participant}))$, where $\mathrm{inv\_logit}$ is the inverse logit function. This functional form limits predicted accuracy between 0.5 and 1.0, since guessing-level accuracy on a two-alternative forced-choice test is 0.5. Since accuracy is a binary variable, this regression was fit with a Bernoulli likelihood.
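
The bounded link function can be illustrated with a short sketch. It uses the group-level medians for the pre-registered sample from the table above as default values, ignores the by-participant terms, and assumes the coefficients apply directly to final uncertainty in nats (the units column suggests this, but predictors were centered and scaled before fitting, so the mapping may not be exact). The function names are hypothetical.

```python
import math


def inv_logit(z: float) -> float:
    """Inverse logit function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predicted_accuracy(
    final_uncertainty: float, intercept: float = 0.86, slope: float = -5.59
) -> float:
    """Group-level predicted test accuracy, bounded between 0.5 and 1.0."""
    return 0.5 + 0.5 * inv_logit(intercept + slope * final_uncertainty)
```

Under these assumptions, predicted accuracy is highest when final uncertainty is zero and approaches the guessing level of 0.5 as final uncertainty grows.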

Appendix 3—table 2

Test confidence as a function of the final uncertainty in the exploration phase.

	Pre-registered sample		Preliminary sample
	(12,007 trials, 194 participants)		(3362 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Threshold 1	-2.22	[-2.44, -2.00]	-2.11	[-2.49, -1.72]
Threshold 2	-0.44	[-0.67, -0.23]	-0.20	[-0.58, 0.17]
Threshold 3	1.31	[1.08, 1.53]	1.60	[1.22, 1.98]
Threshold 4	3.11	[2.88, 3.34]	3.58	[3.18, 3.98]
Final uncertainty	-2.48	[-2.93, -2.06]	-2.32	[-3.11, -1.56]	logit / nats
Choice accuracy	1.09	[0.92, 1.27]	1.05	[0.71, 1.39]	logit
Final uncertainty × choice accuracy	-3.10	[-3.76, -2.46]	-3.20	[-4.60, -1.93]	logit / nats
Participant-wise variability
SD of intercept	1.45	[1.31, 1.63]	1.43	[1.19, 1.73]
SD of final uncertainty	2.10	[1.73, 2.52]	2.12	[1.39, 2.97]
SD of choice accuracy	0.81	[0.64, 0.99]	0.84	[0.52, 1.25]
SD of uncertainty × accuracy	2.11	[1.46, 2.82]	3.02	[1.58, 4.59]
Correlation of intercept and uncertainty	0.23	[0.03, 0.41]	0.07	[-0.26, 0.39]
Correlation of intercept and accuracy	-0.20	[-0.39, 0.01]	0.02	[-0.33, 0.40]
Correlation of uncertainty and accuracy	-0.57	[-0.81, -0.30]	-0.51	[-0.83, -0.03]
Correlation of intercept and uncertainty × accuracy	0.27	[-0.01, 0.52]	0.28	[-0.11, 0.62]
Correlation of uncertainty and uncertainty × accuracy	0.68	[0.34, 0.91]	0.26	[-0.24, 0.70]
Correlation of accuracy and uncertainty × accuracy	-0.84	[-0.95, -0.61]	-0.23	[-0.66, 0.35]

The model can be summarized with the following R syntax formula: $\mathrm{confidence} \sim 0 + \mathrm{final\_uncertainty} * \mathrm{accuracy} + (1 + \mathrm{final\_uncertainty} * \mathrm{accuracy} \mid \mathrm{PID})$. This model was fit as an ordered-logistic regression, with four threshold variables since confidence was rated on a 5-point Likert scale (Bürkner and Vuorre, 2019).
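
To show how four thresholds yield five rating probabilities, here is a minimal sketch. It assumes the common cumulative-logit parameterization, $P(\mathrm{rating} \le k) = \mathrm{inv\_logit}(τ_{k} - η)$, where $η$ is the linear predictor; the source does not spell out the parameterization, and the function names are hypothetical. Default thresholds are the medians for the pre-registered sample from the table above.

```python
import math
from typing import List, Sequence

# Median thresholds for the pre-registered sample (from the table above).
THRESHOLDS = (-2.22, -0.44, 1.31, 3.11)


def inv_logit(z: float) -> float:
    """Inverse logit function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def confidence_probabilities(
    eta: float, thresholds: Sequence[float] = THRESHOLDS
) -> List[float]:
    """Probability of each rating 1..5 given the linear predictor eta.

    Assumes P(rating <= k) = inv_logit(threshold_k - eta).
    """
    cumulative = [inv_logit(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # mass between consecutive thresholds
        previous = c
    return probs
```

A larger linear predictor shifts probability mass toward higher confidence ratings, which is why the model needs no separate intercept.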

Appendix 3—table 3

Exploration-phase choices as a function of Δ-uncertainty.

	Pre-registered sample		Preliminary sample
	(146,766 trials, 194 participants)		(41,009 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	-0.04	[-0.10, 0.02]	-0.03	[-0.11, 0.06]	logit
Δ-uncertainty	0.89	[0.76, 1.03]	1.01	[0.70, 1.30]	logit / nat
Participant-wise variability
SD of intercept	0.39	[0.35, 0.43]	0.36	[0.30, 0.45]
SD of Δ-uncertainty	0.89	[0.79, 1.00]	1.11	[0.91, 1.38]
Correlation of intercept and Δ-uncertainty	-0.05	[-0.20, 0.12]	0.07	[-0.21, 0.31]

The model can be summarized with the following R syntax formula: $\mathrm{table\_on\_right\_chosen} \sim 1 + \Delta\mathrm{uncertainty} + (1 + \Delta\mathrm{uncertainty} \mid \mathrm{participant})$. This model was fit as a logistic regression.

Appendix 3—table 4

Exploration-phase choices as a function of Δ-EIG.

	Pre-registered sample		Preliminary sample
	(146,766 trials, 194 participants)		(41,009 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	-0.04	[-0.09, 0.02]	-0.03	[-0.12, 0.07]	logit
Δ-EIG	10.12	[8.00, 12.47]	12.50	[7.97, 16.91]	logit / nat
Participant-wise variability
SD of intercept	0.39	[0.35, 0.43]	0.35	[0.29, 0.43]
SD of Δ-EIG	1.21	[1.07, 1.37]	1.48	[1.21, 1.82]
Correlation of intercept and Δ-EIG	-0.02	[-0.17, 0.14]	-0.11	[-0.35, 0.16]

The model can be summarized with the following R syntax formula: $\mathrm{table\_on\_right\_chosen} \sim 1 + \Delta\mathrm{EIG} + (1 + \Delta\mathrm{EIG} \mid \mathrm{participant})$. This model was fit as a logistic regression.

Appendix 3—table 5

Exploration-phase choices as a function of Δ-exposure.

	Pre-registered sample		Preliminary sample
	(146,766 trials, 194 participants)		(41,009 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	-0.04	[–0.10, 0.02]	-0.03	[–0.13, 0.07]	logit
Δ-exposure	-0.03	[–0.04, –0.02]	-0.03	[–0.05, –0.02]	logit / trial
Participant-wise variability
SD of intercept	0.39	[0.35, 0.43]	0.36	[0.29, 0.44]
SD of Δ-exposure	0.05	[0.04, 0.06]	0.05	[0.04, 0.06]
Correlation of intercept and Δ-exposure	0.16	[–0.01, 0.32]	-0.08	[–0.36, 0.21]

The model can be summarized with the following R syntax formula: $\text{table on right chosen} \sim 1 + \Delta\text{-exposure} + (1 + \Delta\text{-exposure} \mid \text{participant})$. This model was fit as a logistic regression.

Appendix 3—table 6

Exploration-phase choices as a function of Δ-uncertainty and overall uncertainty.

	Pre-registered sample		Preliminary sample
	(146,766 trials, 194 participants)		(41,009 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	-0.04	[–0.10, 0.02]	-0.03	[–0.13, 0.07]	logit
Δ-uncertainty	0.97	[0.83, 1.11]	1.12	[0.83, 1.42]	logit / nat
Δ-uncertainty × overall uncertainty	-428.44	[–536.60, –339.27]	-444.27	[–559.73, –353.23]	logit / nat²
Transformed threshold α	2.52	[2.40, 2.64]	2.33	[2.17, 2.49]	a.u.
Participant-wise variability
SD of intercept	0.39	[0.35, 0.43]	0.36	[0.30, 0.44]
SD of Δ-uncertainty	0.92	[0.81, 1.04]	1.09	[0.89, 1.35]
SD of Δ-uncertainty × overall uncertainty	12.85	[0.56, 41.41]	8.97	[0.46, 28.35]
SD of transformed threshold	0.45	[0.38, 0.54]	0.35	[0.26, 0.47]

This model can be summarized with the following formula: $\text{logit}(P(\text{table on right chosen})) = \text{Intercept} + b_{1} \cdot \Delta\text{-uncertainty} + b_{2} \cdot (\text{overall uncertainty} - \omega) \cdot \text{step}(\text{overall uncertainty} - \omega) \cdot \Delta\text{-uncertainty}$, where step is the step function and $\omega = -2 \ln(0.5) \cdot \text{inv\_logit}(\alpha)$. The intercept and parameters b₁, b₂, and α all vary by participant. This model was fit as a logistic regression.
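The piecewise formula can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it uses only the population-level posterior medians of the pre-registered sample, ignores participant-level variation, and the function names are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


# Scaling constant from the formula: -2 ln(0.5) = 2 ln 2, about 1.386 nats.
SCALE = -2.0 * math.log(0.5)


def threshold(alpha: float) -> float:
    """Map the transformed threshold alpha onto the overall-uncertainty scale."""
    return SCALE * inv_logit(alpha)


def logit_p_right(delta_unc: float, overall_unc: float,
                  intercept: float, b1: float, b2: float, alpha: float) -> float:
    """Linear predictor of the piecewise model.

    Below the threshold only b1 acts on delta-uncertainty; above it, the
    slope changes linearly with the excess of overall uncertainty.
    """
    omega = threshold(alpha)
    # (overall - omega) * step(overall - omega) equals max(overall - omega, 0).
    excess = max(overall_unc - omega, 0.0)
    return intercept + b1 * delta_unc + b2 * excess * delta_unc


# Posterior medians, pre-registered sample.
params = dict(intercept=-0.04, b1=0.97, b2=-428.44, alpha=2.52)
omega = threshold(params["alpha"])
print(f"threshold omega = {omega:.3f} nat")
print(logit_p_right(0.1, omega - 0.05, **params))  # below threshold: approach
print(logit_p_right(0.1, omega + 0.01, **params))  # above threshold: avoid
```

With these medians, the effective slope on Δ-uncertainty, b₁ + b₂ · excess, turns negative once overall uncertainty exceeds ω by only a few thousandths of a nat, which is why the b₂ estimate is so large in magnitude.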

Appendix 3—table 7

Test performance as a function of the tendency to approach uncertainty in exploration.

	Pre-registered sample		Preliminary sample
	(194 participants)		(62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	1.56	[1.51, 1.61]	1.62	[1.53, 1.72]	logit
Approach tendency	2.96	[2.67, 3.25]	3.09	[2.65, 3.57]	logit² / nat

The model can be summarized with the following R syntax formula: $\text{test accuracy} \sim 1 + \text{approach tendency}$. For the tendency to approach uncertainty, we computed the mean posterior approach parameter for each participant in the model described in Appendix 3—table 6. The model described here was fit as a logistic regression with binomial likelihood.

Appendix 3—table 8

Test performance as a function of tendency to avoid uncertainty in exploration when overall uncertainty is high.

	Pre-registered sample		Preliminary sample
	(194 participants)		(62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	1.52	[1.48, 1.57]	1.59	[1.50, 1.68]	logit
Avoid tendency	1.18	[0.80, 1.58]	-0.52	[–1.20, 0.21]	nat²

The model can be summarized with the following R syntax formula: $\text{test accuracy} \sim 1 + \text{avoid tendency}$. For the tendency to avoid uncertainty when overall uncertainty is high, we computed for each participant the area under the curve of the uncertainty approach/avoid graph, averaging across the posterior of the model described in Appendix 3—table 6. The model described here was fit as a logistic regression with binomial likelihood.

Appendix 3—table 9

Test performance as a function of tendency to approach uncertainty and the tendency to avoid uncertainty when overall uncertainty is high.

	Pre-registered sample		Preliminary sample
	(194 participants)		(62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	1.56	[1.52, 1.61]	1.62	[1.53, 1.72]	logit
Approach tendency	3.22	[2.89, 3.55]	3.10	[2.64, 3.58]	logit² / nat
Avoid tendency	-0.83	[–1.28, –0.40]	0.02	[–0.64, 0.75]	nat²

The model can be summarized with the following R syntax formula: $\text{test accuracy} \sim 1 + \text{avoid tendency} + \text{approach tendency}$. For the tendency to avoid uncertainty when overall uncertainty is high, we computed for each participant the area under the curve of the uncertainty approach/avoid graph, averaging across the posterior of the model described in Appendix 3—table 6. The model described here was fit as a logistic regression with binomial likelihood.

Appendix 3—table 10

Drift diffusion model of exploration-phase choice and RTs.

	Pre-registered sample		Preliminary sample
	(113,746 trials, 194 participants)		(31,205 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
B - bound height	0.74	[0.72, 0.76]	0.74	[0.71, 0.76]
µ₀ - drift rate offset	-0.01	[–0.06, 0.03]	-0.01	[–0.08, 0.06]
$κ$ - dependence of drift rate on uncertainty	0.69	[0.58, 0.78]	0.78	[0.58, 0.99]
t_ND - non-decision time	0.28	[0.26, 0.31]	0.27	[0.25, 0.31]
Participant-wise variability
SD of B	0.12	[0.11, 0.14]	0.10	[0.08, 0.12]
SD of µ₀	0.30	[0.27, 0.33]	0.25	[0.21, 0.31]
SD of $κ$	0.66	[0.58, 0.74]	0.76	[0.62, 0.94]
SD of t_ND	0.16	[0.14, 0.19]	0.12	[0.10, 0.15]

We used a drift-diffusion model to formalize the dependence of RTs and choice on evidence. A drift-diffusion model is one variant in the sequential sampling family of models. The model posits that samples of momentary evidence are integrated over time. The expectation of the momentary evidence distribution is termed the drift rate µ, and its standard deviation is termed the diffusion coefficient. The decision is made when integrated evidence reaches an upper or lower bound (±B), whose sign determines the choice. Processes external to decision making are modeled by t_ND, a constant added to the RT. In this model, µ is allowed to depend linearly on Δ-uncertainty, $\mu = \mu_{0} + \kappa \cdot \Delta\text{-uncertainty}$, such that $κ$ captures the dependence of drift rate on Δ-uncertainty, and µ₀ is a general bias to make rightward or leftward choices.
Prior to fitting the model to the data, we excluded trials for which overall uncertainty was above the threshold estimated for each participant by the model described in Appendix 3—table 6. As we find qualitatively different choice behavior above the threshold, we couldn’t justify modeling these trials together with the majority of trials. Fitting a piecewise regression DDM model was beyond the capabilities of current software.
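To make the generative process concrete, the sketch below simulates single trials of such a drift-diffusion process with a simple Euler-Maruyama scheme. It is an illustration, not the fitting procedure used here. Several details are assumptions: the diffusion coefficient is fixed at 1, the starting point is unbiased (zero), the upper bound is mapped to a rightward choice, the time step is 1 ms, and the function name `simulate_ddm` is hypothetical. Default parameter values are the population-level posterior medians of the pre-registered sample.

```python
import numpy as np


def simulate_ddm(delta_uncertainty, bound=0.74, mu0=-0.01, kappa=0.69,
                 t_nd=0.28, sigma=1.0, dt=0.001, max_t=10.0, rng=None):
    """Simulate one trial; return (chose_right, rt_in_seconds).

    sigma (diffusion coefficient), dt and max_t are assumptions of this
    sketch and are not taken from the table above.
    """
    rng = np.random.default_rng() if rng is None else rng
    drift = mu0 + kappa * delta_uncertainty  # mu = mu0 + kappa * delta-uncertainty
    x, t = 0.0, 0.0
    # Integrate noisy momentary evidence until a bound (+B or -B) is reached.
    while abs(x) < bound and t < max_t:
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
        t += dt
    # The sign of the accumulated evidence determines the choice;
    # non-decision time is added to the decision time.
    return bool(x > 0), t + t_nd


rng = np.random.default_rng(2024)
trials = [simulate_ddm(0.5, rng=rng) for _ in range(200)]
p_right = np.mean([choice for choice, _ in trials])
mean_rt = np.mean([rt for _, rt in trials])
print(f"P(right) = {p_right:.2f}, mean RT = {mean_rt:.2f} s")
```

Larger absolute Δ-uncertainty yields a steeper drift, and therefore faster and more consistent choices; a higher bound trades speed for consistency.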

Appendix 3—table 11

Test performance as a function of drift diffusion model parameters for exploration phase.

	Pre-registered sample		Preliminary sample
	(113,746 trials, 194 participants)		(31,205 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	0.33	[–0.34, 1.01]	0.23	[–0.86, 1.28]
Final uncertainty	-0.87	[–0.98, –0.76]	-0.93	[–1.13, –0.74]
B - bound height	1.46	[0.58, 2.34]	1.49	[0.05, 2.90]
$κ$ - dependence of drift rate on uncertainty	0.81	[0.58, 1.07]	0.88	[0.57, 1.21]
Participant-wise variability
SD of intercept	0.87	[0.74, 1.01]	0.68	[0.47, 0.96]
SD of final uncertainty	0.49	[0.39, 0.60]	0.44	[0.24, 0.67]
Correlation of intercept and final uncertainty	-0.81	[–0.92, –0.66]	-0.83	[–0.98, –0.44]

This model can be summarized with the following R syntax formula: $\text{test accuracy} \sim 1 + \text{final uncertainty} + B + \kappa + (1 + \text{final uncertainty} \mid \text{participant})$. This model was fit as a logistic regression. As B and $κ$ are parameters estimated from the model described in Appendix 3—table 10, we took into account our error in measuring them when using them as predictors in this model. Thus, the posterior distribution for each participant’s B and $κ$ parameters was summarized as a mean and standard deviation. These summary statistics were used to approximate the posterior as a normal distribution from which a latent variable was drawn during the estimation of this model. This method propagates the uncertainty in the values of B and $κ$ into the estimates reported here. Prior to using this method, we inspected the posteriors from the model summarized in Appendix 3—table 10, and made sure the normal distribution is an adequate approximation for these posteriors.

Appendix 3—table 12

Exploration-phase choices as a function of Δ-uncertainty, overall uncertainty, and side of repeat option.

	Pre-registered sample		Preliminary sample
	(146,766 trials, 194 participants)		(41,009 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	–0.04	[–0.10, 0.02]	–0.03	[–0.14, 0.07]	logit
Δ-uncertainty	1.01	[0.86, 1.14]	1.16	[0.87, 1.44]	logit / nat
Δ-uncertainty × overall uncertainty	–326.14	[–468.61, –245.18]	–349.54	[–691.47, –251.13]	logit / nat²
Transformed threshold	2.55	[2.40, 2.72]	2.36	[2.16, 2.67]	a.u.
Repeat choice on right	0.50	[0.42, 0.59]	0.57	[0.43, 0.72]	logit difference
Participant-wise variability
SD of intercept	0.40	[0.36, 0.44]	0.38	[0.31, 0.46]
SD of Δ-uncertainty	0.90	[0.80, 1.02]	1.05	[0.86, 1.31]
SD of Δ-uncertainty × overall uncertainty	14.21	[0.98, 41.45]	9.23	[0.46, 29.28]
SD of transformed threshold	0.44	[0.36, 0.54]	0.35	[0.22, 0.50]
SD of repeat choice	0.57	[0.51, 0.64]	0.53	[0.44, 0.66]
Correlation of intercept and Δ-uncertainty	–0.06	[–0.21, 0.10]	0.05	[–0.21, 0.32]
Correlation of intercept and Δ-uncertainty × overall uncertainty	–0.08	[–0.76, 0.72]	–0.01	[–0.76, 0.75]
Correlation of Δ-uncertainty and Δ-uncertainty × overall uncertainty	0.01	[–0.69, 0.69]	–0.10	[–0.78, 0.70]
Correlation of intercept and repeat choice	–0.16	[–0.30, –0.00]	–0.06	[–0.31, 0.22]
Correlation of Δ-uncertainty and repeat choice	0.32	[0.17, 0.46]	0.11	[–0.18, 0.38]
Correlation of Δ-uncertainty × overall uncertainty and repeat choice	0.38	[–0.67, 0.87]	0.00	[–0.76, 0.77]
Correlation of intercept and threshold	–0.03	[–0.26, 0.22]	0.31	[–0.09, 0.64]
Correlation of Δ-uncertainty and threshold	–0.35	[–0.52, –0.16]	0.09	[–0.27, 0.43]
Correlation of Δ-uncertainty × overall uncertainty and threshold	–0.42	[–0.88, 0.61]	–0.10	[–0.79, 0.70]
Correlation of repeat choice and threshold	–0.60	[–0.74, –0.43]	–0.38	[–0.64, –0.05]

This model can be summarized with the following formula: $\text{logit}(P(\text{table on right chosen})) = \text{Intercept} + b_{1} \cdot \Delta\text{-uncertainty} + b_{2} \cdot (\text{overall uncertainty} - \omega) \cdot \text{step}(\text{overall uncertainty} - \omega) \cdot \Delta\text{-uncertainty} + b_{3} \cdot \text{repeat on right}$, where step is the step function and $\omega = -2 \ln(0.5) \cdot \text{inv\_logit}(\alpha)$. The intercept and parameters b₁, b₂, b₃, and α all vary by participant, and their correlations across participants are modeled. This model was fit as a logistic regression.

Appendix 3—table 13

Comparing models predicting exploration-phase choice from the three hypothesized strategies and the tendency to repeat previous choices.

Predictors included	Pre-registered sample ELPD difference (SE)	Preliminary sample ELPD difference (SE)
Uncertainty and tendency to repeat	0	0
EIG and tendency to repeat	–375.97 (49.66)	91.03 (31.64)
Exposure and tendency to repeat	–1022.84 (61.47)	–443.51 (38.03)
Uncertainty	–2639.82 (71.53)	–780.11 (38.98)
EIG	–3090.33 (87.49)	–986.96 (50.48)
Exposure	–3123.77 (91.03)	–1061.01 (52.67)

Comparing the three models described in Appendix 3—tables 3–5, and the same models with the addition of a term coding for whether the repeat choice was presented on the right. Values represent the difference in expected log predictive density (ELPD) from the best fitting model. For both samples, adding the choice repetition term improves model fit substantially, while not changing the order of the three strategies.
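A minimal Python sketch of how such a table can be read: models are ranked by their ELPD difference from the best model, and each difference is expressed in units of its standard error. Only the pre-registered column is used. Treating a difference of more than about two standard errors as clear is a common rule of thumb and an assumption of this sketch, not a criterion stated here; the function names are hypothetical.

```python
# (ELPD difference from best model, SE of the difference), pre-registered sample.
PRE_REGISTERED = {
    "Uncertainty + repeat": (0.0, 0.0),
    "EIG + repeat": (-375.97, 49.66),
    "Exposure + repeat": (-1022.84, 61.47),
    "Uncertainty": (-2639.82, 71.53),
    "EIG": (-3090.33, 87.49),
    "Exposure": (-3123.77, 91.03),
}


def rank_models(table):
    """Sort models from best (difference of 0) to worst (most negative)."""
    return sorted(table.items(), key=lambda item: item[1][0], reverse=True)


def se_ratio(diff, se):
    """Absolute ELPD difference in units of its standard error."""
    return abs(diff) / se if se > 0 else 0.0


for name, (diff, se) in rank_models(PRE_REGISTERED):
    print(f"{name:22s} diff = {diff:9.2f}  |diff|/SE = {se_ratio(diff, se):5.1f}")
```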

Appendix 3—table 14

Drift diffusion model of exploration-phase choice and RTs, differentiating between repeat and switch choices.

	Pre-registered sample		Preliminary sample
	(113,746 trials, 194 participants)		(31,205 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
B₀ - average bound height	0.75	[0.73, 0.76]	0.74	[0.72, 0.77]
B_repeat - difference in bound height between repeat and switch chosen	-0.05	[–0.05, –0.04]	-0.04	[–0.06, –0.02]
µ₀ - drift rate offset	-0.01	[–0.06, 0.03]	-0.01	[–0.08, 0.05]
$κ$ - dependence of drift rate on uncertainty	0.70	[0.61, 0.80]	0.81	[0.60, 1.01]
$κ_{repeat}$ - difference in dependence between repeat and switch chosen	-0.32	[–0.43, –0.22]	-0.28	[–0.49, –0.08]
t_ND - non-decision time	0.28	[0.26, 0.31]	0.28	[0.25, 0.31]
Participant-wise variability
SD of B₀	0.12	[0.11, 0.14]	0.10	[0.08, 0.12]
SD of B_repeat	0.05	[0.04, 0.05]	0.05	[0.04, 0.07]
SD of µ₀	0.30	[0.27, 0.33]	0.26	[0.21, 0.31]
SD of $κ$	0.64	[0.57, 0.72]	0.74	[0.60, 0.92]
SD of $κ_{repeat}$	0.50	[0.40, 0.60]	0.58	[0.39, 0.80]
SD of t_ND	0.16	[0.14, 0.19]	0.12	[0.10, 0.15]

We refit the DDM described in Appendix 3—table 10, allowing both B and $κ$ to vary by whether the choice was a repeat or switch choice (see main text for definition).

Appendix 3—table 15

Test performance as a function of the tendency to repeat exploration-phase choices.

	Pre-registered sample		Preliminary sample
	(194 participants)		(62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	1.52	[1.47, 1.56]	1.59	[1.50, 1.67]	logit
Tendency to repeat	0.09	[0.07, 0.11]	0.11	[0.06, 0.16]	logit / logit

The model can be summarized with the following R syntax formula: $\text{test accuracy} \sim 1 + \text{tendency to repeat}$. For the tendency to repeat, we computed the mean posterior parameter for each participant in the model described in Appendix 3—tables 1 and 2. This model was fit as a logistic regression with binomial likelihood.

Appendix 3—table 16

Exploration-phase RTs as a function of memory lag and side of repeat option.

	Pre-registered sample		Preliminary sample
	(126,848 trials, 194 participants)		(35,264 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	-0.34	[–0.38, –0.30]	-0.35	[–0.42, –0.30]	log s
Memory lag	0.02	[0.02, 0.03]	0.03	[0.02, 0.03]	log s / trial
Repeat choice on right	-0.05	[–0.06, –0.04]	-0.05	[–0.07, –0.03]	log s difference
Memory lag × repeat	0.02	[0.02, 0.03]	0.02	[0.02, 0.03]	1/trial
Participant-wise variability
SD of intercept	0.29	[0.26, 0.31]	0.23	[0.19, 0.27]
SD of memory lag	0.02	[0.01, 0.02]	0.02	[0.01, 0.02]
SD of repeat choice on right	0.06	[0.05, 0.07]	0.07	[0.06, 0.09]
SD of memory lag × repeat	0.02	[0.01, 0.02]	0.02	[0.01, 0.03]
Correlation of intercept and memory lag	0.41	[0.24, 0.56]	0.19	[–0.11, 0.46]
Correlation of intercept and repeat	-0.27	[–0.42, –0.11]	-0.35	[–0.57, –0.07]
Correlation of memory lag and repeat	-0.70	[–0.83, –0.53]	-0.27	[–0.57, 0.07]
Correlation of intercept and memory lag × repeat	0.27	[0.04, 0.48]	0.26	[–0.17, 0.65]
Correlation of memory lag and memory lag × repeat	0.75	[0.51, 0.90]	0.21	[–0.28, 0.65]
Correlation of repeat and memory lag × repeat	-0.91	[–0.98, –0.77]	-0.74	[–0.94, –0.36]

The model can be summarized with the following R syntax formula: $\log \text{RT} \sim 1 + \text{memory lag} * \text{repeat on right} + (1 + \text{memory lag} * \text{repeat on right} \mid \text{participant})$. This model was fit as a lognormal regression.
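Because the outcome is modeled on the log scale, coefficients are easiest to interpret after exponentiation: in a lognormal model, the exponentiated linear predictor is the median RT, and each coefficient acts multiplicatively. The Python sketch below illustrates this with the population-level posterior medians of the pre-registered sample. It assumes that the repeat predictor is coded 0/1 and that memory lag enters uncentered, in trials; neither coding is stated in this table, and the function name is hypothetical.

```python
import math

# Posterior medians, pre-registered sample (log seconds).
INTERCEPT = -0.34
B_LAG = 0.02            # per trial of memory lag
B_REPEAT = -0.05        # repeat predictor (assumed 0/1 coding)
B_LAG_X_REPEAT = 0.02   # interaction


def median_rt(memory_lag: float, repeat: int) -> float:
    """Median RT in seconds implied by the fixed effects of the lognormal model."""
    log_median = (INTERCEPT
                  + B_LAG * memory_lag
                  + B_REPEAT * repeat
                  + B_LAG_X_REPEAT * memory_lag * repeat)
    # For a lognormal distribution, exp(location) is the median.
    return math.exp(log_median)


print(f"baseline median RT: {median_rt(0, 0):.3f} s")
print(f"multiplicative effect of one trial of lag: {median_rt(1, 0) / median_rt(0, 0):.4f}")
```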

Appendix 3—table 17

Exploration-phase choices as a function of Δ-uncertainty, memory lag, and side of repeat option.

	Pre-registered sample		Preliminary sample
	(126,973 trials, 194 participants)		(35,304 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	-0.03	[–0.09, 0.02]	-0.02	[–0.12, 0.08]	logit
Δ-uncertainty	1.03	[0.89, 1.17]	1.16	[0.87, 1.43]	logit / nat
Memory lag	-0.01	[–0.02, 0.00]	0.01	[–0.00, 0.02]	logit / trial
Repeat choice on right	0.45	[0.37, 0.52]	0.50	[0.37, 0.63]	logit difference
Δ-uncertainty × memory lag	-0.08	[–0.11, –0.04]	-0.14	[–0.20, –0.07]	logit / nat × trial
Memory lag × repeat	-0.13	[–0.15, –0.11]	-0.08	[–0.12, –0.04]	1/trial
Participant-wise variability
SD of intercept	0.40	[0.36, 0.45]	0.37	[0.31, 0.45]
SD of Δ-uncertainty	0.92	[0.81, 1.04]	1.06	[0.87, 1.32]
SD of memory lag	0.03	[0.02, 0.04]	0.02	[0.00, 0.04]
SD of repeat choice on right	0.49	[0.44, 0.55]	0.45	[0.36, 0.57]
SD of Δ-uncertainty × memory lag	0.13	[0.08, 0.17]	0.07	[0.00, 0.19]
SD of memory lag × repeat	0.10	[0.08, 0.13]	0.11	[0.07, 0.16]

The model can be summarized with the following R syntax formula: $\text{table on right chosen} \sim 1 + \Delta\text{-uncertainty} * \text{memory lag} + \text{memory lag} : \text{repeat on right} + (1 + \Delta\text{-uncertainty} * \text{memory lag} + \text{memory lag} : \text{repeat on right} \mid \text{participant})$. This model was fit as a logistic regression. For brevity, the correlations in participant-wise variability are omitted from this table.

Appendix 3—table 18

Exploration-phase choices as a function of Δ-uncertainty, overall uncertainty, and trial number.

	Pre-registered sample		Preliminary sample
	(146,766 trials, 194 participants)		(41,009 trials, 62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	-0.04	[–0.09, 0.02]	-0.03	[–0.12, 0.06]	logit
Δ-uncertainty	1.00	[0.85, 1.14]	1.16	[0.85, 1.46]	logit / nat
Δ-uncertainty × overall uncertainty	-480.86	[–628.28, –379.29]	-450.20	[–568.82, –356.27]	logit / nat²
Transformed threshold	2.56	[2.43, 2.69]	2.32	[2.18, 2.48]	a.u.
Δ-uncertainty × trial #	-0.00	[–0.00, 0.00]	-0.00	[–0.01, 0.00]	logit / nat × trial
Participant-wise variability
SD of intercept	0.39	[0.35, 0.43]	0.36	[0.30, 0.45]
SD of Δ-uncertainty	0.94	[0.83, 1.06]	1.13	[0.93, 1.39]
SD of Δ-uncertainty × overall uncertainty	10.68	[0.52, 36.54]	8.87	[0.45, 28.51]
SD of transformed threshold	0.43	[0.36, 0.51]	0.34	[0.25, 0.46]
SD of Δ-uncertainty × trial #	0.02	[0.01, 0.02]	0.02	[0.01, 0.02]

We refit the piecewise regression model described in Appendix 3—table 6, accounting for a possible interaction between Δ-uncertainty and trial number. We find no significant interaction in either the pre-registered or the preliminary sample. All other terms in the model remained practically the same.
The model can be summarized with the following formula: $\text{logit}(P(\text{table on right chosen})) = \text{Intercept} + b_{1} \cdot \Delta\text{-uncertainty} + b_{2} \cdot (\text{overall uncertainty} - \omega) \cdot \text{step}(\text{overall uncertainty} - \omega) \cdot \Delta\text{-uncertainty} + b_{4} \cdot \text{trial \#} \cdot \Delta\text{-uncertainty}$, where step is the step function and $\omega = -2 \ln(0.5) \cdot \text{inv\_logit}(\alpha)$. The intercept and parameters b₁, b₂, b₄, and α all vary by participant. This model was fit as a logistic regression.

Appendix 3—table 19

Model comparison for sequential sampling models of the tendency to repeat previous choices.

Parameters varying by repeat / switch choice	Pre-registered sample DIC	Preliminary sample DIC
None	218,877.63	59,365.40
$κ$ - the dependence of RT on Δ-uncertainty	218,666.99	59,311.25
B - Bound height	217,928.89	59,096.38
Both $κ$ and B	217,669.43	59,046.67

The model reported in Appendix 3—table 14 captures the tendency to repeat previous choices by allowing both the dependence of RT on Δ-uncertainty and the bound height parameters to vary by whether the choice was a repeat or switch choice (last row in this table). Here, it is compared against the simpler models nested within it. For both samples, the full model is favored over the partial models, as is indicated by lower deviance information criterion (DIC) values. DIC values are derived from the likelihood of the data given estimated parameters, and the effective number of parameters in the model. Absolute values of DIC depend on sample size and the attributes of the noise distribution. Accordingly, DIC values should only be compared between models fit to the same dataset.
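As a small worked example, the Python sketch below computes the DIC differences from the best model for the pre-registered sample, using the values in the table above. It also includes the standard definition of DIC (mean posterior deviance plus the effective number of parameters), which matches the verbal description here; the function names are hypothetical, and the deviance inputs to `dic` are illustrative arguments, since the underlying deviances are not reported.

```python
def dic(mean_deviance: float, deviance_at_posterior_mean: float) -> float:
    """Standard DIC: mean deviance plus effective number of parameters p_D."""
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d


# DIC values for the pre-registered sample, keyed by which parameters
# vary between repeat and switch choices.
DIC_PRE_REGISTERED = {
    "None": 218877.63,
    "kappa": 218666.99,
    "B": 217928.89,
    "Both kappa and B": 217669.43,
}


def delta_dic(table):
    """Difference of each model's DIC from the lowest (best) DIC."""
    best = min(table.values())
    return {name: value - best for name, value in table.items()}


for name, diff in delta_dic(DIC_PRE_REGISTERED).items():
    print(f"{name:18s} delta DIC = {diff:8.2f}")
```

Only the differences within one column are meaningful, in line with the caveat above that DIC values should be compared only between models fit to the same dataset.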

Appendix 3—table 20

Test performance as a function of exploration-phase uncertainty coefficient.

	Pre-registered sample		Preliminary sample
	(194 participants)		(62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	1.56	[1.51, 1.61]	1.61	[1.52, 1.70]	logit
Coefficient for uncertainty	2.90	[2.62, 3.19]	2.72	[2.29, 3.15]	logit² / nat

The model can be summarized with the following R syntax formula: $\text{test accuracy} \sim 1 + \text{uncertainty}$, where uncertainty is the mean posterior parameter for each participant in the model described in Appendix 3—table 3. The model described here was fit as a logistic regression with binomial likelihood.

Appendix 3—table 21

Test performance as a function of exploration-phase EIG coefficient.

	Pre-registered sample		Preliminary sample
	(194 participants)		(62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	1.57	[1.52, 1.62]	1.61	[1.52, 1.71]	logit
Coefficient for EIG	1.98	[1.77, 2.22]	1.72	[1.37, 2.11]	logit² / nat

The model can be summarized with the following R syntax formula: $\text{test accuracy} \sim 1 + \text{EIG}$, where EIG is the mean posterior parameter for each participant in the model described in Appendix 3—table 4. The model described here was fit as a logistic regression with binomial likelihood.

Appendix 3—table 22

Test performance as a function of exploration-phase exposure coefficient.

	Pre-registered sample		Preliminary sample
	(194 participants)		(62 participants)
Term	Median	95% PI	Median	95% PI	Units
Predictors
Intercept	1.52	[1.48, 1.57]	1.58	[1.50, 1.68]	logit
Coefficient for exposure	-0.18	[–0.43, 0.06]	-1.09	[–1.56, –0.61]	logit² / trial

The model can be summarized with the following R syntax formula: $\text{test accuracy} \sim 1 + \text{exposure}$, where exposure is the mean posterior parameter for each participant in the model described in Appendix 3—table 5. The model described here was fit as a logistic regression with binomial likelihood.

Appendix 3—table 23

Model comparison for models predicting test accuracy from coefficients for each of the three strategies.

Coefficient predicting test accuracy	Pre-registered sample ELPD difference (SE)	Preliminary sample ELPD difference (SE)
Uncertainty	0	0
EIG	–37.53 (31.78)	–31.96 (19.08)
Exposure	–232.81 (45.17)	–86.46 (24.14)

Comparing the three models described in Appendix 3—tables 20–22. Values represent the difference in expected log predictive density (ELPD) from the best fitting model. For both samples, individual differences in the uncertainty coefficient best predict test accuracy.

Additional files

MDAR checklist: https://cdn.elifesciences.org/articles/94231/elife-94231-mdarchecklist1-v1.pdf

Yaniv Abir, Michael Neil Shadlen, Daphna Shohamy (2026) Human exploration strategically balances approaching and avoiding uncertainty. eLife 13:RP94231. https://doi.org/10.7554/eLife.94231.3
