Pigeon task.

On each trial, a pigeon takes a random walk towards one of two seed piles. The participant presses a key at any time to terminate the walk and indicate which pile they think the pigeon will eventually reach. Deciding quickly saves steps (which are limited to 600 per block) at the expense of accuracy. Waiting improves accuracy (which governs coins earned and lost) at the expense of more steps. Continuous feedback is given about total coins earned and steps taken. Vertical bars indicate the midpoint of the path (dotted) and the midpoint of the pigeon (solid), to allow the participant to identify their locations precisely.
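The speed-accuracy trade-off described above can be illustrated with a minimal simulation sketch. It is not the task code: the Gaussian step distribution, the `drift` and `noise_sd` values, and the function names are all assumptions made for illustration.

```python
import random

def simulate_walk(drift, noise_sd, n_steps, rng):
    """Return the pigeon's position after each step, starting from the center (0)."""
    x, positions = 0.0, []
    for _ in range(n_steps):
        x += rng.gauss(drift, noise_sd)  # assumed Gaussian step; not specified in the caption
        positions.append(x)
    return positions

def accuracy_at_step(step, drift, noise_sd, n_trials, rng):
    """Fraction of trials on which a choice made at `step`, on the side where the
    pigeon currently is, matches the pile the pigeon is drifting toward."""
    correct = 0
    for _ in range(n_trials):
        target = rng.choice([-1, 1])  # -1 = left pile, +1 = right pile
        position = simulate_walk(target * drift, noise_sd, step, rng)[-1]
        choice = 1 if position > 0 else -1
        correct += (choice == target)
    return correct / n_trials

rng = random.Random(0)
# Hypothetical parameter values, chosen only to make the trade-off visible.
early = accuracy_at_step(1, drift=0.05, noise_sd=0.15, n_trials=2000, rng=rng)
late = accuracy_at_step(40, drift=0.05, noise_sd=0.15, n_trials=2000, rng=rng)
print(f"accuracy after 1 step: {early:.2f}, after 40 steps: {late:.2f}")
```

Under these assumed parameters, waiting longer yields higher accuracy but consumes more of the 600-step budget per trial.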

Task parameters.

Performance summary.

A. Pigeon positions (lines) and the participant’s choices (circles at the final positions; gray for left, black for right) from individual trials from an example session. B. Estimates of non-decision times (NDTs). We assumed that, except for occasional lapses, the participants tried to make choices that were congruent with the position of the pigeon at the time of the choice; i.e., they chose right/left when the pigeon was to the right/left side of center. We therefore estimated the NDT for each participant (gray lines) for a given block as the delay corresponding to the maximum value of this congruence, which tended to peak at 1–2 steps (the median congruence across participants is shown as the thick black line). DT was computed on each trial by subtracting this estimate of NDT from the measured response time (RT, in steps of the decision variable). C. Overall percent correct as a function of the median DT for each participant (individual symbols). Arrows indicate median values. Data in all panels are from cohort 1, block 2 (see Table 1).
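The NDT estimate in panel B can be sketched as follows. This is one reading of the caption, not the authors' analysis code: the trial representation (a tuple of positions, RT in steps, and choice), the function names, and the `max_delay` search range are assumptions, and the toy trials are made up.

```python
def congruence(trials, delay):
    """Fraction of trials whose choice matches the side of the pigeon
    `delay` steps before the response."""
    hits = total = 0
    for positions, rt, choice in trials:
        idx = rt - delay - 1  # positions[0] is the position after step 1
        if idx < 0 or positions[idx] == 0:
            continue  # no position available, or pigeon exactly at center
        total += 1
        hits += (positions[idx] > 0) == (choice > 0)
    return hits / total if total else 0.0

def estimate_ndt(trials, max_delay=5):
    """Delay (in steps) that maximizes congruence; ties go to the shortest delay."""
    return max(range(max_delay + 1), key=lambda d: congruence(trials, d))

# Made-up toy trials: (positions per step, RT in steps, choice: +1 right, -1 left).
toy_trials = [
    ([-1.0, -1.0, 1.0, -1.0, -1.0], 5, +1),
    ([1.0, 1.0, -1.0, 1.0, 1.0], 5, -1),
]
ndt = estimate_ndt(toy_trials)
decision_times = [rt - ndt for _, rt, _ in toy_trials]  # DT = RT - NDT
```

In the toy data, each choice matches the pigeon's side only two steps before the response, so the estimated NDT is 2 steps and each DT is 3 steps.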

Bound summary.

A. Data from one example session showing the decision bound (absolute distance of the pigeon from the central starting point assessed at the DT, normalized by the distance from the starting point to the edge of the screen) as a function of DT. Open symbols are data from individual trials. Filled symbols are medians per DT. B. Summary of data from 60 participants. Gray lines are median bounds (z-scored across all trials for each participant, to facilitate comparing time courses across participants) per DT as a function of DT for each participant. Black line is the median of the gray lines, computed per DT. Note that bound estimates are the least reliable at the shortest DTs (see Methods). C. Histogram of slopes of linear regressions of absolute bound per DT, computed separately for each participant (excluding DT<=2 to avoid unreliable bound estimates at the shortest DTs). Values ∼0 imply no systematic (linear) dependence on DT. D. Histogram of slopes of linear regressions of the change in bound magnitude versus the previous bound, computed separately for each participant. Values ∼1 imply that trial-to-trial bound adjustments were consistent with a regression to the mean (i.e., noise). Arrows in C and D indicate median values. E. Mean±SEM bound height (symbols and vertical error bars) as a function of mean±SEM DT (circles and horizontal error bars) computed per participant. Data in all panels are from cohort 1, block 2 (see Table 1).
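The per-participant slope in panel C can be sketched as an ordinary least-squares fit of bound against DT, restricted to DT>2. The function names and the example values below are illustrative assumptions, not the authors' code or data.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def bound_slope(dts, bounds, min_dt=3):
    """Slope of bound versus DT for one participant, dropping trials with DT < min_dt
    (min_dt=3 corresponds to excluding DT<=2)."""
    kept = [(dt, b) for dt, b in zip(dts, bounds) if dt >= min_dt]
    return ols_slope([dt for dt, _ in kept], [b for _, b in kept])

# Made-up values: large bound estimates at DT<=2 are excluded from the fit.
dts = [1, 2, 3, 4, 5, 6]
flat_bounds = [0.9, 0.9, 0.3, 0.3, 0.3, 0.3]
slope = bound_slope(dts, flat_bounds)  # close to 0: no linear dependence on DT
```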

Block-wise bound adjustments to changes in rewards and costs.

Sixty participants each performed, in sequence, three blocks of trials (each block ended after 600 total pigeon steps) that differed in their reward and cost structure: A) block 1, +1/0 coins for correct/error responses, no step cost; B) block 2, +1/−4 coins, no step cost; and C) block 3, +1/0 coins, −30 steps for an error. Each panel shows reward rate (coins/step; note the broken axis to include the two outlier participants in block 2) as a function of absolute bound. The red lines and ribbons are the medians and 95% CIs from simulations using the given fixed bound. Points and error bars are mean±SEM bounds and experienced reward rate from individual participants. The observed spread in reward rates around the model prediction is largely attributable to trial-by-trial variability in participant bound usage. D, E. Comparison of bounds between block 1 and 2 (D) and between block 2 and 3 (E) for individual participants (points). Arrows in all panels point to reward-rate maxima. P-values are for Wilcoxon signed-rank test for H0: median difference between x and y values=0.
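A fixed-bound simulation of the kind summarized by the red lines can be sketched as below. The payoffs and the 600-step block length come from the caption; the Gaussian steps, the `drift` and `noise_sd` defaults, the absence of non-decision time, and the use of the step budget as the denominator are assumptions, so this is not a reproduction of the authors' simulations.

```python
import random

def reward_rate(bound, coins_correct, coins_error, step_penalty_error,
                drift=0.05, noise_sd=0.15, block_steps=600, rng=None):
    """Coins per step for one simulated block with a fixed decision bound."""
    rng = rng or random.Random()
    coins = steps = 0
    while steps < block_steps:
        target = rng.choice([-1, 1])  # -1 = left pile, +1 = right pile
        x = 0.0
        while abs(x) < bound and steps < block_steps:
            x += rng.gauss(target * drift, noise_sd)  # assumed Gaussian step
            steps += 1
        if abs(x) < bound:
            break  # the block ran out of steps before a choice was made
        if (x > 0) == (target > 0):
            coins += coins_correct
        else:
            coins += coins_error
            steps += step_penalty_error  # block 3: an error uses up extra steps
    return coins / block_steps  # assumed denominator: the block's step budget

rng = random.Random(0)
block1 = reward_rate(0.3, coins_correct=1, coins_error=0, step_penalty_error=0, rng=rng)
block2 = reward_rate(0.3, coins_correct=1, coins_error=-4, step_penalty_error=0, rng=rng)
block3 = reward_rate(0.3, coins_correct=1, coins_error=0, step_penalty_error=30, rng=rng)
```

Repeating the call over a grid of `bound` values and many simulated blocks would give a reward-rate curve whose maximum can be compared with the bounds participants used.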

SNR-dependent bounds.

A,C. Bounds from high-SNR (ordinate) versus low-SNR (abscissa) trials, when SNR changed by block (A) or trial (C). Heatmaps show the predicted reward rate (RR, colorbars to the right; maxima indicated by red squares) for different combinations of bound values for high and low SNR blocks (A) and trials (C). Circles denote median SNR-specific bounds from individual participants, which were systematically higher for low- versus high-SNR block-wise changes (A) but not reliably separated for trial-wise changes (C). P-values are for Wilcoxon signed-rank test for H0: median difference between x and y values=0. B, D. Difference in median bound between high-SNR and low-SNR trials as a function of decision time (DT). Small circles indicate data from individual participants, large circles are medians across participants, per DT. Filled large circles indicate p<0.05 for H0: median=0.

Changepoint-dependent bounds.

A,B. Reward rate functions of post-changepoint bound height depend on the time of the changepoint and the bound used before the changepoint. Example functions from one participant performing the low→high task (A) and one performing the high→low task (B). Red points indicate the peaks of each function. The other points indicate the average bound used by each participant before (blue) and after (green) the changepoint. To increase reward rate, the participant in A increased their bound after the changepoint, whereas the participant in B decreased their bound after the changepoint. C,D. Change in bound from pre- to post-changepoint decisions (ordinate) versus reward-rate gradient (change in reward per change in bound to achieve the maximum reward rate; positive/negative values indicate that reward rate increases by increasing/decreasing the bound after the changepoint) computed for each participant (circles) for low→high (C) and high→low (D) SNR conditions. Changes that increase reward rate are in the gray quadrants. Spearman’s ρ and associated p-values are shown in each panel. E. Median bound for post-changepoint decisions for high→low (ordinate) versus low→high (abscissa) SNR conditions for individual participants (circles). F. Median bound for late decisions (i.e., after the median RT) for high (ordinate) versus low (abscissa) SNR conditions for individual participants (circles). Heatmap indicates expected reward rate (RR, colorbars to the right; maximum is indicated with a red square). P-values in E and F are for Wilcoxon signed-rank test for H0: median difference between x and y values=0.

Correcting biases in bound estimates.

Bound estimates from pigeon positions were biased in simulations, such that at relatively short DTs low bounds tended to be overestimated, and high bounds tended to be underestimated. We accounted for these biases using linear regressions from simulated data to scale bound estimates. The figure shows data from simulations using a range of bounds but all resulting in choices (bound crossings) at DT=2. The linear fit (blue line) to the uncorrected data (gray line) was used to produce the corrected data (black line).
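One way such a correction could work is sketched below: fit a line relating estimated to true bounds in simulated data for a given DT, then invert that line to rescale new estimates. Inverting the fit is an assumption about how the scaling was applied, the function names are hypothetical, and the numbers are made up to mimic the described bias (low bounds overestimated, high bounds underestimated).

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def correct_bound(estimate, intercept, slope):
    """Map a biased estimate back onto the true-bound scale by inverting the fit."""
    return (estimate - intercept) / slope

# Made-up simulated values for a single DT: true bounds and their biased estimates.
true_bounds = [0.1, 0.2, 0.3, 0.4]
estimated = [0.15, 0.20, 0.25, 0.30]
intercept, slope = fit_line(true_bounds, estimated)
corrected = [correct_bound(e, intercept, slope) for e in estimated]
```

With these illustrative values the corrected estimates recover the true bounds, because the made-up bias is exactly linear; real simulated data would leave residual scatter.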