Food choice task.

(A) In the main task, participants were presented with pairs of snack items and asked to choose which one they would prefer to consume at the end of the session. After each choice, the chosen item was highlighted by a square box for an additional 0.5 s. Each of the 30 participants completed 210 trials, with each item appearing 7 times during the experiment. A subset of 60 item pairs was repeated once.

(B) In an initial ‘ratings’ task, participants were shown 60 individual appetizing snack items and asked to indicate how much they would be willing to pay for each item using a monetary scale ranging from $0 to $3.

(C) Proportion of trials in which participants selected the right item as a function of the difference in value between the right and left items (Δve). Proportions were first determined for each participant and then averaged across participants. Error bars indicate the standard error of the mean (s.e.m.) across participants.

(D) Mean response time as a function of the difference in value between the right and left items. Error bars indicate the s.e.m. across participants. Red curves in panels C-D are fits of a drift-diffusion model (DDM).

Individual choices are better explained by values inferred from the other trials than values reported in the ratings task.

Gray data points represent the total log-likelihood of each participant’s choices, given two types of predictions: (abscissa) from a logistic regression, fit to the explicit values; (ordinate) from a procedure that infers the values based on choices on the other trials. Predictions derived from the other trials are better in all but four participants. The red markers were obtained using the same procedure, applied to choices simulated under the assumption that the e-values are the true values of the items. This shows that the inferential procedure is not guaranteed to improve predictions.

Preferences change over time.

Probability of making the same choice on the two trials with the same item pair, shown as a function of the difference in trial number between them (Δtr). Trial pairs with identical items (N=1726) were sorted by Δtr, and the match probabilities were smoothed with a boxcar function with a width of 100 observations.
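The sorting and smoothing step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `boxcar_match_probability` and its arguments are hypothetical, and it assumes that the boxcar is a plain running mean over consecutive observations after sorting by Δtr, with only full windows retained.

```python
def boxcar_match_probability(delta_tr, same_choice, width=100):
    """Running mean of choice-match indicators after sorting by trial lag.

    delta_tr:    trial-number differences, one per repeated item pair
    same_choice: 1 if the same item was chosen on both presentations, else 0
    width:       number of observations in the boxcar window
    Returns (mean lag per window, match probability per window).
    """
    if width < 1 or len(delta_tr) < width:
        raise ValueError("need at least `width` observations")
    if len(delta_tr) != len(same_choice):
        raise ValueError("inputs must have the same length")

    # Sort observations by lag (assumption: ties keep their input order).
    pairs = sorted(zip(delta_tr, same_choice), key=lambda p: p[0])
    lags = [float(p[0]) for p in pairs]
    hits = [float(p[1]) for p in pairs]

    # Slide a window of `width` observations, updating the sums incrementally.
    lag_sum, hit_sum = sum(lags[:width]), sum(hits[:width])
    mean_lag, match_prob = [lag_sum / width], [hit_sum / width]
    for i in range(width, len(lags)):
        lag_sum += lags[i] - lags[i - width]
        hit_sum += hits[i] - hits[i - width]
        mean_lag.append(lag_sum / width)
        match_prob.append(hit_sum / width)
    return mean_lag, match_prob
```

With N observations and a window of 100, this yields N − 99 smoothed points, each plotted at the mean Δtr of the observations in its window.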

Revaluation algorithm.

(A) Schematic example of the revaluation algorithm applied to one decision. After a choice between items A and B, the value of the chosen item is increased by δ and the value of the unchosen item is decreased by the same amount. (B) Example of value changes due to revaluation, for three items, as a function of the presentation number within the session. In the experiment, each item was presented 7 times. (C) For a representative participant, deviance of the logistic regression model that uses the revalued values to explain the choices, for different values of δ. The best-fitting value is ∼$0.15. The inset shows a histogram of the best-fitting δ values across participants.
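The update rule in panel A can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function name `reval_values` and the trial format are hypothetical, values are initialized to the explicit ratings, and the value difference for each trial is recorded before that trial's update (the caption does not specify whether the pre- or post-update values are used).

```python
def reval_values(e_values, trials, delta):
    """Sequentially revalue items after each choice.

    e_values: dict mapping item -> explicit value (dollars); starting point
    trials:   list of (left_item, right_item, chose_right) in the order run
    delta:    size of the update applied after every choice (dollars)
    Returns (per-trial right-minus-left value differences, final values).
    """
    values = dict(e_values)  # copy so the explicit values are left untouched
    dv = []
    for left, right, chose_right in trials:
        # Assumption: record the difference before updating on this trial.
        dv.append(values[right] - values[left])
        chosen, unchosen = (right, left) if chose_right else (left, right)
        values[chosen] += delta    # chosen item gains delta
        values[unchosen] -= delta  # unchosen item loses the same amount
    return dv, values


# Toy example with three hypothetical items and delta = $0.15.
dv, final = reval_values(
    {"A": 1.0, "B": 1.5, "C": 0.5},
    [("A", "B", True), ("A", "C", False)],
    delta=0.15,
)
```

To reproduce the curve in panel C, one would repeat this for a grid of δ values, fit a logistic regression of choice on the resulting differences for each δ, and keep the δ with the lowest deviance. Passing the trials in reverse order gives the control used to test sensitivity to trial order.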

Revaluation explains choice and RT better than explicit values.

(A) Proportion of rightward choices (top) and mean response time (bottom) as a function of the difference in r-value between the two items. The red solid lines are fits of a drift-diffusion model that uses the r-values. The dashed line corresponds to the fits of a DDM that uses the e-values (same as in Fig. 1C-D). Error bars indicate s.e.m. across trials. Participants are more sensitive to r-values than e-values (top) and the r-values better explain the full range of RTs (bottom). (B) Model-free, empirical support for the superiority of r-values in individual subjects. Data points represent the difference in mean RTs between difficult and easy decisions. Positive values indicate that difficult decisions take longer on average than easy ones. Difficult and easy are defined relative to the median of the absolute value of Δve (left) or Δvr (right). The lines connect the mean RTs of each participant. P-value is from a paired t-test.
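The per-participant quantity plotted in panel B can be sketched as a median split on the absolute value difference. The function name `rt_difficulty_effect` is hypothetical, and the handling of trials that fall exactly on the median is an assumption (they are excluded here); the caption does not say how such ties were treated.

```python
from statistics import mean, median


def rt_difficulty_effect(abs_dv, rts):
    """Mean RT of difficult trials minus mean RT of easy trials.

    abs_dv: absolute value difference per trial (|Δve| or |Δvr|)
    rts:    response time per trial, in the same order
    Difficult = below the median of abs_dv, easy = above it.
    """
    cutoff = median(abs_dv)
    # Assumption: trials exactly at the median are left out of both groups.
    difficult = [rt for d, rt in zip(abs_dv, rts) if d < cutoff]
    easy = [rt for d, rt in zip(abs_dv, rts) if d > cutoff]
    if not difficult or not easy:
        raise ValueError("median split produced an empty group")
    return mean(difficult) - mean(easy)
```

Computing this once with |Δve| and once with |Δvr| for each participant gives the two values joined by a line in the panel; a larger positive difference means that the value measure separates slow from fast decisions more cleanly.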

No revaluation in simulated data.

(A) Histogram of the best-fitting revaluation update (δ) for data simulated by sampling choices from a logistic function fit to the participants’ choices. The best-fitting δ values for the simulated choices are centered around 0. For reference, we have also included a histogram of the δ values obtained from the fits to the participants’ data, showing all positive values (gray).

(B) Similar to Fig. 5B, for the simulated data. The values obtained from Reval were no better than the explicit values at explaining the RTs, as expected, since the δ values were ∼0 and thus vr ≈ ve.

Reval is sensitive to trial order.

Deviance obtained by applying Reval to the trials in the order in which they were completed (abscissa) and in the reverse order (ordinate). Each point corresponds to a different participant. The deviance is greater (i.e., the fits are worse) when Reval is applied in the reverse direction.

Stronger revaluation for the chosen than for the unchosen item.

We fit a variant of the Reval algorithm that includes separate update values (δs) for the chosen and unchosen options. The best-fitting δ value for the chosen option (abscissa) is plotted against the best-fitting value for the unchosen option (ordinate). Each data point corresponds to one participant. The increase in value for the chosen option is greater than the decrease in value for the unchosen option (paired t-test).

Revaluation reflected in BOLD activity in ventromedial prefrontal cortex.

Brain-wide fMRI analysis revealed a significant correlation between r-values and activity in the vmPFC, after controlling for e-values. Coordinates are reported in standard MNI space. Heatmap color bars range from z-stat = 2.3 to 3.2. The map was cluster-corrected for familywise error rate at the whole-brain level (p < 0.05).

Similar results observed in other datasets.

We applied the Reval method to other publicly available datasets of the food choice task. We use the same ΔV comparison as in Figs. 5 and 7 to assess whether r-values better explain the RTs of the study participants. (A) Data from the experiment of Folke et al. (2016). Participants (N=28) reported their willingness to pay (WTP) for each of 16 common snack items. In the choice task, they were presented with each unique pair of items and asked to choose the preferred item. Each unique pair was presented twice for a total of 240 trials per participant. (B and C) Data from the experiment of Sepulveda et al. (2020). Participants (N=31) reported their willingness to pay (WTP) for each of 60 snack items. They were then presented with pairs of items to choose from. Pairs were selected based on participants’ WTP reports to provide comparisons between pairs of high-value, low-value and mixed-value items. The choice task was performed under two framing conditions: like-framing (select the more preferred item) and dislike-framing (select the less preferred item). The task consisted of six alternating blocks of like- and dislike-framing (40 trials per block).

Revaluation occurs in a DDM with temporally-correlated noise.

A drift-diffusion model with non-independent noise (ceDDM) captures the main features of revaluation. (A) The ceDDM accounts for choices (top) and response times (bottom), plotted as a function of the difference in values obtained from explicit reports (Δve). Same data as in Fig. 1C-D. Red curves are simulations of the best-fitting model. Each trial was simulated 100 times. Simulations were first averaged within trials and then averaged across trials. Error bars and bands indicate s.e.m. across trials. (B) Noise correlations as a function of time lag, obtained from the best-fitting model. Each curve corresponds to a different participant. (C) δ parameters derived by applying Reval to data simulated from the ceDDM that best fit each participant’s data. As in the data, δ > 0 for all participants. (D) Same analysis as in Fig. 5B, applied to simulations of the ceDDM. As for the data, Reval increased the range of RTs obtained after grouping trials by difficulty (by e-values on the left and r-values on the right; p-value from paired t-test). (E) Same analysis as in Fig. 7, using the simulated data. As observed in the data, the deviance resulting from applying Reval in the correct trial order (abscissa) is smaller than when applied in the opposite order (p-value from paired t-test).

RT variance explained by r-values and e-values.

Percentage of variance in response times explained by a DDM in which the drift rate is proportional to either Δvr (abscissa) or Δve (ordinate). Each data point corresponds to a different subject. For most participants, the model based on the revalued values explained a greater proportion of the variance.

Similar δ values obtained by Reval and logistic regression.

Comparison of the δ values obtained by the Reval algorithm with those obtained by an alternative approach: a single logistic regression model, fit to each participant’s data, that takes into account the number of times the items in the current trial were presented and either chosen or not chosen in previous trials (Eq. 14). Each data point corresponds to one participant. The alternative method yields δ values that are almost identical to those obtained with Reval.

Activation table for map in Fig. 9.