(A) Overlapping activations in the parietal cortex elicited by the current bandit running average (yellow; peak: 58, −60, 26; t(19) = 4.78, p < 0.0002) and reference (red; peak: 34, −68, 22; t(19) = 6.17, p < 0.00001). (B) In the right caudate nucleus, we also observed a representation of the difference between these two quantities, that is, voxels that co-varied with the DVcur − ave but did not vary according to the rule type or decision (main effect of the value signal; peak: 18, 20, 2; t(19) = 6.63, p < 0.00001). (C) Voxels responding to the interaction of rule and value on defer trials (left, at p < 0.0001) and commit trials (right, at p < 0.001). Value is encoded (in the frame of reference of the rule) only on defer trials. (D) Mean parameter estimates, derived by regressing bandit value on the BOLD signal from within an independently defined ROI in the rmPFC, separately for defer and commit decisions under each rule. To ensure independence, ROIs were defined individually for each participant as the peak voxel responding within the region in the remaining 19 participants. All significant voxels are visualized at p < 0.001 and survive correction for multiple comparisons across the brain. (E) Parameter estimates from a regressor encoding the value of the asset pool (estimated final payoff).