A) Two abstract shapes were probabilistically related to each of two outcome identities by independent transition probabilities p1 and p2. B) Schematic of the direct transition condition. Participants chose one of the two shapes on each trial based on two pieces of information: their estimates of the probability that each would lead to either outcome identity (gift cards) and the randomly generated number of points they could potentially win if that outcome was obtained. The color of each number indicated the identity of the outcome on which that number of points could be won. In the example, green indicates the number of points for the Starbucks gift card, while pink indicates the number of points for iTunes. Next, participants observed the outcome of their choice (the gift card and amount) after a delay. C) Schematic of the indirect transition condition. Same as (B) except that after participants made their choice they transitioned into another independent decision. After this second decision was made, participants observed the outcome of their first decision. D) Results of logistic regression analysis predicting the current choice based on previously observed choice-outcome relationships. Each cell represents the combination of a previously observed choice with an observed outcome. The color of each cell shows the value of beta estimates for each combination of previous choice and observed outcome, averaged across participants. Positive values indicate that the choice-outcome pair predicted choosing the same shape again when that shape previously led to the currently desired outcome. E) Theoretical decomposition of the matrix in (D) into groups of cells which reflect “appropriate credit assignment” given the task structure (orange) and “credit spreading” (pink). F) Mean (±SEM) of beta coefficients for specific choice-outcome combinations averaged across the groupings of cells shown in E for each condition.

The left side shows the analysis scheme for decoding representations of the causal choice at feedback in the direct transition condition. An SVM decoder was used to differentiate trials at the time of the outcome (purple) based on the causal choice selected during the “choice period” (cyan). The right side shows axial and coronal slices through a t-statistic map showing significant decoding in OFC and HC during feedback. For illustration, all maps are displayed at a threshold of t(19) = 2.54, p < 0.01, uncorrected. All effects survive small volume correction in a priori defined anatomical ROIs.

A) The left side shows the analysis scheme for decoding information about the causal choice in the “pending state” (pink) in the indirect transition condition. We decoded information about the previous choice during the feedback period, during which the causal stimulus should be “pending” credit assignment in the next trial. The image on the right shows a coronal slice through a t-statistic map, showing significant decoding in FPl. B) The analysis scheme for the information connectivity analysis, which uses the trial-by-trial fidelity of causal choice representations in the “pending state” (pink) to predict the fidelity of these same choices when the outcome is observed (purple). The right side shows axial and coronal slices of a t-statistic map showing effects in lOFC and HC. All maps are displayed using the same conventions as Fig. 2 and all effects survive small volume correction in a priori defined anatomical ROIs.
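The information connectivity analysis in (B) relates trial-by-trial decoding fidelity in one region to fidelity in another. A minimal sketch of that idea follows, assuming fidelity is available as one number per trial and using a rank (Spearman) correlation. The function names, the choice of Spearman correlation, and the example values are illustrative assumptions and are not taken from the Methods.

```python
from math import sqrt


def ranks(values):
    """Return average ranks (1-based), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def information_connectivity(fidelity_seed, fidelity_target):
    """Spearman correlation of trial-by-trial fidelity between two regions."""
    return pearson(ranks(fidelity_seed), ranks(fidelity_target))


# Hypothetical per-trial fidelity values (e.g. signed classifier evidence).
fpl_pending = [0.2, -0.1, 0.8, 0.4, -0.5, 0.6]
lofc_feedback = [0.1, -0.3, 0.9, 0.2, -0.2, 0.5]
connectivity = information_connectivity(fpl_pending, lofc_feedback)
```

In a whole-brain version, one such coefficient would be computed per participant and searchlight, and the coefficients would then be tested against zero across participants.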

A) Schematic of the decoding procedure. In task-independent “template trials”, participants passively viewed images corresponding to the two choice stimuli and two outcome stimuli in the main task. We used these trials to train an SVM to differentiate stimuli outside the task context and then tested for representations of the causal choice stimulus at the time of feedback during the learning task. B) A coronal slice through a t-statistic map showing regions of the HC with significantly above-chance decoding of the causal choice stimulus identity at the time of feedback, across conditions. In this figure, “CA” refers to “credit assignment”. C) Analysis scheme for generating each participant’s overall credit assignment precision. β-values for each participant were taken from the behavioral model predicting current choices given all combinations of the previous three choices and outcomes (Eq. 1). Each participant’s pattern of β-values (left side matrices) was correlated with a matrix representing an optimal pattern of regression betas given the task structure (right side matrices). The optimal matrix was a binary matrix with ones where credit should be assigned for a given outcome and zeros everywhere else. D) Axial slice through a t-statistic map showing regions where decoding of the stimulus identity was significantly correlated with estimates of credit assignment precision. All maps are displayed using the same conventions as Fig. 2 and all effects survive small volume correction in a priori defined anatomical ROIs.
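The precision measure in (C) can be illustrated with a short sketch: flatten a participant's 3 x 3 matrix of β-values and correlate it with the binary optimal matrix. The use of a Pearson correlation, the diagonal form of the optimal matrix (each outcome credited to the choice made on the same trial, as one might assume for the direct transition condition), and the example β-values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def credit_assignment_precision(betas, optimal):
    """Correlate a beta matrix with a binary 'optimal credit' matrix."""
    flat_betas = [value for row in betas for value in row]
    flat_optimal = [value for row in optimal for value in row]
    return pearson(flat_betas, flat_optimal)


# Assumed optimal pattern: rows are previous choices (t-1, t-2, t-3),
# columns are previous outcomes (t-1, t-2, t-3); ones mark the cells
# where credit should be assigned.
optimal_direct = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

# Hypothetical beta estimates for one participant.
example_betas = [
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.1],
    [0.0, 0.1, 0.3],
]

precision = credit_assignment_precision(example_betas, optimal_direct)
```

A participant whose β-values load only on the appropriate cells would score close to 1, while credit spread evenly across all cells would push the score toward 0.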

Follow up behavioral analyses

A. Example trajectory across the experiment of the belief estimates generated by the Bayesian learner. The top panel shows the trajectory for S1, and the bottom panel shows the trajectory for S2. The true probability trajectory is shown in white and the estimated belief is shown in pink. The color heatmap shows the probability mass for each possible belief in Sx -> O1. B. Comparison of model fits between our Bayesian model and a value-based RL model (vRL), which used an interactive updating procedure to track the value of each shape based on the history of received rewards. The exceedance probability was 1 for the Bayesian model and 0 for the vRL model, suggesting that the Bayesian model, which tracked transition probabilities between choices and outcomes, fit participants’ actual choices better than a value-tracking model. C. Logistic regression curves estimating the change in choice probabilities given the expected value difference between choices. Gray lines show participant-specific fits, and the black line shows the effect across groups (associated t-statistics are calculated across participants). The left side shows the effect in the direct transition condition and the right side shows the indirect transition condition.
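To make the heatmap in (A) concrete, the sketch below tracks a probability mass over a grid of candidate values for the transition Sx -> O1 and updates it after each observed outcome. It is not the model used in the paper: the grid resolution, the Bernoulli likelihood, and the `volatility` parameter (a small mixture with a uniform distribution that lets beliefs follow a drifting probability) are assumptions chosen for illustration.

```python
def make_grid(n_points=99):
    """Candidate transition probabilities, evenly spaced in (0, 1)."""
    return [(i + 1) / (n_points + 1) for i in range(n_points)]


def update_belief(belief, grid, saw_outcome_1, volatility=0.05):
    """One Bayesian update of the belief over p(Sx -> O1).

    belief: probability mass for each grid value (sums to 1).
    saw_outcome_1: True if the chosen shape led to outcome 1.
    volatility: assumed mixing weight with a uniform distribution.
    """
    # Bernoulli likelihood of the observation under each candidate value.
    posterior = [
        mass * (p if saw_outcome_1 else 1.0 - p)
        for mass, p in zip(belief, grid)
    ]
    total = sum(posterior)
    posterior = [mass / total for mass in posterior]
    # Leak toward uniform so the belief can track a changing probability.
    uniform = 1.0 / len(grid)
    return [(1.0 - volatility) * m + volatility * uniform for m in posterior]


def posterior_mean(belief, grid):
    """Point estimate of the transition probability."""
    return sum(mass * p for mass, p in zip(belief, grid))


grid = make_grid()
belief = [1.0 / len(grid)] * len(grid)  # flat prior
for observation in [True, True, False, True, True, True]:
    belief = update_belief(belief, grid, observation)
estimate = posterior_mean(belief, grid)
```

Plotting `belief` after every trial as one column of an image would give a heatmap of the kind described, with `estimate` as the overlaid belief line.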

Pre-selected anatomical ROIs

Illustrations of pre-selected anatomical ROIs taken from Neubert et al., 2015. The lOFC ROI corresponds to indices 9 and 30, and the FPl ROI corresponds to indices 14 and 35. The HC ROI was defined in Yushkevich et al., 2015.

Functionally defined ROIs in the direct transition condition

A) Despite having a priori defined anatomical ROIs for our decoding analysis of the causal choice, we wanted to test whether our results depended on these ROI definitions by using a data-driven approach. Here, we trained an SVM classifier to decode representations of the causal choice in run 1 of the direct transition condition, then tested the decoder on run 2 to find regions of the orbitofrontal cortex (OFC) and hippocampus (HC) that significantly decoded causal choice representations at a significance level of t(19) > 2.54, p < .01, uncorrected. We then used these regions as ROIs for a separate analysis which trained the classifier in run 1 and tested the classifier in run 2. B) ROIs generated with the same procedure as described in A, but with the roles of the two runs for training and testing switched.

Main effect of choice decoding accuracy at the time of feedback, TFCE corrected, in each run of the direct transition condition

A. Regions of the OFC showing significant decoding of the causal choice in run 1 of the direct transition condition. Significance was tested using TFCE correction over voxels within the ROI generated from run 2, using the procedure described above (Fig. S1). For illustration, we show voxels that survive at a threshold of t(19) = 1.73, p < .05, uncorrected. B. Same as A but for voxels in run 2, using the ROI generated from run 1.

Significant information connectivity between FPl and OFC in a functionally defined ROI from the direct transition condition

A. We did not observe significant decoding of the causal choice in a bilateral OFC ROI defined by significant clusters in the indirect transition condition. Thus, we used the accuracy map for decoding choices at feedback during the direct transition condition (t(19) > 1.73; p < .05) in the OFC, averaged across runs. B. We then used those clusters as ROIs for TFCE correction when testing for regions of the lOFC that showed significant information connectivity with FPl. We did this by testing for significant correlations between the trial-by-trial fidelity of pending representations in the FPl and causal choice representations during feedback in lOFC (see Methods).

Depiction of catch trials

A. To ensure that participants were paying attention, we included valuable catch trials in the passive-viewing “template task”. Participants were asked to report which image out of the four (2 gift cards and 2 stimuli) was the last one presented on the screen. They were endowed with an extra £10, from which we removed £1 for every incorrect response. There were four catch trials per template run. B. The decision task included “bonus trials” in which participants could predict which gift card they expected to see on the subsequent feedback screen given their choice. For every correct answer, they were given an extra £3 on the final gift card that was given to them. The first run of the direct transition condition had two catch trials; the second run had one. Both runs of the indirect transition condition had one catch trial each.

Control Analysis for Pending-to-Credit Assignment Information Connectivity in the Indirect Transition Condition

A. Axial (left) and coronal (right) slices through a t-statistic map showing the results of a control analysis in which we tested the proportion of correct classifications of causal stimulus information in OFC and HC at the time of the outcome for trials in which the FPl showed correct classification of the causal stimulus during pending trials. The proportion of correct trials was compared to a permuted baseline of randomly drawn trials for each participant, then combined over participants to create a t-statistic. B. Secondary control analysis in which we reran the classification analysis for causal choice stimulus information at the time of outcome, but only on trials where FPl was found to correctly decode pending causal choice information. Note that this test is different from A because we allowed the classifier to create a new hyperplane separating categories for only those trials in which the FPl decoding was “correct”. For illustration, all maps are displayed at a threshold of t(19) = 2.54, p < .01, uncorrected. All effects survive small volume correction in a priori defined anatomical ROIs.
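The comparison in (A) can be sketched for a single participant and a single region as follows: take the proportion of correctly classified outcome-time trials among those where FPl decoding was correct, and compare it with the average proportion obtained from equally sized random draws of trials. The function name, the number of permutations, the fixed seed, and the example data are hypothetical; the actual procedure is described in the Methods.

```python
import random


def conditional_accuracy_vs_baseline(fpl_correct, target_correct,
                                     n_permutations=1000, seed=0):
    """Compare accuracy on FPl-correct trials with a random-draw baseline.

    fpl_correct, target_correct: per-trial 0/1 flags of equal length.
    Returns (observed proportion, baseline proportion, difference).
    """
    if len(fpl_correct) != len(target_correct):
        raise ValueError("trial lists must have the same length")
    selected = [t for f, t in zip(fpl_correct, target_correct) if f]
    n_selected = len(selected)
    if n_selected == 0:
        raise ValueError("no trials with correct FPl classification")
    observed = sum(selected) / n_selected

    # Baseline: the same number of trials drawn at random, ignoring FPl.
    rng = random.Random(seed)
    null_proportions = []
    for _ in range(n_permutations):
        draw = rng.sample(target_correct, n_selected)
        null_proportions.append(sum(draw) / n_selected)
    baseline = sum(null_proportions) / n_permutations
    return observed, baseline, observed - baseline


# Hypothetical per-trial classification outcomes for one participant.
fpl = [1, 1, 1, 1, 0, 0, 0, 0]
ofc = [1, 1, 1, 1, 0, 0, 0, 0]
observed, baseline, difference = conditional_accuracy_vs_baseline(fpl, ofc)
```

The per-participant differences would then be tested against zero across participants to obtain the t-statistic shown in the map.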