the rat Gambling Task.

(A) Schematic of the rGT. A nose poke response in the food tray extinguished the tray light and initiated a new trial. After an inter-trial interval (ITI) of 5 s, four stimulus lights were turned on in holes 1, 2, 4, and 5, each of which was associated with a different number of sugar pellets. The order of the options from left to right was counterbalanced within each cohort to avoid development of a simple side bias (version A (shown): P1, P4, P3, P2; version B: P4, P1, P3, P2). The animal was required to respond at a hole within 10 s. This response was then rewarded or punished depending on the reinforcement schedule for that option. If the animal lost, the stimulus light in the chosen hole flashed at a frequency of 0.5 Hz for the duration of the time-out penalty, and all other lights were extinguished. The maximum number of pellets available per 30 min session shows that P1 and P2 are more advantageous than P3 and P4. The percent choice of each option is one of the primary dependent variables. A decision score is also calculated, as for the Iowa Gambling Task (IGT), to determine the overall level of risky choice, as follows: [(P1 + P2) – (P3 + P4)]. Figure is modified from Winstanley & Floresco (2016). (B) Distinct variants of the rGT. On the uncued variant, no audiovisual cues were present. The standard cued task featured audiovisual cues that scaled in complexity and magnitude with reward size. The reverse-cued variant inverted this relationship, such that the simplest cue was paired with the largest reward, and vice versa. For the outcome-cued variant, audiovisual cues were paired with both wins and losses. For the random-cued variant, cues were played on 50% of trials, regardless of outcome. Lastly, for the loss-cued variant, cues were paired only with losing outcomes, at the onset of the time-out penalty.
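The two dependent variables described in (A), percent choice and the score (P1 + P2) – (P3 + P4), can be sketched as follows. This is a minimal illustration only: the function names and the choice counts are hypothetical and are not taken from the study.

```python
def percent_choice(counts):
    """Convert raw choice counts per option into percent choice."""
    total = sum(counts.values())
    return {option: 100.0 * n / total for option, n in counts.items()}


def decision_score(pct):
    """Score = (P1 + P2) - (P3 + P4), computed on percent choice."""
    return (pct["P1"] + pct["P2"]) - (pct["P3"] + pct["P4"])


# Hypothetical choice counts for one session (illustrative values only).
counts = {"P1": 10, "P2": 60, "P3": 10, "P4": 20}
pct = percent_choice(counts)
score = decision_score(pct)
print(pct, score)
```

By construction, a positive score reflects more choice of P1 and P2, and a negative score reflects more choice of the high-risk options P3 and P4.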

Differences in baseline performance between task variants.

Comparative baseline performance on variants of the rGT. (A) Percent choice of each option in the six rGT task variants. (B) Average decision score shows risk preference is significantly modulated by the presence and contingency of outcome-paired cues, with preference for the high-risk options (P3 and P4) strongly enhanced in task variants in which the audiovisual cues scale with outcome magnitude and occur on winning trials. (C) Premature responding across the rGT variants, and (D) for risky versus optimal decision-makers. (E) Latency for reward collection on winning trials across the variants of the rGT. Data are expressed as mean + SEM. Black asterisk indicates significant difference from uncued task; red caret indicates significant difference from standard cued task. *** p<.001

Decision score comparisons

Comparisons of decision score between task variants using Tukey’s HSD test. Bolded values indicate a significant difference.

Devaluation in risk-preferring rats: P1-P4 choice

Choice x devaluation interactions for each task in risk-preferring rats. Bolded values indicate a significant difference.

Effects of sucrose pellet devaluation on choice preference.

(A) P1-P4 choice preference after reinforcer devaluation compared to baseline preference for risk-preferring rats. Devaluation failed to shift choice patterns specifically in the task variants featuring consistent win-paired cues (standard, outcome-cued, reverse-cued). (B) P1-P4 choice preference after reinforcer devaluation compared to baseline preference in optimal rats. Reinforcer devaluation induced a slight shift in choice preference, with no differences found between tasks. Data are expressed as the mean change in % choice from baseline + SEM to highlight effects independent of differences in preference for each option between cohorts. Asterisk indicates significant choice x devaluation effect, p < .05.

Devaluation in risk-preferring rats: Decision score

Decision score x devaluation interactions for each task in risk-preferring rats. Bolded values indicate a significant difference.

Difference in WAIC between each model and the basic model across the rGT task variants.

Differences were calculated by subtracting each model’s WAIC from the basic model WAIC for that task variant. Larger differences indicate a better explanation of the data. Error bars are SEM of the WAIC difference.
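The subtraction described above can be sketched as follows. The function name and the WAIC values are hypothetical placeholders, not results from the paper; the sketch only shows the direction of the difference (basic model WAIC minus each model's WAIC).

```python
def waic_differences(waic_by_model, baseline="basic"):
    """Return baseline WAIC minus each other model's WAIC.

    Lower WAIC indicates a better fit, so a larger positive difference
    means the model explains the data better than the baseline.
    """
    base = waic_by_model[baseline]
    return {
        name: base - value
        for name, value in waic_by_model.items()
        if name != baseline
    }


# Hypothetical WAIC values for one task variant (illustrative only).
waic = {"basic": 1000.0, "scaled": 950.0, "nonlinear": 900.0}
diffs = waic_differences(waic)
print(diffs)
```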

Average simulated decision score and P1-P4 choice (sessions 36-40) for the nonlinear and scaled + offset models

Simulated with the subject-level parameter estimates for each task variant. Black asterisk indicates significant difference from uncued task; red caret indicates significant difference from standard cued task.

Group-level posterior estimates of nonlinear cost model parameters.

Asterisks within the inset tables mark parameters for which the 95% HDI of the sample difference did not contain zero, indicating a credible difference. For each distribution, the line demarcates the mean, the box demarcates the interquartile interval, and the whiskers demarcate the 95% HDI.
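The credible-difference criterion above can be illustrated with a short sketch. It assumes that posterior draws for two task variants are paired by draw index, and it estimates the HDI as the narrowest interval spanning the requested share of sorted samples. Both choices, and the function names `hdi` and `credible_difference`, are assumptions made for illustration rather than the authors' implementation.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))  # samples the interval must span
    best_low, best_high = ordered[0], ordered[window - 1]
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high


def credible_difference(samples_a, samples_b, mass=0.95):
    """True if the HDI of the paired sample difference excludes zero."""
    diffs = [a - b for a, b in zip(samples_a, samples_b)]
    low, high = hdi(diffs, mass)
    return not (low <= 0.0 <= high)


# Hypothetical posterior draws (illustrative only).
draws_a = [5.0 + i / 10 for i in range(100)]
draws_b = [i / 10 for i in range(100)]
print(credible_difference(draws_a, draws_b))
```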

Group-level posterior estimates of scaled + offset model parameters.

Asterisks within the inset tables mark parameters for which the 95% HDI of the sample difference did not contain zero, indicating a credible difference. For each distribution, the line demarcates the mean, the box demarcates the interquartile interval, and the whiskers demarcate the 95% HDI.

Relationships between subject-level parameter estimates and final decision score across rats.

Simple linear regression models were fit for each parameter to assess whether parameter values covaried with decision score at the end of training. Dotted lines indicate 95% confidence intervals around the regression lines.
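A simple linear regression of final decision score on one parameter can be sketched with ordinary least squares. The function name and the data below are hypothetical, and the sketch omits the 95% confidence bands shown in the figure.

```python
import math


def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, pearson_r).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical subject-level parameter estimates and decision scores.
param = [0.0, 1.0, 2.0, 3.0]
final_score = [1.0, 3.0, 5.0, 7.0]
slope, intercept, r = simple_linear_regression(param, final_score)
print(slope, intercept, r)
```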

P1-P4 choice comparisons

Comparisons of P1-P4 choice between task variants using Tukey’s honestly significant difference (HSD) test. Bolded values indicate a significant difference.

Premature responding comparisons

Comparisons of premature responding between task variants using Tukey’s HSD test. Bolded values indicate a significant difference.

Collect latency comparisons

Comparisons of collect latency between task variants using Tukey’s HSD test. Bolded values indicate a significant difference.

Nonlinear model simulated decision score comparisons

Comparisons of decision scores simulated from nonlinear model subject-level parameter estimates using Tukey’s HSD test. Bolded values indicate a significant difference; italicized values indicate a trending difference.

Scaled + offset model simulated decision score comparisons

Comparisons of decision scores simulated from scaled + offset model subject-level parameter estimates using Tukey’s HSD test. Bolded values indicate a significant difference.

Nonlinear model simulated P1-P4 choice comparisons

Comparisons of P1-P4 choice simulated from nonlinear model subject-level parameter estimates using Tukey’s HSD test. Bolded values indicate a significant difference.

Scaled + offset model simulated P1-P4 choice comparisons

Comparisons of P1-P4 choice simulated from scaled + offset model subject-level parameter estimates using Tukey’s HSD test. Bolded values indicate a significant difference; italicized values indicate a trending difference.

Comparative baseline performance for other metrics on variants of the rGT.

(A) Latency to choose an option across the rGT variants. (B) Number of omitted trials per session across task variants. (C) Number of completed trials across task variants. No significant differences were found between task variants for any of these metrics. Data are expressed as mean + SEM.

Visualization of how different values of the m, r, and b parameters affect the transformation of timeout penalty durations into equivalent cost in pellets.

Values were chosen to represent the spread of estimated values from model fits. (A) m parameter visualization; the scaled and scaled + offset models assume a linear relationship between timeout duration and equivalent cost, with the m parameter determining the slope of this relationship. Lower values produce a flatter relationship, such that long durations have relatively less of an impact on latent Q-values. (B) b parameter visualization (scaled + offset model); the global offset parameter allows a uniform cost to impact latent Q-values for all loss trials. (C) r parameter visualization; the nonlinear model preserves the assumption that equivalent costs increase monotonically with timeout duration, but allows for nonlinearity in the shape of the relationship, such that longer durations yield a smaller increase in cost. (D) As in (B), the b parameter allows a uniform cost to impact Q-values on all loss trials.
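The three transformations can be sketched as follows. The linear forms follow directly from the description of m as a slope and b as a uniform offset. The power-law form used for the nonlinear model is an assumption, chosen only because it increases monotonically and flattens at long durations when 0 < r < 1; the exact functional form is not stated in this legend. Function names, parameter values, and timeout durations are illustrative.

```python
def scaled_cost(duration_s, m):
    """Scaled model: cost grows linearly with timeout duration, slope m."""
    return m * duration_s


def scaled_offset_cost(duration_s, m, b):
    """Scaled + offset model: linear cost plus a uniform offset b per loss."""
    return m * duration_s + b


def nonlinear_cost(duration_s, r, b):
    """Nonlinear model (ASSUMED power-law form): duration ** r plus offset b.

    With 0 < r < 1, cost still rises with duration, but each additional
    second adds less cost than the one before.
    """
    return duration_s ** r + b


# Illustrative timeout durations in seconds (assumed for this sketch).
durations = [5, 10, 30, 40]
for d in durations:
    print(
        d,
        scaled_cost(d, m=0.5),
        scaled_offset_cost(d, m=0.5, b=2.0),
        round(nonlinear_cost(d, r=0.5, b=2.0), 3),
    )
```

Lowering m flattens the linear curves, while b shifts every loss by the same amount regardless of timeout duration.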

Group-level posterior estimates of scaled model parameters.

Asterisks within the inset tables mark parameters for which the 95% HDI of the sample difference did not contain zero, indicating a credible difference. For each distribution, the box demarcates the interquartile interval and the whiskers demarcate the 95% HDI.

Group-level posterior estimates of scaled reward model parameters.

Asterisks within the inset tables mark parameters for which the 95% HDI of the sample difference did not contain zero, indicating a credible difference. For each distribution, the box demarcates the interquartile interval and the whiskers demarcate the 95% HDI.