Rat Two-Step Task Description and Behavior
A Schematic of a single trial of the rat two-step task, where yellow suns indicate the active port. The rat initiates a trial by poking into the top-center port (i), after which it is prompted to choose between the adjacent side ports (ii). After making a choice, the rat initiates the second step (iii). During the second step, the probabilistic transition determines the outcome probability (iv), and the rat is instructed to enter the active outcome port (v). One outcome port delivers liquid reward with probability Pr, and the other outcome port delivers reward with the complementary probability, 1 – Pr (vi). Throughout the session, the reward probability Pr reverses unpredictably (vii).
B Trial-history regressions fit to data simulated by each individual RL agent, each with learning rate 0.5 (5000 trials split across 20 sessions for each simulation). Each agent demonstrates differential effects of trial type on choices, which decay across past trials. (i) The model-free reward (MFr) agent tends to repeat choices (positive weight) after rewarded trials (blue) and switch away from choices (negative weight) after omission trials (red), ignoring the observed transition. (ii) The model-free choice (MFc) agent with positive weight captures choice perseveration, tending to repeat choices regardless of the observed reward or transition. (iii) The model-based reward (MBr) agent tends to repeat choices following common-rewarded (blue-solid) and rare-omission (red-dashed) trials, and to switch choices following rare-rewarded (blue-dashed) and common-omission (red-solid) trials, showing an effect of both transition and reward. (iv) The model-based choice (MBc) agent with positive weight tends to repeat choices following common-transition trials (solid) and switch away from choices following rare-transition trials (dashed), capturing transition-dependent outcome-port perseveration.
C Trial-history regressions fit to animal behavioral data (n=20 rats), where dark lines are mean regression values across all rats and shaded error bars are 95% confidence intervals around the mean. Rats look most similar to the MBr agent, but show additional modulation that can be accounted for by the influence of other agents.
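As an illustration of the trial-history regressions in panels B and C, the following is a minimal Python sketch, not the authors' implementation. It assumes a common coding scheme in which each past trial contributes a signed regressor (+1 or -1 for the side chosen) to one of four trial types (common-rewarded, common-omission, rare-rewarded, rare-omission), and a logistic regression predicts the current choice from those regressors. The function names, the number of lags, the plain gradient-ascent fit, and the win-stay/lose-shift stand-in agent (with rewards drawn independently of choice and assumed values for `p_common`, `p_reward`, and `p_follow`) are all hypothetical simplifications introduced here.

```python
import numpy as np


def trial_history_design(choice, common, reward, n_lags=5):
    """Build signed trial-history regressors (assumed coding scheme).

    choice: array of +1 / -1 (side chosen on each trial)
    common: bool array, True if the transition was common
    reward: bool array, True if the trial was rewarded
    Columns are ordered by trial type, then by lag (1..n_lags):
    common-rewarded, common-omission, rare-rewarded, rare-omission.
    """
    choice = np.asarray(choice, dtype=float)
    common = np.asarray(common, dtype=bool)
    reward = np.asarray(reward, dtype=bool)
    types = [common & reward, common & ~reward,
             ~common & reward, ~common & ~reward]
    n = len(choice)
    X = np.zeros((n - n_lags, 4 * n_lags))
    for j, mask in enumerate(types):
        signed = choice * mask  # signed choice where the trial type matches, else 0
        for lag in range(1, n_lags + 1):
            # Row i describes trial n_lags + i; this column looks back `lag` trials.
            X[:, j * n_lags + lag - 1] = signed[n_lags - lag:n - lag]
    y = (choice[n_lags:] > 0).astype(float)  # 1 if the current choice is +1
    return X, y


def fit_logistic(X, y, n_iter=500, lr=0.5, l2=1e-3):
    """Fit logistic-regression weights (no intercept) by gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * (X.T @ (y - p) / len(y) - l2 * w)
    return w


def simulate_win_stay_lose_shift(n_trials=5000, p_follow=0.9,
                                 p_common=0.8, p_reward=0.5, seed=0):
    """Hypothetical stand-in agent: repeats after reward, switches after omission.

    Rewards and transitions are drawn independently of choice, which is a
    simplification of the actual task.
    """
    rng = np.random.default_rng(seed)
    common = rng.random(n_trials) < p_common
    reward = rng.random(n_trials) < p_reward
    follow = rng.random(n_trials) < p_follow  # whether the rule is obeyed
    choice = np.empty(n_trials)
    choice[0] = 1.0
    for t in range(1, n_trials):
        stay = reward[t - 1] == follow[t]
        choice[t] = choice[t - 1] if stay else -choice[t - 1]
    return choice, common, reward


choice, common, reward = simulate_win_stay_lose_shift()
X, y = trial_history_design(choice, common, reward, n_lags=5)
weights = fit_logistic(X, y).reshape(4, 5)  # rows: trial types, columns: lags
```

Under these assumptions, each row of `weights` corresponds to one line in a panel of B: a positive weight means a tendency to repeat the choice made on that past trial type, and a negative weight means a tendency to switch away from it. Because the stand-in agent ignores the transition, its rewarded rows should come out positive and its omission rows negative at lag 1, qualitatively like the MFr pattern described in B(i).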