Inferred values train cached values in a dual-process architecture.

(a) Diagram of neural circuits for value learning showing: (i) the standard model of TD learning in cortico-striatal circuits; (ii) the proposed dual-process model. Note the connection from a model-based evaluation system in frontal cortex to dopamine neurons that bypasses striatum. (b) Difference between cached values and the true value function (i.e., value error) during learning in the dual-process model, a standard TD agent, and an alternative dual-process model where V_NET contributes to both the RPE update target and prediction. (c) RPEs during approach to a reward at different stages of learning (early, mid, late) in (i) a standard TD agent and (ii) the dual-process model. (d) Value functions in different components of the dual-process model during approach to a reward early in learning. The RPE (δ) is approximately the difference between V_TD and V_NET.
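As a point of reference for the standard TD agent in panels (b) and (c)(i), the following is a minimal sketch of tabular TD(0) on a linear track. It is not the simulation code behind the figure: the track length, learning rate, discount factor, reward size, and the function names `run_td_track` and `true_values` are all illustrative assumptions, and the dual-process model itself is not implemented here. The sketch only shows how the RPE and the value error in a standard TD agent can be computed.

```python
import numpy as np

def run_td_track(n_states=10, n_episodes=200, alpha=0.1, gamma=0.95, reward=1.0):
    """Tabular TD(0) on a linear track (hypothetical task parameters).

    The agent steps through states 0..n_states-1 in order; the reward is
    delivered on the transition out of the last state into a terminal state.
    Returns the learned values and the RPE at every state in every episode.
    """
    V = np.zeros(n_states + 1)  # final entry is the terminal state, fixed at 0
    rpe_history = np.zeros((n_episodes, n_states))
    for ep in range(n_episodes):
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0
            delta = r + gamma * V[s + 1] - V[s]  # reward prediction error
            V[s] += alpha * delta                # cached-value update
            rpe_history[ep, s] = delta
    return V[:n_states], rpe_history

def true_values(n_states=10, gamma=0.95, reward=1.0):
    """Discounted value of each state under the deterministic approach policy."""
    steps_to_reward = n_states - 1 - np.arange(n_states)
    return reward * gamma ** steps_to_reward

if __name__ == "__main__":
    V, rpe = run_td_track()
    value_error = np.abs(V - true_values()).mean()
    print("mean value error after training:", value_error)
    print("RPEs in first episode:", rpe[0])
    print("RPEs in last episode:", rpe[-1])
```

In this sketch the RPE early in learning is confined to the time of reward and shrinks toward zero as the cached values converge, which is the textbook behaviour of a standard TD agent that the dual-process model is being contrasted with.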

Dopamine ramp dynamics at different timescales.

(a) Diagram of the behavioural task in Guru et al. (2020). (b) Experimental data from Guru et al. (2020) showing dopamine ramps at different training stages during runs to big (red) and small (black) reward locations. (c) Evolution of value estimates during training in the dual-process model: (i) value estimates during early training; (ii) value estimates during late training. (d) Evolution of RPEs in the dual-process model during training on a task that replicates Guru et al. (2020). (e) Experimental data from Guru et al. (2020) showing rapid development of dopamine ramps after initial encounters with rewarding goals in a novel environment. (f) Evolution of RPEs in the dual-process model during initial encounters with rewarding goals.

Reward globally updates dopamine ramps.

(a) Experimental design from Krausz et al. (2023) in which animals can reach rewarding goal locations via multiple routes. (b) Experimental data from Krausz et al. (2023) showing that outcomes at goal locations globally update dopamine ramps regardless of subsequent route. (c) Diagram showing global updating of inferred values V_MB by rewards in the dual-process model. (d) Effect of reward on trial t on RPEs on trial t+1 in the dual-process model when the goal is reached via different routes on t and t+1.

Dopamine responses to unexpected state transitions.

Experimental conditions (left), dopamine recordings (middle), and RPEs in simulations of the dual-process model (right) for key experimental conditions from Kim et al. (2020) in which unexpected state transitions occurred during reward approach in a VR environment. (a) Teleport end-state manipulation, where teleports of constant distance were aligned with different end-states. (b) Teleport distance manipulation, where teleports varied in distance but ended at a common state. (c) Traversal speed manipulation.

Dopamine ramp dynamics under state-uncertainty manipulations.

(a) Experimental paradigm from Mikhael et al. (2022) where animals approached a rewarded location in a VR environment that differed in movement speed (fast vs. slow) and luminance (bright vs. darkening) across trials. (b) Dopamine signals from Mikhael et al. (2022), where progressively increasing state uncertainty causes dopamine bumps rather than dopamine ramps. (c) Effect of progressively increasing state uncertainty on RPEs in the dual-process model.