Mesolimbic dopamine projections mediate cue-motivated reward seeking but not reward retrieval in rats

Abstract

Efficient foraging requires an ability to coordinate discrete reward-seeking and reward-retrieval behaviors. We used pathway-specific chemogenetic inhibition to investigate how rats’ mesolimbic and mesocortical dopamine circuits contribute to the expression and modulation of reward seeking and retrieval. Inhibiting ventral tegmental area dopamine neurons disrupted the tendency for reward-paired cues to motivate reward seeking, but spared their ability to increase attempts to retrieve reward. Similar effects were produced by inhibiting dopamine inputs to nucleus accumbens, but not medial prefrontal cortex. Inhibiting dopamine neurons spared the suppressive effect of reward devaluation on reward seeking, an assay of goal-directed behavior. Attempts to retrieve reward persisted after devaluation, indicating they were habitually performed as part of a fixed action sequence. Our findings show that complete bouts of reward seeking and retrieval are behaviorally and neurally dissociable from bouts of reward seeking without retrieval. This dichotomy may prove useful for uncovering mechanisms of maladaptive behavior.

https://doi.org/10.7554/eLife.43551.001

Introduction

Foraging and other reward-motivated behaviors tend to unfold as a sequence of actions, beginning with a reward-seeking phase and ending with an attempt to retrieve and consume any rewards produced by this activity. Coordinating the discrete reward-seeking and reward-retrieval behaviors that make up these action sequences is important for efficient foraging. When rewards are sparse or otherwise difficult to obtain, attempts to retrieve them are often unnecessary and should therefore be withheld to conserve energy and minimize opportunity costs (Stephens and Krebs, 1986; Niv et al., 2007). Consistent with this, studies on self-paced instrumental behavior show that the ability to efficiently pattern reward-seeking and -retrieval responses based on task demands (e.g., reinforcement schedule) can strongly impact the rate at which rewards are obtained (Ostlund et al., 2012; Wassum et al., 2012; Matamales et al., 2017). However, such behaviors must remain sensitive to changes in internal and external states. For instance, environmental cues that signal reward availability increase attempts to seek out (Estes, 1948; Corbit and Balleine, 2016) and retrieve reward (Marshall and Ostlund, 2018). While the ability to develop and modify action sequences is normally adaptive, this process may become dysregulated in certain conditions, such as obsessive-compulsive disorder (Joel and Avisar, 2001; Korff and Harvey, 2006; Frederick and Cocuzzo, 2017) and drug addiction (Tiffany, 1990; Graybiel, 2008; Volkow et al., 2013), leading to maladaptive behaviors. Despite this, the behavioral and neural mechanisms responsible for regulating reward seeking and retrieval are not well understood.

Previous studies strongly implicate dopamine in learning new action sequences (Graybiel, 1998; Jin and Costa, 2015). While other findings suggest that dopamine is not as important for the expression of well-established action sequences (Levesque et al., 2007; Wassum et al., 2012), it remains possible that dopamine contributes to action sequence performance when changes in task conditions prompt a reorganization of reward seeking and retrieval. For instance, previous studies indicate that the tendency for reward-paired cues to motivate reward-seeking behavior critically depends on dopamine signaling (Dickinson et al., 2000; Ostlund and Maidment, 2012; Wassum et al., 2011), particularly in the nucleus accumbens (NAc) (Wyvell and Berridge, 2000; Lex and Hauber, 2008; Wassum et al., 2013; Ostlund et al., 2014; Aitken et al., 2016). Interestingly, we recently found that such cues do not simply provoke reward-seeking behavior (e.g., lever pressing); they also increase the likelihood that such behavior will be followed by an attempt to retrieve reward (e.g., food-cup approach) (Marshall and Ostlund, 2018). Although this finding suggests that reward-paired cues preferentially motivate complete bouts of reward seeking and retrieval, it has yet to be established if this modulation of action sequence performance depends on dopamine.

Dopamine may also contribute to regulating attempts to seek out and retrieve a reward when the value of that reward changes. Self-paced, instrumental reward-seeking actions are normally performed in a goal-directed manner, such that they are sensitive to changes in reward value (Balleine and Dickinson, 1998). However, they can develop into inflexible stimulus-response habits with extended training (Dickinson, 1985; Dickinson et al., 1995). In contrast, it is not well understood how changes in reward value modulate attempts to retrieve rewards produced through instrumental reward-seeking behavior. For example, it has been suggested that rats’ tendency to approach the food cup after lever pressing may represent a discrete goal-directed action – one that is selected independently of the initial decision to press the lever (Rescorla, 1964). Alternatively, rats may concatenate the press-approach sequence to form an action chunk, which can then be selected and deployed as a single unit of behavior (Lashley, 1951; Graybiel, 1998; Jin and Costa, 2015). Action chunks are thought to represent a special form of habit, or behavioral chain, in which each element of the chain automatically elicits the next response. This allows for efficient action sequencing but comes with a decrease in behavioral flexibility. Once an action chunk has been initiated, it should be automatically completed without further consideration of reward value (Dezfouli et al., 2014; Smith and Graybiel, 2016).

In the current study, we applied a chemogenetic approach to investigate the role of the mesocorticolimbic dopamine system in action sequence performance in rats. We used a combination of well-established behavioral assays and novel microstructural analyses to selectively probe the influence of reward-paired cues and expected reward value on the regulation of reward-seeking and -retrieval responses. We found that inhibiting dopamine neurons in the ventral tegmental area (VTA) or their inputs to the NAc, but not the medial prefrontal cortex (mPFC), reversibly disrupted cue-motivated reward seeking, but spared the tendency for reward-paired cues to trigger complete bouts of seeking and retrieval. These dopamine manipulations had no impact on rats’ tendency to adjust their reward-seeking behavior in response to reward devaluation. Importantly, attempts to retrieve reward were not suppressed by reward devaluation, suggesting that this behavior was the product of action chunking.

Results

Effects of response-contingent feedback about reward delivery on reward retrieval

We first characterized the relationship between reward-seeking and -retrieval responses when rewards are sparse (Figure 1A). Rats were trained to lever press on a RI-60s schedule, such that this action was often nonreinforced and only occasionally earned food pellet delivery into a recessed food cup. Not surprisingly, we found that the probability of food-cup approach was elevated for several seconds after performance of the lever-press action (Figure 1B and C). This timeframe for press-contingent food-cup approach behavior is consistent with previous reports (Nicola, 2010; Marshall and Ostlund, 2018), and was relatively consistent across the current experiments (see Figure 3—figure supplement 1). We therefore used a cutoff value of 2.5 s to identify reward-retrieval attempts. To control for reward-retrieval opportunities, which were contingent on lever pressing, our analysis focuses on a normalized measure – the proportion of lever presses that were followed by food-cup approach.
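The normalized retrieval measure described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: it assumes lever-press and food-cup-approach timestamps are available as lists of seconds, and it assumes that a press counts as "followed by approach" when the next approach begins after the press and within the 2.5 s cutoff. The function name and the toy timestamps are hypothetical.

```python
from bisect import bisect_right

APPROACH_WINDOW_S = 2.5  # cutoff value stated in the text


def proportion_presses_followed_by_approach(press_times, approach_times,
                                            window=APPROACH_WINDOW_S):
    """Fraction of lever presses followed by a food-cup approach within `window` s.

    Assumption (not from the source): an approach must start strictly after
    the press and no later than `window` seconds after it.
    """
    if not press_times:
        return float("nan")  # undefined when there are no presses
    approaches = sorted(approach_times)
    followed = 0
    for t in press_times:
        # Index of the first approach that starts strictly after this press.
        i = bisect_right(approaches, t)
        if i < len(approaches) and approaches[i] - t <= window:
            followed += 1
    return followed / len(press_times)


# Toy timestamps in seconds (illustrative only, not real data).
presses = [1.0, 5.0, 9.0, 20.0]
approaches = [2.0, 12.5, 21.0]
print(proportion_presses_followed_by_approach(presses, approaches))
```

Dividing by the number of presses is what makes the measure comparable across conditions in which the overall rate of lever pressing differs.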

Figure 1

Microstructural organization of instrumental behavior.

(A) Hungry rats were trained to perform a self-paced ‘reward seeking’ task, in which pressing a lever was intermittently reinforced with food pellets (RI-60s schedule). Press-contingent food-cup approaches were taken as a measure of attempted ‘reward retrieval’. (B) Probability of food-cup approaches as a function of time surrounding reinforced (purple) and nonreinforced (gray) lever presses. (C) Representative pattern of food-cup approach behavior for an individual rat surrounding reinforced and nonreinforced lever presses. Individual reinforced trials are separately presented across the y-axis aligned at the point at which the lever became activated (i.e., primed for reinforcement). (D, E) Effects of manipulating instrumental reinforcement contingency on the organization of reward-seeking and -retrieval responses. Total lever presses (D) or presses followed by an approach (E) during tests in which lever pressing was intermittently reinforced (RI-60s) either with food pellets and associated cues (Food and Cues) or with pellet dispenser cues but no actual food delivery (Cues Only). Rats were also tested without any reinforcement (No Food or Cues). (F) The proportion of lever presses that were followed by food-cup approach was higher for reinforced presses than for nonreinforced presses, regardless of whether pressing was reinforced with Food and Cues, or Cues Only. Rats also continued to sporadically check the food cup after nonreinforced lever presses, albeit at a much lower level than after reinforced presses.

https://doi.org/10.7554/eLife.43551.002

Figure 1—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 1. https://doi.org/10.7554/eLife.43551.003

We found that rats were much more likely to approach the food cup after reinforced presses than after nonreinforced presses (t(8) = 19.33, p<0.001), suggesting they could detect when pellets were delivered based on sound and tactile cues produced by the dispenser. This was confirmed in subsequent tests, during which lever pressing produced either 1) pellet dispenser cues and actual pellet delivery (Food and Cues), 2) pellet dispenser cues only (Cues Only), or 3) no pellet dispenser cues or pellet delivery (No Food or Cues). Here too, we found that food-cup approaches were more likely after reinforced than nonreinforced lever presses, regardless of whether pellet dispenser cues were presented alone or together with actual food delivery (Figure 1F; ts(8) ≥ 13.74, ps <0.001; the overall frequency of lever pressing (Figure 1D) and the frequency of complete bouts of presses that were followed by an approach (Figure 1E) are presented for comparison). Although pellet dispenser cues were clearly an effective trigger for rats to shift from the lever to the food cup, they also made these shifts spontaneously, indicating that they had developed the tendency to perform the complete press-approach action sequence. These unprompted approaches occurred after a relatively small subpopulation of nonreinforced lever presses, which is consistent with our previous data (Marshall and Ostlund, 2018).

Inhibiting dopamine neurons during Pavlovian-to-instrumental transfer preferentially disrupts cue-motivated reward seeking, but not reward retrieval

Our previous findings suggest that reward-predictive cues both invigorate reward-seeking behavior (i.e., the PIT effect) and increase the likelihood that such actions will be followed by an attempt to retrieve reward from the food cup (Marshall and Ostlund, 2018). Experiment 2 investigated the contributions of the mesocorticolimbic dopamine system to these distinct behavioral effects of reward-paired cues.

Rats with dopamine neuron-specific expression of the inhibitory DREADD hM4Di or mCherry in the VTA (Figure 2) were trained on a PIT task (Figure 3A) consisting of a Pavlovian conditioning phase, in which two different auditory cues were paired (CS+) or unpaired (CS-) with food pellets, and a separate instrumental training phase, in which rats were trained to lever press for pellets. During PIT testing, we noncontingently presented the CS+ and CS- while rats were free to lever press and check the food cup without response-contingent food or cue delivery.

Figure 2

DREADD expression in Th:Cre+ rats.

(A) Th:Cre+ rats received bilateral injections of AAV-hSyn-DIO-hM4Di-mCherry or AAV-hSyn-DIO-mCherry in the VTA. (B) Representative expression of the mCherry-tagged inhibitory DREADD hM4Di (red) in VTA Th positive neurons (green) of Th:Cre+ rats, as well as in neuronal terminals (C) projecting to the nucleus accumbens (NAc) and medial prefrontal cortex (mPFC). Scale bar is 500 µm.

https://doi.org/10.7554/eLife.43551.004

Figure 3 (with 3 supplements)

Chemogenetic inhibition of dopamine neurons on Pavlovian to instrumental transfer (PIT) performance.

(A) Experimental design: Following viral vector injections and recovery, rats received Pavlovian training, during which they learned to associate an auditory cue (CS+) with food pellet delivery. During instrumental conditioning, rats performed the same lever-press task used in Experiment 1. Lever pressing was extinguished (Ext) before rats were submitted to a PIT test, which included separate noncontingent presentations of the CS+ and an unpaired control cue (CS-). (B) Chemogenetic inhibition of VTA dopamine neurons disrupted cue-motivated reward seeking. Total lever presses during PIT trials for rats expressing the inhibitory DREADD hM4Di or mCherry following vehicle (left) or CNO (5 mg/kg, right) treatment prior to test. Presses during pre-CS (gray) and CS periods (red) are plotted separately. (C) PIT expression is specifically impaired in hM4Di expressing Th:Cre+ rats. PIT scores (total presses: CS+ - pre-CS+) show that the CS+ increased lever pressing after vehicle treatment for both groups, but that CNO suppressed this effect in the hM4Di group but not the mCherry group. **p<0.01. (D) The CS+ increased the proportion of lever presses that were followed by a food-cup approach during PIT testing. Inhibiting VTA dopamine neurons did not disrupt expression of this effect. Instead, rats in both groups showed a modest increase in their likelihood of checking the food cup after lever pressing when treated with CNO. (E) Representative organization of the effects of the CS+ and CS- on attempts to seek out and retrieve reward during PIT. Data show lever presses and food-cup approaches (press-contingent or noncontingent) for two control rats (Th:Cre+ rats expressing mCherry and receiving vehicle).

https://doi.org/10.7554/eLife.43551.005

Figure 3—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 3. https://doi.org/10.7554/eLife.43551.009

We found that rats selectively increased their lever press performance during CS+ presentations, relative to the CS- and pre-CS response rates (Figure 3B; CS Period * CS Type interaction, p<0.001; see Supplementary file 1A for full generalized linear mixed-effects model output). This effect was significantly attenuated by CNO in a group-specific manner (Group * Drug * CS Period * CS Type interaction, p=0.002). Analysis of data from CS+ trials (only) found that CNO selectively suppressed cue-induced lever pressing in hM4Di relative to mCherry rats (Drug * Group * CS Period interaction, p=0.013). Further analysis found that the mCherry group displayed a pronounced increase in lever pressing during CS+ trials (CS Period * CS Type interaction, p<0.001), and this effect was not altered by CNO (Drug * CS Period * CS Type interaction, p=0.780). In contrast, CNO pretreatment significantly disrupted expression of CS+ induced lever pressing in the hM4Di group (Drug * CS Period * CS Type interaction, p<0.001). hM4Di rats showed a CS+ specific elevation in lever pressing when pretreated with vehicle (CS Period * CS Type interaction, p<0.001) but not CNO (CS Period * CS Type interaction, p=0.684). While these findings indicate that CNO selectively disrupted the response-invigorating influence of the CS+ by inhibiting VTA dopamine neurons in hM4Di rats, there was also some indication that CNO may have produced a nonspecific, group-independent, suppression of PIT performance (Drug * CS Period * CS Type interaction, p=0.007). We therefore conducted a more focused analysis of CS+ induced changes in lever-press performance (PIT score: CS+ - pre-CS+; Figure 3C), which confirmed that CNO significantly suppressed this behavioral effect in the hM4Di group (t(17) = −3.83, p<0.001), but not in the mCherry group (t(13) = −1.21, p=0.249). This is in line with recent findings that similar CNO treatment does not significantly alter PIT performance in DREADD-free rats (Collins et al., 2019).
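The PIT score used in this focused analysis (CS+ minus pre-CS+ presses) and its within-subject change across drug conditions can be written out in a few lines. The sketch below is illustrative only: the function names and the press counts are hypothetical, and it does not reproduce the mixed-effects models or the t-tests reported above.

```python
def pit_score(cs_presses, pre_cs_presses):
    """PIT score as defined in the text: presses during the CS+ minus
    presses during the matched pre-CS+ period."""
    return cs_presses - pre_cs_presses


def drug_effect_on_pit(vehicle_counts, cno_counts):
    """Within-subject change in PIT score (CNO minus vehicle).

    Each argument is a (CS+ presses, pre-CS+ presses) pair for one rat.
    A negative value means the cue-evoked increase was smaller under CNO.
    """
    return pit_score(*cno_counts) - pit_score(*vehicle_counts)


# Hypothetical counts for a single rat (not real data).
vehicle_counts = (30, 10)  # (CS+ presses, pre-CS+ presses) after vehicle
cno_counts = (14, 9)       # same rat after CNO

print(pit_score(*vehicle_counts))                      # 20
print(pit_score(*cno_counts))                          # 5
print(drug_effect_on_pit(vehicle_counts, cno_counts))  # -15
```

Subtracting the pre-CS+ baseline separates a cue-specific change in responding from a general shift in response rate, which is why a nonspecific drug effect on baseline pressing does not by itself change the score.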

We also investigated if VTA dopamine neuron inhibition impacts the tendency for the CS+ to increase attempts to retrieve reward after performing the reward-seeking response (Figure 3D and E; see Figure 3—figure supplement 1 for illustration of the probability of food-cup approach surrounding lever presses during nonreinforced PIT trials). We found that the CS+ (p<0.001) but not the CS- (p=0.501) increased the proportion of lever presses that were followed by a food-cup approach, even though no rewards were actually delivered at test (see Supplementary file 1B for full generalized linear mixed-effects model output; see Figure 3—figure supplements 2 and 3 for analysis of total press-contingent and noncontingent approaches, respectively). Importantly, CNO did not alter this response to the CS+ in a group-specific manner (Group * Drug * CS+ Period, p=0.835), indicating that VTA dopamine neuron function is not required for this behavior. However, CNO did induce some nonspecific, group-independent alterations in the proportion of presses that were followed by a food-cup approach, lowering the overall likelihood of this behavior (Drug effect, p=0.019), but enhancing the effect of the CS+ (Drug * CS+ Period, p<0.037).

Pathway-specific inhibition of dopamine projections to NAc, but not mPFC, disrupts cue-motivated reward seeking but not retrieval

As previously reported (Mahler et al., 2019), hM4Di expression in VTA dopamine neurons resulted in transport of DREADDs to axonal terminals in the NAc and mPFC (Figure 2). We took advantage of this to investigate the roles of these two pathways in PIT performance, again distinguishing between the influence of reward-paired cues on reward seeking and reward retrieval. Guide cannulae were aimed at the NAc or mPFC in rats expressing hM4Di in VTA dopamine neurons (Experiment 3A; Figure 4A and Figure 4—figure supplement 1). These rats underwent training and testing for PIT (Figure 4B), as described above, but were pretreated with intra-NAc or mPFC injections of CNO (1 mM) or vehicle to achieve local inhibition of neurotransmitter release (Mahler et al., 2014; Stachniak et al., 2014; Lichtenberg et al., 2017), an approach previously shown to be effective in inhibiting dopamine release (Mahler et al., 2019). Figure 4C shows that, in hM4Di-expressing rats, the CS+ specific increase in lever pressing (CS Period * CS Type interaction, p<0.001) was disrupted by CNO in a manner that depended on microinjection site (Drug * CS Period * CS Type * Site interaction, p=0.003; Supplementary file 1C for full generalized linear mixed-effects model output). After intracranial vehicle injections, rats showed a CS+ specific elevation in pressing (CS Period * CS Type interaction, p<0.001), which did not differ significantly across vehicle injection sites (CS Period * CS Type * Site interaction, p=0.151). Unlike with systemic CNO, the CS+ remained effective in increasing lever pressing after CNO microinjection into the mPFC (CS Type * CS Period interaction, p<0.001) and NAc (CS Type * CS Period interaction, p<0.001). However, this effect was significantly attenuated when CNO was injected into the NAc versus the mPFC (CS Period * CS Type * Site interaction, p=0.012; analysis of CNO data only). 
A more focused analysis of CS+ elicited lever pressing (Figure 4D; PIT score) confirmed that CNO disrupted this effect in the NAc group (t(6) = −2.49, p=0.047), but not in the mPFC group (t(8) = 0.34, p=0.746).

Figure 4 (with 5 supplements)

Pathway specific chemogenetic inhibition of dopamine on PIT performance.

(A) Th:Cre+ rats initially received VTA AAV-hSyn-DIO-hM4Di-mCherry injections and were implanted with guide cannulas aimed at the medial prefrontal cortex (mPFC) or nucleus accumbens (NAc) for microinjection of CNO (1 mM) or vehicle to inhibit dopamine terminals at test. (B) Following surgery, rats underwent training and testing for PIT, as described above. We analyzed the microstructural organization of behavior (Lever presses: seeking, and presses followed by a food-cup approach: retrieval) at test. (C) Pathway specific inhibition of dopamine terminals in the NAc but not the mPFC disrupted cue-motivated reward seeking. Total lever presses during PIT trials for rats expressing the inhibitory DREADD hM4Di and receiving CNO or vehicle microinfusions in either the mPFC or NAc prior to test. Presses during pre-CS (gray) and CS periods (red) are plotted separately. (D) PIT expression was specifically impaired following NAc CNO treatment. PIT scores (total presses: CS+ - pre-CS+) show that the CS+ increased lever pressing following vehicle treatment in both groups, but that CNO suppressed this effect when injected into the NAc but not the mPFC. *p<0.05. (E) The CS+ increased the proportion of lever presses that were followed by a food-cup approach during PIT testing. This effect did not significantly vary as a function of drug treatment or group. (F) Scatter plots show the relationship between individual differences in the effect of the CS+ on lever presses that were not followed by food-cup approach in the vehicle condition (PIT Score for presses without approach) and the suppressive effect of CNO on CS+ evoked lever pressing (PIT Score for CNO test - PIT Score for vehicle test). Data points are from individual rats receiving intra-mPFC (left panel) or intra-NAc (right panel) microinjections.

https://doi.org/10.7554/eLife.43551.010

Figure 4—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 4. https://doi.org/10.7554/eLife.43551.016

The disruptive effect of intra-NAc CNO administration on PIT performance did not systematically vary as a function of injection site (data not presented), which is not surprising given previous findings that this effect is modulated by dopamine signaling in both the core and shell of the NAc (Lex and Hauber, 2008; Peciña and Berridge, 2013). Given such findings, it is possible that complete inhibition of ventral striatal dopamine transmission would abolish expression of the PIT effect, as was found with systemic CNO treatment in Experiment 2. It is also possible that VTA dopamine projections to areas not targeted in the current study (e.g., amygdala) make an important, parallel contribution to this behavior.

We also conducted a separate experiment (Experiment 3B) with rats expressing the mCherry reporter (only) in VTA dopamine neurons to determine if the behavioral effects of CNO microinfusion were hM4Di-dependent. While there was evidence that CNO may have produced some nonspecific response suppression when injected into the mPFC but not the NAc (Drug * Site * CS Period * CS Type, p=0.068), this drug treatment did not significantly disrupt expression of CS+ elicited lever pressing for either injection site (p’s > 0.165; Figure 4—figure supplement 2).

As in the previous experiment, we found that the CS+ (p<0.001) increased the proportion of lever presses that were followed by an attempt to retrieve reward from the food cup (Figure 4E; Supplementary file 1D for full generalized linear mixed-effects model output; see Figure 4—figure supplements 3 and 4 for analysis of total press-contingent and noncontingent approaches, respectively). CNO seemed to generally reduce the likelihood that lever pressing would be followed by food-cup approach, though this effect did not reach statistical significance (Drug effect, p=0.057). If anything, intra-NAc injections of CNO tended to enhance the effect of the CS+ on this approach response, though this effect also failed to reach significance (Drug * Site * CS+ Period, p=0.093).

The above findings indicate that VTA dopamine circuitry supports the motivational influence of the CS+ on reward seeking but does not mediate that cue's ability to promote reward retrieval. We wondered if this might account for variability in the partial, response-suppressive effect of NAc dopamine terminal inhibition. Specifically, we hypothesized that rats inclined to respond to the CS+ by engaging in discrete bouts of lever pressing, without attempting to retrieve reward, would be particularly sensitive to inhibition of NAc dopamine inputs. Consistent with this, we found that for the NAc group, individual differences in the effect of the CS+ on lever presses without subsequent food cup approach (during the vehicle test) were correlated with the degree to which CNO suppressed CS+ evoked lever pressing (PIT Score for all presses), relative to vehicle (CNO – Vehicle; r = −0.81, p=0.027; Figure 4F). No such relationship was found for the mPFC group (r = −0.19, p=0.618), which did not show sensitivity to dopamine terminal inhibition. Similar analysis of data from Experiment 2 also found no correlation between these measures (Figure 4—figure supplement 5), which may not be surprising given that systemic inhibition of VTA dopamine neurons led to a more robust and consistent suppression of CS+ evoked lever pressing (Figure 3B).
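The individual-differences analysis described above relates two per-rat quantities: the PIT score for presses without approach in the vehicle test, and the change in the overall PIT score under CNO (CNO minus vehicle). A minimal sketch of a Pearson correlation between two such vectors is given below. It assumes a plain Pearson r, uses hypothetical function and variable names, and runs on made-up numbers; it does not reproduce the reported values.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Made-up per-rat values (illustrative only, not real data).
# x: PIT score for presses without approach, vehicle test
# y: change in overall PIT score under CNO (CNO minus vehicle)
pit_without_approach_vehicle = [2.0, 4.0, 6.0, 8.0]
cno_minus_vehicle = [-1.0, -3.0, -4.0, -8.0]

r = pearson_r(pit_without_approach_vehicle, cno_minus_vehicle)
print(round(r, 2))
```

A negative coefficient in this arrangement means that rats with a larger cue-evoked increase in press-only responding showed a larger drop in the PIT score under CNO.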

Altogether, these findings demonstrate that the mesolimbic dopamine system selectively mediates cue-motivated reward seeking, and suggest that dopamine inputs to the NAc are particularly important for individuals that tend to respond to such cues with discrete bouts of reward seeking without subsequent reward retrieval.

Inhibiting dopamine neurons spares the sensitivity of reward-seeking actions to reward devaluation

It is unclear from the above findings if rats' tendency to approach the food cup after lever pressing reflects a discrete goal-directed action or if this response tends to be performed habitually, as part of a fixed press-approach action chunk. We conducted a reward devaluation experiment to probe this issue and investigate the role of VTA dopamine neurons in goal-directed action selection. Rats expressing mCherry or hM4Di in VTA dopamine neurons were trained on two distinct instrumental action-outcome contingencies, after which they underwent reward devaluation testing after pretreatment with CNO (5 mg/kg) or vehicle (Figure 5A). Rats performed significantly fewer presses on the devalued lever than on the valued lever (Figure 5B; Lever effect, p<0.001; Supplementary file 1E for full generalized linear mixed-effects model output). CNO treatment did not significantly alter the effect of reward devaluation on lever pressing in either hM4Di or mCherry rats (Drug * Lever, p=0.146; Group * Drug * Lever interaction, p=0.591), indicating that VTA dopamine neuron function is not required for this aspect of goal-directed action selection. Inhibiting VTA dopamine neurons also failed to disrupt sensitivity to devaluation during reinforced testing (see Figure 5—figure supplement 1).

Figure 5 (with 1 supplement)

Chemogenetic inhibition of dopamine neurons on reward devaluation performance.

(A) Th:Cre+ rats received VTA injections of AAV-hSyn-DIO-hM4Di-mCherry or AAV-hSyn-DIO-mCherry. Following recovery, rats were trained on two distinct lever-press actions for two different rewards (Instrumental Learning). Rats then underwent reward-specific devaluation testing following treatment with CNO (5 mg/kg) or vehicle. (B) Chemogenetic VTA dopamine inhibition did not alter the impact of reward devaluation on reward seeking. Total lever presses on the valued (red bars) and devalued (gray) levers in hM4Di or mCherry expressing Th:Cre+ rats, following CNO (5 mg/kg) or vehicle treatments. (C) Proportion of valued (blue) and devalued (gray) lever-press actions that were followed by a food-cup approach. Rats were more likely to attempt to retrieve reward after performing the devalued lever-press action. This effect was not altered by VTA dopamine neuron inhibition. (D) Lever presses performed without a subsequent food-cup approach response (red) were more sensitive to reward devaluation than presses that were followed by an approach (blue).

https://doi.org/10.7554/eLife.43551.017

Figure 5—source data 1. This spreadsheet contains the behavioral responses for individual rats in Figure 5. https://doi.org/10.7554/eLife.43551.019

VTA dopamine neuron inhibition did not significantly alter the overall likelihood of press-contingent approach behavior or its sensitivity to reward devaluation (Figure 5C; ps ≥ 0.109; see Supplementary file 1F for full generalized linear mixed-effects model output). Interestingly, we found that the proportion of presses that were followed by a food-cup approach was actually greater for the devalued lever than for the valued lever (Lever effect, p=0.040). This effect was driven by the fact that lever presses that were not followed by approach were more strongly suppressed by reward devaluation than presses that were directly followed by an approach (Press Type * Lever interaction, p<0.001; see Figure 5D and Supplementary file 1G for full generalized linear mixed-effects model output).
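The arithmetic behind this result can be made concrete with a small worked example. The counts below are hypothetical and chosen only to show the logic: if devaluation reduces both press types but reduces press-only responses more strongly, the proportion of presses followed by approach rises even though fewer press-approach sequences are performed. The function name is an assumption for illustration.

```python
def proportion_followed(press_only, press_approach):
    """Proportion of all lever presses that were followed by a food-cup approach."""
    total = press_only + press_approach
    return press_approach / total if total else float("nan")


# Hypothetical counts (not real data): (press-only, press-approach)
valued = (40, 20)
devalued = (10, 15)  # press-only drops by 75%, press-approach by 25%

p_valued = proportion_followed(*valued)
p_devalued = proportion_followed(*devalued)

print(sum(devalued) < sum(valued))  # True: total pressing is lower on the devalued lever
print(round(p_valued, 2))           # 0.33
print(round(p_devalued, 2))         # 0.6
```

In this toy case the absolute number of press-approach sequences still falls (20 to 15), so a higher proportion for the devalued lever does not imply that devaluation increased retrieval attempts in absolute terms.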

It was not possible to analyze the impact of reward devaluation on noncontingent approach responses performed at test because this behavior was associated with both the valued and devalued reward. However, other findings from our lab (data not shown) from studies involving a single reward type indicate that noncontingent approaches are readily suppressed by reward devaluation, in contrast to response-contingent approaches. This is in line with previous reports that food-cup approach behavior is generally sensitive to reward devaluation (Balleine, 1992; Thrailkill and Bouton, 2017), particularly if it is elicited by Pavlovian reward-predictive cues (Holland and Straub, 1979; Lichtenberg et al., 2017).

Discussion

We investigated the role of mesocorticolimbic dopamine circuitry in regulating reward-seeking (lever pressing) and reward-retrieval responses (press-contingent food-cup approach). Consistent with a recent study (Marshall and Ostlund, 2018), we found that noncontingent CS+ presentations increased reward seeking, generally, but also increased the likelihood that rats would attempt to retrieve reward after performing such actions. These behaviors were differentially mediated by the mesolimbic dopamine system. Specifically, chemogenetic inhibition of VTA dopamine neurons or their inputs to NAc, but not mPFC, disrupted the excitatory influence of the CS+ on reward seeking, but spared that cue’s ability to increase attempts to retrieve reward. These behaviors were also differentially sensitive to reward devaluation, which suppressed reward seeking but actually increased the likelihood that rats would attempt to retrieve reward. VTA dopamine neuron inhibition did not impact the influence of reward devaluation on either component of behavior.

We found that attempts to retrieve reward by transitioning from the lever to the food cup were executed in a habitual manner, without consideration of reward value, consistent with action chunking (Dezfouli et al., 2014; Smith and Graybiel, 2016). However, task performance was not limited to these press-approach action chunks. When rats pressed the lever but were not reinforced (with food or cues), they would occasionally check the food cup but often omitted this response. This sporadic pattern of reward retrieval is adaptive given that strict press-approach action sequencing is unnecessary under such conditions, when rewards are sparse and uncertain. Instead, rats seemed to vacillate between two different strategies when initiating the lever-press response, performing it as part of a complete action chunk (press-approach) or as a discrete action (press only). These distinct patterns of reward seeking appeared to be differentially sensitive to reward devaluation. While rats were generally less likely to lever press for the devalued reward than for the valued reward, press-approach action chunks tended to be less sensitive to reward devaluation than presses that were not followed by approach. Because of this differential sensitivity to reward devaluation, the proportion of all lever presses followed by an attempt to retrieve reward was actually greater for the devalued action than for the valued action. Such findings support the connection between action chunking and habitual behavior (Graybiel, 2008; Dezfouli et al., 2014; Smith and Graybiel, 2016), and suggest that moment-to-moment control over self-paced, reward-seeking behavior may shift back and forth between habit and goal-directed systems.

PIT testing revealed that the CS+ generally increased lever pressing, but disproportionately increased the performance of press-approach action chunks, at least relative to their otherwise low frequency of occurring in the absence of the CS+. This finding further bolsters the connection between action chunking and habitual control given previous reports that habitual reward-seeking actions are particularly sensitive to the motivational effects of reward-paired cues (Holland, 2004; Wiltgen et al., 2012). However, while press-approach action chunks were elevated during the CS+, they still accounted for only a minority (between 30% and 50%) of lever presses that were performed during these trials. Most lever presses evoked by the CS+ were not followed by a food-cup approach, and it was this component of the PIT effect that was selectively disrupted by chemogenetic inhibition of VTA dopamine neurons or their inputs to NAc. The ability of the CS+ to promote press-approach chunks was, in contrast, completely spared by these manipulations. Consistent with this, we found that the response-suppressive effect of NAc dopamine terminal inhibition varied across rats based on the way they normally responded to the CS+. Rats that responded to that cue with a large increase in discrete lever presses (i.e., without subsequent food-cup approach) showed the greatest suppression. We suggest that this may reflect differences across rats in their sensitivity to the dopamine-dependent motivational effects of reward-paired cues.

Previous studies have found that dopamine receptor antagonists either selectively suppress lever pressing without affecting concomitant food-cup approach (Nelson and Killcross, 2013), or suppress both types of behavior to a similar extent (Wassum et al., 2011; Ostlund et al., 2012). Even this latter finding is consistent with dopamine contributing more to reward seeking than reward retrieval, since a reduction in reward seeking creates fewer opportunities to retrieve reward. Interpreting these findings is problematic, however, because such studies typically have not applied microstructural analyses, like those used here, to distinguish between press-contingent and noncontingent food-cup approaches. One exception is a study by Nicola (2010) showing that blocking dopamine receptors in the NAc attenuates cue-triggered lever pressing without impacting the latency of subsequent food-cup approach behavior. Building on such findings, the current study used the PIT paradigm to show that the mesolimbic dopamine system specifically mediates the motivational influence of reward-paired cues on reward seeking but not their dissociable ability to increase the likelihood that such actions will be followed by an attempt to retrieve reward.

Our previous studies monitoring mesolimbic dopamine release during PIT performance are also interesting to consider together with the current findings. For instance, we found that CS+ evoked phasic dopamine release in the NAc correlates with that cue’s effect on lever pressing (Wassum et al., 2013; Ostlund et al., 2014) but not food-cup approaches (Aitken et al., 2016). We also found that individual CS+ evoked lever presses are temporally correlated with transient bouts of phasic dopamine release (Ostlund et al., 2014). The current findings suggest that this relationship between NAc dopamine release and cue-motivated reward seeking may be stronger for discrete presses that are performed without a subsequent food-cup approach than for complete press-approach chunks. This question remains to be investigated, and would help resolve whether the mesolimbic dopamine system is involved in modulating reward seeking, generally, or whether its activity becomes uncoupled from the execution of action chunks, which may become differentially associated with nigrostriatal dopamine system activity (Jin and Costa, 2010).

While dopamine is known to play a crucial role in forming new action chunks (Graybiel, 1998; Jin and Costa, 2015), its role in the expression of previously learned action chunks is less clear. Our findings indicate that VTA dopamine circuitry does not play a necessary role in the execution of press-approach action chunks, regardless of whether they are self-initiated or are prompted by a reward-paired cue. This is generally compatible with previous findings. For instance, dopamine receptor blockade suppresses action sequence performance early but not late in training (Levesque et al., 2007; Wassum et al., 2012). Moreover, the phasic NAc dopamine release that normally precedes action sequence performance tends to become attenuated as rats acquire efficient task performance, presumably through action chunking (Cacciapaglia et al., 2012; Wassum et al., 2012; Klanker et al., 2015; Collins et al., 2016). That said, the mesolimbic dopamine system continues to contribute to action sequence tasks that require considerable effort, such as the execution of a long series of lever presses (Fischbach-Weiss et al., 2018).

Inhibiting VTA dopamine neurons did not impact rats’ sensitivity to reward devaluation, which is consistent with other findings in the literature (Dickinson et al., 2000; Lex and Hauber, 2010a; Lex and Hauber, 2010b; Wassum et al., 2011). Such findings are interesting given that regions innervated by this dopamine system, including the NAc and mPFC, are known to make important contributions to goal-directed decision making (Bradfield and Balleine, 2017; Sharpe et al., 2019). Of course, dopamine likely contributes to goal-directed decision making in more demanding tasks that require greater cognitive resources (Floresco, 2013; Cools, 2015; Westbrook and Braver, 2016).

It is also notable that inhibiting mPFC dopamine terminals had no detectable effects on expression of PIT, since food-paired cues are known to elicit dopamine release (Bassareo and Di Chiara, 1997; Feenstra et al., 1999) and neural activity (Homayoun and Moghaddam, 2009) in the mPFC. It is possible that the dissociable effects of NAc versus mPFC dopamine terminal inhibition reported here may relate to inherent differences between the mesolimbic and mesocortical dopamine systems, which include regional differences in release kinetics and in the density of dopamine terminals or receptors (Lammel et al., 2008; Weele et al., 2019; Mahler et al., 2019). However, previous lesion studies suggest that the mPFC may not be an essential component of the circuitry that mediates PIT performance (Cardinal et al., 2003; Corbit and Balleine, 2003), which is more in line with the current results.

Our findings may also have implications for understanding the role of dopamine in pathologies of behavioral control such as obsessive-compulsive disorder (OCD). In the signal attenuation model of OCD (Joel and Avisar, 2001), rats learn that response-contingent cues no longer signal that an instrumental reward-seeking action will produce reward. In this case, the logical organization of reward-seeking and -retrieval actions disintegrates, such that rats exhibit persistent reward seeking, typically without attempting to collect reward from the food cup. It was previously reported that blocking D1-dopamine receptors disrupts expression of these incomplete bouts of compulsive-like reward seeking, without affecting the production of complete bouts of reward seeking and retrieval, which continue to be performed on some test trials (Joel and Doljansky, 2003). Considered in this light, our findings suggest that the mesolimbic dopamine system may mediate the tendency for reward-paired cues to promote this potentially compulsive component of cue-motivated reward seeking. This link deserves further investigation and may help advance the understanding and treatment of compulsive disorders like OCD and addiction (Joel et al., 2008; Robinson et al., 2014).

Materials and methods

Animals

In total, 89 male and female Long-Evans Tyrosine hydroxylase (Th):Cre+ rats (hemizygous Cre+) (Witten et al., 2011; Mahler et al., 2019) and wildtype (WT) littermates were used for this study. Subjects were at least 3 months of age at the start of the experiment and were single- or pair-housed in standard Plexiglas cages on a 12 hr/12 hr light/dark cycle. Animals were maintained at ~85% of their free-feeding weight during behavioral procedures. All experimental procedures that involved rats were approved by the UC Irvine Institutional Animal Care and Use Committee and were in accordance with the National Research Council Guide for the Care and Use of Laboratory Animals.

Apparatus

Behavioral procedures took place in sound- and light-attenuated Med Associates chambers (St Albans, VT, USA; ENV-007). Individual chambers were equipped with two retractable levers (Med Associates; ENV-112CM) positioned to the left and right of a recessed food cup. Grain-based dustless precision pellets (45 mg, BioServ, Frenchtown, NJ, USA) were delivered into the cup using a pellet dispenser (Med Associates; ENV-203M-45). Sucrose solution (20% wt/vol) was delivered into the cup with a syringe pump (Med Associates; PHM-100). A photobeam detector (Med Associates; ENV-254-CB) positioned across the magazine entrance was used to record food-cup approaches. Chambers were illuminated by a houselight during all sessions.

Surgery

Th:Cre+ rats were anesthetized using isoflurane and placed in a stereotaxic frame for microinjection of Cre-dependent (DIO) serotype 2 adeno-associated virus (AAV) vectors to induce dopamine neuron-specific expression of the inhibitory designer receptor exclusively activated by designer drug (DREADD) hM4Di fused to mCherry (AAV-hSyn-DIO-hM4Di-mCherry), or mCherry alone (AAV-hSyn-DIO-mCherry) (University of North Carolina Chapel Hill Vector Core, Chapel Hill, NC, USA/Addgene, Cambridge, MA, USA; Experiment 2 was replicated with both sources) (Armbruster et al., 2007; Mahler et al., 2019). The AAV was injected bilaterally into the VTA (−5.5 mm AP, ±0.8 mm ML, −8.15 mm DV; 1 µL/side). Experiment 3 rats were bilaterally implanted with guide cannulae (22 gauge, Plastics One) 1 mm dorsal to NAc (+1.3 AP, ±1.8 ML, −6.2 DV) or mPFC (+3.00 AP, ±0.5 ML, −3.0 DV) for subsequent clozapine-N-oxide (CNO) microinjections. Animals were randomly assigned to virus (hM4Di or mCherry) and cannula location (NAc or mPFC) groups. Animals were allowed at least 5 days of recovery before undergoing food restriction and behavioral training. Testing occurred at least 25 days after surgery to allow adequate time for viral expression of hM4Di throughout dopamine neurons, including in terminals within the NAc and mPFC.

Experiment 1: Effects of response-contingent feedback about reward delivery on reward retrieval

Instrumental learning

WT rats (n = 9) underwent 2 d of magazine training. In each session, 40 pellets were delivered into the food cup on a random 90 s intertrial interval (ITI). Rats then received 9 d of instrumental lever-press training. In each session, rats had continuous access to the right lever, which could be pressed to deliver food pellets into the food cup. The schedule of reinforcement was adjusted over days from continuous reinforcement (CRF) to increasing random intervals (RI), such that reinforcement only became available once a randomly determined interval had elapsed since the last reinforcer delivery. Rats received one day each of CRF, RI-15s, and RI-30s training, before undergoing 6 days of training with RI-60s. Each session was terminated after 30 min or after 20 reward deliveries.
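The random-interval contingency described above can be illustrated with a short simulation. This is a minimal sketch, not the software used in the study: the function name, the exponential distribution of intervals, and the treatment of a zero mean interval as CRF are all assumptions made for illustration.

```python
import random


def random_interval_rewards(press_times, mean_interval_s, rng, max_rewards=20):
    """Return the press times that earn a reward under a random-interval schedule.

    Assumption: waiting times are exponentially distributed with the given
    mean (the source does not state the distribution). A mean of 0 is treated
    as continuous reinforcement (every press is rewarded).
    """
    def next_wait():
        if mean_interval_s > 0:
            return rng.expovariate(1.0 / mean_interval_s)
        return 0.0

    rewarded = []
    # Time at which the next reward becomes available.
    armed_at = next_wait()
    for t in sorted(press_times):
        if len(rewarded) >= max_rewards:
            break  # session ends once the reward cap is reached
        if t >= armed_at:
            rewarded.append(t)
            # The next interval is timed from this reinforcer delivery.
            armed_at = t + next_wait()
    return rewarded
```

For example, `random_interval_rewards([2.0 * i for i in range(900)], 60.0, random.Random(0))` simulates a hypothetical rat pressing every 2 s for 30 min on RI-60s. Because reward availability depends on elapsed time rather than press count, most of those presses go unreinforced.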

Varying response-contingent feedback

Following training, rats were given a series of tests to assess the influence of response-contingent feedback about reward delivery on instrumental reward-seeking (lever presses) and reward-retrieval responses (press-contingent food-cup approach). Rats were given three tests (30 min each, pseudorandom order over days) during which lever pressing caused: 1) activation of the pellet dispenser to deliver a pellet into the food cup (RI-60s schedule; Food and Cues Test), 2) activation of the pellet dispenser to deliver a pellet into an external cup not accessible to the rats, producing associated sound and tactile cues but no reward (also RI-60s schedule; Cues Only Test), or 3) no dispenser activation (i.e., extinction; No Food or Cues Test).

Experiments 2 and 3: Role of mesocorticolimbic dopamine in cue-motivated reward seeking and retrieval

Pavlovian conditioning

Th:Cre+ rats (n = 60) underwent 2 d of magazine training, as in Experiment 1 (40 pellets on 90 s random ITI). Rats then received eight daily Pavlovian conditioning sessions. Each session consisted of a series of 6 presentations of a two-min audio cue (CS+; either a pulsating 2 kHz pure tone (0.1 s on and 0.1 s off) or white noise; 80 dB), with trials separated by a 5 min variable ITI (range 4–6 min between CS onsets). During each CS+ trial, pellets were delivered on a 30 s random time schedule, resulting in an average of 4 pellets per trial. Rats were separately habituated to an unpaired auditory stimulus (CS-; alternative audio stimulus; 2 min duration). CS- exposure procedures differed slightly across experiments. For Experiment 2, which assessed the effects of system-wide dopamine neuron inhibition, rats received a final Pavlovian conditioning session consisting of four trials with the CS+ (reinforced, as described above) followed by four trials with the CS- (nonreinforced), separated by a 5 min variable ITI. In Experiment 3, which assessed the effects of local inhibition of dopamine terminals in NAc or mPFC, rats were given 2 days of CS- only exposure (eight nonreinforced trials per session, 5 min variable ITI) following initial CS+ training. Conditioning was measured by comparing the rate of food-cup approach between the CS onset and the first pellet delivery (to exclude unconditioned behavior) to the rate of approach during the pre-CS period.
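The conditioning measure described above compares two approach rates per trial. The sketch below shows one way this could be computed from timestamps. The function names and the 120 s pre-CS window are assumptions for illustration, since the pre-CS duration used for this particular measure is not stated here.

```python
def approach_rate(approach_times, start_s, end_s):
    """Food-cup approaches per minute within the window [start_s, end_s)."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("window must have positive duration")
    count = sum(1 for t in approach_times if start_s <= t < end_s)
    return 60.0 * count / duration


def conditioning_rates(approach_times, cs_onset_s, first_pellet_s, pre_cs_s=120.0):
    """Return (pre-CS rate, CS rate) for one trial.

    The CS window stops at the first pellet delivery so that approaches
    driven by the food itself are excluded. pre_cs_s is a hypothetical
    default, not a value taken from the source.
    """
    pre = approach_rate(approach_times, cs_onset_s - pre_cs_s, cs_onset_s)
    cs = approach_rate(approach_times, cs_onset_s, first_pellet_s)
    return pre, cs
```

A higher CS rate than pre-CS rate on a trial would indicate conditioned approach to the cue.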

Instrumental training

Following Pavlovian conditioning, rats were given 9 d of instrumental training, as in Experiment 1, with one day each of CRF, RI-15s, RI-30s, and 6 days of RI-60s. Sessions ended after 30 min or 20 rewards were earned.

Pavlovian-to-instrumental transfer (PIT) test

After the last instrumental training session, rats were given a session of Pavlovian (CS+) training, identical to initial training. They were then given a 30 min extinction session, during which lever presses were recorded but had no consequence (i.e., no food or cues). On the next day, rats were given a PIT test, during which the lever was continuously available but produced no rewards. Following 8 min of extinction, the CS+ and CS- were each presented four times (2 min per trial) in pseudorandom order and separated by a 3 min fixed ITI. Before each new round of testing, rats were given two sessions of instrumental retraining (RI-60s), one session of CS+ retraining, and one 30 min extinction session, as described above. Test procedures differed slightly between Experiments 2 and 3.

Experiment 2

Th:Cre+ rats expressing hM4Di (n = 18) or mCherry only (n = 14) in VTA dopamine neurons were used to assess the effects of system-wide inhibition of the mesocorticolimbic dopamine system on PIT performance. These groups were run together and received CNO (5 mg/kg, i.p.) or vehicle (5% DMSO in saline) injections 30 min prior to testing. They underwent a second test following retraining (described above), prior to which the alternative drug pretreatment was administered.

Experiment 3

In Experiment 3A, Th:Cre+ rats expressing hM4Di in VTA dopamine neurons were used to assess the impact of locally inhibiting dopaminergic terminals in the NAc (n = 7) or mPFC (n = 9) on PIT performance. Because microinjection procedures produced additional variability in task performance, rats in this experiment underwent a total of 4 tests. Rats received either CNO microinfusions (1 mM, 0.5 µL/side or 0.3 µL/side, for NAc and mPFC respectively) or vehicle (DMSO 5% in aCSF) 5 min before the start of each test and were given two rounds of testing each with CNO and vehicle (test order counterbalanced across other experimental conditions). To determine if the effects of CNO microinjections depended on hM4Di expression, a separate control study (Experiment 3B) was run using Th:Cre+ rats expressing mCherry only in VTA dopamine neurons. Experiments 3A and 3B were run and analyzed separately.

Experiment 4: Role of mesocorticolimbic dopamine in goal-directed action selection

Instrumental Training

Th:Cre+ rats expressing hM4Di (n = 11) or mCherry only (n = 9) in VTA dopamine neurons began with 2 d of magazine training, during which they received 20 grain pellets and 20 liquid sucrose rewards (0.1 mL of 20% sucrose solution, wt/vol) in random order according to a common 30 s random ITI. This was followed by 11 d of instrumental training with two distinct action–outcome contingencies (e.g., left-lever press → grain; right-lever press → sucrose). The reinforcement schedule was gradually shifted over days from CRF (2 d) to increasingly effortful random ratio (RR) schedules, with 3 d each of RR-5, RR-10, and RR-20 reinforcement. The left and right lever-press responses were trained in separate sessions, at least 2 hr apart, on each day. Action–outcome contingencies were counterbalanced across subjects. Sessions were terminated after 30 min elapsed or 20 pellets were earned.
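A random-ratio schedule differs from a random-interval schedule in that reward depends on the number of presses rather than on elapsed time. The following minimal sketch assumes that each press is rewarded independently with probability 1/ratio, which is a common way to implement RR schedules but is not confirmed by the text. The function name and parameters are hypothetical.

```python
import random


def random_ratio_rewards(n_presses, ratio, rng, max_rewards=20):
    """Return the 1-based indices of presses that earn reward under RR-`ratio`.

    Assumption: each press pays off with probability 1/ratio, so ratio=1
    reproduces CRF and ratio=20 corresponds to RR-20.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    rewarded = []
    for i in range(1, n_presses + 1):
        if rng.random() < 1.0 / ratio:
            rewarded.append(i)
            if len(rewarded) >= max_rewards:
                break  # session ends once the reward cap is reached
    return rewarded
```

Under this assumption, the expected number of presses per reward equals the ratio, which is why moving from RR-5 to RR-20 makes each reward more effortful to earn.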

Devaluation Testing

To selectively devalue one of the food rewards prior to testing, rats were satiated on grain pellets or sucrose solution by providing them with 90 min of unrestricted access to that food in the home cage. After 60 min of feeding, rats received CNO (5 mg/kg, i.p.) or vehicle injections. After an additional 30 min of feeding, rats were placed in the chamber for a test in which they had continuous access to both levers. The test began with a 5 min nonreinforced phase (no food or cues), which was immediately followed by a 15 min reinforced phase, during which each action was reinforced with its respective reward (CRF for the first five rewards, then RR-20 for the remainder of the session). Rats were given a total of 4 devaluation tests, two after CNO and two after vehicle, alternating the identity of the devalued reward across the two tests in each drug condition (test order counterbalanced across training and drug conditions).

Histology

Rats were deeply anesthetized with a lethal dose of pentobarbital and perfused with 1x PBS followed by 4% paraformaldehyde. Brains were postfixed in 4% paraformaldehyde, cryoprotected in 20% sucrose and sliced at 40 μm on a cryostat. To visualize hM4Di expression, we performed immunohistochemistry for Th and the mCherry tag. Tissue was first incubated in 3% normal donkey serum in PBS plus Triton X-100 (PBST; 2 hr) and then in primary antibodies in PBST at 4°C for 48 hr using rabbit anti-DsRed (mCherry tag; 1:500; Clontech; 632496) and mouse anti-Th (1:1,000; Immunostar; 22941) antibodies. Sections were then incubated for 4 hr at room temperature in fluorescent conjugated secondary antibodies (Alexa Fluor 488 goat anti-mouse (Th; 1:500; Invitrogen; A10667) and Alexa Fluor 594 goat anti-rabbit (DsRed; 1:500; Invitrogen; A11037)).

Drugs

CNO was obtained from NIMH (Experiments 2 and 4) or Sigma-Aldrich (St. Louis, MO, USA; Experiment 3), and dissolved in 5% DMSO in saline, or aCSF for microinjection.

Behavioral measures

Reward-seeking actions were quantified as the total number (frequency) of lever presses performed per unit time. Based on microstructural analyses described below, lever presses that were followed by a food-cup approach (≤2.5 s) were distinguished from presses that were not followed by an approach. The proportion of presses that were followed by an approach response served as our primary measure of press-contingent reward retrieval. We also analyzed bouts of noncontingent food-cup approach (occurring >2.5 s after the most recent press or approach), which served as a measure of spontaneous or cue-evoked reward retrieval.
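The classification rule above can be sketched in a few lines. This is an illustrative reconstruction from the stated 2.5 s criterion, not the authors' analysis code. The function names are hypothetical, and the handling of edge cases (for example, a second press occurring before the approach) is an assumption.

```python
from bisect import bisect_right

WINDOW_S = 2.5  # criterion from the text: approach within 2.5 s of a press


def classify_presses(press_times, approach_times, window_s=WINDOW_S):
    """Return one bool per press: True if an approach follows within window_s."""
    approaches = sorted(approach_times)
    followed = []
    for p in press_times:
        i = bisect_right(approaches, p)  # first approach strictly after the press
        followed.append(i < len(approaches) and approaches[i] - p <= window_s)
    return followed


def proportion_followed(press_times, approach_times, window_s=WINDOW_S):
    """Proportion of presses followed by an approach (NaN if no presses)."""
    flags = classify_presses(press_times, approach_times, window_s)
    return sum(flags) / len(flags) if flags else float("nan")


def noncontingent_approaches(press_times, approach_times, window_s=WINDOW_S):
    """Approaches occurring more than window_s after the latest press or approach."""
    events = sorted([(t, "press") for t in press_times] +
                    [(t, "approach") for t in approach_times])
    result = []
    last_event = None
    for t, kind in events:
        if kind == "approach" and (last_event is None or t - last_event > window_s):
            result.append(t)
        last_event = t
    return result
```

With hypothetical presses at 10, 20 and 30 s and approaches at 11, 24, 31, 32 and 50 s, two of the three presses count as press-approach sequences, and the approaches at 24 and 50 s count as noncontingent.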

Statistical analysis

Data were analyzed using general(ized) linear mixed-effects models (Pinheiro and Bates, 2000), which allow for simultaneous parameter estimation as a function of condition (fixed effects) and the individual rat (random effects) (Pinheiro and Bates, 2000; Bolker et al., 2009; Boisgontier and Cheval, 2016). Analyses on count data (e.g., response frequency) incorporated a Poisson response distribution and a log link function (Coxe et al., 2009). Fixed-effects structures included an overall intercept and the full factorial of all primary manipulations (Experiment 2: Group, Drug, CS Type, CS Period; Experiment 3: Site, Drug, CS Type, CS Period; Experiment 4: Group, Drug, Lever), and the random-effects structures included by-subjects uncorrelated intercepts adjusted for the within-subjects manipulations (i.e., Experiments 2 and 3: Drug, CS Type, and CS Period; Experiment 4: Drug, Lever). ‘CS Type’ refers to the distinction between the CS+ and CS-, while ‘CS Period’ refers to the distinction between the 120 s CS duration and the 120 s period preceding its onset. Proportion data were square-root transformed prior to analysis to correct positive skew, but are plotted in non-transformed space for ease of interpretation. These data were collapsed across pre-CS+ and pre-CS- periods, such that the factor ‘CS Period’ had three levels (CS+, CS-, and Pre-CS). The fixed- and random-effects structures of this analysis were identical to those of the frequency analysis above, with the exception that CS Type was not included in the analysis and the random-effects structure only included by-subjects intercepts.
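In symbols, the two model families described above can be written roughly as follows. The notation is introduced here purely for illustration and does not appear in the original analysis: y is a response count for rat i in condition j, p is a proportion, x and z are fixed- and random-effects design vectors, and u denotes by-subject random effects with a diagonal (uncorrelated) covariance.

```latex
% Count data: Poisson mixed-effects model with a log link
y_{ij} \mid \mathbf{u}_i \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\mathbf{u}_i, \qquad
\mathbf{u}_i \sim \mathcal{N}\!\left(\mathbf{0},\ \mathrm{diag}(\sigma_1^2, \dots, \sigma_q^2)\right)

% Proportion data: linear mixed-effects model on square-root transformed values
\sqrt{p_{ij}} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + \varepsilon_{ij}, \qquad
u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under the log link, each fixed-effect coefficient acts multiplicatively on the expected count, so a coefficient b corresponds to a rate ratio of exp(b).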

All statistical analyses were conducted using the Statistics and Machine Learning Toolbox in MATLAB (The MathWorks; Natick, MA, USA). The alpha level for all tests was .05. As all predictors were categorical in the mixed-effects analysis, effect size was represented by the unstandardized regression coefficient (Baguley, 2009), reported as b in model output tables. Mixed-effects models provide t-values to reflect the statistical significance of the coefficient relative to the population mean (i.e., simple effects). These simple effects are indicative of main effects and interactions when a factor has only two levels. For factors with at least three levels, F-tests were conducted to reveal the overall significance of the effect or interaction(s) involving this factor. The source of significant interactions was determined by secondary mixed-effects models identical to those described above but split by the relevant factor of interest. For analyses in which a significant main effect had more than two levels, post-hoc tests of main effects employed MATLAB’s coefTest function, and interactions were reported in-text as the results of ANOVA F-tests (i.e., whether the coefficients for each fixed effect were significantly different from 0).

When analyzing data from PIT experiments, the ability of the CS+ to selectively increase performance of a response (relative to the CS-) over baseline (pre-CS) levels was indicated by a significant CS Type * CS Period interaction. We were particularly interested in treatment-induced alterations in the expression of this effect, as indicated by significant 3-way and 4-way interactions involving this CS Type * CS Period term, in combination with Drug and/or Group factors. We were also interested in potential main effects of Drug and/or Group factors, reflecting broad, cue-independent behavioral effects. While statistical output tables include a summary of all fixed effects included in the model, only these theoretically interesting findings are discussed in the main text. Lower level interactions involving only CS Type or CS Period, but not their combination, are provided in the output tables but are not discussed in the main text given that they may be the product of incidental or spurious behavioral differences across cue conditions.

PIT Scores (CS+ – pre-CS+) were calculated for more focused analysis of CS+ elicited lever pressing. One-sample t-tests were used to assess the effect of CNO for each group. Because inhibiting VTA dopamine neurons or their NAc terminals predominantly disrupted the ability of the CS+ to elicit lever presses that were not followed by an approach response, we also assessed if differences across rats in their tendency to exhibit such behavior in the Vehicle Test (PIT score; presses without approach) correlated with differences in their sensitivity to the response-suppressive effect of CNO on CS+ elicited lever pressing (CNO – Vehicle; PIT score, all presses).
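The PIT score and the follow-up tests can be sketched as below. This is an assumption-laden illustration rather than the authors' MATLAB code: the helper names are hypothetical, the one-sample t-statistic is assumed to be computed on within-rat CNO − Vehicle differences against zero, and the correlation is assumed to be Pearson's r.

```python
import math
from statistics import mean, stdev


def pit_score(cs_plus_presses, pre_cs_plus_presses):
    """PIT score: presses during the CS+ minus presses in the pre-CS+ period."""
    return cs_plus_presses - pre_cs_plus_presses


def one_sample_t(values, mu=0.0):
    """t-statistic for a one-sample test of the mean against mu (df = n - 1)."""
    n = len(values)
    return (mean(values) - mu) / (stdev(values) / math.sqrt(n))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

In use, one would compute each rat's Vehicle-test PIT score for presses without approach, compute its CNO − Vehicle difference in PIT score for all presses, and pass the two per-rat lists to `pearson_r`.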

Data availability

All data generated and analyzed during this study are included in supporting files. Source data files have been provided for Figures 1, 3, 4 and 5, as well as their respective figure supplements.

References

(2016) Nucleus accumbens core dopamine signaling tracks the need-based motivational value of food-paired cues
Journal of Neurochemistry 136:1026–1036.

https://doi.org/10.1111/jnc.13494
1. Armbruster BN
2. Li X
3. Pausch MH
4. Herlitze S
5. Roth BL
(2007) Evolving the lock to fit the key to create a family of G protein-coupled receptors potently activated by an inert ligand
Proceedings of the National Academy of Sciences 104:5163–5168.

https://doi.org/10.1073/pnas.0700293104
1. Baguley T
(2009) Standardized or simple effect size: what should be reported?
British Journal of Psychology 100:603–617.

https://doi.org/10.1348/000712608X377117
1. Balleine B
(1992) Instrumental performance following a shift in primary motivation depends on incentive learning
Journal of Experimental Psychology: Animal Behavior Processes 18:236–250.

https://doi.org/10.1037/0097-7403.18.3.236
1. Balleine BW
2. Dickinson A
(1998) Goal-directed instrumental action: contingency and incentive learning and their cortical substrates
Neuropharmacology 37:407–419.

https://doi.org/10.1016/S0028-3908(98)00033-1
1. Bassareo V
2. Di Chiara G
(1997) Differential influence of associative and nonassociative learning mechanisms on the responsiveness of prefrontal and accumbal dopamine transmission to food stimuli in rats fed ad libitum
The Journal of Neuroscience 17:851–861.

https://doi.org/10.1523/JNEUROSCI.17-02-00851.1997
1. Boisgontier MP
2. Cheval B
(2016) The anova to mixed model transition
Neuroscience & Biobehavioral Reviews 68:1004–1005.

https://doi.org/10.1016/j.neubiorev.2016.05.034
1. Bolker BM
2. Brooks ME
3. Clark CJ
4. Geange SW
5. Poulsen JR
6. Stevens MH
7. White JS
(2009) Generalized linear mixed models: a practical guide for ecology and evolution
Trends in Ecology & Evolution 24:127–135.

https://doi.org/10.1016/j.tree.2008.10.008
1. Bradfield L
2. Balleine B
(2017)
The learning and motivational processes controlling Goal-Directed action and their neural bases

Decision Neuroscience pp. 71–80.
(2012) Differential dopamine release dynamics in the nucleus accumbens core and shell track distinct aspects of goal-directed behavior for sucrose
Neuropharmacology 62:2050–2056.

https://doi.org/10.1016/j.neuropharm.2011.12.027
(2003) Role of the anterior cingulate cortex in the control over behavior by pavlovian conditioned stimuli in rats
Behavioral Neuroscience 117:566–587.

https://doi.org/10.1037/0735-7044.117.3.566
1. Collins AL
2. Greenfield VY
3. Bye JK
4. Linker KE
5. Wang AS
6. Wassum KM
(2016) Dynamic mesolimbic dopamine signaling during action sequence learning and expectation violation
Scientific Reports 6:20231.

https://doi.org/10.1038/srep20231
(2019) Nucleus accumbens cholinergic interneurons oppose Cue-Motivated behavior
Biological Psychiatry.

https://doi.org/10.1016/j.biopsych.2019.02.014
1. Cools R
(2015) The cost of dopamine for dynamic cognitive control
Current Opinion in Behavioral Sciences 4:152–159.

https://doi.org/10.1016/j.cobeha.2015.05.007
1. Corbit LH
2. Balleine BW
(2003) The role of prelimbic cortex in instrumental conditioning
Behavioural Brain Research 146:145–157.

https://doi.org/10.1016/j.bbr.2003.09.023
1. Corbit LH
2. Balleine BW
(2016) Learning and motivational processes contributing to Pavlovian-Instrumental transfer and their neural bases: dopamine and beyond
Current Topics in Behavioral Neurosciences 27:259–289.

https://doi.org/10.1007/7854_2015_388
1. Coxe S
2. West SG
3. Aiken LS
(2009) The analysis of count data: a gentle introduction to poisson regression and its alternatives
Journal of Personality Assessment 91:121–136.

https://doi.org/10.1080/00223890802634175
(2014) Habits as action sequences: hierarchical action control and changes in outcome value
Philosophical Transactions of the Royal Society B: Biological Sciences 369:20130482.

https://doi.org/10.1098/rstb.2013.0482
1. Dickinson A
(1985) Actions and habits: the development of behavioural autonomy
Philosophical Transactions of the Royal Society B: Biological Sciences 308:67–78.

https://doi.org/10.1098/rstb.1985.0010
1. Dickinson A
2. Balleine B
3. Watt A
4. Gonzales F
5. Boakes R
(1995)
Overtraining and the motivational control of instrumental action

Animal Learning & Behavior 22:197–206.
(2000) Dissociation of pavlovian and instrumental incentive learning under dopamine antagonists
Behavioral Neuroscience 114:468–483.

https://doi.org/10.1037/0735-7044.114.3.468
1. Estes WK
(1948) Discriminative conditioning; effects of a pavlovian conditioned stimulus upon a subsequently established operant response
Journal of Experimental Psychology 38:173–177.

https://doi.org/10.1037/h0057525
(1999) Dopamine and noradrenaline release in the prefrontal cortex of rats during classical aversive and appetitive conditioning to a contextual stimulus: interference by novelty effects
Neuroscience Letters 272:179–182.

https://doi.org/10.1016/S0304-3940(99)00601-1
(2018) Inhibiting mesolimbic dopamine neurons reduces the initiation and maintenance of instrumental responding
Neuroscience 372:306–315.

https://doi.org/10.1016/j.neuroscience.2017.12.003
1. Floresco SB
(2013) Prefrontal dopamine and behavioral flexibility: shifting from an "inverted-U" toward a family of functions
Frontiers in Neuroscience 7:62.

https://doi.org/10.3389/fnins.2013.00062
1. Frederick MJ
2. Cocuzzo SE
(2017) Contrafreeloading in rats is adaptive and flexible: support for an animal model of compulsive checking
Evolutionary Psychology 15:147470491773593.

https://doi.org/10.1177/1474704917735937
1. Graybiel AM
(1998) The basal ganglia and chunking of action repertoires
Neurobiology of Learning and Memory 70:119–136.

https://doi.org/10.1006/nlme.1998.3843
1. Graybiel AM
(2008) Habits, rituals, and the evaluative brain
Annual Review of Neuroscience 31:359–387.

https://doi.org/10.1146/annurev.neuro.29.051605.112851
1. Holland PC
(2004) Relations between Pavlovian-instrumental transfer and reinforcer devaluation
Journal of Experimental Psychology: Animal Behavior Processes 30:104–117.

https://doi.org/10.1037/0097-7403.30.2.104
1. Holland PC
2. Straub JJ
(1979) Differential effects of two ways of devaluing the unconditioned stimulus after Pavlovian appetitive conditioning
Journal of Experimental Psychology: Animal Behavior Processes 5:65–78.

https://doi.org/10.1037/0097-7403.5.1.65
1. Homayoun H
2. Moghaddam B
(2009) Differential representation of Pavlovian-instrumental transfer by prefrontal cortex subregions and striatum
European Journal of Neuroscience 29:1461–1476.

https://doi.org/10.1111/j.1460-9568.2009.06679.x
1. Jin X
2. Costa RM
(2010) Start/stop signals emerge in nigrostriatal circuits during sequence learning
Nature 466:457–462.

https://doi.org/10.1038/nature09263
1. Jin X
2. Costa RM
(2015) Shaping action sequences in basal ganglia circuits
Current Opinion in Neurobiology 33:188–196.

https://doi.org/10.1016/j.conb.2015.06.011
Book
(2008)
Animal Models of Obsessive–Compulsive Disorder: From Bench to Bedside via Endophenotypes and Biomarkers

In: McArthur RA, Borsini F, editors. Animal and Translational Models for CNS Drug Discovery. San Diego: Academic Press. pp. 133–164.
1. Joel D
2. Avisar A
(2001) Excessive lever pressing following post-training signal attenuation in rats: a possible animal model of obsessive compulsive disorder?
Behavioural Brain Research 123:77–87.

https://doi.org/10.1016/S0166-4328(01)00201-7
1. Joel D
2. Doljansky J
(2003) Selective alleviation of compulsive lever-pressing in rats by D1, but not D2, blockade: possible implications for the involvement of D1 receptors in obsessive-compulsive disorder
Neuropsychopharmacology 28:77–85.

https://doi.org/10.1038/sj.npp.1300010
1. Klanker M
2. Sandberg T
3. Joosten R
4. Willuhn I
5. Feenstra M
6. Denys D
(2015) Phasic dopamine release induced by positive feedback predicts individual differences in reversal learning
Neurobiology of Learning and Memory 125:135–145.

https://doi.org/10.1016/j.nlm.2015.08.011
1. Korff S
2. Harvey BH
(2006) Animal models of obsessive-compulsive disorder: rationale to understanding psychobiology and pharmacology
Psychiatric Clinics of North America 29:371–390.

https://doi.org/10.1016/j.psc.2006.02.007
1. Lammel S
2. Hetzel A
3. Häckel O
4. Jones I
5. Liss B
6. Roeper J
(2008) Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system
Neuron 57:760–773.

https://doi.org/10.1016/j.neuron.2008.01.022
Book
1. Lashley KS
(1951)
The Problem of Serial Order in Behavior

Bobbs-Merrill.
(2007) Raclopride-induced motor consolidation impairment in primates: role of the dopamine type-2 receptor in movement chunking into integrated sequences
Experimental Brain Research 182:499–508.

https://doi.org/10.1007/s00221-007-1010-4
1. Lex A
2. Hauber W
(2008) Dopamine D1 and D2 receptors in the nucleus accumbens core and shell mediate Pavlovian-instrumental transfer
Learning & Memory 15:483–491.

https://doi.org/10.1101/lm.978708
1. Lex B
2. Hauber W
(2010a) The role of dopamine in the prelimbic cortex and the dorsomedial striatum in instrumental conditioning
Cerebral Cortex 20:873–883.

https://doi.org/10.1093/cercor/bhp151
1. Lex B
2. Hauber W
(2010b) The role of nucleus accumbens dopamine in outcome encoding in instrumental and Pavlovian conditioning
Neurobiology of Learning and Memory 93:283–290.

https://doi.org/10.1016/j.nlm.2009.11.002
(2017) Basolateral amygdala to orbitofrontal cortex projections enable cue-triggered reward expectations
The Journal of Neuroscience 37:8374–8384.

https://doi.org/10.1523/JNEUROSCI.0486-17.2017
(2014) Designer receptors show role for ventral pallidum input to ventral tegmental area in cocaine seeking
Nature Neuroscience 17:577–585.

https://doi.org/10.1038/nn.3664
1. Mahler SV
2. Brodnik ZD
3. Cox BM
4. Buchta WC
5. Bentzley BS
6. Quintanilla J
7. Cope ZA
8. Lin EC
9. Riedy MD
10. Scofield MD
11. Messinger J
12. Ruiz CM
13. Riegel AC
14. España RA
15. Aston-Jones G
(2019) Chemogenetic manipulations of ventral tegmental area dopamine neurons reveal multifaceted roles in cocaine abuse
The Journal of Neuroscience 39:503–518.

https://doi.org/10.1523/JNEUROSCI.0537-18.2018
1. Marshall AT
2. Ostlund SB
(2018) Repeated cocaine exposure dysregulates cognitive control over cue-evoked reward-seeking behavior during Pavlovian-to-instrumental transfer
Learning & Memory 25:399–409.

https://doi.org/10.1101/lm.047621.118
(2017) A corticostriatal deficit promotes temporal distortion of automatic action in ageing
eLife 6:e29908.

https://doi.org/10.7554/eLife.29908
1. Nelson AJ
2. Killcross S
(2013) Accelerated habit formation following amphetamine exposure is reversed by D1, but enhanced by D2, receptor antagonists
Frontiers in Neuroscience 7:76.

https://doi.org/10.3389/fnins.2013.00076
1. Nicola SM
(2010) The flexible approach hypothesis: unification of effort and cue-responding hypotheses for the role of nucleus accumbens dopamine in the activation of reward-seeking behavior
Journal of Neuroscience 30:16585–16600.

https://doi.org/10.1523/JNEUROSCI.3958-10.2010
1. Niv Y
2. Daw ND
3. Joel D
4. Dayan P
(2007) Tonic dopamine: opportunity costs and the control of response vigor
Psychopharmacology 191:507–520.

https://doi.org/10.1007/s00213-006-0502-4
(2012) Relative response cost determines the sensitivity of instrumental reward seeking to dopamine receptor blockade
Neuropsychopharmacology 37:2653–2660.

https://doi.org/10.1038/npp.2012.129
(2014) Phasic mesolimbic dopamine signaling encodes the facilitation of incentive motivation produced by repeated cocaine exposure
Neuropsychopharmacology 39:2441–2449.

https://doi.org/10.1038/npp.2014.96
1. Ostlund SB
2. Maidment NT
(2012) Dopamine receptor blockade attenuates the general incentive motivational effects of noncontingently delivered rewards and reward-paired cues without affecting their ability to bias action selection
Neuropsychopharmacology 37:508–519.

https://doi.org/10.1038/npp.2011.217
1. Peciña S
2. Berridge KC
(2013) Dopamine or opioid stimulation of nucleus accumbens similarly amplify cue-triggered 'wanting' for reward: entire core and medial shell mapped as substrates for PIT enhancement
European Journal of Neuroscience 37:1529–1540.

https://doi.org/10.1111/ejn.12174
Book
1. Pinheiro J
2. Bates D
(2000)
Mixed-Effects Models in S and S-Plus

New York: Springer.
1. Rescorla RA
(1964) Relation of bar-presses to magazine approaches
Psychological Reports 14:943–948.

https://doi.org/10.2466/pr0.1964.14.3.943
(2014) On the motivational properties of reward cues: individual differences
Neuropharmacology 76:450–459.

https://doi.org/10.1016/j.neuropharm.2013.05.040
(2019) An integrated model of action selection: distinct modes of cortical control of striatal decision making
Annual Review of Psychology 70:53–76.

https://doi.org/10.1146/annurev-psych-010418-102824
1. Smith KS
2. Graybiel AM
(2016)
Habit formation

Dialogues in Clinical Neuroscience 18:33.
(2014) Chemogenetic synaptic silencing of neural circuits localizes a hypothalamus→midbrain pathway for feeding behavior
Neuron 82:797–808.

https://doi.org/10.1016/j.neuron.2014.04.008
Book
1. Stephens DW
2. Krebs JR
(1986)
Foraging Theory

Princeton University Press.
1. Thrailkill EA
2. Bouton ME
(2017) Effects of outcome devaluation on instrumental behaviors in a discriminated heterogeneous chain
Journal of Experimental Psychology: Animal Learning and Cognition 43:88–95.

https://doi.org/10.1037/xan0000119
1. Tiffany ST
(1990) A cognitive model of drug urges and drug-use behavior: role of automatic and nonautomatic processes
Psychological Review 97:147–168.

https://doi.org/10.1037/0033-295X.97.2.147
1. Volkow ND
2. Wang GJ
3. Tomasi D
4. Baler RD
(2013) Unbalanced neuronal circuits in addiction
Current Opinion in Neurobiology 23:639–648.

https://doi.org/10.1016/j.conb.2013.01.002
(2011) Differential dependence of Pavlovian incentive motivation and instrumental incentive learning processes on dopamine signaling
Learning & Memory 18:475–483.

https://doi.org/10.1101/lm.2229311
(2012) Phasic mesolimbic dopamine signaling precedes and predicts performance of a self-initiated action sequence task
Biological Psychiatry 71:846–854.

https://doi.org/10.1016/j.biopsych.2011.12.019
(2013) Phasic mesolimbic dopamine release tracks reward seeking during expression of Pavlovian-to-instrumental transfer
Biological Psychiatry 73:747–755.

https://doi.org/10.1016/j.biopsych.2012.12.005
(2019) Dopamine tunes prefrontal outputs to orchestrate aversive processing
Brain Research 1713:16–31.

https://doi.org/10.1016/j.brainres.2018.11.044
1. Westbrook A
2. Braver TS
(2016) Dopamine does double duty in motivating cognitive effort
Neuron 89:695–710.

https://doi.org/10.1016/j.neuron.2015.12.029
(2012) The effect of ratio and interval training on Pavlovian-instrumental transfer in mice
PLOS ONE 7:e48227.

https://doi.org/10.1371/journal.pone.0048227
1. Witten IB
2. Steinberg EE
3. Lee SY
4. Davidson TJ
5. Zalocusky KA
6. Brodsky M
7. Yizhar O
8. Cho SL
9. Gong S
10. Ramakrishnan C
11. Stuber GD
12. Tye KM
13. Janak PH
14. Deisseroth K
(2011) Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement
Neuron 72:721–733.

https://doi.org/10.1016/j.neuron.2011.10.028
1. Wyvell CL
2. Berridge KC
(2000) Intra-accumbens amphetamine increases the conditioned incentive salience of sucrose reward: enhancement of reward "wanting" without enhanced "liking" or response reinforcement
The Journal of Neuroscience 20:8122–8130.

https://doi.org/10.1523/JNEUROSCI.20-21-08122.2000

Article and author information

Author details

Briac Halbout
1. Department of Anesthesiology and Perioperative Care, University of California, Irvine, Irvine, United States
2. Irvine Center for Addiction Neuroscience, University of California, Irvine, Irvine, United States
Contribution
Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing

For correspondence
halboutb@uci.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0001-6128-2601
Andrew T Marshall
1. Department of Anesthesiology and Perioperative Care, University of California, Irvine, Irvine, United States
2. Irvine Center for Addiction Neuroscience, University of California, Irvine, Irvine, United States
Contribution
Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-0068-8138
Ali Azimi
1. Department of Anesthesiology and Perioperative Care, University of California, Irvine, Irvine, United States
2. Irvine Center for Addiction Neuroscience, University of California, Irvine, Irvine, United States
Contribution
Data curation, Investigation

Competing interests
No competing interests declared
Mimi Liljeholm

Department of Cognitive Sciences, University of California, Irvine, Irvine, United States

Contribution
Conceptualization, Methodology, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0001-9066-6989
Stephen V Mahler
1. Irvine Center for Addiction Neuroscience, University of California, Irvine, Irvine, United States
2. Department of Neurobiology and Behavior, University of California, Irvine, Irvine, United States
Contribution
Conceptualization, Validation, Investigation, Methodology, Writing—review and editing

Competing interests
No competing interests declared

ORCID iD: 0000-0002-8698-0905
Kate M Wassum
1. Department of Psychology, University of California, Los Angeles, Los Angeles, United States
2. Brain Research Institute, University of California, Los Angeles, Los Angeles, United States
Contribution
Conceptualization, Supervision, Funding acquisition, Writing—review and editing

Competing interests
Reviewing editor, eLife
Sean B Ostlund
1. Department of Anesthesiology and Perioperative Care, University of California, Irvine, Irvine, United States
2. Irvine Center for Addiction Neuroscience, University of California, Irvine, Irvine, United States
Contribution
Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing

For correspondence
sostlund@uci.edu

Competing interests
No competing interests declared

ORCID iD: 0000-0003-1635-3911

Funding

National Institute on Drug Abuse

Stephen V Mahler

National Institute of Mental Health (106972)

Kate M Wassum
Sean B Ostlund

National Institute of Diabetes and Digestive and Kidney Diseases (098709)

Sean B Ostlund

National Institute on Drug Abuse (029035)

Sean B Ostlund

National Institute on Aging (045380)

Sean B Ostlund

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The authors acknowledge the assistance of Christy N Munson in the acquisition of behavioral data.

Ethics

Animal experimentation: All experimental procedures that involved rats were approved by the UC Irvine Institutional Animal Care and Use Committee (protocol AUP-17-68) and were in accordance with the National Research Council Guide for the Care and Use of Laboratory Animals.

Copyright

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.