Preconditioned cues have no value

  1. Melissa J Sharpe (corresponding author)
  2. Hannah M Batchelor
  3. Geoffrey Schoenbaum (corresponding author)
  1. NIDA Intramural Research Program, United States
  2. Princeton University, United States
  3. University of New South Wales, Australia
  4. University of Maryland School of Medicine, United States
  5. The Johns Hopkins University, United States
Research Advance
Cite this article as: eLife 2017;6:e28362 doi: 10.7554/eLife.28362

Abstract

Sensory preconditioning has been used to implicate midbrain dopamine in model-based learning, contradicting the view that dopamine transients reflect model-free value. However, it has been suggested that model-free value might accrue directly to the preconditioned cue through mediated learning. Here, building on previous work (Sadacca et al., 2016), we address this question by testing whether a preconditioned cue will support conditioned reinforcement in rats. We found that while both directly conditioned and second-order conditioned cues supported robust conditioned reinforcement, a preconditioned cue did not. These data show that the preconditioned cue in our procedure does not directly accrue model-free value and further suggest that the cue may not necessarily access value even indirectly in a model-based manner. If so, then phasic response of dopamine neurons to cues in this setting cannot be described as signaling errors in predicting value.

https://doi.org/10.7554/eLife.28362.001

Introduction

Behaviour is often divided into two broad categories. One, termed goal-directed or model-based, utilizes an associative map of the task at hand, which can be navigated to anticipate likely outcomes and their desirability. Maps acquired separately can be linked and the value of outcomes updated on-the-fly to allow flexible responding. The other, contrasting category of behaviour, termed model-free or habitual, reflects simpler associations linking cues to the responses that have been reinforced in their presence. Behaviours in both categories are typically described as reflecting value; however, in the former category, the value is inferred and reflects value stored downstream, whereas in the latter, the value is directly attached to, or ‘cached’ in, the antecedent cue.

Our lab has recently used sensory preconditioning to identify neural systems critical for model-based behaviour (Jones et al., 2012; Wied et al., 2013; Sadacca et al., 2016; Sharpe et al., 2017). These data include the demonstration that midbrain dopamine neurons exhibit error-like activity to preconditioned cues. Our use of this task is based on the belief that the design is particularly effective in isolating model-based behaviour from behaviour reflecting model-free value. In sensory preconditioning, two neutral cues are paired together in close succession such that a relationship can form between them (e.g. A→B). While there are no observable changes to behaviour during this phase, the existence of this association can be revealed if cue B is paired with reward, which causes subjects to start responding to A as if they expect reward to be delivered. Indeed, responding to cue A is sensitive to the current desire for the food reward at the time of the probe test (Blundell et al., 2003). From data such as these, it is thought that subjects respond to the preconditioned cue either because A evokes a representation of B and B leads to thoughts of reward during the test phase, or because B evokes a representation of A during conditioning that allows A to become directly associated with reward (Jones et al., 2012; Wimmer and Shohamy, 2012; Gershman, 2017). Thus, sensory preconditioning seems to be an iconic example of a model-based behaviour.

However, while it is clear that sensory preconditioning utilizes model-based associations, this procedure may also permit the preconditioned cue to directly accrue value. Specifically, if presentation of cue B were to evoke a representation of cue A during conditioning, then the value of the food might become directly associated with A (Wimmer and Shohamy, 2012; Doll and Daw, 2016). Importantly, this question is not resolved by the effect of food devaluation on responding to the preconditioned cue, since the cue could maintain any such model-free value subsequently, independent of the new value of the food as the association between cue A and the devalued food has not been directly experienced. If this were occurring in our procedure, it would introduce difficulties in its use to strictly isolate model-based neural processing. For example, the ability of a preconditioned cue to evoke phasic activity in a dopamine neuron could be easily explained by existing proposals that dopaminergic transients reflect errors in predicting model-free value (Schultz et al., 1997).

Here we directly addressed this question by assessing the ability of a preconditioned cue trained in our task to support conditioned reinforcement. For comparison, we also assessed conditioned reinforcement supported by cues trained to predict reward directly or through second-order conditioning. Conditioned reinforcement, or the ability of a cue to support acquisition of an instrumental response in the absence of any reward, is generally conceptualised as a test of cue value. Notably, subjects will work for a cue predicting food even if the food reward has been devalued (Parkinson et al., 2005), indicating that model-free value is normally sufficient to support conditioned reinforcement. Accordingly, we found that both directly conditioned and second-order cues would support conditioned reinforcement. However, a preconditioned cue would not. These data show that, at least for our procedure in rats, the preconditioned cue does not acquire model-free value during training. Further, they suggest that the cue also does not automatically or by default access value cached in events downstream in a model-based manner, such as through the other cue or the sensory properties of the reward.

Results

Preconditioned cues do not support conditioned reinforcement

Preconditioning

Rats were first presented with the neutral cues (A→B; C→D) in close succession 12 times each to promote the development of a relationship between them. As expected, since training did not involve presentation of reward, the rats spent little time in the magazine during this phase, and there were no differences between cues (Figure 1A). ANOVA revealed no main effect of cue (F(3,63)=2.12, p>0.05).

Figure 1. Preconditioned cues do not support conditioned reinforcement.

Rates of responding are represented as percent time spent in the magazine during cue presentation (panels A, B, and D) or number of lever presses (panel C) (±SEM). Graphs show preconditioning (A), conditioning (B), conditioned reinforcement (C), and Pavlovian probe tests (D).

https://doi.org/10.7554/eLife.28362.002

Conditioning

Following preconditioning, rats underwent conditioning for 4 days. Each day, rats received 12 presentations of cue B followed by the delivery of two sucrose pellets (B→2US) and 12 presentations of cue D without reward (D→ no US). As training progressed, all rats acquired a conditioned response to cue B, as indexed by greater time spent in the magazine during presentation of this cue (Figure 1B). A two-factor ANOVA (cue × day) showed main effects of cue (F(1,21)=87.47, p<0.05) and day (F(3,63)=4.45, p<0.05) and an interaction between these factors (F(3,63)=21.42, p<0.05).

Conditioned reinforcement tests

Following Pavlovian training, we next gave rats two conditioned reinforcement sessions. In the first test, pressing one lever led to a 2 s presentation of cue A (R1→A), and pressing the other lever led to a 2 s presentation of cue C (R2→C). Here, we found that rats made a small number of lever presses on each lever and did not show any difference in the number of lever presses made for presentation of either cue (Figure 1C; left).

To ensure that we could obtain conditioned reinforcement in this cohort of rats, we gave rats another conditioned reinforcement test. In this test, one lever press led to a 2 s presentation of cue B (R1→ B) and the other lever press led to a 2 s presentation of cue D (R2→D). In contrast to the first conditioned reinforcement test, during this session rats showed a higher rate of lever pressing on the lever which produced the reward-paired cue B and a low level of lever presses for non-rewarded cue D (Figure 1C; right).

The difference in the pattern of results seen across the first and second sessions of the conditioned reinforcement tests was confirmed with statistical analyses. A two-factor ANOVA [cue type (preconditioned vs. conditioned) × reinforcement (rewarded or non-rewarded)] showed no effects of cue type (AC vs. BD; F(1,21)=0.82, p>0.05) or reinforcement (AB vs. CD; F(1,21)=1.44, p>0.05); however, there was a significant interaction between these factors (F(1,21)=10.92, p<0.05). Simple main effects analyses showed that this interaction was due to a significant elevation in lever pressing for B that was not observed for the other cues (vs. A: F(1,21)=7.64, p<0.05; vs. D: F(1,21)=7.38, p<0.05; C vs. D: F(1,21)=3.08, p>0.05; A vs. C: F < 1). Thus, preconditioned cues did not support conditioned reinforcement in the same rats that readily showed conditioned reinforcement for the cue directly paired with reward.
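The logic of the cue type × reinforcement interaction can be sketched as a per-rat contrast. The following is a minimal illustration, not the authors' SPSS analysis: the function names and the lever-press counts are hypothetical, and it assumes a fully within-subject 2 × 2 layout in which each one-degree-of-freedom effect is tested against its own error term, so that F(1, n−1) equals the square of a one-sample t on the contrast scores.

```python
import math
from statistics import mean, stdev


def one_sample_t(scores):
    """Return (t, df) for testing whether the mean of `scores` differs from zero."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t, n - 1


def interaction_contrast(a, b, c, d):
    """Per-rat contrast (B - D) - (A - C): is the rewarded-vs-control
    difference larger for the conditioned pair than the preconditioned pair?"""
    return [(bi - di) - (ai - ci) for ai, bi, ci, di in zip(a, b, c, d)]


# Made-up lever-press counts for four hypothetical rats (NOT data from the study).
presses_a = [5, 6, 4, 7]
presses_c = [6, 5, 5, 6]
presses_b = [15, 12, 18, 11]
presses_d = [7, 6, 8, 5]

t_int, df_int = one_sample_t(
    interaction_contrast(presses_a, presses_b, presses_c, presses_d)
)
f_int = t_int ** 2  # F(1, df) for the same one-degree-of-freedom contrast
print(f"interaction: t({df_int}) = {t_int:.2f}, F(1,{df_int}) = {f_int:.2f}")
```

The same helper applied to the per-rat differences A − C or B − D gives the corresponding simple comparison. A full analysis would also need a p-value from the t or F distribution, which is omitted here.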

Pavlovian probe tests

It is plausible that we failed to see effective conditioned reinforcement with the preconditioned cue A because rats failed to learn the relationship between A and B. In this case, they would be failing to press the lever because they were failing to generate the normal expectation, after conditioning, that A might lead to reward. In order to test this hypothesis, we next gave rats two Pavlovian probe tests to assess learning. In the first session, we gave rats unrewarded presentations of A and C; in the second session, we gave rats unrewarded presentations of B and D. We found that rats made more entries into the food port during presentation of either cue A or B, demonstrating effective conditioning and sensory preconditioning (Figure 1D). A two-factor ANOVA [cue type (preconditioned vs. conditioned) × reinforcement (rewarded or non-rewarded)] revealed a main effect of reinforcement (AB vs. CD; F(1,21)=15.11, p<0.05). There was also a main effect of cue type (AC vs. BD; F(1,21)=9.39, p<0.05), likely reflecting that the A vs. C extinction tests were given prior to the B vs. D tests since the A vs. C test is the critical comparison. Importantly, however, there was no interaction with cue type (F < 1). Thus, rats spent a greater amount of time in the food port during presentation of cues A and B relative to cues C and D, and there was no difference in the magnitude of this difference. In order to fully rule out any possibility that the lack of conditioned reinforcement observed to the preconditioned cue A was due to a failure of sensory preconditioning, we also separately tested the difference between A and C. This analysis revealed a significant difference between responding to A and C (F(1,21)=5.35, p<0.05).

Second-order conditioned cues do support conditioned reinforcement

Our first experiment showed that a preconditioned cue is insufficient for conditioned reinforcement, whereas a cue directly paired with a valuable reward was sufficient. To confirm that this effect was not simply the result of the introduction of an additional cue between the preconditioned cue and the reward, we conducted a second experiment in which we tested the ability of a second-order conditioned cue to support conditioned reinforcement. Importantly, the second-order cue is trained exactly like the preconditioned cue except that the pairing of the neutral cues (A→B; C→D) occurs after rather than before training with reward (B→2US; D→ no US).

Conditioning

Conditioning lasted for 4 days. Each day, the rats received 12 presentations of cue B followed by delivery of two sucrose pellets and 12 unrewarded presentations of cue D. As training progressed, all rats acquired a conditioned response to cue B (Figure 2A). A two-factor ANOVA (cue ×day) revealed a main effect of cue (F(1,14)=37.13, p<0.05), a main effect of day (F(1,14)=6.32, p<0.05), and a significant interaction between these factors (F(1,14)=8.47, p<0.05).

Figure 2. Second-order conditioned cues do support conditioned reinforcement.

Rates of responding are represented as percent time spent in the magazine during cue presentation (panels A, B, and D) or number of lever presses (panel C) (±SEM). Graphs show conditioning (A), second-order conditioning (B), conditioned reinforcement (C), and Pavlovian probe tests (D).

https://doi.org/10.7554/eLife.28362.003

Second-order conditioning

Following conditioning, rats were presented with the neutral cues (A→B; C→D) in close succession 12 times each to promote the development of a relationship between them. Rats spent more time in the magazine during cues A and B relative to cues C and D (Figure 2B). This was confirmed with statistical analyses. A two-factor ANOVA [cue type (second-order conditioned vs. conditioned) × reinforcement (rewarded or non-rewarded)] revealed a main effect of reinforcement (AB vs. CD; F(1,14)=17.13, p<0.05), but no interaction (F(1,14)=2.19, p>0.05) nor main effect of cue type (AC vs. BD; F(1,14)=4.04, p>0.05). Thus, rats spent a greater amount of time in the food port during presentation of cues A and B relative to cues C and D, and there was no difference in the magnitude of this difference.

Conditioned reinforcement tests

Following second-order conditioning, we again gave rats two conditioned reinforcement tests. In the first, rats could press either lever for a 2 s presentation of cue A or C (R1→A; R2→C). In the second, rats could press these levers for a 2 s presentation of either cue B or D (R1→B; R2→D). In both tests, we found that rats would press the lever more for the cue paired either directly or indirectly with reward (i.e. A and B relative to C and D; Figure 2C). A two-factor ANOVA [cue type (second-order conditioned vs. conditioned) × reinforcement (rewarded or non-rewarded)] showed a significant main effect of reinforcement (AB vs. CD; F(1,14)=5.07, p<0.05), but no main effect nor any interaction with cue type (AC vs. BD; F < 1). Thus, A and B both supported conditioned reinforcement and did so to a similar degree.

Pavlovian probe tests

Following the conditioned reinforcement tests, we gave rats two probe tests to assess the ability of cues A and B to promote entry into the food port. In the first, we gave rats unrewarded presentations of cues A and C. In the second, we gave rats unrewarded presentations of cues B and D. Rats spent a larger proportion of time in the magazine during presentation of cues A and B relative to cues C and D, confirming the second-order conditioning effect. A two-factor ANOVA [cue type (second-order conditioned vs. conditioned) × reinforcement (rewarded or non-rewarded)] revealed a main effect of reinforcement (AB vs. CD; F(1,14)=14.07, p<0.05) but no main effect nor any interaction with cue type (AC vs. BD; F < 1). Thus, rats spent a greater amount of time in the food port during presentation of cues A and B relative to cues C and D, and there was no difference in the magnitude of this difference.

Discussion

Here we have shown that preconditioned cues do not support conditioned reinforcement. Rats showed no evidence of increased lever pressing for the cue trained to predict a cue that was later paired with reward. This was true despite strong responding at the food cup for the preconditioned cue in a subsequent probe test and robust conditioned reinforcement for the cue paired directly with food in the same rats. Further, in a second experiment, we also showed that a second-order cue supports conditioned reinforcement. Critically, our second-order conditioning procedures were identical to those used for sensory preconditioning, except for the order of training in second-order conditioning, which allowed the initial cue in the series to be paired with something of value at the time of conditioning.
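The point about training order can be illustrated with a deliberately simplified temporal-difference sketch of cached (model-free) value. This is not a model fitted or proposed by the authors. The learning rate, the reward magnitude of 1, and the function names are assumptions for illustration only, and the sketch includes no mediated learning; the trial counts (12 serial pairings, 48 conditioning trials) follow the Methods.

```python
def td_update(values, cue, reward, next_value, alpha=0.1):
    """One TD(0)-style update of a cached cue value (alpha is an assumed learning rate)."""
    values[cue] += alpha * (reward + next_value - values[cue])


def serial_pairing(values, first, second, n_trials):
    """first -> second with no reward: `first` is updated towards the cached
    value of `second`; `second` is followed by nothing."""
    for _ in range(n_trials):
        td_update(values, first, 0.0, values[second])
        td_update(values, second, 0.0, 0.0)


def conditioning(values, cue, n_trials, reward=1.0):
    """cue -> reward pairings (reward magnitude of 1.0 is an arbitrary assumption)."""
    for _ in range(n_trials):
        td_update(values, cue, reward, 0.0)


# Sensory preconditioning: A->B pairings first, then B->reward.
spc = {"A": 0.0, "B": 0.0}
serial_pairing(spc, "A", "B", 12)
conditioning(spc, "B", 48)

# Second-order conditioning: B->reward first, then A->B pairings.
soc = {"A": 0.0, "B": 0.0}
conditioning(soc, "B", 48)
serial_pairing(soc, "A", "B", 12)

print("preconditioned A:", spc["A"])
print("second-order A:", soc["A"])
```

Under these assumptions, A ends preconditioning with no cached value because B carries none when the two are paired, whereas in second-order conditioning A is paired with a B that already carries value. The sketch also extinguishes B during the unrewarded A→B pairings, a simplification that need not match the behavioural data.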

In interpreting these data, it is important to emphasize that conditioned reinforcement is normally insensitive to devaluation of the food reward (Parkinson et al., 2005; Burke et al., 2007; Burke et al., 2008). In other words, if the food reward is devalued by pairing it with illness prior to conditioned reinforcement training, a cue that was previously paired with that reward will still support acquisition of lever pressing. Thus, value cached in the cue is normally sufficient to support the behaviour. Given this, our failure to detect any evidence of conditioned reinforcement for a preconditioned cue is strong evidence that a preconditioned cue does not accrue model-free value in this task.

This result has important implications for recent work using this task to investigate the neural circuits involved in model-based learning and behaviour (Sadacca et al., 2016; Sharpe et al., 2017). For example, we have recently shown that dopamine neurons exhibit phasic responses to both directly- and pre-conditioned cues (Sadacca et al., 2016). We interpreted this result as showing that model-based information is reflected in dopaminergic error-signals, based on the presumption that the behaviour directed at the preconditioned cue is due to inference or model-based processing. This conclusion would be contrary to current proposals that these signals only reflect model-free value (Sutton and Barto, 1981; Schultz et al., 1997; Schultz, 1998; Waelti et al., 2001; Schultz, 2002; Cohen et al., 2012). However, it was proposed that the firing of the dopamine neurons to the preconditioned cue could reflect value that accrues to the cue via mediated learning in the conditioning phase or some other form of post-training rehearsal (Doll and Daw, 2016). The current results are inconsistent with this alternative interpretation. In particular, while our data do not rule out mediated learning as an underlying mechanism, they suggest that if responding to the preconditioned cue in our task is supported by mediated learning, as has been suggested in other designs and species (Wimmer and Shohamy, 2012), then that process does not cause the preconditioned cue to accrue model-free value.

Our data also raise questions as to whether preconditioned cues access, at least automatically or by default, any sort of stored value. As noted earlier, one way to think about responding to the preconditioned cue is as reflecting an inferred or model-based value. This is a value stored in downstream events and accessed through the associative model of the task acquired during prior training (Jones et al., 2012; Wimmer and Shohamy, 2012; Gershman, 2017). That is, in the probe test, the preconditioned cue evokes a representation of the sensory properties of the food reward, either directly or indirectly, and thereby activates the current value of the food. This view is consistent with the effects of devaluation, which normally eliminates responding to the food cup upon presentation of the preconditioned cue (Blundell et al., 2003). Yet if the preconditioned cue accesses the value stored in the food in this model-based manner, then one might have expected this cue to support conditioned reinforcement. This would make intuitive sense and is consistent with evidence that model-based value can support conditioned reinforcement (Burke et al., 2007; Burke et al., 2008). The failure of the preconditioned cue to support conditioned reinforcement suggests that it does not have automatic access to the value stored in the food, perhaps because it is never directly paired with anything that has value at the time. While speculative, this conclusion would have profound implications for interpreting the firing of dopamine neurons in this setting and perhaps in other tasks, where they exhibit phasic responses that are not obviously value based (Horvitz, 2000; Tobler et al., 2003; Bromberg-Martin and Hikosaka, 2009; Sadacca et al., 2016; Takahashi et al., 2017). These transient responses may signal the sensory, state, or informational error inherent in these designs, rather than anything related to a representation of value, model-based or otherwise.

Materials and methods

Subjects

Thirty-seven experimentally-naïve male Long-Evans rats (NIDA breeding program) were used in these experiments. Rats were maintained on a 12 hr light-dark cycle, and all behavioural experiments were conducted during the light cycle. Prior to behavioural testing, rats were placed on food restriction and maintained on ~85% of their free-feeding body weight. All experimental procedures were conducted in accordance with Institutional Animal Care and Use Committee of the US National Institutes of Health guidelines.

Apparatus, cues, and general procedures

Training was conducted in eight standard behavioural chambers (Coulbourn Instruments; Allentown, PA) individually housed in light- and sound-attenuating chambers. Each chamber was equipped with a pellet dispenser that delivered one 45 mg pellet into a recessed magazine when activated. Access to, and duration spent in, the magazine were detected by means of infrared detectors mounted across the mouth of the recess. The chambers contained an auditory stimulus generator, which delivered the tone and siren stimuli through a common speaker on the top right-hand side of the front chamber wall when activated. A second speaker on the back wall of the chamber, connected to another auditory stimulus generator, delivered the white noise stimulus. Finally, a heavy-duty relay delivering a 5 kHz clicker stimulus was located on the top left-hand side of the front chamber wall. During conditioned reinforcement tests, two levers were placed in the behavioural chamber, on the left and right sides of the front wall, and the magazine and pellet dispenser were removed. A computer equipped with Coulbourn Instruments software (Allentown, PA) controlled the equipment and recorded the responses. Cues A and C were either a white noise or clicker, and cues B and D were either a tone or siren (counterbalanced across rats). During Pavlovian training, stimuli were 10 s in length, and the order of trials was randomly intermixed and counterbalanced, with inter-trial intervals (ITI) averaging 6 min. During conditioned reinforcement testing, lever pressing produced 2 s of the relevant cue. Prior to training, all rats were shaped to enter the magazine to retrieve reward (two 45 mg sucrose pellets; 5TUT, Test Diet, MO), receiving 30 pellets in the magazine across a one hour period. Subsequently, rats received 2 sessions of training each day, one in the morning and one in the afternoon.

Sensory preconditioning

Rats began with 2 sessions of compound cue training. In each session, rats received 6 presentations each of serial compounds A→B and C→D, where cue A or C was immediately followed by presentation of cue B or D, respectively. Subsequently, rats underwent conditioning where cue B was followed by presentation of sucrose pellets while D was presented without reward. Rats received a total of 8 conditioning sessions, each consisting of six reinforced presentations of B and six non-reinforced presentations of D.

Second-order conditioning

Rats began with 8 sessions of conditioning. In each session, rats received six reinforced presentations of B and six non-reinforced presentations of D. Subsequently, rats underwent 2 sessions of compound cue training, each consisting of 6 presentations each of serial compounds A→B and C→D, where cue A or C was immediately followed by presentation of cue B or D, respectively.

Conditioned reinforcement and Pavlovian probe tests

Following Pavlovian training, rats received two conditioned reinforcement tests, each lasting 30 min. For these tests, levers were inserted in the chamber and the food magazine was removed (Burke et al., 2008). In the first test session, pressing one lever resulted in an immediate 2 s presentation of cue A, while pressing the other lever resulted in a 2 s presentation of cue C (counterbalanced). In the second, the lever presses resulted in an immediate 2 s presentation of either cue B or D. To ensure that all animals learnt the associations promoted by sensory preconditioning, we also conducted two probe tests following conditioned reinforcement. In these tests, the levers were removed, and the food magazine was put back into the chamber. In the first probe test, rats received six presentations each of cues A and C, and magazine entries were measured. In the second, rats received six presentations each of cues B and D. No reward was presented during either the conditioned reinforcement or probe tests.

Statistical analyses

Conditioned responding was measured as the fraction of time that the rats spent in the food magazine during cue presentation. This was restricted to the last five seconds when cues led to reward or a reward-paired cue, reflecting the normal escalation of responding towards the end of the cue when the reward is more likely to be delivered (i.e. inhibition of delay). Analyses on data from the final Pavlovian probe tests were conducted on the first two trials of each cue in the test session. Conditioned reinforcement was measured as the sum of the lever presses made across the full 30 min of each test session. All statistics were conducted using SPSS 24 IBM statistics package (Sharpe and Killcross, 2014). Generally, analyses were conducted using a mixed-design repeated-measures analysis of variance (ANOVA). All analyses of simple main effects were planned and orthogonal and therefore did not necessitate controlling for multiple comparisons.
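The conditioned-responding measure described above can be sketched as follows. This is a minimal illustration, assuming magazine visits are available as non-overlapping (entry, exit) times in seconds; that data format and the function name are hypothetical and are not taken from the Coulbourn software. The 10 s cue length and the 5 s scoring window come from the text.

```python
def fraction_in_magazine(visits, cue_onset, cue_duration=10.0, window=5.0):
    """Fraction of the final `window` seconds of a cue spent in the magazine.

    visits: non-overlapping (entry, exit) times in seconds (assumed format).
    """
    win_end = cue_onset + cue_duration
    win_start = win_end - window
    occupied = 0.0
    for entry, exit_time in visits:
        # Only the part of each visit that falls inside the scoring window counts.
        overlap = min(exit_time, win_end) - max(entry, win_start)
        if overlap > 0:
            occupied += overlap
    return occupied / window


# Hypothetical trial: cue on at t = 100 s, so the scoring window is 105-110 s.
example = fraction_in_magazine([(103.0, 106.5), (108.0, 112.0)], cue_onset=100.0)
print(f"{100 * example:.0f}% of the scoring window spent in the magazine")
```

Setting `window` equal to `cue_duration` scores the whole cue instead of its final seconds.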

References

  1. Blundell P, Hall G, Killcross S (2003) Preserved sensitivity to outcome value after lesions of the basolateral amygdala. Journal of Neuroscience 23:7702–7709.
  13. Schultz W (1998) Predictive reward signal of dopamine neurons. Journal of Neurophysiology 80:1–27.
  19. Tobler PN, Dickinson A, Schultz W (2003) Coding of predicted reward omission by dopamine neurons in a conditioned inhibition paradigm. Journal of Neuroscience 23:10402–10410.

Decision letter

  1. Timothy E Behrens
    Reviewing Editor; University of Oxford, United Kingdom

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Preconditioned cues have no value" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by Timothy Behrens as the Senior Editor and Reviewing Editor. The following individual involved in review of your submission has agreed to reveal their identity: Nathaniel D Daw (Reviewer #1).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

In Sadacca et al., upon which this paper builds, the authors showed that VTA neurons fired for cues that had never been reinforced but where value could be inferred via a model-based mechanism. This suggested that the dopamine system can access model-based (or inferred) values. This is an important result, but one potential caveat, raised in an Insight piece written in eLife, is that there may be some mediated transfer of value onto these "sensory preconditioned" cues. That is, inferential processes may happen offline (for example during sleep), transferring value from the cue that has experienced reward onto the associated cue that has not.

In this paper, the authors attempt to address this issue directly by showing that rats will not work for these sensory preconditioned cues but will work for conditioned ones or even for secondary conditioned ones. Combining this with their earlier results (Sadacca et al), they suggest that this rules out a mediated learning explanation of their sensory preconditioning paradigm, implies that sensory preconditioning depends essentially on model-based inferences (and thus that the dopamine activity seen to preconditioned cues in the previous experiment is associated with model-based rather than model-free learning). It is a very clean experiment and a very interesting result, particularly in combination with the previous study. It is an ideal exemplar of the research advance format. There are however some major concerns that need addressing (one is particularly important).

Essential revisions:

I have left the reviews intact below because I know you like them that way.

In discussions, myself and the two reviewers agreed that you need to address the statistical issue raised by both reviewers (most clearly expressed in Nathaniel's review). In all of our views, the story relies critically on a demonstration that value can be inferred on the cue A in the SPC paradigm and we do not agree that your statistics test for this. I think there is no getting around this point. You need to show that A vs. C is significant for magazine entry. Here is the point in question most clearly enunciated:

Copied from R1 below:

“The analysis in Figures 1D and 2D (the core sensory preconditioning and SOC probes) seems to be based on an ANOVA with a factor of reinforced (AB) vs. non (CD) and a factor of first (AC) vs. second (BD) stimulus. This means that the key probe test whether there is actually a preconditioning or SOC effect is based on there being a reinforcer effect AB > CD but no significant stimulus effect AC ~= BD.

I think this is not the right test for preconditioning or SOC. The simple t test A>C, comparing the stimulus to its appropriately matched control, seems like the obvious and appropriate test for these phenomena. Comparing AB vs. CD, as the positive part of the ANOVA does, inappropriately "credits" responding to B vs. D toward a putative preconditioning effect, and it is only by affirming the null hypothesis in comparing AC to BD that this analysis "rules out" the possibility that the AB>CD main effect is driven by B>D rather than A>C. But, of course, this is a fallacy.”

You will also see that there are various issues to do with the interpretation of the study. They pertain principally to whether conditioned reinforcement is a good test of model free value, why model-based value should not support conditioned reinforcement, and in particular why not, if we know that model-based value leads to dopaminergic activity which mediates conditioned reinforcement.

The queries are remarkably consistent between the two reviewers (and are shared by me), so should be taken seriously in the revised text.

Here are the reviews verbatim.

Reviewer #1:

This is a super interesting and clean study, which shows a dissociation between sensory preconditioning and second-order conditioning in terms of whether the resulting CS associations support conditioned reinforcement. Taken (among other things) in light of the authors' previous eLife report, this supports a dissociation in the associative structures learned, e.g. that sensory preconditioning is based on stimulus-stimulus or model-based association, and deepens the mystery of the original study's finding of dopamine responses related to the sensory preconditioned cues.

A few suggestions:

- The analysis in Figures 1D and 2D (the core sensory preconditioning and SOC probes) seems to be based on an ANOVA with a factor of reinforced (AB) vs. non (CD) and a factor of first (AC) vs. second (BD) stimulus. This means that the key probe test whether there is actually a preconditioning or SOC effect is based on there being a reinforcer effect AB > CD but no significant stimulus effect AC ~= BD.

I think this is not the right test for preconditioning or SOC. The simple t test A>C, comparing the stimulus to its appropriately matched control, seems like the obvious and appropriate test for these phenomena. Comparing AB vs. CD, as the positive part of the ANOVA does, inappropriately "credits" responding to B vs. D toward a putative preconditioning effect, and it is only by affirming the null hypothesis in comparing AC to BD that this analysis "rules out" the possibility that the AB>CD main effect is driven by B>D rather than A>C. But of course this is a fallacy.

- Without taking anything away from the richness of the result, I feel like the interpretation is a little overly broad. First, I don't think it's appropriate or necessary to draw conclusions about sensory preconditioning in general. Shohamy and Wimmer (2012) (which the authors should absolutely cite as the real evidentiary source for the idea mentioned by Doll and Daw 2016) show good evidence for mediated conditioning driving sensory preconditioning in humans; there is also a literature attributing closely related acquired equivalence effects in rodents in these terms (e.g. Ward Robinson and Hall 1999) though without much direct evidence for the mechanism.

The important thing about the current results is that they show this dissociation from SOC using a common set of procedures, and that these are the same ones previously used to identify dopaminergic correlates. So what's really important is that this demonstrates dissociations between associative value measured several different ways (preconditioning, SOC, pavlovian CRs, dopamine). But given that there is evidence that in other circumstances, mediated conditioning does occur, and that these sorts of preconditioning effects are themselves variable as to whether they even occur at all (e.g. see cites in Ward Robinson paper), we clearly don't understand the factors that govern to what extent mediated conditioning vs. other factors contributes in different circumstances. I think the conclusions (including title, Abstract) should be qualified more.

- Relatedly, the basic framing and interpretation seems to rest on assumption that conditioned reinforcement works exclusively via transmitting model-free value, i.e. that conditioned reinforcement is an unambiguous test of (model-free) "value". Although this might be true, I don't see why this would be expected to be the case. In principle, a stimulus with model-based associations to reward could serve as the incentive for goal-directed behavior, and this might support lever pressing for much the same reason it supports food cup responding no? The Parkinson result doesn't seem to rule this out.

Moreover, it's not even clear that the authors wouldn't expect CRF to work for preconditioned cues, to the extent I understand their view on dopamine. I think the idea is that preconditioned cues can activate dopamine, and also that dopamine can reinforce lever pressing (this is the usual story of how model-free CRF works, supported by data from Everitt), so it's not clear why it doesn't have this effect here. I don't think this in any way cuts against the interest of the paper, but I do think the finding that preconditioning supports dopamine responding, food cup responding, but not conditioned reinforcement, doesn't have a particularly straightforward interpretation in terms of dopamine's relationship to model-based vs model-free learning and it's not entirely clear what the authors are getting at with the last sentence of the Abstract.

Reviewer #2:

In this paper, the authors suggest using rats that sensory preconditioned cues are not capable of supporting conditioned reinforcement (whereas, for instance, in an otherwise similar paradigm, secondary conditioned cues are). Combining this with their earlier results (Sadacca et al), they suggest that this rules out a mediated learning explanation of their sensory preconditioning paradigm, implies that sensory preconditioning depends essentially on model-based inferences (and thus that the dopamine activity seen to preconditioned cues in the previous experiment is associated with model-based rather than model-free learning). I think that the results are very interesting - and will be an important contributor to the literature.

I have some questions about the statistics; but am mostly rather puzzled at the interpretation. The trouble is that the key conclusions of the paper lies on the interpretation of conditioned reinforcement – which is itself far from completely straightforward. In particular, the paper does not really explain either why model-based learning cannot/should not support conditioned reinforcement, nor why the dopamine activity that Sadacca et al. would predict would be inspired by the sensory preconditioned cues, would not support operant responding here, when optogenetically stimulated dopamine responses (for instance) can. Further, the conclusion that no value is attributed to the pre-conditioned cues seems a bit beyond the paper – value is not defined solely in terms of conditioned reinforcement.

- From Figure 1D, it looks superficially as if cue C (which has no reason to inspire magazine entries) does so more than cue D (which also has no reason to do so). Is this difference statistically significant by itself? If so, then perhaps something about the prediction that it supports is important – and this could underpin part of the magazine responding to A. The statistical test that is done (A&C vs. B&D) does not quite tell us the answer to that.

- Although it is an unfair between-subject comparison, it is notable that the conditioned responding to cue A in Figure 2C does not look significantly different to that to cue A in Figure 1C. Of course, the key comparison is with other cues – but it does make one wonder about the relative strengths of the effects (the magazine entries in 2D are also weaker than those in 1D).

- Why wasn't exactly the same paradigm used as in Sadacca et al.? By not doing that, generalization is obviously weakened.

- Did you score for sign tracking vs. goal tracking (are any of the cues sufficiently approachable to support sign tracking?). This is quite important given results about differential dopaminergic effects in sign vs. goal trackers.

https://doi.org/10.7554/eLife.28362.005

Author response

Essential revisions:

I have left the reviews intact below because I know you like them that way.

In discussions, myself and the two reviewers agreed that you need to address the statistical issue raised by both reviewers (most clearly expressed in Nathaniel's review). In all of our views, the story relies critically on a demonstration that value can be inferred on the cue A in the SPC paradigm and we do not agree that your statistics test for this. I think there is no getting around this point. You need to show that A vs. C is significant for magazine entry. Here is the point in question most clearly enunciated:

We appreciate the desire to see a significant difference between the A vs. C comparison in the SPC probe test, given that we failed to find an A vs. C difference in the conditioned reinforcement. An analysis on the original data yielded a difference only at p=0.053. It is worth noting that the probe test is conducted after the cues are presented a number of times without reward in the course of running the conditioned reinforcement, so some weakness is to be expected as these cues have extinguished somewhat prior to these tests. Nevertheless, to satisfy the reviewers on this point, we ran an extra set of rats using identical procedures to our original study, which allowed us to 1) replicate our finding that preconditioned cues do not promote conditioned reinforcement, and 2) pull out the significant difference between A and C in the probe tests after the conditioned reinforcement tests. We have now added these rats to the original data and updated the results and figures.

You will also see that there are various issues to do with the interpretation of the study. They pertain principally to whether conditioned reinforcement is a good test of model-free value, why model-based value should not support conditioned reinforcement, and in particular why not, if we know that model-based value leads to dopaminergic activity which mediates conditioned reinforcement.

We have tried to respond below to the specific queries regarding this point. But we wanted to lay out our general thinking here. Basically, we think this is a terrific set of questions and thinking about it caused us to entirely rewrite the Introduction and Discussion to be clearer as to what we think our data mean. Our main conclusion has not changed, but we hope the new text makes the significance of these data much clearer.

In essence, we agree there is no perfect test of value. We chose to use conditioned reinforcement to ask whether the preconditioned cues have value, because of the general idea that this procedure assesses the subject’s willingness to work to obtain a cue, independent of what the cue predicts. Normally conditioned reinforcement supported by a first-order cue is insensitive to devaluation of the predicted reward (Parkinson, Roberts et al. 2005, Burke, Franz et al. 2007, Burke, Franz et al. 2008), thus a form of value that is presumably model-free or cached in the cue is sufficient to support conditioned reinforcement. That cached value is sufficient is very important – because of this we think that our failure to see any evidence of conditioned reinforcement for the preconditioned cue means, at a minimum, that it does not have such cached or model free value (in our procedure in rats).

This is the essential finding of the study. We think it is hard to argue with really. If preconditioned cues had cached value, then the rats would press the lever to get them. The rats clearly do not do this. This result by itself is meaningful because it rules out the simplest proposal that dopamine neurons fire to preconditioned cues in this task because they have accrued model-free value.

Less clear but perhaps more intriguing is what our failure to find conditioned reinforcement means for proposals that value can also be model-based or inferred. In saying this, we are referring to the idea that because A predicts B and B predicts food, when the rat is presented with A, it evokes a representation of the food and thereby triggers some “value” representation in a model-based manner – presumably the value cached in the sensory features of that food, that is, the model-free value of the food. We believe that this sort of representation exists. Indeed it is how we have in the past thought of the behaviour of the rats in the probe test in preconditioning (Jones, Esber et al. 2012). Further, we think this sort of representation is also sufficient to support conditioned reinforcement, since we know that in some cases, instrumental responding for a cue directly paired with reward is sensitive to devaluation (Burke, Franz et al. 2007, Burke, Franz et al. 2008).

The resolution to this conundrum is not clear. We would speculate that the difference lies in the fact that a preconditioned cue is not the same as a cue that is directly paired with a reward (or with a reward predicting cue). Because it is never directly paired with anything that has any value of its own, it may trigger representations of the downstream entities (the second cue or the food reward) that are dissociable from any sort of cached or model-free value that those entities possess. Perhaps when the rat sees A it understands the associative relationship to the reward but does not, by default, access its associated value. If one accepts this line of reasoning, then the dopamine response to the preconditioned cue may not be about value at all – accessed either directly or indirectly – rather it is about the unexpected information provided by the cue. This is obviously consistent with proposals by Ethan Bromberg-Martin and our own in press data showing that dopamine neurons signal prediction errors for valueless changes in sensory information (Bromberg-Martin and Hikosaka 2009, Takahashi, Batchelor et al. 2017).
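The distinction between cached and inferred value drawn above can be sketched with a toy simulation. This is a minimal illustration under stated assumptions, not a model fitted to the data: the learning rate, trial counts, reward size, and all function and variable names are hypothetical. It shows only that a simple TD(0) learner leaves the cached value of A at zero after A→B preconditioning followed by B→food conditioning, whereas chaining the learned associations would yield a non-zero inferred value for A. Whether rats actually access value through such a chain by default is exactly what is in question.

```python
# Toy illustration only: learning rate, trial counts and reward size are
# arbitrary assumptions, not parameters from the study.
ALPHA = 0.2   # assumed learning rate
GAMMA = 1.0   # assumed: no temporal discounting within a trial


def td_update(values, state, reward, next_state=None):
    """Apply one TD(0) update to the cached value of `state`."""
    next_value = values[next_state] if next_state is not None else 0.0
    delta = reward + GAMMA * next_value - values[state]  # prediction error
    values[state] += ALPHA * delta
    return delta


cached = {"A": 0.0, "B": 0.0}

# Phase 1, preconditioning: A -> B with no reward.
# V(B) is still 0 here, so the error for A is 0 and A learns nothing.
for _ in range(12):
    td_update(cached, "A", reward=0.0, next_state="B")

# Phase 2, conditioning: B -> food. Only B is presented, so only B is updated.
for _ in range(24):
    td_update(cached, "B", reward=1.0)

# Model-based estimate: chain the learned transitions A -> B -> food.
transitions = {"A": "B", "B": "food"}
outcome_value = {"food": 1.0}


def inferred_value(state):
    """Follow the associative chain until an outcome with a known value is reached."""
    while state in transitions:
        state = transitions[state]
    return outcome_value.get(state, 0.0)


print(f"cached V(A)   = {cached['A']:.2f}")
print(f"cached V(B)   = {cached['B']:.2f}")
print(f"inferred V(A) = {inferred_value('A'):.2f}")
```

In this sketch a mediated-learning account would correspond to also updating A during phase 2, which the plain TD(0) learner above does not do.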

We would hasten to add that our explanation for why we don’t see conditioned reinforcement for the putative model-based value of the preconditioned cue is highly speculative. So, we bring this up only at the end of the Discussion, and we have tried to separate it from the more straightforward point that the cue cannot have any intrinsic cached or model-free value, and what this means for our prior results.

The queries are remarkably consistent between the two reviewers (and are shared by me), so should be taken seriously in the revised text.

Here are the reviews verbatim.

Reviewer #1:

[…] A few suggestions:

- The analysis in Figures 1D and 2D (the core sensory preconditioning and SOC probes) seems to be based on an ANOVA with a factor of reinforced (AB) vs. non (CD) and a factor of first (AC) vs. second (BD) stimulus. This means that the key probe test of whether there is actually a preconditioning or SOC effect is based on there being a reinforcer effect AB > CD but no significant stimulus effect AC ~= BD.

I think this is not the right test for preconditioning or SOC. The simple t test A>C, comparing the stimulus to its appropriately matched control, seems like the obvious and appropriate test for these phenomena. Comparing AB vs. CD, as the positive part of the ANOVA does, inappropriately "credits" responding to B vs. D toward a putative preconditioning effect, and it is only by affirming the null hypothesis in comparing AC to BD that this analysis "rules out" the possibility that the AB>CD main effect is driven by B>D rather than A>C. But of course this is a fallacy.

As described in our general response, we appreciate the reviewer’s desire to see a direct comparison of Pavlovian responding to the preconditioned cues, given our failure to see conditioned reinforcement for those cues. Unfortunately, in our original experiment, this difference did not quite reach significance in isolation (p = 0.053). To address this, we have now run an additional cohort of rats. These rats behaved similarly to the prior group – they failed to show conditioned reinforcement for A while showing a subsequent difference in Pavlovian responding to A versus C. When combined with the prior subjects, the difference reached significance. In the revised manuscript, we have combined them in the results and figure.
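The reviewer's statistical point can be illustrated with a small worked example. The sketch below uses made-up magazine-entry scores (not data from the study) in which A and C are nearly identical while B greatly exceeds D. Under these assumed numbers the AB vs. CD main effect is very large even though the direct A vs. C comparison is not, which is why the matched-control comparison is the informative one. The helper `paired_t` and all variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical magazine-entry scores for six rats (NOT data from the study).
# A and C are similar here, while B is far above D.
scores = {
    "A": [10, 12, 9, 11, 10, 12],
    "C": [10, 11, 10, 11, 9, 12],
    "B": [30, 28, 33, 31, 29, 32],
    "D": [8, 9, 7, 10, 8, 9],
}


def paired_t(x, y):
    """Return the paired t statistic for x - y and its degrees of freedom (n - 1)."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Main effect of reinforcement (AB vs. CD): per-rat mean of the rewarded-pair
# cues minus per-rat mean of the control-pair cues.
ab = [(a + b) / 2 for a, b in zip(scores["A"], scores["B"])]
cd = [(c + d) / 2 for c, d in zip(scores["C"], scores["D"])]

t_main, df = paired_t(ab, cd)
t_ac, _ = paired_t(scores["A"], scores["C"])

# For df = 5 the two-tailed critical value at alpha = 0.05 is about 2.571.
print(f"AB vs. CD main effect: t({df}) = {t_main:.2f}")
print(f"A vs. C direct test:   t({df}) = {t_ac:.2f}")
```

With these assumed scores the main effect is carried entirely by B vs. D, so it would be a mistake to read it as evidence of preconditioning.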

- Without taking anything away from the richness of the result, I feel like the interpretation is a little overly broad. First, I don't think it's appropriate or necessary to draw conclusions about sensory preconditioning in general. Shohamy and Wimmer (2012) (which the authors should absolutely cite as the real evidentiary source for the idea mentioned by Doll and Daw 2016) show good evidence for mediated conditioning driving sensory preconditioning in humans; there is also a literature attributing closely related acquired equivalence effects in rodents in these terms (e.g. Ward Robinson and Hall 1999) though without much direct evidence for the mechanism.

The important thing about the current results is that they show this dissociation from SOC using a common set of procedures, and that these are the same ones previously used to identify dopaminergic correlates. So what's really important is that this demonstrates dissociations between associative value measured several different ways (preconditioning, SOC, pavlovian CRs, dopamine). But given that there is evidence that in other circumstances, mediated conditioning does occur, and that these sorts of preconditioning effects are themselves variable as to whether they even occur at all (e.g. see cites in Ward Robinson paper), we clearly don't understand the factors that govern to what extent mediated conditioning vs. other factors contributes in different circumstances. I think the conclusions (including title, Abstract) should be qualified more.

We apologize if our conclusions were too broad, and we have tried in our revision to emphasise that our results are most relevant for our procedure. Most importantly, we did not mean to imply that our data rule out a role for mediated conditioning in sensory preconditioning. Rather, they suggest that if mediated conditioning does occur, as for example suggested by Wimmer and Shohamy’s data (Wimmer and Shohamy 2012), it likely does not allow the preconditioned cue to access or accrue model-free value (again at least in our procedure in rats). Instead the preconditioned cues may be activated during conditioning to acquire the ability to activate representations of either the directly conditioned cue and/or the sensory specific properties of the food reward, independent of any value stored in them. This would qualify as mediated learning and yet still not support conditioned reinforcement. Or, alternatively, our results may not apply to other settings. We have added text to the Discussion to address these points.

- Relatedly, the basic framing and interpretation seems to rest on assumption that conditioned reinforcement works exclusively via transmitting model-free value, i.e. that conditioned reinforcement is an unambiguous test of (model-free) "value". Although this might be true, I don't see why this would be expected to be the case. In principle, a stimulus with model-based associations to reward could serve as the incentive for goal-directed behavior, and this might support lever pressing for much the same reason it supports food cup responding no? The Parkinson result doesn't seem to rule this out.

Moreover, it's not even clear that the authors wouldn't expect CRF to work for preconditioned cues, to the extent I understand their view on dopamine. I think the idea is that preconditioned cues can activate dopamine, and also that dopamine can reinforce lever pressing (this is the usual story of how model-free CRF works, supported by data from Everitt), so it's not clear why it doesn't have this effect here. I don't think this in any way cuts against the interest of the paper, but I do think the finding that preconditioning supports dopamine responding, food cup responding, but not conditioned reinforcement, doesn't have a particularly straightforward interpretation in terms of dopamine's relationship to model-based vs model-free learning and it's not entirely clear what the authors are getting at with the last sentence of the Abstract.

Just to recap our views here (see also our remarks to the essential revisions above), we believe that model-free value is sufficient to support conditioned reinforcement, since it has been shown that conditioned reinforcement for directly conditioned cues is normally insensitive to reward devaluation (Parkinson, Roberts et al. 2005). As a result, our failure to find conditioned reinforcement for the preconditioned cues provides strong evidence that these cues (in our procedure in rats) differ from normal cues in that they do not possess such model-free value.

A more difficult question is whether they have the ability to link to other representations that access such value – such as the other cue or the food itself. Certainly we believe this is the case for a cue directly paired with food reward – that is while it has a cached value representation that is sufficient to support conditioned reinforcement, we would agree that it likely also has value through its direct association to the food (essentially the value cached in the sensory properties of the food) that can also support instrumental responding. This is something we have explored in other studies (Burke, Franz et al. 2007, Burke, Franz et al. 2008).

However here we are testing a cue that has never been paired with food or anything else that has value. So we believe our failure to see any evidence of conditioned reinforcement indicates that the associative representations triggered by the preconditioned cue do not, by default, dredge up any remote value representations via the associative model. In this, the preconditioned cues could be fundamentally different from any directly conditioned cue. They are purely informational. As we noted above, this has implications about what dopamine neurons may be signalling when these cues are encountered that go beyond whether it is a model-free or model-based value signal.

We have tried to make clear in the Discussion, first, that we think our data rule out a simple model-free value explanation of the firing of the dopamine neurons, and then second, that we would speculate the results pose additional problems for explaining the dopamine response as a value signal at all. Obviously the second point is highly speculative, so we have tried to indicate that. And we can remove or curtail it. But we think the reviewer has identified a very important implication of the data that should be discussed.

Reviewer #2:

In this paper, the authors suggest using rats that sensory preconditioned cues are not capable of supporting conditioned reinforcement (whereas, for instance, in an otherwise similar paradigm, secondary conditioned cues are). Combining this with their earlier results (Sadacca et al), they suggest that this rules out a mediated learning explanation of their sensory preconditioning paradigm, implies that sensory preconditioning depends essentially on model-based inferences (and thus that the dopamine activity seen to preconditioned cues in the previous experiment is associated with model-based rather than model-free learning). I think that the results are very interesting – and will be an important contributor to the literature.

I have some questions about the statistics; but am mostly rather puzzled at the interpretation. The trouble is that the key conclusions of the paper lies on the interpretation of conditioned reinforcement – which is itself far from completely straightforward. In particular, the paper does not really explain either why model-based learning cannot/should not support conditioned reinforcement, nor why the dopamine activity that Sadacca et al. would predict would be inspired by the sensory preconditioned cues, would not support operant responding here, when optogenetically stimulated dopamine responses (for instance) can. Further, the conclusion that no value is attributed to the pre-conditioned cues seems a bit beyond the paper – value is not defined solely in terms of conditioned reinforcement.

We apologize for the confusion. Please see our responses above to the general comments and reviewer 1. Briefly we agree that model-free value is likely not the only type of value to support conditioned reinforcement for a cue directly paired with reward. However based on data showing that such responding is insensitive to reward devaluation (Parkinson, Roberts et al. 2005), value cached in the cue is clearly sufficient. So we believe our failure to see any evidence of conditioned reinforcement for a preconditioned cue means this cue does not possess its own (model‑free, cached) value. This has obvious implications for why dopamine neurons fire to this cue in our hands.

However, we also agree that, for a reward paired cue, there might also be a model-based value that could support responding. Indeed we have made such a suggestion (Burke, Franz et al. 2007, Burke, Franz et al. 2008). Yet a preconditioned cue has never been paired with anything of value. In this way, it is different from cues that we tested in these studies (Burke, Franz et al. 2007, Burke, Franz et al. 2008). We would speculate that our failure to see any conditioned reinforcement for the preconditioned cue means that it is not accessing cached value, even indirectly in a model-based manner. This obviously has further implications for why dopamine neurons fire to this cue. We have added to the Discussion to make these points.

- From Figure 1D, it looks superficially as if cue C (which has no reason to inspire magazine entries) does so more than cue D (which also has no reason to do so). Is this difference statistically significant by itself? If so, then perhaps something about the prediction that it supports is important – and this could underpin part of the magazine responding to A. The statistical test that is done (A&C vs. B&D) does not quite tell us the answer to that.

We appreciate the reviewer’s concern. Indeed there was a main effect of cue type (AC vs. BD; F(1,21)=9.39, p<0.05); although critically there was no interaction (F<1). We believe this difference in responding to the preconditioned versus the directly conditioned cues occurs because the AC extinction is run before the BD extinction. We did this on purpose, since the AC comparison is so critical (as indicated by reviewer 1’s comments), and rats normally reduce even baseline responding quickly once we start extinction testing. We have added text to the results to explain this.

Most importantly, this general difference in responding simply highlights that C is the control for A. It is chosen to be similar in both modality and salience – for example, in both designs, we counterbalance the identity of A and C (white noise or click) and B and D (tone or siren) separately – and C is treated as similarly as possible to A in terms of the number of times and conditions under which it is presented. For this reason, it is the appropriate comparison for A. Comparing to D ignores all of these differences, any of which might lead to differences in responding for trivial reasons.
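The counterbalancing described above can be sketched as follows. This is a minimal illustration assuming a hypothetical cohort of eight rats and a simple round-robin allocation; only the cue identities (white noise, click, tone, siren) come from the text, and the allocation rule is an assumption rather than the procedure actually used.

```python
from itertools import permutations, product

# Cue identities named in the response; the rat IDs below are hypothetical.
first_cues = ("white noise", "click")   # candidates for A and C
second_cues = ("tone", "siren")         # candidates for B and D

# Counterbalance A/C and B/D separately: 2 x 2 = 4 possible assignments.
assignments = [
    {"A": a, "C": c, "B": b, "D": d}
    for (a, c), (b, d) in product(permutations(first_cues), permutations(second_cues))
]

rats = [f"rat{i:02d}" for i in range(1, 9)]  # hypothetical cohort of 8

# Cycle through the assignments so each one is used equally often.
schedule = {rat: assignments[i % len(assignments)] for i, rat in enumerate(rats)}

for rat, cues in schedule.items():
    print(rat, cues)
```

Because A and C are drawn from the same pair of stimuli and swapped across rats, any difference between them cannot be attributed to the physical identity of the cue.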

- Although it is an unfair between-subject comparison, it is notable that the conditioned responding to cue A in Figure 2C does not look significantly different to that to cue A in Figure 1C. Of course, the key comparison is with other cues – but it does make one wonder about the relative strengths of the effects (the magazine entries in 2D are also weaker than those in 1D).

As the reviewer notes, this is a comparison made between subjects and across cues that have had different training and treatment. Indeed rates of responding are slightly lower across the board in the SOC experiment. This may reflect training differences across subjects in the two studies. For example, in the SOC experiment, rats first receive conditioning, then second-order conditioning. This effectively constitutes an extinction session before they go into the conditioned reinforcement tests. Thus the rats may show lower responding since they have had more experience with the cues and no reward (and all the more impressive that they show conditioned reinforcement for the second-order cue). These potential differences are a key reason we used a within-subjects design.

- Why wasn't exactly the same paradigm used as in Sadacca et al.? By not doing that, generalization is obviously weakened.

The procedure is substantially the same. The only significant deviation we can find is the expansion of conditioning by two sessions, versus Sadacca et al. (2016). This is a fairly trivial change and was not purposeful but rather one of the many ways we run this effect in the lab (e.g. we recently used this specific version of the task to implicate a causal role for dopamine in sensory preconditioning (Sharpe, Chang et al. 2017)). It is also the same as the conditioning used in the SOC study here, so we don't think the longer conditioning is likely to have caused the difference in conditioned reinforcement we observed for SPC vs. SOC cues.

- Did you score for sign tracking vs. goal tracking (are any of the cues sufficiently approachable to support sign tracking?). This is quite important given results about differential dopaminergic effects in sign vs. goal trackers.

This is an interesting point raised by the reviewer. However, unfortunately we cannot look at those behaviours. The levers were not inserted as cues, which is typically how sign tracking is done; our levers were not retractable and remained accessible for the entire conditioned reinforcement session in order to encourage responding, and of course we did not have a food port either. So, it is not possible to measure sign or goal tracking.

https://doi.org/10.7554/eLife.28362.006

Article and author information

Author details

  1. Melissa J Sharpe

    1. NIDA Intramural Research Program, Baltimore, United States
    2. Princeton Neuroscience Institute, Princeton University, Princeton, United States
    3. School of Psychology, University of New South Wales, Sydney, Australia
    Contribution
    Conceptualization, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    melissa.sharpe@nih.gov
    Competing interests
    No competing interests declared
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0002-5375-2076
  2. Hannah M Batchelor

    NIDA Intramural Research Program, Baltimore, United States
    Contribution
    Investigation, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Geoffrey Schoenbaum

    1. NIDA Intramural Research Program, Baltimore, United States
    2. Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, United States
    3. Department of Psychiatry, University of Maryland School of Medicine, Baltimore, United States
    4. Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University, Baltimore, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    geoffrey.schoenbaum@nih.gov
    Competing interests
    Reviewing editor, eLife
    ORCID icon "This ORCID iD identifies the author of this article:" 0000-0001-8180-0701

Funding

National Health and Medical Research Council (APP1122980)

  • Melissa J Sharpe

National Institute on Drug Abuse (Intramural Research Program zia-da000587)

  • Geoffrey Schoenbaum

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the Intramural Research Program at the National Institute on Drug Abuse (ZIA-DA000587) and a CJ Martin overseas biomedical fellowship awarded to MJS (National Health and Medical Research Council, Australia). The opinions expressed in this article are the authors’ own and do not reflect the view of the NIH/DHHS.

Ethics

Animal experimentation: This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. All of the animals were handled according to approved institutional animal care and use committee (IACUC) protocols (#15-CNRB-108) of the NIDA-IRP. The protocol was approved by the ACUC at the IRP (Permit Number: A4149-01). Every effort was made to minimize suffering.

Reviewing Editor

  1. Timothy E Behrens, University of Oxford, United Kingdom

Publication history

  1. Received: May 5, 2017
  2. Accepted: September 18, 2017
  3. Accepted Manuscript published: September 19, 2017 (version 1)
  4. Version of Record published: September 28, 2017 (version 2)

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Metrics

  • 1,588
    Page views
  • 280
    Downloads
  • 9
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.


  1. Further reading
