1. Neuroscience

Dopamine neuron ensembles signal the content of sensory prediction errors

  1. Thomas A Stalnaker (corresponding author)
  2. James D Howard
  3. Yuji K Takahashi
  4. Samuel J Gershman
  5. Thorsten Kahnt
  6. Geoffrey Schoenbaum (corresponding author)
  1. National Institute on Drug Abuse, National Institutes of Health, United States
  2. Northwestern University, United States
  3. Harvard University, United States
  4. University of Maryland School of Medicine, United States
  5. Johns Hopkins School of Medicine, United States
Short Report
Cite this article as: eLife 2019;8:e49315 doi: 10.7554/eLife.49315

Abstract

Dopamine neurons respond to errors in predicting value-neutral sensory information. These data, combined with causal evidence that dopamine transients support sensory-based associative learning, suggest that the dopamine system signals a multidimensional prediction error. Yet such complexity is not evident in the activity of individual neurons or population averages. How then do downstream areas know what to learn in response to these signals? One possibility is that information about content is contained in the pattern of firing across many dopamine neurons. Consistent with this, here we show that the pattern of firing across a small group of dopamine neurons recorded in rats signals the identity of a mis-predicted sensory event. Further, this same information is reflected in the BOLD response elicited by sensory prediction errors in human midbrain. These data provide evidence that ensembles of dopamine neurons provide highly specific teaching signals, opening new possibilities for how this system might contribute to learning.

https://doi.org/10.7554/eLife.49315.001

Introduction

Midbrain dopamine neurons are widely proposed to signal value prediction errors (Mirenowicz and Schultz, 1994). However, the same neurons also respond to errors in predicting the features of rewarding events, even when their value remains unchanged (Howard and Kahnt, 2018; Takahashi et al., 2017). Such sensory prediction errors would be useful for learning detailed information about the relationships between real-world events (Gardner et al., 2018; Howard and Kahnt, 2018; Langdon et al., 2018; Takahashi et al., 2017). Indeed, dopamine transients facilitate learning such relationships, independent of value, when they are appropriately positioned to mimic endogenous errors (Chang et al., 2017; Keiflin et al., 2019; Sharpe et al., 2017). Yet dopaminergic responses to sensory prediction errors do not seem to encode the content of the mis-predicted event, either at the level of individual neurons or summed across populations (Howard and Kahnt, 2018; Takahashi et al., 2017).

How then do downstream areas that receive this teaching signal know what to learn? The conventional response is that such signals are permissive, with downstream areas controlling the content of the resultant learning (Glimcher, 2011). However, another possibility is that information about the content of the learning might be contained, at least partly, in the pattern of firing across ensembles of dopamine neurons. It is now widely accepted that information is represented in areas like cortex and hippocampus not by individual neurons, but rather in a distributed fashion in the firing of groups of cells (Gochin et al., 1994; Jennings et al., 2019; Jones et al., 2007; Rich and Wallis, 2016; Rigotti et al., 2013; Schoenbaum and Eichenbaum, 1995; Wikenheiser and Redish, 2015; Wilson and McNaughton, 1993). If this is true for the cortex and hippocampus, then why not for the midbrain dopamine system? Consistent with this, here we show that the pattern of firing across a small group of dopamine neurons recorded in rats contains specific information about the identity of a mis-predicted event. We further show that this same content-rich signal is evident in the BOLD response elicited by sensory prediction errors in human midbrain. These data provide the first evidence of which we are aware that dopamine neuron ensembles generate firing patterns capable of conveying not only the occurrence of a prediction error to downstream areas but also information regarding what exactly was mis-predicted. These findings open new possibilities for how dopaminergic error signals might contribute to the learning of complex associative information.

Results

To address whether dopamine neurons function as an ensemble to represent sensory prediction errors, we analyzed data from rats trained on a variant of the odor-guided choice task used to demonstrate the joint signaling of value and sensory prediction errors in our prior report (Takahashi et al., 2017). (A limited analysis of a subset of these data was presented in a supplemental section of that report; this is the first presentation of the full dataset and of its analysis as an ensemble.) In the task variant (Figure 1a), two fluid wells delivered either one or three drops of discriminable but equally preferred solutions of grape or tropical punch Kool-Aid. Rats initiated each trial with a nose-poke into an odor port. After a brief delay, one of two odors was presented, indicating that reward would be available in the left or right well on that trial. If the rat responded at the proper fluid well, the reward was delivered. To induce prediction errors that could be correlated with neural activity, reward number or flavor was manipulated across a series of four transitions between five trial blocks in each recording session. At the first and second transitions, rewards were omitted and delivered unexpectedly, respectively, to allow identification of classic reward prediction errors. At the third and fourth transitions, reward number remained constant, but flavor was changed. At one transition, the flavor of all three drops was changed, replicating our previous manipulation; at the other, only one of the three drops changed flavor while the others stayed the same, providing a control condition to distinguish signaling of flavor errors from signaling of flavor itself.

Task design and behavior during recording.

Schematic (a) illustrates the order of events in trials at each well and the number and type of reward delivered at each well in the five trial-blocks performed in all recording sessions. Dashed lines indicate the omission of drops previously delivered. Rats were highly accurate in choosing the rewarded well during recording (b), and accuracy was unaffected by the flavor or number of drops at a particular well, either for the group or for individual subjects (flavor: F(1,193) = 1.3, p=0.26; number: F(1,193) = 1.0, p=0.32; interactions with subject: F’s ≤ 1.0, p’s > 0.47). Rats were faster to respond for the 3-drop rewards (c), and this effect was again unaffected by the flavor of reward, either for the group or for individual subjects (main effect of number: F(1,193) = 190, p<10⁻⁶; main effect of flavor: F(1,193) = 1.75, p=0.19; flavor × subject interaction: F(9,193) = 0.86, p=0.56). A two-bottle preference test run at the end of the sessions (d) also revealed no effect of flavor (F(1,9) = 0.17, p=0.69). Data for individual subjects are illustrated by lines; error bars represent standard errors across sessions for percent correct and latency and across rats for the consumption test. Recordings were made in the ventral tegmental area (e), and dopaminergic neurons (n = 30) were identified by waveform cluster analysis (f). **p<0.01. g = grape, tp = tropical punch.

https://doi.org/10.7554/eLife.49315.002

Neural activity in VTA was recorded using drivable bundles of microelectrodes. During recording, the rats were highly accurate, responding correctly on ~95% of the forced-choice trials, indicating that they had learned the meaning of the odor cues, independent of reward number or flavor (Figure 1b). The rats also exhibited an appreciation of the reward number, responding significantly faster when the 3-drop reward was at stake, an effect that was also independent of the reward flavor (Figure 1c). Indeed, choice latency was similar across the two flavors, even in the behavior of individual rats, suggesting that they valued the two flavors similarly in the task (Figure 1c, lines). This is consistent with preference testing conducted separately after recording, which indicated that individually and as a group the rats had no significant preference between the two flavors of Kool-Aid (Figure 1d).

Using waveform characteristics and firing in response to reward, as in previous papers (see Materials and methods), we identified 30 putative dopaminergic neurons recorded during these sessions (Figure 1e and f and Table 1). As previously reported (Takahashi et al., 2017 in Supplemental Figure 2), the firing of these neurons exhibited classic reward prediction error correlates, decreasing in response to reward omission at the first transition and increasing in response to unexpected reward at the second transition, changes that were inversely correlated across neurons (Figure 2a–c). This is as expected based on numerous prior reports that individual dopamine neurons signal bidirectional errors in the prediction of reward, in different species, tasks, and labs (Schultz, 2016).

Dopamine neurons do not distinguish the identity of sensory prediction errors.

Plots show firing rates of dopamine neurons in response to transitions in number of reward drops (omission or delivery; a–c) and flavor (grape or tropical punch; d–f). Changes in firing rate in response to omission (negative errors) and delivery (positive errors) were readily distinguishable (a; t(29) = 4.0, p<10⁻³), inversely correlated across neurons (b), and firing rates were markedly different after the transition (c; t(29) = 5.2, p<10⁻⁴). The same neurons exhibited increased firing rates in response to transitions in the expected flavor of reward (d; t(29) = 2.1, p<0.05), but the increases to the two flavors were indistinguishable (t(29) = −1.95, ns), positively correlated across neurons (e), and firing rates after the transition also did not distinguish the two flavor errors (f; t(29) = 0.13, ns).

https://doi.org/10.7554/eLife.49315.003
Table 1
Numbers of putative dopamine neurons recorded in each subject (subjects without dopamine neurons are not listed).
https://doi.org/10.7554/eLife.49315.004
Rat ID# Dopamine Neurons
AA016
AA059
AA061
AA074
AA093
AA101
AA126

In addition, however, the same neurons also responded with elevated firing across transitions in which there was a change in reward flavor, combining both the third transition, presented previously (Takahashi et al., 2017 in Supplemental Figure 2), and the more selective fourth transition, included here. These changes in firing occurred even though the rats’ behavior, both in the task and in separate preference testing (Figure 1b–d), indicated no difference in the subjective value of the two flavors, even for individual subjects. The dopamine neurons increased firing to changes in flavor, and the sizes of these increases were positively correlated between the two flavor errors (Figure 2d and e). Further, individual neurons showed very little difference between initial firing rates in response to the two different flavor errors (Figure 2f). Thus, the activity of these neurons, individually or on average, signaled that something unexpected had happened, but it did not distinguish the details of that event (e.g. whether grape was switched for tropical punch or vice versa).

To test whether such information might be available in the pattern of firing across a group or ensemble of dopamine neurons, we aligned activity from all neurons on like trials from each block and then used a ‘training set’ of trials from each flavor-switch block to identify the ensemble pattern characteristic of the neural response to each flavor change. Individual trials left out of this training set were then matched to the two patterns in an attempt to decode the flavor that had been delivered. To assess the evolution of information coding within and across trials, we used a sliding time window aligned to events in a trial and a sliding window of trials that progressed across each block. The results indicated that the pattern of activity across the ensemble did contain information about flavor in both of the flavor-change trial blocks (Figure 3a and b). Critically, however, accurate decoding of flavor was observed only for the drops where flavor had changed, and then only on trials early in the blocks: accuracy was above chance only in epochs immediately after the new drop was delivered and fell to chance later in the block. This pattern is consistent with representation of the error in predicting the flavor (either the omission of the expected flavor or the delivery of the new flavor), and not with representation of flavor itself.
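The decoding logic described above can be sketched as a leave-one-out, nearest-template ("template-matching") classifier. This is a minimal illustration under stated assumptions, not the analysis code: it assumes each trial is summarized as a vector of spike counts across the recorded neurons in one time window, and it assumes Pearson correlation as the rule for matching a held-out trial to the flavor templates. The function name `decode_loo` and the 0/1 label coding are hypothetical.

```python
import numpy as np

def decode_loo(X, labels):
    """Leave-one-out template matching of ensemble firing patterns.

    X: (n_trials, n_neurons) spike counts (or rates) in one time window.
    labels: (n_trials,) flavor delivered on each trial, e.g. 0 = grape, 1 = tropical punch.
    Returns the decoded label of every trial.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    decoded = np.empty_like(labels)
    for i in range(len(X)):
        train = np.arange(len(X)) != i                 # hold out trial i
        # one template per flavor: mean ensemble pattern over the training trials
        templates = [X[train & (labels == c)].mean(axis=0) for c in classes]
        # assign the held-out trial to the best-correlated template
        r = [np.corrcoef(X[i], tpl)[0, 1] for tpl in templates]
        decoded[i] = classes[int(np.argmax(r))]
    return decoded
```

Calling `decode_loo` separately on the counts from each time bin of a trial would give one accuracy per bin, which is one way to build the within-trial sliding-window analysis.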

Dopamine ensembles distinguish the identity of sensory prediction errors.

Heat plots show decoding of flavor by dopamine neuron ensembles, using data from a sliding window during trials after all three drops changed flavor (a) or when only the second drop changed flavor (b). Red arrows indicate the time of delivery of the new-flavor drop. In each case, decoding was significantly above chance at the changed drops, but only early in the block (dotted lines on scale bars show one-tailed 95% confidence interval upper bounds for chance, by permutation tests). This effect was also evident when we collapsed data from the two blocks and compared decoding in epochs capturing firing to the drops where flavor changed versus control epochs capturing firing where flavors had not changed (c); flavor could be decoded accurately by dopamine ensembles only immediately after changes in flavor (patterns in confusion matrices were significantly different at p<10⁻⁴ by permutation test). A more detailed analysis using sliding sets of 10 trials (d) showed the decay of flavor decoding as the block progressed (upper plot, solid line), while control decoding of flavor (dotted line) and baseline firing rates in both conditions (lower plot) were unchanged across the block. The thick line in the upper plot shows significance compared to chance (p<0.05 for at least five significant trial sets by permutation test). The thin dotted line in the upper plot shows the chance decoding level.

https://doi.org/10.7554/eLife.49315.005
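The chance bounds in Figure 3 come from permutation tests. A minimal sketch of one such test is shown below, assuming that trial labels are shuffled across trials and the decoder is re-run on each shuffle. The stand-in Euclidean nearest-centroid decoder, the number of permutations, and all function names are assumptions, not details of the original analysis.

```python
import numpy as np

def nearest_centroid_loo(X, labels):
    """Leave-one-out nearest-centroid (Euclidean) predictions, one per trial."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    preds = np.empty_like(labels)
    for i in range(len(X)):
        keep = np.arange(len(X)) != i                  # hold out trial i
        centroids = np.array([X[keep & (labels == c)].mean(axis=0) for c in classes])
        preds[i] = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
    return preds

def permutation_chance_bound(X, labels, decode, n_perm=1000, alpha=0.05, seed=0):
    """One-tailed (1 - alpha) upper bound on chance accuracy by label shuffling.

    decode(X, labels) must return a predicted label for every trial. Shuffling
    the labels breaks any link between ensemble pattern and flavor, so the
    resulting accuracies form a null distribution; its (1 - alpha) quantile
    is the bound.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    null_acc = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = rng.permutation(labels)
        null_acc[k] = np.mean(decode(X, shuffled) == shuffled)
    return float(np.quantile(null_acc, 1 - alpha))
```

An observed accuracy above the returned bound would count as above chance at one-tailed p<0.05 under these assumptions.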

This impression was confirmed when we formally compared decoding accuracy in time windows surrounding drops where the flavor had changed versus windows surrounding drops where the flavor had not changed. Accurate decoding was observed only when the drop had changed flavor, and then only in the first 10 trials of these blocks: decoding was best in the earliest trials immediately after the transition, fell to chance in the last 10 trials, and flavors from the early trials were not misclassified as the same flavors in the later trials (Figure 3c and d). Separate analyses indicated that flavor could be decoded from neural activity in these early trials as early as 175 ms after fluid delivery (see Materials and methods for details of analysis). The decline in decoding accuracy across the block occurred without any gross changes in baseline firing rates (Figure 3d). Thus, the dopamine neuron ensemble represented not the flavor itself, but the flavor when it had been mis-predicted.
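The across-block analysis (sliding sets of 10 trials) can be sketched by re-running a leave-one-out decoder within each successive window of trials. This sketch assumes a Euclidean nearest-centroid decoder as a stand-in, assumes trials are supplied in block order, and requires at least two trials of each flavor in every window; the names and the decoder choice are illustrative, not taken from the original analysis.

```python
import numpy as np

def loo_accuracy(X, labels):
    """Leave-one-out nearest-centroid (Euclidean) decoding accuracy."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if any(np.sum(labels == c) < 2 for c in classes):
        raise ValueError("each flavor needs at least 2 trials in the window")
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i                  # hold out trial i
        centroids = np.array([X[keep & (labels == c)].mean(axis=0) for c in classes])
        pred = classes[np.argmin(np.linalg.norm(centroids - X[i], axis=1))]
        correct += pred == labels[i]
    return correct / len(X)

def sliding_trial_accuracy(X, labels, width=10):
    """Decoding accuracy in each window of `width` consecutive trials of a block."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    return np.array([loo_accuracy(X[s:s + width], labels[s:s + width])
                     for s in range(len(X) - width + 1)])
```

A curve that starts high and falls toward chance across the block, with baseline rates unchanged, is the signature described above.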

Finally, as an additional test of this idea, we applied a similar approach to examine the information content of sensory prediction error signals previously reported in fMRI data from the human midbrain (Howard and Kahnt, 2018). (Although these data were analyzed for sensory errors in our prior report, this is the first presentation of a multivoxel pattern analysis of these data aimed at distinguishing the content of the error signal.) These data were collected from subjects performing a task in which they learned that abstract visual cues predicted the odors of different sweet (SW) and savory (SV) food odor rewards (Figure 4a). The rewarding odors were matched in value, as reflected in both pleasantness ratings acquired before the learning task (Figure 4b) and choices made during the task (Figure 4c). During the fMRI scanning session, the odors associated with the visual cues were switched across blocks of trials (i.e., SW→SV and SV→SW), thereby inducing value-neutral sensory prediction errors similar to those induced by the flavor switches in the rat task described above. Previously it was reported that these switches evoked prediction error-like responses in the BOLD signal in the midbrain (Howard and Kahnt, 2018; Suarez et al., 2019). Here we used multivoxel pattern analysis (MVPA) to test whether distributed fMRI activity patterns in this region contained information about the content of the error immediately after a switch and then later after learning.

Human midbrain distinguishes the identity of sensory prediction errors.

(a) The reversal learning task involved binary choices between two visual cues to receive either a high or low concentration of one of two food odor rewards (one sweet [SW] and one savory [SV]). The associations were covertly changed throughout the task to induce either sensory prediction errors (e.g. transition from block 1 to block 2) or value prediction errors (e.g. transition from block 2 to block 3). (b) Sweet and savory food odors were matched for pleasantness within each odor concentration (SW high vs. SV high: t(22) = 0.18, p=0.86; SW low vs. SV low: t(22) = 1.16, p=0.26). Error bars depict within-subject s.e.m. (c) On free choice trials, the cue associated with the high-concentration odor was chosen significantly above chance (50%) for both odor identities (SW: t(22) = 4.03, p=2.83×10⁻⁴; SV: t(22) = 4.20, p=1.83×10⁻⁴), and these choice proportions did not differ significantly from each other (t(22) = 0.71, p=0.48). Error bars depict within-subject s.e.m. (d) Decoding accuracy of SW vs. SV was significantly above chance on the error trial of flavor transitions (black line; t(22) = 3.22, p=0.004), but not on subsequent trials or on the trial preceding error trials (p’s > 0.12). Decoding accuracy of SW vs. SV was at chance for the error trial on value transitions (gray line), as well as on subsequent trials and on the trial preceding the value transitions (p’s > 0.15). Error bars depict within-subject s.e.m. (e) Confusion matrices show the decoding accuracy for individual conditions within the decoding analyses (patterns in the confusion matrices showed a trend toward a difference, p=0.08 by permutation test). Within the top left quadrant of the flavor transition matrix (i.e. training and testing the classifier on the error trial of flavor transitions), across all subjects and iterations, accuracy was 63.3% for SW predictions and 63.8% for SV predictions. All other comparisons for flavor transitions and all comparisons for value transitions were at chance.

https://doi.org/10.7554/eLife.49315.006

This analysis, which is conceptually similar to that applied to the single unit activity described above, found that it was possible to decode the identity (SW or SV) of the unexpected odor from the midbrain activity at the time the error was experienced (Figure 4d). Importantly, decoding was significantly above chance only on the trials in which the food odors were mis-predicted but at chance on subsequent trials when food odors were delivered as expected (Figure 4d). Follow-up examination of the decoder performance confirmed that decoding was only above chance on the error trial, and that the decoder was not biased towards prediction of a particular odor (Figure 4e), consistent with representation of the mis-predicted food odors and not the food odors themselves.
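The comparison of midbrain decoding accuracy against chance (Figure 4d) can be illustrated with a one-sample t-test across subjects. This is a sketch assuming one decoding accuracy per subject and a test against 50%; the function name is hypothetical and the values in any usage are made up for illustration. With 23 subjects the test has 22 degrees of freedom, matching those reported in the caption, and `scipy.stats.ttest_1samp` returns the same statistic together with a p-value.

```python
import numpy as np

def t_vs_chance(accuracies, chance=0.5):
    """One-sample t statistic and degrees of freedom for mean accuracy vs chance.

    accuracies: one decoding accuracy (proportion correct) per subject.
    """
    acc = np.asarray(accuracies, dtype=float)
    n = acc.size
    se = acc.std(ddof=1) / np.sqrt(n)     # standard error of the mean across subjects
    return (acc.mean() - chance) / se, n - 1
```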

Discussion

The results presented here show that, in both rats and humans, putative dopaminergic sensory prediction error responses in the midbrain contain specific information about the features of the mis-predicted event itself, appropriate for instructing or updating representations in downstream brain regions. These results are consistent with the proposal that the midbrain dopamine system signals a multidimensional prediction error, able to reflect a failure to predict information about an unexpected event beyond and even orthogonal to value (Gardner et al., 2018; Howard and Kahnt, 2018; Langdon et al., 2018; Takahashi et al., 2017). Importantly, this proposal is not necessarily contrary to current canon; it can account for value errors as a special example of a more general function (Gardner et al., 2018), one readily apparent in the firing of individual neurons, perhaps due to the priority given to such information when it is the goal of the experimental subject. However, this proposal also explains in a relatively straightforward way why dopamine neurons are often phasically active in settings where value errors were not anticipated a priori, at least by the experimenters, such as when novel cues or even information is first presented (Bromberg-Martin and Hikosaka, 2009; Horvitz, 2000; Horvitz et al., 1997; Kakade and Dayan, 2002), or even in response to violations in beliefs or auditory expectations (Gläscher et al., 2010; Gold et al., 2019; Iglesias et al., 2013; Schwartenbeck et al., 2016). That the pattern of firing across a relatively small population of dopamine neurons can provide details regarding the mis-predicted event endows the dopamine system with the ability to serve as an instructive ‘teaching’ signal outside the dimension of value.

One interesting question raised by the prior and current results is whether and how such a system would distinguish the omission of an expected sensory event from its unexpected appearance. The designs of the two experiments analyzed here do not allow us to distinguish representation of these two types of errors. We would speculate that both should be encoded in the neural activity of the system, including in the current data. Thus, the decoding demonstrated here would reflect the combination of these two changes. Of course, the actual presence of something is likely to support a much stronger signal than its absence, so in practice, it may be difficult or require substantially higher statistical power to see a representation of an omitted event, particularly one that involves subtle features orthogonal to value.

Another interesting question raised by these results is whether downstream areas use the information in the signal to support learning. While the current data are only correlative, it is notable that the information is present only when it is relevant to learning, at the start of the blocks, so it is appropriately positioned to drive learning in downstream structures. And of course, a causal role for the signal shown here is in line with recent demonstrations that dopamine transients are necessary and sufficient for learning that cannot be easily accounted for by classic reinforcement learning mechanisms (Chang et al., 2017; Keiflin et al., 2019; Sharpe et al., 2017). Keiflin et al. (2019) is particularly relevant in this regard: in that study, conditioned responding to a cue unblocked by artificial activation of VTA dopamine neurons at the time of an expected reward was sensitive to subsequent devaluation of that reward. Sensitivity to devaluation indicates that the artificial dopamine transients induced the formation of an association between the conditioned stimulus and the sensory properties (i.e. the flavor) of the reward, precisely the type of learning the signal here would be proposed to support (Gardner et al., 2018).

How the artificial activation of neurons engaged in representing information through a pattern of activity can cause normal learning in studies such as those cited above is another outstanding question raised by the current data. One possible explanation may be found in the appearance of external events at the time of stimulation in these studies. Even though these events are largely expected in the blocking designs used by Sharpe et al. (2017) and Keiflin et al. (2019), input reflecting their appearance still impinges on the dopamine neuron population at the proper time to support learning. By randomly injecting current across a subset of this population, the artificial stimulation may recover a ghost of the error pattern that these events would cause if they were unexpected: a pattern close enough to cause learning that seems normal, given the very simple behavioral readouts used in these studies.

If dopamine neurons do provide information about errors beyond the single dimension of value, this raises questions about the limits of this capacity and about how the system deals with the vastness of the possible error space relative to the number of dopamine neurons. There are approximately 40,000 dopamine neurons in the VTA of rats, and another 25,000 in the substantia nigra (SN) (Nair-Roberts et al., 2008). In humans, the total number is about 300,000 (Hirsch et al., 1988). If each neuron provides only a single bit of information, the rat VTA alone could still express 2^40,000 distinct patterns. Of course, there is surely substantial redundancy across neurons, yet even if we reduce this to 1,000 effective bits of information, we still end up with 2^1000 ≈ 1.0715 × 10^301 potential patterns. This is a huge number. And of course, information represented in spiking may be augmented (or attenuated) by factors such as co-release of other neurotransmitters downstream, the location (region, cell type, dendritic compartments), and the type (receptors, second messenger cascades, interactions capable of modulating) of interactions with downstream regions. Even if all this combines to yield only 20 or 30 unique coded dimensions, we still end up with between about a million (2^20) and a billion (2^30) possible patterns of output. These numbers seem big enough, with assistance from other systems (we do not propose this to be the only learning signal) and with contextual modulation of the processing (i.e. some factors might be given priority or not, depending on the situation, by modulating inputs), to deal with much of the problem of dimensionality.
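The pattern counts above follow from treating each neuron (or each coded dimension) as an independent binary unit, so that n units can form 2^n distinct patterns. The short sketch below only checks that arithmetic under this binary assumption; it says nothing about how many patterns the real system uses. The helper name `n_patterns` is hypothetical.

```python
import math

def n_patterns(bits):
    """Number of distinct patterns that `bits` independent binary units can form."""
    return 2 ** bits

# ~40,000 rat VTA dopamine neurons at one bit each gives 2**40000 patterns.
# Count its decimal digits with logarithms rather than str(), because
# Python 3.11+ limits int-to-string conversion of very large integers.
vta_digits = math.floor(40_000 * math.log10(2)) + 1
print(vta_digits)                          # 12042

print(f"{float(n_patterns(1000)):.4e}")    # 1.0715e+301
print(n_patterns(20), n_patterns(30))      # 1048576 1073741824
```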

Finally, it is worth noting that the demonstration here mirrors advances in the computational field, where distributed, multidimensional error signaling is a key component of more advanced algorithms, such as distributional reinforcement learning and the successor representation (Dabney et al., 2017; Dayan, 1993). In both, the error driving learning is not unitary but is represented as a vector. Distributional reinforcement learning has recently been suggested as an explanation for the heterogeneity of the responses of individual dopamine neurons to errors in predicting reward value (Kurth-Nelson et al., 2019). The current results extend this by showing for the first time that an assembly of dopamine neurons can represent the content of errors, even outside the realm of value. That the information available in the pattern of activity is not readily apparent in the activity of individual neurons accords with ideas guiding behavioral neurophysiology in other areas (Yuste, 2015), and suggests it is time to consider the functions of the dopamine system across, rather than within, individual neurons.
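To make the idea of a vector-valued error concrete, the sketch below implements a minimal single-cue delta rule for a population of units, each with its own learning rates for positive and negative errors, one of the mechanisms discussed in the distributional account of dopamine heterogeneity. This is an illustrative assumption (an expectile-style asymmetric update with no state transitions), not the algorithm of any of the cited papers; the function name and parameter values are hypothetical.

```python
import numpy as np

def distributional_td(rewards, alpha_pos, alpha_neg):
    """Value estimates of a population of units with asymmetric learning rates.

    Each unit i keeps its own value V[i] and its own prediction error
    delta[i] = r - V[i], so the error is a vector rather than a scalar.
    Positive errors are scaled by alpha_pos[i], negative ones by alpha_neg[i].
    """
    a_pos = np.asarray(alpha_pos, dtype=float)
    a_neg = np.asarray(alpha_neg, dtype=float)
    V = np.zeros_like(a_pos)
    for r in rewards:
        delta = r - V                                  # vector of per-unit errors
        V += np.where(delta > 0, a_pos, a_neg) * delta
    return V
```

With rewards of 0 and 1 alternating, each unit settles near its asymmetry ratio alpha_pos / (alpha_pos + alpha_neg): "optimistic" units end above the mean reward and "pessimistic" units below it, so the population carries more than the mean.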

Materials and methods

Experiment 1

Subjects

Ten male Long-Evans rats (Charles River Labs, Wilmington, MA), aged approximately 3 months at the start of the experiment and single-housed once the experiment began, were used in this study. Rats were tested at the NIDA-IRP in accordance with NIH guidelines determined by the Animal Care and Use Committee.

Surgical procedures


All surgical procedures adhered to guidelines for aseptic technique. For electrode implantation, a drivable bundle of eight 25-µm-diameter NiCr/Formvar wires (A-M Systems, Sequim, WA) was chronically implanted dorsal to VTA in the left or right hemisphere at 5.2 mm posterior to bregma, 0.7 mm lateral, and 7.5 mm ventral to the brain surface, at an angle of 5° from vertical toward the midline. Wires were cut with surgical scissors to extend ~2.0 mm beyond the cannula and electroplated with platinum (H2PtCl6, Aldrich, Milwaukee, WI) to an impedance of 800–1000 kΩ. Cephalexin (15 mg/kg p.o.) was administered twice daily for two weeks post-operatively.

Histology


All rats were perfused with phosphate-buffered saline (PBS) followed by 4% paraformaldehyde (Santa Cruz Biotechnology Inc, CA). Brains were cut in 40 µm sections and stained with thionin and then examined to determine electrode placement.

Behavioral task


Training and recording were conducted in aluminum chambers approximately 18 inches on each side, with sloping walls narrowing to an area of 12 × 12 inches at the bottom. A central odor port, consisting of a small hemicylinder accessible by nose-poke, was located about 2 cm above two fluid wells, and two lights were mounted higher up on the same wall. The odor port was connected to an airflow dilution olfactometer to allow the rapid delivery of olfactory cues, which were chosen from compounds obtained from International Flavors and Fragrances (New York, NY). Trial availability was signaled by illumination of the panel lights inside the box. When these lights were on, a nosepoke into the odor port resulted in delivery of the odor cue for 500 ms. One of two different odors was delivered to the port on each trial in a pseudorandom order such that each 50 trials contained 25 of each odor, and the same odor was never presented on more than three consecutive trials. At odor offset, the rat had 3 s to make a response at one of the two fluid wells. One odor indicated that reward would be available at the left well, while the other indicated it would be available at the right well; errors resulted in no reward delivery and the lights turning off (errors occurred on about 5% of trials across all recording sessions; see Figure 1b). On correct trials, the lights turned off once rats had finished licking at the well; the intertrial interval was ~2–3 s before the lights turned on once again. Once the rats were shaped to respond accurately (at least ~75% correct) on both odors, we introduced trial-blocks in which the number and flavor of reward drops (one or three drops of Grape or Tropical Punch Kool-Aid solution) were constant within a block but changed between blocks according to the schedule summarized in Figure 1a. The drop volume was ~0.05 ml, and multiple drops were delivered 1000 ms apart.
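The odor-sequence constraint (25 presentations of each odor per 50 trials, never more than three identical odors in a row) can be met by rejection sampling: shuffle a balanced chunk and redraw it until no run exceeds three, including runs that span the boundary with the previous chunk. This is one possible way to generate such a sequence, assuming hypothetical odor labels "A" and "B"; how the original task software generated its sequences is not described here.

```python
import random

def max_run(seq):
    """Length of the longest run of identical consecutive items."""
    longest, run, prev = 0, 0, object()
    for item in seq:
        run = run + 1 if item == prev else 1
        prev = item
        longest = max(longest, run)
    return longest

def odor_sequence(n_trials, chunk=50, max_consecutive=3, rng=None):
    """Pseudorandom sequence of odors "A"/"B" (hypothetical labels).

    Each full chunk of `chunk` trials holds equal numbers of each odor, and no
    odor appears more than `max_consecutive` times in a row, across chunk
    boundaries included. A final partial chunk (when n_trials is not a
    multiple of `chunk`) is truncated and therefore not balanced.
    """
    rng = rng or random.Random()
    seq = []
    while len(seq) < n_trials:
        block = ["A"] * (chunk // 2) + ["B"] * (chunk // 2)
        tail = seq[-max_consecutive:]          # last trials of the previous chunk
        rng.shuffle(block)
        while max_run(tail + block) > max_consecutive:
            rng.shuffle(block)                 # reject and redraw
        seq.extend(block)
    return seq[:n_trials]
```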
For each recording session, wells were randomly designated such that in the first trial-block, correct responses at one well resulted in delivery of 3 drops of grape solution while correct responses at the other well resulted in 3 drops of tropical punch solution. In the second trial-block, the number of drops available on both sides changed from three to one, with the flavor remaining the same. In the third trial-block, the number of drops available on both sides changed from one back to three, again with the flavor remaining the same. In the fourth trial-block, the flavor of all three drops on each side was switched to the other flavor. Finally, in the fifth trial-block, the flavor of the second drop on each side was switched to the opposite flavor, with the other two drops on both sides remaining the same. Thus, each session contained one number downshift transition (drop omission), one number upshift transition (new drop deliveries), one flavor transition across all three drops, and one flavor transition affecting only the second drop. In each of the two flavor transitions, one side went from grape to tropical punch, while the other did the opposite.
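The five-block schedule can be written down as data and each transition classified by comparing consecutive blocks drop by drop. The encoding below is a hypothetical illustration built from the description above: "g" and "tp" stand for grape and tropical punch, well 1 is arbitrarily assigned grape in the first block, and the single drop in the second block is assumed to be the first of the three.

```python
# Hypothetical encoding of one session: each block lists the flavor of each
# drop delivered at a well ("g" = grape, "tp" = tropical punch).
WELL_1 = [["g"] * 3, ["g"], ["g"] * 3, ["tp"] * 3, ["tp", "g", "tp"]]
WELL_2 = [["tp"] * 3, ["tp"], ["tp"] * 3, ["g"] * 3, ["g", "tp", "g"]]

def classify_transition(prev, cur):
    """Label the change between two consecutive blocks at one well.

    Returns (label, indices of drops whose flavor changed).
    """
    if len(cur) < len(prev):
        return "number downshift (drop omission)", []
    if len(cur) > len(prev):
        return "number upshift (new drops)", []
    changed = [i for i, (a, b) in enumerate(zip(prev, cur)) if a != b]
    if not changed:
        return "no change", []
    if len(changed) == len(cur):
        return "flavor change, all drops", changed
    return "flavor change, subset of drops", changed

for well in (WELL_1, WELL_2):
    for prev, cur in zip(well, well[1:]):
        print(classify_transition(prev, cur))
```

Running it prints the four transition types in session order for each well, with the single-drop flavor change reported at drop index 1 (the second drop).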

Flavor preference testing


After the completion of all recording sessions, we conducted two-bottle consumption tests of the Kool-Aid solutions twice over two days in nine of the ten rats. These tests were run in a housing cage different from the home cages and experimental chambers. Tests were 2 min in duration, and the locations of the bottles were swapped roughly every 20 s to equate time on each side. The flavor and the initial location of the bottles were randomized across rats and swapped between the first and second tests.

Single-unit recording

Wires were screened for activity daily; if no isolable single-unit activity was detected, the rat was removed and the electrode assembly was advanced 40 or 80 µm. Otherwise, active wires were selected for recording, a session was conducted, and the electrode was advanced at the end of the session. Neural activity was recorded using Plexon Multichannel Acquisition Processor systems (Dallas, TX). Signals from the electrode wires were amplified 20X by an op-amp headstage (Plexon Inc, HST/8o50-G20-GR) located on the electrode array. Immediately outside the training chamber, the signals were passed through a differential pre-amplifier (Plexon Inc, PBX2/16sp-r-G50/16fp-G50), where the single-unit signals were amplified 50X and filtered at 150–9000 Hz. The single-unit signals were then sent to the Multichannel Acquisition Processor box, where they were further filtered at 250–8000 Hz, digitized at 40 kHz, and amplified 1–32X. Waveforms (>2.5:1 signal-to-noise) were extracted from active channels and recorded to disk by an associated workstation.

Measures and statistical analyses

Average percent correct and choice latency (defined as the time from the end of odor delivery to withdrawal from the odor port on trials resulting in a correct response) were calculated by trial-type (3-drop, 1-drop, grape, tropical punch) across all trials. The flavor of the reward was defined as that of the first drop.

Units were sorted using Offline Sorter software from Plexon Inc (Dallas, TX). Sorted files were then processed and analyzed in Matlab (Natick, MA). Dopamine neurons were identified via a waveform analysis. Briefly, a cluster analysis was performed based on the half-time of the spike duration and the ratio comparing the amplitude of the first positive and negative waveform segments. The center and variance of each cluster were computed without data from the neuron of interest, and that neuron was then assigned to a cluster if it was within 3 s.d. of the cluster’s center. Neurons that met this criterion for more than one cluster were not classified. This process was repeated for each neuron. Neurons were considered putatively dopaminergic if they were in the wide-waveform cluster and were also reward-responsive, defined as showing a significant difference (p<0.05, t-test) between baseline firing rate and firing rate in the first 500 ms of reward delivery across all rewarded trials. This waveform analysis is based on criteria similar to those typically used to identify dopamine neurons in primate studies (Bromberg-Martin et al., 2010; Fiorillo et al., 2008; Hollerman and Schultz, 1998; Kobayashi and Schultz, 2008; Matsumoto and Hikosaka, 2009; Mirenowicz and Schultz, 1994; Morris et al., 2006; Waelti et al., 2001) and isolates neurons in rat VTA whose firing is sensitive to intravenous infusion of apomorphine or quinpirole (Jo et al., 2013; Roesch et al., 2007). Neurons identified in this manner are also selectively eliminated by expression of a Casp3 neurotoxin in TH+ neurons in VTA (by infusion of AAV1-Flex-TaCasp3-TEVp into TH-Cre transgenic rats; Takahashi et al., 2017).
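A minimal NumPy sketch of the leave-one-out assignment step, assuming the initial cluster labels are already available from a prior cluster analysis and interpreting "within 3 s.d." as a per-feature criterion (the exact distance rule is not specified above). The function name is hypothetical, and the reward-responsiveness t-test is not included.

```python
import numpy as np


def loo_cluster_assignment(features, labels, n_sd=3.0):
    """Reassign each neuron to a waveform cluster using leave-one-out statistics.

    features: (n_neurons, 2) array of [spike half-duration, amplitude ratio].
    labels:   initial cluster label per neuron (assumed to come from a prior
              cluster analysis, e.g. k-means).
    Returns a label per neuron, or -1 when the neuron falls within n_sd of
    zero clusters or of more than one cluster (left unclassified).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    assigned = np.full(len(features), -1)
    for i in range(len(features)):
        others = np.arange(len(features)) != i      # exclude the neuron of interest
        hits = []
        for c in np.unique(labels):
            members = features[others & (labels == c)]
            center = members.mean(axis=0)
            spread = members.std(axis=0, ddof=1)
            # assumption: "within 3 s.d." is applied to each feature separately
            if np.all(np.abs(features[i] - center) <= n_sd * spread):
                hits.append(c)
        if len(hits) == 1:
            assigned[i] = hits[0]
    return assigned


narrow = [[0.20, 1.00], [0.21, 1.10], [0.19, 0.90], [0.20, 1.05], [0.22, 0.95]]
wide = [[0.60, 3.00], [0.61, 3.10], [0.59, 2.90], [0.60, 3.05], [0.62, 2.95]]
result = loo_cluster_assignment(narrow + wide, [0] * 5 + [1] * 5)
```

Putative dopamine neurons would then be those assigned to the wide-waveform cluster that also pass the reward-responsiveness test.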

To calculate difference scores and firing rates for scatter plots, firing rates were aligned to drop delivery and baseline-subtracted using the 500 ms immediately before the light-on at the start of the trial. To capture the peak reward-responsive activity, firing rates from 200 ms to 700 ms after the timestamp for the relevant drop delivery or drop omission were calculated. For number errors, the epochs were aligned to the first omitted drop (at the time the second drop would normally be delivered) in block 2, and the first newly delivered drop (second drop) in block 3. For flavor errors, the epochs were aligned to the first new flavor drop in both blocks 4 and 5. Difference scores were calculated for number transitions as the difference between the average firing rate on the first three rewarded trials in the relevant block and the last five rewarded trials in the same block and direction, and for flavor transitions as the difference between the average firing rate in the first three rewarded trials in the relevant block and the last five trials in the previous block in the same direction.
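The epoch and difference-score definitions can be sketched as follows, assuming spike and event times in seconds. Function names are hypothetical; for number transitions the comparison rates would come from the end of the same block, and for flavor transitions from the end of the previous block, as described above.

```python
import numpy as np


def window_rate(spike_times, t0, start, width):
    """Firing rate (spikes/s) in the window [t0 + start, t0 + start + width), times in s."""
    spikes = np.asarray(spike_times, dtype=float)
    lo, hi = t0 + start, t0 + start + width
    return np.count_nonzero((spikes >= lo) & (spikes < hi)) / width


def drop_response(spike_times, drop_time, light_on_time):
    """Baseline-subtracted rate 200-700 ms after a drop (or omitted drop).

    Baseline is the 500 ms immediately before light-on at the start of the trial.
    """
    baseline = window_rate(spike_times, light_on_time, -0.5, 0.5)
    return window_rate(spike_times, drop_time, 0.2, 0.5) - baseline


def difference_score(block_rates, comparison_rates):
    """Mean of the first three rewarded trials minus mean of the last five comparison trials."""
    return float(np.mean(block_rates[:3]) - np.mean(comparison_rates[-5:]))
```

For example, with spikes at -0.4, -0.1, 1.3, 1.45, 1.6 and 1.75 s, light-on at 0 s and a drop at 1.0 s, the baseline is 4 spikes/s, the 200–700 ms epoch holds three spikes (6 spikes/s), and the response is 2 spikes/s.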

For the decoding analyses, we used Matlab code from the Neural Decoding Toolbox (www.readout.info) (Meyers, 2013) to construct pseudoensembles consisting of all 30 putative dopamine neurons, as described below. Decoding using pseudoensembles has been found to reveal the information held by the activity of populations of neurons in well-learned tasks such as the one we used here as effectively as analyses of simultaneously recorded ensembles (Rigotti et al., 2013; Schoenbaum and Eichenbaum, 1995). The spike trains of the 30 neurons were aligned to various trial events (light-on, odor delivery, odor port withdrawal, reward delivery, and light-off), concatenated according to the average time between these events, and then binned into sliding 900 ms bins across the resulting spike trains. All correct trials from blocks 4 and 5 were labeled according to the flavor delivered on that trial, with trials from block 5 labeled according to the flavor of the second (changed) drop. The first ten trials of each flavor in each block were then taken from blocks 4 and 5, resulting in 40 total trials for each neuron. This selection fully crossed flavor with side (10 trials of each flavor rewarded at the left well and 10 at the right well). The trials were then randomly divided into 18–20 splits, in each of which there was one test trial of each flavor for each neuron and the remaining 17–19 training trials of each flavor for each neuron. For each split, the flavor of each test trial was classified according to which training set had the highest correlation coefficient with it across the 30 neurons. This random split-and-test procedure was repeated 500 times for every epoch to yield the average 1–0 accuracy of the classification at that epoch. This entire procedure was then repeated for sliding sets of 10 trials across the blocks (i.e. trials 1–10 of each flavor in each block, trials 2–11 of each flavor in each block, etc., ending with the last 10 trials of each flavor in each block). The 1–0 accuracy was then plotted separately for test trials taken from block 4 and block 5. The one-tailed 95% confidence interval for chance for the first sliding set of trials was calculated by shuffling the flavor labels 100 times and performing the entire analysis on each resulting dataset.
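The core of the decoding procedure is a maximum-correlation classifier applied to a pseudoensemble. The sketch below is a simplified NumPy re-implementation under stated assumptions, not the Neural Decoding Toolbox itself: it handles a single time bin, assumes every neuron has the same number of trials per flavor, uses one leave-one-trial-out split per trial, and takes each class template as the mean of the training trials. `pseudoensemble_accuracy` is a hypothetical name.

```python
import numpy as np


def pseudoensemble_accuracy(rates, n_repeats=500, seed=None):
    """Mean 0-1 accuracy of a maximum-correlation classifier on a pseudoensemble.

    rates: (n_neurons, n_flavors, n_trials) firing rates for one time bin.
    Neurons were not recorded simultaneously, so trial order is shuffled
    independently for every neuron and flavor on each repeat before the
    trials are split into training and test sets.
    """
    rng = np.random.default_rng(seed)
    rates = np.asarray(rates, dtype=float)
    n_neurons, n_flavors, n_trials = rates.shape
    correct, total = 0, 0
    for _ in range(n_repeats):
        order = np.argsort(rng.random(rates.shape), axis=2)
        shuffled = np.take_along_axis(rates, order, axis=2)
        for split in range(n_trials):
            test = shuffled[:, :, split]                                 # one test trial per flavor
            templates = np.delete(shuffled, split, axis=2).mean(axis=2)  # training-set means
            for true_flavor in range(n_flavors):
                # correlation across neurons between the test vector and each template
                r = [np.corrcoef(test[:, true_flavor], templates[:, g])[0, 1]
                     for g in range(n_flavors)]
                correct += int(np.argmax(r) == true_flavor)
                total += 1
    return correct / total
```

In the analysis described above, this would be applied separately to each 900 ms bin and each sliding set of trials, with chance estimated by repeating it after shuffling the flavor labels.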

The decoding analysis shown in Figure 3c was similar to that described above, except that only the 900 ms epoch beginning 100 ms after the first new flavor drop was used, test data from blocks 4 and 5 were included together, and the first ten and last ten trials were labeled separately and both included in the same analysis. The resulting classification accuracy was compared with a control classification of flavor in which the identical procedure was followed, except that data from the first drop of block three and the first drop of block five were used. These drops were selected because flavor was unchanged at those drops compared to the previous blocks, because they were part of 3-drop sequences just as in the experimental dataset, and because flavor was crossed with direction just as in the flavor transition analysis. The patterns in the flavor transition vs. flavor unchanged confusion matrices were compared by permutation test in which the flavor labels were shuffled 100 times for each analysis and 100,000 comparisons between the resulting confusion matrices were used to construct a distribution of comparisons. We then calculated the probability that the actual pattern of the two confusion matrices would be observed by chance. That is, we calculated the chance that the differences between flavor transition vs. flavor unchanged in grape early and tropical punch early would be as great as they were in the real data, while the differences in grape late and tropical punch late would be as small as they were in the real data.
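The joint comparison of the two confusion matrices can be sketched as a resampling over random pairings of shuffled results. This is an illustration under assumptions: only the diagonal (correct-classification) rates are compared, ordered as grape early, tropical punch early, grape late, tropical punch late, and "as great as" / "as small as" is applied to each cell separately. The function name is hypothetical.

```python
import numpy as np


def joint_pattern_pvalue(obs_transition, obs_unchanged, null_transition, null_unchanged,
                         n_comparisons=100_000, seed=None):
    """Probability of a pattern at least as extreme as the observed one under the null.

    obs_*:  length-4 arrays of correct-classification rates ordered as
            [grape early, punch early, grape late, punch late].
    null_*: (n_shuffles, 4) arrays of the same rates from label-shuffled analyses.
    Assumed criterion: both early differences are >= the observed ones and both
    late |differences| are <= the observed ones.
    """
    rng = np.random.default_rng(seed)
    obs_diff = np.asarray(obs_transition, float) - np.asarray(obs_unchanged, float)
    null_t = np.asarray(null_transition, float)
    null_u = np.asarray(null_unchanged, float)
    i = rng.integers(len(null_t), size=n_comparisons)
    j = rng.integers(len(null_u), size=n_comparisons)
    null_diff = null_t[i] - null_u[j]                      # random pairings of shuffles
    early_large = np.all(null_diff[:, :2] >= obs_diff[:2], axis=1)
    late_small = np.all(np.abs(null_diff[:, 2:]) <= np.abs(obs_diff[2:]), axis=1)
    return float(np.mean(early_large & late_small))
```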

The decoding analysis shown in Figure 3d was similar to that described above, except that the decay of decoding accuracy across the block was tested by using a sliding set of trials for both the flavor transition and flavor unchanged analyses. Each curve was then compared to chance by permutation tests with 100 shuffles of the flavor labels each. The accuracy in the unshuffled data was considered significantly greater than chance when it was in the top 5% of the shuffle distribution for five consecutive sliding sets of trials. Average baseline firing rate on the trial-sets included in each of the decoding algorithms was also calculated and shown on Figure 3d.
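The significance criterion for the decay curves (unshuffled accuracy in the top 5% of the shuffle distribution for five consecutive sliding sets) can be sketched as below. The function name and array layout are assumptions.

```python
import numpy as np


def significant_runs(observed, null, alpha=0.05, run_length=5):
    """Mark sliding sets that belong to a run of >= run_length significant sets.

    observed: (n_sets,) decoding accuracy for each sliding set of trials.
    null:     (n_shuffles, n_sets) accuracies from label-shuffled data.
    A set is significant when it exceeds the (1 - alpha) quantile of its null.
    """
    observed = np.asarray(observed, dtype=float)
    above = observed > np.quantile(null, 1 - alpha, axis=0)
    mask = np.zeros_like(above)
    count = 0
    for k, is_above in enumerate(above):
        count = count + 1 if is_above else 0
        if count >= run_length:
            mask[k - run_length + 1 : k + 1] = True   # mark the whole qualifying run
    return mask
```

Isolated significant sets, or runs shorter than five, are left unmarked.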

We tested the latency of flavor decoding in the first ten trials of blocks 4 and 5 combined by advancing a 200 ms sliding epoch from the time of new flavor drop delivery until significance (by permutation test, p<0.05) was reached and maintained for at least five consecutive bins. We identified the latency as the end of the first significant epoch.
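Latency could then be read off a sequence of per-epoch permutation p-values. The step between successive 200 ms epochs is not stated above, so `step_ms` is left as a parameter; the function name is hypothetical.

```python
import numpy as np


def decoding_latency(p_values, step_ms, width_ms=200, alpha=0.05, run_length=5):
    """Latency (ms after the new-flavor drop) of sustained flavor decoding.

    p_values[k] is the permutation p-value for the epoch starting k * step_ms
    after the drop. Returns the end of the first epoch in the first run of at
    least run_length consecutive significant epochs, or None if there is none.
    """
    sig = np.asarray(p_values) < alpha
    for k in range(len(sig) - run_length + 1):
        if sig[k : k + run_length].all():
            return k * step_ms + width_ms
    return None
```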

Experiment 2

Subjects

Twenty-three human participants (nine male, ages 19–34, mean ± SD = 25.5±4.1 years) with no history of psychiatric illness gave informed written consent to participate in this study. The study protocol was approved by the Northwestern University Institutional Review Board.

Odor stimuli and presentation

Eight food odors, including four sweet (strawberry, caramel, cupcake, gingerbread) and four savory (potato chips, pot roast, sautéed onions, garlic), were provided by International Flavors and Fragrances (New York, NY). For all experimental tasks, odors were delivered directly to participants’ noses using a custom-built computer-controlled olfactometer.

Odor selection and task familiarization

In an initial behavioral testing session, hungry participants (fasted for at least 6 hr) first provided pleasantness ratings of the eight food odors. Based on these ratings, one sweet odor and one savory odor were chosen such that they were matched as closely as possible in pleasantness. Next, we acquired pleasantness ratings for the two selected odors across a range of odor concentrations, diluted to varying degrees with odorless air. Based on these ratings, we selected two concentrations for each odor, such that the two low-concentration odors had the same pleasantness and the two high-concentration odors had the same pleasantness.

Participants next completed 84 trials of the instrumental reversal learning task they would eventually complete in the fMRI scanner. For this task, two abstract visual symbols were randomly chosen to serve as conditioned stimuli (CS) throughout the rest of the experiment. Each trial started with either one of the two CSs (indicating a forced-choice trial) or a question mark (indicating a free-choice trial) presented for 4 s. Both CSs were then presented on either side of a central crosshair (side fully randomized and counterbalanced) for 1.5 s, during which participants were instructed to choose, via left or right mouse click, the CS that had appeared alone on the preceding screen (on forced-choice trials) or whichever CS they preferred (on free-choice trials). If no response was made within 1.5 s, ‘TOO SLOW’ appeared on the screen and the next trial was initiated after a variable delay. If a response was made, the odor currently paired with the selected CS was delivered after a 2 s delay. Odor delivery, lasting 3 s, was indicated by a change in the color of the central crosshair to blue, prompting participants to sniff. Participants then rated either the pleasantness or the identity of the received odor (rating type randomized), followed by a 0–2 s inter-trial interval.

Across the 84 trials, the choice task was covertly subdivided into 8 blocks of trials delineated by the specific CS-US associations predetermined for that block. Each block consisted of either 9 or 12 trials, and the length of blocks across the session was pseudorandomized. Within a given block, one CS was paired deterministically with the high concentration of one odor identity (e.g., sweet high: SWH), while the other CS was paired deterministically with the low concentration of the same odor identity (e.g., sweet low: SWL). After each block, the CS-US associations were changed without warning, and new blocks always began with two forced-choice trials (one for each CS). In flavor reversals, the flavor of the US was changed for both CSs while the CS-value associations were left unchanged. In value reversals, the CS-value associations were swapped between the two CSs while flavor was left unchanged. Reversals alternated between flavor and value, and there were seven reversals in total across the 84-trial task.
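Because each of the 8 blocks has 9 or 12 trials and the blocks total 84 trials, the block counts are fixed: 9a + 12b = 84 with a + b = 8 gives a = b = 4, so each run contains four 9-trial and four 12-trial blocks. The sketch below generates one such run with alternating reversals; the starting CS-US mapping, the use of `random.shuffle` for pseudorandomization, and all names are assumptions for illustration.

```python
import random

FLIP = {"sweet": "savory", "savory": "sweet"}


def reverse(mapping, kind):
    """Apply one covert reversal to a {CS: (flavor, concentration)} mapping."""
    if kind == "flavor":
        # flavor reversal: both CSs switch flavor, CS-value pairing is kept
        return {cs: (FLIP[flavor], conc) for cs, (flavor, conc) in mapping.items()}
    # value reversal: the CSs exchange concentrations, flavor is kept
    (f1, c1), (f2, c2) = mapping["CS1"], mapping["CS2"]
    return {"CS1": (f1, c2), "CS2": (f2, c1)}


def make_run(first_reversal="flavor", seed=None):
    """Eight blocks of 9 or 12 trials totalling 84, with alternating reversals."""
    rng = random.Random(seed)
    lengths = [9] * 4 + [12] * 4          # the only 9/12 mix of 8 blocks summing to 84
    rng.shuffle(lengths)
    kinds = ["flavor", "value"] if first_reversal == "flavor" else ["value", "flavor"]
    mapping = {"CS1": ("sweet", "high"), "CS2": ("sweet", "low")}  # assumed start
    blocks = []
    for b, n_trials in enumerate(lengths):
        kind = None if b == 0 else kinds[(b - 1) % 2]
        if kind is not None:
            mapping = reverse(mapping, kind)
        blocks.append({"trials": n_trials, "reversal": kind, "mapping": mapping})
    return blocks
```

Each generated block would then open with the two forced-choice trials (one per CS) described above.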

Choice task during fMRI scanning

The fMRI scanning session was conducted within ~10 days (mean ± SD = 10.0±4.4 days) of the initial behavioral session. During scanning, hungry participants (fasted for at least 6 hr) completed 3 runs of the 84-trial reversal learning task described above. Each run lasted ~21 min, and the sequence of alternating flavor and value reversals was counterbalanced across subjects.

fMRI data acquisition

MRI data were acquired on a Siemens 3T PRISMA system equipped with a 64-channel head-neck coil. Echo-Planar Imaging (EPI) volumes were acquired with a parallel imaging sequence with the following parameters: repetition time, 2 s; echo time, 22 ms; flip angle, 90°; multi-band acceleration factor, 2; slice thickness, 2 mm; no gap; number of slices, 58; interleaved slice acquisition order; matrix size, 104 × 96 voxels; field of view 208 mm x 192 mm. The functional scanning window was tilted ~30° from axial to minimize susceptibility artifacts in OFC (Weiskopf et al., 2006). Each fMRI run consisted of 640 EPI volumes covering all but the dorsal portion of the parietal lobes. To aid in co-registration and normalization of the functional scans, we also acquired 10 EPI volumes for each participant covering the entire brain, with the same parameters as described above except 95 slices and a repetition time of 5.25 s. A 1 mm isotropic T1-weighted structural scan was also acquired for each participant. This image was used for spatial normalization.
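A few derived quantities follow directly from these parameters and can be checked against the rest of the section: the in-plane voxel size matches the 2 mm slice thickness, and 640 volumes at a 2 s TR give the ~21 min run length stated for the scanning session.

```python
# Derived quantities implied by the EPI parameters listed above.
fov_mm = (208, 192)
matrix = (104, 96)
in_plane_mm = tuple(f / m for f, m in zip(fov_mm, matrix))  # voxel size within a slice
slab_mm = 58 * 2                 # 58 contiguous 2 mm slices (no gap)
whole_brain_mm = 95 * 2          # 95 slices in the whole-brain co-registration volumes
run_s = 640 * 2.0                # 640 volumes at TR = 2 s
run_min = run_s / 60
print(in_plane_mm, slab_mm, whole_brain_mm, round(run_min, 1))
```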

fMRI data preprocessing

All image preprocessing and general linear modeling were done using SPM12 software (www.fil.ion.ucl.ac.uk/spm/). To correct for head motion during scanning, all functional EPI images across the 3 fMRI runs were aligned to the first acquired image for each subject. The motion-corrected images were smoothed with a Gaussian kernel at native scan resolution (2 × 2 × 2 mm) to reduce noise but retain potential information content (Gardumi et al., 2016). For reverse normalization of midbrain regions of interest to participant-specific native space, each participant’s T1 scan was normalized to Montreal Neurological Institute (MNI) space using the 6-tissue probability map provided by SPM12. The inverse deformation field resulting from this normalization step was then applied, for each participant, to a region of interest in MNI space defined by spheres of 4-voxel radius centered on the two midbrain coordinates reported to show a significant univariate response to flavor prediction errors (left: x = -16, y = -14, z = -12; right: x = 6, y = -14, z = -14) (Howard and Kahnt, 2018).
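A sketch of the MNI-space sphere construction, assuming a 2 mm isotropic template grid (so that a 4-voxel radius corresponds to 8 mm) and a NIfTI-style affine mapping voxel indices to millimetres. The subsequent inverse warping to native space was done in SPM12 and is not reproduced here; the function name is hypothetical.

```python
import numpy as np


def sphere_mask(shape, affine, center_mm, radius_vox=4, voxel_mm=2.0):
    """Boolean mask of voxels whose world (mm) coordinates lie within
    radius_vox * voxel_mm of center_mm.

    affine: 4x4 matrix mapping voxel indices (i, j, k, 1) to world coordinates.
    """
    ijk = np.indices(shape).reshape(3, -1)                  # all voxel indices
    xyz = (affine[:3, :3] @ ijk + affine[:3, 3:4]).T        # (n_voxels, 3) in mm
    dist = np.linalg.norm(xyz - np.asarray(center_mm, dtype=float), axis=1)
    return (dist <= radius_vox * voxel_mm).reshape(shape)
```

For a template image loaded with nibabel, `shape` and `affine` would be `img.shape` and `img.affine`, and the two spheres would be combined with a logical OR, e.g. `sphere_mask(shape, affine, (-16, -14, -12)) | sphere_mask(shape, affine, (6, -14, -14))`.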

General linear modeling and MVPA analyses

For the decoding analysis, we constructed independent subject-level event-related general linear models (GLMs) for each fMRI run using finite impulse response (FIR) functions specified over 12 time bins time-locked to the onset of each trial. Nuisance regressors included normalized respiratory activity traces (measured by MR-safe breathing belts affixed around the torso); the six realignment parameters calculated for each scanned image during motion correction; the derivative, square, and square of the derivative of each realignment regressor; the absolute signal difference between even and odd slices, and the variance across slices, in each functional volume; and additional regressors as needed to censor individual volumes in which particularly strong head motion occurred. Odor onsets corresponding to 13 conditions were specified in each GLM: SV→SW reversals, SW→SV reversals, SW and SV 1, 2, 3, and 4 trials after reversals, SW and SV on the trial immediately preceding reversals, and all other trials. The resulting parameter estimates within a region of interest (ROI), defined by the intersection of an un-normalized anatomical mask of the midbrain and the un-normalized spherical mask described above, were extracted for each subject, fMRI run, and condition at the time bin corresponding most closely to odor delivery given the hemodynamic lag. Prior to decoding, voxels within each subject’s midbrain ROI were sorted according to the difference between responses to flavor transitions on the error trial (combined across SV→SW and SW→SV) and responses on the trial preceding error trials (combined across SW and SV).
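The FIR part of each GLM can be sketched as a design matrix with one indicator column per condition and time bin. This assumes each bin spans one TR (2 s) and that each onset is assigned to the volume in which it falls; SPM12's own FIR basis set handles timing at a finer resolution, and the names here are hypothetical. In the full model these columns would be combined with the nuisance regressors before least-squares estimation.

```python
import numpy as np


def fir_design(onsets_by_condition, n_volumes, tr=2.0, n_bins=12):
    """FIR design matrix with one indicator column per (condition, bin).

    onsets_by_condition: {condition name: list of onset times in seconds}.
    Column (c, b) is 1 at the volume b TRs after each onset of condition c
    (bin width = TR, an assumption).
    """
    conditions = list(onsets_by_condition)
    X = np.zeros((n_volumes, len(conditions) * n_bins))
    for c, name in enumerate(conditions):
        for onset in onsets_by_condition[name]:
            first = int(np.floor(onset / tr))        # volume containing the onset
            for b in range(n_bins):
                if first + b < n_volumes:
                    X[first + b, c * n_bins + b] = 1.0
    columns = [(name, b) for name in conditions for b in range(n_bins)]
    return X, columns
```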

The resulting sorted parameter estimates were then submitted to pairwise linear support vector machine decoding analyses using the libsvm implementation (Chang and Lin, 2011). Each pairwise analysis corresponded to the SW and SV conditions at a given trial point (i.e., error trial, error trial +1, error trial +2, etc.), and was conducted using a nested cross-validation approach in which we first performed leave-one-subject-out cross-validation in increasing numbers of voxels within the ROI to determine the number of voxels that most effectively decodes reward flavor in a ‘training set’ of subjects. Leave-one-run-out cross-validated decoding of flavor in the left out subject was then conducted in the number of voxels giving maximal decoding accuracy from the training set of subjects. This process was repeated for each subject, resulting in an independent decoding accuracy value calculated for each subject and decoding pair.
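The nested cross-validation can be sketched with scikit-learn's `SVC`, which wraps libsvm, as a stand-in for the original implementation. Assumptions: each subject contributes an array of shape (runs, 2 conditions, voxels) with the same number of sorted voxels, the candidate voxel counts are supplied by the caller, and all names are hypothetical. Sorting presumably gives each feature a comparable meaning across subjects (the k-th most error-responsive voxel), which is what makes the leave-one-subject-out step possible in native space.

```python
import numpy as np
from sklearn.svm import SVC


def _stack(subject_arrays, k):
    """Flatten (n_runs, 2, n_voxels) arrays into samples using the first k sorted voxels."""
    X = np.concatenate([a[:, :, :k].reshape(-1, k) for a in subject_arrays])
    y = np.concatenate([np.tile([0, 1], a.shape[0]) for a in subject_arrays])
    return X, y


def _accuracy(X_train, y_train, X_test, y_test):
    clf = SVC(kernel="linear", C=1.0).fit(X_train, y_train)
    return float(np.mean(clf.predict(X_test) == y_test))


def nested_decoding(data, voxel_counts):
    """data: list over subjects of (n_runs, 2, n_voxels) parameter estimates for the
    two conditions being decoded (e.g. sweet vs savory at one trial point), with
    voxels pre-sorted by the flavor-error difference score.
    Returns one leave-one-run-out accuracy per subject."""
    accuracies = []
    for held_out in range(len(data)):
        train_subjects = [s for s in range(len(data)) if s != held_out]
        # inner loop: leave-one-subject-out over the training subjects picks k
        inner = []
        for k in voxel_counts:
            scores = []
            for val in train_subjects:
                fit = [data[s] for s in train_subjects if s != val]
                scores.append(_accuracy(*_stack(fit, k), *_stack([data[val]], k)))
            inner.append(np.mean(scores))
        best_k = voxel_counts[int(np.argmax(inner))]
        # outer loop: leave-one-run-out within the held-out subject at best_k
        subject = data[held_out]
        runs = range(subject.shape[0])
        scores = [
            _accuracy(*_stack([subject[[q for q in runs if q != r]]], best_k),
                      *_stack([subject[[r]]], best_k))
            for r in runs
        ]
        accuracies.append(np.mean(scores))
    return np.array(accuracies)
```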

An identical analysis was conducted for value transitions (i.e., flavor unchanged), in which GLMs were specified using the same condition types time-locked to this type of reversal: SW and SV at the value error trial, SW and SV at 1, 2, 3, and 4 trials after value reversal and immediately before value reversal, and all other trials. We then implemented the same nested cross-validation method to generate decoding accuracies for pairwise tests at each trial point.

The patterns in the flavor transition vs. flavor unchanged confusion matrices were compared by permutation test. For each analysis and subject we shuffled the condition labels (first sweet, first savory, last sweet, last savory, within each run) 100 times. For each of these shuffled permutations, we conducted the decoding analysis to generate a confusion matrix. We then randomly sampled one of these confusion matrices for each subject 100 times and averaged the sampled matrices across subjects to generate 100 population averages. We then randomly sampled from these 100 averages to generate 100,000 comparisons between the two matrices to generate a distribution of comparisons. From this distribution, we calculated the probability that differences between flavor transition vs. flavor unchanged in sweet error trial and savory error trial were as great as they were in the real data, and differences in sweet last trial and savory last trial were as small as they were in the real data.
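The subject-to-group step of this permutation test can be sketched as below, assuming each subject's shuffled decoding runs are stored as an array of confusion matrices. The function name is hypothetical. The 100 group averages produced for each analysis can then be paired at random to build the distribution of comparisons.

```python
import numpy as np


def population_null(null_by_subject, n_averages=100, seed=None):
    """Build group-level null confusion matrices from per-subject shuffles.

    null_by_subject: list over subjects of (n_shuffles, n_classes, n_classes)
    confusion matrices from label-shuffled decoding. Each group average draws
    one shuffled matrix per subject at random and averages across subjects.
    """
    rng = np.random.default_rng(seed)
    stacked = np.stack(null_by_subject)                  # (n_subjects, n_shuffles, C, C)
    n_subjects, n_shuffles = stacked.shape[:2]
    picks = rng.integers(n_shuffles, size=(n_averages, n_subjects))
    sampled = stacked[np.arange(n_subjects), picks]      # (n_averages, n_subjects, C, C)
    return sampled.mean(axis=1)
```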

References

  1. Chang C-C, Lin C-J (2011) LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, pp. 1–27. https://doi.org/10.1145/1961189.1961199
  2. Kurth-Nelson Z, Botvinick M, Dabney W, Uchida N, Hassabis D, Starkweather CK, Munos R (2019) A distributional code for value in dopamine-based reinforcement learning. RLDM.

Decision letter

  1. Kate M Wassum
    Senior and Reviewing Editor; University of California, Los Angeles, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Acceptance summary:

In this study, the authors demonstrate that dopamine neuron ensembles encode the specific identity of surprising outcomes. This work is an extension, and more in depth analysis, of two previously published studies by these research groups who previously demonstrated that (putative) dopamine neurons or midbrain BOLD signals do not only signal surprising changes in value (as generally assumed), but also signal sensory prediction errors (i.e. value-neutral violations in expected outcome identity). Here, the authors take this a step further, using a decoding analysis to show that the firing pattern in putative dopamine ensembles or midbrain fMRI signals also encode the specific identity of the surprising outcome. This has potentially far-reaching implications for our understanding of the role of dopamine in learning.

Decision letter after peer review:

Thank you for submitting your article "Dopamine neuron ensembles signal the content of sensory prediction errors" for consideration by eLife. Your article has been reviewed by three peer reviewers and the evaluation has been overseen by Kate Wassum as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Essential revisions:

You will see that the reviewers were very enthusiastic about the paper, but raised a number of overlapping concerns. We expect you will make a good-faith effort to address each of the concerns noted in the appended reviews. We especially direct your attention to the following:

1) Discuss the problem of dimensionality (i.e., the vastness of potential sensory prediction errors relative to the number of dopamine neurons; reviewer 2 point 2) and of the limitations of using 2 outcomes in this task.

2) Discuss the limitations of the decoding analysis approach (reviewer 3 point 1), including whether or how this information might be read out in the brain (reviewer 2 point 1).

3) Apply the decoding analysis to the value prediction error phase of the task to ask whether DA/midbrain ensembles encode outcome identity when the presence of the outcome altogether, not only its identity, is surprising (reviewer 1 point 1).

4) Discuss how these data are compatible (or not) with evidence that optogenetic activation, which activates DA neurons, but disrupts the precise firing pattern of DA ensembles, can promote learning about the sensory features of an event (reviewer 1 point 3).

Reviewer #1:

In this study, Stalnaker et al. make the original and thought-provoking argument that dopamine neuron ensembles encode the specific identity of surprising outcomes. This work is an extension, and more in-depth analysis, of two previously published studies by these two research groups (Schoenbaum and Kahnt) who independently demonstrated that (putative) dopamine neurons do not only signal surprising changes in value (as generally assumed), but also signal sensory prediction errors (i.e. value-neutral violations in expected outcome identity). Here, the authors take it a step further. Using a decoding analysis, they present converging evidence from single-cell recordings in rats (experiment 1) and fMRI in humans (experiment 2) showing that the firing pattern in dopamine ensembles also encodes the specific identity of the surprising outcome. This has potentially far-reaching implications for our understanding of the role of dopamine in learning.

The results are compelling, data are presented in a manner that is easy to understand, interpretations are logical, and the manuscript is very well written. The inclusion of rodent and human work in the same paper is commendable. Although some results from these datasets have already been published, the type of analyses and the conclusions derived from these analyses are clearly novel, well beyond what was previously published, and justify a separate publication.

- Encoding of sensory information in "classic" reward prediction errors?

It seems that the decoding analysis was applied strictly to the portion of the task that features identity prediction errors (value-neutral switch from O1 to O2). But what about the portion of the task that features "classic" reward prediction errors? Is there also evidence that DA ensembles encode outcome identity when the presence of the outcome altogether, not only its identity, is surprising (arguably the most common scenario when organisms learn about outcomes)?

- Functional role for sensory information in DA prediction errors?

The authors propose that information about the identity of a surprising outcome, contained in the firing pattern of DA ensembles, instructs downstream areas what to learn. The authors should highlight how this hypothesis is a significant shift from the way prediction errors (even sensory PE) are generally conceived (i.e. as permissive signals for learning, with the content of learning being determined by activity in downstream regions, e.g. Glimcher, 2011). The authors should also consider discussing how this new hypothesis fits with the highly divergent and overlapping nature of DA projections (Arbuthnott and Wickens, 2007; Matsuda et al., 2009; Bolam and Pissadaki, 2012).

- The authors cite studies showing that DA firing promotes learning about the sensory features of an event. Some of these studies (Sharpe et al., 2017; Keiflin et al., 2019) used optogenetic activation, which activates DA neurons as a whole but most likely disrupts the precise firing pattern of DA ensembles. How is that compatible with the idea that the precise pattern of firing in DA ensembles instructs the content of learning?

Reviewer #2:

This paper studies the population activity of dopamine neurons in rodents and midbrain BOLD in humans and shows that it is possible to decode more sophisticated aspects of predictive failures than would be implied just by changes in scalar value. In particular, a change from an outcome of one flavor to another (rodents) or between sweet and savory (humans) is decodable from this population activity.

In short, this is a very well conducted pair of studies, and the results and the analyses are convincing. The paper relates well to previous studies looking at less scalar aspects of the dopamine signal, adding an interesting new perspective.

My main concern is that the underlying implications are not so clear:

- Many dopamine cells have exuberant axonal arbors (although for sure, more in the SNc than the VTA) – thus the experimenter's ability to decode this signal from individual cells may vastly outweigh the ability of downstream neurons or tissue to do this decoding. It would be good to be convinced that this signal had a way of actually mattering. The fMRI results don't constitute such a proof.

- Related to this – what my computational colleagues refer to as the dimensionality of the problem seems unfavorable. The number of possible juices, let alone things of equivalent value, that could be substituted, is vast. They have elaborated representations in various regions of the cortex which is sized to match. However, there aren't many dopamine neurons. Thus, the fact that there is a reliably decodable signal – coming, for instance, from the upstream sampling properties of individual dopamine neurons (e.g., if they get somewhat random collections of inputs) – when there are only two choices, and the chance to build a sophisticated decoder bears little on the real problem of making state- rather than value-based prediction errors be useful.

Reviewer #3:

This study examines the role of dopamine neurons in sensory prediction errors, which are signals that indicate an event that violates expectancies, independent of the value of the event. The authors show that the response of single dopamine neurons detects sensory violations, but they don't necessarily discriminate between different kinds of violations. However, at the population level it is possible to decode the nature of the violation. In addition, they show that a similar pattern of results can be obtained in humans using fMRI. The study is a solid contribution to the literature and its interpretation is relatively straightforward. The task is well-designed, the analyses are appropriate, and the results are clear.

1) The Discussion should do more to couch the interpretation of the results. A danger with decoding analyses is that just because something can be decoded, it doesn't necessarily mean that the brain is using that information. For example, it would be relatively straightforward to decode line orientation from retinal ganglion cells, even though we know that these cells are not encoding this information.

2) Figure 3A: The mountain plot obscures the data. I'd suggest just plotting the top-down view and allowing the color to represent decoding accuracy.

3) Figure 3C: I don't know why these data have been spatially smoothed. Confusion matrices are not continuous.

4) Figure 4D: The description in the figure legend is confusing. The first sentence states that decoding accuracy was above chance on the error trial, but then the next sentence states that it was not above chance on this trial.

5) Figure 4E: Unlike Figure 3C, there are no stats to support the claims as to which data are or are not significant.

https://doi.org/10.7554/eLife.49315.009

Author response

Essential revisions:

You will see that the reviewers were very enthusiastic about the paper, but raised a number of overlapping concerns. We expect you will make a good-faith effort to address each of the concerns noted in the appended reviews. We especially direct your attention to the following:

1) Discuss the problem of dimensionality (i.e., the vastness of potential sensory prediction errors relative to the number of dopamine neurons; reviewer 2 point 2) and of the limitations of using 2 outcomes in this task.

We thank the reviewers for all their work in reading our original paper, their kind comments of support, and their excellent and insightful questions. We have tried to address them as best we can with the existing data and within the constraints of the manuscript, but we hope they will appreciate that some of the questions raised are almost existential, which makes it hard to arrive at a fully satisfying text response. We have done our best.

With regard to the problem of dimensionality (reviewer 2, point 2), we definitely appreciate this issue. However, our brains undeniably deal with this problem, so the solution must be in our heads. We do not argue that our small experiment provides the full solution. It is, however, a step in that direction to show that this system, already accepted as a teaching signal in one critical dimension (value), appears to contain the information to serve as one in another, completely orthogonal dimension (flavor). The odds are low that we stumbled upon the only other dimension in the world that activates this system. If we accept that many other dimensions may be represented similarly, then the current results suggest that the dopamine system may broadly provide information about unexpected events in the world. Critically, we think the system has the capacity to do this. There are approximately 40,000 dopamine neurons in the VTA of rats, and another 25,000 in the SN (Nair-Roberts, 2008). In humans, the total number is about 300,000 (Hirsch et al., 1988). If we assume that each neuron provides only a single bit of information, the rat VTA alone could in principle express 2^40,000 distinct patterns. This calculation is obviously an oversimplification, because it ignores the redundancy inherent in any code. Yet even if we reduce the cell number to 1,000 truly independent bits, we still end up with 2^1,000 ≈ 1.0715 × 10^301 potential patterns. This would cover an unimaginably large number of possible sensory features and events. The information represented in spiking may also be augmented (or attenuated) by other factors. These include the co-release of other neurotransmitters, the location of interactions with postsynaptic neurons (region, cell type, dendritic compartment), and the type of those interactions (receptors, second-messenger cascades). Even if all this combines to yield only 20 or 30 uniquely coded binary dimensions, we still end up with roughly a million (2^20) to a billion (2^30) possible output patterns.
Moreover, these numbers assume a binary code, and it seems likely to us that a single neuron does more than signal yes/no. The system would also have assistance from other systems (we are not arguing that this is the only learning signal). It would further benefit from contextual modulation of the signal: some dimensions of the environment might be prioritized or not, depending on the situation, by modulating inputs. With that assistance and modulation, this capacity seems large enough to deal with much of the problem of dimensionality.
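The capacity figures above are powers of two, so their orders of magnitude are easy to check. The following is a minimal Python sketch. It assumes each unit contributes an independent discrete state: `levels=2` is the binary case used in the argument, and the helper `log10_patterns` is hypothetical.

```python
import math


def log10_patterns(n_units: int, levels: int = 2) -> float:
    """Return log10 of the number of distinct joint patterns across n_units
    units when each unit takes one of `levels` states (levels=2: binary)."""
    return n_units * math.log10(levels)


# Rat VTA, one bit per neuron: 2**40_000 has over 12,000 digits,
# so report its order of magnitude instead of printing it in full.
print(f"2^40000 ~ 10^{log10_patterns(40_000):.1f}")

# 1,000 effectively independent bits (a float is still large enough here).
print(f"2^1000 ~ {2.0 ** 1000:.4e}")  # 1.0715e+301

# 20 or 30 binary dimensions.
print(f"2^20 = {2 ** 20:,}; 2^30 = {2 ** 30:,}")  # 1,048,576 and 1,073,741,824

# Non-binary case: four distinguishable states per unit, 4^1000 = 2^2000.
print(f"4^1000 ~ 10^{log10_patterns(1_000, levels=4):.1f}")
```

Working in log10 rather than with the integers themselves avoids materializing numbers with tens of thousands of digits. It also makes the effect of a richer-than-binary code a simple change of the `levels` argument.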

The potential importance of contextual modulation in reducing dimensionality is worth emphasizing. The hippocampus is perhaps the most famous example of a brain area whose function we understand most clearly by considering ensemble-based coding. This area is thought not only to represent location in space, but also to integrate this information with other dimensions (head direction, direction of movement, reward, internal motivation, and so on). Further, it is now generally accepted that the same structure represents non-spatial information in a similar way. Yet it is not thought to hold all of this information online simultaneously for every place in the world, let alone for every continuous variable. Rather, it is thought to do so in a way that is constrained by inputs regarding the current context and goals of the subject. Similarly, the dopamine system is presumably governed by inputs from upstream areas that direct its processing to certain variables or dimensions of information. Of course, the problem here is different, as the system is tasked with registering unexpected events. But one could imagine a major reduction in dimensionality if upstream areas ignored many aspects of the environment deemed stable and irrelevant unless they changed beyond some threshold. This would apply particularly in highly structured settings such as experimental tasks, where so much of the world can be categorized as irrelevant. For instance, consider all the things in the training chamber that the rat has learned to ignore. These presumably would not evoke any change in dopamine firing unless they deviated suddenly and substantially (temperature, ambient lighting, sounds made by the experimenter, stray smells, and so on).
Indeed, we have some evidence of this. In our original report (Takahashi et al., 2017), we found that dopamine neurons exhibited sensory prediction errors only when the rats’ behavior changed after the reward identity switched. To be clear, this behavioral change was not directional and thus not value-based, but it indicated objectively that the rat had noticed the change in flavor. When we could detect no change in behavior at the transition, the same neurons did not show error signals. This is potential evidence that the dimensions to which dopamine neurons respond are switched off or modulated according to the rats’ subjective goals or perceptions of the environment. Such a mechanism could drastically simplify the dimensionality problem.
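The threshold idea in the preceding paragraphs can be illustrated with a toy filter. This is a hypothetical Python sketch, not a model fitted in either study. The `Feature` record, the `relevant` flag standing in for contextual input from upstream areas, and both threshold values are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    name: str
    predicted: float
    observed: float
    relevant: bool  # hypothetical stand-in for contextual input from upstream areas


def gated_errors(features, relevant_threshold=0.1, irrelevant_threshold=1.0):
    """Return {name: error} for features whose prediction error clears the
    threshold for their context. Small errors count for task-relevant features,
    while features deemed stable and irrelevant must deviate by a large amount."""
    signaled = {}
    for f in features:
        error = f.observed - f.predicted
        limit = relevant_threshold if f.relevant else irrelevant_threshold
        if abs(error) > limit:
            signaled[f.name] = error
    return signaled


# Illustrative values in arbitrary normalized units.
chamber = [
    Feature("flavor", predicted=0.0, observed=1.0, relevant=True),
    Feature("reward_size", predicted=1.0, observed=1.0, relevant=True),
    Feature("temperature", predicted=0.0, observed=0.3, relevant=False),
    Feature("ambient_light", predicted=0.0, observed=2.5, relevant=False),
]
print(gated_errors(chamber))  # {'flavor': 1.0, 'ambient_light': 2.5}
```

Only two of the four features produce an error signal here. In this sense, contextual gating could shrink the number of dimensions the system must signal at any one time, while still letting a large, sudden change in an ignored feature break through.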

We have edited the Discussion to include a paragraph that makes some of the above points in a limited fashion.

Regarding whether the content of the sensory prediction error upon a switch from A->B reflects the presence of B or the absence of A, we are not equipped to test this with data from the experiments presented here because they only involved two distinct reward identities. However, we agree that this is an excellent and very important question, and we are currently planning experiments to get directly at this question. For now, we have included a paragraph in the Discussion noting the issue.

Meanwhile, to address this concern in part, we looked at data from another study of sensory prediction errors in humans (Suarez et al., 2019). In that study, we used four distinct, value-matched food odors (A, B, C, D) as rewards in a similar reversal learning task. This allowed us to reanalyze the dataset, attempting to decode the identity of the prediction error on trials in which only one of these two types of information was in error. For instance, we compared activity on A->B versus A->C switches to determine whether the identity of the unexpected delivered outcome could be decoded. We also compared activity on A->B versus C->B switches to test whether the identity of the omitted outcome could be decoded. We found above-chance decoding accuracy in the former case (t(18) = 2.19, p = 0.042), but chance accuracy in the latter (t(18) = 0.89, p = 0.39).
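The two contrasts above separate the delivered outcome from the omitted one. The minimal Python sketch below enumerates such pairs and computes the group-level one-sample t statistic against chance. It assumes that every ordered switch between the four odors occurs and that chance is 50% for each pairwise problem. The function names are hypothetical, and the per-participant decoder that produces the accuracies is not shown. Converting t to a p-value would additionally need the t distribution (for example from SciPy), which is omitted to keep the sketch dependency-free.

```python
import math
import statistics
from itertools import combinations

OUTCOMES = ("A", "B", "C", "D")


def delivered_contrasts(outcomes=OUTCOMES):
    """Switch pairs sharing the omitted outcome but differing in the delivered
    one (e.g. A->B vs A->C); decoding here targets the delivered identity."""
    return [((old, x), (old, y))
            for old in outcomes
            for x, y in combinations([o for o in outcomes if o != old], 2)]


def omitted_contrasts(outcomes=OUTCOMES):
    """Switch pairs sharing the delivered outcome but differing in the omitted
    one (e.g. A->B vs C->B); decoding here targets the omitted identity."""
    return [((x, new), (y, new))
            for new in outcomes
            for x, y in combinations([o for o in outcomes if o != new], 2)]


def t_vs_chance(accuracies, chance=0.5):
    """One-sample t statistic (df = n - 1) for mean decoding accuracy vs chance."""
    n = len(accuracies)
    se = statistics.stdev(accuracies) / math.sqrt(n)
    return (statistics.mean(accuracies) - chance) / se, n - 1


print(delivered_contrasts()[0])  # (('A', 'B'), ('A', 'C'))
print(len(delivered_contrasts()), len(omitted_contrasts()))  # 12 12
```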

These results indicate that the sensory prediction error signal encoded in ensemble midbrain activity, at least in humans, reflects the content of the unexpected outcome that is present. However, we are reluctant to conclude that the omitted outcome is not also represented. The absence of expected sensory information may be represented by reductions in patterns of activity, similar to the absence of expected value. If so, it may be much harder to see, or require greater statistical power to detect, especially against the backdrop of the events actually delivered. That is, when something new is delivered, it is not clear that the omission of the thing it replaced registers as strongly. The error pattern may therefore be biased toward the unexpected event that actually occurs. Further, Suarez et al. collected fMRI data in one long run rather than in distinct runs, which would be most appropriate for constructing separate, independent training and test sets for decoding. We therefore feel that these results are somewhat tenuous and would prefer not to include them in the present manuscript. We hope that the paragraph in the Discussion alluding to this outstanding question is sufficient for now.
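The concern about a single long run can be made concrete with run-wise cross-validation. The minimal Python sketch below uses a hypothetical function name and assumes each sample carries the label of the scanning run it came from. Its leave-one-run-out folds never place training and test samples in the same run, so slow drifts and temporal autocorrelation within a run cannot leak between them. With only one run, no such fold can be built.

```python
def leave_one_run_out(run_labels):
    """Build (train_indices, test_indices) folds, holding out one scanning run at a time."""
    runs = sorted(set(run_labels))
    if len(runs) < 2:
        raise ValueError("run-independent folds need at least two runs")
    folds = []
    for held_out in runs:
        train = [i for i, r in enumerate(run_labels) if r != held_out]
        test = [i for i, r in enumerate(run_labels) if r == held_out]
        folds.append((train, test))
    return folds


# Six samples from three runs -> three folds, each testing on one unseen run.
for train, test in leave_one_run_out([0, 0, 1, 1, 2, 2]):
    print("train", train, "test", test)

# A single long run cannot be split this way.
try:
    leave_one_run_out([0] * 6)
except ValueError as err:
    print("single run:", err)
```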

2) Discuss the limitations of the decoding analysis approach (reviewer 3 point 1), including whether or how this information might be read out in the brain (reviewer 2 point 1).

Reviewer 3 makes the good point that we cannot be certain from our decoding results that the regions downstream of dopamine neurons are actually using the outcome identity information that we can decode from their activity. This is a hazard of any correlative study, it seems to us, not just a problem for decoding analyses. However, in our results, unlike the retinal cell example cited by the reviewer, the relevant information can be decoded only when the hypothesis predicts it would be useful to downstream regions (i.e. when it is unexpected), and not at other times. If one could decode line orientation from retinal cell activity only when line orientation was useful to the rest of the brain, that would be much stronger evidence of functional relevance at the level of the retina. Although we cannot be sure outcome identity information held by dopamine ensembles is functionally relevant for error-based learning, the data are at least consistent with this proposal. Further, there is in fact evidence that manipulating patterns of dopamine activity causes sensory error-based learning (also see the answer to essential revision 4 below for a related discussion of how such a manipulation may produce this learning). These studies include Sharpe et al., 2017, and perhaps even more obviously the study by Keiflin et al. (2019), in which stimulation of dopamine neurons at the time of reward in a blocking procedure unblocks devaluation-sensitive reward learning (indicating dopamine stimulation drove the development of an association between the CS and the sensory properties of the US, such as its flavor). Admittedly, the pattern of activity created by optogenetic manipulation in these studies was not specifically known, but they clearly show that a change in the pattern can be functionally relevant for learning. 
We have added a paragraph to the Discussion to clearly acknowledge this limitation and point out the relationship to the other studies that have found a functional relevance of dopamine release to sensory associative learning.

Regarding the question of how information held in dopamine ensemble firing rates could be read out by downstream regions, we must acknowledge that our study was not designed to address this question (as would be true of any neural recording study). Reviewer 2 mentions the important issue of the exuberant axonal arborization of dopamine neurons. This is perhaps related to the broader idea, commonly held for many years, that dopamine operates by volume transmission, which would not preserve the fine structure of ensemble firing rates. However, the exclusivity of dopaminergic volume transmission is being challenged in the recent literature (e.g. Liu and Kaeser, Current Opinion in Neurobiology, 2019). As for the specific issue of axonal arborization, whether it makes a readout of firing-rate information difficult depends on the specific pattern of arborization and release sites of each dopamine neuron. We would argue that the question of how much, and what, information is carried by dopamine release has yet to be well tested. This is in large part because, until recently, techniques for tracking release lacked the spatial resolution to ask it. In future studies, we intend to address this question using recently developed imaging techniques, such as the dLight fluorescent dopamine sensor imaged with an in vivo miniscope.

3) Apply the decoding analysis to the value prediction error phase of the task to ask whether DA/midbrain ensembles encode outcome identity when the presence of the outcome altogether, not only its identity, is surprising (reviewer 1 point 1).

In response to the first point, we would observe that, given the task design, it is unclear whether the flavor (or identity) is unexpected across the transitions designed to elicit “classical” reward prediction errors, in both the rat and human versions of the task. Further, it is reasonable (and consistent with behavioral data) to think that the “value” of these transitions might swamp any attention to identity. This value is represented by the additional drops in the rat task, or by the increased intensity of a food odorant in the human task. It might outweigh any attention to the precise flavor of those drops or the specific qualities of the food odorant. Consistent with this idea, we do not find evidence that identity per se is decoded across these transitions. In the human case, these data were already shown in the original manuscript in Figure 4D-E. Admittedly, this might not have been clear, because we labeled this transition “flavor unchanged,” but in fact it tested flavor decoding across value transitions. We have now clarified this in the figure.

In the rat case, the data in the “flavor unchanged” control condition did not test this idea, because they included only the first drop of milk, which would have been fully expected across the value transitions. However, in response to this concern, we ran an additional analysis testing flavor decoding on the second drop of milk (i.e. the new one) across these value transitions. This analysis, presented in Author response image 1, did show somewhat elevated decoding of flavor early in the block. However, in line with the human data, this decoding was not significantly above chance. We have chosen not to include these data in the revised manuscript for two reasons. First, the reasons above suggest it is not the best test of this question. Second, the residual decoding, though not significantly greater than chance, might reflect the ensemble decoding the location of milk delivery rather than its flavor. In our design, this analysis confounds “side” with “flavor”, whereas in both the value transitions and the flavor transitions, side is counterbalanced with the tested factors. As discussed below, we have evidence that the location of the reward is another sensory feature possibly signaled by dopamine neurons.
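The side/flavor confound can be checked directly from the trial labels. This minimal Python sketch assumes hypothetical trial records with `side` and `flavor` keys, with flavors labeled `F1` and `F2` for illustration. When one factor fully determines the other, a decoder trained on flavor could just as well be reading out location; a counterbalanced design rules that out.

```python
from collections import Counter


def fully_determines(trials, a, b):
    """True if the level of factor `a` fixes the level of factor `b` on every
    trial, i.e. a decoder trained on `b` could equally be reading out `a`."""
    seen = {}
    for t in trials:
        seen.setdefault(t[a], set()).add(t[b])
    return all(len(levels) == 1 for levels in seen.values())


def is_counterbalanced(trials, a, b):
    """True if every combination of the levels of `a` and `b` occurs equally often."""
    counts = Counter((t[a], t[b]) for t in trials)
    levels_a = {t[a] for t in trials}
    levels_b = {t[b] for t in trials}
    cells = [counts[(x, y)] for x in levels_a for y in levels_b]
    return min(cells) > 0 and len(set(cells)) == 1


# Hypothetical trial lists: one where side predicts flavor, one fully crossed.
confounded = [{"side": "left", "flavor": "F1"}, {"side": "right", "flavor": "F2"}] * 5
balanced = [{"side": s, "flavor": f} for s in ("left", "right") for f in ("F1", "F2")] * 5

print(fully_determines(confounded, "side", "flavor"))   # True
print(is_counterbalanced(confounded, "side", "flavor"))  # False
print(fully_determines(balanced, "side", "flavor"))      # False
print(is_counterbalanced(balanced, "side", "flavor"))    # True
```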

Author response image 1

Regarding the “selectivity of information encoded” comment, we conducted additional analyses to test whether we could decode two further variables from midbrain patterns of activity at the time these events occurred. The first was the identity of the predictive cue. The second was the response made (side in the rat task, left/right button press in the human task). In the rat data, decoding accuracy was significantly greater than chance for cue identity across the entire block (see Author response image 2A) and for “side” across flavor transitions (see Author response image 2B). It could be argued that cue identity is always surprising to the rats, because trial types are chosen pseudorandomly, so it potentially makes sense that this signal persists undiminished across the block. Side, or location, is arguably a feature of the reward. It is therefore reasonable that the error code would include information not only about the flavor of the new reward but also about the flavor appearing in a new location.

However, in the human data, decoding accuracy was at chance for both analyses (cue identity: t(22) = 0.24, p = 0.81; side: t(22) = 0.77, p = 0.45). We do not know whether this reflects a problem with the results or a difference in species, in task (e.g., side of port vs. left or right button press), or in the detail available in the BOLD response versus single units. In any case, the experiments were not designed to test the encoding of errors in other kinds of information with the rigor with which they were designed to test the encoding of flavor errors. For these reasons, we have not included these analyses in the revision, since they could be interpreted, and potentially misinterpreted, in many ways. We would prefer to leave these questions for future studies properly designed to address them.

Author response image 2

4) Discuss how these data are compatible (or not) with evidence that optogenetic activation, which activates DA neurons, but disrupts the precise firing pattern of DA ensembles, can promote learning about the sensory features of an event (reviewer 1 point 3).

We agree this is an excellent point. In our opinion, it is a question at issue in nearly every optogenetic study of which we are aware (but see Jennings et al., 2019). Neurons throughout the brain are generally accepted to represent information through meaningful patterns of activity. How, then, can the random activation (and, in our opinion, even inhibition) of such neurons cause anything meaningful and expected to happen? This, to us, is a fundamental and nearly entirely ignored issue at the heart of the field’s current hyperventilation over optogenetics. We do not pretend to know the answer to this conundrum. However, in our particular case, we would offer the following speculation. In isolation, the pattern of activation in studies such as Sharpe et al. (2017) would surely be random, owing to factors governing viral expression and light penetration. However, we believe that in the controlled settings we have used, it can actually recapitulate a somewhat normal pattern of firing, because well-controlled external events are present during the stimulation. That is, even though the outcomes are somewhat expected in the blocking design used by Sharpe et al., they are still there. As a result, the input reflecting their presence still impinges on the system, although the effect of this input cannot be detected in the output of the dopamine neurons. One might imagine a subthreshold or “ghost” pattern of activity in dopamine neurons that reflects the presence of this outcome. Our stimulation randomly injects current across a subset of this population. In doing so, it could recover from this “ghost” pattern one sufficiently similar to the pattern that would have occurred had the same event been unexpected. It would thereby reinstate apparently normal learning, given the very simple behavioral readout we use.
We are now running studies to look at the neural similarity of such an artificial memory to what is normally learned, but that is our speculation for how to reconcile the current data with those prior results. We have now added this idea to the Discussion.
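The “ghost” pattern speculation can be illustrated with a deliberately crude toy model in Python. Every quantity is a hypothetical assumption: uniform identity-specific drives, a subtractive prediction term of 1.0 that silences the expected outcome, and stimulation reaching a random half of the neurons with ±20% amplitude jitter, all passed through a rectifying nonlinearity. Nothing here is fitted to data. The sketch only illustrates the logic of the argument, not the behavior of real dopamine neurons.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def relu(v):
    """Firing-rate nonlinearity: no spikes below threshold (0)."""
    return max(0.0, v)


def ghost_simulation(n=2000, frac_stim=0.5, jitter=0.2, seed=1):
    """Correlate the stimulated pattern with the pattern an unexpected outcome B
    would evoke, when the subthreshold ghost is of B (match) or of A (control)."""
    rng = random.Random(seed)
    drive_b = [rng.random() for _ in range(n)]  # identity-specific input for B, in [0, 1)
    drive_a = [rng.random() for _ in range(n)]  # input for a different outcome A
    prediction = 1.0  # expectation cancels all drive, so an expected outcome evokes no spikes

    unexpected_b = [relu(d) for d in drive_b]  # pattern had B been unexpected
    expected_b = [relu(d - prediction) for d in drive_b]
    assert not any(expected_b)  # the ghost is entirely subthreshold

    # Nonspecific stimulation: a random subset of neurons receives roughly
    # enough current to offset the prediction, with amplitude jitter.
    stim = [prediction + rng.uniform(-jitter, jitter) if rng.random() < frac_stim else 0.0
            for _ in range(n)]

    stim_on_b = [relu(d - prediction + s) for d, s in zip(drive_b, stim)]
    stim_on_a = [relu(d - prediction + s) for d, s in zip(drive_a, stim)]
    return pearson(stim_on_b, unexpected_b), pearson(stim_on_a, unexpected_b)


r_match, r_control = ghost_simulation()
print(f"match: r = {r_match:.2f}; control: r = {r_control:.2f}")
```

The control condition, stimulation over the ghost of a different outcome, is what separates pattern recovery from a generic effect of stimulation. In this toy model, only the matching ghost yields an output correlated with the unexpected-outcome pattern, even though the stimulation itself carries no identity information.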

Reviewer #1:

[…]

- Encoding of sensory information in "classic" reward prediction errors?

It seems that the decoding analysis was applied strictly to the portion of the task that features identity prediction errors (value-neutral switch from O1 to O2). But what about the portion of the task that features "classic" reward prediction errors? Is there also evidence that DA ensembles encode outcome identity when the presence of the outcome altogether, not only its identity, is surprising (arguably the most common scenario when organisms learn about outcomes)?

Please see the response to essential revision 3 above.

- Functional role for sensory information in DA prediction errors?

The authors propose that the information about the identity of a surprising outcome contained in the firing pattern of DA ensembles instructs downstream areas what to learn. The authors should highlight how this hypothesis is a significant shift from the way prediction errors (even sensory PE) are generally conceived (i.e. as permissive signals for learning, with the content of learning being determined by activity in downstream regions, e.g. Glimcher, 2011). The authors should also consider discussing how this new hypothesis fits with the highly divergent and overlapping nature of DA projections (Arbuthnott and Wickens, 2007; Matsuda et al., 2009; Bolam and Pissadaki, 2012).

We have added to our discussion of the novelty of this idea in the Introduction, referencing Glimcher, 2011, as well as other papers, in discussing how this signal would differ from salience. We apologize, but we were not sure how to incorporate the other citations. We agree that the implementation or effectiveness of any teaching signal in the spiking would be affected by the distribution and sparsity of the connections to downstream neurons, and we have generally noted the importance of this in responding to essential revision 2 above. Doing more, especially without knowing specifically what the reviewer is suggesting, seems beyond the scope of the study.

- The authors cite studies showing that DA firing promotes learning about sensory features of an event. Some of these studies (Sharpe et al., 2017; Keiflin et al., 2019) used optogenetic activation, which activates DA neurons as a whole but most likely disrupts the precise firing pattern of DA ensembles. How is that compatible with the idea that the precise pattern of firing in DA ensembles instructs the content of learning?

Please see the response to essential revision 4 above.

Reviewer #2:

[…]

My main concern is that the underlying implications are not so clear:

- Many dopamine cells have exuberant axonal arbors (although for sure, more in the SNc than the VTA) – thus the experimenter's ability to decode this signal from individual cells may vastly outweigh the ability of downstream neurons or tissue to do this decoding. It would be good to be convinced that this signal had a way of actually mattering. The fMRI results don't constitute such a proof.

Please see the response to essential revision 2 above.

- Related to this – what my computational colleagues refer to as the dimensionality of the problem seems unfavorable. The number of possible juices, let alone things of equivalent value, that could be substituted, is vast. They have elaborated representations in various regions of the cortex which is sized to match. However, there aren't many dopamine neurons. Thus, the fact that there is a reliably decodable signal – coming, for instance, from the upstream sampling properties of individual dopamine neurons (e.g., if they get somewhat random collections of inputs) – when there are only two choices, and the chance to build a sophisticated decoder bears little on the real problem of making state- rather than value-based prediction errors be useful.

Please see the response to essential revision 1 above.

Reviewer #3:

[…]

1) The Discussion should do more to couch the interpretation of the results. A danger with decoding analyses is that just because something can be decoded, it doesn't necessarily mean that the brain is using that information. For example, it would be relatively straightforward to decode line orientation from retinal ganglion cells, even though we know that these cells are not encoding this information.

Please see the response to essential revision 2 above.

2) Figure 3A: The mountain plot obscures the data. I'd suggest just plotting the top-down view and allowing the color to represent decoding accuracy.

To address this comment and the similar one made by reviewer 2, we have rotated the surface plots so that the obscured parts of the plots are visible. In our view, the rotated surface plots illustrate the data well. However, we would happily substitute flat heat plots if the reviewers still believe the surface plots are inadequate.

3) Figure 3C: I don't know why this data has been spatially smoothed. Confusion matrices are not continuous.

We apologize for this problem. Although the data in the confusion matrices were not spatially smoothed, there was some sort of graphics distortion affecting those panels which was apparent only on some computers (which is why we didn’t notice it before submission). We believe we have corrected it in the revised figures; however, if you still see any apparent smoothing on your device, please let us know and we will take further steps to correct the problem.

4) Figure 4D: The description in the figure legend is confusing. The first sentence states that decoding accuracy was above chance on the error trial, but then the next sentence states that it was not above chance on this trial.

We apologize for the lack of clarity. The first sentence referred to error trials on flavor transitions (black line), whereas the second sentence referred to value transitions (gray line). We have now clarified the figure legend.

5) Figure 4E: Unlike Figure 3C, there are no stats to support the claims as to what data is or is not significant.

We have added the permutation test to the legend for Figure 4E and describe this test in the Materials and methods section. We also indicate in Figure 4D which data are significantly different from chance.
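A permutation test of the kind mentioned for Figure 4E compares the observed decoding accuracy with accuracies obtained after shuffling the class labels. The minimal Python sketch below substitutes a simple leave-one-out nearest-centroid decoder for whatever classifier was actually used, and all function names are hypothetical. It illustrates the logic of the test, not the analysis in the manuscript.

```python
import random


def _centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid decoder (a stand-in classifier)."""
    correct = 0
    for i in range(len(X)):
        train = [(X[j], y[j]) for j in range(len(X)) if j != i]
        classes = sorted({label for _, label in train})
        centroids = {c: _centroid([x for x, label in train if label == c]) for c in classes}
        predicted = min(classes, key=lambda c: _sq_dist(X[i], centroids[c]))
        correct += predicted == y[i]
    return correct / len(X)


def permutation_test(X, y, n_perm=1000, seed=0):
    """Return (observed accuracy, one-sided p) against a label-shuffled null."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    shuffled = list(y)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_accuracy(X, shuffled) >= observed:
            n_at_least += 1
    # Counting the observed labeling as one member of the null keeps p above zero.
    return observed, (n_at_least + 1) / (n_perm + 1)


# Two well-separated synthetic classes (illustrative data only).
X = [[0.1 * i, 0.0] for i in range(10)] + [[5 + 0.1 * i, 5.0] for i in range(10)]
y = [0] * 10 + [1] * 10
acc, p = permutation_test(X, y, n_perm=200)
print(f"accuracy = {acc:.2f}, p = {p:.4f}")
```

Shuffling labels while keeping the decoding procedure fixed builds the null distribution from the same pipeline as the observed statistic. This is what allows the test to account for any bias the decoder itself may have.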

https://doi.org/10.7554/eLife.49315.010

Article and author information

Author details

  1. Thomas A Stalnaker

    Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing, Experiment 1 in rats: designed the experiment with YKT and GS and analyzed the data, Experiment 2 in humans: provided input on approaches and interpretation with SJG and GS, Wrote the manuscript with JDH, TK and GS with input from all of the other authors
    Contributed equally with
    James D Howard
    For correspondence
    thomas.stalnaker@nih.gov
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4402-5448
  2. James D Howard

    Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing—review and editing, Experiment 2 in humans: designed, conducted, and analyzed the experiment with TK, Wrote the manuscript with TAS, TK and GS with input from all of the other authors
    Contributed equally with
    Thomas A Stalnaker
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-9309-3773
  3. Yuji K Takahashi

    Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing, Experiment 1 in rats: designed the experiment with TAS and GS, Conducted the experiment
    Competing interests
    No competing interests declared
  4. Samuel J Gershman

    Department of Psychology and Center for Brain Science, Harvard University, Cambridge, United States
    Contribution
    Conceptualization, Writing—original draft, Writing—review and editing, Experiment 1 in rats: provided input on approaches and interpretation with TK and GS, Experiment 2 in humans: provided input on approaches and interpretation with TAS and GS
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6546-3298
  5. Thorsten Kahnt

    1. Department of Neurology, Feinberg School of Medicine, Northwestern University, Chicago, United States
    2. Department of Psychiatry and Behavioral Sciences, Feinberg School of Medicine, Northwestern University, Chicago, United States
    3. Department of Psychology, Weinberg College of Arts and Sciences, Northwestern University, Chicago, United States
    Contribution
    Conceptualization, Resources, Data curation, Formal analysis, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing, Experiment 1 in rats: provided input on approaches and interpretation with SJG and GS, Experiment 2 in humans: designed, conducted and analyzed the experiment with JDH, Wrote the manuscript with TAS, JDH and GS with input from all of the other authors
    Contributed equally with
    Geoffrey Schoenbaum
    Competing interests
    Reviewing editor, eLife
    ORCID: 0000-0002-3575-2670
  6. Geoffrey Schoenbaum

    1. Intramural Research Program, National Institute on Drug Abuse, National Institutes of Health, Baltimore, United States
    2. Department of Anatomy and Neurobiology, University of Maryland School of Medicine, Baltimore, United States
    3. Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Methodology, Writing—original draft, Project administration, Writing—review and editing, Experiment 1 in rats: designed the experiment with YKT and TAS, provided input on approaches and interpretation with TK and SJG, Experiment 2 in humans: provided input on approaches and interpretation with TAS and SJG, Wrote the manuscript with TAS, JDH and TK with input from all of the other authors
    Contributed equally with
    Thorsten Kahnt
    For correspondence
    geoffrey.schoenbaum@nih.gov
    Competing interests
    Reviewing editor, eLife
    ORCID: 0000-0001-8180-0701

Funding

National Institute on Drug Abuse (ZIA-DA000587)

  • Geoffrey Schoenbaum

National Institute on Deafness and Other Communication Disorders (R01DC015426)

  • Thorsten Kahnt

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

This work was supported by the Intramural Research Program at the National Institute on Drug Abuse and National Institute on Deafness and Other Communication Disorders grant R01DC015426 (to TK). The opinions expressed in this article are the authors’ own and do not reflect the view of the NIH/DHHS.

Ethics

Human subjects: Subjects gave informed consent to participate in the experiment. The protocol (#STU00098371) and consent forms were approved by Northwestern University's Institutional Review Board.

Animal experimentation: This study was performed in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals published by The National Research Council of The National Academies. All of the animals were handled according to approved animal care and use committee protocols of the NIH. The protocol (#15-CNRB-108) was approved by the NIDA-IRP ACUC. All surgery was performed under isoflurane anesthesia, and every effort was made to minimize suffering.

Senior and Reviewing Editor

  1. Kate M Wassum, University of California, Los Angeles, United States

Publication history

  1. Received: August 2, 2019
  2. Accepted: October 28, 2019
  3. Accepted Manuscript published: November 1, 2019 (version 1)
  4. Version of Record published: November 8, 2019 (version 2)

Copyright

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
