Activity patterns of serotonin neurons underlying cognitive flexibility

  1. Sara Matias
  2. Eran Lottem
  3. Guillaume P Dugué
  4. Zachary F Mainen (corresponding author)
  1. Champalimaud Centre for the Unknown, Portugal
  2. MIT-Portugal Program, Portugal
  3. Institut de Biologie de l’Ecole Normale Supérieure, Centre National de la Recherche Scientifique, UMR8197, Institut National de la Santé et de la Recherche Médicale, France
Research Article
Cite as: eLife 2017;6:e20552 doi: 10.7554/eLife.20552

Abstract

Serotonin is implicated in mood and affective disorders. However, growing evidence suggests that a core endogenous role is to promote flexible adaptation to changes in the causal structure of the environment, through behavioral inhibition and enhanced plasticity. We used long-term photometric recordings in mice to study a population of dorsal raphe serotonin neurons, whose activity we could link to normal reversal learning using pharmacogenetics. We found that these neurons are activated by both positive and negative prediction errors, and thus report signals similar to those proposed to promote learning in conditions of uncertainty. Furthermore, by comparing the cue responses of serotonin and dopamine neurons, we found differences in learning rates that could explain the importance of serotonin in inhibiting perseverative responding. Our findings show how the activity patterns of serotonin neurons support a role in cognitive flexibility, and suggest a revised model of dopamine–serotonin opponency with potential clinical implications.

https://doi.org/10.7554/eLife.20552.001

eLife digest

Serotonin is a molecule that plays various roles in the human body. In the brain, it is involved in regulating mood and emotions. Growing evidence suggests that serotonin also helps animals – including humans – adapt their behavior to changes in their environment. To allow for such behavioral flexibility, serotonin might promote changes in the underlying brain structures and activity.

In a type of learning known as ‘reversal learning’, for instance, it is necessary to adapt to a sudden change in a previously familiar environment. For example, if there were a road closure on a person’s way to work, they might want to learn to stop following their usual route and learn a new and better one. Previous research has shown that when serotonin signaling is reduced, people persevere. That is, they will keep following the old route even if it is no longer the best choice. How this process works is still largely unknown.

To start unraveling these mechanisms, Matias et al. trained mice in a reversal learning task while manipulating and recording the activity of the neurons that produce serotonin. The results showed that when the activity in serotonin neurons was experimentally blocked, the mice tended to keep looking for a reward that was no longer available. Then, by recording the activity of serotonin neurons, Matias et al. found that it was the surprise of discovering a change in a previously familiar environment that activates serotonin neurons. It did not matter whether the change was better or worse than expected. The findings suggest that together with dopamine, another molecule involved in learning from rewards, serotonin could play an important role during reversal learning.

One next step will be to determine if serotonin mainly stops perseverance in its tracks, or whether it works by helping to unlearn the old behavior, or a combination of both. In the future, this could further our understanding of depression, which can be viewed as a disorder characterized by patients being unable to adapt to adverse situations, leaving them trapped to repeat behaviors and thoughts that are not beneficial. Future studies could also build on these findings to guide the development of new treatments for depression that involve serotonin.

https://doi.org/10.7554/eLife.20552.002

Introduction

Serotonin (5-HT) is classically known to be implicated in mood and affective disorders (Dayan and Huys, 2009; Cools et al., 2011; Li et al., 2012), but it also plays a fundamental role when organisms need to adapt to sudden changes in the causal structure of an environment, such as during extinction and reversal learning paradigms (Clarke et al., 2004, 2007; Boulougouris and Robbins, 2010; Bari et al., 2010; Brigman et al., 2010; Berg et al., 2014). These studies have shown that 5-HT depletion, particularly in the orbitofrontal cortex (OFC) of primates, causes perseverative errors, that is, difficulties in stopping responses to previously rewarded stimuli which are no longer reinforced, without affecting learning of new associations or retention of learned associations (Clarke et al., 2007). Such results seem to stem from two functions of endogenous 5-HT activation: inhibiting learned responses that are not currently adaptive (Soubrié, 1986; Bari and Robbins, 2013) and driving plasticity to reconfigure them (Maya Vetencourt et al., 2008; Jitsuki et al., 2011; He et al., 2015). These mirror dual functions of dopamine (DA) in invigorating reward-related responses (Niv et al., 2007; Panigrahi et al., 2015) and promoting plasticity that reinforces new ones (Tsai et al., 2009; Kim et al., 2012; Steinberg et al., 2013). However, while DA neurons are known to be activated by reward prediction errors (Schultz et al., 1997; Cohen et al., 2012; Eshel et al., 2015), consistent with theories of reinforcement learning (Sutton and Barto, 1998; Schultz et al., 1997), the reported firing patterns of 5-HT neurons (Liu et al., 2014; Cohen et al., 2015; Li et al., 2016) do not accord with any existing theories (Daw et al., 2002; Boureau and Dayan, 2011; Cools et al., 2011; Nakamura, 2013). 
Indeed, 5-HT neurons have been proposed to signal worse-than-expected outcomes by being activated by negative reward prediction errors in the reinforcement learning framework (Daw et al., 2002; Boureau and Dayan, 2011), but there is little experimental evidence for such a signal in 5-HT neurons (Cohen et al., 2015; Hayashi et al., 2015; Li et al., 2016) and 5-HT activation does not appear to drive aversive learning processes (Dugué et al., 2014; Liu et al., 2014; McDevitt et al., 2014; Qi et al., 2014; Miyazaki et al., 2014; Fonseca et al., 2015) the way DA drives appetitive learning (Tsai et al., 2009; Kim et al., 2012; Steinberg et al., 2013).

To investigate how 5-HT neurons could be involved in cognitive and behavioral flexibility in changing environments, we recorded their activity over several days in mice engaged in a reversal learning task in which the associations between neutral odor cues and different positive and negative outcomes are first well-learned and then suddenly changed. We reasoned that the scarcity of prediction error–like responses in previous recordings of identified 5-HT neurons (Liu et al., 2014; Cohen et al., 2015; Li et al., 2016) or unidentified raphe neurons (Ranade and Mainen, 2009; Hayashi et al., 2015) might be due to inadequately strong prediction errors. In these studies, the omission of rewards in a small fraction of trials was used to generate prediction errors. While increasing the variability of the outcome, this results in expected uncertainty. In contrast, in a reversal task, there is an abrupt violation of previously stable predictions and a step increase in the frequency of the prediction errors, termed unexpected uncertainty. Expected and unexpected uncertainty may differentially activate neuromodulatory systems (Yu and Dayan, 2005).

Results

Pharmacogenetic inactivation of DRN 5-HT neurons slows negative reversal learning

We first sought causal evidence that 5-HT neurons were linked to reversal learning in mice engaged in such a task by using a pharmacogenetic approach to silence 5-HT neurons (Ray et al., 2011; Teissier et al., 2015; Armbruster et al., 2007). Transgenic mice expressing CRE recombinase under the 5-HT transporter promoter (Gong et al., 2007) (SERT-Cre, n = 8) were transduced with a Cre-dependent adeno-associated (AAV.Flex) virus expressing the synthetic receptor Di (DREADD, hM4D) (Armbruster et al., 2007) injected in the dorsal raphe nucleus (DRN), the major source of 5-HT to the forebrain (Figure 1A). These mice and their wild-type littermates (WT, n = 4) were trained in a head-fixed classical conditioning paradigm in which one of four odor cues (conditioned stimuli, CSs) was randomly presented in each trial. After a fixed 2 s trace period, each odor was followed by a tone and a specific outcome, or unconditioned stimulus (US) (Figure 1B top). For two odors the US was a water reward, and for the other two it was nothing (that is, only the tone was played). After training, mice showed learning of the odor–outcome contingencies, as indicated by differences in the anticipatory lick rate (Figure 1B bottom).

Figure 1.
Inhibition of DRN 5-HT neurons causes perseverative responding.

(A) Injections of Cre-dependent hM4Di-mCherry (right) in the dorsal raphe nucleus (DRN) of SERT-Cre mice (left). (B) Trial structure of the task (top) and mean lick rate of an example session along the four trial types (bottom). (C) Reversal procedure (top) and example of adaptation in mean anticipatory licking (baseline lick rate subtracted) across trials around reversals (bottom, gray), with exponential fits to the reversed odors (red and black traces). Gray shade represents the trials of sessions after CNO injection. (D) Mean exponential fits of anticipatory licking for each group of mice after reversal. (E) Mean time constants for the groups in (D) (one-way ANOVA, F(2,19) = 6.28, p=0.008 for negative reversal, F(2,16) = 0.34, p=0.715 for positive reversal; multiple comparisons indicated in the figure). *p<0.05.

https://doi.org/10.7554/eLife.20552.003

To test the impact of inhibiting DRN SERT-Cre expressing neurons (hereafter simply ‘5-HT neurons’) we used a within-animal cross-over design in which each mouse experienced two reversals (Figure 1C top), receiving the DREADD ligand clozapine-N-oxide (CNO) during one and vehicle during the other; WT mice, which always received CNO, served as additional controls (Figure 1—figure supplement 1A). As expected, mice adjusted their anticipatory licking according to the new associations in both reversals (Figure 1C bottom, gray traces). For worse-than-expected outcomes (negative reversals), the kinetics of adaptation to the new contingencies were significantly slower in hM4D mice receiving CNO, compared to hM4D no-CNO controls and WT controls (Figure 1C,D,E; Figure 1—figure supplement 1B,C). In contrast, for better-than-expected outcomes (positive reversals), there was no significant difference between treatment and control groups (Figure 1D,E).
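The adaptation kinetics compared here are summarized by the time constants of exponential fits to anticipatory licking (Figure 1C–E). The following is a minimal sketch of how such a time constant could be estimated, not the authors' fitting procedure. The function name `fit_exponential`, the grid of candidate time constants, and the noise-free licking curve are all illustrative assumptions.

```python
import math

def fit_exponential(y, taus):
    """Fit y[t] ~ a * exp(-t / tau) + c by grid search over tau.

    For each candidate tau the model is linear in (a, c), so those two
    parameters are solved in closed form by ordinary least squares.
    Returns the (tau, a, c) triple with the smallest squared error.
    """
    n = len(y)
    best = None
    for tau in taus:
        x = [math.exp(-t / tau) for t in range(n)]
        mx = sum(x) / n
        my = sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        a = sxy / sxx
        c = my - a * mx
        sse = sum((a * xi + c - yi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, tau, a, c)
    return best[1], best[2], best[3]

# Hypothetical licking curve after a negative reversal: the anticipatory
# lick rate decays from 6 to 1 licks/s with a time constant of 20 trials.
trials = 200
licks = [5.0 * math.exp(-t / 20.0) + 1.0 for t in range(trials)]

tau_hat, a_hat, c_hat = fit_exponential(licks, taus=range(1, 101))
print(tau_hat, round(a_hat, 3), round(c_hat, 3))
```

In this framing, a larger fitted time constant corresponds to slower adaptation, which is the sense in which CNO-treated hM4D mice were slower in negative reversals.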

This experiment shows that a population of 5-HT neurons in the DRN contributes to inhibiting perseverative responding, suggesting an anatomical and genetic substrate for previous results obtained with pharmacological and lesion experiments (Clarke et al., 2004, 2007; Boulougouris and Robbins, 2010; Bari et al., 2010; Brigman et al., 2010). These findings also defined an access point to assess how the net activity of a specific population of 5-HT neurons could account for its effects on reversal learning.

Photometric monitoring of DRN 5-HT activity patterns in a reversal task

To obtain a broad view of DRN 5-HT activity and compare our results to other DRN recording studies (Hayashi et al., 2015; Cohen et al., 2015), for the next series of recording experiments we used a second reversal task in which mice learned to associate four odors with four different outcomes: a large water reward, a small water reward, nothing (neutral) and a mild air puff to the eye (Figure 2A). After approximately two weeks of training, mice showed robust CS-triggered anticipatory licking correlated to the reward value of the associated USs (large water > small water > neutral ≈ air puff) and eye-blink responses to the delivery of air puffs (Figure 2B). We then reversed the CS–US associations in pairs, such that the CSs associated with the large and small rewards now predicted the air puff and neutral outcomes, respectively, and vice versa (Figure 2C). Upon this reversal, mice experienced strong violations of CS-based expectations (unexpected uncertainty), both positive and negative in value, when the unexpected USs were delivered. Anticipatory licking measurements showed that mice adapted to reversal of contingencies over 1–3 additional sessions (Figure 2D).

Figure 2.
Behavior of head-fixed mice trained in a reversal task.

(A) Schematics of the trial structure in the classical conditioning task (before reversal) with four different outcomes. In each trial, one of four odors was randomly selected and presented for 1 s after a variable foreperiod (Forep). The associated outcome was delivered after a 2 s trace period, together with a tone (same tone for all trial types). Mice were presented with 140 to 346 interleaved trials (mean ± SD: 223 ± 30) per session (day). (B) Top: Mean lick rate of SERT-Cre mice in this task (n = 10) along the duration of each trial type. For each mouse, three sessions of the classical conditioning task where initial associations had already been learned were averaged. Bottom: Mean eye movement of SERT-Cre mice (n = 6) along the duration of each trial type. Shaded areas represent s.e.m. (C) Reversal of CS–US contingencies (negative reversal: CS 1 and 2; positive reversal: CS 3 and 4). (D) Anticipatory licking (mean of 500–2800 ms after odor onset, after subtracting the baseline) across mice for sessions around reversal, showing that the lick rate triggered by the presentation of each odor is adjusted after reversal (n = 8, two-way ANOVA with factors day (days −2 and −1 are considered together) and mouse, main effect of day: F(4,2597) = 722.14, p<0.001 for odor 1, F(4,2554) = 355.53, p<0.001 for odor 2, F(4,2513) = 104.93, p<0.001 for odor 3, F(4,2559) = 381.55, p<0.001 for odor 4). Colors follow odor identity as in (A). ***p<0.001.

https://doi.org/10.7554/eLife.20552.005

To record the population activity of 5-HT neurons across days around the time of the reversal, we used photometry to monitor the activity of these DRN 5-HT neurons through an implanted optical fiber (Tecuapetla et al., 2014) (Figure 3A). SERT-Cre mice were infected in the DRN using two AAV.Flex viruses containing the genetically-encoded calcium indicator GCaMP6s (Chen et al., 2013) and the activity-insensitive fluorophore, tdTomato (Figure 3B,C). We verified the specificity of GCaMP6s expression to DRN 5-HT neurons using histological methods (Figure 3—figure supplement 1). We used a regression-based method to decompose the dual fluorescence signals into a GCaMP6s-specific component, reflecting activity-dependent changes, and a shared component, reflecting general fluorescence changes (for example, movement artifacts; see Methods and Figure 3—figure supplement 2). We validated the effectiveness of this approach in control mice (n = 4) infected in the DRN with yellow fluorescent protein (YFP; replacing GCaMP) and tdTomato (Figure 3—figure supplements 1 and 2).
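The idea behind the regression-based decomposition can be illustrated with a deliberately simplified sketch. The actual procedure is described in the Methods, which are not reproduced here, so the code below is a generic single-regressor version and not the authors' implementation. The function name `remove_shared_component` and the synthetic traces are assumptions made for illustration: the activity-insensitive red channel (tdTomato) is used to predict the green channel (GCaMP6s), and the residual is taken as the activity-specific component.

```python
import math

def remove_shared_component(green, red):
    """Regress the green channel on the red channel; return (residual, slope).

    The fitted part (slope * red + intercept) stands in for fluorescence
    changes shared by both channels, such as movement artifacts. The
    residual is what remains in the green channel after removing it.
    """
    n = len(green)
    mg = sum(green) / n
    mr = sum(red) / n
    srr = sum((r - mr) ** 2 for r in red)
    srg = sum((r - mr) * (g - mg) for r, g in zip(red, green))
    slope = srg / srr
    intercept = mg - slope * mr
    residual = [g - (slope * r + intercept) for g, r in zip(green, red)]
    return residual, slope

# Synthetic example (all values hypothetical).
n = 400
# Shared, movement-like fluctuation with a period of 50 samples.
artifact = [math.sin(2 * math.pi * t / 50.0) for t in range(n)]
# Activity transient present only in the green channel.
activity = [1.0 if 200 <= t < 250 else 0.0 for t in range(n)]
red = [1.0 + 0.3 * a for a in artifact]
green = [2.0 + 0.6 * a + act for a, act in zip(artifact, activity)]

corrected, slope = remove_shared_component(green, red)
inside = sum(corrected[200:250]) / 50
outside = (sum(corrected[:200]) + sum(corrected[250:])) / 350
print(round(slope, 3), round(inside - outside, 3))
```

In this toy case the artifact is removed and the transient survives in the residual. With real recordings the shared and activity-dependent components need not be uncorrelated, which is one reason a control such as the YFP mice described above is needed to validate the correction.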

Figure 3.
Responses of 5-HT and DA neurons before reversal.

(A) Fiber photometry with movement artifact correction in head-fixed mice. L: laser; PMT: photomultiplier tube; D.M: dichroic mirror; Ex: excitation; Em: emission; F: filter. (B) Cre-dependent fluorophores used. (C) Coronal section showing expression of GCaMP6s and tdTomato in the DRN of a SERT-Cre mouse (scale bar: 200 µm). PAG: periaqueductal gray; Aq: Aqueduct. (D) Mean responses of 5-HT neurons to the four CSs and USs during an example session of a mouse before reversal. Shaded areas represent 95% confidence interval (CI). (E) Coronal section showing expression of GCaMP6s and tdTomato in the ventral tegmental area (VTA) of a TH-Cre mouse (scale bar: 200 µm). RLi: rostral linear nucleus of the raphe; RPC: red nucleus, parvicellular part; IPR: interpeduncular nucleus. (F) Mean responses of DA neurons to the four CSs and USs during an example session of a mouse before reversal. Shaded areas represent 95% CI.

https://doi.org/10.7554/eLife.20552.006

Before reversal, photometric 5-HT responses were similar to previous electrical (Liu et al., 2014; Cohen et al., 2015) and photometric (Li et al., 2016) recordings of identified 5-HT neurons: 5-HT neurons were activated by reward-predicting CSs and air puffs (Figure 3D, Figure 3—figure supplement 3). YFP control mice implanted and recorded in the same manner showed no photometric responses to these events (Figure 3—figure supplement 4). To compare directly how DA neurons respond in the same paradigm, we infected TH-Cre mice and targeted neurons in either the posterior lateral ventral tegmental area (VTA) or the substantia nigra pars compacta (SNc) (Figure 3E, Figure 3—figure supplement 5). DA photometry responses in these two areas were similar and were therefore combined. As expected, DA neurons were activated by reward-predicting cues, and showed small responses to predicted rewards (Figure 3F).

DRN 5-HT neurons respond to both positive and negative US prediction errors

To understand the pattern of 5-HT neural activity that could underlie adaptation to reversal of contingencies, we first analyzed US responses, which could contribute to or modulate reinforcement learning. In general, we found that the abrupt reversal of cue–outcome associations caused immediate changes in 5-HT and DA US responses, much more so than in reward omission tests (Ranade and Mainen, 2009; Cohen et al., 2015; Hayashi et al., 2015; Li et al., 2016), consistent with sensitivity to the sudden increase in uncertainty that occurred upon reversal after extensive training.

We first examined the case of positive reversals. 5-HT neurons showed little or no response to large water rewards before reversal when they were predicted by the preceding CS, but responded robustly to the same events when they were unpredicted, after reversal (Figure 4A,B). Thus, 5-HT neurons showed an excitatory response to a better-than-expected outcome, or positive reward prediction error (RPE). The response to the small reward was also modulated by reward expectation (Figure 4—figure supplement 1), although to a lesser degree, perhaps due to the presence of a small response even after extensive training (Figure 3—figure supplement 3B). Like 5-HT neurons, DA neurons also showed stronger excitatory responses to water rewards immediately after reversal when they violated cue-based predictions, as opposed to before reversal when they occurred as predicted (Figure 4C, Figure 4—figure supplement 1C). Therefore, both 5-HT and DA neurons showed an increase in activity in response to positive RPEs, and both showed a larger response for the larger magnitude RPE.

Figure 4.
US responses of 5-HT and DA neurons to the large reward during reversal.

(A) Schematic of the reversal procedure following the large reward US. (B) Top: Mean large reward US responses of an example mouse (SERT1) across days around reversal (shaded areas represent 95% CI); Bottom: change in mean large reward response amplitude (z-scored across days): gray dots represent individual mice (n = 8), black dots average (± s.e.m.) of mice (two-way ANOVA with factors day and mouse, main effect of day: F(4,2592) = 31.47, p<0.001; multiple comparisons with the two days before reversal, corrected using Scheffé’s method, are indicated in the figure). (C) Same as (B) for DA neurons (n = 3 mice): F(4,853) = 32.46, p<0.001. *p<0.05, ***p<0.001.

https://doi.org/10.7554/eLife.20552.012

We next examined the response to the neutral USs. Before reversal, this US elicited little response from either 5-HT or DA neurons. After reversal, the neutral US was presented when a small water reward was predicted. Therefore, it represented a reward omission or negative RPE. Interestingly, 5-HT neurons showed a robust excitatory response to the neutral US after reversal (Figure 5B). In contrast, DA neurons showed an inhibitory response to the same event (Figure 5C).

Figure 5.
US responses of 5-HT and DA neurons to neutral outcome during reversal.

(A) Schematic of the reversal procedure following neutral US. (B) Top: Mean neutral US responses of an example mouse (SERT1) across days around reversal (shaded areas represent 95% CI); Bottom: change in mean neutral response amplitude (z-scored across days): gray dots represent individual mice (n = 8), black dots average (± s.e.m.) of mice (two-way ANOVA with factors day and mouse, main effect of day: F(4,2535) = 10.71, p<0.001; multiple comparisons with the two days before reversal, corrected using Scheffé’s method, are indicated in the figure). (C) Same as (B) for DA neurons (n = 3 mice): F(4,843) = 4.54, p=0.001. *p<0.05, ***p<0.001.

https://doi.org/10.7554/eLife.20552.014

Taking the neutral and rewarding USs together, the results show that midbrain DA neurons respond to positive and negative RPEs with modulation of the opposite sign, as reported previously in reward omission paradigms (Cohen et al., 2012; Schultz et al., 1997; but see Matsumoto and Hikosaka, 2009; Lammel et al., 2011; Kim et al., 2016; Matsumoto et al., 2016). On the other hand, SERT-positive DRN 5-HT neurons show excitatory responses to both positive and negative RPEs. Thus, DRN 5-HT responses to rewards and reward omissions resemble an ‘unsigned RPE’ or ‘surprise’ signal (see Discussion).

Finally, we examined the response of 5-HT and DA neurons to predicted and unpredicted air puffs. In contrast to other USs, DRN 5-HT neurons were mildly activated by air puff USs, even after extensive training (Figure 3; Figure 3—figure supplement 3B). Upon reversal, despite the fact that the air puff US now represented a large negative RPE (since the large water reward was predicted), 5-HT neurons showed no significant response (Figure 5—figure supplement 1B). Midbrain DA neurons, on the other hand, showed no response to the air puff US after training, but showed a small but significant inhibitory response after reversal (Figure 5—figure supplement 1C).

The results for all USs are summarized in Figure 6. Overall, midbrain DA responses adhered closely to the model of a ‘signed RPE’, including for the air puff, whereas the DRN 5-HT neurons resembled an ‘unsigned RPE’ with respect to rewards and reward omissions, but they diverged from this model for air puff responses (see Discussion for further interpretation). Thus, 5-HT and DA neurons are both sensitive to violations of expectation that occur during an abrupt reversal, with the two systems responding in the same way to better-than-expected outcomes but in opposite ways to worse-than-expected outcomes.
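The distinction between the two models can be made concrete with a small numerical sketch. The outcome values below are arbitrary, hypothetical units chosen only to order the outcomes (large reward > small reward > neutral > air puff); they are not quantities measured in this study.

```python
def signed_rpe(outcome_value, expected_value):
    """Signed reward prediction error: positive when better than expected."""
    return outcome_value - expected_value

def unsigned_rpe(outcome_value, expected_value):
    """Unsigned prediction error ('surprise'): magnitude of the violation."""
    return abs(outcome_value - expected_value)

# Hypothetical outcome values (arbitrary units).
LARGE, SMALL, NEUTRAL, PUFF = 1.0, 0.5, 0.0, -0.5

# (outcome delivered, outcome predicted by the cue)
cases = {
    "large reward, predicted (before reversal)": (LARGE, LARGE),
    "large reward after air-puff cue (positive reversal)": (LARGE, PUFF),
    "neutral after small-reward cue (reward omission)": (NEUTRAL, SMALL),
}

for label, (outcome, expected) in cases.items():
    print(label, signed_rpe(outcome, expected), unsigned_rpe(outcome, expected))
```

Under these assumptions the signed error changes sign between the positive reversal and the reward omission, as the DA responses did, whereas the unsigned error is positive in both cases, as the 5-HT responses to rewards and reward omissions were. The sketch does not capture the air puff responses of 5-HT neurons, which diverged from the unsigned model.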

Figure 6.
Responses of 5-HT and DA neurons to outcomes are differentially modulated by expectations.

(A) Mean (± s.e.m.) response of 5-HT neurons, across mice, to the four USs before (day −1, filled bars) and right after (day 0, open bars) reversal (n = 8 mice, two-way ANOVA with factors mouse and day, main effect of day: F(1,764) = 84.36, p<0.001 for large reward, F(1,748) = 3.49, p=0.062 for small reward, F(1,756) = 38.17, p<0.001 for neutral, F(1,766) = 2.79, p=0.095 for air puff). (B) Same as (A) for midbrain DA neurons (n = 3 mice, F(1,249) = 67.9, p<0.001 for large reward, F(1,277) = 8.49, p=0.004 for small reward, F(1,278) = 10.95, p=0.001 for neutral, F(1,250) = 12.74, p<0.001 for air puff). **p<0.01, ***p<0.001.

https://doi.org/10.7554/eLife.20552.016

DRN 5-HT neurons are activated by out-of-context delivery of USs

To further investigate the idea that 5-HT neurons might report prediction errors, we examined responses to USs delivered outside of the normal context. For this, five days after reversal, on a small fraction (20%) of trials, a randomly-selected US was delivered at the time that a CS was normally presented (Figure 7A). We found that water rewards produced larger 5-HT responses when they were presented in this way, compared to when preceded by a well-learned cue (Figure 7B). Of particular interest was that even neutral tones produced an excitatory response when an odor was expected (Figure 7B; Figure 7—figure supplement 1). Therefore, 5-HT neurons were activated by the substitution of one neutral stimulus with another. DA neurons also responded strongly to uncued rewards, as previously reported (Schultz et al., 1997; Cohen et al., 2012), but little to other uncued USs (Figure 7C) (Matsumoto et al., 2016). Thus, consistent with the responses following CS–US reversal, this experiment also showed that, with respect to water rewards and reward omissions, 5-HT neurons respond in the same manner to unexpected events, whether negative, neutral or positive, whereas DA neurons are primarily sensitive to unexpected events that have some reward value.

Figure 7.
DRN 5-HT neurons respond more to uncued outcomes.

(A) Behavioral task diagram. (B) Mean (± s.e.m.) response of DRN 5-HT neurons across mice to the four USs when they are predicted (filled bars) and when they are unpredicted (open bars) (n = 4 mice, two-way ANOVA with factors type (predicted or unpredicted) and mouse, main effect of type: large reward F(1,923) = 45.17, p<0.001, small reward F(1,944) = 8.42, p=0.0038, neutral F(1,924) = 5.36, p=0.0208, air puff F(1,924) = 0.61, p=0.4331). (C) Same as (B) but for midbrain DA neurons (n = 3 mice, large reward F(1,642) = 175.05, p<0.001, small reward F(1,589) = 17.53, p<0.001, neutral F(1,673) = 0.52, p=0.4707, air puff F(1,601) = 0.34, p=0.5598). *p<0.05, **p<0.01, ***p<0.001.

https://doi.org/10.7554/eLife.20552.017

CS responses of 5-HT neurons have slower kinetics after reversal than those of DA neurons

US responses are appropriate to drive learning across trials, but occur too late within a given trial to inhibit CS-driven behavioral responses directly. If it is to intervene in time to prevent a response, behavioral inhibition should be triggered by predictive CS cues. We therefore examined the CS responses of 5-HT and DA neurons carefully, to test how they might contribute to reversal learning.

Before reversal, both 5-HT and DA neurons showed CS responses that correlated with the relative value of the US predicted by the CS (large reward > small reward > neutral ≈ air puff) (Figure 3, Figure 3—figure supplements 3 and 5). After the reversal, both adjusted to the new contingencies such that, by three days post-reversal, the CS responses reflected their new US associations (Figure 8). Thus, despite small differences in their relative magnitudes, and in contrast to their distinct US responses, DA and 5-HT neurons showed CS responses that were remarkably similar, both before and after reversal learning. If DA and 5-HT have opposing direct effects on behavior (for example, Cools et al., 2011), these results suggest that they would simply cancel one another out.

Figure 8.
5-HT and DA CS responses are relearned after the reversal.

(A) Mean (± s.e.m.) response of 5-HT neurons across mice to the four CSs before reversal (filled bars) and after adaptation to the reversed contingencies (open bars) (n = 8 mice, two-way ANOVA with factors day and mouse, main effect of day: large reward F(1,906) = 17.35, p<0.001, small reward F(1,902) = 14.87, p<0.001, neutral F(1,882) = 0.13, p=0.72, air puff F(1,914) = 17.12, p<0.001). (B) Same as (A) for midbrain DA neurons (n = 3 mice, large reward F(1,294) = 15.35, p<0.001, small reward F(1,336) = 71.72, p<0.001, neutral F(1,282) = 3.45, p=0.06, air puff F(1,312) = 6.56, p=0.01). *p<0.05, ***p<0.001.

https://doi.org/10.7554/eLife.20552.019

However, when we analyzed the time course of the adaptation of the CS responses, we found that 5-HT CS responses had a markedly slower rate of adaptation to the new contingencies than did DA CS responses (Figure 9A,B, Figure 9—figure supplements 1 and 2). The difference in the time constant of CS adaptation was significant for both negative and positive reversals, and was not due to differences in learning rates between groups of mice (Figure 9C). We also tested whether US responses, which presumably reflect, in part, CS-related learning, also show a difference in the time course of adaptation. However, because the US signals showed a smaller signal-to-noise ratio than the CS signals, reliable time courses could not be extracted.

Figure 9.
Distinct speed of CS reversal learning in DRN 5-HT and midbrain DA neurons.

(A) Normalized exponential fits (black traces) to the mean amplitude of the CS responses (gray traces) across trials for CS 2 and CS 3 of an example SERT-Cre mouse. Insets on top show mean CS response (and 95% CI) on days −1 (left) and 3 (right). (B) Same as (A) for an example TH-Cre mouse. (C) Mean time constants (± s.e.m., green and purple dots) of the exponential fits of CS responses obtained for TH-Cre and SERT-Cre mice during reversal learning (neural activity: unpaired t-tests, p<0.001 for negative reversal, p=0.0023 for positive reversal; no significance obtained for anticipatory licking). Gray dots represent individual mouse–odor pairs for each category of reversal type; gray dots with darker edges represent odors 2 or 4, while the remaining dots represent odors 1 or 3. (D) Difference in the mean fitted amplitude of CS response between DA and 5-HT during negative reversal (left) and during positive reversal (right). **p<0.01, ***p<0.001.

https://doi.org/10.7554/eLife.20552.020

A potentially important consequence of the difference in CS learning time constants is that it implies an asymmetry between DA and 5-HT systems in positive and negative reversals (Solomon and Corbit, 1974). During a positive reversal, because the adaptation of 5-HT cue responses is much slower than that of DA cue responses, the net signal will be transiently biased towards the effects of DA (Figure 9D, right). Conversely, during a negative reversal, because 5-HT cue responses persist longer than those of DA, the difference will be biased towards the effects of 5-HT (Figure 9D, left). This suggests a novel mechanism by which 5-HT can contribute to preventing perseverative responding during negative reversals (Clarke et al., 2007), by directly inhibiting behavioral responses to CSs that have undergone decreases in associated outcome values.
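The asymmetry described above follows directly from two exponentials with different time constants, as the sketch below illustrates. It assumes that both cue responses are normalized to the same amplitude and relax exponentially toward their new value. The time constants `TAU_DA` and `TAU_5HT` are hypothetical, chosen only so that the 5-HT response adapts more slowly; they are not the fitted values from Figure 9.

```python
import math

# Hypothetical time constants (in trials); only their ordering matters here.
TAU_DA = 30.0
TAU_5HT = 150.0

def cs_response(trial, start, end, tau):
    """Normalized cue response relaxing exponentially from start to end."""
    return end + (start - end) * math.exp(-trial / tau)

def net_signal(trial, start, end):
    """Difference between DA and 5-HT cue responses on a given trial."""
    da = cs_response(trial, start, end, TAU_DA)
    sht = cs_response(trial, start, end, TAU_5HT)
    return da - sht

n_trials = 600
# Negative reversal: the cue response falls from 1 to 0.
negative = [net_signal(t, 1.0, 0.0) for t in range(n_trials)]
# Positive reversal: the cue response rises from 0 to 1.
positive = [net_signal(t, 0.0, 1.0) for t in range(n_trials)]

print(round(min(negative), 3), round(max(positive), 3))
```

With these assumptions the difference is transiently negative (biased towards 5-HT) after a negative reversal and transiently positive (biased towards DA) after a positive reversal, and it returns towards zero once both responses have adapted.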

The DREADD inactivation experiment (Figure 1) supported the contribution of 5-HT to negative reversal learning, but did not distinguish whether the relevant activity occurs during the CS or the US. To test for a contribution of the CS-related activity, we asked whether the animal-to-animal variability in the time constant of behavioral adaptation (anticipatory licking) correlated with that of neural adaptation (CS magnitude). Remarkably, we observed a significant correlation between the time constant of the DRN 5-HT CS response and the time constant of CS-related licking for the negative reversals but not the positive ones (Figure 10), suggesting that these responses could be involved in adapting to negative reversals. Moreover, during such negative reversals the time constant of DRN 5-HT responses was longer than that of anticipatory licking for all animals (Figure 10; see Figure 10—figure supplement 1 for DA). We note that, while we expected that the adaptation of 5-HT CS responses to the reversal should be at least as slow as that of anticipatory licking for the two to be causally related, the fact that it was much slower (around eight times as slow) requires an explanation. One possibility is that our behavioral readout (that is, tongue protrusions long enough to be detected by our sensor) is just the ‘tip of the iceberg’ of motor responses to appetitive cues, and that other, covert, movements also need to be suppressed by 5-HT during relearning, so that 5-HT neurons need to be active until all motor responses to appetitive cues have disappeared. Alternatively, 5-HT CS responses may serve more than a mere motor suppression function during reversal learning, and contribute to the longer-lasting learning processes required for reversal learning (He et al., 2015), such as those that prevent spontaneous recovery following extinction training (Karpova et al., 2011).

Figure 10 with 1 supplement
The correlation between the speed of DRN 5-HT cue learning and anticipatory licking.

(A) Correlation between time constants of 5-HT CS responses and anticipatory licking for the negative reversal. A significant linear relationship was found: y = 8.4x + 34; r² = 0.288; F = 5.67, p = 0.032. (B) Same as (A) for positive reversals (no relationship was found). Diagonal dashed lines represent y = x.

https://doi.org/10.7554/eLife.20552.023

Discussion

We used a reversal learning task in head-fixed mice to study the role of 5-HT in adapting to the reversal of cue–outcome contingencies, a model of the cognitive flexibility required to adapt to dynamic environmental conditions. Pharmacogenetic inhibition of DRN 5-HT neurons showed that 5-HT activity contributes to preventing perseverative responses to formerly reward-predictive cues, consistent with previous work in rodents and primates (Clarke et al., 2004, 2007; Boulougouris and Robbins, 2010; Bari et al., 2010; Brigman et al., 2010; Berg et al., 2014; Bari and Robbins, 2013). These observations suggest two possible complementary contributions of 5-HT to behavioral flexibility: (1) to facilitate the learning of new associations and (2) to directly inhibit responses which are no longer appropriate. To elucidate how the dynamics of endogenous neural activity could support these functions, we used fiber photometry to monitor 5-HT and DA during reversal learning. This revealed two important findings.

DRN 5-HT neurons are activated by both positive and negative reward prediction errors

First, we found that 5-HT US responses were strongly sensitive to changes in cue–outcome contingency after the reversal. Remarkably, 5-HT neurons responded with a similar transient excitation to violations of expectation that were either better-than-expected or worse-than-expected reward outcomes. Midbrain DA neurons, on the other hand, responded oppositely to better-than-expected and worse-than-expected outcomes. Thus, whereas DA neurons could be described as reporting a signed RPE, 5-HT neurons appeared to report, in part, an unsigned RPE (but see below for discussion of responses to aversive events). That is, 5-HT neurons were sensitive not to the direction of error but to its magnitude. These responses could also be described as a type of ‘surprise’ signal (for example, Courville et al., 2006). Supporting this idea, we found that 5-HT neurons were also sensitive to substitution of one neutral cue for a cue of another modality (sound for odor). It remains to be determined whether these responses were dictated entirely by small differences in reward value, or whether they reflect sensory as well as value prediction errors.

Unsigned prediction error signals have been proposed on theoretical grounds to be ideal for regulating learning and attention based on uncertainty (Pearce and Hall, 1980; Courville et al., 2006). By reporting such signals, 5-HT US responses would be suitable to drive plasticity and re-learning during reversal of contingencies. The strong excitatory response of the 5-HT system to negative RPEs, caused by reward omissions, provides a possible explanation for why inhibiting this system impairs negative reversal learning (Clarke et al., 2007; Bari et al., 2010) (Figure 1). That is, during negative reversals or extinction learning, the 5-HT system, either directly or through an interaction with the DA system (Boureau and Dayan, 2011), could facilitate trial-by-trial undoing of DA-dependent learning. Since 5-HT neurons also respond during positive prediction errors, such as during positive reversal or initial learning, such activation might compete with co-occurring DA signals, slowing positive learning, as has been described (Fletcher et al., 1999). The preferential involvement of 5-HT in ‘unlearning’ responses could be explained by the relative effects of 5-HT release on downstream targets, where 5-HT may favor long-term depression (LTD) and DA long term potentiation (LTP) (He et al., 2015).

Response to aversive events by DRN 5-HT neurons

5-HT US responses contained one notable divergence from an idealized prediction error: air puff USs continued to evoke responses, even after extensive training, and showed only minor sensitivity to the presence of a predictive cue — observations consistent with a previous report (Cohen et al., 2015). One possible explanation is that mice failed to learn the predictive relationship between the CS and the air puff. Indeed, mice showed air puff–triggered blink responses, but failed to learn anticipatory blinking responses despite extensive training. This result likely depends on the relatively long duration of the CS–US trace period, here 2 s (Reynolds, 1945; Boneau, 1958; Cohen et al., 2012, 2015; Caro-Martín et al., 2015; cf. Matsumoto et al., 2016). This is consistent with the idea that mice did not learn the CS–air puff association. If mice formed no CS-dependent predictions about the air puff, then they might have experienced each air puff as ‘unpredicted’, whether before or after reversal. In this case, the presence of robust air puff US responses would be consistent with an unsigned value prediction error. However, since we have no explanation for how mice could succeed in learning a CS–reward association while failing to learn the CS–air puff association, other explanations should also be considered.

A second possible line of explanation for the observation that the air puff did not elicit an increased response after the reversal is that 5-HT neurons report at least two qualitatively distinct signals: one relating to the processing of rewards and the other to the processing of aversive stimuli. In principle, following a reversal from large reward to air puff, one would have expected a contribution of the reward omission response to the US response, as seen in the small reward to neutral reversal. The lack of such a response could indicate either simple saturation or a suppressive influence of the air puff on the reward omission signal. A distinction between the encoding of rewarding vs. aversive events by the DA system has been proposed (Fiorillo, 2013). The presence of dual signals might reflect the inclusion of multiple 5-HT neuronal populations within our photometric recordings. In future experiments, these could be distinguished using a pathway-specific labeling, as has been done in the DA system (Lerner et al., 2015; see further discussion below). On the other hand, VTA DA neurons have been reported to integrate reward and aversive outcome values, but with aversive responses being strongly modulated by the rate of reward available in the current context (Matsumoto et al., 2016). In future experiments, it will be important to understand how individual 5-HT neurons integrate information from combinations of outcomes, and in different reward contexts.

An alternative possibility is that the pattern of 5-HT US responses could be understood together as a variation on a prediction error signal. Whereas mice can control the consumption of available water, they cannot control the delivery of air puffs; they are afforded no means to escape in the head-fixed configuration. It is therefore interesting to consider the possibility that 5-HT neurons might report errors of control rather than errors of prediction. Under this hypothesis, an aversive outcome such as the air puff continues to generate a response in 5-HT neurons because the organism has not managed to control this aspect of its environment. If the mouse were offered a means to escape, we would expect to see the air puff response diminish. Conversely, because the 5-HT US response is also sensitive to errors of a positive nature, we would also expect to see continued responses to a non-controllable reward, for example, direct oral infusion of sucrose (Li et al., 2016). Such ‘unsigned control errors’ could provide the organism with a signal of the magnitude of cognitive or behavioral effort required to adapt to a given situation, a signal that could be read out for the purpose of energizing or deenergizing actions.

Consistent with the control error hypothesis, predictable but uncontrollable shocks robustly activate the immediate early gene Fos in DRN 5-HT neurons (Takase et al., 2004), and this activation is lowered by controllability signals from the ventral medial prefrontal cortex (Bland et al., 2003; Amat et al., 2005). This proposal also finds support in a recent study showing that DRN 5-HT activity mediates short-term sensorimotor adaptation in zebrafish, by reporting the difference between the expected and actual sensory consequences of motor commands (Kawashima et al., 2016). However, further experiments will clearly be necessary to test these ideas as explanations of the present data.

5-HT CS responses could be responsible for inhibiting perseverative responding

5-HT could thus contribute to cognitive flexibility not only through learning and plasticity, but also by directly suppressing activity in systems responsible for violated predictions. Indeed, 5-HT has been strongly associated with suppressing both impulsive and perseverative responses through ‘behavioral inhibition’ (Clarke et al., 2007; Boureau and Dayan, 2011; Cools et al., 2011). In addition to US signals that could explain the contribution of 5-HT to uncertainty-driven learning, we also found CS or cue responses that could explain a direct and immediate contribution to behavioral control during environmental change. We found that 5-HT CS responses, like DA CS responses, were strongly positively correlated with CS value, consistent with previous reports (Liu et al., 2014; Cohen et al., 2015; Hayashi et al., 2015). Indeed, 5-HT and DA CS signals were qualitatively extremely similar, both after initial training and after relearning. Given that 5-HT and DA are thought to drive opposing processes of behavioral inhibition and invigoration, respectively (Boureau and Dayan, 2011; Cools et al., 2011), this would suggest that the two systems effectively cancel one another out. However, surprisingly, we found that the CS responses of 5-HT neurons were not only much slower than DA neurons to adapt to new associations after the reversal, but were also maintained throughout the extinction of the maladaptive perseverative response, as would be needed to prevent interference of the old appetitive response. Furthermore, there was a significant correlation across animals in the post-reversal learning rates of trial-by-trial 5-HT activity and that of anticipatory licking (Figure 10).

This difference in rates of adaptation between the two systems, which to our knowledge was not previously reported in any neuromodulatory system, implies that the net balance between DA and 5-HT will undergo specific dynamics during learning that resemble the classical proposal concerning opponent processes by Solomon & Corbit (1974). Specifically, because DA cue responses are quicker to establish, cues undergoing positive changes in outcome value will temporarily favor DA signals. Conversely, because DA cue responses are also quicker to withdraw, cues undergoing negative changes in outcome value will temporarily favor 5-HT signals. Thus, positive changes will favor DA and behavioral invigoration, and negative changes will favor 5-HT and behavioral suppression. This may explain why 5-HT is specifically critical in preventing responses to cues that were previously rewarding, which is observed experimentally (Figure 1; Clarke et al., 2007). The origins of the differences in 5-HT and DA learning dynamics will be important to uncover, and might arise from differences in the systems feeding into the two neuromodulators. Interestingly, neural responses in the caudate nucleus, a major recipient of DA projections (Clarke et al., 2011), adapt faster during reversal learning, while the PFC, a major target of 5-HT projections (Muzerelle et al., 2016), adapts more slowly (Pasupathy and Miller, 2005).
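The opponent dynamics described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that both cue responses relax exponentially toward the new cue value after a reversal and that the net signal is the simple difference DA minus 5-HT. The time constants (10 and 80 trials) and the function names are hypothetical and are not values or methods taken from the data.

```python
import math

def cue_response(trial, start, end, tau):
    """Cue response relaxing exponentially from `start` to `end`; reversal at trial 0."""
    return end + (start - end) * math.exp(-trial / tau)

def net_signal(trial, start, end, tau_da, tau_5ht):
    """DA cue response minus 5-HT cue response (positive values: DA dominates)."""
    da = cue_response(trial, start, end, tau_da)
    sht = cue_response(trial, start, end, tau_5ht)
    return da - sht

# Hypothetical time constants in trials: DA adapts faster than 5-HT.
TAU_DA, TAU_5HT = 10.0, 80.0

for trial in (0, 20, 400):
    positive = net_signal(trial, 0.0, 1.0, TAU_DA, TAU_5HT)  # cue value goes up
    negative = net_signal(trial, 1.0, 0.0, TAU_DA, TAU_5HT)  # cue value goes down
    print(f"trial {trial:3d}: positive reversal {positive:+.3f}, "
          f"negative reversal {negative:+.3f}")
```

Under these assumptions the two signals cancel before the reversal and again once both have converged. In between, the difference is transiently positive (DA-biased) after a positive reversal and transiently negative (5-HT-biased) after a negative one.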

Implications of neuronal heterogeneity and other complexities of the 5-HT system

The technique of fiber photometry of genetically-encoded calcium indicators provides excellent genetic specificity and stable long-term recordings, but does not allow the resolution of single-neuron responses. It is therefore possible that differential activity patterns within specific subpopulations of DRN 5-HT neurons exist that could not be resolved by this recording method. In fact, several studies point to a heterogeneity among DRN neurons, both in terms of physiological responses (Ranade and Mainen, 2009) and in terms of projection targets of DRN cell groups (Muzerelle et al., 2016) and single neurons (Gagnon and Parent, 2014). This would suggest that the different signals we observed—for example, CS vs. US or rewarding vs. aversive USs—could have different origins and functions.

Even if this is the case to some extent, and given the consistency of our optical fiber targeting (Figure 3—figure supplement 1), we believe that such heterogeneity probably will not substantially impact our conclusions for several reasons. First, importantly, we established that the population from which we are recording contributes to reversal learning, and it is therefore a relevant population. Second, activity patterns were consistent across mice (Figure 3—figure supplements 3 and 5, Figure 4, Figure 4—figure supplement 1, Figure 5, Figure 5—figure supplement 1, Figure 9—figure supplements 1 and 2), despite inevitable small variations in infections and fiber placements (Figure 3—figure supplement 1), indicating that the findings are robust to the precise population monitored. Third, single-unit recordings (Cohen et al., 2015; Hayashi et al., 2015) show that rewarding and aversive events activate the same individual DRN neurons (including identified 5-HT neurons), and are therefore not generated by distinct populations. Finally, because individual 5-HT neurons have broad projection fields (Muzerelle et al., 2016) and transmit primarily by volume conduction (Dankoski and Wightman, 2013), heterogeneity will tend to be averaged out through pooling by downstream targets.

Another limitation of our study relates to the pharmacogenetic approach to inhibiting 5-HT neurons. While it has good genetic specificity, its spatial resolution is limited by the spread of the viral particles containing the hM4Di receptor in the DRN, and its temporal resolution is on the order of dozens of minutes. Additionally, although we know this approach should inhibit 5-HT neurons in vivo (Teissier et al., 2015), we did not test the efficacy of this inhibition in our animals. The limited temporal resolution of this approach makes it impossible to distinguish the contribution of CS and US 5-HT signals to behavioral flexibility. Still, we have an indication that CS responses might play a role in behavioral inhibition of perseverative responding. This could potentially be resolved in future experiments using optogenetic inhibition.

Implications for the DA–5-HT opponency theory

Our results support, in a general sense, the long-standing notion of DA–5-HT opponency (Boureau and Dayan, 2011), but call for a refinement of this view. Rather than carrying the positive and negative sides of a single-signed prediction error (Daw et al., 2002; Boureau and Dayan, 2011; Cools et al., 2011), DA–5-HT opponency seems to be more complex and subtle. As has been classically described, the activity of DA neurons which we recorded closely resembled a so-called signed RPE (Schultz et al., 1997). The US-related 5-HT signals, on the other hand, resemble in important respects, but don’t perfectly match, the concept of an ‘unsigned RPE’ signal. Thus, 5-HT neurons responded not to an opposing class of events, but to an overlapping and broader range of events compared to DA. In this respect, they might be acting as a kind of inhibitory ‘surround’ to DA’s excitatory ‘center’, helping to sharpen the focus of behavioral attention. Nevertheless, just as DA signaling is increasingly acknowledged to be more complex than classically described (Cohen et al., 2012; Eshel et al., 2015; 2016; Matsumoto et al., 2016; Wise, 2004), attributing a single function to 5-HT neurons is also clearly an oversimplification.

With respect to CS responses, 5-HT neurons showed a remarkably similar pattern of activity to that of DA neurons, scaling closely with the value of the stimuli. A possible explanation for this observation is that 5-HT CS responses could be learned by the same DA-dependent process that generates DA CS responses. If this were the entire story, then 5-HT and DA CS responses might simply balance and nullify one another. However, the fact that 5-HT CS responses evolved much more slowly than did DA CS responses means that such a balance will not hold in dynamic environments. This dynamic balance between positive and negative forces resembles the balance of excitation and inhibition in the cortex (for example, Wehr and Zador, 2003), albeit on a much slower time scale. Such a temporal asymmetry between opponent processes endows the joint system with novel and potentially important dynamics, which may be an important substrate in the dynamics of learning, as previously proposed (Solomon and Corbit, 1974). CS and US responses of a similar nature to those observed in 5-HT and DA neurons also appear to be observed in other neuromodulatory systems as well (Yu and Dayan, 2005; Dayan and Yu, 2006; Sara and Bouret, 2012; Hangya et al., 2015). This suggests that, contrary to the notion that each neuromodulator reports a completely distinct signal (Daw et al., 2002; Doya, 2008; Dayan, 2012), they have highly overlapping signals, presumably derived from partly overlapping inputs, but with more subtle differences through which their joint actions are orchestrated.

This description of the dynamics of 5-HT neurons during reversal learning provides novel insights into how this system can contribute to cognitive flexibility. Moreover, the results also suggest the need for a refinement in conventional conceptions of 5-HT’s function in the regulation of mood, with implications for understanding its role in depression and other psychiatric disorders. More than reporting the affective value of the environment (Boureau and Dayan, 2011; Luo et al., 2016), we suggest that 5-HT facilitates the ability of an organism to adapt flexibly to dynamic environments through plasticity and behavioral control. The clinical benefits of an enhancement of 5-HT function would therefore stem not from directly biasing affective states toward the positive, but by preventing the negative consequences of maladaptive world views and facilitating adaptive change (Branchi, 2011).

Materials and methods

Animals

All procedures were reviewed and performed in accordance with the European Union Directive 2010/63/EU and the Champalimaud Centre for the Unknown Ethics Committee guidelines, and approved by the Portuguese Veterinary General Board (Direcção Geral de Veterinária, approvals 0420/000/000/2011 and 0421/000/000/2016). Thirty-four C57BL/6 male mice between two and nine months of age were used in this study. Mice resulted from the backcrossing of BAC transgenic mice into Black C57BL for at least six generations, and expressed the Cre recombinase under the control of specific promoters. Twenty-six mice expressed Cre under the serotonin transporter gene (Tg(Slc6a4-cre)ET33Gsat/Mmucd, from GENSAT; Gong et al., 2007; RRID:MMRRC_017260-UCD) and four mice under the tyrosine hydroxylase gene: two mice (Tg(Th-cre)FI12Gsat/Mmucd) from GENSAT (Gong et al., 2007; RRID:MMRRC_017262-UCD) and two mice (B6.Cg-Tg(Th-Cre)1Tmd/J) from the Jackson Laboratory (Savitt et al., 2005; RRID:IMSR_JAX:008601). Animals (25–45 g) were group-housed prior to surgery and individually housed post-surgery and kept under a normal 12 hr light/dark cycle. All experiments were performed in the light phase. Mice had free access to food. After training initiation, mice used in behavioral experiments had water availability restricted to the behavioral sessions.

Stereotaxic viral injections and fiber implantation

Mice were deeply anaesthetized with isoflurane mixed with O2 (4% for induction and 0.5–1% for maintenance) and placed in a stereotaxic apparatus (David Kopf Instruments). Butorphanol (0.4 mg/kg) was injected subcutaneously for analgesia and Lidocaine (2%) was injected subcutaneously before incising the scalp and exposing the skull. For SERT-Cre mice a craniotomy was drilled over lobule 4/5 of the cerebellum, and a pipette filled with a viral solution was lowered to the DRN (bregma −4.55 anteroposterior (AP), −2.85 dorsoventral (DV)) with a 32° angle toward the back of the animal. For the two TH-Cre mice from The Jackson Laboratory, the pipette was targeted to the VTA (bregma −3.3 AP, 0.35 mediolateral (ML), −4.2 DV) with a 10° lateral angle, and for the two TH-Cre mice from GENSAT we targeted the SNc (bregma −3.15 AP, 1.4 ML, −4.2 DV). Although the TH-Cre lines have been characterized as less specific than other DA-specific lines (Lammel et al., 2015), we targeted our fibers to areas where this specificity problem is reduced (Lammel et al., 2015) and that are known to contain the classical DA neurons that show RPE activity and are involved in reward processing (Lammel et al., 2011, 2012; Matsumoto and Hikosaka, 2009; Lerner et al., 2015; Kim et al., 2016).

Viral solution was injected using a Picospritzer II (Parker Hannifin) at a rate of approximately 38 nl per minute. The expression of hM4D and of all fluorophores was Cre-dependent, and all viruses were obtained from the University of Pennsylvania (with 10¹² or 10¹³ GC/mL titers). For hM4D experiments 1 µl AAV2/5 - Syn.DIO.hM4D.mCherry was injected in the DRN of 8 SERT-Cre mice. No virus was injected in WT controls (n = 4). For analysis of GCaMP6s specific expression in 5-HT neurons, four SERT-Cre mice were transduced in the DRN with 1 µl of viral stock solution of AAV2/1 - Syn.Flex.GCaMP6s.WPRE.SV40. For behavioral experiments in control mice (four SERT-Cre mice), 1.5 µl of a mixture of equal volumes of AAV2/1.EF1a.DIO.eYFP.WPRE.hGH and of AAV2/1.CAG.FLEX.tdTomato.WPRE.bGH was used. For the remaining mice, a mixture of equal volumes of AAV2/(1 or 9).Syn.Flex.GCaMP6s.WPRE.SV40 and of AAV2/1.CAG.FLEX.tdTomato.WPRE.bGH was injected: 1.5 µl in ten SERT-Cre mice (distributed around six points around the target coordinates) and 0.75 µl of 10 times diluted mixture in four TH-Cre mice (distributed around four points around the target).

For photometry experiments, optical fiber implantations were done after infection and a head plate for head fixation was placed above Bregma; the skull was cleaned and covered with a layer of Super Bond C and B (Morita). An optical fiber (300 µm, 0.22 NA) housed inside a connectorized implant (M3, Doric Lenses) was inserted in the brain, with the fiber tip positioned at the target for SERT-Cre mice and 200 µm above the infection target for TH-Cre mice. The implants were secured with dental acrylic (Pi-Ku-Plast HP 36, Bredent).

Behavioral training and testing protocol

Mice were water-deprived in their home cage on the day of surgery, or up to five days before it. During water deprivation each mouse’s weight was maintained above 80% of its original value. Following infection and implantation surgery, mice were habituated to the head-fixed setup by receiving water every 4 s (6 µl drops) for three days, after which training in the odor-guided task started. A mouse nose poke (007120.0002, Island Motion Corporation) using an IR photoemitter-photodetector was adapted to measure licking as IR beam breaks. To deliver air puffs, a pulse of air was delivered through a tube to the right eye of the mouse. Sounds signaling the beginning of the trial and the outcomes were amplified (PCA1, PYLE Audio Inc.) and presented through speakers (Neo3 PDRW W/BC, Bohlender-Graebener). Water valves (LHDA1233115H, The Lee Company) were calibrated and a custom made olfactometer designed by Z.F.M. (Island Motion) was used for odor delivery. The behavioral control system (Bcontrol) was developed by Carlos Brody (Princeton University) in collaboration with Calin Culianu, Tony Zador (Cold Spring Harbor Laboratory) and Z.F.M. Odors were diluted in mineral oil (Sigma-Aldrich) at 1:10 and 25 µl of each diluted odor was placed inside a syringe filter (2.7 µm pore size, 6823–1327, GE Healthcare) to be used in two sessions (~100 trials for each odor). Odorized air was delivered at 1000 ml/min. Odors used were carvone (R)-(-), 2-octanol (S)-(+), amyl acetate and cuminaldehyde. For the behavioral task used in the hM4D experiment, these odors were associated with reward, reward, nothing and nothing, respectively. For the behavioral task used in the GCaMP6s experiment, they were associated with a large reward (4 µl water drop), small reward (2 µl water drop), neutral (no outcome) and punishment (air puff to the eye) before reversal, and with punishment, neutral, small reward and large reward after the reversal of the cue–outcome associations, respectively. 
In each trial, white noise was played to signal the beginning of the trial and to mask odor valve sounds. A randomly selected odor was presented for 1 s. Following a 2 s trace period, the corresponding outcome was available. Mice completed one session per day. For hM4D experiments, odors were introduced in pairs. For photometry experiments, training started by presenting only the large and small reward trials to the mice, followed by the introduction of the neutral type of trial in the next session, and finally the punishment trial in the following one. Punishment trials were presented gradually until all four types of trials had the same probability of occurrence and each session consisted of 140–346 trials (minimum to maximum, 223 ± 30, mean ± SD). Time to odor (foreperiod), trace period and inter-trial interval (ITI) were also gradually increased during training until mice could do the task with their final values: foreperiod was 3 to 4 s, taken from a uniform distribution, trace was fixed at 2 s, ITI was 4 to 8 s taken from a uniform distribution.
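The trial structure of the photometry task can be summarized in a short sketch. It uses the final timing values and the odor–outcome assignments given above; the function name, the dictionary layout and the use of Python's `random` module are illustrative assumptions, not a description of the Bcontrol implementation.

```python
import random

ODOR_S = 1.0   # odor presentation, seconds
TRACE_S = 2.0  # fixed trace period, seconds

ODORS = ["carvone", "2-octanol", "amyl acetate", "cuminaldehyde"]
PRE_REVERSAL = ["large reward", "small reward", "neutral", "air puff"]
POST_REVERSAL = ["air puff", "neutral", "small reward", "large reward"]

def sample_trial(rng, reversed_contingency=False):
    """Draw one trial: odor identity, its outcome, and event times in seconds."""
    idx = rng.randrange(len(ODORS))      # four trial types, equal probability
    outcomes = POST_REVERSAL if reversed_contingency else PRE_REVERSAL
    foreperiod = rng.uniform(3.0, 4.0)   # white-noise onset to odor onset
    iti = rng.uniform(4.0, 8.0)          # inter-trial interval
    return {
        "odor": ODORS[idx],
        "outcome": outcomes[idx],
        "odor_onset": foreperiod,
        "us_onset": foreperiod + ODOR_S + TRACE_S,
        "iti": iti,
    }

rng = random.Random(0)  # fixed seed only so the illustration is repeatable
print(sample_trial(rng))
print(sample_trial(rng, reversed_contingency=True))
```

With these values the outcome always becomes available 3 s after odor onset (1 s odor plus 2 s trace), regardless of the sampled foreperiod.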

hM4D experiments were run in two batches: the experiment was run first on the WT animals and then on the SERT-Cre animals (with some overlapping days). Photometry experiments were run in five batches in the following sequence: 3 (SERT-Cre, experimental)+3 (SERT-Cre, experimental and YFP controls)+2 (SERT-Cre YFP controls)+6 (SERT-Cre, experimental)+4 (TH-Cre, experimental).

For the hM4D experiments, mice received a daily injection of vehicle (saline 0.9% and DMSO 0.25%) 40 min before session start. The volume of these daily injections of vehicle was determined according to each mouse’s weight, and it required an adjustment of the water drop size for each mouse to keep them motivated to do 150 trials per session. On the reversal day and the two following days, for experimental mice, CNO was diluted in the vehicle solution and delivered at a dose of 3 mg/kg. In both reversal learning tasks used, we ensured that mice could correctly perform the task on at least three consecutive days before reversing the odor–outcome contingency for the first time. On the reversal day, mice started the session as before and the contingencies were reversed at trial 50 in the hM4D experiment, and between the 32nd and the 100th trial (73 ± 12, mean ± SD) in the GCaMP6s experiments. One SERT-Cre mouse was excluded from the hM4D analysis for not showing a differential lick rate within 1.5 s of US delivery, between odors 1 and 2 (rewards) and odors 3 and 4 (nothing). Two mice were excluded from the GCaMP6s data analysis because of poor fiber placement assessed after histology analysis (more than 400 µm away from the infection area): one SERT-Cre and one TH-Cre mouse. Additionally, another SERT-Cre mouse was discarded from the reversal data analysis because of experimental problems with the fiber during the reversal session. In four SERT-Cre mice and in all TH-Cre mice, at five to six days after the reversal, we introduced uncued US trials during the task. These trials represented approximately 20% of the total number of trials in a session during which no odor cue was presented; the typical white noise of the foreperiod was immediately followed by one of the four possible outcomes, randomly selected (11 ± 4 uncued vs 44 ± 8 cued trials per session, mean ± SD). 
To analyze these data, four sessions with cued and uncued outcomes were pooled together for each mouse. All GCaMP6s experiments were performed within the limit of one month from the viral injection date, to avoid cell death due to over-expression of GCaMP6s in neurons.

Fiber photometry setup

The dual-color fiber photometry acquisition setup consists of a three-stage tabletop black case containing optical components (filters, dichroic mirrors, collimator), two light sources for excitation and two photomultiplier tubes (PMTs) for acquisition of fluorescence of a green (GCaMP6s) and of a red (tdTomato) fluorophore.

We used a 473 nm (maximum power: 30 mW) and a 561 nm (maximum power: 100 mW) diode-pumped solid-state laser (both from Crystalaser) for excitation of GCaMP6s and of tdTomato, respectively. Beamsplitters (BS007, Thorlabs) and photodiodes (SM1PD1A, Thorlabs) were used to monitor the output of each laser. The laser beams were attenuated with absorptive neutral density filters (Thorlabs), and each was aligned to one of the two entrances of the three-stage tabletop black case (Doric Lenses). At the corresponding entrances the excitation filters used were 473 nm (LD01-473/10-25 Semrock) and 561 nm (LL02-561-25 Semrock). Inside the black case three interchangeable/stackable cubes (Doric Lenses) with dichroic mirrors were used: one to separate the 473 nm excitation light from longer wavelengths (Chroma T495LP), one to collect the emission light of GCaMP6s (FF552-Di02−25 × 36 Semrock), and one to separate the 561 nm excitation light from tdTomato’s fluorescence (Di01-R561−25 × 36). A collimator (F = 12 mm, NA = 0.50, Doric Lenses) focused the laser beams in a single multimode silica optical fiber with 300 µm core and 0.22 NA (MFP_300/330/900–0.22_2.5m-FC_CM3, Doric Lenses), which was used for transmission of all excitation and emission wavelengths. The three-stage tabletop black case had two exits, one for each fluorophore emission, at which we placed the corresponding emission filters (Chroma ET525/50m for GCaMP6s and Semrock LP02-568RS-25 for tdTomato), and convergent lenses (F = 40 mm and F = 50 mm, Thorlabs) before the photodetectors (photomultiplier tube module H7422-02, Hamamatsu Photonics). The output signals of the PMTs were amplified by a preamplifier (C7319, Hamamatsu), acquired in a Micro1401-3 unit at 5000 Hz and visualized in Spike2 software (Cambridge Electronic Design).

Light power at the tip of the patchcord fiber was 200 µW for each wavelength (473 nm and 561 nm) for all experiments (measured before each experiment with a power meter PM130D, Thorlabs). This patchcord fiber was attached to the fiber cannula each animal had implanted (MFC_300/330–0.22_5 mm_RM3_FLT Fibre Polymicro, polyimide removed) through a titanium M3 thread receptacle.

Data analysis

All data were analyzed in MATLAB (RRID:SCR_001622). For the behavioral experiments, lick rate was acquired at 1 kHz and smoothed using convolution with a Gaussian filter of 50 ms standard deviation. Mean anticipatory licking was calculated for each trial as the mean lick rate in the period of 500–2800 ms after odor onset, after subtracting the mean lick rate over a baseline period of −500 to 500 ms around odor onset. To evaluate the aversiveness of the air puff delivered to the mice in the photometry experiment, we used a CCD camera (Point Grey) to record the right eye of six mice during several sessions at 60 Hz. To quantify blinking in video data, we manually selected the eye area in each session and calculated the mean pixel value for that area; then, for each frame, we subtracted this value from the previous frame to obtain a measure of movement. The start and end of blinking created a sudden increase and decrease, respectively, in the difference between the mean pixel value of consecutive video frames. In the time course analysis of the licking behavior in the hM4D experiment, trials of sessions around reversal were concatenated and smoothing over three trials was performed along the trials. For each reversed odor and each mouse, the last 50 trials before reversal were fit by a constant function of the form (A+B); the first 200 trials after the reversal were fit by an exponential function of the form (A+B*exp(-t/τ)) using fminsearch in MATLAB. The conditions for this fitting to be done were: the last 100 trials before reversal had to be statistically different from trials 100–200 after the reversal (t-test), the change in licking pattern had to follow the correct trend of the reversal (increase in licking for positive reversals and decrease in licking for negative ones), and the time constant obtained had to be larger than 1. Mouse–odor pairs that did not fulfill these conditions were excluded (that is, odor 4 of mice M#4 and M#5). 
Time constants were grouped according to the type of reversal and genotype with drug treatment, and compared using one-way ANOVA. Then, for each SERT-Cre mouse, the time constant of the reversal with the vehicle was subtracted from the reversal with CNO. The same was done for WT mice, but subtracting the time constant of reversal two from that of reversal 1 (since CNO was delivered in both). t-tests were used to determine whether these differences had means significantly different from zero.
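
The lick-rate smoothing and the baseline-subtracted anticipatory licking measure described above can be illustrated with a minimal sketch. This is not the authors' MATLAB code: the function names, the kernel truncation at four standard deviations, and the zero padding at the trace edges are assumptions made for illustration, and the trial is assumed to contain at least 500 ms of data before odor onset.

```python
import math

def gaussian_kernel(sd_samples, half_width_sds=4.0):
    """Normalized Gaussian kernel; SD is given in samples (50 at 1 kHz = 50 ms).
    Truncation at 4 SDs is an assumption, not stated in the Methods."""
    half = int(round(half_width_sds * sd_samples))
    weights = [math.exp(-0.5 * (i / sd_samples) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, kernel):
    """Convolve signal with a symmetric kernel, keeping the original length.
    Samples outside the trace are treated as zero (assumed edge handling)."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out

def mean(values):
    return sum(values) / len(values)

def anticipatory_licking(lick_rate, odor_onset, fs=1000):
    """Mean lick rate 500-2800 ms after odor onset minus the mean rate
    in the -500 to 500 ms baseline window around odor onset."""
    per_ms = fs / 1000.0
    response = lick_rate[odor_onset + int(500 * per_ms): odor_onset + int(2800 * per_ms)]
    baseline = lick_rate[odor_onset - int(500 * per_ms): odor_onset + int(500 * per_ms)]
    return mean(response) - mean(baseline)
```

For a binary lick train sampled at 1 kHz, one would pass it through `smooth(lick_train, gaussian_kernel(50))` and multiply by the sampling rate to obtain a rate in licks per second before calling `anticipatory_licking`.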

Fluorescence data were downsampled to 1 kHz and smoothed using convolution with a Gaussian filter of 100 ms standard deviation. For each trial, the relative change in fluorescence, ΔF/F0 = (F-F0)/F0, was calculated by taking F0 to be the mean fluorescence during a 1 s period before the odor presentation for both the red and the green channels ([ΔF/F0]GREEN and [ΔF/F0]RED). For each session and each mouse, the distribution of green and red values of ΔF/F0 was fitted by the sum of two Gaussians along the red channel, and the crossing point between these two Gaussians was used as a boundary (excluding the first and last 1000 ms of each trial because of filtering artifacts). All values of [ΔF/F0]RED below this boundary were used, together with the corresponding [ΔF/F0]GREEN, to fit a linear regression line. Then, for each trial we corrected the green ΔF/F0 values using the parameters (a: slope; b: offset) obtained with the regression model of that mouse in that session: [ΔF/F0]GREEN_corr = [ΔF/F0]GREEN - a*[ΔF/F0]RED - b.
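
As a rough illustration of the correction above, the following sketch computes ΔF/F0 and then removes the red-channel contribution from the green channel by linear regression. It is a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the boundary between the two fitted Gaussians is passed in as a precomputed number (`boundary`) rather than estimated from the data.

```python
def delta_f_over_f(trace, baseline_start, baseline_stop):
    """Return (F - F0) / F0, with F0 the mean of trace[baseline_start:baseline_stop]
    (in the Methods, the 1 s period before odor presentation)."""
    baseline = trace[baseline_start:baseline_stop]
    f0 = sum(baseline) / len(baseline)
    return [(f - f0) / f0 for f in trace]

def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def correct_green(green, red, boundary):
    """Fit green against red using only samples whose red value lies below
    `boundary` (assumed to be the crossing point of the two Gaussians),
    then return green - a * red - b for every sample, plus a and b."""
    selected = [(r, g) for r, g in zip(red, green) if r < boundary]
    a, b = fit_line([r for r, _ in selected], [g for _, g in selected])
    corrected = [g - a * r - b for g, r in zip(green, red)]
    return corrected, a, b
```

In this sketch the regression parameters would be fitted once per mouse and session and then applied to every trial of that session, as the text describes.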

Behavioral data were organized as a function of US type and divided into CS and US responses. [ΔF/F0]GREEN_corr US responses were normalized by subtracting the mean [ΔF/F0]GREEN_corr over the 1 s interval before US onset. The CS or US response was considered the mean of the signal during the 1.5 s period after CS or US onset, respectively. For each mouse, all CS and US responses were z-scored in the expert phase, so that the amplitudes of responses to the different events could be compared. Analysis of US responses across days was performed by z-scoring all US responses of each mouse across days for each US type. Statistical analysis was done by comparing each day with pre-reversal days −1 and −2. For each mouse, mean amplitude of response to each US on the reversal day was also compared to the day before the reversal. For analysis of uncued US trials, four days of each mouse were pooled together due to the small number of uncued trials of each US type in each session.
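
The response measure and z-scoring described above could be sketched as follows. The function names are hypothetical, and the use of the sample standard deviation (the default of MATLAB's `zscore`) is an assumption, since the Methods do not state which normalization was used.

```python
import statistics

def event_response(trace, onset, fs=1000, window_s=1.5, baseline_s=0.0):
    """Mean of the signal in the 1.5 s after event onset. For US responses,
    set baseline_s=1.0 to subtract the mean over the 1 s before US onset."""
    n_win = int(window_s * fs)
    response = statistics.fmean(trace[onset:onset + n_win])
    if baseline_s > 0:
        n_base = int(baseline_s * fs)
        response -= statistics.fmean(trace[onset - n_base:onset])
    return response

def zscore(values):
    """Z-score a list of responses (sample standard deviation, an assumption)."""
    mu = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]
```

Pooling all CS and US responses of one mouse into a single list before calling `zscore` puts the different events on a common scale, which is the purpose stated in the text.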

For the analysis of CS response time courses during a reversal, each mean amplitude change across trials was fitted by an exponential function with maximum time constant of 225 trials (minimum number of trials after the reversal for any US type of any mouse). The same criteria and parameters used for the hM4D experiments were used here. Time constants for mouse–odor pairs were pooled together in pairs (odors 1 and 2, and odors 3 and 4) which correspond to the negative and positive reversals, respectively.
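
A bounded exponential fit of the form A+B*exp(-t/τ) can be sketched as below. Note that this is a substitute technique: the authors used MATLAB's fminsearch, whereas this sketch scans τ on a grid between 1 and 225 trials and solves for A and B by closed-form least squares at each candidate τ. The function name, the grid resolution and the lower bound of 1 trial are assumptions chosen to mirror the limits mentioned in the text.

```python
import math

def fit_exponential(y, tau_min=1.0, tau_max=225.0, n_tau=225):
    """Fit y[t] ~ A + B * exp(-t / tau) for t = 0, 1, 2, ... (trials).
    tau is scanned on a grid (assumed approach, not fminsearch); for each tau,
    A and B are the least-squares solution. Returns (A, B, tau)."""
    n = len(y)
    mean_y = sum(y) / n
    best = None
    for k in range(n_tau):
        tau = tau_min + (tau_max - tau_min) * k / (n_tau - 1)
        e = [math.exp(-t / tau) for t in range(n)]
        mean_e = sum(e) / n
        see = sum((ei - mean_e) ** 2 for ei in e)
        sey = sum((ei - mean_e) * (yi - mean_y) for ei, yi in zip(e, y))
        b_hat = sey / see
        a_hat = mean_y - b_hat * mean_e
        sse = sum((yi - a_hat - b_hat * ei) ** 2 for yi, ei in zip(y, e))
        if best is None or sse < best[0]:
            best = (sse, a_hat, b_hat, tau)
    _, a_hat, b_hat, tau = best
    return a_hat, b_hat, tau
```

With the default settings the grid has a resolution of one trial, so a fitted τ that sits at either end of the grid indicates that the data do not constrain the time constant within the allowed range.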

The data are available from the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.649nk (Matias, 2016).

Immunohistochemistry and anatomical verification

Mice were deeply anesthetized with pentobarbital (Eutasil, CEVA Sante Animale), exsanguinated transcardially with cold saline and perfused with 4% paraformaldehyde (P6148, Sigma-Aldrich). Coronal sections (40 µm) were cut with a vibratome and used for immunohistochemistry. For SERT-Cre mice used in expression specificity analysis, anti-5-HT (36 hr incubation with rabbit anti-5-HT antibody 1:2000, Immunostar, RRID:AB_572263, followed by 2 hr incubation with Alexa Fluor 594 goat anti-rabbit 1:1000, Life Technologies) and anti-GFP immunostaining (15 hr incubation with mouse anti-GFP antibody 1:1000, Life Technologies, followed by 2 hr incubation with Alexa Fluor 488 goat anti-mouse 1:1000, Life Technologies) were performed sequentially. For SERT-Cre mice used in behavioral experiments, anti-GFP immunostaining was performed (15 hr incubation with rabbit polyclonal anti-GFP antibody 1:1000, Life Technologies, followed by 2 hr incubation with Alexa Fluor 488 goat anti-rabbit 1:1000, Life Technologies).

For TH-Cre mice, anti-GFP (15 hr incubation with rabbit polyclonal anti-GFP antibody 1:1000, Life Technologies, followed by 2 hr incubation with Alexa Fluor 488 goat anti-rabbit 1:1000, Life Technologies) and anti-TH immunostaining (15 hr incubation with mouse monoclonal anti-TH antibody 1:5000, Immunostar, RRID:AB_572268, followed by 2 hr incubation with Alexa Fluor 647 goat anti-mouse, 1:1000, Life Technologies) were performed sequentially.

To quantify the specificity of GCaMP6s expression in 5-HT neurons of SERT-Cre mice, we used a confocal microscope (Zeiss LSM 710, Zeiss) with a 20X objective (optical slice thickness of 1.8 µm) to acquire z-stacks of three slices around the center of infection. Images for DAPI, GFP and Alexa Fluor 594 were acquired, and cells expressing GCaMP6s and cells stained with 5-HT antibody were quantified in a 200 × 200 µm window in the center of the DRN. The same was done for quantification of specificity in DA neurons of TH-Cre mice, but acquiring Alexa Fluor 647 instead of 594, and taking the 200 × 200 µm window on the infection side. To evaluate fiber location in relation to infection, images for DAPI, YFP or GFP and tdTomato were acquired with an upright fluorescence scanning microscope (Axio Imager M2, Zeiss) equipped with a digital CCD camera (AxioCam MRm, Zeiss) with a 10X objective. The location of the fiber tip was determined as the most anterior brain damage made by the optical fiber minus the fiber radius. The center of infection was estimated through visual inspection of slices as the location with the most infected cells. The distance between the fiber tip location and center of infection was calculated as an anterior–posterior distance, which was estimated by comparing each corresponding location in the mouse brain atlas (Paxinos and Franklin, 2001). To determine the overlap between cells expressing YFP or GCaMP6s and tdTomato in SERT-Cre mice, we used a confocal microscope (Zeiss LSM 710, Zeiss) with a 20X objective (optical slice thickness of 1.8 µm) to image three slices around the center of infection (slices −1, 0 and 1, relative to it). All cell counts were done using the Cell Counter plugin of Fiji (RRID:SCR_002285).

References

  62. G Paxinos, KBJ Franklin (2001) Mouse Brain in Stereotaxic Coordinates. USA: Academic Press.
  74. RS Sutton, AG Barto (1998) Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.

Decision letter

  1. Joshua I Gold
    Reviewing Editor; University of Pennsylvania, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Firing patterns of serotonin neurons underlying cognitive flexibility" for consideration by eLife. Your article has been favorably evaluated by Timothy Behrens (Senior Editor) and three reviewers, one of whom is a member of our Board of Reviewing Editors. The following individual involved in review of your submission has agreed to reveal his identity: Jeremiah Y Cohen (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

This study by Mainen and colleagues examines the role of 5HT in reversal learning. They use a combination of techniques, including inactivation, imaging, and behavior in rodents, to show that dorsal raphe 5HT neurons encode both positive and negative prediction errors and may play a role in modulating perseverative behaviors. A particularly noteworthy feature of this study is that they compared response properties of 5HT and DA neurons, showing that they share many similarities but some key differences that help to distinguish their specific computational roles in adaptive behavior.

The reviewers agreed that the experiments are well done, the analyses sound, and the paper well written. The reviewers also commended the novel idea of how different time courses of learning could cause 5-HT to regulate behavior during certain forms of reversal.

Essential revisions:

1) The claim that "5-HT US responses resemble closely an unsigned prediction error" seems like an over-simplification, given the data. Several complications to this interpretation should be addressed more directly, including:

The very next paragraph in the Discussion notes the "one notable divergence" from this idea, involving the response to the air puff. The argument that this difference involves a difference in control is interesting, but not tested directly. Also, what about blinks, which presumably could reduce the aversiveness of the air puffs. And what is the evidence that rodents learned to predict the air puffs as well as the positive rewards? Moreover, consider the reversals between the large reward and air puff. In theory, both unexpected outcomes should generate equal unsigned prediction errors. However, only the response to the large reward is higher when unpredicted. Response to the air puff is unchanged, with a trend to being lower. Even if animals failed to respond to air puffs appropriately (as the authors suggest in the Discussion), they should still detect that on unexpected air puff trials there is an omission of big reward. We know they have detected this omission and learned from it because they rapidly reduced licking to the CS. This should entail a large unsigned prediction error to omission of big reward at least as large as the clear response to omission of small reward. Indeed, DA had similar inhibitory responses to unexpected neutral and air puff USs. However 5-HT's air puff expectation effect was non-significant and the response actually trended to be larger for expected air puff, which is the opposite effect from what it should have. It seems like the most parsimonious account is that 5-HT reports unsigned prediction error except when an air puff is present, in which case it ignores the unsigned prediction error and has a stereotyped excitation. This would be a very interesting wrinkle on the conclusions, because the authors argue throughout the paper that 5-HT is important for learning, so there should be a big difference in reward learning when air puff is present versus absent. However, the present experiments don't provide evidence of this.

The statement that "5-HT neurons showed little or no response to expected water rewards before reversal, but responded robustly to the same rewards when they were unexpected, after reversal" seems problematic. Specifically, unexpected small reward did not seem to have a robust response that was affected by expectation (Figure 2F, Figure 3—figure supplement 1). If 5-HT reported unsigned prediction error this response should be significant as it was in DA neurons.

In Figure 3B-C, 5-HT was activated by unexpected neutral tone but DA did not respond (at least as indicated in Figure 3; Figure 3—figure supplement 1 seems to show mild inhibition for unexpected neutral and air puff, so the absence of effect in Figure 3 may be an issue with the analysis windows). According to the logic of the paper, either 5-HT does not signal unsigned prediction error, or it is based on a different concept of prediction error than DA.

If 5-HT reports unsigned prediction error, "surprise", or "uncertainty", why is it activated by CSs in proportion to their value? If it reported unsigned prediction error then it should be more activated by CSs with extreme values than ones with intermediate values because those evoke the biggest unsigned prediction errors. If it reported surprise then it should be excited similarly by all CSs because all four CSs are equally probable. If it reported uncertainty then it should be more activated by all CSs shortly after reversal when their CS-US associations became more uncertain.

2) The statement that "rather than reporting the affective value of the environment…, we suggest that 5-HT facilitates the ability of an organism to flexibly adapt to dynamic environments through plasticity and behavioural control" is a very intriguing suggestion made throughout the paper. However, it is not tested directly, but perhaps could be with the existing data. Were there correlations between 5-HT activity and behavioral flexibility? 5-HT responds more to positive than negative CSs but animals seem to learn about both with similar speed (at least in Figure 4). 5-HT responds more to unexpected neutral than unexpected small reward, and more to unexpected large reward than unexpected air puff, so if the authors were correct animals should learn at different rates from those outcomes. This could be tested. Also, the authors measure learning rates for CS but not US responses. Are their learning rates consistent? This seems highly relevant to their interpretations. For instance 5-HT CS responses are slow to learn but Figure 2E makes US learning seem very fast. Does this mean that the US "surprise" response and CS "value" response are based on different expectations? If so this would be further evidence that the 5-HT responses obey different rules, and may not be explained by a single umbrella concept like "surprise" or "unsigned prediction error".

3) Given that we don't know the underlying distribution of single-5-HT-neuron calcium dynamics, it is difficult to interpret the point estimate of that distribution (population calcium dynamics measured here). This is especially salient in the context of electrophysiological recordings of individual DR neurons (including work from the senior author's lab [Ranade and Mainen, 2009]), that show a high degree of heterogeneity of individual DR (and identified 5-HT) neuronal responses. This is treated in the Discussion, but the limitations need to be more clearly defined. Indeed, Muzerelle et al., cited here to support the idea of diffuse projection targets of individual 5-HT neurons, actually shows quite a bit of target specificity, and agrees with work from Gagnon and Parent (PLoS ONE, 2014). Also, I strongly suggest changing the title, as firing patterns were not measured in this study.

4) The argument that previous studies did not find prediction-error-like responses due to weak prediction errors could be more clearly explained. Why is reversal a "stronger" prediction error than unexpected rewards or omissions? If anything, it seems that reversals as presented here are relatively weak in driving behavioral changes, because it took mice a couple of hundred trials to achieve asymptotic behavior. Could the different timescales of reversals compared to momentary deviations from predicted outcomes be the more relevant discrepancy? How is prediction error strength quantified here?

5) In Figure 1D, it appears that there may be a small difference in the time constant fit to reversals from negative to positive. Is this true for particular mice? Specifically, were any of the effects of inactivation on the time constant of positive reversal significant? Showing mouse-by-mouse data would be useful.

6) The authors should consider rearranging their figures for publication in eLife, which allows more than 4 figures. The figures are all very dense and are associated with up to 9(!) supplemental figures. Also, the critical comparison between 5-HT and DA is hard to understand because no DA CS or US traces are shown in the main figures. At the very least, the authors should present a direct comparison of the traces of 5-HT and DA responses before, during, and after reversal, which is the key result of the paper.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Activity patterns of serotonin neurons underlying cognitive flexibility" for further consideration at eLife. Your revised article has been favorably evaluated by Timothy Behrens (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) Figure 1—figure supplement 1B: presumably the y labels should be "Anticp. licking (norm. fit)", not "Time constant (trials)," correct?

2) Figure 2B: how is "eye movement or blinking" measured? What are the units that allowed those two measures to be combined (and then why is it listed as "a.u." in the figure)?

3) Figure 3—figure supplement 2: panels A and B are a bit hard to parse and could use some more explanation; e.g., these are (presumably) ordered by duration of the lick bout; they (presumably) show a fixed time before the onset of, and after the offset of, the lick bout; etc. Why do the trials at the bottom not appear to have "bouts"?

4) Figure 10A: This analysis is an interesting and welcome addition, but perhaps a bit more nuanced interpretation of these results would be useful. There seems to be an order of magnitude (at least) difference in how much longer the 5HT response versus the behavior persisted for a negative reversal.

https://doi.org/10.7554/eLife.20552.027

Author response

Essential revisions:

1) The claim that "5-HT US responses resemble closely an unsigned prediction error" seems like an over-simplification, given the data.

We agree with the reviewers that we had presented a simplified interpretation. We have tried to rectify this in the revised version.

A) We all-but-eliminated mention of “unsigned RPE” from the Results, keeping more strictly to the observations and retaining just two references to RPE (subsection “DRN 5-HT neurons respond to both positive and negative US prediction errors”).

B) We included an extended explicit Discussion (subsection “Response to aversive events by DRN 5-HT neurons”) emphasizing the possibility that rewarding and aversive responses reflect two distinct signals (as proposed by the reviewers) as well as the possibility that the air puff response reflects a kind of “control error”.

C) We acknowledge explicitly when mentioning unsigned RPE in the Discussion that the data only partly resembled this (subsection “DRN 5-HT neurons are activated by both positive and negative reward prediction errors”, first paragraph and subsection “Implications for the DA–5-HT opponency theory”, first paragraph).

Several complications to this interpretation should be addressed more directly, including:

The very next paragraph in the Discussion notes the "one notable divergence" from this idea, involving the response to the air puff. The argument that this difference involves a difference in control is interesting, but not tested directly.

We agree that the main point of divergence between the US data and the idealized prediction error signal is with respect to responses to aversive events. We have reduced the strength of this claim throughout the manuscript. We now directly consider the proposal that US responses to rewarding and aversive stimuli may reflect two independent sources of information as well as the “control” hypothesis in the Discussion subsection “Response to aversive events by DRN 5-HT neurons”.

Also, what about blinks, which presumably could reduce the aversiveness of the air puffs. And what is the evidence that rodents learned to predict the air puffs as well as the positive rewards?

Mice did not show anticipatory blinking to air puffs, despite extensive training. They did show air puff triggered blinking that was present before training. Therefore, one potential explanation for the continued presence of air puff responses after training and the relative insensitivity of the air puff response to reversals is that the air puff US was never predictable. Indeed, there is evidence from previous studies that, with a 2 s trace period, mice do not learn predictive eye blink responses (Reynolds 1945; Boneau 1958; Cohen et al., 2012, 2015). However, it does seem puzzling that mice would learn CS-US relationships with the same trace period for rewards but not for air puffs. We now discuss this issue explicitly in the Discussion (subsection “Response to aversive events by DRN 5-HT neurons”, first paragraph).

Moreover, consider the reversals between the large reward and air puff. In theory, both unexpected outcomes should generate equal unsigned prediction errors. However, only the response to the large reward is higher when unpredicted. Response to the air puff is unchanged, with a trend to being lower. Even if animals failed to respond to air puffs appropriately (as the authors suggest in the Discussion), they should still detect that on unexpected air puff trials there is an omission of big reward. We know they have detected this omission and learned from it because they rapidly reduced licking to the CS. This should entail a large unsigned prediction error to omission of big reward at least as large as the clear response to omission of small reward. Indeed, DA had similar inhibitory responses to unexpected neutral and air puff USs. However 5-HT's air puff expectation effect was non-significant and the response actually trended to be larger for expected air puff, which is the opposite effect from what it should have. It seems like the most parsimonious account is that 5-HT reports unsigned prediction error except when an air puff is present, in which case it ignores the unsigned prediction error and has a stereotyped excitation.

The reviewers raise an interesting point. After the reversal from large reward to air puff the air puff US represents not only an aversive stimulus, but also the omission of an expected reward. Even if the air puff is equally unexpected before and after reversal, the presence of the additional reward omission should generate a positive US response, just as the omission of the small US response did.

We admit that we do not have a full explanation for this phenomenon. We agree that one possibility is that the air puff has an inhibitory modulatory effect on the reward omission response. This is reminiscent of the interactions recently reported by Matsumoto et al. (2016). Another possibility is simply that the air puff response alone is already at a level close to saturation and the absence of significant additional effect reflects a ceiling effect. We now discuss these issues in the Discussion (subsection “Response to aversive events by DRN 5-HT neurons”).

This would be a very interesting wrinkle on the conclusions, because the authors argue throughout the paper that 5-HT is important for learning, so there should be a big difference in reward learning when air puff is present versus absent. However, the present experiments don't provide evidence of this.

We do not wish to claim and are careful not to state that the 5-HT system is solely responsible for reversal learning (see point 2 below). Therefore, the idea that the speed of learning of specific CS-US associations will depend directly on the magnitude of 5-HT US responses is not a strong implication of our interpretation of the data. We expect other systems to contribute to reversal learning and their contributions may dominate or warp these relationships. It will certainly be interesting but is beyond the scope of this paper to examine directly the difference between reversal learning with and without aversive stimuli. We do note that the time constants in the DREADD experiments, in which no air puffs were used, are a bit slower than those in the GCaMP experiments.

The statement that "5-HT neurons showed little or no response to expected water rewards before reversal, but responded robustly to the same rewards when they were unexpected, after reversal" seems problematic. Specifically, unexpected small reward did not seem to have a robust response that was affected by expectation (Figure 2F, Figure 3—figure supplement 1). If 5-HT reported unsigned prediction error this response should be significant as it was in DA neurons.

The reviewers are correct that the response was not significant when comparing only day −1 and day 0 (previous Figure 2C, now Figure 6A). However, when running an ANOVA across days (previous Figure 2—figure supplement 7, now Figure 4—figure supplement 1) comparing each day post reversal with the 2 days before reversal, there is a significant increase on the reversal day for the population. We substituted the example mouse in Figure 4—figure supplement 1 to make this point clearer.

Nevertheless, the size of the small reward error response was probably diminished by the failure of the small water responses to disappear even after training. We added a note to this effect in the Results section: “The response to small reward was also modulated by reward expectation (Figure 4—figure supplement 1), although to a lesser degree, perhaps due to the presence of a small response even after extensive training (Figure 3—figure supplement 3B).”

In Figure 3B-C, 5-HT was activated by unexpected neutral tone but DA did not respond (at least as indicated in Figure 3; Figure 3—figure supplement 1 seems to show mild inhibition for unexpected neutral and air puff, so the absence of effect in Figure 3 may be an issue with the analysis windows). According to the logic of the paper, either 5-HT does not signal unsigned prediction error, or it is based on a different concept of prediction error than DA.

As described above, we have reduced the strength of our conclusions regarding the idea that 5-HT signals a pure unsigned reward prediction error. We agree with the reviewer that the absence of significant inhibition in DA neurons to neutral and air puff US’s in this experiment (previously Figure 3C; now Figure 7C) is likely due to the analysis window used to calculate the response. Because the detailed analysis and interpretation of DA responses to neutral and aversive stimuli is not the focus of our manuscript and is a topic of debate among DA experts (Matsumoto & Hikosaka 2009; Lammel et al. 2012; Fiorillo 2013; Lerner et al. 2015; Kim et al. 2016; Matsumoto et al. 2016), we chose to keep the analysis window constant throughout the paper/analysis (1.5 s from outcome onset). We acknowledge that further work will be important to address these issues.

If 5-HT reports unsigned prediction error, "surprise", or "uncertainty", why is it activated by CSs in proportion to their value? If it reported unsigned prediction error then it should be more activated by CSs with extreme values than ones with intermediate values because those evoke the biggest unsigned prediction errors. If it reported surprise then it should be excited similarly by all CSs because all four CSs are equally probable. If it reported uncertainty then it should be more activated by all CSs shortly after reversal when their CS-US associations became more uncertain.

We agree with the reviewers that the pattern of CS and US responses does not neatly fit into a highly specific pattern of “surprise”. We have made every effort to be explicit about this in our revised Results and Discussion, as discussed above.

Still, we do not believe the distinction between the terms “unsigned prediction error”, “surprise”, and “uncertainty” is as clear in the literature as proposed in the reviewers’ comment. Therefore, we do not fully understand the specific predictions being suggested.

For example, according to Courville et al., 2006, “surprise” is a term used in associative learning approaches to name non-predicted reinforcers. According to these authors, surprise signals “change”, which leads to uncertainty about one’s model of the world in a Bayesian approach. In such an approach, surprising events that are non-reinforcers can also increase uncertainty and thus increase learning. Thus, “prediction error”, “surprise” and “uncertainty” are used more or less interchangeably in previous work.

In order to try to better avoid possible confusion, we decided to avoid as much as possible using the words “surprise” and “uncertainty” except where we discuss explicitly the meaning and relationship between these terms (Discussion).

2) The statement that "rather than reporting the affective value of the environment…, we suggest that 5-HT facilitates the ability of an organism to flexibly adapt to dynamic environments through plasticity and behavioural control" is a very intriguing suggestion made throughout the paper. However, it is not tested directly, but perhaps could be with the existing data.

We believe that the DREADD inhibition experiment (Figure 1) does provide direct evidence for the role for DRN 5-HT in flexible adaptation of this sort, adding to the previous work cited (Clarke 2004; Clarke et al. 2007; Boulougouris & Robbins 2010; Bari et al. 2010; Brigman et al. 2010; Berg et al. 2014).

Were there correlations between 5-HT activity and behavioral flexibility? 5-HT responds more to positive than negative CSs but animals seem to learn about both with similar speed (at least in Figure 4). 5-HT responds more to unexpected neutral than unexpected small reward, and more to unexpected large reward than unexpected air puff, so if the authors were correct animals should learn at different rates from those outcomes. This could be tested.

We thank the reviewers for the suggestion of examining more carefully the relationship between the time constants of 5-HT and behavioral changes.

We never intended to claim that 5-HT is the only mechanism underlying reversal learning – we use the phrase “contributes to” cognitive flexibility deliberately for that reason (subsection “5-HT CS responses could be responsible for inhibiting perseverative responding”, first and last paragraphs). Therefore, we don’t expect that behavioral learning rates will have a 1:1 correspondence with the magnitude of 5-HT signals, since other signals (e.g. other neuromodulators) whose time constants are unknown are likely also contributing. Moreover, for these analyses, to improve signal-to-noise we needed to group positive and negative odour data (i.e. CS1 & 2 together and CS3 & 4 together), which prevents the suggested single-CS analysis.

However, we were motivated by this suggestion to explore other potential correlations in the data, such as the relationship between the time constant of behavioral adaptation and the 5-HT CS and US responses before or after reversal.

In the process of these analyses we found a small error in the previous analysis (Figure 4, new Figure 9), namely that we were not using the background-subtracted licking rate. After reversal some mice show an overall increase in lick rate along the entire trial, including in the period before they smell the odour. Subtracting the period just before the odour onset therefore improves the detection of odor-specific differences in anticipatory lick rate. It is then easier to see the faster adaptation to the negative reversals (stopping anticipatory licking) than to the positive reversals (starting anticipatory licking).

We found two interesting results. First, we found a significant correlation between the time constant of behavior for stopping anticipatory licking and the corresponding time constant for the 5-HT CS signal during a negative reversal. There was no such correlation during positive reversals. This supports the result of the DREADD experiment and suggests that 5-HT acts at least partly through its CS-mediated effects. Second, for all animals the change in 5-HT CS activity during negative reversals has a slower time constant than the change in anticipatory licking, as would be needed for 5-HT to continue to suppress this response by direct behavioral inhibition.
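As a rough illustration of this kind of analysis, the sketch below estimates a time constant from a decaying response and then correlates two sets of time constants. It assumes a single exponential decay towards zero and a log-linear least-squares fit, which may differ from the fitting procedure actually used in the manuscript; the function names and the synthetic data are hypothetical.

```python
import math


def fit_time_constant(values):
    """Estimate tau (in trials) assuming values[k] ~ A * exp(-k / tau).

    Uses a least-squares line through log(values); all values must be > 0.
    """
    n = len(values)
    xs = list(range(n))
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -1.0 / slope


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Synthetic, noise-free decay with a known time constant of 20 trials
synthetic = [math.exp(-k / 20.0) for k in range(50)]
tau_hat = fit_time_constant(synthetic)
```

In practice one such time constant would be estimated per animal for licking and for the 5-HT CS signal, and `pearson_r` would then be applied across animals to the two lists of estimates.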

We believe these analyses provide important new support for the involvement of 5-HT in the process of behavioral adaptation. We have included them as new Figure 10 and Results (last paragraph) and Discussion (subsection “5-HT CS responses could be responsible for inhibiting perseverative responding”, first paragraph). We again thank the reviewers for suggesting this approach.

Also, the authors measure learning rates for CS but not US responses. Are their learning rates consistent? This seems highly relevant to their interpretations. For instance 5-HT CS responses are slow to learn but Figure 2E makes US learning seem very fast. Does this mean that the US "surprise" response and CS "value" response are based on different expectations? If so this would be further evidence that the 5-HT responses obey different rules, and may not be explained by a single umbrella concept like "surprise" or "unsigned prediction error".

We agree with the reviewers that analysis of the time course of US responses would be very interesting. Unfortunately, the US responses are somewhat smaller and less consistent than the CS responses. For example, US responses tend to decrease over a session and then increase again in the beginning of the next session. Hence, although we attempted to, we were not able to get reasonable fits to the time course of the US responses. We now mention this point in the Results (subsection “CS responses of 5-HT neurons have slower kinetics after reversal than DA’s”, third paragraph).

3) Given that we don't know the underlying distribution of single-5-HT-neuron calcium dynamics, it is difficult to interpret the point estimate of that distribution (population calcium dynamics measured here). This is especially salient in the context of electrophysiological recordings of individual DR neurons (including work from the senior author's lab [Ranade and Mainen, 2009]), that show a high degree of heterogeneity of individual DR (and identified 5-HT) neuronal responses. This is treated in the Discussion, but the limitations need to be more clearly defined. Indeed, Muzerelle et al., cited here to support the idea of diffuse projection targets of individual 5-HT neurons, actually shows quite a bit of target specificity, and agrees with work from Gagnon and Parent (PLoS ONE, 2014). Also, I strongly suggest changing the title, as firing patterns were not measured in this study.

We agree with the suggestion of the reviewers to change the title of the manuscript to avoid referring to firing patterns. The new title is: “Activity patterns of serotonin neurons underlying cognitive flexibility”.

In addition, we fleshed out this section of the Discussion, “Implications of neuronal heterogeneity and other complexities of the 5-HT system”. We now more explicitly discuss the limitations of fiber photometry with respect to possible heterogeneity. We now cite Muzerelle et al. (2014) and Gagnon and Parent (2014) for the specificity of projections.

4) The argument that previous studies did not find prediction-error-like responses due to weak prediction errors could be more clearly explained. Why is reversal a "stronger" prediction error than unexpected rewards or omissions? If anything, it seems that reversals as presented here are relatively weak in driving behavioral changes, because it took mice a couple of hundred trials to achieve asymptotic behavior. Could the different timescales of reversals compared to momentary deviations from predicted outcomes be the more relevant discrepancy? How is prediction error strength quantified here?

We agree with the reviewer that this point could have been clearer in our manuscript. We agree that the discrepancy in timescales is a very relevant point. This issue has been previously discussed in terms of expected and unexpected uncertainty by, e.g. Yu and Dayan (2002; 2005). In omission trials, which are randomly interleaved and averaged over hundreds of trials, omissions become “expected uncertainty” or variability in the outcome. In contrast, in a reversal task, the sudden change that occurs after many days of stable conditions constitutes “unexpected uncertainty” and indeed leads to a change in behaviour. We now include a similar discussion in the last paragraph of the Introduction.

5) In Figure 1D, it appears that there may be a small difference in the time constant fit to reversals from negative to positive. Is this true for particular mice? Specifically, were any of the effects of inactivation on the time constant of positive reversal significant? Showing mouse-by-mouse data would be useful.

We agree that it appears that there is a small difference here, but it was not statistically significant. We have added the normalized fits for individual mouse-odour pairs in Figure 1—figure supplement 1. It can be seen from those plots that in the positive reversal there is considerable variability in the three groups of mice. Because we have only one fit for each odour of each animal, we need several mice to perform statistics. This is what we did in the manuscript and we could not reach significance (as described in the Figure 1 legend): 1-way ANOVA, F(2,16) = 0.34, p = 0.715 for positive reversal.
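For readers less familiar with the statistic, the sketch below computes a one-way ANOVA F value and its degrees of freedom from scratch. Degrees of freedom of 2 and 16 are what one obtains with three groups and 19 observations in total. The function name and the example numbers are hypothetical and are not the study's data; obtaining a p value would additionally require the F distribution, which is omitted here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    n_groups = len(groups)
    grand_mean = sum(all_values) / n_total

    # Between-group sum of squares: group sizes times squared mean offsets
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares: squared deviations from each group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between = n_groups - 1
    df_within = n_total - n_groups
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical time constants (in trials) for three groups of fits
f_value, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0],
                                       [2.0, 3.0, 4.0],
                                       [3.0, 4.0, 5.0]])
```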

6) The authors should consider rearranging their figures for publication in eLife, which allows more than 4 figures. The figures are all very dense and are associated with up to 9(!) supplemental figures. Also, the critical comparison between 5-HT and DA is hard to understand because no DA CS or US traces are shown in the main figures. At the very least, the authors should present a direct comparison of the traces of 5-HT and DA responses before, during, and after reversal, which is the key result of the paper.

We took this suggestion and reworked both the figures and associated text. The figures have been rearranged and DA traces included in the main figures for comparison with 5-HT:

Figure 1 was not changed;

Figure 2 now shows only the behavior of the animals to the reversal task used for fiber photometry;

Figure 3 shows the experimental approach and pre-reversal data for 5-HT and DA;

Figure 4 shows 5-HT and DA US responses to large reward during reversal;

Figure 5 shows 5-HT and DA US responses to neutral outcome during reversal;

Figure 6 summarizes the comparison of mean US responses to the four outcomes on the reversal day with that on the day before reversal;

Figure 7 shows the mean US responses of 5-HT and DA to surprising and predicted outcomes;

Figure 8 compares mean CS responses before reversal and 3 days after it for both 5-HT and DA;

Figure 9 shows the time course analysis of the change in CS responses for 5-HT and DA as well as our proposed model of DA-5-HT interaction during reversal;

Figure 10 shows the relationship between time constant of adaptation of 5-HT CS responses versus the time constants of behavioral adaptation.

[Editors' note: further revisions were requested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed before acceptance, as outlined below:

1) Figure 1—figure supplement 1B: presumably the y labels should be "Anticp. licking (norm. fit)", not "Time constant (trials)," correct?

Yes, thank you for pointing it out. We have corrected the y label in the figure.

2) Figure 2B: how is "eye movement or blinking" measured? What are the units that allowed those two measures to be combined (and then why is it listed as "a.u." in the figure)?

We have now restricted ourselves to the use of “blinking” in the figure and in the text. The way we measured it is described in the Materials and methods section and we expanded this description in this version of the manuscript: “To quantify blinking in video data, we manually selected the eye area in each session and calculated the mean pixel value for that area; then, for each frame, we subtracted this value from the previous frame to obtain a measure of movement. The start and end of blinking created a sudden increase and decrease, respectively, in the difference between the mean pixel value of consecutive video frames.”

Because this measure is the difference in the mean pixel value of the eye area between consecutive frames, we quantify it in arbitrary units (a.u.) in the figure.
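The blink measure described above can be sketched as follows. This is a minimal illustration that assumes each video frame is a 2D list of grayscale pixel values and that the manually selected eye area is a rectangle; the function names, the rectangular region of interest and the sign convention (current frame minus previous frame) are assumptions for illustration.

```python
def roi_mean(frame, roi):
    """Mean pixel value inside roi = (row_start, row_end, col_start, col_end)."""
    r0, r1, c0, c1 = roi
    pixels = [frame[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    return sum(pixels) / len(pixels)


def blink_signal(frames, roi):
    """Frame-to-frame difference of the mean pixel value in the eye area.

    Returns one value per consecutive frame pair, in arbitrary units.
    """
    means = [roi_mean(frame, roi) for frame in frames]
    return [means[i] - means[i - 1] for i in range(1, len(means))]


# Hypothetical 2x2 frames: the eye area brightens and then darkens again
frames = [
    [[10, 10], [10, 10]],
    [[50, 50], [50, 50]],
    [[10, 10], [10, 10]],
]
signal = blink_signal(frames, roi=(0, 2, 0, 2))
```

In this toy example the start and end of the brightness change show up as a sudden positive and then negative deflection, matching the description of blink onset and offset as a sudden increase and decrease in the difference signal.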

3) Figure 3—figure supplement 2: panels A and B are a bit hard to parse and could use some more explanation; e.g., these are (presumably) ordered by duration of the lick bout; they (presumably) show a fixed time before the onset of, and after the offset of, the lick bout; etc. Why do the trials at the bottom not appear to have "bouts"?

Thank you for pointing out important clarifications needed in the figure legend. We have included an additional panel as an inset of Figure 3—figure supplement 2 A to show the distribution of inter-lick intervals, which is used to define the bouts. The figure legend has been made more explicit: “(A), (B) Surface plots showing raw GCaMP6s (A) and tdTomato (B) fluorescence signals aligned on the onsets of lick bouts during an example session of a SERT-Cre mouse infected with the corresponding fluorophore. Gray dots represent single licks. Inset in (A, right) shows the distribution of inter-lick intervals for this session: lick bouts were defined as sequences of licks separated by no more than 315 ms. The surface plots are aligned from longer lick bouts at the top to shorter ones at the bottom (the last ones are single licks that do not belong to any bout). Fluorescence data is shown from 2 s before to 2 s after the duration of the bouts.”
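The bout definition in the legend lends itself to a short sketch. The 315 ms threshold comes from the legend; the function names, the use of seconds as the time unit and the example lick times are hypothetical.

```python
def find_lick_bouts(lick_times, max_interval=0.315):
    """Group licks into bouts separated by no more than max_interval seconds.

    Returns (bouts, single_licks): bouts are lists of two or more lick times,
    and single_licks are isolated licks that do not belong to any bout.
    """
    groups = []
    current = []
    for t in sorted(lick_times):
        if current and t - current[-1] > max_interval:
            groups.append(current)
            current = []
        current.append(t)
    if current:
        groups.append(current)

    bouts = [g for g in groups if len(g) >= 2]
    single_licks = [g[0] for g in groups if len(g) == 1]
    return bouts, single_licks


def sort_bouts_by_duration(bouts):
    """Order bouts from longest to shortest, as in the surface plots."""
    return sorted(bouts, key=lambda b: b[-1] - b[0], reverse=True)


# Hypothetical lick times in seconds
licks = [0.0, 0.1, 0.2, 1.0, 2.0, 2.3, 2.5, 2.7]
bouts, singles = find_lick_bouts(licks)
ordered = sort_bouts_by_duration(bouts)
```

Here the lick at 1.0 s is more than 315 ms away from its neighbours on both sides, so it is treated as a single lick, which corresponds to the rows at the bottom of the surface plots.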

4) Figure 10A: This analysis is an interesting and welcome addition, but perhaps a somewhat fuller and more nuanced interpretation of these results would be useful. There seems to be an order of magnitude (at least) difference in how much longer the 5HT response versus the behavior persisted for a negative reversal.

Thank you for this comment. We have now added the following paragraph at the end of the Results section to provide further interpretation:

“We note that while we expected that the adaptation of 5-HT CS responses to the reversal should be at least as slow as that of anticipatory licking for the two to be causally related, the fact that it was much slower (around 8 times as slow) requires explanation. […] Alternatively, it may be the case that 5-HT CS responses could serve more than a mere motor suppression function during reversal learning, and contribute to the longer-lasting learning processes required for reversal learning (He et al. 2015), such as those that prevent spontaneous recovery following extinction training (Karpova et al., 2011).”

https://doi.org/10.7554/eLife.20552.028

Article and author information

Author details

  1. Sara Matias

    1. Champalimaud Research, Champalimaud Centre for the Unknown, Lisbon, Portugal
    2. MIT-Portugal Program, Porto Salvo, Portugal
    Contribution
    SM, Conceptualization, Data curation, Formal analysis, Validation, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Contributed equally with
    Eran Lottem
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-7432-6754
  2. Eran Lottem

    Champalimaud Research, Champalimaud Centre for the Unknown, Lisbon, Portugal
    Contribution
    EL, Conceptualization, Data curation, Formal analysis, Supervision, Validation, Investigation, Visualization, Methodology, Writing—review and editing
    Contributed equally with
    Sara Matias
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-5852-928X
  3. Guillaume P Dugué

    Institut de Biologie de l’Ecole Normale Supérieure, Centre National de la Recherche Scientifique, UMR8197, Institut National de la Santé et de la Recherche Médicale, Paris, France
    Contribution
    GPD, Supervision, Methodology
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0002-4106-6132
  4. Zachary F Mainen

    Champalimaud Research, Champalimaud Centre for the Unknown, Lisbon, Portugal
    Contribution
    ZFM, Conceptualization, Resources, Supervision, Funding acquisition, Visualization, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    zmainen@neuro.fchampalimaud.org
    Competing interests
    The authors declare that no competing interests exist.
    ORCID: 0000-0001-7913-9109

Funding

Fundação para a Ciência e a Tecnologia (SFRH/BD/43072/2008)

  • Sara Matias

Human Frontier Science Program (LT000881/2011L)

  • Eran Lottem

European Research Council (250334)

  • Zachary F Mainen

Champalimaud Foundation

  • Zachary F Mainen

European Research Council (671251)

  • Zachary F Mainen

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank R M Costa and J J Paton Labs for TH-Cre mice, Susana Dias and Sérgio Casimiro for histology and immunohistochemistry assistance, and Dario Sarra for running behavioral experiments for a few days. We also thank C Poo, B V Atallah, M Murakami, G Agarwal and J J Paton for comments on a previous version of the manuscript, and all members of the Systems Neuroscience Lab and the Champalimaud Research for useful discussions and feedback during the development of this project. This work was supported by the Fundação para a Ciência e Tecnologia (fellowship SFRH/BD/43072/2008 to SM), Human Frontier Science Program (fellowship LT000881/2011L to EL), European Research Council (Advanced Investigator Grants 250334 and 671251 to ZFM) and Champalimaud Foundation (ZFM).

Ethics

Animal experimentation: This study was performed in strict accordance with the European Union Directive 2010/63/EU. All animals were handled according to approved institutional animal care and use guidelines by the Champalimaud Centre for the Unknown Ethics Committee. The protocol was approved by the Portuguese Veterinary General Board (Direcção Geral de Veterinária, approvals 0420/000/000/2011 and 0421/000/000/2016).

Reviewing Editor

  1. Joshua I Gold, Reviewing Editor, University of Pennsylvania, United States

Publication history

  1. Received: August 28, 2016
  2. Accepted: February 26, 2017
  3. Version of Record published: March 21, 2017 (version 1)
  4. Version of Record updated: March 23, 2017 (version 2)

Copyright

© 2017, Matias et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 3,498
    Page views
  • 803
    Downloads
  • 7
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
