Neuroscience

Thalamocortical and corticothalamic pathways differentially contribute to goal-directed behaviors in the rat

Fabien Alcaraz, Virginie Fresno, Alain R Marchand, Eric J Kremer, Etienne Coutureau, Mathieu Wolff (corresponding author)

  1. CNRS, INCIA, UMR 5287, France
  2. Université de Bordeaux, INCIA, UMR 5287, France
  3. Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, France
Research Article
Cite as: eLife 2018;7:e32517 doi: 10.7554/eLife.32517

Abstract

Highly distributed neural circuits are thought to support adaptive decision-making in volatile and complex environments. Notably, the functional interactions between prefrontal areas and reciprocally connected thalamic nuclei may be important when choices are guided by current goal value or action-outcome contingency. We examined the functional involvement of selected thalamocortical and corticothalamic pathways connecting the dorsomedial prefrontal cortex (dmPFC) and the mediodorsal thalamus (MD) in the behaving rat. Using a chemogenetic approach to inhibit projection-defined dmPFC and MD neurons during an instrumental learning task, we show that thalamocortical and corticothalamic pathways differentially support goal attributes. Both pathways participate in adaptation to the current goal value, but only thalamocortical neurons are required to integrate current causal relationships. These data indicate that antiparallel flow of information within thalamocortical circuits may convey qualitatively distinct aspects of adaptive decision-making and highlight the importance of the direction of information flow within neural circuits.

https://doi.org/10.7554/eLife.32517.001

eLife digest

Planning and decision-making rely upon a region of the brain called the prefrontal cortex. But the prefrontal cortex does not act in isolation. Instead, it works together with a number of other brain regions. These include the thalamus, an area long thought to pass information on to the cortex for further processing. But signals also travel in the opposite direction, from the cortex back to the thalamus. Does the cortex-to-thalamus pathway carry the same information as the thalamus-to-cortex pathway?

To find out, Alcaraz et al. blocked each pathway in rats performing a decision-making task. The rats had learned that pressing a lever led to one type of reward, whereas moving a rod led to another. Alcaraz et al. reduced the desirability of one of the rewards by giving the rats free access to it for an hour. Afterwards, the rats opted mainly for the action associated with the reward that had remained desirable. However, blocking either the thalamus-to-cortex or cortex-to-thalamus pathway prevented this preference from emerging. This suggests that an information flow in both directions is necessary to update knowledge about the value of a reward.

In a second experiment, Alcaraz et al. removed the link between one of the actions and its reward. The reward instead appeared at random, irrespective of the rat’s own behavior. Control rats responded by focusing their efforts on the action that still delivered a reliable reward, and by performing the other action less often. Blocking the thalamus-to-cortex pathway prevented this response, but blocking the cortex-to-thalamus pathway did not. This suggests that only the former pathway is necessary to re-evaluate the relationship between an action and an outcome.

Two key aspects of goal-directed behavior – recognizing the value of a reward and the link between an action and an outcome – thus depend differently on the thalamus-to-cortex and cortex-to-thalamus pathways. This same principle may also be at work in other neural circuits with bidirectional connections. Understanding such principles may lead to better strategies for treating disorders of brain connectivity, such as schizophrenia.

https://doi.org/10.7554/eLife.32517.002

Introduction

To reach specific goals in volatile environments, living organisms must integrate current internal motivational states with an up-to-date causal understanding of the relationships between external events (Rangel et al., 2008; Dickinson, 2012). Such complex cognitive abilities are supported by highly evolved brain structures. Past research points to a central role for the dorsomedial prefrontal cortex (dmPFC) in adaptive decision-making (Corbit and Balleine, 2003; Killcross and Coutureau, 2003). A circuit-level analysis of PFC function also indicates a major role for areas innervating the PFC (O'Doherty, 2011). In this respect, the mediodorsal thalamus (MD) is of special interest because of the extensive reciprocal projections connecting these two areas (Groenewegen, 1988; Gabbott et al., 2005; Alcaraz et al., 2016a).

These anatomical considerations helped to shape a new functional view of the thalamus, in which its role is not limited to that of a relay (Sherman, 2005; Mitchell, 2015; Wolff et al., 2015a; Sherman, 2016). Indeed, experimental interventions targeting the MD produce a wide array of specific cognitive deficits in both rodents and primates (Corbit et al., 2003; Baxter, 2013; Parnaudeau et al., 2013; Parnaudeau et al., 2015; Alcaraz et al., 2016b; Chakraborty et al., 2016), supporting the view that this thalamic area plays an integrative role within thalamocortical circuits (Schmitt et al., 2017). Surprisingly, the functional significance of reciprocal projections, which are the hallmark of thalamocortical organization, has not been directly examined in the context of adaptive decision-making. A recent report provided initial evidence that thalamocortical and corticothalamic pathways recruited by the same behavioral task may support qualitatively distinct aspects of working memory (Bolkan et al., 2017). This further underscores the importance of the functional interactions between cortical and thalamic areas in high-order cognition. Gaining clearer insight into the functioning of thalamocortical circuits therefore requires manipulating thalamocortical and corticothalamic pathways separately.

In the present study, we applied a chemogenetic strategy in rats to specifically inhibit projection-defined dmPFC or MD neurons during a classic instrumental task requiring adaptive actions. We found that distinct goal attributes, namely current goal value and current action-outcome contingency, are differentially supported by thalamocortical and corticothalamic pathways.

Results

To qualify as ‘goal-directed’, actions must classically fulfill two criteria: they must depend on current goal value and on the causal link between the action and its outcome (Balleine and Dickinson, 1998). After an initial instrumental learning phase, both criteria can be assessed separately in the same animals. To gain insight into a potential differential contribution of thalamocortical and corticothalamic pathways, we inhibited projection-defined cortical or thalamic cells during initial training on two distinct actions (pressing a lever or pushing a tilt; see Materials and methods) and during subsequent choice tests conducted under extinction (Figure 1).

Figure 1. Experimental design.

After an initial magazine training phase (MT), all rats underwent instrumental training on two actions, pressing a lever and pushing a tilt (see Materials and methods), using fixed-ratio (FR) and then random-ratio (RR) schedules. To assess how the animals use current goal value to guide choice, we performed a choice test immediately after selective outcome devaluation, under extinction conditions (both the lever and the tilt are present in the chamber). After retraining, all rats underwent further instrumental training consisting of a selective contingency degradation procedure (see Materials and methods). Another choice test, identical to the one conducted after outcome devaluation, followed this phase. In both experiments, separate groups of rats received either 1 mg/kg CNO or 1 ml/kg 0.9% saline 60 min before each behavioral session, except during MT. To rule out any potential confounding effect of CNO injection alone, an additional control experiment is provided as an appendix. For each action, instrumental performance during the last RR10 session or the last RR10 retraining session was taken as baseline for the devaluation test and the degradation phase, respectively.
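The dosing above can be made concrete with a short per-rat calculation. This is a sketch under a stated assumption: the text gives the CNO dose (1 mg/kg) and the saline volume (1 ml/kg) but not the CNO injection volume, so the helper below (the hypothetical `injection_plan`) assumes CNO is also given at 1 ml/kg, i.e. as a 1 mg/ml solution. The 300 g body weight is only an example.

```python
def injection_plan(body_weight_g: float,
                   dose_mg_per_kg: float = 1.0,
                   volume_ml_per_kg: float = 1.0) -> dict:
    """Return CNO mass and injection volume for one rat (hypothetical helper).

    Assumes CNO and saline share the same 1 ml/kg injection volume,
    which implies a 1 mg/ml CNO solution at the default settings.
    """
    weight_kg = body_weight_g / 1000.0
    dose_mg = dose_mg_per_kg * weight_kg        # amount of CNO to deliver
    volume_ml = volume_ml_per_kg * weight_kg    # volume to inject
    return {
        "dose_mg": dose_mg,
        "volume_ml": volume_ml,
        # Concentration implied by the chosen dose and volume (mg/ml).
        "concentration_mg_per_ml": dose_mg / volume_ml,
    }

# Example: a 300 g rat, injected 60 min before the session.
plan = injection_plan(300)
print(plan)
```

Scaling both dose and volume by body weight keeps the solution concentration fixed, so a single CNO stock can serve every rat in a group.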

https://doi.org/10.7554/eLife.32517.003

Experiment 1: Thalamocortical pathways are necessary to track changes in both goal value and current instrumental contingency

To express an inhibitory DREADD receptor (Armbruster et al., 2007) only in dmPFC-projecting MD cells, an adeno-associated virus (AAV) carrying a floxed hM4Di receptor expression cassette was injected into the MD, while a retrograde CAV-2 vector (Junyent and Kremer, 2015) carrying Cre recombinase was injected into the dmPFC (Figure 2A). As a result, only thalamic cells projecting to the dmPFC were infected by both vectors and therefore expressed mCherry and hM4Di. In general, mCherry expression was more evident in the lateral portion of the MD, in agreement with current knowledge of these thalamocortical projections (Alcaraz et al., 2016a). mCherry expression was also visible to some degree in adjacent dmPFC-projecting thalamic areas, such as the intralaminar group (mostly PC and CL) and, to some extent, the CM and the PV. In some cases, fluorescence was also observed in the habenula. Eight rats showed only minimal (or unilateral) DREADD expression and were therefore excluded from the analyses (final group sizes: saline, n = 7; CNO, n = 7). Figure 2B and C illustrate the extent of mCherry expression at the thalamic level.
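The logic of the dual-viral strategy is an intersection: a neuron expresses hM4Di-mCherry only if it carries Cre (taken up retrogradely by CAV-2 at its axon terminals) and the Cre-dependent AAV cassette (taken up at its cell body). The toy model below sketches that AND rule. The `Neuron` class, the helper name and the hypothetical "other_target" cell are illustrative simplifications; the model ignores transduction efficiency and the spread to adjacent nuclei noted above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Neuron:
    name: str
    soma_region: str        # where the cell body sits
    projects_to: frozenset  # regions reached by its axon terminals

def expresses_dreadd(cell: Neuron, aav_site: str, cav2_site: str) -> bool:
    """Toy rule: Cre-dependent hM4Di expression requires both vectors.

    - CAV-2-Cre is taken up at axon terminals in cav2_site (retrograde).
    - The floxed AAV cassette transduces cell bodies located in aav_site.
    """
    has_cre = cav2_site in cell.projects_to
    has_floxed_cassette = cell.soma_region == aav_site
    return has_cre and has_floxed_cassette

cells = [
    Neuron("MD->dmPFC", "MD", frozenset({"dmPFC"})),
    Neuron("MD->other", "MD", frozenset({"other_target"})),  # hypothetical cell
    Neuron("dmPFC->MD", "dmPFC", frozenset({"MD"})),
]

# Experiment 1 (TC): AAV in MD, CAV-2 in dmPFC.
tc = [c.name for c in cells if expresses_dreadd(c, "MD", "dmPFC")]
# Experiment 2 (CT): injection sites reversed.
ct = [c.name for c in cells if expresses_dreadd(c, "dmPFC", "MD")]
print(tc, ct)
```

Swapping the two injection sites is the only change needed to move from the thalamocortical to the corticothalamic population, which is why the same construct pair serves both experiments.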

Figure 2. Dual-viral chemogenetic strategy to target TC (A) and CT (D) pathways.

The CAV-2-Cre vector has a retrograde tropism. Maximal (grey) and minimal (dark) extent of DREADD expression in included rats at three rostrocaudal levels (expressed relative to Bregma, in mm) for TC (B) and CT (E) pathways. Representative examples of mCherry expression at thalamic (C) and cortical (F) levels. Insets in dashed green lines correspond to the higher-magnification images provided on the right. CL: centrolateral thalamic nucleus; PC: paracentral thalamic nucleus; CM: centromedial thalamic nucleus; PV: paraventricular thalamic nucleus. Area A32 corresponds to the prelimbic area and the most dorsal portion of the infralimbic area in the seventh edition of the Paxinos and Watson atlas (Paxinos and Watson, 2014).

https://doi.org/10.7554/eLife.32517.004

Instrumental learning began 1 month after surgery. Instrumental performance progressively increased over training in both CNO- and saline-treated groups, as shown by the significant effect of Session (F(5,60) = 11.7, p<0.0001), although CNO-treated rats performed fewer lever presses overall (Drug, F(1,12) = 11.4, p=0.0055) (Figure 3A). However, there was no significant Drug X Session interaction (F < 1), confirming efficient instrumental learning even in the CNO-treated group. In addition, asymptotic performance during the final training session did not differ between saline- and CNO-treated groups (F(1,12) = 2.58, p=0.1338).
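The degrees of freedom reported for these mixed-design ANOVAs follow from the design itself. Assuming, as the text indicates, N = 14 rats split into k = 2 drug groups, with the 12 training sessions analysed as b = 6 two-session blocks, the bookkeeping is:

```latex
\begin{aligned}
df_{\text{Drug}} &= k - 1 = 1, & df^{\text{between}}_{\text{error}} &= N - k = 14 - 2 = 12,\\
df_{\text{Session}} &= b - 1 = 5, & df_{\text{Session}\times\text{Drug}} &= (b - 1)(k - 1) = 5,\\
df^{\text{within}}_{\text{error}} &= (b - 1)(N - k) = 5 \times 12 = 60. & &
\end{aligned}
```

This reproduces the F(1,12) and F(5,60) values above. Likewise, a single-group analysis with n = 7 rats and a two-level within-subject factor yields F(1, n − 1) = F(1,6), as in the separate-group tests reported below.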

Figure 3. Chemogenetic inhibition of TC pathways.

(A) Mean number of lever presses (±sem) during instrumental training. (B) Mean number of lever presses (±sem, relative to baseline) during the instrumental choice test conducted immediately after selective outcome devaluation under extinction conditions. (C) Consumption test. (D) Mean number of lever presses (±sem, relative to baseline) during the contingency degradation procedure. (E) Mean number of lever presses (±sem, relative to baseline) for the final choice test conducted under extinction conditions.

https://doi.org/10.7554/eLife.32517.005
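Panels B, D and E express responding relative to baseline, with baseline defined per action as the last RR10 session (or the last RR10 retraining session). The exact normalization is not spelled out in this section. The sketch below assumes a simple ratio (test responses divided by baseline responses for the same action in the same rat) and then summarizes each action as mean ± sem across rats. The helper names and the three rats' counts are hypothetical.

```python
import math
from statistics import mean, stdev

def relative_to_baseline(test_counts: dict, baseline_counts: dict) -> dict:
    """Hypothetical normalization: test responses / baseline responses, per action."""
    return {action: test_counts[action] / baseline_counts[action]
            for action in test_counts}

def mean_sem(values: list) -> tuple:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))

# Hypothetical (test, baseline) response counts for three rats.
rats = [
    ({"valued": 30, "devalued": 12}, {"valued": 60, "devalued": 60}),
    ({"valued": 24, "devalued": 15}, {"valued": 40, "devalued": 50}),
    ({"valued": 45, "devalued": 20}, {"valued": 90, "devalued": 80}),
]
normalized = [relative_to_baseline(test, base) for test, base in rats]
for action in ("valued", "devalued"):
    m, s = mean_sem([r[action] for r in normalized])
    print(action, round(m, 3), round(s, 3))
```

Normalizing per rat and per action removes baseline differences between the lever and the tilt, so a bias toward the valued action cannot simply reflect a pre-existing preference for one manipulandum.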

During devaluation by sensory-specific satiety, both groups of rats consumed comparable amounts of food (Saline group: 10.8 ± 0.5 g, CNO group: 12.8 ± 0.8 g; F(1,12) = 1.7, p=0.2214), indicating that basic motivational processes were not altered by CNO treatment. The ability to use current goal value to guide behavior was assessed during a choice test conducted immediately after devaluation, under extinction conditions. While saline-treated rats exhibited the expected adaptive behavior during this test, with a clear bias toward the action associated with the still-valued outcome, CNO-treated rats showed little differential responding between the two actions (Figure 3B). Consistent with these observations, the critical Devaluation X Drug interaction approached significance (F(1,12) = 4.0, p=0.0679), and the main effect of Devaluation (F(1,12) = 10.0, p=0.0081), but not of Drug (F < 1), was significant. When each group was considered separately, a significant effect of Devaluation was evident for the saline group (F(1,6) = 7.7, p=0.0324) but not for the CNO group (F(1,6) = 2.7, p=0.1543). To determine whether this was due to a performance deficit during the test, we examined the dynamics of responding over time by analyzing the data in 2 min blocks. This analysis confirmed that extinction occurred, with a significant effect of Block (F(9,108) = 8.3, p<0.0001), and this factor did not interact with Drug (Block X Drug, F(9,108) = 1.2, p=0.2908; Block X Devaluation X Drug, F(9,108) = 1.6, p=0.1335). Moreover, analyses conducted on each group separately confirmed that responding gradually decreased over time during this test (Saline: Block, F(4,24) = 6.6, p=0.0010; CNO: Block, F(4,24) = 7.4, p=0.0005). Thus, responding was initially higher and then declined to comparable rates in both saline- and CNO-treated groups, suggesting that the impairment in the CNO group was not the result of a performance deficit.
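The block analysis bins each rat's responses into consecutive 2 min windows, so that the time course of extinction can be compared across groups. A minimal sketch, assuming response timestamps in seconds from test onset; the 10 min test length and the example timestamps are hypothetical, since the test duration is not stated in this section.

```python
def bin_responses(timestamps_s: list, block_s: float = 120.0,
                  test_duration_s: float = 600.0) -> list:
    """Count responses in consecutive fixed-width blocks.

    timestamps_s: response times in seconds from test onset.
    block_s: block width; 120 s matches the 2 min blocks in the text.
    test_duration_s: hypothetical test length used for illustration.
    """
    n_blocks = int(test_duration_s // block_s)
    counts = [0] * n_blocks
    for t in timestamps_s:
        if 0 <= t < n_blocks * block_s:  # ignore responses outside the test
            counts[int(t // block_s)] += 1
    return counts

# Hypothetical rat whose responding declines across the test (extinction).
presses = [5, 20, 33, 61, 90, 100, 150, 170, 200, 260, 330, 480]
print(bin_responses(presses))
```

A declining count across blocks in both groups is what rules out a floor effect: responding started above floor and then extinguished at a similar rate.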
Collectively, these data suggest that CNO treatment produced a mild deficit in the ability to update the goal value representation and/or to use it to guide behavior. Importantly, a consumption test performed immediately after the devaluation test confirmed the effectiveness of sensory-specific satiety, which was unaltered by CNO administration: both saline- and CNO-treated rats preferentially consumed the still-valued outcome when they could freely choose between the two outcomes (Devaluation, F(1,12) = 25.6, p=0.0003; Drug and Drug X Devaluation, Fs < 1; Figure 3C).

After two sessions of retraining under standard conditions, rats were subjected to a new phase of instrumental training during which the contingency between one of the actions and its associated outcome was selectively degraded. Rats continued to receive the same treatment (CNO or saline) as during initial instrumental learning (see Figure 1). During the degradation phase, differential responding was evident in both saline- and CNO-treated groups, with lower responding on the action whose action-outcome contingency was degraded (Figure 3D), as shown by the significant Degradation effect (F(1,12) = 52.8, p<0.0001) and the significant Session X Degradation interaction (F(5,60) = 3.1, p=0.0152). Drug treatment had no visible effect in this phase (Drug, F(1,12) = 1.6, p=0.2305; Session X Drug, F(5,60) = 1.6, p=0.1820; all remaining Fs < 1). Thus, when the sensory feedback provided by the outcome was available, all rats exhibited adaptive decision-making, showing that the consequences of their actions strongly affected their behavior.
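Contingency is commonly quantified as ΔP = P(O | A) − P(O | no A): the probability of the outcome when the action is performed, minus its probability when it is not. Delivering the outcome independently of responding raises P(O | no A) and pushes ΔP toward zero. The sketch below estimates ΔP from per-time-bin records; the bin size and the two example sessions are hypothetical, since the actual delivery parameters of the degradation procedure are not given in this section.

```python
def delta_p(bins: list) -> float:
    """Estimate Delta P = P(O | A) - P(O | no A) from per-time-bin records.

    bins: list of (acted, outcome) booleans, one pair per time bin
    (a hypothetical discretization of the session).
    """
    with_action = [outcome for acted, outcome in bins if acted]
    without_action = [outcome for acted, outcome in bins if not acted]
    p_o_given_a = sum(with_action) / len(with_action)
    p_o_given_not_a = sum(without_action) / len(without_action)
    return p_o_given_a - p_o_given_not_a

# Hypothetical sessions of eight time bins each.
intact = [(True, True), (True, False), (True, False), (True, False),
          (False, False), (False, False), (False, False), (False, False)]
degraded = [(True, True), (True, False), (True, False), (True, False),
            (False, True), (False, False), (False, False), (False, False)]
print(delta_p(intact), delta_p(degraded))
```

In the degraded session the action earns the outcome as often as before, yet ΔP is zero because the outcome is equally likely without it. This is the change that control rats detected in the final choice test.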

Finally, a critical choice test was again conducted under extinction conditions (Figure 3E). Interestingly, the behavior of the two groups was now markedly different: while saline-treated rats continued to respond differentially on the two actions, CNO-treated rats did not. In line with these observations, the critical Degradation X Drug interaction was significant (F(1,12) = 5.6, p=0.0354), as were the main effects of Drug (F(1,12) = 6.8, p=0.0225) and Degradation (F(1,12) = 6.0, p=0.0310). Further analyses confirmed that the effect of Degradation was significant in the saline-treated (F(1,6) = 12.2, p=0.0129) but not the CNO-treated group (F < 1).
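In a mixed design with one two-level between-subject factor (Drug) and one two-level within-subject factor (Degradation or Devaluation), the Drug X within-factor interaction F(1, N − 2) equals the square of a pooled-variance independent-samples t computed on each rat's difference score. Similarly, a separate-group effect F(1, n − 1) equals the square of a one-sample t on that group's differences. The sketch below checks the interaction equivalence numerically. The difference scores are hypothetical, and this is an illustration, not the authors' analysis code.

```python
from statistics import mean

def pooled_t(x: list, y: list) -> float:
    """Independent-samples t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    mx, my = mean(x), mean(y)
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    sp2 = ss / (nx + ny - 2)
    return (mx - my) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

def interaction_f(x: list, y: list) -> float:
    """Group X within-factor F for a two-level within factor.

    x, y: per-rat difference scores (e.g. non-degraded minus degraded).
    Working from raw cell scores, both sums of squares carry a factor 1/2,
    which cancels in the ratio, so difference scores suffice.
    """
    grand = mean(x + y)
    ss_inter = len(x) * (mean(x) - grand) ** 2 + len(y) * (mean(y) - grand) ** 2
    ss_error = (sum((v - mean(x)) ** 2 for v in x)
                + sum((v - mean(y)) ** 2 for v in y))
    return ss_inter / (ss_error / (len(x) + len(y) - 2))  # df = (1, N - 2)

saline_diffs = [4, 6, 5, 7]  # hypothetical
cno_diffs = [1, 0, 2, 1]     # hypothetical
print(pooled_t(saline_diffs, cno_diffs) ** 2, interaction_f(saline_diffs, cno_diffs))
```

The equivalence explains why each reported interaction has the same error df as the between-subject Drug effect, F(1,12) here.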

Appendix 1 provides supplemental data showing that these effects did not result from CNO alone, because neither CNO nor DMSO treatment altered behavior at any stage of testing (Appendix 1, figures 1–3). Thus, inhibiting dmPFC-projecting MD neurons produced selective impairments when rats were forced to rely on internal representations to guide behavior. The impairment appeared mild when rats were required to use current goal value, but more pronounced when the contingency between an action and its consequence was altered. Overall, these data indicate a central role for thalamocortical pathways in the ability to guide choice based on current knowledge of the causal link between actions and their outcomes.

Experiment 2: Corticothalamic pathways are necessary to track changes in goal value but not instrumental contingency

Next, we used the same strategy in a separate set of rats to examine the behavioral consequences of inhibiting dmPFC neurons projecting to the MD. For this purpose, the injection sites of the two viral constructs were reversed (Figure 2D). The resulting mCherry expression at the cortical level is shown in Figure 2E and F. Marked mCherry expression was evident in deep cortical layers, consistent with the abundant corticothalamic projections that target the MD from cortical layers 5/6 (Gabbott et al., 2005). Although we did not quantify the number of labelled cells, a comparison of Experiments 1 and 2 suggests greater labelling for CT cells, consistent with the view that CT cells outnumber TC cells (e.g., Haber and Calzavara, 2009). Six animals showed little or unilateral DREADD expression and were excluded from the analyses (final group sizes: saline, n = 8; CNO, n = 6).

Instrumental learning was comparable between saline- and CNO-treated rats (Figure 4A), with performance improving over training (Session, F(5,60) = 99.3, p<0.0001). Instrumental learning was not affected by CNO (Drug, F(1,12) = 3.1, p=0.1050; Session X Drug, F < 1).

Figure 4. Chemogenetic inhibition of corticothalamic pathways.

(A) Mean number of lever presses (±sem) during instrumental training. (B) Mean number of lever presses (±sem, relative to baseline) during the instrumental choice test conducted immediately after selective outcome devaluation under extinction conditions. (C) Consumption test. (D) Mean number of lever presses (±sem, relative to baseline) during the degradation procedure. (E) Mean number of lever presses (±sem, relative to baseline) for the final choice test conducted under extinction.

https://doi.org/10.7554/eLife.32517.006

During devaluation, all rats again consumed equal amounts of food, irrespective of whether they were treated with saline or CNO (Saline group: 9.1 ± 0.6 g, CNO group: 9.1 ± 0.7 g; F < 1). Immediately after the devaluation procedure, however, the choice test conducted in extinction revealed markedly different response patterns in the two groups (Figure 4B). While the saline group expressed a clear bias for the still-valued option, the CNO group responded similarly on both actions, consistent with a failure to use current goal value to guide behavior. Importantly, the critical Drug X Devaluation interaction reached significance (F(1,12) = 9.2, p=0.0103), providing compelling support for these observations. The main effects of Devaluation (F(1,12) = 8.1, p=0.0149) and Drug (F(1,12) = 6.6, p=0.0244) were also significant. Separate analyses confirmed a selective deficit in CNO-treated (Devaluation, F < 1) but not saline-treated (Devaluation, F(1,7) = 23.1, p=0.0020) rats. Again, analyzing the data in 2 min blocks confirmed that extinction occurred (Block, F(9,108) = 3.9, p=0.0002). In addition, drug treatment did not interact with the general dynamics of responding during this test (Block X Drug, F(9,108) = 1.3, p=0.2721; Block X Devaluation X Drug, F(9,108) = 1.4, p=0.2062). Further analyses confirmed a significant effect of Block in both saline- (F(4,28) = 5.0, p=0.0035) and CNO-treated (F(4,28) = 5.6, p=0.0034) groups, suggesting that responding decreased over time in a similar fashion in both groups. Thus, a floor effect alone cannot account for the specific impairment evident during this test.
Consumption tests conducted immediately afterwards yielded essentially the same results as in Experiment 1: all rats expressed a clear bias for the still-valued outcome, confirming that the sensory-specific satiety procedure was effective, and CNO treatment did not alter behavior at this stage (Devaluation, F(1,12) = 44.0, p<0.0001; Drug, F(1,12) = 1.9, p=0.1934; Drug X Devaluation, F < 1; Figure 4C).

After two days of retraining, the degradation phase began. Differential responding was evident in both saline- and CNO-treated groups, with lower responding on the action whose action-outcome contingency was degraded, and CNO treatment had no visible effect (Figure 4D). Accordingly, the main effects of Degradation (F(1,12) = 15.4, p=0.0020) and Session (F(5,60) = 4.6, p=0.0014), as well as the interaction between these factors (F(5,60) = 9.8, p<0.0001), reached significance. The main effect of Drug was not significant (F < 1), and no interaction involving this factor was observed (Degradation X Drug, F(1,12) = 1.6, p=0.2252; Session X Drug, F(5,60) = 1.6, p=0.1719; Degradation X Session X Drug, F < 1).

During the final choice test conducted in extinction (Figure 4E), all rats continued to select the action with reliable consequences, as attested by the significant effect of Degradation (F(1,12) = 5.8, p=0.0331). Importantly, inhibiting corticothalamic pathways produced no noticeable effect and did not prevent rats from displaying adaptive decision-making, even when they could rely only on represented information (Drug and Drug X Degradation, Fs < 1).

Thus, inhibiting thalamocortical and corticothalamic pathways produced dissociable patterns of performance during choice tests conducted under extinction conditions. Inhibiting the corticothalamic pathway produced a clear deficit in the ability to use recently updated goal value, but did not prevent animals from adapting to a selective change in the contingency between an action and its outcome. The latter ability was, however, abolished by inhibition of the thalamocortical pathway, which also produced a mild deficit in the ability to use goal value to guide behavior.

Discussion

In this study we sought to disentangle the functional contribution of thalamocortical and corticothalamic pathways connecting the dmPFC with the MD in the context of goal-directed behaviors. The present data indicate an important contribution for both cortical and thalamic neurons in the ability to perform adaptive actions. However, while both neuronal populations were found to be important to guide behavior based on current goal value, only thalamic neurons critically supported choice based on current causal relationships. Importantly, inhibiting corticothalamic and thalamocortical pathways produced very specific patterns of behavioral alterations, apparent only when no rewards were available. Thus, deficits only appeared during tests conducted under extinction conditions, suggesting that functional interactions between cortical and thalamic areas are important to guide behavior based on the current content of mental representations. Even then, inhibition of the corticothalamic pathway left the ability to guide choice based on current action-outcome contingency unaltered.

Importantly, control experiments showed that CNO injections alone did not alter instrumental behavior at any stage of the task (Appendix 1). Together with the absence of effects of CNO on consumption (Figures 3C and 4C), this appears sufficient to rule out any non-specific impact of CNO, e.g. due to its conversion to clozapine (Gomez et al., 2017). More importantly, it points to a specific role for projection-defined thalamic and cortical neurons in cognitive processes that are necessary when the task involves unobservable information (Bradfield et al., 2015).

Recent studies have emphasized a consistent role for thalamic nuclei in sustaining cortical activity when holding information online is important for subsequent choice (Bolkan et al., 2017; Schmitt et al., 2017). The present data support this view, as only dmPFC-projecting MD neurons were found to be important for choice based on the current mental representation of action-outcome contingency. Thus, one role of the MD could be to provide online information to support choice when no observable element can help to retrieve action-outcome contingency. This result is also consistent with the effects of global chemogenetic inhibition of the MD (Parnaudeau et al., 2015). Similarly, recent findings in primates suggest that MD-lesioned monkeys are unable to persist in successful strategies (Chakraborty et al., 2016), hinting at a similar problem in maintaining an accurate representation of the associative structure of the current task over time.

The role of the dmPFC in goal-directed actions is now well established (Corbit and Balleine, 2003; Killcross and Coutureau, 2003; Tran-Tu-Yen et al., 2009; Hart and Balleine, 2016), and this region is crucial for the acquisition of instrumental action-outcome associations (Tran-Tu-Yen et al., 2009). However, following acquisition, the role of the dmPFC in adapting instrumental responses to contingency changes appears complex, depending on the accessibility of the reward (Corbit and Balleine, 2003) and the nature of the contingency changes (Coutureau et al., 2012). The available data therefore suggest that the dmPFC is differentially implicated in guiding choice based on current goal value or on current action-outcome contingency (Naneix et al., 2009). Our data show that MD-projecting dmPFC neurons were necessary for representing or using current goal value, but not current action-outcome contingency. It is therefore possible that other pathways originating from the dmPFC, which were spared in the present study, are relevant for tracking contingency changes.

Overall levels of responding appeared somewhat low during the initial devaluation test, especially when dmPFC-projecting MD neurons were inhibited (Experiment 1). Interestingly, this feature is reminiscent of classic studies showing lower levels of instrumental performance in rats with MD lesions (Corbit et al., 2003). However, in our study the same rats exhibited high levels of responding during the degradation phase, suggesting that this disturbance was at most transitory. Low levels of performance are sometimes reported even in controls during devaluation tests (Corbit et al., 2003; Bradfield and Balleine, 2017). It seems unlikely that low levels of performance alone could account for the specific impairments during devaluation tests, because CNO-treated rats exhibited normal extinction on this occasion: since responding diminished over time during this test for all rats in a comparable fashion, performance was initially above floor level. We cannot exclude, however, that the impairments resulted from generalization across the two actions available during the choice test. We were concerned about this possibility beforehand, which prompted us to use two clearly distinct manipulanda (a lever and a tilt), unlike the two levers most commonly used in the literature. This should limit the possibility that rats generalize current goal value across both actions. In Experiment 1 in particular, CNO-treated rats behaved as if they were generalizing, but this could result from an inability to select the correct option in the absence of the sensory feedback provided by the reward.

The identification of a specific role for corticothalamic projections not only argues strongly against the view that the thalamus acts merely as a relay, but also suggests a specific role for these pathways in cognition (Crandall et al., 2015; Guo et al., 2017). Understanding the functional relevance of these corticothalamic pathways is important, as conceptual views posit that they may contribute to cortical functioning by enabling transthalamic communication between cortical areas, thus offering additional integrative opportunities (Sherman and Guillery, 2011; Sherman, 2016). The functional contribution of thalamocortical pathways appears consistent with a general role of the thalamus in directing attention toward task elements relevant for successful performance (Wolff et al., 2015a; Wolff et al., 2015b), not only in the presence of cues but also when successful performance requires using the current content of mental representations.

In conclusion, we provide causal evidence that thalamocortical and corticothalamic pathways connecting the dmPFC and the MD support at least partially dissociated goal attributes. These results highlight the directionality of functional exchanges within neural circuits as one of their fundamental features (see also Bolkan et al., 2017; Lichtenberg et al., 2017), which calls for a more systematic functional assessment of reciprocally connected pathways. Past research has indicated a time-limited role for both the dmPFC (Ostlund and Balleine, 2005; Tran-Tu-Yen et al., 2009) and the MD (Ostlund and Balleine, 2008) in the acquisition of goal-directed behaviors. Studies that have directly examined functional interactions between cortical and thalamic areas have generally used permanent interventions (Bradfield et al., 2013; Browning et al., 2015), as was also the case in the present study; stage-limited interventions therefore appear to be a valuable way to further refine the functional contribution of projection-defined neurons.

Materials and methods

Animals and housing conditions

Forty-two male Long-Evans rats weighing 275 to 300 g at surgery were obtained from Centre d’Elevage Janvier (France). Rats were initially housed in pairs and habituated to the laboratory facility for two weeks before the beginning of the experiments. Environmental enrichment was provided by tinted polycarbonate tubing elements, in accordance with current French (Council directive 2013–118, February 1, 2013) and European (directive 2010–63, September 22, 2010, European Community) laws and policies regarding animal experiments. The facility was maintained at 21 ± 1°C with lights on from 7 a.m. to 7 p.m. The experimental protocols received approval #5012053-A from the local Ethics Committee on December 7, 2012. After histological verification (see below), the final group sizes were: thalamocortical, n = 7 for saline and n = 7 for CNO; corticothalamic, n = 8 for saline and n = 6 for CNO.
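As a consistency check, the final group sizes plus the exclusions reported in the Results (eight rats in Experiment 1, six in Experiment 2) account for the whole cohort, assuming the 42 rats comprise only these two experiments:

```latex
\underbrace{(7 + 7) + 8}_{\text{Experiment 1 (TC)}} \;+\; \underbrace{(8 + 6) + 6}_{\text{Experiment 2 (CT)}} \;=\; 22 + 20 \;=\; 42
```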

Surgery

Rats were anaesthetized with 4% isoflurane and placed in a stereotaxic frame with atraumatic ear bars (Kopf, Tujunga, CA) in a flat-skull position. Anaesthesia was maintained with 1.5–2% isoflurane complemented by subcutaneous administration of buprenorphine (Buprecare, 0.05 mg/kg). CAV-2 and AAV were pressure-injected (Picospritzer, General Valve Corporation, Fairfield, NJ) into the brain through a glass micropipette (outside diameter: around 100 µm) connected to polyethylene tubing. For MD-to-dmPFC pathway targeting, 1 µl of CAV2-Cre at 1 × 10⁹ genomic copies/µl (Biocampus PVM, Montpellier, France) was injected bilaterally into the prelimbic cortex (PL) at the following coordinates: AP +3.2 mm from bregma, mediolateral ±0.6 mm, dorsoventral −3.4 mm from skull. In the same surgical session, 1 µl of AAV-hSyn-DIO-hM4Di-mCherry at 1 × 10⁹ genomic copies/µl (UNC Vector Core, USA) was injected bilaterally into the MD at the following coordinates: AP −2.6 mm, mediolateral ±0.7 mm, dorsoventral −5.6 mm. For dmPFC-to-MD pathway targeting, the injections were reversed, that is, CAV-2 in the MD and AAV in the dmPFC. All injection parameters were the same, except for the mediolateral coordinate of the AAV injection in the dmPFC, set at ±0.8 mm to preferentially target cortical layers V and VI, which project to the MD. In all groups, the pipette was left in place for 5 min after injection before being slowly retracted. To allow optimal viral expression, rats were given one month of recovery before behavioral testing began.
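The surgical parameters above can be captured in a small data structure, which makes the reversal between the two pathways explicit. This is a sketch: the dictionary layout and the `surgery_plan` helper are hypothetical. The coordinates, volumes and titers are those given in the text (AP and mediolateral relative to bregma, dorsoventral from skull, in mm).

```python
# Stereotaxic targets from the text (mm; AP/ML from bregma, DV from skull).
TARGETS = {
    "dmPFC": {"AP": 3.2, "ML": 0.6, "DV": -3.4},  # prelimbic cortex (PL)
    "MD": {"AP": -2.6, "ML": 0.7, "DV": -5.6},
}
VOLUME_UL = 1.0          # injected volume per site (µl)
TITER_GC_PER_UL = 1e9    # genomic copies per µl, both vectors

def surgery_plan(pathway: str) -> list:
    """Return the bilateral injections for the TC or CT pathway (hypothetical helper)."""
    if pathway == "TC":      # MD -> dmPFC: Cre delivered retrogradely from dmPFC
        cav2_site, aav_site = "dmPFC", "MD"
    elif pathway == "CT":    # dmPFC -> MD: injection sites reversed
        cav2_site, aav_site = "MD", "dmPFC"
    else:
        raise ValueError("pathway must be 'TC' or 'CT'")
    plan = []
    for vector, site in (("CAV2-Cre", cav2_site),
                         ("AAV-hSyn-DIO-hM4Di-mCherry", aav_site)):
        coords = dict(TARGETS[site])  # copy so the shared targets stay unchanged
        if vector.startswith("AAV") and site == "dmPFC":
            coords["ML"] = 0.8        # more lateral, to favour layers V/VI
        for side in (-1, 1):          # bilateral injections
            plan.append({
                "vector": vector,
                "site": site,
                "AP": coords["AP"],
                "ML": side * coords["ML"],
                "DV": coords["DV"],
                "genomic_copies": VOLUME_UL * TITER_GC_PER_UL,
            })
    return plan

for injection in surgery_plan("CT"):
    print(injection)
```

Each site therefore receives 1 × 10⁹ genomic copies of one vector, and each rat receives four injections in total (two vectors, bilaterally).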

Behavioral experiments

Behavioral apparatus

Animals were trained in eight identical conditioning chambers (40 cm wide × 30 cm deep × 35 cm high, Imetronic, France), each located inside a sound- and light-attenuating wooden chamber (74 × 46 × 50 cm). Each chamber had a ventilation fan producing a background noise of 55 dB and four LEDs on the ceiling for illumination. Each chamber had two opaque panels on the right and left sides, two clear Perspex walls on the back and front sides and a stainless-steel grid floor (rod diameter: 0.5 cm; inter-rod distance: 1.5 cm). In the middle of the left wall, a magazine (6 × 4.5 × 4.5 cm) received either grain or sucrose pellets (45 mg, F0165, Bio Serv, NJ, USA) from dispensers located outside the operant chamber. The magazine was equipped with infra-red cells to detect the animal’s visits. A retractable lever (4 × 1 × 2 cm) could be inserted next to the magazine, as could a ‘tilt’, a vertical rod hinged on the ceiling and terminated by a small plastic ball. Pressing the lever and pushing the tilt in any direction were therefore the two distinct actions that rats could perform during instrumental tasks. Activation of either the lever or the tilt produced the delivery of the associated outcome according to the current schedule (i.e. FR1, RR5 or RR10, see below). A personal computer connected to the operant chambers and equipped with POLY software and interface (Imetronic, France) controlled the equipment and recorded the data.

Instrumental training

Rats were first habituated to the magazine dispenser through two daily sessions of magazine training for 2 days. A session consisted of the delivery of 30 food rewards (grain or sucrose pellets) distributed randomly over a 30 min session. The first session took place in the morning and the second in the afternoon, with the order of rewards counterbalanced between rats and days. Twelve daily sessions of instrumental training began the day after the last session of magazine training, during which rats had to learn specific associations between two responses (lever press or tilt action) and the two different outcomes. Daily training consisted of instrumental learning with either the lever or the tilt, each specifically associated with one of the outcomes (i.e. either grain or sucrose pellets, see Figure 1). For analysis, instrumental performance was grouped into blocks of two consecutive sessions (one with the tilt, one with the lever), which were averaged. Daily training was completed when 30 rewards had been earned or 30 min had elapsed. The action-outcome associations and the order of their presentation were counterbalanced between rats and days. For the first four sessions, every action was reinforced (FR1). Then, for sessions 5 to 8, a random ratio 5 (RR5) schedule was introduced (2 to 10 actions were necessary to obtain the reward; probability of receiving an outcome given a response = 0.2). Sessions 9 to 12 were performed on an RR10 schedule (4 to 20 actions were necessary to obtain the reward; probability of receiving an outcome given a response = 0.1). The last instrumental session with each action (RR10, highlighted in Figure 1) was used as the baseline for the devaluation test, while the last retraining session after the devaluation test (RR10, highlighted in Figure 1) was used as the baseline for the degradation phase, including the choice test (see Figure 1).
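The ratio schedules above can be illustrated with a small simulation. This Python sketch treats a random ratio schedule as an independent per-response reward probability (1.0 for FR1, 0.2 for RR5, 0.1 for RR10), which is how the probabilities are stated above; it does not model the 2 to 10 or 4 to 20 response bounds, and the function name is hypothetical rather than part of the POLY software.

```python
import random


def responses_per_reward(p_reward: float, n_rewards: int, seed: int = 0) -> float:
    """Simulate a ratio schedule in which every response is rewarded
    independently with probability p_reward, and return the mean number
    of responses emitted per reward earned."""
    rng = random.Random(seed)  # seeded for reproducibility
    responses = 0
    earned = 0
    while earned < n_rewards:
        responses += 1
        if rng.random() < p_reward:
            earned += 1
    return responses / n_rewards


# FR1 corresponds to p = 1.0; RR5 and RR10 to p = 0.2 and 0.1.
for label, p in [("FR1", 1.0), ("RR5", 0.2), ("RR10", 0.1)]:
    print(label, round(responses_per_reward(p, n_rewards=100_000), 2))
```

Under this assumption the mean number of responses per reward converges on 1/p, i.e. about 1, 5 and 10 for the three schedules, which is what the schedule names denote.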

Outcome devaluation test

The day after the last training session, rats were placed in a plastic feeding cage with free access to 15 g of one of the two outcomes for one hour (devaluation by sensory-specific satiety). Half of the rats in each response-outcome assignment received grain pellets and the remainder received sucrose pellets. Immediately afterwards, rats were placed in the operant chambers for a 10 min extinction test, during which both actions were available but unrewarded. Because no outcome was delivered, rats had to rely on their representations of the response-outcome contingencies and of current outcome value to guide their behavior. Animals that received saline during training also received saline during the test, and likewise for animals that had received CNO. Performance was quantified relative to prior baseline levels.
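The normalization to baseline is not spelled out in detail, so the following Python sketch assumes the simplest version: each action's test response rate divided by its baseline rate (from the last RR10 session with that action) and multiplied by 100. Expressing both in responses per minute is an assumption, needed because a training session could end before 30 min while the test lasted 10 min; all numbers are made up.

```python
def percent_of_baseline(test_rate: float, baseline_rate: float) -> float:
    """Express a test response rate as a percentage of the baseline rate.
    Both rates must use the same unit (e.g. responses per minute)."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return 100.0 * test_rate / baseline_rate


# Hypothetical rat: baseline rates from the last RR10 session with each
# action, test rates from the 10 min extinction test (responses/min).
baseline = {"valued": 12.0, "devalued": 10.0}
test = {"valued": 6.0, "devalued": 2.0}
scores = {k: percent_of_baseline(test[k], baseline[k]) for k in baseline}
print(scores)  # {'valued': 50.0, 'devalued': 20.0}
```

Normalizing each action to its own baseline keeps differences in training performance between the lever and the tilt from masquerading as a devaluation effect.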

Consumption test

After the extinction test, rats were placed in the plastic feeding cage used for outcome devaluation. They had free access first to 5 g of one outcome for 15 min and then to 5 g of the other outcome for 15 min. The amount consumed was then measured for each outcome. The order of outcome presentation was counterbalanced between rats and groups.

Degradation procedure

One day after completion of the consumption test, rats received two additional RR10 sessions to reinstate regular instrumental performance. Immediately afterwards, the degradation procedure began. For one of the action-outcome associations, the contingency between the action and its outcome was kept identical to that used during instrumental training (RR10), but for the other, the contingency was degraded by delivering the same overall number of rewards at random, regardless of whether the action was performed. For both the degradation phase and the test, performance was quantified relative to prior baseline levels.
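A standard way to quantify action-outcome contingency in the associative learning literature is ΔP = P(O | A) − P(O | no A). The Methods do not use this measure explicitly; the Python sketch below only illustrates, with hypothetical counts over 1 s time bins in an idealized 30 min session, why delivering the same number of rewards independently of responding drives ΔP towards zero while a maintained contingency keeps it positive.

```python
def delta_p(rewarded_action_bins: int, action_bins: int,
            rewarded_no_action_bins: int, no_action_bins: int) -> float:
    """Contingency ΔP = P(O | A) - P(O | no A), estimated from counts of
    time bins with/without an action and with/without an outcome."""
    p_o_given_a = rewarded_action_bins / action_bins
    p_o_given_not_a = rewarded_no_action_bins / no_action_bins
    return p_o_given_a - p_o_given_not_a


# Hypothetical 30 min session split into 1800 one-second bins;
# the rat responds in 300 bins and 30 rewards are delivered.
# Non-degraded: all 30 rewards follow an action.
print(delta_p(30, 300, 0, 1500))   # 0.1
# Degraded (idealized): rewards fall at random, so on average they are
# spread in proportion to bin counts (5 in action bins, 25 in others).
print(delta_p(5, 300, 25, 1500))   # 0.0
```

The overall number of rewards is identical in both cases; only the conditional probabilities differ, which is what makes the degraded action a poor predictor of its outcome.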

CNO preparation and injection

CNO (Enzo Life Science) was diluted in saline containing 0.5% DMSO at a final concentration of 1 mg/ml. CNO groups received a daily i.p. injection of CNO (1 mg/kg) one hour before each training and testing session, while saline groups received a daily i.p. injection of saline (1 ml/kg of 0.9% saline) at the same time point (see Figure 1). We recently demonstrated the efficacy of CNO administration in reducing neuronal activity at a dose of 1 mg/kg using the same reagents and suppliers (Parkes et al., 2017). All animals underwent surgery and were then randomly allocated to CNO or saline groups prior to training.
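Because the CNO solution was prepared at 1 mg/ml and dosed at 1 mg/kg, the injection volume works out to 1 ml/kg, matching the saline volume. The Python helper below is a hypothetical illustration, not part of the original protocol, that makes this arithmetic explicit for a given body weight.

```python
def injection_volume_ml(body_weight_g: float, dose_mg_per_kg: float = 1.0,
                        concentration_mg_per_ml: float = 1.0) -> float:
    """Volume (ml) of CNO solution to inject i.p. for a given body weight."""
    dose_mg = dose_mg_per_kg * body_weight_g / 1000.0  # grams -> kilograms
    return dose_mg / concentration_mg_per_ml


print(injection_volume_ml(300))  # 0.3
```

For rats in the 275–300 g range at surgery, this corresponds to roughly 0.28 to 0.3 ml per injection, before any weight gain during training.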

Histology

Rats were perfused transcardially with 150 ml of saline followed by 400 ml of 4% paraformaldehyde (PFA). Brains were kept in the same PFA solution overnight, then 40 µm sections of the prefrontal cortex and the thalamus were cut using a vibratome. Immunohistochemistry was performed on the sections to enhance the mCherry signal. First, sections were rinsed in 0.1 M PBS (5 × 5 min) and then incubated in a blocking solution for 1 hr (4% goat serum and 0.2% Triton X-100 in 0.1 M PBS). Immediately after, sections were incubated at 4°C for 48 hr with rabbit anti-RFP primary antibody (Clinisciences, PM005) diluted 1/200 in the blocking solution. Sections were then rinsed in 0.1 M PBS (4 × 5 min) and incubated for 2 hr with a goat anti-rabbit secondary antibody coupled to DyLight 549 (1/200 in 0.1 M PBS; Jackson ImmunoResearch, 111-025-003). Following four 5 min rinses in 0.1 M PBS, Hoechst solution (bisBenzimide H 33258, Sigma, B2883; 1/5000 in 0.1 M PBS) was added for 15 min for counterstaining. Finally, sections were rinsed in 0.1 M PB (4 × 5 min), mounted in 0.05 M PB onto gelatin-coated slides and coverslipped with the anti-fading reagent Fluoromount G (SouthernBiotech, 0100–01). Images were then captured using a Nanozoomer slide scanner (Hamamatsu Photonics) and analyzed with the NDP.view 2.0 freeware (Hamamatsu Photonics). Histology was performed independently by FA, VF and MW, who were blind to the behavioral data.
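The working dilutions above translate into stock volumes by simple arithmetic. This Python sketch assumes a hypothetical 10 ml of each working solution; the volume actually prepared is not reported.

```python
def stock_volume_ul(final_volume_ml: float, dilution_factor: int) -> float:
    """Volume of stock (µl) needed for a 1/dilution_factor working solution."""
    return final_volume_ml * 1000.0 / dilution_factor


# Hypothetical 10 ml of each working solution.
print(stock_volume_ul(10, 200))   # 50.0 µl of anti-RFP or secondary antibody
print(stock_volume_ul(10, 5000))  # 2.0 µl of Hoechst
```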

Data analysis

The data were submitted to ANOVAs using StatView software (SAS Institute Inc.). For both experiments, Drug (saline/CNO) was the between-subjects factor, and Devaluation (Devalued/Non devalued), Degradation (Degraded/Non degraded) and Session (averaged over both actions) were repeated measures where appropriate. The alpha level for rejection of the null hypothesis was 0.05 throughout.
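The following pure-Python sketch illustrates the logic of the key test and is not a reproduction of the StatView analysis. For a design with one between-subjects factor (Drug) and a two-level repeated measure (e.g. Devalued/Non devalued), the Drug X Devaluation interaction F equals a one-way ANOVA F computed on each rat's difference score; with two groups, this is the square of a pooled-variance t statistic on those differences. The data and function names are invented for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA: return (F, df_between, df_within) for a list of samples."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def interaction_f(level_a, level_b):
    """Between X Within interaction for a two-level repeated measure:
    one-way ANOVA on per-subject differences (level A minus level B).
    level_a[g][i] and level_b[g][i] belong to subject i of group g."""
    diffs = [[a - b for a, b in zip(ga, gb)] for ga, gb in zip(level_a, level_b)]
    return one_way_f(diffs)


# Invented responses (% baseline); groups are [saline, CNO].
non_devalued = [[60, 55, 70, 65], [40, 45, 35, 50]]
devalued = [[30, 25, 35, 40], [38, 40, 36, 42]]

f, df1, df2 = interaction_f(non_devalued, devalued)
print(f"Drug X Devaluation: F({df1},{df2}) = {f:.2f}")  # F(1,6) = 88.71
```

Working on difference scores makes the design choice explicit: the interaction asks whether the devaluation effect (valued minus devalued responding) differs between drug groups.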

Appendix 1

To ensure that CNO injections alone did not alter goal-directed behaviors, three groups of rats (300–330 g at the start of the experiment, obtained from Janvier Labs) were injected with CNO (1 mg/kg; n = 11), saline (0.9%, n = 6) or saline + 0.5% DMSO (n = 6). We found that neither CNO nor DMSO affected behavior.

Instrumental training

Initial instrumental performance was similar to that observed in the experiments inhibiting thalamocortical and corticothalamic pathways (Figure 3A; Figure 4A). As shown in Appendix 1—figure 1A, instrumental performance increased significantly (F(5,100) = 146.0, p<0.0001) in a comparable fashion across the three groups. Although performance in the saline group appeared lower initially, it subsequently reached an asymptotic level comparable to that of the other two groups. Accordingly, these analyses revealed no significant effect of Drug (F(2,20) = 2.9, p=0.0773) and no significant Session X Drug interaction (F(10,100) = 1.2, p=0.3319).

Appendix 1—figure 1
Assessment of CNO without DREADD expression.

(A) Mean number of lever presses (±sem) during instrumental training. (B) Mean number of lever presses (±sem, relative to baseline) during the instrumental choice test conducted immediately after selective outcome devaluation under extinction conditions. (C) Consumption test.

https://doi.org/10.7554/eLife.32517.009

Devaluation test

During the devaluation test that immediately followed sensory-specific satiety, all groups showed a clear bias towards the action associated with the still-valued outcome (Appendix 1—figure 1B). This observation was confirmed by a significant effect of Devaluation (F(1,20) = 89.5, p<0.0001). We found no indication that any drug treatment affected performance during this test (Drug, F(2,20) = 1.0, p=0.3784; Drug X Devaluation, F(2,20) = 1.5, p=0.2493).

Consumption test

Immediately after the devaluation test conducted under extinction conditions, a consumption test was performed as described in the Methods. All rats showed a marked preference for the food reward that had not been given during satiety (Appendix 1—figure 1C), yielding a highly significant main effect of Devaluation (F(1,20) = 68.0, p<0.0001). Again, drug treatment produced no detectable effect (Drug, Drug X Devaluation, Fs <1).

Action-outcome degradation

During the degradation procedure, all rats responded differentially, with lower responding when the action-outcome contingency was degraded (Appendix 1—figure 2). Accordingly, the analyses revealed significant effects of Degradation (F(1,20) = 161.5, p<0.0001) and Session (F(5,100) = 2.6, p=0.0280), and a significant Degradation X Session interaction (F(5,100) = 24.0, p<0.0001). We found no evidence that any drug treatment affected performance at this stage (Drug, F(2,20) = 1.4, p=0.2771; Drug X Degradation, Drug X Session, Drug X Degradation X Session, Fs <1).

Appendix 1—figure 2
Assessment of CNO without DREADD expression.

Mean number of lever presses (±sem, relative to baseline) during the contingency degradation procedure.

https://doi.org/10.7554/eLife.32517.010

Action-outcome degradation, choice test in extinction

During the final choice test (Appendix 1—figure 3), all rats responded differentially, irrespective of drug treatment. As a result, the main effect of Degradation was significant (F(1,20) = 32.5, p<0.0001), whereas the Drug X Degradation interaction was not (F < 1). Responding was slightly higher on this occasion in the group that received DMSO alone, as indicated by a main effect of Drug approaching significance (F(2,20) = 3.3, p=0.0589). However, the critical CNO versus saline comparison yielded no significant effect (post-hoc Scheffé test, p=0.6381). In addition, further analyses conducted on each group separately confirmed a Degradation effect for the DMSO-treated (F(1,5) = 8.5, p=0.0334) and CNO-treated (F(1,10) = 23.1, p=0.0007) groups, while it approached significance for the saline group (F(1,5) = 6.0, p=0.0579).

Appendix 1—figure 3
Assessment of CNO without DREADD expression.

Mean number of lever presses (±sem, relative to baseline) for the final choice test conducted under extinction conditions.

https://doi.org/10.7554/eLife.32517.011

Altogether, these supplemental data confirm that CNO treatment alone cannot account for the effects observed in the main study.

References

  16. Dickinson A (2012) Associative learning and animal cognition. Philosophical Transactions of the Royal Society B: Biological Sciences 367:2733–2742. https://doi.org/10.1098/rstb.2012.0220
  28. O'Doherty JP (2011) Reward predictions and computations. In: Neurobiology of Sensation and Reward. Boca Raton (FL).
  34. Paxinos G, Watson C (2014) The Rat Brain in Stereotaxic Coordinates (7th Edn). San Diego: Academic Press.

Decision letter

  1. Geoffrey Schoenbaum
    Reviewing Editor; National Institute on Drug Abuse, National Institutes of Health, United States

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Thalamocortical and corticothalamic pathways support distinct goal attributes" for consideration by eLife. Your article has been favorably evaluated by Timothy Behrens (Senior Editor) and three reviewers, one of whom, Geoffrey Schoenbaum (Reviewer #1), is a member of our Board of Reviewing Editors. The following individuals involved in review of your submission have agreed to reveal their identity: Sean Ostlund (Reviewer #2); Laura Corbit (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

The authors use genetic techniques to test the hypothesis that reciprocal connections between PFC and MD thalamus might play distinct roles in goal directed behavior. They report that projections to MD are critical for selective changes in instrumental behavior after devaluation, whereas projections from MD are necessary for selective changes in instrumental behavior after both devaluation and contingency degradation.

Essential revisions:

There are two essential revisions. The first is the need for a control group to rule out CNO effects, given the recent report by Michaelides that CNO has its effect through peripheral conversion to clozapine, which then enters the CNS and could act at other sites. The addition of a separate group and comparison to the controls might be acceptable, rather than repeating the entire experiment, to rule this out.

The second essential revision is to address the alternative interpretations given the effect of CNO/DREADD on response rates in the critical tests. A fair discussion of this or additional data to rule out extinction, generalization and a floor effect on responding is necessary.

Reviewer #1:

In this study, the authors use genetic approaches to explore the effects of specifically inactivating connections between DMPFC and MD thalamus on instrumental devaluation and contingency degradation. They are exploring the interesting and novel question of whether projections in different directions between these areas may play dissociable roles in the behavior. They use DREADD receptors and retrogradely transported Cre recombinase to target directional projections and inject CNO systemically during instrumental learning, devaluation and contingency degradation testing to inactivate the projections. They show that projections to MD are critical for selective changes in instrumental behavior after devaluation, whereas projections from MD are necessary for selective changes in instrumental behavior after both devaluation and contingency degradation. From this they conclude that the former projection is necessary for current value representations and the latter for maintaining or using action-outcome representations. At least that is my reading.

The study is well-conceived and tests an interesting and important question. In my opinion, it has two major problems however. One is not specific to this study, but is critical nonetheless, and it involves the recent report that the DREADD receptors are activated by clozapine and not by peripherally administered CNO. Specifically, my understanding of these data is that they show that CNO does not in fact enter the brain when injected peripherally. Instead it is converted to clozapine, and it is clozapine that acts on the receptor. Obviously, clozapine also has other endogenous sites of action. This means that the two-group approach used here is not sufficient for using this tool, since any effect in the experimental group could be due to the interaction between the clozapine and the DREADD receptor, as planned by the authors, or it could reflect the action of clozapine in some other area. To rule out the latter interpretation requires a CNO only control group. There is really no way around this it seems to me, given this recent report. Given that the authors were presumably blindsided by this new finding, I think the addition of this group in supplemental or in some other way would be acceptable (i.e. it is not necessary to repeat the entire experiment), but I think it must be added, otherwise we all look foolish.

The second problem I have is specific to this experiment, and it is the low levels of responding in the experimental groups. That is, the critical effects occur largely because the CNO/DREADD group stops responding. It seems to me that this could reflect the functions the authors claim. However it could also reflect a simple lack of responding, or it could also reflect more rapid extinction learning or generalization I think, independent of the claimed effects. This problem is not unique to this experiment; many studies like this suffer from this issue. It also affects both experiments, but perhaps is more of a problem in the first. I think it is important to note it clearly in the discussion of the effects, along with whatever mitigating observations the authors want to make. To directly address I think one might restrict the analysis of devaluation and contingency degradation to performance matched pairs. That is, the authors could analyze rats that responded similarly at the end of training, or that responded similarly to the valued or non-degraded response in the test. If the impact of treatment on the devaluation or degradation persisted in these specific rats, this would be good evidence that it was independent of any overall performance effect. Or one might look at responding across time in the critical test sessions to rule out a general increase in extinction in the face of non-reward. In any event, I think this must be fairly discussed. Maybe one of these other functions is really what is going on?

Reviewer #2:

This study investigates an important topic using techniques that are generally appropriate. The findings tend to support the authors' claim that direct connections between the dmPFC and MD thalamus are important for goal-directed behavior, with dmPFC-->MD and MD-->dmPFC projections underlying sensitivity to outcome value but only MD-->dmPFC underlying action-outcome contingency. Although these findings would be novel, there are a few issues that make data interpretation difficult, and that limit the significance of this project, as it stands.

1) Since no DREADD-free controls are used, it is unclear if CNO's effects on behavior are dependent on hM4Di expression. Given recent concerns about the back conversion of CNO to clozapine, a drug which has actions at various endogenous receptors, it is highly recommended that all DREADD studies include such controls. The authors discuss this issue and offer an argument that CNO did not disrupt certain behaviors (e.g., consumption) and had partially dissociable effects in groups in which different pathways were targeted. The fact that outcome devaluation was disrupted by CNO in both groups, however, leaves open the possibility that this aspect of behavior is disrupted by unconditional CNO effects. It therefore seems prudent to add additional data that speaks to this alternative account. Also, although the concentration of CNO is described, it's unclear what dose was used for treatment and whether it was consistently used across training/testing stages. Further discussion of how this dose was selected and how repeated exposure to CNO might be expected to impact the efficacy of this treatment would also be appropriate. Finally, it seems that saline was used as a control treatment rather than the actual DMSO-containing vehicle. Please clarify. This makes it that much more important to assess the unconditional effects of the CNO-DMSO solution.

2) In both experiments the CNO group showed generally low levels of responding during devaluation testing (rather than indiscriminate responding). This is not surprising given the results of lesion studies, but the authors should consider that a floor effect may have obscured detection of sensitivity to devaluation. Importantly, test data are presented as % baseline performance. Therefore this is a bigger concern for experiment 1 (MD-dmPFC group) since CNO had a substantial effect on baseline (training) performance, and also seems to have further suppressed responding at test to very low levels. (Indeed, despite this and the relatively small n's, there was a nonsignificant trend towards devaluation (p = 0.15) in this group.) I would be interested to know whether similar deficits were observed early in test sessions, when response rates were likely higher. It is important to note that although the authors explain that the CNO group did not differ from the vehicle group at the end of training, the rest of the training data suggests otherwise. There was most likely insufficient power to find significant effects on any given training day (despite the clear trend).

3) The methods do not clearly state how CNO was delivered during contingency training, e.g., that the same rats get CNO during both rounds of training/testing. More should be done to clarify the CNO treatment during retraining sessions. Also, apparently there was an initial devaluation test in which all rats were tested on vehicle. These data were not presented but would speak to whether chemoinhibition of targeted pathways during encoding only (not at test) was sufficient to disrupt action-outcome based response retrieval. In general, discussion of literature indicating the stage-limited roles for dmPFC and MD in goal-directed learning but not performance is somewhat vague (e.g., Results, first paragraph and Discussion, fourth paragraph) and may be confusing to the reader, particularly given that the main approach and findings here do not attempt to investigate this further.

Reviewer #3:

This article by Alcaraz and colleagues uses a chemogenetic approach (DREADDs) to investigate the specific role of thalamocortical and corticothalamic pathways in instrumental learning and control. Both the medial prefrontal cortex (particularly the prelimbic region) and the mediodorsal thalamus have been shown to be involved in instrumental learning in previous work using lesion methods but these previous methods do not allow the role of the individual circuits to be isolated. The current results are an important contribution and advancement in that regard.

The manuscript uses appropriate methods for both behavioural and neural manipulations allowing strong conclusions to be drawn from the data. These findings make an important contribution to understanding of the circuits underlying instrumental learning and corticothalamic interactions in general. Further, they highlight the need to carefully examine the role of each pathway within reciprocal circuits to fully appreciate their contribution to behavior.

I have several specific comments for the authors to consider.

- I thought a schematic outlining the design, perhaps added as a panel in each figure, would greatly aid understanding of the behavioural design without going into great detail in the body of the manuscript (when was CNO given (training, testing, retraining, etc.?), point out that animals are trained on two R-O relationships (R1-O1, R2-O2 etc.)).

- The sample size ends up being rather small after exclusions. While I have a difficult time imagining how adding more animals would change the overall pattern of results, it would allow stronger conclusions and assuage any reservations in readers less familiar with these methods. For example, in Experiment 1, it's awkward that the group × devaluation interaction is not significant. I can live with this as the simple effects confirm the impairment in the CNO group and not in the saline group, which is very much what's suggested by the data in Figure 1. But the authors are then forced to describe the impairment as mild thereafter. To me, this has implications for interpretation; is the impairment weak or is the power weak? While subtle, different conclusions might be reached if power could be ruled out as a contributor to the effect. So I'm not requiring more experiments, and perhaps the authors are attempting to tread lightly considering their statistical support; however, it would be worth giving further consideration to the description of the results.

- I felt the histology figures were too small for the reader to make their own assessment of the expression of virus and it wasn't possible to evaluate the extent of overlap of the two vectors. The description of the histology methods and analyses was very brief. I am confident that placements were in the targeted area but there may be more subtle aspects of expression (degree, things like cortical layers, etc.) that readers may like to see themselves. Could the image be larger or a zoom be inset?

- Subsection “CNO preparation and injection” – here there is peculiar mention of all groups receiving saline before "the first session of tests" and then being split into saline and CNO groups for a second devaluation test. Is this what happened? I thought rats were in consistent treatment groups throughout and only data for a single devaluation test is reported? Please clarify. This isn't entirely trivial as there's some indication in the literature that the MD may contribute to acquisition but not expression of R-O learning.

https://doi.org/10.7554/eLife.32517.015

Author response

Reviewer #1:

[…] The study is well-conceived and tests an interesting and important question. In my opinion, it has two major problems however. One is not specific to this study, but is critical nonetheless, and it involves the recent report that the DREADD receptors are activated by clozapine and not by peripherally administered CNO. Specifically, my understanding of these data is that they show that CNO does not in fact enter the brain when injected peripherally. Instead it is converted to clozapine, and it is clozapine that acts on the receptor. Obviously, clozapine also has other endogenous sites of action. This means that the two-group approach used here is not sufficient for using this tool, since any effect in the experimental group could be due to the interaction between the clozapine and the DREADD receptor, as planned by the authors, or it could reflect the action of clozapine in some other area. To rule out the latter interpretation requires a CNO only control group. There is really no way around this it seems to me, given this recent report. Given that the authors were presumably blindsided by this new finding, I think the addition of this group in supplemental or in some other way would be acceptable (i.e. it is not necessary to repeat the entire experiment), but I think it must be added, otherwise we all look foolish.

We have added the required controls, which are included as supplemental material. We repeated the whole experimental procedure (depicted in the new Figure 1) with three new and independent groups that were treated with either 1 mg/kg CNO (n = 11), 0.9% saline (n = 6) or saline + DMSO (n = 6) to establish this important control (the latter was a suggestion from reviewer 2). These new data show that CNO injections alone, in the absence of DREADDs, did not affect behavior at any stage of testing.

The second problem I have is specific to this experiment, and it is the low levels of responding in the experimental groups. That is, the critical effects occur largely because the CNO/DREADD group stops responding. It seems to me that this could reflect the functions the authors claim. However it could also reflect a simple lack of responding, or it could also reflect more rapid extinction learning or generalization I think, independent of the claimed effects. This problem is not unique to this experiment; many studies like this suffer from this issue. It also affects both experiments, but perhaps is more of a problem in the first. I think it is important to note it clearly in the discussion of the effects, along with whatever mitigating observations the authors want to make. To directly address I think one might restrict the analysis of devaluation and contingency degradation to performance matched pairs. That is, the authors could analyze rats that responded similarly at the end of training, or that responded similarly to the valued or non-degraded response in the test. If the impact of treatment on the devaluation or degradation persisted in these specific rats, this would be good evidence that it was independent of any overall performance effect. Or one might look at responding across time in the critical test sessions to rule out a general increase in extinction in the face of non-reward. In any event, I think this must be fairly discussed. Maybe one of these other functions is really what is going on?

Responding across time:

We have conducted supplemental analyses and added a new paragraph to the Discussion dealing with these issues (fifth paragraph). Analyzing the rate of responding over time during the devaluation test (treated as five 2 min blocks) confirmed the occurrence of extinction, with a significant effect of Block that did not interact with Drug (Experiment 1: Block, F(9,108) = 8.3, P < 0.0001; Block X Drug, F(9,108) = 1.2, P = 0.2908; Block X Devaluation X Drug, F(9,108) = 1.6, P = 0.1335; Experiment 2: Block, F(19,108) = 3.9, P = 0.0002; Block X Drug, F(9,108) = 1.3, P = 0.2721; Block X Devaluation X Drug, F(9,108) = 1.4, P = 0.2062). Moreover, analyses conducted separately on saline- and CNO-treated rats confirmed that responding gradually decreased over time during this test in each group (Saline: Block, F(4,24) = 6.6, P = 0.0010; CNO: Block, F(4,24) = 7.4, P = 0.0005; see the figures below). These data are now included in the corresponding Results sections. We found no evidence that extinction processes during this test were differentially affected by saline versus CNO treatment.
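The block analysis described above can be sketched as follows, assuming time-stamped responses are available for each rat. The export format of the recording software is not described, so the input here is a hypothetical list of press times in seconds, and the function name is illustrative. The function counts responses in consecutive 2 min bins of the 10 min test.

```python
def bin_responses(timestamps_s, session_s=600.0, bin_s=120.0):
    """Count responses in consecutive bins (default: five 2 min bins of a
    10 min test). Timestamps are seconds from session start; responses
    outside [0, session_s) are ignored. If session_s is not a multiple of
    bin_s, the leftover time is folded into the last bin."""
    n_bins = int(session_s // bin_s)
    counts = [0] * n_bins
    for t in timestamps_s:
        if 0 <= t < session_s:
            counts[min(int(t // bin_s), n_bins - 1)] += 1
    return counts


# Hypothetical lever-press times (s) for one rat during the extinction test.
presses = [5, 12, 30, 61, 95, 118, 130, 170, 250, 300, 410, 590]
print(bin_responses(presses))  # [6, 2, 2, 1, 1]
```

Per-block counts computed this way for each action and each rat would then enter the Block X Devaluation X Drug ANOVA as repeated measures.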

Floor effect:

The same analysis also allows us to address the issue of a potential performance floor effect (a concern also raised by reviewer 2). It shows that performance was not minimal at the start of the devaluation test in either group, especially when inhibiting the thalamocortical pathway. Indeed, CNO-treated rats initially responded at higher rates for the action associated with the devalued outcome, as shown by Author response image 1 and the related analyses below.

Author response image 1
Experiment 1: inhibiting thalamocortical pathway.
https://doi.org/10.7554/eLife.32517.013

When focusing only on the initial 2 min block, during which responding was maximal, the analyses revealed an overall effect of Devaluation (F(1,12) = 4.8, P = 0.0499) but not of Drug (F<1). Interestingly, the critical Drug X Devaluation interaction now reached significance (F(1,12) = 4.7, P = 0.0501). Further analyses comparing response rates for the actions associated with the valued and the devalued outcome during this initial 2 min block showed that the main effect of Drug did not reach significance for the still-valued option (F(1,12) = 2.5, P = 0.1338) but approached significance for the devalued one (F(1,12) = 3.7, P = 0.0782), with CNO-treated rats tending to respond more than saline-treated rats. Thus, the specific impairment exhibited by CNO-treated rats was also evident when responding was maximal and therefore cannot directly result from a performance floor effect.
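For a design with one between-subject factor (Drug) and one within-subject factor (Devaluation), each with two levels, the Drug X Devaluation interaction F(1, N - 2) equals the square of a pooled-variance independent-samples t statistic computed on each rat's difference score (valued minus devalued responding). The sketch below illustrates that equivalence under this assumption; it is not the authors' analysis code, and the difference scores are hypothetical. An error term of N - 2 = 12, as in the F(1,12) values reported here, corresponds to 14 rats split into two groups.

```python
from statistics import mean, variance
from typing import List


def interaction_f(diffs_a: List[float], diffs_b: List[float]) -> float:
    """Group x Within interaction F for a 2 (between) x 2 (within) mixed design.

    Each list holds one difference score per rat, e.g. valued minus devalued
    response rate in the first 2 min block. Returns t**2 from a pooled-variance
    independent-samples t test on these scores; df = (1, n_a + n_b - 2).
    """
    n_a, n_b = len(diffs_a), len(diffs_b)
    # Pooled sample variance of the difference scores.
    pooled = ((n_a - 1) * variance(diffs_a) + (n_b - 1) * variance(diffs_b)) / (n_a + n_b - 2)
    t = (mean(diffs_a) - mean(diffs_b)) / (pooled * (1 / n_a + 1 / n_b)) ** 0.5
    return t * t


# Hypothetical difference scores (presses/min), three rats per group.
saline_diffs = [2.0, 4.0, 6.0]
cno_diffs = [0.0, 1.0, 2.0]
print(round(interaction_f(saline_diffs, cno_diffs), 6))  # 5.4, with df = (1, 4)
```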

Qualitatively similar findings were obtained when considering initial responding (i.e. during the initial 2 min block) in the devaluation test of experiment 2 (inhibiting the corticothalamic pathway). Both the effects of Devaluation and Drug reached significance (F(1,12) = 7.8, P = 0.0164; F(1,12) = 6.3, P = 0.0271, respectively), as did the Drug X Devaluation interaction (F(1,12) = 10.7, P = 0.0066). In this instance, responding was lower in the CNO-treated group for the still-valued option (F(1,12) = 10.1, P = 0.0079) but not for the devalued one (F<1).

Beyond these additional data, we acknowledge that overall responding was low during the devaluation tests. Levels of performance seem to vary across labs and batches of rats. For example, in a recent study from Balleine and colleagues, as few as 3 (Devalued) or 5 (Non-devalued) presses per minute were observed during an initial 10 min devaluation test (Bradfield and Balleine, JN 2017, Figure 1E; even lower values are reported thereafter) following instrumental training on an RR20 schedule, rather than the RR10 schedule used in the present study (a leaner schedule that should, if anything, produce higher levels of responding). Similarly, the classic MD lesion study by Corbit et al. (EJN 2003) reported 1 (Dev) vs. 8 (NDev) presses/min for the Sham group and 2 (Dev) vs. 2 (NDev) presses/min for the MD group. These comments have been added to the Discussion (fifth paragraph). Please also note that, in the degradation procedure, performance was not affected by CNO treatment during training and was specifically impaired at test in the TC group only, at a time when levels of performance were substantial.
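The RR20 versus RR10 comparison rests on how random-ratio schedules work: under an RR-n schedule each response is reinforced with probability 1/n, so RR20 requires on average twice as many responses per outcome as RR10. The simulation below is a minimal sketch of that property only (the function name and seed are arbitrary choices); it does not model how animals' response rates adapt to the schedule.

```python
import random


def mean_presses_per_reward(ratio: int, n_rewards: int, seed: int = 0) -> float:
    """Simulate an RR-`ratio` schedule; return the mean number of presses per reward.

    Each press is independently reinforced with probability 1 / ratio.
    """
    rng = random.Random(seed)
    p_reward = 1.0 / ratio
    presses = 0
    for _ in range(n_rewards):
        while True:  # keep pressing until one press is reinforced
            presses += 1
            if rng.random() < p_reward:
                break
    return presses / n_rewards


for ratio in (10, 20):
    print(ratio, round(mean_presses_per_reward(ratio, n_rewards=20000), 1))
# expected to print values close to 10 and 20, respectively
```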

Generalization:

Finally, while we cannot exclude possible generalization, we used two clearly distinct manipulanda (a lever and a tilt), which should limit any generalization, as now pointed out more clearly in the Discussion (fifth paragraph). It is also possible that what appears to be generalization actually results from an inability to select the correct option in the absence of the sensory feedback that reward provides to guide behavior.

Reviewer #2:

This study investigates an important topic using techniques that are generally appropriate. The findings tend to support the authors' claim that direct connections between the dmPFC and MD thalamus are important for goal-directed behavior, with dmPFC-->MD and MD-->dmPFC projections underlying sensitivity to outcome value but only MD-->dmPFC underlying action-outcome contingency. Although these findings would be novel, there are a few issues that make data interpretation difficult, and that limit the significance of this project, as it stands.

1) Since no DREADD-free controls are used, it is unclear whether CNO's effects on behavior depend on hM4Di expression. Given recent concerns about the back-conversion of CNO to clozapine, a drug that acts at various endogenous receptors, it is highly recommended that all DREADD studies include such controls. The authors discuss this issue and argue that CNO did not disrupt certain behaviors (e.g., consumption) and had partially dissociable effects in groups in which different pathways were targeted. The fact that outcome devaluation was disrupted by CNO in both groups, however, leaves open the possibility that this aspect of behavior is disrupted by unconditional CNO effects. It therefore seems prudent to add data that speak to this alternative account. Also, although the concentration of CNO is described, it is unclear what dose was used for treatment and whether it was used consistently across training/testing stages. Further discussion of how this dose was selected and how repeated exposure to CNO might be expected to affect the efficacy of this treatment would also be appropriate. Finally, it seems that saline was used as a control treatment rather than the actual DMSO-containing vehicle. Please clarify. This makes it all the more important to assess the unconditional effects of the CNO-DMSO solution.

We have clarified that the CNO dose was 1 mg/kg, as this dose is standard and was previously shown to be sufficient to elicit neuronal inhibition in vitro in a recently published study from our group relying on similar reagents and procedures (Parkes et al., 2017). See in particular the subsection “CNO preparation and injection” in the Materials and methods.

We have produced the requested controls as supplemental material. These show that neither treatment with 1 mg/kg of CNO nor DMSO administration alone altered behavior at any stage of testing.

2) In both experiments the CNO group showed generally low levels of responding during devaluation testing (rather than indiscriminate responding). This is not surprising given the results of lesion studies, but the authors should consider that a floor effect obscured detection of sensitivity to devaluation. Importantly, test data are presented as % baseline performance. Therefore, this is a bigger concern for experiment 1 (MD-dmPFC group), since CNO had a substantial effect on baseline (training) performance and also seems to have further suppressed responding at test to very low levels (indeed, despite this and the relatively small n's, there was a nonsignificant trend towards devaluation (p = 0.15) in this group). I would be interested to know whether similar deficits were observed early in test sessions, when response rates were likely higher. It is important to note that although the authors explain that the CNO group did not differ from the vehicle group at the end of training, the rest of the training data suggest otherwise. There was most likely insufficient power to find significant effects on any given training day (despite the clear trend).

The reviewer is right to point out the similarity between the effects of MD chemogenetic inhibition and those of neurotoxic MD lesions on levels of responding (e.g. Corbit et al., 2003). This observation is now mentioned in the Discussion (fifth paragraph). Please note, however, that the same animals responded at a rate comparable to that of controls during the degradation phase.

Furthermore, we have provided extended supplemental analyses of the devaluation tests above to address this point (see reviewer 1). As a short reminder, when focusing only on the initial 2 min block of the devaluation test, during which responding was maximal (see Author response images 1 and 2), the analyses yielded qualitatively similar effects. For experiment 1 (inhibiting the TC pathway), this analysis revealed an overall effect of Devaluation (F(1,12) = 4.8, P = 0.0499) but not of Drug (F<1), and the critical Drug X Devaluation interaction now reached significance (F(1,12) = 4.7, P = 0.0501). For experiment 2 (inhibiting the CT pathway), the results were largely consistent with those obtained when analyzing the whole test: both the effects of Devaluation and Drug reached significance (F(1,12) = 7.8, P = 0.0164; F(1,12) = 6.3, P = 0.0271, respectively), as did the Drug X Devaluation interaction (F(1,12) = 10.7, P = 0.0066).

Author response image 2
Experiment 2: inhibiting corticothalamic pathway.
https://doi.org/10.7554/eLife.32517.014

Thus, restricted analyses of the beginning of the test confirm the major findings reported in the manuscript. In addition, supplemental analyses of the dynamics of responding during the devaluation test have been added to the Results, as they are sufficient to rule out an effect of chemogenetic treatment on extinction, or a floor effect alone, as the main driver of the impairment.

3) The methods do not clearly state how CNO was delivered during contingency training, e.g., that the same rats get CNO during both rounds of training/testing. More should be done to clarify the CNO treatment during retraining sessions. Also, apparently there was an initial devaluation test in which all rats were tested on vehicle. These data were not presented but would speak to whether chemoinhibition of targeted pathways during encoding only (not at test) was sufficient to disrupt action-outcome based response retrieval. In general, the discussion of the literature indicating stage-limited roles for dmPFC and MD in goal-directed learning but not performance is somewhat vague (e.g., Results, first paragraph and Discussion, fourth paragraph) and may be confusing to the reader, particularly given that the main approach and findings here do not attempt to investigate this further.

Methods have been clarified with respect to CNO administration; see in particular the new Figure 1, which states clearly that CNO was administered before each behavioral session (except magazine training), including retraining.

We apologize that the corresponding paragraph of the Materials and methods regarding CNO administration was misleading. It has been rewritten, as no initial test under vehicle was conducted. In addition, we added to the conclusion a fair mention of the available data regarding stage-limited roles for both the dmPFC and the MD, and we now suggest that the present work could be extended by performing similar interventions at specific stages of the task.

Reviewer #3:

[…] I have several specific comments for the authors to consider.

- I thought a schematic outlining the design, perhaps added as a panel in each figure, would greatly aid understanding of the behavioural design without going into great detail in the body of the manuscript (e.g. when CNO was given: training, testing, retraining, etc.; and that animals are trained on two R-O relationships, R1-O1, R2-O2, etc.).

This has been done (new Figure 1).

- The sample size ends up being rather small after exclusions. While I have a difficult time imagining how adding more animals would change the overall pattern of results, it would allow stronger conclusions and assuage any reservations in readers less familiar with these methods. For example, in Experiment 1, it's awkward that the group x devaluation interaction is not significant. I can live with this, as the simple effects confirm the impairment in the CNO group and not in the saline group, which is very much what the data in Figure 1 suggest. But the authors are then forced to describe the impairment as mild thereafter. To me, this has implications for interpretation: is the impairment weak or is the power weak? While subtle, different conclusions might be reached if power could be ruled out as a contributor to the effect. So I'm not requiring more experiments, and perhaps the authors are attempting to tread lightly considering their statistical support; however, it would be worth giving further consideration to the description of the results.

Inhibiting the MD-to-dmPFC pathway during the devaluation test produced mixed findings, owing to the lack of a significant interaction. See, however, the additional analyses showing clear evidence when focusing only on the beginning of the test (responses to reviewers 1 and 2). In addition, the impairment produced by inhibiting the dmPFC-to-MD pathway on the same occasion is very clear (with a comparable group size), and inhibiting the TC pathway during degradation also produced a clear impairment. With these observations in mind, together with the extended analyses produced above, we are confident that the impairment exhibited by CNO-treated rats in experiment 1 is mild and does not merely reflect weak statistical power.
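The reviewer's question ("is the impairment weak or is the power weak?") can be explored with a simple Monte Carlo power estimate. The sketch below assumes normally distributed scores with unit SD, seven rats per group (an assumption consistent with the F(4,24) values reported for the separate saline and CNO analyses), a two-sided test at alpha = 0.05, and the critical value t(12) of about 2.179 taken from standard t tables. The function name, seed and effect sizes are illustrative, not estimates from the study.

```python
import random
from statistics import mean, variance

T_CRIT_DF12 = 2.179  # two-sided alpha = 0.05 critical t for 12 df (standard tables)


def simulated_power(
    effect_size: float,
    n_per_group: int = 7,
    t_crit: float = T_CRIT_DF12,
    n_sims: int = 4000,
    seed: int = 1,
) -> float:
    """Fraction of simulated experiments in which a two-group t test is significant.

    Scores in both groups are normal with SD 1; group A's mean is shifted by
    effect_size (Cohen's d). t_crit must match df = 2 * n_per_group - 2.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        pooled = (variance(a) + variance(b)) / 2  # pooled variance, equal group sizes
        t = (mean(a) - mean(b)) / (pooled * 2 / n_per_group) ** 0.5
        if abs(t) > t_crit:
            hits += 1
    return hits / n_sims


for d in (0.5, 1.0, 1.5):
    print(d, simulated_power(d))
```

A low estimated power for moderate effect sizes would indicate that a non-significant interaction with groups of this size is weak evidence either way.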

- I felt the histology figures were too small for the reader to make their own assessment of the expression of virus and it wasn't possible to evaluate the extent of overlap of the two vectors. The description of the histology methods and analyses was very brief. I am confident that placements were in the targeted area but there may be more subtle aspects of expression (degree, things like cortical layers, etc.) that readers may like to see themselves. Could the image be larger or a zoom be inset?

We have added a new Figure 2 featuring additional magnifications of mCherry expression at the level of both the dmPFC and the MD. The description of the related histology has been substantially expanded.

- Subsection “CNO preparation and injection”: here there is a peculiar mention of all groups receiving saline before "the first session of tests" and then being split into saline and CNO groups for a second devaluation test. Is this what happened? I thought rats were in consistent treatment groups throughout and only data for a single devaluation test are reported? Please clarify. This isn't entirely trivial, as there's some indication in the literature that the MD may contribute to acquisition but not expression of R-O learning.

As noted by reviewer 2, this resulted from the initial formulation, which was inappropriate and, unfortunately, misleading. The paragraph has been corrected, since there was no such test.

https://doi.org/10.7554/eLife.32517.016

Article and author information

Author details

  1. Fabien Alcaraz

    1. CNRS, INCIA, UMR 5287, Bordeaux, France
    2. Université de Bordeaux, INCIA, UMR 5287, Bordeaux, France
    Present address
    Friedrich Miescher Institute for Biomedical Research, Basel, Switzerland
    Contribution
    Validation, Investigation, Methodology
    Competing interests
    No competing interests declared
  2. Virginie Fresno

    1. CNRS, INCIA, UMR 5287, Bordeaux, France
    2. Université de Bordeaux, INCIA, UMR 5287, Bordeaux, France
    Contribution
    Formal analysis, Validation, Investigation, Visualization
    Competing interests
    No competing interests declared
  3. Alain R Marchand

    1. CNRS, INCIA, UMR 5287, Bordeaux, France
    2. Université de Bordeaux, INCIA, UMR 5287, Bordeaux, France
    Contribution
    Conceptualization, Software, Formal analysis, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Eric J Kremer

    Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France
    Contribution
    Conceptualization, Resources, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
  5. Etienne Coutureau

    1. CNRS, INCIA, UMR 5287, Bordeaux, France
    2. Université de Bordeaux, INCIA, UMR 5287, Bordeaux, France
    Contribution
    Conceptualization, Supervision, Funding acquisition, Writing—review and editing
    Competing interests
    No competing interests declared
  6. Mathieu Wolff

    1. CNRS, INCIA, UMR 5287, Bordeaux, France
    2. Université de Bordeaux, INCIA, UMR 5287, Bordeaux, France
    Contribution
    Conceptualization, Supervision, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Project administration, Writing—review and editing
    For correspondence
    mathieu.wolff@u-bordeaux.fr
    Competing interests
    No competing interests declared
    ORCID 0000-0003-3037-3038

Funding

Agence Nationale de la Recherche (ANR-14-CE13-0014)

  • Etienne Coutureau

Brain and Behavior Research Foundation (NARSAD Independent Investigator Grant #24702)

  • Mathieu Wolff

Labex (Brain LABEX PhD Extension Grant 2015)

  • Fabien Alcaraz

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

We thank Angélique Faugère and Yoan Salafranque for histological assistance and animal care. This work was supported by an Independent Investigator NARSAD grant #27402 to MW and a grant from the French agency for research ANR-14-CE13-0014 to EC. FA was supported by a BRAIN LabEx PhD extension grant. The microscopy was done in the Bordeaux Imaging Center, a service unit of the CNRS-INSERM and Bordeaux University, member of the national infrastructure France BioImaging, with help from Christel Poujol and Sébastien Marais.

Ethics

Animal experimentation: This study was performed in strict accordance with current French (Council directive 2013-118, February 1, 2013) and European (directive 2010-63, September 22, 2010, European Community) laws and policies regarding animal experiments. The experimental protocols received approval #5012053-A from the local Ethics Committee (C2EA-50, Comité d'éthique pour l'Expérimentation Animale Bordeaux) on December 7, 2012.

Reviewing Editor

  1. Geoffrey Schoenbaum, National Institute on Drug Abuse, National Institutes of Health, United States

Publication history

  1. Received: October 5, 2017
  2. Accepted: January 12, 2018
  3. Version of Record published: February 6, 2018 (version 1)
  4. Version of Record updated: February 8, 2018 (version 2)

Copyright

© 2018, Alcaraz et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,983
    Page views
  • 275
    Downloads
  • 3
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
