The ability to flexibly use knowledge is one cardinal feature of goal-directed behaviors. We recently showed that thalamocortical and corticothalamic pathways connecting the medial prefrontal cortex and the mediodorsal thalamus (MD) contribute to adaptive decision-making (Alcaraz et al., 2018). In this study, we examined the impact of disconnecting the MD from its other main cortical target, the orbitofrontal cortex (OFC), in a task assessing outcome devaluation after initial instrumental training and after reversal of action-outcome contingencies. Crossed MD and OFC lesions did not impair instrumental performance. Using the same approach, we found however that disconnecting the OFC from its other main thalamic afferent, the submedius nucleus, produced a specific impairment in adaptive responding following action-outcome reversal. Altogether, this suggests that multiple thalamocortical circuits may act synergistically to achieve behaviorally relevant functions. https://doi.org/10.7554/eLife.46187.001
In dynamic environments, the ability to engage in adaptive behaviors is essential to meet basic needs and desires. This sometimes requires updating the current understanding of rules governing events or actions. A large literature has documented that such goal-directed behaviors rely on highly flexible cognitive processes that appear to be largely supported by prefrontal cortical areas (O'Doherty et al., 2017; Parkes and Coutureau, 2018). However, mounting experimental evidence indicates that functional interactions between prefrontal and subcortical areas are essential to support these functions (Balleine and O'Doherty, 2010; Verschure et al., 2014). In particular, the contribution of several thalamic nuclei is now better understood and acknowledged in current views on the functioning of thalamocortical circuits (Pergola et al., 2018; Rikhye et al., 2018b; Wolff and Vann, 2019).
Recently, the interactions between the medial prefrontal cortex (mPFC) and the mediodorsal thalamus (MD) have been highlighted in multiple behavioral setups and through various causal interventions (Bradfield et al., 2013; Browning et al., 2015; Bolkan et al., 2017; Schmitt et al., 2017; Alcaraz et al., 2018; Marton et al., 2018; Rikhye et al., 2018a). For instance, we recently reported that thalamocortical and corticothalamic pathways connecting the mPFC and the MD support complementary but dissociable aspects of goal-directed behavior (Alcaraz et al., 2018). However, the mPFC is not the only cortical recipient of MD projections as the orbitofrontal cortex (OFC) is also one of its main cortical targets (Groenewegen, 1988; Alcaraz et al., 2016). Interestingly, while the OFC has long been associated with stimulus-outcome predictions (Balleine et al., 2011; McDannald et al., 2014; Stalnaker et al., 2014), its contribution to flexible goal-directed responding has now also been highlighted (Gremel and Costa, 2013; Bradfield et al., 2015; Bradfield et al., 2018; Panayi and Killcross, 2018; Parkes et al., 2018). Beyond MD afferents, the OFC is also the recipient of a massive projection from the submedius thalamus (Sub) (Yoshida et al., 1992; Alcaraz et al., 2015; Kuramoto et al., 2017). While this region is currently poorly understood, it was recently demonstrated to support the updating of stimulus-outcome associations, in a manner that was highly reminiscent of OFC functions (Alcaraz et al., 2015). This suggests that a complex interplay occurs between the OFC and its thalamic afferents when flexible responding is essential.
Thus, in the present study, we aimed to build upon our previous study (Alcaraz et al., 2018) and to disentangle the functional interactions of the OFC with its two main thalamic afferents. To this end, we used an instrumental training paradigm that allowed us to test initial action-outcome encoding as well as its reversal (Parkes et al., 2018). The latter forces rats to engage in flexible responding. We then assessed the impact of disconnecting the OFC either from its MD afferent (Experiment 1) or from its Sub afferent (Experiment 2). We found that disconnecting the OFC from the MD left instrumental performance unaltered throughout all phases of testing. Interestingly, disconnecting the OFC from the Sub produced a marked and selective impairment in goal-directed responding following action-outcome reversal. The current set of results, together with prior evidence (Alcaraz et al., 2018), indicates that distinct thalamocortical circuits may act in parallel as a function of current behavioral demand.
To disconnect the OFC from its two main thalamic afferents, we performed unilateral lesions of either the OFC and the MD, or the OFC and the Sub. Critically, lesions were made in the same hemisphere (IPSI) or contralateral hemispheres (CONTRA). It is expected that in the CONTRA groups, communication between cortical and thalamic areas should be interrupted while communication remains possible in the IPSI groups. All lesion placements were counterbalanced across hemispheres. To include rats in the behavioral analyses, lesions performed at both the cortical and the thalamic levels needed to be accurate. We used the same criteria for IPSI and CONTRA groups, thus ensuring that the overall cortical and thalamic damage was equivalent in these groups.
The cortical lesions were highly similar to our previous work (Alcaraz et al., 2015). In general, the lesions targeted mostly the lateral (LO) and the ventral (VO) parts of the orbitofrontal cortex and in most cases the medial orbital region (MO) was intact. A representative example is shown in Figure 1A and the extent of the lesions given in Figure 1D. Some rats were excluded because cortical lesions were too small or too dorsal, leaving a substantial portion of the LO and VO intact.
NMDA injections within the MD produced variable damage (Figure 1B and E). Included rats had significant damage encompassing the three segments of the MD and, in many instances, the adjacent intralaminar nuclei (mostly the paracentral and the centrolateral nuclei). Some rats, however, had only minimal thalamic damage because the lesions were too dorsal, and they were therefore excluded. As the lesions were unilateral, we generally succeeded in preserving midline thalamic nuclei, and even the paraventricular nucleus was intact in included rats, which is difficult to achieve with bilateral lesions. The habenula, however, was damaged in many cases. Sub lesions (Figure 1C and F) were largely in line with previous work (Alcaraz et al., 2015) and highly specific. The reuniens/rhomboid complex was unaffected in the majority of included rats, while a few displayed moderate damage at this level. Importantly, thalamic damage did not overlap between the MD and the Sub for any of the included rats.
After histological examination by two experimenters blind to behavioral data, a total of 15 rats out of 48 were excluded from behavioral analyses (5 OFC-MD-IPSI, 5 OFC-MD-CONTRA, 3 OFC-Sub-IPSI and 2 OFC-Sub-CONTRA). The final group sizes for the OFC-MD groups were: 7 IPSI and 7 CONTRA rats and for the OFC-Sub groups: 9 IPSI and 10 CONTRA rats.
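The exclusion counts above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers reported in this paragraph; the dictionary names are illustrative, and the initial allocation per group is inferred from the arithmetic rather than stated in the text.

```python
# Rats excluded after histology and final group sizes, as reported in the text.
excluded = {"OFC-MD-IPSI": 5, "OFC-MD-CONTRA": 5, "OFC-Sub-IPSI": 3, "OFC-Sub-CONTRA": 2}
included = {"OFC-MD-IPSI": 7, "OFC-MD-CONTRA": 7, "OFC-Sub-IPSI": 9, "OFC-Sub-CONTRA": 10}

# Initial allocation per group (inferred: included + excluded).
initial = {group: excluded[group] + included[group] for group in excluded}

total_excluded = sum(excluded.values())
total_included = sum(included.values())
total_rats = sum(initial.values())

for group, n in initial.items():
    print(f"{group}: {n} operated, {included[group]} included")
print(f"Total: {total_rats} rats, {total_excluded} excluded, {total_included} analyzed")
```

Under these numbers, the reported totals are internally consistent with the 48 rats described in the methods.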
A schematic of the behavioral paradigm is shown in Figure 2, illustrating the successive phases: instrumental training, outcome devaluation 1, outcome reversal training and outcome devaluation 2. The same paradigm was used in Experiments 1 and 2.
Instrumental conditioning gradually developed over the 10 consecutive training sessions. As evident in Figure 3A, learning appeared to establish at similar rates in IPSI and CONTRA groups even though asymptotic performance appeared somewhat lower in the CONTRA group. The statistical analysis supported these observations as the ANOVA revealed a significant effect of Session (F(9,108) = 69.5, p<0.0001) but the effect of Lesion only approached significance (F(1,12) = 3.5, p=0.0853). Importantly, Lesion and Session did not interact (F(9,108) = 1.0, p=0.4215), suggesting a similar rate of instrumental acquisition in both groups.
Figure 3B shows the test conducted under extinction conditions, immediately after outcome devaluation. During sensory-specific satiety, rats from the IPSI group consumed more food than rats from the CONTRA group (IPSI: 9.9 ± 3.8 g, CONTRA: 5.5 ± 2.6 g, F(1,12) = 6.5, p=0.0254). Nevertheless, during the subsequent choice test, both groups expressed a marked preference for the lever associated with the still valued outcome (F(1,12) = 37.8, p<0.0001). This preference did not vary as a function of lesion status as neither the main effect of Lesion (F(1,12) = 1.9, p=0.1957) nor the Lesion X Devaluation interaction (F(1,12) = 0.0, p=0.9530) approached significance. The post-test consumption assays shown in Figure 3C confirmed that devaluation was effective for both groups and that rats in the IPSI and CONTRA groups consumed similar amounts of both the valued and the devalued outcomes (Devaluation, F(1,12) = 16.5, p=0.0016; Lesion, Lesion X Devaluation, Fs < 1).
After the devaluation test, rats were given five RR20 instrumental training sessions, during which the action-outcome contingencies were reversed. During that phase, all rats maintained a high level of instrumental responding, comparable to that attained at the end of the initial instrumental phase (Figure 3D). Responding slightly increased during these five sessions, in a comparable manner in both groups. The ANOVA revealed a significant effect of Session (F(4,48) = 8.1, p<0.0001) but not of Lesion (F < 1) or Lesion X Session interaction (F(4,48) = 1.1, p=0.3781). Thus, during reversal training, there was no evidence that instrumental responding differed between groups.
During outcome devaluation, all rats consumed similar amounts of food (IPSI: 11.0 ± 2.4 g, CONTRA: 9.2 ± 3.0 g, F(1,12) = 1.6, p=0.2361). During the choice test conducted under extinction conditions (Figure 3E), all rats expressed a marked preference for the action leading to the still valued outcome, showing that they succeeded in updating action-outcome contingencies. These observations were supported by statistical analyses. We found a main effect of Devaluation (F(1,12) = 11.9, p=0.0048) but no effect of Lesion or Lesion X Devaluation interaction (Fs < 1). Similar conclusions arose from the analysis of post-test consumption assays (Figure 3F), confirming the efficacy of the devaluation procedure in both the IPSI and the CONTRA groups (Devaluation, F(1,12) = 86.7, p<0.0001; Devaluation X Lesion, F < 1).
Collectively, these data show that disconnecting the OFC from the MD did not alter instrumental conditioning or the ability to update action-outcome contingencies. As the OFC receives a prominent innervation from another thalamic source, the submedius thalamic nucleus, which was previously suggested to interact with the OFC (Alcaraz et al., 2015), we used a separate cohort of rats to examine the effect of disconnecting the OFC from the Sub in the same instrumental procedure.
As in Experiment 1, instrumental conditioning gradually developed over the 10 consecutive training sessions for both groups (Figure 4A). In this instance, instrumental performance appeared similar for both groups. The ANOVA conducted on these data supports this observation as we found a significant effect of Session (F(9,153) = 74.1, p<0.0001) but not of Lesion or Lesion X Session interaction (Fs < 1).
Figure 4B shows the test conducted under extinction conditions, immediately after outcome devaluation. During sensory-specific outcome devaluation, all rats consumed similar amounts of food (IPSI: 5.8 ± 2.0 g, CONTRA: 7.9 ± 5.2 g, F(1,17) = 1.4, p=0.2528). As observed previously, both groups expressed a marked preference for the lever associated with the still valued outcome (F(1,17) = 29.2, p<0.0001). Again, this preference did not vary as a function of lesion group as neither the main effect of Lesion nor the Lesion X Devaluation interaction was significant (Fs < 1). The post-test consumption assays shown in Figure 4C confirmed that devaluation was effective for both groups and that rats in the IPSI and CONTRA groups consumed similar amounts of both the valued and the devalued outcomes (Devaluation, F(1,17) = 39.5, p<0.0001; Lesion, F(1,17) = 1.8, p=0.1884; Lesion X Devaluation, F(1,17) = 2.2, p=0.1527).
As in Experiment 1, rats then received five RR20 instrumental training sessions, during which the action-outcome contingencies were reversed. During that phase, all rats maintained a high level of instrumental responding (Figure 4D) that did not differ between groups (Lesion, F < 1). Instrumental performance appeared fairly stable across sessions, even if small daily variations likely caused the main effect of Session (F(4,68) = 2.4, p=0.0609) and the Lesion X Session interaction (F(4,68) = 2.3, p=0.0643) to approach significance.
Figure 4E shows the results from the choice test following contingency reversal. During sensory-specific outcome devaluation, all rats consumed similar amounts of food (IPSI: 9.5 ± 3.1 g, CONTRA: 11.3 ± 3.6 g, F(1,17) = 1.3, p=0.2737). Rats from the CONTRA group appeared to respond similarly for both outcomes while the IPSI group expressed a clear preference for the action earning the still valued outcome. Indeed, we found a main effect of Devaluation (F(1,17) = 10.2, p=0.0054) and a marginal effect of Lesion (F(1,17) = 4.0, p=0.0623) but also, critically, a significant Lesion X Devaluation interaction (F(1,17) = 5.6, p=0.0296). Additional analyses confirmed that Devaluation reached significance for the IPSI (F(1,8) = 11.3, p=0.0099) but not the CONTRA group (F < 1), indicating that rats from the CONTRA group were unable to display adaptive responding.
Results from the post-test consumption assay confirmed that the devaluation procedure was effective for both groups (Figure 4F). We found an overall effect of Devaluation (F(1,17) = 163.0, p<0.0001) and also an effect of Lesion (F(1,17) = 8.8, p=0.0085) and a significant Lesion X Devaluation interaction (F(1,17) = 4.6, p=0.0239). Nevertheless, simple effect contrasts conducted on the interaction confirmed that both groups consumed significantly more of the valued outcome than the devalued outcome (Devaluation, IPSI: F(1,8) = 53, p<0.0001; CONTRA: F(1,9) = 117.3, p<0.0001).
In summary, the behavior displayed by rats from the CONTRA group during the final test after reversal was in clear contrast with that exhibited during the prior choice test, suggesting an inability to update action-outcome contingency.
In a recent study, we showed that functional interactions between the mediodorsal thalamus (MD) and the medial prefrontal cortex are essential for goal-directed decision-making (Alcaraz et al., 2018). As the MD also connects with the orbitofrontal cortex (OFC), a region now known to play a role in goal-directed behavior (Parkes et al., 2018), the present study primarily aimed to extend our prior findings by focusing on MD-OFC interactions. We found that disconnecting the OFC from the MD produced no detectable impairment, even after reversal of action-outcome contingencies. This prompted us to examine whether thalamocortical interactions between the OFC and its other main thalamic afferent, the submedius thalamic nucleus (Sub), could support performance in this situation. The results clearly indicated a specific role for the OFC-Sub circuit in guiding choice after action-outcome contingency reversal. Taken as a whole, the data suggest an important role for this thalamocortical circuit in goal-directed behavior and further underscore the functional importance of a still poorly known thalamic region.
The MD is now acknowledged as a major partner of the prefrontal cortex for cognition, from rodent to primate, including humans (Bradfield et al., 2013; Browning et al., 2015; Wolff et al., 2015; Parnaudeau et al., 2018; Pergola et al., 2018; Rikhye et al., 2018b; Wolff and Vann, 2019). However, to date, most experimental studies have examined its functional interactions with the medial prefrontal cortex (Bolkan et al., 2017; Schmitt et al., 2017; Alcaraz et al., 2018; Marton et al., 2018; Rikhye et al., 2018a), thus neglecting the potential importance of its abundant connections with the OFC (Alcaraz et al., 2016; Murphy and Deutch, 2018). The present study does not support the idea that these interactions are important for goal-directed behavior, even in a setup known to engage OFC functions (Parkes et al., 2018). The functional relevance of the OFC-MD connections has been documented in other settings as both MD and OFC neurons were found to code for task-relevant elements in an odor-discrimination task (Courtiol and Wilson, 2016); however, this feature was not affected by inhibition of the corticothalamic pathway (Courtiol et al., 2019). In the primate, disconnecting the MD from the ventrolateral and orbital cortex impairs adaptive decision-making after selective outcome devaluation (Browning et al., 2015). In the rat, by contrast, we observed this effect after disconnecting the MD from the medial prefrontal cortex (Alcaraz et al., 2018) but not from the OFC (present data). This certainly calls for further work to document the functional equivalence of prefrontal territories in rodent and primate (Laubach et al., 2018), especially now that the functional parcellation of the OFC is becoming clearer (Izquierdo, 2017; Panayi and Killcross, 2018). Thus, more work is needed to identify the functions that may be supported by connections between the OFC and the MD. An attractive prospect could be to rely on situations where choice is guided by stimulus-outcome associations, as previously suggested (Ostlund and Balleine, 2008; Wolff et al., 2015; Alcaraz et al., 2016).
Disconnecting the OFC from the Sub did not impair the ability to learn action-outcome associations and to use this information to guide choice. However, when action-outcome associations were reversed, rats with contralateral lesions of the OFC-Sub pathway were unable to integrate this new information to guide subsequent choice between the two actions. This does not necessarily imply that these animals were unable to notice these changes during the reversal phase. If that were the case, they would show a devaluation effect consistent with the original contingencies, which was not observed during that test. Instead, reduced responding was observed for both actions, suggesting that these rats could not accurately segregate multiple action-outcome contingencies. It is also possible that reversal learning was incomplete. Indeed, the current study did not assess whether rats with contralateral lesions would learn the reversed contingencies with additional training, but even if that were the case, it is clear that reversal learning is at least slowed down by OFC-Sub disconnection. Both hypotheses therefore point to a specific deficit in updating contingency information rather than an overall deficit in learning the relationship between actions and their outcomes. This is consistent with a prior study indicating a deficit in the ability to update stimulus-outcome contingencies in Sub lesioned rats (Alcaraz et al., 2015). Overall, this suggests that connections between the OFC and the Sub are critical to distinguish between learning experiences or task states (Wilson et al., 2014). It is also consistent with the emerging view that the cognitive thalamus critically assists the cortex to ensure that mental representations or cognitive maps are still relevant (Wolff and Vann, 2019).
A limitation of the present approach however lies in the lesion disconnection procedure. While targeting projection-defined cortical and thalamic neurons previously enabled us to differentiate the functional contribution of thalamocortical and corticothalamic pathways (Bolkan et al., 2017; Alcaraz et al., 2018), it is not possible with this lesion approach to assign specific functions to OFC-to-Sub and Sub-to-OFC pathways. Moreover, while thalamocortical projections are known to be almost exclusively ipsilateral, returning corticothalamic pathways cross to the other hemisphere (Négyessy et al., 1998). Thus, the latter would not be compromised with the present approach. Keeping this in mind, we cannot formally exclude that targeting more efficiently both thalamocortical and corticothalamic pathways would produce different results when addressing the OFC-MD circuit. By contrast, this further clarifies the involvement of the OFC-Sub circuit, as the remaining contralateral corticothalamic pathways were not sufficient to support the updating of instrumental contingencies.
Together with prior evidence concerning the MD (Alcaraz et al., 2018) and the Sub (Alcaraz et al., 2015), the present set of data clarifies the nature of the functional interactions between cortical and thalamic areas. First, distinct thalamocortical circuits appear to support instrumental performance, depending on whether updating of the associative structure of the task is required: after initial instrumental training, functional interactions between the mPFC and the MD but not the OFC and the Sub are necessary to guide choice based on current goal value, but the exact opposite pattern is found after action-outcome reversal. In other words, there is a striking double dissociation regarding the functional involvement of the mPFC-MD and the OFC-Sub circuits in instrumental performance. This major result fits well with novel conceptions of the functioning of thalamocortical circuits, emphasizing parallel and synergetic information processing to achieve cognitively relevant functions (Rikhye et al., 2018b; Wolff and Vann, 2019). Second, the present data are in agreement with a broader conception of OFC functions but they also highlight the contribution of a poorly understood thalamic region. The Sub indeed appears to be important to update not only stimulus-outcome but also action-outcome associations, consistent with the idea that thalamic nuclei may help to shape mental representations (Wolff and Vann, 2019). Finally, it would be very useful to document the functions that may be supported by OFC-MD connections. As thalamic afferents from both the Sub and the MD terminate within the same OFC loci (Alcaraz et al., 2015; Alcaraz et al., 2016), this would provide an ideal avenue to investigate the functional relevance of convergence in cognition (Man et al., 2013).
In conclusion, the present study complements prior findings to broaden our understanding of the functioning of thalamocortical circuits. Rather than acting as isolated loops assigned to specific functions, they would be better described as supporting each other in the face of changing circumstances. An influential view posits that cortical areas may communicate directly but also indirectly through the thalamus (Sherman, 2016; Sherman, 2018; Usrey and Sherman, 2019). One consequence of this functional organization is that convergent and divergent thalamocortical and corticothalamic pathways may help to recruit the currently relevant circuit and/or to gate the most meaningful inputs (Wolff and Vann, 2019). Thus, ensuring the accuracy of mental representations over time, and over changing circumstances, likely necessitates the cooperation of multiple circuits. As noted previously, this suggests cautious use of causality concepts to interpret data in contemporary neurosciences (Yoshihara and Yoshihara, 2018). Thalamic research has now entered a new era and we can expect in the coming years a more systematic assessment of thalamocortical circuits that should provide important insights to better apprehend mental conditions best described as connectivity disorders such as schizophrenia (Anticevic et al., 2014), obsessive-compulsive behaviors (Greenberg et al., 2010; Monteiro and Feng, 2016) or addiction (Balleine et al., 2015).
48 male Long Evans rats weighing 275 g to 300 g at surgery were obtained from Centre d’Elevage Janvier (France). Rats were initially housed in pairs and accustomed to the laboratory facility for two weeks before the beginning of the experiments. Environmental enrichment was provided by tinted polycarbonate tubing elements, in accordance with current French (Council directive 2013–118, February 1, 2013) and European (directive 2010–63, September 22, 2010, European Community) laws and policies regarding animal experiments. The facility was maintained at 21 ± 1°C with lights on from 7 a.m. to 7 p.m. The experimental protocols received approval #5012053-A from the local Ethics Committee on December 7, 2012. After histological verification (see below), the final group sizes were: IPSI n = 7, CONTRA n = 7 for the OFC-MD disconnection and IPSI n = 9, CONTRA n = 10 for the OFC-Sub disconnection.
Rats were anaesthetized with 4% isoflurane and placed in a stereotaxic frame with atraumatic ear bars (Kopf, Tujunga, CA) in a flat-skull position. Anaesthesia was maintained with 1.5–2% isoflurane and complemented by subcutaneous administration of buprenorphine (Buprecare, 0.05 mg/kg). To disconnect the OFC from its two main thalamic afferents, unilateral neurotoxic lesions were performed in contralateral hemispheres using multiple NMDA micro-injections. 20 µg/µL NMDA (Sigma-Aldrich) in artificial cerebrospinal fluid (CMA Microdialysis AB, Solna, Sweden) was pressure injected (Picospritzer, General Valve Corporation, Fairfield, NJ) into the brain through a glass micropipette (outside diameter approximately 100 µm) and polyethylene tubing. For OFC lesions, three injection sites per side (0.1 µL each) were used: AP +4.2, +3.7 and +3.2 mm from bregma; laterality ±1.2, 2.2 and 3.0 mm; DV −4.4, −4.5 and −5.2 mm from bregma. To perform neurotoxic MD lesions, one injection site (0.15 µL) was used at the following coordinates: AP: −2.7; Lat: ±0.8; DV: −5.0 (from dura). Neurotoxic Sub lesions were made using the same procedure, with one injection site per side (0.06 µL) at the following coordinates: AP: −2.7; Lat: ±0.7; DV: −7.1 (from dura). In all cases, the pipette was left in place for 3 min after injection before slow retraction. For CONTRA groups, cortical and thalamic lesions were performed in different hemispheres, while lesions were made in the same hemisphere for the IPSI groups. Importantly, the extent of cortical and thalamic damage is expected to be similar in both conditions. The lateralization of the lesions was counterbalanced across all groups and conditions (for example, in the OFC-MD-CONTRA group, half the animals had the cortical lesion on the left side and half on the right side). Rats were given at least 10 days of recovery before behavioral testing.
Animals were trained in eight identical conditioning chambers (40 cm wide × 30 cm deep × 35 cm high, Imetronic, France), each located inside a sound and light-attenuating wooden chamber (74 × 46 × 50 cm). Each chamber had a ventilation fan producing a background noise of 55 dB and four LEDs on the ceiling for illumination of the chamber. Each chamber had two opaque panels on the right and left sides, two clear Perspex walls on the back and front sides and a stainless-steel grid floor (rod diameter: 0.5 cm; inter-rod distance: 1.5 cm). In the middle of the left wall, a magazine (6 × 4.5 × 4.5 cm) received either grain or sucrose pellets (45 mg, F0165 and F0023, Bio Serv, NJ, USA) from dispensers located outside the operant chamber. The magazine was equipped with infra-red cells to detect the animal’s visits. Two retractable levers (4 × 1 × 2 cm) could be inserted on the left and right of the magazine. Activation of either the left or the right lever produced the delivery of the associated outcome, as a function of the current procedure (i.e. FR1, RR5, RR10 or RR20, see below). A computer connected to the operant chambers and equipped with POLY software and interface (Imetronic, France) controlled the equipment and recorded the data.
Rats were first habituated to the magazine dispenser through two daily sessions of magazine training for 2 days. A session consisted of the delivery of 30 food rewards, grain or sucrose pellets, distributed randomly throughout a 30 min session. The first session took place in the morning, and the second in the afternoon, with the order of rewards counterbalanced between rats and days. After magazine training, instrumental conditioning began for a total of ten daily sessions, during which rats had to learn the specific, causal association between two responses (left or right lever presses) and the two different outcomes (grain or sucrose pellets). For each daily instrumental session, each lever was presented twice in alternation (e.g., lever 1 → lever 2 → lever 1 → lever 2), each presentation ending when 10 min had elapsed or 20 rewards had been earned. Thus, the session could last up to 40 min and the rats could earn a maximum of 80 rewards. The action-outcome associations and the order of their presentations were counterbalanced between rats and days. For the first two sessions, each action was reinforced (FR1). Then, for the next two sessions, a random ratio schedule of 5 (RR5) was introduced (probability of receiving an outcome given a response = 0.2). Sessions 5 to 7 were performed with an RR10 schedule (probability of receiving an outcome given a response = 0.1) and sessions 8 to 10 with an RR20 schedule (probability of receiving an outcome given a response = 0.05). The last instrumental session with each action was used as a measure of baseline performance for the devaluation test. The entire behavioral procedure is depicted in Figure 2.
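The schedule progression described above can be sketched in a few lines of Python. This is a minimal illustration, assuming that under a random ratio schedule each lever press is reinforced independently with the stated probability; the function and variable names are hypothetical and do not reflect the POLY software that actually controlled the chambers.

```python
import random

# Probability of reinforcement per lever press for each schedule, as given in the text.
SCHEDULES = {"FR1": 1.0, "RR5": 0.2, "RR10": 0.1, "RR20": 0.05}


def schedule_for_session(session):
    """Return the schedule used in initial training session 1 to 10."""
    if not 1 <= session <= 10:
        raise ValueError("initial training comprised sessions 1 to 10")
    if session <= 2:
        return "FR1"
    if session <= 4:
        return "RR5"
    if session <= 7:
        return "RR10"
    return "RR20"


def simulate_presentation(n_presses, schedule, max_rewards=20, rng=None):
    """Simulate one lever presentation and return the number of rewards earned.

    Assumption (illustrative): each press is an independent Bernoulli trial.
    The presentation stops once max_rewards have been earned.
    """
    rng = rng or random.Random()
    p = SCHEDULES[schedule]
    rewards = 0
    for _ in range(n_presses):
        if rng.random() < p:
            rewards += 1
            if rewards == max_rewards:
                break
    return rewards


# Four presentations per session (each lever twice), 20 rewards at most per presentation.
MAX_REWARDS_PER_SESSION = 4 * 20
```

For instance, `simulate_presentation(400, schedule_for_session(9), rng=random.Random(1))` draws one hypothetical RR20 presentation; the time limit of 10 min per presentation is not modeled here.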
The day after the last session of instrumental training, rats were placed in a plastic feeding cage with free access to 15 g of one of the two outcomes for one hour of specific satiety-induced devaluation. Half of the rats in each action-outcome assignment received grain pellets; the remainder received sucrose pellets. Immediately afterwards, rats were put in the operant cages for a 10 min extinction test. During the test, both actions were available but were unrewarded. This ensured that rats were using representations of the action-outcome contingencies and outcome value to guide their behavior. Performance was quantified relative to prior baseline levels, as in our previous study (Alcaraz et al., 2018). After the first test, a second test was conducted in which the identity of the devalued outcome was reversed, so that all rats were tested after devaluation of either outcome.
After the extinction test, rats were returned to the plastic feeding cages. They had free access to 10 g of each outcome for 10 min. Food consumed was then measured for each outcome to verify that the devaluation procedure was effective.
The procedure used was adapted from a previous study (Parkes et al., 2018). Following outcome devaluation testing, the same rats underwent reversal training such that they were required to learn the reversed instrumental contingencies (e.g., the left lever now earned sucrose pellets rather than grain pellets and the right lever now earned grain pellets rather than sucrose pellets). Reversal training sessions were otherwise identical to initial instrumental training. Rats received five reversal sessions in total on an RR20 schedule of reinforcement. Outcome devaluation tests were conducted after reversal training in the same manner as that previously described. Consumption tests were also conducted after each instrumental test, as previously described.
Animals received a lethal dose of sodium pentobarbital and were perfused transcardially with 150 ml of saline followed by 200 ml of 10% formalin. The sections throughout cortical and thalamic regions of interest were collected onto gelatin-coated slides and dried before being stained with thionine. Histological analysis of the lesions was performed under the microscope by two experimenters (VF and MW) blind to lesion conditions.
The data were submitted to ANOVAs using StatView software (SAS Institute Inc). For both experiments, Lesion (IPSI/CONTRA) was the between-subject factor, and Devaluation (Devalued/Valued) and Session (averaged over both actions) were repeated measures when appropriate. The alpha value for rejection of the null hypothesis was 0.05.
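As a reading aid for the F statistics reported in the results, the degrees of freedom of this mixed design (one between-subject factor, one repeated measure) follow directly from the group sizes. The sketch below is a hedged illustration using the standard formulas for a mixed ANOVA; the function name is hypothetical and it is not part of the StatView analysis.

```python
def mixed_anova_df(n_subjects, n_groups, n_levels):
    """Degrees of freedom for a mixed ANOVA with one between-subject factor
    (n_groups levels) and one repeated measure (n_levels levels).

    Returns (numerator df, denominator df) pairs for each effect.
    """
    df_between = n_groups - 1
    df_error_between = n_subjects - n_groups
    df_within = n_levels - 1
    df_interaction = df_between * df_within
    df_error_within = df_error_between * df_within
    return {
        "between": (df_between, df_error_between),
        "within": (df_within, df_error_within),
        "interaction": (df_interaction, df_error_within),
    }


# Experiment 1: 14 rats, 2 lesion groups, 10 acquisition sessions.
exp1_acquisition = mixed_anova_df(14, 2, 10)
# Experiment 2: 19 rats, 2 lesion groups, 5 reversal sessions.
exp2_reversal = mixed_anova_df(19, 2, 5)
# Experiment 2: 19 rats, 2 lesion groups, Devalued versus Valued.
exp2_devaluation = mixed_anova_df(19, 2, 2)
```

With the final group sizes reported above, this gives F(1,12) for Lesion and F(9,108) for Session in Experiment 1 acquisition, and F(4,68) for Session during reversal training in Experiment 2.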
Flexible use of predictive cues beyond the orbitofrontal cortex: role of the submedius thalamic nucleus. Journal of Neuroscience 35:13183–13193. https://doi.org/10.1523/JNEUROSCI.1237-15.2015
Parallel inputs from the mediodorsal thalamus to the prefrontal cortex in the rat. European Journal of Neuroscience 44:1972–1986. https://doi.org/10.1111/ejn.13316
The orbitofrontal cortex, predicted value, and choice. Annals of the New York Academy of Sciences 1239:43–50. https://doi.org/10.1111/j.1749-6632.2011.06270.x
Thalamic projections sustain prefrontal activity during working memory maintenance. Nature Neuroscience 20:987–996. https://doi.org/10.1038/nn.4568
The role of the anterior, mediodorsal, and parafascicular thalamus in instrumental conditioning. Frontiers in Systems Neuroscience 7:51. https://doi.org/10.3389/fnsys.2013.00051
Inferring action-dependent outcome representations depends on anterior but not posterior medial orbitofrontal cortex. Neurobiology of Learning and Memory 155:463–473. https://doi.org/10.1016/j.nlm.2018.09.008
A specific olfactory cortico-thalamic pathway contributing to sampling performance during odor reversal learning. Brain Structure and Function 224:961–971. https://doi.org/10.1007/s00429-018-1807-x
Neural representation of odor-guided behavior in the rat olfactory thalamus. The Journal of Neuroscience 36:5946–5960. https://doi.org/10.1523/JNEUROSCI.0533-16.2016
Invasive circuitry-based neurotherapeutics: stereotactic ablation and deep brain stimulation for OCD. Neuropsychopharmacology 35:317–336. https://doi.org/10.1038/npp.2009.128
Functional heterogeneity within rat orbitofrontal cortex in reward learning and decision making. The Journal of Neuroscience 37:10529–10540. https://doi.org/10.1523/JNEUROSCI.1678-17.2017
Neural convergence and divergence in the mammalian cerebral cortex: from experimental neuroanatomy to functional neuroimaging. Journal of Comparative Neurology 521:4097–4111. https://doi.org/10.1002/cne.23408
Roles of prefrontal cortex and mediodorsal thalamus in task engagement and behavioral flexibility. The Journal of Neuroscience 38:2569–2578. https://doi.org/10.1523/JNEUROSCI.1728-17.2018
Learning theory: a driving force in understanding orbitofrontal function. Neurobiology of Learning and Memory 108:22–27. https://doi.org/10.1016/j.nlm.2013.06.003
Learning from animal models of obsessive-compulsive disorder. Biological Psychiatry 79:7–16. https://doi.org/10.1016/j.biopsych.2015.04.020
Organization of afferents to the orbitofrontal cortex in the rat. Journal of Comparative Neurology 526:1498–1526. https://doi.org/10.1002/cne.24424
Learning, reward, and decision making. Annual Review of Psychology 68:73–100. https://doi.org/10.1146/annurev-psych-010416-044216
Cortical determinants of goal-directed action. In: R. G Morris, A Bornstein, A Shenhav, editors. Goal-Directed Decision Making (First Edition). Elsevier. pp. 179–197. https://doi.org/10.1016/b978-0-12-812098-9.00008-5
The regulatory role of the human mediodorsal thalamus. Trends in Cognitive Sciences 22:1011–1025. https://doi.org/10.1016/j.tics.2018.08.006
Toward an integrative theory of thalamic function. Annual Review of Neuroscience 41:163–183. https://doi.org/10.1146/annurev-neuro-080317-062144
Thalamus plays a central role in ongoing cortical functioning. Nature Neuroscience 19:533–541. https://doi.org/10.1038/nn.4269
My prolonged collaboration with Ray Guillery. European Journal of Neuroscience, 10.1111/ejn.13903, 29520891.
Orbitofrontal neurons infer the value and identity of predicted outcomes. Nature Communications 5:3926. https://doi.org/10.1038/ncomms4926
Corticofugal circuits: communication lines from the cortex to the rest of the brain. Journal of Comparative Neurology 527:640–650. https://doi.org/10.1002/cne.24423
The why, what, where, when and how of goal-directed choice: neuronal and computational principles. Philosophical Transactions of the Royal Society B: Biological Sciences 369:20130483. https://doi.org/10.1098/rstb.2013.0483
Functional heterogeneity of the limbic thalamus: from hippocampal to cortical functions. Neuroscience & Biobehavioral Reviews 54:120–130. https://doi.org/10.1016/j.neubiorev.2014.11.011
The cognitive thalamus as a gateway to mental representations. The Journal of Neuroscience 39:3–14. https://doi.org/10.1523/JNEUROSCI.0479-18.2018
The afferent and efferent connections of the nucleus submedius in the rat. The Journal of Comparative Neurology 324:115–133. https://doi.org/10.1002/cne.903240109
Timothy E Behrens, Senior Editor; University of Oxford, United Kingdom
Geoffrey Schoenbaum, Reviewing Editor; National Institute on Drug Abuse, National Institutes of Health, United States
Geoffrey Schoenbaum, Reviewer; National Institute on Drug Abuse, National Institutes of Health, United States
In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.
Thank you for submitting your article "A thalamocortical circuit for flexible goal-directed action" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Geoffrey Schoenbaum as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Timothy Behrens as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Laura Corbit (Reviewer #3).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
This is an exceptional study that explores the role of OFC-thalamic circuits on the acquisition and use of flexible associative representations, using an instrumental reinforcer devaluation task. The authors show convincingly that a particular circuit plays a critical role in how new information is integrated and segregated with old information. The results provide novel insight into the neural circuits controlling behavior.
The reviewers agreed unanimously that the study was excellent. Although each had suggestions and questions, they also agreed that none were essential to publication of the results. Thus while we hope the reviews are valuable and can be used to improve the paper, none of the requests must be addressed in any particular way or even at all for acceptance.
In the current study, the authors tested the effects of disconnecting the rat OFC from either the MD or submedius thalamic nucleus on changes in instrumental responding after sensory-specific outcome devaluation. They report that neither circuit was necessary for instrumental learning, sensory-specific devaluation, or for the normal, specific changes in instrumental responding caused by devaluation; however, the OFC-Sub circuit was necessary for maintaining this specificity after reversal of the action-outcome associations. That is, while ipsilateral lesioned controls reduced responding on the lever most recently associated with the devalued outcome, rats with OFC-Sub disconnection reduced responding on both levers equally.
Overall I thought this was an exceptional study. The experiments were well designed, the results were analyzed and presented very clearly, and the findings were compelling, providing important new information about the neural networks mediating our ability to properly control behavior. I particularly like the use of the lesion approach, as I think it provides a ground truth that is less subject to uncertainties regarding expression, effects of rebound, and other issues. True there may be compensation and the effects may not reflect direct projections, but knowing the precise wiring is overemphasized in my opinion, and some of the feared "compensation" reflects the recovery of the downstream areas from the sudden loss of carrier signals from the lesioned areas. I think allowing this recovery to occur lessens the risk that the positive results may be "acute off-target effects" to coin a phrase from a paper on the subject. That makes the findings here reliable I think, which I like.
Indeed, I only have one concern, which is in how the authors characterize the effects as reflecting impaired updating of the action-outcome associations. I was not completely clear what was meant by this. But a simple interpretation is that without updating, the rats just maintained the original action-outcome mapping. Yet if this were the case, then they would have shown a devaluation effect, just in the wrong direction, and this is not really what they did. Instead the contra-lesioned rats seemed to reduce responding on both levers. This is as if they mixed up the two learning episodes or failed to update properly perhaps. Unless the authors disagree with me, I think they should clarify what they mean by updating in the discussion. And I think they should highlight this aspect of the results much more. Otherwise they are "dumbing down" the elegance of their behavior and the uniqueness of the finding. It will be lumped in with the now-vast literature on goal-directed behavior and devaluation, which studies only the initial learning. I think in some regards what is being studied here is quite different. Indeed, you might imagine the function that is lost here would be orthogonal to the type of information being represented? In any event, I think the authors are doing their results a disservice by not making this clearer.
This study by Fresno et al. applies a circuit disconnection to investigate whether orbitofrontal cortex (OFC) connections to the thalamus are required for goal-directed action selection, assayed using the reward devaluation paradigm. Disconnecting the OFC and mediodorsal thalamus (MD) with contralateral neurotoxic lesions had no effect on task performance, either before or after action-outcome contingency reversal. Disconnecting the OFC from the submedius nucleus (Sub) spared rats' ability to select actions based on expected reward value after initial action-outcome learning but disrupted their ability to do so following contingency reversal, which is consistent with the behavioral deficits produced by bilateral OFC inactivation, as previously shown by this group.
This is a well conceived and executed study. The scope of this study is relatively narrow but addresses an important question. The design is generally solid, though the specific disconnection approach used has some limitations. The results are clear and support the authors' conclusions. The manuscript is well-written and does a good job of acknowledging limitations of the study. My only reservations are with the overall significance and scope of the study.
The study provides compelling evidence that the OFC and Sub interact to support reward devaluation after contingency reversal, and I tend to agree this is an important aspect of behavioral control. But the authors could do more to justify the significance of the reversal component of the task and elaborate on why they believe it picks up something important that the standard reward devaluation task does not. This should also be addressed when discussing the findings, perhaps in terms of clinical significance. As it stands, the authors seem to take the significance of this task for granted and refer to it as "flexible" goal-directed action. This makes some sense but is also somewhat confusing since goal-directed actions are inherently flexible. It also fails to make clear whether this is a fundamental feature of goal-directed behavior or why it is specifically interesting.
As noted, there are some limitations to the circuit disconnection manipulation, which the authors do a good job of acknowledging. Because OFC projections to thalamus are bilateral, asymmetrical lesions allow for compensation via crossed corticothalamic circuitry. Null effects in the OFC-MD disconnection group and in pre-reversal devaluation performance for the OFC-Sub group are therefore not conclusive. Moreover, this approach does not allow the authors to tease apart the specific contributions of corticothalamic and thalamocortical projections, which the authors' own recent work sets up as an important question. The use of permanent pre-training lesions also limits the scope of the study, since the current findings do not address whether this circuitry specifically contributes to acquisition (contingency learning or remapping) or action selection processes. As the authors note, more advanced techniques like opto- and chemogenetics can overcome these limitations. Importantly, the authors' main finding that crossed OFC and Sub lesions disrupt post-reversal reward devaluation performance still stands, and does indicate that this effect is particularly dependent on the ipsilateral OFC-Sub circuit. Therefore, the comments above speak to the significance and scope of the current study but do not undermine the authors' main conclusions.
This manuscript by Fresno et al. examines the functional role of thalamic inputs from the mediodorsal thalamus and submedius nucleus to the orbitofrontal cortex (OFC) in the acquisition and reversal of response-outcome (R-O) learning. Using a crossed lesion approach, they find that disconnection of MD and OFC leaves response-outcome learning and its reversal intact. However, functional disconnection of the submedius nucleus and OFC leaves initial learning intact; animals show a selective devaluation effect, but the same animals no longer show a selective devaluation effect once the original R-O contingencies have been reversed.
While the lesion approach is not as elegant for targeting specific pathways as recent work by this group with DREADDs, the apparently normal performance of all groups in the initial devaluation tests and specific deficits only after reversal points to a selective deficit and many typical concerns about lesions can be dismissed. The studies are well run and the manuscript is clearly written and so my overall view is that this tells us something new and important about the role of the submedius nucleus in response-outcome learning and contributes to understanding of complex thalamocortical circuits.
I have several comments for the authors to consider.
While the consumption tests provide compelling evidence that the devaluation treatment itself was effective, the consumption in the induction of devaluation (1 hour feeding period) is not reported. It would be good to know that consumption was equivalent in the two groups and whether there was a criterion for exclusion applied. It's difficult to imagine a scenario where consumption was equivalent in test 1 but differed in test 2 but based on the main effect of lesion (panel 4F) it would nonetheless be reassuring to know consumption was equivalent in the two groups.
I thought the authors could have gone deeper into what the deficit in the submedius-OFC disconnection group means. If these animals couldn't update at all, you might expect to see preferential responding for the devalued outcome. The indiscriminate responding suggests perhaps they are beginning to update, although slower than controls. I wondered whether we might see something interesting in individual data; e.g., this might address whether all rats respond similarly for the two outcomes or whether some have reversed effectively while others have not (but the means blur this). There's the possibility that individual differences could correspond to the extent or other variation in the lesions. Future studies (not needed here) could explore whether they're really unable to update at all vs do so but more slowly (e.g. would they reverse with more training?). Generally, the authors interpret their findings as a failure to update action-outcome contingencies which I think their data support. They might rethink the statement (opening sentence of Discussion paragraph three) that the lesions prevented goal-directed actions, since responding could be set on a goal, but based on an outdated R-O.
https://doi.org/10.7554/eLife.46187.009
First, we’d like to thank all referees for their positive and constructive comments on the initial draft of this manuscript. We found all their suggestions very helpful, and they all pointed towards the same general issue with the manuscript: all three reviewers agreed that the major finding of the study required further discussion. We have therefore included the following paragraph in the Discussion to clarify our interpretation of the results; the title was also modified to reflect this clarification.
“Disconnecting the OFC from the Sub did not impair the ability to learn action-outcome associations and to use this information to guide choice. However, when action-outcome associations were reversed, rats with contralateral lesions of the OFC-Sub pathway were unable to integrate this new information to guide subsequent choice between the two actions. This does not necessarily imply that these animals were unable to notice these changes during the reversal phase. If that was the case they would show a devaluation effect consistent with the original contingencies, which was not observed during that test. Instead, reduced responding was observed for both actions, suggesting that these rats could not accurately segregate multiple action-outcome contingencies. It is also possible that reversal learning was incomplete. Indeed, the current study did not assess whether contralateral lesioned rats would learn the reversed contingencies with additional training but even if that was the case, it is clear that reversal learning is at least slowed down by OFC-Sub disconnection. Both hypotheses therefore point to a specific deficit in updating contingency information rather than an overall deficit in learning the relationship between actions and their outcomes. This is consistent with a prior study indicating a deficit in the ability to update stimulus-outcome contingencies in Sub lesioned rats (Alcaraz et al., 2015). Overall, this suggests that connections between the OFC and the Sub are critical to distinguish between learning experiences or task states (Wilson et al., 2014). It is also consistent with the emerging view that the cognitive thalamus critically assists the cortex to ensure that mental representations or cognitive maps are still relevant (Wolff and Vann, 2019).”
[…] While the consumption tests provide compelling evidence that the devaluation treatment itself was effective, the consumption in the induction of devaluation (1 hour feeding period) is not reported. It would be good to know that consumption was equivalent in the two groups and whether there was a criterion for exclusion applied. It's difficult to imagine a scenario where consumption was equivalent in test 1 but differed in test 2 but based on the main effect of lesion (panel 4F) it would nonetheless be reassuring to know consumption was equivalent in the two groups.
Consumption during devaluation and the corresponding statistics have been added to the “Outcome Devaluation” and “Reversal” subsections of Experiments 1 and 2. The only difference found was in Experiment 1 (OFC-MD disconnection) with the initial contingencies, where rats from the IPSI group consumed more food than rats from the CONTRA group. However, the two groups behaved the same during the subsequent choice test. In all other instances, we found no difference between the IPSI and CONTRA groups (both experiments).
Future studies (not needed here) could explore whether they're really unable to update at all vs do so but more slowly (e.g. would they reverse with more training?).
The following has been added to the Discussion section: “It is also possible that reversal learning was incomplete. Indeed, the current study did not assess whether contralateral lesioned rats would learn the reversed contingencies with additional training but even if that was the case, it is clear that reversal learning is at least slowed down by OFC-Sub disconnection.”
Generally, the authors interpret their findings as a failure to update action-outcome contingencies which I think their data support. They might rethink the statement (opening sentence of Discussion paragraph three) that the lesions prevented goal-directed actions, since responding could be set on a goal, but based on an outdated R-O.
This statement has been changed to: “Disconnecting the OFC from the Sub did not impair the ability to learn action-outcome associations and to use this information to guide choice. However, when action-outcome associations were reversed, rats with contralateral lesions of the OFC-Sub pathway were unable to integrate this new information to guide subsequent choice between the two actions.”
https://doi.org/10.7554/eLife.46187.010
- Mathieu Wolff
- Shauna L Parkes
- Etienne Coutureau
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Animal experimentation: Environmental enrichment was provided in accordance with current French (Council directive 2013-118, February 1, 2013) and European (directive 2010-63, September 22, 2010, European Community) laws and policies regarding animal experiments. The experimental protocols received approval #5012053-A from the local Ethics Committee on December 7, 2012.
- Timothy E Behrens, University of Oxford, United Kingdom
- Geoffrey Schoenbaum, National Institute on Drug Abuse, National Institutes of Health, United States
- Laura Corbit
- Received: February 26, 2019
- Accepted: April 16, 2019
- Version of Record published: April 23, 2019 (version 1)
© 2019, Fresno et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.