Mediodorsal thalamus is required for discrete phases of goal-directed behavior in macaques

Abstract

Reward contingencies are dynamic: outcomes that were valued at one point may subsequently lose value. Action selection in the face of dynamic reward associations requires several cognitive processes: registering a change in value of the primary reinforcer, adjusting the value of secondary reinforcers to reflect the new value of the primary reinforcer, and guiding action selection to optimal choices. Flexible responding has been evaluated extensively using reinforcer devaluation tasks. Performance on these tasks relies upon the amygdala, Areas 11 and 13 of orbitofrontal cortex (OFC), and mediodorsal thalamus (MD). Differential contributions of the amygdala and Areas 11 and 13 of OFC to specific sub-processes have been established, but the role of MD in these sub-processes is unknown. Pharmacological inactivation of the macaque MD during specific phases of this task revealed that MD is required for reward valuation and action selection. This profile is unique, differing from both amygdala and subregions of the OFC.

https://doi.org/10.7554/eLife.37325.001

eLife digest

Most of us have experienced feeling full after a main course, only to discover that we somehow still have room for dessert. Eating a particular foodstuff to the point of satiety makes that item temporarily less appealing. This is an example of reward devaluation. We typically respond to this phenomenon by adjusting our behavior. We give up on the main course, for example, and turn our attention instead to dessert. This ability to adjust our actions based on changes in the value of their outcomes is a form of behavioral flexibility.

Several brain regions contribute to behavioral flexibility. These include the amygdala, parts of the orbitofrontal cortex, and the mediodorsal thalamus. Wicker et al. have now explored the role of the mediodorsal thalamus by temporarily inactivating it in monkeys performing a task involving reward devaluation. The monkeys learned to associate one set of objects with peanuts and another with fruit. They were then given unlimited access to either peanuts or fruit. Finally, they were offered a choice between the two sets of objects. Like people who opt for dessert rather than another helping of a main course, the monkeys that had received peanuts chose the objects associated with fruit, and vice versa.

Temporarily inactivating the mediodorsal thalamus prevented this change in behavior. This occurred if the inactivation took place while the monkeys had unlimited access to the reward, or if it took place while they were choosing between the two objects. The mediodorsal thalamus is thus required both to update the value of a reward and to select the best course of action. This is in contrast to the amygdala and the orbitofrontal cortex, which each support only one of these processes.

Impaired behavioral flexibility is a hallmark of neuropsychiatric disorders, including addiction. Understanding the brain networks that support flexible responding may help improve the treatment of such disorders. As therapies that involve electrically stimulating the brain become more common, knowing which regions to avoid will be just as important as identifying new targets.

https://doi.org/10.7554/eLife.37325.002

Introduction

In daily life, reward contingencies are often unstable; an action that once produced a valued outcome may over time become less desirable. The ability to shift responses away from the previously valued action-outcome pair is a hallmark example of behavioral flexibility. Flexible responding (i.e., adapting behavior to reflect new reward contingencies or reward values) has been evaluated extensively through the use of reinforcer devaluation tasks (Málková et al., 1997; Hatfield et al., 1996).

In these tasks, the value of secondary reinforcers or operanda (e.g., objects) and of actions that were once favored decreases following an experimental reduction in the value of the associated primary reinforcer. Processes required for optimal performance in the reinforcer devaluation task include: [1] registering a change in value of the primary reinforcer, [2] integrating the new value of the primary reinforcer with the cognitive representation of the secondary reinforcers/operanda (e.g., objects that signal the reward), and [3] guiding action selection to target the now optimal choice.

In the standard version of this task used in macaques, animals are trained to associate sets of objects with one of two rewards (primary reinforcers; e.g., peanuts or fruit snacks). After the association between objects and particular rewards is established, subjects are presented with a forced choice between objects associated with each of the two foods. The proportion of choices between the objects rewarded with one type of food (e.g., peanuts) versus the other (e.g., fruit snacks) represents the baseline preference. An experimental reduction of reward value by selective satiation (i.e., providing one food to satiety) produces a devaluation effect, i.e., a decrease in the proportion of objects associated with the sated food that are selected.

These processes critically rely on an interactive network including the orbitofrontal cortex (OFC), the amygdala, and the mediodorsal thalamus (MD). The amygdala projects directly to OFC, and indirectly to the OFC via the MD (Timbie and Barbas, 2015). MD is reciprocally interconnected with two critical subregions of the OFC, Areas 11 and 13 (Ray and Price, 1993). Lesions to each of these nodes impair the typical shift away from the objects that predict the sated food (Málková et al., 1997; Izquierdo et al., 2004; Izquierdo and Murray, 2010; Browning et al., 2015; Pickens, 2008; Mitchell et al., 2007). Moreover, crossed lesions of any two nodes of this circuit likewise impair performance. While lesions provide strong evidence that these brain regions are required for task performance, they do not have the temporal specificity needed to dissociate the contributions during discrete phases of task performance. By contrast, focal pharmacological manipulations, which can transiently suppress activity within a brain region, have revealed differential roles for the amygdala and the OFC in particular phases of this task (Wellman et al., 2005; West et al., 2011). While neither the amygdala nor the OFC is needed to register a change in value of the primary reinforcer, both structures play critical and differential roles in the subsequent processes. The amygdala is necessary for adjusting the object representations to reflect the new value of the primary reinforcer, but not necessary for optimal action selection (Wellman et al., 2005); a similar profile has also been observed for Area 13 of the OFC. By contrast, Area 11 is critical only for action selection (Murray et al., 2015).

The MD receives input from the amygdala and is reciprocally connected with both Areas 11 and 13 of the OFC (Timbie and Barbas, 2015), suggesting that this structure may be central to the devaluation circuit. However, the role of the MD in specific phases of devaluation is unknown. Here, we considered two competing hypotheses regarding the role of the MD in this network: [1] it mirrors the function of the amygdala and Area 13 of the OFC, serving primarily as a ‘relay’ to process information regarding value updating between the amygdala, Area 11 and Area 13, and [2] it mirrors the function of Area 11 of the OFC, contributing primarily to action selection. To distinguish between these hypotheses, we tested four adult male rhesus macaques on a reinforcer devaluation task (Málková et al., 1997) while transiently inactivating the MD during various stages of the task.

Results

Animals were first trained on a set of forty concurrent object discriminations; through repeated trials the animals learned the association between each object and a particular reward (see Materials and methods). Next, they were tested for preference on a baseline object probe test (baseline probe). In this test, two objects, each baited with one of the two foods, were pitted against each other. On another day, animals underwent selective satiation (pre-feeding with one of the two foods) and were again tested on an object probe test (sated probe), administered in a similar way as the baseline probe. This weekly sequence (baseline probe followed by sated probe) was repeated such that each food was sated for each experimental condition (see Figure 1 for experimental timeline).

Figure 1

Weekly schedule of testing sessions.

Days 1–7 represent a sequence of daily behavioral training. Testing order was pseudorandomized for each animal on the infusion probe sessions conducted on Day 4.

https://doi.org/10.7554/eLife.37325.003

In the above probe tests, the animal selected between two objects, each baited with one of the two food rewards. Thus, the objects were the cue used to guide action. We also tested animals in a ‘consummatory’ probe, in which they chose between two competing food rewards in the absence of objects; this served as a control to ensure successful devaluation of the primary reinforcer.

A shift in choices between the baseline and sated probe tests is reflected by a positive proportion shifted (see Materials and methods) and indicates successful devaluation. A value of 1 indicates a complete shift away from objects predicting the sated food, whereas a value of 0 indicates no shift in preference. Thus, decreases in proportion shifted after experimental manipulations indicate impaired reinforcer devaluation.
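To make the endpoints of this score concrete, the sketch below computes a proportion-shifted value from choice counts. The exact formula is defined in Materials and methods and is not reproduced in this section, so the expression used here is an assumption, chosen only because it yields 1 for a complete shift and 0 for no shift as described above. The function name, argument names, and trial counts are all hypothetical.

```python
def proportion_shifted(f1_base, f2_base, f1_sated, f2_sated):
    """Assumed formulation of the proportion-shifted score.

    f1_base, f2_base : objects associated with food 1 / food 2 chosen
                       in the baseline probes (hypothetical counts)
    f1_sated         : food-1 objects chosen after food 1 was sated
    f2_sated         : food-2 objects chosen after food 2 was sated
    """
    total_baseline = f1_base + f2_base
    if total_baseline == 0:
        raise ValueError("baseline choice counts must not both be zero")
    # Sum of the drops in choices of each food's objects after that
    # food is sated, normalized by the total baseline choices.
    shift = (f1_base - f1_sated) + (f2_base - f2_sated)
    return shift / total_baseline


# Hypothetical example counts, not data from the study.
print(proportion_shifted(16, 14, 0, 0))    # complete shift -> 1.0
print(proportion_shifted(16, 14, 16, 14))  # no shift -> 0.0
print(proportion_shifted(16, 14, 8, 7))    # partial shift -> 0.5
```

Under this assumed definition, an experimental manipulation that impairs devaluation would move the score toward 0.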

On separate testing weeks, we microinjected the GABA-A receptor agonist muscimol (MUS, 9 nmol), the glutamate receptor antagonist kynurenic acid (KYNA, 450 nmol), or saline into the MD, either prior to selective satiation or prior to the sated probe test (Figure 2A1–A4). MRI-guided stereotaxic targeting of the MD was performed and confirmed (Figure 2B) as we have described for other regions (Wellman et al., 2005; West et al., 2011). We microinjected drug or vehicle at one of these two points during the task (see Figure 2A) to dissociate potential impairments in value updating (infusion before satiation; Figure 2A1 and A3) from impairments in action selection (infusion prior to probe; Figure 2A2 and A4).

Figure 2 (with 2 supplements)

(A) Schematic, indicating the timing of drug infusions and tests.

Muscimol (MUS, blue) was infused either before satiation (A1), and was thus present during both satiation and the probe test, or 30 min before the probe test (A2), and was thus present only during the probe test. The 30 min interval between infusion and test in (A2) was selected to match the interval between infusion and probe in (A1). Kynurenic acid (KYNA, red) was infused either before satiation (A3) or before the probe test (A4). The 15 min interval between satiation and probe test in (A3) was selected to allow for clearance of KYNA prior to the probe test. (B) Intended infusion sites (top) with representative MRIs showing gadolinium contrast after infusion into the MD of two subjects (bottom). (C–J) Histograms indicate means + SEM with individual subject data points overlaid. *=significant difference from control, p<0.05; ^=significantly greater than chance, p<0.05. Full statistical results are presented in Supplementary file 1c.

https://doi.org/10.7554/eLife.37325.004

Under baseline conditions, animals displayed a slight, but significant, preference for one type of reward (Figure 2F). This was evident in their choosing a larger proportion of objects predicting that reward (baseline preference ratio). This pattern is similar to what has been previously reported (Mitchell et al., 2007; West et al., 2011). Under the saline-infused condition, animals displayed robust devaluation following satiation; the devaluation effect was demonstrated by a shift away from choices of objects associated with the devalued food.

Microinjection of MUS either before satiation or before the sated probe test significantly disrupted the devaluation effect compared to sham/saline infusion; this was manifest as a decrease in the proportion shifted (Figure 2C). This disruption was evident in all four subjects, resulting in a significant main effect of treatment (F(1.1, 3.4) = 13.3; p=0.029). Pairwise comparisons revealed a significant impairment in performance when MUS was infused either before satiation (p=0.032) or before the sated probe test (p=0.035). The magnitude of impairment did not differ between these conditions (p=0.13). In contrast to the object probe sessions, MUS injection failed to alter choices in the consummatory probe (Figure 2D; F(1.3, 4.0) = 1.25; p=0.35). Because animals still displayed a typical shift away from sated primary reinforcers in the consummatory probe, the deficits seen in Figure 2C cannot be explained by impairment in satiety or valuation of primary reinforcers. Thus, the disruption in the devaluation effect was specific to adjusting the value of the objects to reflect the new value of the primary reinforcer (i.e., the sated food).

In prior studies (Izquierdo and Murray, 2010; Mitchell et al., 2007; Wellman et al., 2005; West et al., 2011), it has been reported that after displacing objects and revealing a devalued food reward, monkeys will avoid consuming the devalued food. While we did not systematically record the consumption of food during the object probe tests, anecdotally, there were trials in which the animals did not eat the devalued food after displacing the object.

To determine if deficits in devaluation were secondary to impaired object recognition, we tested animals (on a separate test day) on the concurrent visual discrimination task following drug infusion. MUS injection was associated with a small, but significant, impairment in concurrent visual discrimination (Figure 2E; t = 4.09, df = 3, p=0.026). It is possible that this finding is due to a drug-related deficit in object recognition, reward associations, and/or appropriate action selection. Interestingly, this deficit also likely contributed to the significant decrease in baseline preference ratio (see Materials and methods) observed after MUS injection (Figure 2F; t = 4.28, df = 3, p=0.023). Indeed, unlike the baseline (non-sated) condition, in which animals typically chose significantly more objects associated with one type of reward over the other, the presence of MUS abolished this baseline preference (Figure 2F).

MUS injection produces long-lasting inhibition, with effects evident for hours after drug injection (Dybdal et al., 2013). Thus, injection before satiation likely results in a suppression of the activity within MD both during satiation and the probe test. Because injections both before and after satiation disrupted devaluation, these data alone were unable to clarify what, if any, role the MD played during the period of selective satiation. To determine if the MD is required specifically during selective satiation, we next turned to microinjection of KYNA, which has a short duration of action (~30 min) (Forcelli et al., 2014) and short half-life within the brain (8–30 min) (Vécsei and Beal, 1990; Turski and Schwarcz, 1988). Based on the timing of our experiments (the probe session was conducted 45 min after drug infusion, ~3 half-lives), less than 10% of KYNA is expected to remain during the probe test. Thus, injection before satiation is expected to only disrupt activity during the period of satiation and not during the sated probe test. We confirmed, in two monkeys, that lower doses of KYNA infused immediately prior to the sated probe (Figure 2—figure supplement 1) did not impact performance. Thus, the trace amounts of KYNA remaining when infused prior to satiation are unlikely to be sufficient to impact behavior during the probe test.
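The clearance reasoning above can be illustrated with a short calculation. This is only a sketch: it assumes simple first-order (exponential) elimination, which the text does not state explicitly, and the function name and the particular half-life values sampled from the cited 8–30 min range are chosen here for illustration.

```python
def fraction_remaining(elapsed_min, half_life_min):
    """Fraction of drug remaining after elapsed_min minutes,
    assuming first-order elimination with the given half-life."""
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (elapsed_min / half_life_min)


# 45 min separates KYNA infusion from the probe session.
# Half-lives below are illustrative values within the cited range.
for half_life in (8, 15, 30):
    remaining = fraction_remaining(45, half_life)
    print(f"half-life {half_life} min: {remaining:.1%} remaining")
```

Under this model, exactly three half-lives (a 15 min half-life) leaves 12.5% of the drug, and the remaining fraction drops below 10% when the half-life is shorter than roughly 13.5 min. The estimate therefore depends on where in the reported range the effective half-life falls.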

KYNA infusion either during satiation, or before the sated probe test, impaired the devaluation effect (Figure 2G). This pattern was evident in all three subjects, resulting in a significant main effect (F(1.0, 2.1) = 52.3; p=0.017) with both test conditions differing significantly from saline-infused control sessions (infusion before satiation: p=0.021; infusion before sated probe: p=0.021). Moreover, the impairment was of greater magnitude when KYNA was infused before the sated probe test (p=0.043).

Similar to MUS infusion, KYNA infusion did not disrupt performance in the consummatory task (Figure 2H; F(1.1, 2.1) = 1.11; p=0.40). In contrast to MUS infusion, KYNA infusion spared performance in concurrent visual discrimination (Figure 2I; identical values for all animals under all conditions precluded inferential statistical analysis across treatments). Similarly, KYNA infusion was without effect on baseline preference ratio (Figure 2J; t = 0.277, df = 2, p=0.81). In conclusion, impaired devaluation, in the absence of other deficits, indicates that the MD is required both for adjusting the value of object representations and for appropriate action selection.

Discussion

Our present findings delineate the role of the MD in each of the cognitive processes needed for reinforcer devaluation. We conclude that: [1] Because the expected shift in primary reinforcer preference seen after satiation occurred under all experimental conditions, the MD is not necessary for registering a change in primary reinforcer value. This is consistent with prior studies in which the MD was lesioned. [2] Because inactivation of the MD during satiation impaired reinforcer devaluation, activity in the MD is necessary for adjusting the value of the objects to reflect the new value of primary reinforcers. This deficit is similar to that seen after inactivation of the amygdala or Area 13 of the OFC. [3] Because inactivation of the MD during the probe session (i.e., after satiation was completed) also impaired reinforcer devaluation, activity in the MD is required for appropriate action selection, a role also attributed to Area 11 of the OFC. Thus, we have demonstrated that the MD is required for both reward valuation and action selection; this represents a unique profile within the circuit supporting this behavior, differing from both the amygdala and subregions of the orbitofrontal cortex.

We found a dissociation between the effects of MUS and KYNA with respect to performance on concurrent visual discrimination. While in the case of MUS, at least a portion of the observed deficit in reinforcer devaluation may be due to impaired object discrimination, this is not so for KYNA. After KYNA infusion, concurrent visual discrimination performance was left intact, but deficits in reinforcer devaluation were still evident. In prior studies, large lesions to the MD resulted in deficits in visual recognition/concurrent visual discrimination (Gaffan and Parker, 2000; Aggleton and Mishkin, 1983) whereas lesions that damaged only the magnocellular portion of the MD (the region we targeted with our drug infusions), did not (Mitchell et al., 2007). Because of the differences in duration of action and timing of experimental manipulations, MUS likely spread to a larger area of the MD than did KYNA. While gadolinium is a reasonable proxy of drug spread, and prior functional data from our labs (Dybdal et al., 2013; Malkova et al., 2015) and others help to estimate the volume of drug spread (Martin and Ghez, 1999; Martin, 1991; Allen et al., 2008), it is indeed a technical limitation of temporary pharmacological inactivation that we cannot directly document the spread of drug for each individual infusion. While speculative, broader inactivation of the MD with MUS may have caused the deficit in concurrent visual discrimination. The degree to which inactivation of the MD would impair performance in other tasks where lesions have produced deficits (Browning et al., 2015; Chakraborty et al., 2016) remains to be explored.

Inactivation of the MD during the probe test (KYNA infusion after satiation) resulted in a larger deficit than inactivation during satiation; we consider at least two explanations. [1] Amygdala neurons (necessary during satiation) project directly to the OFC (Timbie and Barbas, 2015; Timbie and Barbas, 2014), and indirectly to the OFC via the MD (Timbie and Barbas, 2015; Ray and Price, 1993). Thus, even in the absence of MD function, amygdalofrontal projections may partially support updating of the object values. [2] Because Areas 11 and 13 of the OFC support different components of this task, cross-talk between these regions may be critical. Both of these regions are reciprocally interconnected with the MD (Ray and Price, 1993); thus, the MD may modulate and facilitate transmission between these cortical regions (see Figure 2—figure supplement 2 for a comparison across studies). Consistent with this notion, crossed lesions to the OFC and the MD impaired devaluation (Browning et al., 2015). Moreover, recent studies in rodents have shown that optogenetic activation of the MD enhances cortico-cortical communication and improves performance on prefrontal cortex-dependent tasks, whereas optogenetic silencing of the MD increases errors during prefrontal-dependent task performance (Schmitt et al., 2017; Bolkan et al., 2017).

Our present findings, which provide novel temporal information regarding the role of MD in reinforcer devaluation, are in general agreement with prior studies. Lesions to the MD produced similar impairments to those that we observed. While the effect magnitude has varied slightly from study to study, the general pattern has been consistent. Given the similarity in the findings between lesion and pharmacological inactivation methodologies, neither compensatory circuit alterations after lesions, nor off-target effects after drug infusion are likely to account for the deficits observed. Thus, together, these data and those previously published underscore a critical role for processing within the MD in behavioral flexibility.

The present data delineate the role of the MD in each of the processes integral to reinforcer devaluation: [1] The MD is not necessary for registering a change in primary reinforcer value. [2] Activity in the MD is necessary for adjusting the value of the objects to reflect the new value of primary reinforcers. [3] The MD is required for appropriate action selection. Together, this pattern of deficits seen with inactivation of the MD is unique; unlike the amygdala, or individual subregions of OFC, the MD is required for multiple components of task performance. Rather than serving as a parallel pathway for information transfer between the amygdala and the OFC, these data instead suggest that the MD functions as a privileged and critical interface between other nodes of the devaluation circuitry.

Materials and methods

Animals

Four adult male rhesus macaques (Macaca mulatta) were subjects in the present study. At the start of the first reinforcer devaluation study they weighed 8.2–9.8 kg. They were housed with visual access to conspecifics in standard home cages (61 × 74 × 76 cm each).

Water was available ad libitum in the home cage. Meals (LabDiet #5049) were provided twice daily and supplemented with fresh fruit. The first meal was always given after behavioral testing occurred. The study was conducted under a protocol approved by the Georgetown University Animal Care and Use Committee (#2016–1115) and in accordance with the Guide for Care and Use of Laboratory Animals (Committee for the Update of the Guide for the Care and Use of Laboratory Animals, Institute for Laboratory Animal Research, Division on Earth and Life Studies, National Research Council, 2010).

These animals were previously tested on a within-session concurrent discrimination learning task (unpublished) and a previous study using the current reinforcer devaluation task with systemic drug administration (Waguespack et al., 2018).

Author details

Evan Wicker

Janita Turchi

Ludise Malkova

Patrick A Forcelli
