Neural activity ramps in frontal cortex signal extended motivation during learning

eLife assessment

This important manuscript provides compelling experimental evidence of extended motivational signals encoded in the mouse anterior cingulate cortex (ACC) that are implemented by orbitofrontal cortex (OFC)-to-ACC signaling during learning. The experimental methods used were state-of-the-art. These results will be of interest to those interested in cortical function, learning, and/or motivation.

https://doi.org/10.7554/eLife.93983.3.sa0

Significance of the findings:

Important: Findings that have theoretical or practical implications beyond a single subfield

Strength of evidence:

Compelling: Evidence that features methods, data and analyses more rigorous than the current state-of-the-art

During the peer-review process the editor and reviewers write an eLife Assessment that summarises the significance of the findings reported in the article (on a scale ranging from landmark to useful) and the strength of the evidence (on a scale ranging from exceptional to inadequate).

Abstract

Learning requires the ability to link actions to outcomes. How motivation facilitates learning is not well understood. We designed a behavioral task in which mice self-initiate trials to learn cue-reward contingencies and found that the anterior cingulate region of the prefrontal cortex (ACC) contains motivation-related signals to maximize rewards. In particular, we found that ACC neural activity was consistently tied to trial initiations where mice seek to leave unrewarded cues to reach reward-associated cues. Notably, this neural signal persisted over consecutive unrewarded cues until reward-associated cues were reached, and was required for learning. To determine how ACC inherits this motivational signal we performed projection-specific photometry recordings from several inputs to ACC during learning. In doing so, we identified a ramp in bulk neural activity in orbitofrontal cortex (OFC)-to-ACC projections as mice received unrewarded cues, which continued ramping across consecutive unrewarded cues, and finally peaked upon reaching a reward-associated cue, thus maintaining an extended motivational state. Cellular resolution imaging of OFC confirmed these neural correlates of motivation, and further delineated separate ensembles of neurons that sequentially tiled the ramp. Together, these results identify a mechanism by which OFC maps out task structure to convey an extended motivational state to ACC to facilitate goal-directed learning.

eLife digest

Achieving goals takes motivation. An individual may have to complete a task many times for a future reward. For example, an animal may have to forage repeatedly to find food, or a person may have to study to get a good grade on a test. How these complex behaviors are encoded in the brain’s wiring is not fully understood.

Patients with injuries to the frontal cortex of the brain display a lack of motivation to pursue goals. This discovery suggests the frontal cortex plays a vital role in motivation and goal-directed behavior. Animal studies show that part of the frontal cortex, the anterior cingulate cortex (ACC), helps animals stay motivated and put extra effort into achieving goals. Yet, scientists wonder how particular actions are associated with specific goals and suspect the orbitofrontal cortex (OFC) contains the blueprint to support this association.

Regalado et al. show that the OFC and ACC work together during goal-seeking behavior in mice. In the experiments, mice learned to complete a task to achieve a sugar water reward. As the mice were learning, Regalado et al. recorded activity in the ACC and found that the ACC is active during goal-seeking behavior. They also discovered that the activity of neurons in the OFC increased the longer mice went without receiving a reward, up until the reward was achieved, signaling a motivational state. Animals not motivated enough to maximize their rewards did not show increased OFC activity. The experiments also showed that the motivational signals in the OFC were conveyed to the ACC to support goal-directed learning, especially linking actions to positive future outcomes.

The experiments help explain how an increase in neuronal activity in the OFC helps to increase motivation and goal-seeking behavior supported by the ACC. More studies will help scientists learn more about these processes and develop drugs or other therapies that can help people who have learning difficulties or struggle with motivation because of an injury or mental illness.

Introduction

Animals must sustain an extended motivational state to achieve goal-directed learning. Imagine being hungry in the middle of a busy metropolis with no cellphone battery and no way of searching for the nearest restaurant. The feeling of hunger provides motivation to search for restaurant signs, scan menus, and contemplate what type of food to eat. If it is dinnertime and many restaurants are full, this motivational state (hunger) may persist for hours until a restaurant is selected. This ability to carry out novel actions based on desired goals is commonly referred to as goal-directed learning. This learning is of a more deliberate, informed nature than habitual learning, as it is sensitive to the current value of outcomes and can lead to a novel sequence of actions for a desired outcome (Balleine and Dickinson, 1998; Tolman, 1948; Pezzulo et al., 2014).

Goal-directed learning often requires the ability to maintain an extended motivational state even in the midst of distracting and competing external variables (Miller and Cohen, 2001; Shenhav et al., 2013). This function has long been proposed to be carried out by the prefrontal cortex (PFC), as patients with PFC lesions struggle to perform tasks that require maintaining a motivational and goal-directed state in the midst of competing sensory information, such as the Stroop task or the Wisconsin Card Sorting Task (Stroop, 1935; Yuan and Raz, 2014; D’Esposito and Postle, 2015; Milner, 1963; Pardo et al., 1990; Shallice and Burgess, 1991). In particular, the anterior cingulate cortex (ACC) has been implicated in action selection over long timescales that is influenced by a variety of motivational factors, such as the value and effort required for each outcome (Shenhav et al., 2013; Hauber and Sommer, 2009; Hillman and Bilkey, 2010; Wallis and Kennerley, 2011; Cowen et al., 2012; Shenhav et al., 2016). For instance, when animals are given two options, one in which high effort leads to high rewards and one in which low effort leads to low rewards, they learn to exploit the high-effort, high-reward option (Walton et al., 2003; Schweimer et al., 2005). Impairments to the ACC result in animals failing to accurately allocate motivation toward strategies that maximize reward (Amiez et al., 2006; Kennerley et al., 2006). Single-unit recordings from ACC have shown that neurons encode choices that require effort with a higher payoff, giving support for the hypothesis that this region is important for action-outcome associations and allocating resources for learning and for the maximization of reward over long timescales (Hillman and Bilkey, 2010; Monosov et al., 2020; Holroyd and Yeung, 2012; Hillman and Bilkey, 2012). While the precise functions of ACC are still debated, its role in goal-directed learning is widely accepted (Shenhav et al., 2013; Holroyd and Yeung, 2012; Botvinick et al., 2001; Heilbronner and Hayden, 2016; Rushworth et al., 2012).

To provide deeper mechanistic insight into how ACC encodes an extended motivational state to facilitate goal-directed learning, we sought to track how animals learn to adjust their behavior over days-long timescales to maximize reward when cue-reward contingencies change. We designed a task in which mice self-initiate trials and learn to associate cues with reward. Through neural activity recordings during behavior, we found that ACC neural activity was consistently tied to trial initiations where mice seek to leave unrewarded cues to reach a rewarded cue. Subsequently, by recording neural activity from inputs to ACC we identified a ramp in bulk activity in orbitofrontal cortex (OFC)-to-ACC projections as mice continuously exited unrewarded cues, peaking when they finally reached a rewarded cue, thus tracking an extended motivational state. Finally, cellular resolution imaging of OFC-to-ACC neurons identified populations of neurons that sequentially tile the observed bulk neural activity ramp across unrewarded cue presentations. In particular, neurons that preferentially encoded reward cues before learning began to code for unrewarded cues after learning, including the motivation to exit these rooms to reach more reward-associated cues. Taken together, we identified a mechanism by which OFC neural activity ramps map out task structure and convey an extended motivational state to ACC to enable goal-directed learning.

Results

ACC contains neural correlates of motivation during learning

We began by designing a learning task in which mice self-initiate trials and, upon brief cue exposure (an olfactory and auditory cue), learn to stop to collect a water reward (Figure 1A and B, Figure 1—figure supplement 1A, Materials and methods). We implemented this task in a head-fixed setting to enable hundreds of trials per session, and millisecond precision in tracking stimulus delivery and behavioral responses (Figure 1A). We used ‘time to initiate trials’ as the primary measure of motivation, and ‘total reward obtained’ as the primary measure of learning. Due to the self-paced nature of the task (Figure 1B and C), we found variation between our mice in how quickly they initiated trials and how many rewards they received per minute (Figure 1C). As expected, the faster mice initiated trials, the more rewards they obtained per minute, yielding a strong correlation between motivation and learning (Figure 1D).
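
The relationship between time to initiate trials and rewards per minute is summarized in Figure 1D by a linear regression. A minimal sketch of that kind of fit is given below in plain Python. The function name `linear_fit_r2` and the per-mouse values are hypothetical and are not data from the study; the sketch only illustrates how a best-fit line and r² are obtained from paired per-mouse measurements.

```python
from statistics import mean


def linear_fit_r2(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2), where r2 is the squared Pearson
    correlation between x and y.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy ** 2) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical per-mouse values (illustration only, NOT study data).
time_to_initiate_s = [2.0, 3.0, 4.0, 5.0, 6.0]
rewards_per_min = [5.0, 4.1, 3.2, 1.9, 1.0]

slope, intercept, r2 = linear_fit_r2(time_to_initiate_s, rewards_per_min)
print(f"slope={slope:.2f} rewards/min per s, r2={r2:.3f}")
```

A negative slope with r² close to 1 corresponds to the pattern described in the text: mice that take longer to initiate trials collect fewer rewards per minute.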

Figure 1 with 1 supplement

Neural activity in anterior cingulate cortex (ACC) signals a motivational state to obtain reward.

(A) Schematic of virtual reality experimental setup and trial structure. A mouse initiates a trial by running to trigger the onset of cues (olfactory and auditory). After cue onset, a mouse stops to collect a water reward, which ends the trial (see Materials and methods). (B) Representative traces of speed and licks from one mouse during a session, with shaded portions corresponding to when cues are on. Red arrows correspond to periods when mice are running to trigger cue onset or stopping to trigger water delivery. Black arrows correspond to sections of a session where we can quantify time to initiate trials, initiation speed, cue stops, and rewards. (C) Quantification per mouse of time to initiate a trial (far left; seconds), initiation speed (left; cm/s), % trials in which a stop occurred during cue presentation (right), and rewards received per minute (far right). Individual data points shown (N=12 mice). (D) Scatter plots of the mean time (s) to initiate a trial plotted alongside rewards received per minute per mouse (N=12 mice). Individual data points shown, with a best fit line, represented by the solid line in the figure. r²=0.8675 and p<0.0001 are determined by linear regression. (E) Left: Bulk neural activity recording experimental design. GCaMP6f was injected into the ACC and neural activity was recorded on a fiber photometry setup (see Materials and methods). Right: Brain histology from a representative mouse showing DAPI in blue, GCaMP6f in green, and photometry cannula implantation in ACC (dotted white lines). Scale bar: 1 mm. (F) Top: Trial averaged plots of ACC activity (z-scored dF/F) and speed (cm/s) aligned to reward onset. Data are mean (solid line) ± s.e.m. (shaded area). Bottom: Relative frequency plots of the time (s) for ACC dF/F or speed to rise above 1 std or 1 cm/s during rewards, respectively (N=105 trials across 12 mice). *p<0.05, paired t-test between time to rise (s) between ACC and speed. Data is the frequency of values across time. (G) Same as F, but for trial initiations (N=510 trials across 12 mice). (H) Injection strategy for DREADDs-based chemogenetic inhibition of ACC during self-paced task. Coronal section from an animal virally injected with AAV1-CaMKII-hM4D(Gi) in ACC. DAPI is shown in blue and hM4D(Gi) in red. Scale bar: 1 mm. (I) Representative traces of speed and licks from one mouse during the task on a day with saline (top) or clozapine N-oxide (CNO) (bottom) administration 45 min prior to a session, with shaded portions corresponding to when cues are presented. (J) Left: Quantification of time (s) to initiate trial across saline and CNO sessions in mCherry-control mice (N=188 trials across 6 mice) and hM4D(Gi)-DREADDs mice (N=215 trials across 4 mice). Right: Same as left but for rewards received per minute in mCherry-control mice (N=60 min across 6 mice) and hM4D(Gi)-DREADDs mice (N=40 min across 4 mice). p=0.8707 for mCherry and *p<0.05 for hM4Di (time to initiate), p=0.2073 for mCherry and *p<0.05 for hM4Di (rewards per min), unpaired t-test between saline and CNO sessions per group. Data are mean ± s.e.m.

The ACC has been prominently implicated in motivation and voluntary actions for maximizing reward, so we posited that ACC would contain motivation-related neural activity patterns in our task (Shenhav et al., 2013; Monosov, 2017; Kolling et al., 2016; Khalighinejad et al., 2020; Khalighinejad et al., 2022). To test this hypothesis, we injected AAV1-CaMKII-GCaMP6f into the ACC and implanted fiber-optic cannulas to record bulk neural activity in ACC during behavior (Figure 1E). We observed strong neural responses in ACC that were tuned to reward delivery and trial initiations (Figure 1F and G). Notably, the ACC neural signal precedes speed onset in both cases, suggesting that ACC is not tracking speed but rather the motivation to initiate trials (Figure 1F and G, Figure 1—figure supplement 1B). We sought to determine how prolonged inhibition of ACC would impact motivation and whether this was required for learning and reward maximization. We injected AAV9-CaMKII-hM4D(Gi) into ACC and performed chemogenetic inhibition during a session of clozapine N-oxide (CNO) injection (ACC inhibition session) versus a session of saline injection (control session) (Figure 1H and I). We found that ACC inhibition caused mice to have a significant increase in time to initiate trials (Figure 1I), which also resulted in a decreased number of rewards received per minute (Figure 1J). Furthermore, we found a small, but significant, decrease in speed during trial initiation (but not overall session speed), suggesting that ACC inhibition might also impair vigor of movements during trial initiations (Figure 1—figure supplement 1C). Thus, we developed a self-paced behavioral task where mice learned cue-reward contingencies, and identified motivation-related signals in ACC that were required for learning to maximize rewards.
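
The claim that the ACC signal precedes speed onset rests on a latency comparison: the time for z-scored dF/F to rise above 1 std versus the time for speed to rise above 1 cm/s (Figure 1F and G). The sketch below shows one simple way such a threshold-crossing latency could be computed. The two thresholds come from the figure legend; the function name `time_to_rise`, the 10 Hz sampling, the alignment window, and the traces themselves are hypothetical and are not taken from the study.

```python
def time_to_rise(trace, times, threshold):
    """Return the first time at which `trace` exceeds `threshold`.

    `trace` and `times` are equal-length sequences; returns None if the
    threshold is never exceeded.
    """
    if len(trace) != len(times):
        raise ValueError("trace and times must have the same length")
    for t, value in zip(times, trace):
        if value > threshold:
            return t
    return None


# Hypothetical single trial sampled at 10 Hz from -1.0 s to +1.0 s,
# aligned so that 0 s marks the behavioral event (illustration only).
times = [(i - 10) / 10 for i in range(21)]

# Hypothetical z-scored dF/F: steps above 1 std at -0.3 s.
acc_z = [0.0] * 7 + [1.5] * 14
# Hypothetical speed in cm/s: steps above 1 cm/s at +0.1 s.
speed = [0.0] * 11 + [3.0] * 10

acc_latency = time_to_rise(acc_z, times, threshold=1.0)    # 1 std
speed_latency = time_to_rise(speed, times, threshold=1.0)  # 1 cm/s

if acc_latency is not None and speed_latency is not None:
    lead = speed_latency - acc_latency
    print(f"neural signal leads speed by {lead:.1f} s")
```

Collecting the two latencies per trial gives the paired values that a paired t-test, as named in the legend, would compare.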

ACC contains neural correlates of extended motivation during learning

We next sought to increase the motivational demand during learning. We thus extended our task by training mice to learn two sets of cue-outcome relationships, where one cue-set (olfactory+auditory) is associated with a sucrose water reward (hereafter referred to as ‘R’ cues), whereas the other cue-set is associated with no-reward (‘N’ cues). Since mice have been shaped to stop during cue presentations (Figure 1), it is now effortful for them to learn to continue running through the N cues so that they can reach more R cues, and thus maximize their total rewards in a session. Thus, motivation is assessed not only by ‘time to initiate trials after R cues’, as before, but now also by the more effortful measure of ‘time to initiate trials after N cues’ (Figure 2A; see Materials and methods). We measured overall learning through differences in lick rates and speed during cue presentations, expecting progressive suppression of licking and increases in speed in the N cues compared to the R cues across days. Interestingly, we found that mice learned to suppress licking in the N cues (Figure 2A; red arrows on day 2) much earlier than learning to increase speed in N cues (Figure 2A; red arrows on day 4; Video 1). Across the cohort, on average, this increase in speed during N cues began as early as day 3, after they had learned to suppress their licking (day 2), as determined by speed and stop discrimination index (stop DI: % of stops in N – R/all trials) (Figure 2B and C, Figure 2—figure supplement 1A; see Materials and methods). Finally, there was also a significant correlation between stop DI and rewards obtained per minute, confirming that the development of this behavioral strategy is tied to reward maximization within a given training session (Figure 2D).

Figure 2 with 1 supplement

Neural activity in anterior cingulate cortex (ACC) scales to match an increased motivational state during learning.

(A) Top: Schematic of training where mice learn to associate stopping to one set of cues with no water reward (‘N’) or with water reward (‘R’). Bottom: Representative traces of speed and licks from one mouse during a session on training day 2 and day 4, with shaded portions corresponding to when reward cues (R, blue) or no-reward cues (N, orange) are presented. Red arrow denotes the suppression of licks on day 2, and rise in speed during no-reward cues on day 4. (B) Trial averaged speed (cm/s; top), lick rate (Hz; middle), and ACC activity (dF/F z-scored; bottom) aligned to cue presentation across days 2 and 4 of training, separated by reward and no-reward cues (blue vs orange). Black arrow signifies rise in speed after no-reward cue presentation. N=12 mice. Data are mean (dark line) with s.e.m. (shaded area). (C) Quantification of average cue speed (cm/s; top), lick rate (Hz; middle), and ACC activity (dF/F z-scored; bottom) across training, separated by reward and no-reward cues (blue vs orange). N=12 mice in each group, data are mean ± s.e.m. *p<0.05, paired t-test between reward and no-reward. (D) Scatter plots of rewards per minute vs stop discrimination (top), lick discrimination (middle), or dF/F difference (bottom) for each mouse throughout training (N=120 data points, 12 mice per each of 10 days). Data are individual points with best fit line. r² and p values are shown, as determined by linear regression. (E) Top: Trial averaged speed (cm/s) and ACC activity (dF/F z-scored) aligned to cue presentation across three trials consisting of a reward, no-reward, and reward cue (RNR). Bottom: Trial averaged ACC activity (dF/F z-scored) aligned to cue presentation across four trials consisting of a reward, no-reward, no-reward and reward cue (RNNR). N=12 mice. Data are mean (dark line) with s.e.m. (shaded area). (F) Quantification of average cue dF/F activity across RNR and RNNR trial sequences. N=12 mice. *p<0.05, one-way repeated measures ANOVA with post hoc Tukey’s multiple comparison test. Data are mean ± s.e.m. (G) Top: Injection strategy for AAV1-CaMKII-stGtACR2 into ACC for optogenetic inhibition during training. Middle: Brain histology from a representative mouse showing DAPI in blue, stGtACR2 in red, and photometry cannula implantation in ACC. Scale bar: 1 mm. Bottom: Optogenetic inhibition was targeted to days 1–6 of training and mice were allowed to continue training for days 7–10. (H) Left: Trial averaged plots of speed (cm/s) aligned to cue entry on T6 for mCherry controls and GtACR inhibition mice, separated by reward or no-reward cues. Right: Quantification of mean speed during cue presentations. N=8 mice for mCherry, 4 for GtACR early inhibition. *p<0.05, paired t-test.

Video 1

Behavior during learning.

Playback speed: 2×. Shown here is a representative mouse learning to stop in cues that predict reward (blue walls) and run through consecutive cue presentations that predict no-reward (yellow walls). Displayed trial sequence order is reward, no-reward, no-reward, and reward (RNNR).

We next searched for neural correlates of motivation by recording bulk neural activity in ACC as mice performed this task, and aligning neural responses to behavioral frames, focusing on periods when mice learn to run during N cue presentations. As in the single-cue task (Figure 1), ACC continued to be active during reward delivery and during trial initiations (Figure 2—figure supplement 1B–C). In addition, ACC began to significantly increase its activity, specifically during N cues, as early as training day 3 (T3), as mice exhibited a learned motivation to leave N cues to reach more R cues (Figure 2B–D). As a further confirmation of this result, we investigated ACC’s activity during extended motivation across two consecutive N cues and found that ACC activity remained high from the initial N cue presentation until an R cue was reached (Figure 2E and F, Figure 2—figure supplement 1D). These neural responses, and in particular the dF/F difference in N vs R cues, positively correlated with the amount of reward obtained per minute, linking motivation-related ACC activity to overall learning (Figure 2D). Importantly, in all cases, on a trial-by-trial basis, the neural signal preceded the behavioral ramp in speed (Figure 2E), and was present even if we restricted our analyses to cue presentations in which mice stopped (Figure 2—figure supplement 1E), suggesting a motivational rather than motor response. To further confirm this dissociation, we passively presented both sets of cues to the mice at the end of each training session. As expected, mice did not develop the motivation to run out of N cues (Figure 2—figure supplement 1F), and accordingly, the ACC neural activity was no longer different between N and R cues. These results together suggest that ACC encodes a motivation signal to initiate trials, and that this signal corresponds in particular to the behavioral measure of running during N cues to reach more R cues, thus facilitating goal-directed learning.

We proceeded to test whether these motivation-related signals in ACC are required for learning. To restrict our inhibition to cue presentation portions of our task, and to avoid any potential off-target effects of CNO (Manvich et al., 2018) from repeated administration across several days of training, we used optogenetic inhibition. We injected AAV1-CaMKII-stGtACR2 bilaterally in ACC to express the inhibitory opsin and delivered light selectively when the mouse received R or N cues, for the first 6 days (‘early’) or last 4 days (‘late’) of training (Figure 2G and H, Figure 2—figure supplement 1H). We found that early ACC inhibition prevented mice from learning to run out of N cues, even though they still learned to suppress their lick rates (Figure 2H, Figure 2—figure supplement 1G). Late ACC inhibition had no effect on speed or lick rate behavior, as mice continued to run out during N cues while inhibition occurred, suggesting ACC activity does not broadly suppress speed (Figure 2—figure supplement 1H). Altogether, we identified an extended motivation signal in ACC that is required for learning and reward maximization.

Neural activity in orbitofrontal projections ramps until rewards are reached

The ACC receives projections from disparate regions across the brain that could facilitate the integration of value, internal state, and multisensory information, so we sought to identify how afferent projections may convey motivational signals to ACC during learning (Fillinger et al., 2017). We injected rgAAV-hSyn-Cre into ACC and AAV1-CAG-FLEX-GCaMP6f into the OFC, anteromedial thalamus (AM), basolateral amygdala (BLA), and locus coeruleus (LC), and implanted optical fibers above each region to record neural activity during learning in this task (Kim et al., 2016; Figure 3A). We first characterized whether the previously observed ACC neural responses during reward delivery and trial initiations were present in any of the inputs to ACC (Figure 3—figure supplement 1A). We found that even before learning all projections responded significantly to rewards, and most (OFC_ACC, AM_ACC, and LC_ACC) increased their activity in anticipation of trial initiations (Figure 3—figure supplement 1A). Thus, motivation-related signals were broadly present in various projections to ACC.

Figure 3 with 1 supplement

Mice with extended motivational states during learning display neural activity ramps in orbitofrontal cortex (OFC).

(A) Injection strategy and fiber-based photometry setup to record bulk GCaMP6f of projections to anterior cingulate cortex (ACC) from OFC_ACC (orbitofrontal cortex), AM_ACC (anteromedial thalamus), BLA_ACC (basolateral amygdala), or LC_ACC (locus coeruleus). Representative traces for a single mouse showing dF/F for each region, speed, and licks. Shaded portions correspond to when reward cues (R, blue) or no-reward cues (N, orange) are presented. (B) Left: Trial averaged bulk GCaMP6f dF/F of ACC, OFC_ACC, AM_ACC, BLA_ACC, and LC_ACC during a sequence of trials on T6 including reward, no-reward, and reward cues (RNR). Black arrows denote the rise in pre-cue activity from N cue to the following R cue in the RNR sequence. Right: Quantification of pre-cue activity for the N cue and following R cue. Data are mean (solid line) ± s.e.m. (shaded area). N=19, 12, 5, 4 mice, *p<0.05, paired t-test between N vs R cues. (C) Left: Trial averaged bulk GCaMP6f dF/F of OFC_ACC during a sequence of trials including reward, two no-reward, and reward cues (RNNR). Red arrows denote the rise in pre-cue activity from first N cue to the last R cue in the RNNR sequence. Right: Quantification of pre-cue activity for the first N cue, second N cue, and last R cue. Data are mean (solid line) ± s.e.m. (shaded area). N=19 mice, *p<0.05, one-way repeated measures ANOVA with post hoc Tukey’s multiple comparison test. (D) Left: Speed (cm/s) for ‘Learner’ (black; reached a DI>0.5 for 3 consecutive days) or ‘Non-Learner’ (red) mice on training day 6 aligned to no-reward cue onset. Middle: Discrimination index for each group of mice throughout training. Right: Speed during reward and no-reward cues for ‘Learner’ mice. N=7 (‘Learner’) and 9 (‘Non-Learner’) mice. Data are mean (solid line) ± s.e.m. (shaded area), *p<0.05, unpaired t-test between Learner and Non-Learner DI (middle), paired t-test between reward and no-reward cues (right). (E) Left: Trial averaged bulk GCaMP6f dF/F of OFC_ACC during a sequence of trials including reward, two no-reward, and reward cues (RNNR). Black arrows denote the rise in pre-cue activity from first N cue to the last R cue in the RNNR sequence. Red arrows denote the absence of this ramp in Non-Learner mice. Right: Quantification of pre-cue activity for the first N cue, second N cue, and last R cue. Data are mean (solid line) ± s.e.m. (shaded area). N=7 (‘Learner’) and 9 (‘Non-Learner’) mice, *p<0.05, one-way repeated measures ANOVA with post hoc Tukey’s multiple comparison test.

We then searched for motivation-related neural responses that were specifically tied to learning. To do so, we aligned neural responses to trial initiations after N cues, as mice learned to leave N cues to reach more R cues. We found that both OFC_ACC and AM_ACC had higher baseline activity during trial initiations after no-rewards (Figure 3—figure supplement 1B, C). To further understand this higher activity after no-rewards we analyzed sequences of ‘RNR’ trials which contained reward, no-reward, and reward cues (Figure 3B). We observed a rise in OFC_ACC activity prior to the N cue presentation that continued to rise until an R cue was reached (black dotted arrow; Figure 3B). We quantified this motivational signal as a difference in pre-cue activity between N and R cues in RNR trial sequences across days and found that this difference emerged at the time of learning (~T3) and closely tracked performance of the learned behavior (T3-T6) for OFC_ACC and AM_ACC, but not BLA_ACC or LC_ACC (Figure 3B). To further build confidence in these results, we asked whether this continuous rise in OFC_ACC activity in RNR sequences would be further extended in RNNR sequences. Indeed, OFC_ACC activity continued ramping across two consecutive N trials, exhibiting higher pre-cue activity upon entering an R cue after two versus one N (black dotted line, Figure 3C).

To more directly determine whether this motivational ramp signal in OFC_ACC is tied to learning, we separated our mice into two groups, one that learned the task (‘Learners’, stop DI>0.5 for at least 3 consecutive days) and one that did not learn (‘Non-Learners’) (Figure 3D, Figure 3—figure supplement 1D). The Learners reached a high DI by T6, which persisted throughout the rest of training, whereas the Non-Learners only reached a significantly higher DI by T10 (Figure 3D). Both subsets of mice still learned to discriminate with licking at comparable rates (Figure 3—figure supplement 1E). When we compared OFC_ACC activity in an RNNR sequence of trials, we found that only Learners exhibited a significant ramp in neural activity from the first N cue to the final R cue presentation, which emerged coincident with behavioral learning and persisted for the remaining days of training (Figure 3E). Together, we identify projection activity in OFC that ramps across N cues until an R cue is reached and that is specifically tied to the development of a learned goal-directed behavior.
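
The Learner criterion stated above (stop DI>0.5 for at least 3 consecutive days) can be expressed as a short check over a mouse's daily DI values. The sketch below is an illustration under that stated criterion only. The function name `is_learner` and the example DI series are hypothetical, and how the authors handled edge cases (for example missing sessions) is not specified here.

```python
def is_learner(daily_di, threshold=0.5, min_days=3):
    """True if `daily_di` contains at least `min_days` consecutive
    values strictly greater than `threshold`."""
    run = 0
    for di in daily_di:
        run = run + 1 if di > threshold else 0
        if run >= min_days:
            return True
    return False


# Hypothetical stop DI values over 10 training days (NOT study data).
mouse_a = [0.0, 0.1, 0.3, 0.55, 0.6, 0.7, 0.7, 0.8, 0.8, 0.8]
mouse_b = [0.0, 0.1, 0.2, 0.6, 0.3, 0.55, 0.52, 0.4, 0.3, 0.6]

for name, di in (("mouse_a", mouse_a), ("mouse_b", mouse_b)):
    label = "Learner" if is_learner(di) else "Non-Learner"
    print(name, label)
```

In this example `mouse_b` exceeds the threshold on four separate days but never on three in a row, so it is labeled a Non-Learner; the consecutive-day requirement is what separates sustained performance from isolated good sessions.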

Orbitofrontal projection neurons tile unrewarded trials until rewards are reached

Given that we identified a ramp in OFC_ACC bulk neural activity during NNR sequences (Figure 3), we sought to determine whether a single persistently active population or a sequence of tiled neurons underlies this ramp. We thus performed real-time cellular resolution imaging of OFC projections to ACC by injecting rgAAV-hSyn-Cre into ACC and AAV1-CAG-FLEX-GCaMP6f in OFC (Figure 4A). We implanted a gradient-index (GRIN) lens above OFC and imaged the region under a two-photon microscope as mice performed the learning task (Figure 4A). We focused our analysis on days where behavioral learning emerged (Figure 4B), and on NNR trial sequences to find an underlying cellular mechanism to the previously observed photometry results (Figure 4C). We found individual neurons that were uniquely active across the first N, second N, or R cue, thereby tiling the sequence of NNR trials (Figure 4D and E). We further found that an increasing number of neurons were active along the sequence of NNR trials and most prominently before learning (Figure 4F, Figure 4—figure supplement 1A–C). Thus, collectively, as an ensemble, these neurons ramp across consecutive N cues and peak upon reaching R cues.

Figure 4 with 1 supplement

Orbitofrontal cortex (OFC) projection neurons tile sequences of trials with no-rewards.

(A) Injection strategy (top left), histology (top right; scale bar, 1 mm) and z-projection images of two-photon recording (bottom left; mean over time; scale bars, 200 μm) of GCaMP-expressing OFC projection neurons with gradient-index (GRIN) implants. Bottom right: Sequence of trials with z-scored dF/F for individual neurons, with shaded portions corresponding to when reward cues (R, blue) or no-reward cues (N, orange) are presented. Red arrow denotes a dF/F transient occurring after two consecutive N cues. (B) Stop (black) or lick (gray; see Materials and methods) discrimination index (DI) on the first day stop DI reaches >0.4 (‘after’) and the two previous days (‘before’ and ‘middle’). N=5 mice. (C) Schematic of OFC_ACC bulk activity based on Figure 3 results and potential single neuron findings that tile a sequence of trials with two no-reward cues followed by a reward cue presentation (NNR). (D) Representative neurons with tunings (std>0.75 for 3 s prior to or after cue presentation) to separate cues in an NNR trial sequence. Trial-averaged activity of an N (top), NN (middle), and NNR (bottom) neuron with heat map showing individual trial responses. (E) Quantification of neurons tuned to separate cues within an NNR trial sequence and their activity to all other cues. N=17 (N), 18 (NN), 32 (NNR) cells out of 115 cells in total. *p<0.05, one-way repeated measures ANOVA with post hoc Tukey’s multiple comparison test. (F) Percentage of neurons tuned to different cues in an NNR trial sequence before (top) or after (bottom) training. N=5 mice. *p<0.05, one-way repeated measures ANOVA with post hoc Tukey’s multiple comparison test. (G) Ensemble average plots of neurons tuned to R cues after two consecutive N cue presentations (NNR cells) before learning (top) and their activity after learning (bottom). Black arrows denote the rise in activity prior to R cues after learning. N=18 NNR cells out of 81 cells tracked across days. (H) Quantification of transient time (s) since R cue onset for neurons tracked across days. N=132, 170 transient events before and after learning across 18 NNR cells and 105, 59 transient events before and after learning across 12 NR cells. *p<0.05, unpaired t-test. (I) Left: Injection strategy for AAV1-hSyn-SIO-stGtACR2 into OFC_ACC for optogenetic inhibition during training. Optogenetic inhibition was applied during training for 6 days. Right: Brain histology from a representative mouse showing DAPI in blue, stGtACR2 in red, and photometry cannula implantation in ACC. Scale bar: 1 mm. (J) Left: Mean animal speed (cm/s) aligned to cue zone entry after no-reward on T6 for mCherry control or GtACR mice. Black arrow signifies lack of speed increase during N cues. Right: Quantification of mean change in speed in the cue zone after no-reward, assessed separately for each cue presentation. N=10 mice for mCherry and 13 mice for GtACR, *p<0.05, paired t-test.
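The tuning criterion in panel D (z-scored activity above 0.75 in the 3 s before or after cue presentation) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `is_tuned`, the sampling rate `fs`, and the choice to compare the window mean (rather than the peak) against the threshold are assumptions made here for illustration.

```python
from statistics import mean


def is_tuned(zscored_trace, cue_index, fs=10.0, window_s=3.0, threshold=0.75):
    """Return True if the mean z-scored dF/F in the window before OR after
    cue onset exceeds the threshold.

    zscored_trace: z-scored dF/F samples (e.g. trial-averaged) for one neuron.
    cue_index:     sample index of cue onset.
    fs:            sampling rate in Hz (hypothetical value, not from the paper).
    """
    n = int(round(window_s * fs))                 # samples per 3 s window
    pre = zscored_trace[max(0, cue_index - n):cue_index]
    post = zscored_trace[cue_index:cue_index + n]
    # An empty window can never satisfy the criterion.
    pre_mean = mean(pre) if pre else float("-inf")
    post_mean = mean(post) if post else float("-inf")
    return pre_mean > threshold or post_mean > threshold
```

Applying a function like this separately at the first N cue, the second N cue, and the R cue of an NNR sequence would give one possible way to label cells as N, NN, or NNR tuned.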

To determine how these NNR ensembles facilitate learning, we tracked the same population of neurons ‘before’ and ‘after’ learning (Stop DI>0.4; Figure 4G, Figure 4—figure supplement 1D). Before learning, we identified an ensemble of neurons that responded uniquely to R cues preceded by two N cues, and we then characterized the responses of these same neurons after learning. Interestingly, after learning these neurons no longer responded at R cue onset. Instead, their activity arose before the R cue and became progressively more tied to the onset of the preceding N cue, aligning with the learned behavioral transition of mice leaving N cues to reach R cues (Figure 4G and H). To determine whether OFC_ACC activity ramps were required for learning, we optogenetically inhibited these projections bilaterally by injecting rgAAV-hSyn-Cre into ACC and AAV1-hSyn-SIO-stGtACR2 into OFC and delivering light only on R or N cues. We then specifically assessed whether previous trial history affected behavioral responses on the current cue condition (Figure 4I and J). Interestingly, while both the mCherry control and OFC_ACC inhibition cohorts could increase their speed during N cues following an R cue, mice with OFC_ACC inhibition were impaired in doing so if the N cue was followed by an N cue (Figure 4I and J, Figure 4—figure supplement 1E). Taken together, these data demonstrate that ensembles of neurons progressively tile the OFC motivational ramp, and that the initially reward-responsive neurons become progressively linked to unrewarded cues after learning, thus effectively linking actions to outcomes to maximize rewards (Figure 4—figure supplement 1F).
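A trial-history split of this kind can be sketched as follows. This is a hedged illustration only: the function name `speed_change_by_history`, the input layout, and the decision to group each N cue by the identity of the cue immediately before it are assumptions, not the authors' pipeline.

```python
from collections import defaultdict
from statistics import mean


def speed_change_by_history(cues, speed_changes):
    """Average the speed change on N cues, grouped by the preceding cue.

    cues:          'R' / 'N' labels in presentation order.
    speed_changes: change in speed (cm/s) in the cue zone, one value per cue.
    Returns a dict such as {'N after R': ..., 'N after N': ...}, containing
    only the conditions that actually occur in the session.
    """
    if len(cues) != len(speed_changes):
        raise ValueError("cues and speed_changes must have the same length")
    groups = defaultdict(list)
    # Pair every cue (from the second onward) with the cue that preceded it.
    for prev, cur, delta in zip(cues, cues[1:], speed_changes[1:]):
        if cur == "N":
            groups[f"N after {prev}"].append(delta)
    return {label: mean(values) for label, values in groups.items()}
```

Computing these per-condition means for each mouse would allow a within-animal comparison across history conditions, which is the kind of comparison a paired test operates on.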

Discussion

In this study, we developed a self-paced cue-outcome learning task to determine how mice extend their motivational state to maximize reward over long timescales. We identify the ACC as broadly critical to maximizing reward in our task, especially as mice learn to run out of unrewarded cues. We found that upstream inputs to ACC from OFC sustain a ramp-like increase in activity through consecutive unrewarded cues until mice reach rewarded cues. Cellular resolution imaging of OFC projection neurons revealed ensembles of neurons that tile the motivational ramp, and a progressive shift in ensemble tuning during learning such that neurons initially encoding reward become progressively linked to motivated actions, i.e., trial initiations to reach more rewards. We therefore present a model in which OFC contains neurons that increasingly link reward to motivated behaviors, conveying a motivational ramp to ACC, to facilitate learning and reward maximization (Figure 4C, Figure 4—figure supplement 1F).

The OFC has been implicated in guiding adaptive, flexible behavior by signaling information about future outcomes (Rudebeck et al., 2013; Montague and Berns, 2002; Mainen and Kepecs, 2009; Rich and Wallis, 2016; Padoa-Schioppa and Conen, 2017). One view holds that OFC encodes the value of the outcomes of events, and various neural correlates of value-guided behavior have been found there. Another view holds that OFC builds a model of the causal relationships between events, which may or may not entail value assessments, into a cognitive map (Behrens et al., 2018). Indeed, OFC neurons have been found to encode sensory-sensory associations even prior to any kind of learning (Sadacca et al., 2018). One way to link both perspectives into a single account has been to view value and a cognitive map as occurring along a spectrum, where assigning value to outcomes hinges upon the map that is created. We have found that mice learn to run during N cues to reach R cues more quickly, thereby acquiring more rewards over a training session. This behavior can be viewed both as value-guided, since the mouse suppresses its lick rate during N cues, and as requiring a mental model of the environment, since running occurs with the expectation of reaching R cues in the future. Indeed, the pseudorandom trial structure ensures that N cues will be presented no more than two times in a row, such that after two N cues an R cue is guaranteed (see Materials and methods). We thus parsimoniously position OFC as functioning in model-based behaviors, and in the accurate planning of actions based on the learned transition structure of a task (Drummond and Niv, 2020).
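The constraint on the trial structure (no more than two N cues in a row, so an R cue is guaranteed after two N cues) can be made concrete with a short sketch. The function name `generate_cue_sequence`, the reward probability `p_reward`, and the seeding scheme are hypothetical choices made here for illustration; the actual cue probabilities and generation procedure are described in the Materials and methods, not reproduced here.

```python
import random


def generate_cue_sequence(n_trials, p_reward=0.5, max_consecutive_n=2, seed=None):
    """Draw a pseudorandom sequence of 'R' and 'N' cues in which N never
    appears more than max_consecutive_n times in a row.

    p_reward is an assumed probability of an R cue on unconstrained trials.
    """
    rng = random.Random(seed)
    sequence = []
    run_of_n = 0  # length of the current run of consecutive N cues
    for _ in range(n_trials):
        # Force an R cue once the run of N cues has reached the limit.
        if run_of_n >= max_consecutive_n or rng.random() < p_reward:
            sequence.append("R")
            run_of_n = 0
        else:
            sequence.append("N")
            run_of_n += 1
    return sequence
```

Under this structure, an agent that has learned the transition rule can be certain of an R cue after two N cues, which is what makes the task suited to probing model-based expectations.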

We linked the ramp-like increase in neural activity in OFC to motivation, but several questions remain about how motivation is computed and why it would be represented as a ramp. Motivation could be computed as a combination of several variables such as time since last reward, value of reward, and effort to reach future rewards. Future theory-driven analyses could determine how motivation is computed, and whether individual variables of time, value, and effort are encoded as clusters of similarly tuned neurons, or mixed and collectively represented at the population level. In either case, it is likely that a combined map of task space and value information carried by OFC is being used to inform downstream regions, such as ACC, for adjusting behavior.

The ACC has been shown to carry information necessary for switching or staying with current behaviors during decision-making and learning in order to maximize rewards and minimize threats or punishments (Shenhav et al., 2013; Monosov, 2017; Kolling et al., 2016). We posit that ACC reads information from OFC about task structure and value to perform computations relevant to allocating behavioral control. This is supported by our findings that ACC is important for learned behaviors associated with maximizing rewards in our self-paced learning task. We compare the decision to run during N cues to a foraging decision to leave a patch to find alternative options, and ACC’s importance in the development of this behavior is reminiscent of signals previously described at the time a foraging decision is reached (Blanchard and Hayden, 2014; Hayden et al., 2011). We found that inhibition of ACC activity affected the development of running during N cues, effectively diminishing an animal’s ability to switch strategies (Kennerley et al., 2006; Akam et al., 2021; Sarafyazd and Jazayeri, 2019; Tervo et al., 2014). While we did not perform single-cell imaging of ACC in our task, we hypothesize that individual ACC neurons could encode the distribution of actions/opportunities (i.e., stop, run, lick, suppress lick; Klein-Flügge et al., 2022) taken during R or N cues. ACC neurons could compute the relative value of the action taken such that more ACC neurons become recruited once mice learn to run out of N cues. The sustained increase in bulk ACC activity across N cue trials (Figure 2) could come from a stable sequence of individual neurons that encode the timescale of the actions taken. In this way, OFC projections would encode current motivation across N cues before learning, which then triggers ACC to compute the value-based actions. Motivational signals in OFC would thus represent state since past rewards/goals, while in ACC these signals represent actions taken to pursue rewards/goals in the future.

Here, we studied learning as a systems-level process guided by top-down signals that maintain a motivational state. Our work showed the recruitment of multiple frontal cortical areas in this process, which is to be expected as animals are required to build, maintain, and use representations of task structure and value to drive learned, motivated behaviors (Klein-Flügge et al., 2022). Future work can build upon the task we developed here to determine how the frontal cortex maintains motivational states across many more cue-outcome associations, and how these associations may dynamically change across time (Izquierdo et al., 2017). Lastly, a more synaptic-level approach into how ACC integrates information from upstream regions during learning could reveal important micro-circuit computations, molecular or structural changes during motivational states and learning (Thornquist et al., 2020; Peters et al., 2017), and potential mechanisms underlying seconds-long behavioral timescale learning rules (Bittner et al., 2017).

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers
Strain, strain background (AAV)	AAV1-CaMKIIa-GCaMP6f	Upenn Vector Core	Addgene#100834
Strain, strain background (AAV)	AAV1-CAG-FLEX-GCaMP6f	Douglas Kim	Addgene#100835
Strain, strain background (AAV)	AAV1-CKIIa-stGtACR2-FusionRed	Ofer Yizhar	Addgene#105669
Strain, strain background (AAV)	AAV9-CaMKIIa-hM4D(Gi)-mCherry	Bryan Roth	Addgene#50477
Strain, strain background (AAV)	AAV9-CaMKIIa-mCherry	Bryan Roth	Addgene#114469
Strain, strain background (AAV)	rgAAV-hSyn-Cre	James Wilson	Addgene#105553
Strain, strain background (AAV)	AAV1-hSyn1-SIO-stGtACR2	Ofer Yizhar	Addgene#105677
Strain, strain background (AAV)	AAV9-hSyn-DIO-mCherry	Bryan Roth	Addgene#50459

Mice

All procedures were done in accordance with guidelines derived from and approved by the Institutional Animal Care and Use Committees (protocol #22,087-H) at The Rockefeller University. Animals were naive male C57BL/6J mice (Jackson Laboratory, Strain #000664), 8 to 10 weeks old at the time of surgery. Mice were group housed (3–5 per cage) with ad libitum food and water, unless they were water restricted for behavioral assays, in which case they were given 1 mL water per day. Body weight was monitored daily to ensure it was maintained above 80% of the pre-restriction measurement. Surgical procedures and viral injections were performed using a stereotactic apparatus (Kopf) in mice anesthetized with 2% isoflurane.
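The daily body-weight criterion amounts to a simple threshold check, sketched below. The function name `weight_above_threshold` and the gram units are assumptions made for illustration; the 80% figure is the one stated above.

```python
def weight_above_threshold(current_g, baseline_g, min_fraction=0.80):
    """Return True if the current weight is above min_fraction of the
    pre-restriction baseline weight (both in grams)."""
    if baseline_g <= 0:
        raise ValueError("baseline_g must be positive")
    return current_g > min_fraction * baseline_g
```

For a mouse weighing 25 g before restriction, the check passes only while its weight stays above 20 g.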

Surgical procedures

Puralube vet ointment was applied to the eyes and 0.2 mg/kg meloxicam was administered intraperitoneally using a 1 mL syringe. Hair from the scalp was trimmed, and the area was sterilized using povidone-iodine swabs followed by ethanol swabs. An incision covering the anteroposterior extent was made to allow access to the skull. Injection sites were accessed using a dental drill, which made 0.5 mm holes through the skull. All viruses were injected using a 35 G beveled needle in a 10 µL NanoFil Sub-Microliter Injection syringe (World Precision Instruments) controlled by an injection pump (Harvard Apparatus) at a rate of 100 nL/min. After each viral delivery, an additional 5–10 min delay was applied to avoid backflush before slowly removing the injection needle. Animals that required cannulas or GRIN lenses were implanted immediately following viral injection. Following surgery, mice were allowed to recover in a single-housed cage for up to 12 hr, and were given meloxicam tablets. Mice were typically housed for 3 weeks to allow for adequate expression before behavioral testing or histology.


Author details

Josue M Regalado
Ariadna Corredera Asensio
Theresa Haunold
Andrew C Toader
Yan Ran Li
Lauren A Neal
Priyamvada Rajasethupathy (for correspondence)
