Neural activity in ACC signals a motivational state to obtain reward

(A) Schematic of virtual reality experimental setup and trial structure. A mouse initiates a trial by running to trigger the onset of cues (olfactory and auditory). After cue onset, a mouse stops to collect a water reward, which ends the trial (see Methods).

(B) Representative traces of speed and licks from one mouse during a session, with shaded portions corresponding to when cues are on. Red arrows correspond to periods when mice are running to trigger cue onset or stopping to trigger water delivery. Black arrows correspond to sections of a session where we can quantify time to initiate trials, initiation speed, cue stops, and rewards.

(C) Quantification per mouse of time to initiate a trial (far left; seconds), initiation speed (left; cm/s), % trials in which a stop occurred during cue presentation (right), and rewards received per minute (far right). Individual data points shown (N=12 mice).

(D) Scatter plots of the mean time (s) to initiate a trial plotted alongside rewards received per minute per mouse (N=12 mice). Individual data points shown, with a best fit line, represented by the solid line in the figure. r2=0.8675 and p<0.0001 are determined by linear regression.
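The best-fit line and r2 in a panel like (D) can be sketched with an ordinary least-squares fit. This is a minimal illustration, not the authors' analysis code: the function name `linear_fit` and the example values are hypothetical, and the p-value is omitted because it needs a t-distribution that the Python standard library does not provide.

```python
from math import fsum


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus r^2."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = fsum(x) / n
    mean_y = fsum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = fsum((xi - mean_x) ** 2 for xi in x)
    syy = fsum((yi - mean_y) ** 2 for yi in y)
    sxy = fsum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, r^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up per-mouse values for illustration only (not data from the study).
time_to_initiate_s = [2.0, 3.0, 4.0, 5.0, 6.0]
rewards_per_min = [5.0, 4.1, 3.0, 2.2, 0.9]
slope, intercept, r2 = linear_fit(time_to_initiate_s, rewards_per_min)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r2={r2:.4f}")
```

A negative slope here would correspond to mice that take longer to initiate trials earning fewer rewards per minute.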

(E) Left: bulk neural activity recording experimental design. GCaMP6f was injected into the anterior cingulate cortex (ACC) and neural activity was recorded on a fiber photometry setup (see Methods). Right: Brain histology from a representative mouse showing DAPI in blue, GCaMP6f in green and photometry cannula implantation in ACC (dotted white lines). Scale bar: 1mm.

(F) Top: Trial average plots of ACC activity (z-scored dF/F) and speed (cm/s) aligned to reward onset. Data are mean (solid line) ± s.e.m (shaded area). Bottom: Relative frequency plots of the time (s) for ACC dF/F or speed to rise above 1 std or 1 cm/s, respectively, during rewards (N=105 trials across 12 mice). *p<0.05, paired t-test comparing time to rise (s) for ACC and speed. Data are the frequency of values across time.

(G) Same as F, but for trial initiations (N=510 trials across 12 mice).
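The "time to rise" measure in (F) can be read as the first sample at which a trace exceeds a fixed threshold. The sketch below assumes that reading; the function name `time_to_rise`, the sampling grid, and the trace values are hypothetical, and the Methods may define the measure differently.

```python
def time_to_rise(times, values, threshold):
    """Return the first time at which `values` exceeds `threshold`, or None."""
    for t, v in zip(times, values):
        if v > threshold:
            return t
    return None


# Made-up single-trial traces for illustration only (not data from the study).
times_s = [-1.0, -0.5, 0.0, 0.5, 1.0]
acc_z_dff = [0.1, 0.4, 1.2, 2.0, 1.5]   # z-scored dF/F, threshold 1 std
speed_cm_s = [0.0, 0.2, 0.6, 1.4, 3.0]  # speed, threshold 1 cm/s

acc_rise = time_to_rise(times_s, acc_z_dff, 1.0)
speed_rise = time_to_rise(times_s, speed_cm_s, 1.0)
print(acc_rise, speed_rise)
```

Collecting the two rise times per trial gives the paired samples that a paired t-test would compare.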

(H) Injection strategy for DREADDs-based chemogenetic inhibition of ACC during self-paced task. Coronal section from an animal virally injected with AAV1-CaMKII-hM4D(Gi) in ACC. DAPI is shown in blue and hM4D(Gi) in red. Scale bar: 1mm.

(I) Representative traces of speed and licks from one mouse during the task on a day with saline (top) or CNO (bottom) administration 45 minutes prior to a session, with shaded portions corresponding to when cues are presented.

(J) Left: Quantification of time (s) to initiate a trial across saline and CNO sessions in mCherry-control mice (N=188 trials across 6 mice) and hM4D(Gi)-DREADDs mice (N=215 trials across 4 mice). Right: same as left but for rewards received per minute in mCherry-control mice (N=60 minutes across 6 mice) and hM4D(Gi)-DREADDs mice (N=40 minutes across 4 mice). p=0.8707 for mCherry and *p<0.05 for hM4Di (time to initiate), p=0.2073 for mCherry and *p<0.05 for hM4Di (rewards per min), unpaired t-test between saline and CNO sessions per group. Data are mean ± s.e.m.

Neural activity in ACC scales to match an increased motivational state during learning

(A) Top: Schematic of training where mice learn to associate stopping during one set of cues with no water reward (“N”) and during another set with water reward (“R”). Bottom: Representative traces of speed and licks from one mouse during a session on Training Day 2 and Day 4, with shaded portions corresponding to when reward cues (R, blue) or no-reward cues (N, orange) are presented. Red arrow denotes the suppression of licks on Day 2, and rise in speed during no-reward cues on Day 4.

(B) Trial averaged speed (cm/s; top), lick rate (Hz; middle) and ACC activity (dF/F z-scored; bottom) aligned to cue presentation across day 2 and 4 of training, separated by reward and no-reward cues (blue vs orange). Black arrow signifies rise in speed after no-reward cue presentation. N=12 mice. Data are mean (dark line) with s.e.m. (shaded area).

(C) Quantification of average cue speed (cm/s; top), lick rate (Hz; middle) and ACC activity (dF/F z-scored; bottom) across training, separated by reward and no-reward cues (blue vs orange). N=12 mice in each group, data are mean ± s.e.m. *p<0.05, paired t-test between reward and no-reward.

(D) Scatter plots of rewards per minute vs stop discrimination (top), lick discrimination (middle), or dF/F difference (bottom) for each mouse throughout training (N=120 data points, 12 mice on each of 10 days). Data are individual points with best fit line. r2 and p values are shown, as determined by linear regression.

(E) Top: Trial averaged speed (cm/s) and ACC activity (dF/F z-scored) aligned to cue presentation across 3 trials consisting of a reward, no-reward, and reward cue (RNR). Bottom: Trial averaged ACC activity (dF/F z-scored) aligned to cue presentation across 4 trials consisting of a reward, no-reward, no-reward and reward cue (RNNR). Right: Quantification of average cue dF/F activity across RNR and RNNR trial sequences. N=12 mice. *p<0.05, one-way repeated measures ANOVA with post-hoc Tukey’s multiple comparison test. Data are mean (dark line) with s.e.m. (shaded area) or mean ± s.e.m (right).

(F) Top: Injection strategy for stGtACR2-based optogenetic inhibition of ACC during training. Middle: Brain histology from a representative mouse showing DAPI in blue, stGtACR2 in red and photometry cannula implantation in ACC. Scale bar: 1mm. Bottom: optogenetic inhibition was targeted to days 1-6 of training and mice were allowed to continue training for days 7-10.

(G) Left: Trial averaged plots of speed (cm/s) aligned to cue entry on T6 for mCherry controls and GtACR inhibition mice, separated by reward or no-reward cues. Right: Quantification of mean speed during cue presentations. N=8 mice for mCherry, 4 for GtACR early inhibition. *p<0.05, paired t-test.

Mice with extended motivational states during learning display neural activity ramps in OFC

(A) Injection strategy and fiber-based photometry setup to record bulk GCaMP6f activity of projections to ACC from OFCACC (orbitofrontal cortex), AMACC (anteromedial thalamus), BLAACC (basolateral amygdala), or LCACC (locus coeruleus). Representative traces for a single mouse showing dF/F for each region, speed, and licks. Shaded portions correspond to when reward cues (R, blue) or no-reward cues (N, orange) are presented.

(B) Left: trial averaged bulk GCaMP6f dF/F of ACC, OFCACC, AMACC, BLAACC, and LCACC during a sequence of trials on T6 including reward, no-reward, and reward cues (RNR). Black arrows denote the rise in pre-cue activity from N cue to the following R cue in the RNR sequence. Right: quantification of pre-cue activity for the N cue and following R cue. N=19, 12, 5, 4 mice, data are mean (solid line) ± s.e.m (shaded area), *p<0.05, paired t-test between N vs R cues.

(C) Left: trial averaged bulk GCaMP6f dF/F of OFCACC during a sequence of trials including reward, two no-reward, and reward cues (RNNR). Red arrows denote the rise in pre-cue activity from first N cue to the last R cue in the RNNR sequence. Right: quantification of pre-cue activity for the first N cue, second N cue and last R cue. N=19 mice, data are mean (solid line) ± s.e.m (shaded area), *p<0.05, one-way repeated measures ANOVA with post-hoc Tukey’s multiple comparison test.

(D) Left: speed (cm/s) for “Learner” (black; reached a discrimination index (DI) > 0.5 for 3 consecutive days) or “Non-Learner” (red) mice on training day 6 aligned to no-reward cue onset. Middle: DI for each group of mice throughout training. Right: speed during reward and no-reward cues for “Learner” mice. N=7 (“Learner”) and 9 (“Non-Learner”) mice. Data are mean (solid line) ± s.e.m (shaded area), *p<0.05, unpaired t-test between Learner and Non-Learner DI (middle), paired t-test between reward and no-reward cues (right).
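The “Learner” criterion in (D), a DI above 0.5 on 3 consecutive days, can be sketched as a streak check over each mouse's daily DI values. This is an illustrative sketch only: the function name `is_learner` and the example DI series are hypothetical, and how DI itself is computed is left to the Methods.

```python
def is_learner(daily_di, threshold=0.5, run_length=3):
    """True if DI exceeds `threshold` on `run_length` consecutive days."""
    streak = 0
    for di in daily_di:
        # Extend the streak on a qualifying day, otherwise reset it.
        streak = streak + 1 if di > threshold else 0
        if streak >= run_length:
            return True
    return False


# Made-up daily DI series for illustration only (not data from the study).
mouse_a = [0.1, 0.3, 0.6, 0.7, 0.8, 0.9]  # three consecutive days above 0.5
mouse_b = [0.6, 0.7, 0.2, 0.8, 0.9, 0.4]  # never three days in a row
print(is_learner(mouse_a), is_learner(mouse_b))
```

The comparison is strict, so a day with DI exactly equal to 0.5 resets the streak, matching the "> 0.5" wording of the legend.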

(E) Left: trial averaged bulk GCaMP6f dF/F of OFCACC during a sequence of trials including reward, two no-reward, and reward cues (RNNR). Black arrows denote the rise in pre-cue activity from first N cue to the last R cue in the RNNR sequence. Red arrows denote the absence of this ramp in Non-Learner mice. Right: Quantification of pre-cue activity for the first N cue, second N cue and last R cue. N=7 (“Learner”) and 9 (“Non-Learner”) mice, data are mean (solid line) ± s.e.m (shaded area), *p<0.05, one-way repeated measures ANOVA with post-hoc Tukey’s multiple comparison test.

Orbitofrontal cortex projection neurons tile sequences of trials with no-rewards

(A) Injection strategy (top left), histology (top right; scale bar, 1mm) and z-projection images of two-photon recording (bottom left; mean over time; scale bars, 200 μm) of GCaMP-expressing OFC projection neurons with GRIN implants. Bottom right: sequence of trials with z-scored dF/F for individual neurons, with shaded portions corresponding to when reward cues (R, blue) or no-reward cues (N, orange) are presented. Red arrow denotes a dF/F transient occurring after 2 consecutive N cues.

(B) Stop (black) or lick (grey; see Methods) discrimination index on the first day stop DI reaches > 0.4 (“after”) and the two previous days (“before” and “middle”). N=5 mice.

(C) Representative neurons with tunings (std > 0.75 for 3 seconds prior to or after cue presentation) to separate cues in an NNR trial sequence. Trial averaged activity of an N (top), NN (middle), and NNR (bottom) neuron with heat map showing individual trial responses.

(D) Quantification of neurons tuned to separate cues within an NNR trial sequence and their activity to all other cues. N=17 (N), 18 (NN), 32 (NNR) cells out of 115 cells in total. *p<0.05, one-way repeated measures ANOVA with post-hoc Tukey’s multiple comparison test.

(E) Schematic of OFCACC bulk activity based on Figure 3 results and potential single neuron findings that tile a sequence of trials with two no-rewards followed by a reward cue presentation (NNR).

(F) Percentage of neurons tuned to different cues in an NNR trial sequence before (top) or after (bottom) training. N=5 mice. *p<0.05, one-way repeated measures ANOVA with post-hoc Tukey’s multiple comparison test.

(G) Ensemble average plots of neurons tuned to R cues after 2 consecutive N cue presentations (NNR cells) before learning (top) and their activity after learning (bottom). Black arrows denote the rise in activity prior to R cues after learning. N=18 NNR cells out of 81 cells tracked across days.

(H) Quantification of transient time (s) since R cue onset for neurons tracked across days. N=132, 170 transient events before and after learning across 18 NNR cells and 105, 59 transient events before and after learning across 12 NR cells. *p<0.05, unpaired t-test.

(I) Left: Injection strategy for stGtACR2-based optogenetic inhibition of OFCACC during training. Optogenetic inhibition was targeted to training for 6 days. Right: Brain histology from a representative mouse showing DAPI in blue, stGtACR2 in red and photometry cannula implantation in ACC. Scale bar: 1mm.

(J) Left: mean animal speed (cm/s) aligned to cue zone entry after no-reward on T6 for mCherry control or GtACR mice. Black arrow signifies lack of speed increase during N cues. Right: quantification of mean change in speed in cue zone after no-reward, assessed separately for each cue presentation. N=10 mice for mCherry and 13 mice for GtACR, *p<0.05, paired t-test.

(K) Schematic of reward-responsive OFC projection neurons becoming increasingly active during no reward cues that precede reward cues over days.

Task shaping and speed-related differences between mice and during ACC inhibition

(A) Left: schematic of behavioral shaping. Mice were shaped to run (>1 cm/s) for increasing durations over the course of 4 days to obtain rewards. Trial averaged plots of speed along the days of shaping (N=3 mice). Red arrow denotes the increasing duration needed to run to trigger rewards. Right: Days where mice decrease their speed (<1cm/s; stop) during cues for rewards. Red arrows denote the increasing duration to stop to trigger rewards. Data are mean (solid line) ± s.e.m (shaded area).

(B) Scatter plots of speed (cm/s) and ACC dF/F during reward (top) or trial initiations (bottom) for individual mice. Individual data points shown, with a best fit line, represented by the solid line in the figure. r2 and p values, as determined by linear regression, are shown for each mouse that had a p<0.05.

(C) Left: Trial average plots of speed (cm/s) aligned to trial initiation for saline-administered day (black) or CNO (red; N=4 mice). Data are mean (solid line) ± s.e.m (shaded area). Right: Quantification of speed during trial initiation for mCherry-control mice (N=187, 214 trials across 6 mice) and hM4D(Gi)-DREADDs mice (N=166, 120 trials across 4 mice). p=0.5692 for mCherry and *p=0.0217 for hM4Di, unpaired t-test between saline and CNO sessions per group. Data are individual points, with mean ± s.e.m.

Lick rate discrimination, ACC learning signal controls, and ACC inhibition lick rate learning

(A) Left: Percentage of cue presentations with stops, separated by reward and no-reward cues (blue vs orange) and quantification of stop discrimination index (see Methods) across training. Right: Quantification of lick discrimination index (see Methods) across training. N=12 mice, data are mean ± s.e.m. *p<0.05, paired t-test between reward and no-reward each day, or one-way repeated measures ANOVA with post-hoc Tukey’s multiple comparison test between each training day and preexposure.

(B) Top: Trial average plots of ACC activity (z-scored dF/F) and speed (cm/s) aligned to outcome onset, separated by reward or no-rewards. Data are mean (solid line) ± s.e.m (shaded area). Bottom: Quantification of the mean dF/F and speed during outcome. N=12 mice in each group, data are mean ± s.e.m. *p<0.05, paired t-test between reward and no-reward each day.

(C) Same as B, but for trial initiations after each outcome.

(D) Quantification of ACC dF/F difference between reward and no-reward cues (see Methods) across training. N=12 mice, data are mean ± s.e.m. *p<0.05, one-way repeated measures ANOVA with post-hoc Tukey’s multiple comparison test between each training day and preexposure.

(E) Left: mean dF/F in ACC (top) and speed (bottom) aligned to cue onset for cue presentations in which stops occurred (blue vs orange). N=12 mice. Data are mean (dark line) with s.e.m. (shaded area). Right: Quantification of mean change in dF/F and speed in cue zone, assessed separately for each cue presentation. N=12 mice in each group, data are mean ± s.e.m. *p<0.05, paired t-test between reward and no-reward each day.

(F) Same as E, but for passive presentation of the reward and no-reward cues.

(G) Top: optogenetic inhibition was targeted to days 1-6 of training and mice were allowed to continue training for days 7-10. Bottom left: Trial averaged plots of lick rate (Hz) aligned to cue onset on T6 for mCherry controls and GtACR inhibition mice, separated by reward or no-reward cues. Bottom right: Quantification of mean lick rate during cues. N=8 mice for mCherry, 4 for GtACR early inhibition. *p<0.05, paired t-test.

(H) Top: optogenetic inhibition was targeted to days 7-10 of training. Bottom: Trial averaged plots of speed (cm/s) aligned to cue entry on T10 for GtACR inhibition mice, separated by reward or no-reward cues. Quantification of mean speed during cues. N=8 mice for mCherry, 4 for GtACR late inhibition. *p<0.05, paired t-test.

Motivation signals in bulk projection activity, and behavior of learners

(A) Top: Trial average plots of each projection activity (z-scored dF/F) and speed (cm/s) aligned to reward onset (left) or trial initiations (right). Data are mean (solid line) ± s.e.m (shaded area). Bottom: Quantification of the mean dF/F change during reward (left) or trial initiation (right; N=19, 12, 5, 4 mice). Data are mean ± s.e.m.

(B) Left: Trial average plots of projection activity (z-scored dF/F) aligned to outcome onset, separated by reward or no-rewards. Data are mean (solid line) ± s.e.m (shaded area). Right: Quantification of the mean dF/F during outcome. N=19, 12, 5, 4 mice, data are mean ± s.e.m.

(C) Same as B, but for trial initiations after each outcome. *p<0.05, paired t-test between reward and no-reward each day.

(D) Plots of stop discrimination across preexposure or training for each mouse in the “Learner” group (top) or “Non-Learner” group (bottom). Dashed line is shown for 0 and 0.5 stop discrimination.

(E) Lick discrimination between the “Learner” or “Non-Learner” mice. *p<0.05, one-way ANOVA between training days and preexposure, with post-hoc Tukey’s multiple comparison test.

Neuron tunings to NNR task structure and inhibition of OFCACC neurons

(A) Mean z-scored dF/F for every recorded cell, aligned to cue onset separated by current cue and previous cue presentation (N=115 cells across 5 mice). Red arrow denotes the increased response after no-reward cues when the previous cue was no-reward.

(B) Left: mean population activity (top; z-scored dF/F) and individual neuron trial averaged activity (bottom) for NNR neurons during R cues preceded by 2 N, 1 N, or 1 R cue presentations. Right: quantification of NNR tuned neurons’ response to R cues preceded by 2 N, 1 N, or 1 R cue presentations (N=32 cells). *p<0.05, one-way repeated measures ANOVA with post-hoc Tukey’s multiple comparison test.

(C) Percentage of neurons tuned (std > 0.75 for 3 seconds before or after cue onset) to R or N cues in total (left) or based on what cues preceded them one trial before (middle) or across two trials before (right). N=115, 117, 114 cells for “before”, “middle”, and “after” days. Black arrow denotes the higher percentage of neurons that are tuned to R cues after 2 N cues (NNR cells) before learning.

(D) Neural sources for a single field of view. Sources captured throughout learning are highlighted in green.

(E) Left: mean animal speed (cm/s) aligned to cue zone entry after reward on T6 for mCherry control or GtACR mice. Black arrow signifies speed increase during N cues. Right: quantification of mean change in speed in cue zone after reward, assessed separately for each cue presentation. N=10 mice for mCherry and 13 mice for GtACR, *p<0.05, paired t-test.