Abstract
Sequenced reactivations of hippocampal neurons called replays, concomitant with sharp-wave ripples in the local field potential, are critical for the consolidation of episodic memory, but whether replays depend on the brain’s reward or novelty signals is unknown. Here we combined chemogenetic silencing of dopamine neurons in ventral tegmental area (VTA) and simultaneous electrophysiological recordings in dorsal hippocampal CA1, in freely behaving rats experiencing changes to reward magnitude and environmental novelty. Surprisingly, VTA silencing did not prevent ripple increases where reward was increased, but caused dramatic, aberrant ripple increases where reward was unchanged. These increases were associated with increased reverse-ordered replays. On familiar tracks this effect disappeared, and ripples tracked reward prediction error, indicating that non-VTA reward signals were sufficient to direct replay. Our results reveal a novel dependence of hippocampal replay on dopamine, and a role for a VTA-independent reward prediction error signal that is reliable only in familiar environments.
Introduction
Spatial information is encoded in the firing of hippocampal place cells, which are thought to provide a cognitive map to support memory and navigation (O’Keefe and Dostrovsky, 1971; O’Keefe and Nadel, 1978). During pauses in locomotion, place cells participate in structured population bursts of activity, representing temporally-compressed trajectories through experienced locations, a phenomenon termed replay (Diba and Buzsáki, 2007; Foster and Wilson, 2006). Replay sequences can occur in the same order as experience, called forward replay, or in the reverse order of experience, called reverse replay (Diba and Buzsáki, 2007; Foster and Wilson, 2006). Replay appears after just one experience in a novel environment (Berners-Lee et al., 2022), and is preferentially generated towards goals in goal-directed tasks (Pfeiffer and Foster, 2013; Widloski and Foster, 2022). Experimentally interrupting or lengthening replay-associated sharp wave ripples (SWR) in the local field potential (LFP) disrupts and enhances learning of a spatial memory task, respectively (Fernández-Ruiz et al., 2019; Jadhav et al., 2012). Replay is thus thought to support memory consolidation and online planning (Buzsáki, 2015; Foster, 2017).
Intriguingly, reward drives increased rates of SWR (Singer and Frank, 2009), and only reverse replay, not forward, is increased at highly rewarding locations (Ambrose et al., 2016). Theoretical work suggests replay functions to update spatial representations of value to influence behavior and optimize reward receipt (Mattar and Daw, 2018). These findings hint that replay may be strongly modulated by reward-processing areas in the brain, such as the midbrain dopamine system (Fields et al., 2007). Dopamine neuron activity in the ventral tegmental area (VTA) is consistent with coding of reward prediction error (RPE), with increased activity at unexpected rewards and decreased activity with omission of expected rewards (Schultz et al., 1997). Subsequent work investigated dopamine release in spatial tasks, finding it ramps towards large rewards (Guru et al., 2020; Howe et al., 2013) in a manner consistent with encoding of RPE for a value function over space (Kim et al., 2020).
Besides the well-established role of midbrain dopamine neurons in reward processing, dopamine release in hippocampus has been implicated in stabilizing place fields (Kentros et al., 2004), gating the increase in plasticity in dorsal CA1 synapses by novel experiences (Li et al., 2003), and improving memory retention via increasing replay (McNamara et al., 2014). Furthermore, VTA activity is increased in novel environments (Guru et al., 2020; McNamara et al., 2014; Takeuchi et al., 2016), suggesting the hippocampus and VTA may coordinate to signal spatial novelty and induce learning in new environments (Lisman and Grace, 2005). However, recent work implicates locus coeruleus (LC) as the dominant source of dopaminergic input to dorsal CA1 and shows its necessity and sufficiency for novelty-mediated episodic memory consolidation (Guru et al., 2020; McNamara et al., 2014; Takeuchi et al., 2016), leaving the role of VTA unclear.
We therefore tested whether VTA dopamine neurons are required for reward-related modulation of SWR and replay. We expressed an inhibitory DREADD (Armbruster et al., 2007) in VTA dopamine neurons and implanted a tetrode microdrive above hippocampus. We could then inhibit VTA dopamine signaling and simultaneously record neural activity in the dorsal CA1 region of hippocampus while rats collected rewards in familiar and novel environments. If VTA dopamine signaling is required for coordinating replay to valuable locations, we expected to see deficits in the capacity for reward to recruit SWR and replay. Additionally, if VTA dopamine is critical for inducing plasticity in CA1 in novel environments, novelty may significantly increase the effect of VTA inactivation on hippocampal replay.
Results
VTA inactivation in a simple spatial task with reward changes
We combined tetrode recordings in dorsal CA1 (dCA1) and chemogenetic silencing of VTA dopamine neurons to determine whether reward-related changes in hippocampal ripples and replay required VTA dopamine signaling. Transgenic rats expressing cre-recombinase under the tyrosine hydroxylase (TH) promoter were stereotactically injected with cre-dependent virus containing the inhibitory DREADD hM4Di (Experimental, n=4) or mCherry-only control (Control, n=3) into bilateral VTA (Figure 1A). We observed widespread expression across the extent of VTA and co-localization with TH (Figure 1B), enabling specific and reversible inactivation of VTA dopamine signaling. Recording microdrives containing 6-32 independently adjustable tetrodes (bilateral 32 tetrode, n=4; unilateral 20 tetrode, n=2; unilateral 6 tetrode, n=1) were implanted above dCA1 (Fig. 1A). Each tetrode was lowered into the pyramidal cell layer of dCA1 to collect single unit and LFP data.
Before each experimental session, rats were given intraperitoneal injection of CNO, to activate hM4Di receptors and suppress VTA dopamine neuron activity, or saline, then performed a simple task on linear tracks (1.5 to 2.5 m in length), collecting liquid chocolate rewards from each end (Fig. 1C). Each session began with equal 0.1 ml reward volume at each end (Epoch 1) for 10-20 laps (1 lap comprised reward collection at both ends; mean, 16 laps). This was followed by unsignaled quadrupling of reward at one end to 0.4 ml (Incr. end), while reward at the other end remained unchanged (Unch. end), for 10-20 laps (Epoch 2; mean, 16.7 laps). Finally, reward was equalized again to 0.1 ml at both ends (Epoch 3) for up to 20 laps (mean, 11.6 laps).
Each animal performed this task on familiar linear tracks (>2 sessions on track; total session count for each condition: Experimental rats: 36 saline, 34 CNO; Control rats: 23 saline, 25 CNO) and novel linear tracks (1st or 2nd session on track; Experimental rats: 9 saline, 10 CNO; Control rats: 12 saline, 12 CNO). During stopping periods at either end of the track (velocity ≤ 8 cm/s, position ≤ 10 cm from end), SWR were identified as peaks in the ripple band (150-250 Hz) in LFP (Fig. 1D-E; see Methods).
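The stopping-period criterion above can be illustrated with a minimal sketch. It assumes position and velocity are sampled on a common time base and that position is measured in cm from one end of the track; the function name `find_stopping_periods` and the toy data are hypothetical and are not taken from the authors' analysis code.

```python
from typing import List, Tuple


def find_stopping_periods(
    position_cm: List[float],
    velocity_cm_s: List[float],
    track_length_cm: float,
    max_speed: float = 8.0,          # cm/s, threshold stated in the text
    max_dist_from_end: float = 10.0  # cm, threshold stated in the text
) -> List[Tuple[int, int]]:
    """Return (start, stop) sample indices (stop exclusive) of stopping periods."""
    periods = []
    start = None
    for i, (pos, vel) in enumerate(zip(position_cm, velocity_cm_s)):
        near_end = (pos <= max_dist_from_end
                    or pos >= track_length_cm - max_dist_from_end)
        stopped = abs(vel) <= max_speed and near_end
        if stopped and start is None:
            start = i                      # a stopping period begins
        elif not stopped and start is not None:
            periods.append((start, i))     # the stopping period ends
            start = None
    if start is not None:                  # recording ended while stopped
        periods.append((start, len(position_cm)))
    return periods


# Toy example on a hypothetical 200 cm track.
position = [5, 6, 50, 150, 195, 196, 197, 100]
velocity = [0, 2, 40, 40, 3, 0, 1, 40]
periods = find_stopping_periods(position, velocity, track_length_cm=200.0)
print(periods)
```

Ripple detection within each period (band-pass filtering at 150-250 Hz and peak finding) is described in the Methods and is not reproduced here.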
Gross behavior was largely unaffected by VTA suppression (e.g., all reward consumed on each lap), but CNO in experimental animals systematically affected stopping period duration. Visits to the Unch. end in Epoch 2 were significantly shorter than in Epoch 1, despite unchanged reward volume there, and while this reduction was present in CNO sessions, overall visit durations were increased (Fig. 1F-G). CNO did not affect control animals (Fig. 1F-G). Visits to the Incr. end were significantly longer in Epoch 2 than in Epoch 1 in all conditions, owing to the increased reward consumption time (Fig. 1H-I; Epoch 1 vs. Epoch 2, two-sample t-test, all p<10⁻¹⁰). Changes in stopping period duration in Epoch 3 were similar across all conditions: Unch. visit duration increased from Epoch 2 to Epoch 3 (Fig. S1E), while Incr. visit duration decreased (Fig. S1F). Separate analysis of novel and familiar sessions revealed the pattern of shorter duration Unch. visits in Epoch 2 compared to Epoch 1 did not depend on novelty (Fig. S1A-D). However, the main effect of CNO in experimental rats of prolonging stopping periods occurred in novel sessions (Fig. S1A-B), not in familiar sessions (Fig. S1C-D). Rats consistently ran slightly faster towards the Incr. end than the Unch. end in Epoch 2, across all conditions (Fig. S2).
We interpret the reduction in visit duration as a behavioral signature of the Unch. end becoming relatively less valuable during Epoch 2, when the reward volume is larger at the Incr. end. Though visit durations were slightly longer in CNO sessions in experimental rats, particularly in novel track sessions, this behavioral effect of a relative value decrease remained, indicating VTA inactivation did not prevent rats from recognizing a devalued location.
Reward-related modulation of SWR rate is mediated by novelty and VTA
We analyzed the rate of SWR occurrence during the first 10 s of each stopping period, when rats were consuming reward. In individual sessions, SWR rate increased robustly in all conditions shortly after stopping at the reward wells and beginning reward consumption (Fig. 2A). Relative to Epoch 1, SWR rate during Epoch 2 tended to increase dramatically at Incr. end visits and decrease at Unch. end visits. During Epoch 3, SWR rate at the Incr. end dropped precipitously relative to Epoch 2, while often increasing at the Unch. end (Fig. S4A).
Surprisingly, during novel sessions, VTA inactivation often led to increased SWR rate at both reward ends (Fig. 2A, right). SWR rate still increased in Epoch 2 at the Incr. end even without normal VTA signaling, indicating reward sensitivity per se was not abolished, but suggesting the localization of this increased SWR rate to where reward increased was disrupted.
Pooling across sessions revealed this dramatic increase in SWR rate at the Unch. end was typical with CNO in novel sessions, and further suggested there was a reduction in the difference in SWR rate between the Incr. and Unch. ends in Epoch 2 in both familiar and novel experiences (Fig. 2B). We therefore used a Poisson generalized linear model (GLM) to quantify the changes in SWR rate across reward end, epoch, drug condition, and novelty (see Methods). In experimental rats, CNO and reward both influenced SWR rate, with significant effects for the CNO main effect (z=3.19, p=0.0014), the interaction between Incr. end and Epoch 2 (z=9.02, p<10⁻¹⁰), and the three-way interaction between Incr. end, Epoch 2, and CNO (z=-2.06, p=0.0396), and a marginal effect for the interaction between Incr. end and CNO (z=-1.92, p=0.055). Control animals showed no apparent effect of CNO (Fig. 2C). The same Poisson GLM fit to control rat data confirmed this, with significant coefficients only for Incr. end (z=-2.42, p=0.0156) and the interaction between Incr. end and Epoch 2 (z=7.64, p<10⁻¹⁰).
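For orientation, one plausible form of such a model is written out below. The exact specification is given in the Methods, which are not reproduced in this section, so every symbol here is an assumption: N_i is the SWR count in stopping period i, T_i is the analyzed duration (entering as an offset), and E_i, P_i, and D_i are indicator variables for Incr. end, Epoch 2, and CNO. The ellipsis stands for the novelty and Epoch 3 terms.

```latex
\begin{aligned}
N_i &\sim \mathrm{Poisson}(\lambda_i T_i), \\
\log \lambda_i &= \beta_0 + \beta_E E_i + \beta_P P_i + \beta_D D_i
  + \beta_{EP} E_i P_i + \beta_{ED} E_i D_i + \beta_{PD} P_i D_i
  + \beta_{EPD} E_i P_i D_i + \dots
\end{aligned}
```

Under this parameterization, the reported Incr. end by Epoch 2 interaction would correspond to a positive \beta_{EP}, and the three-way interaction with CNO to a negative \beta_{EPD}.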
To assess the interaction between VTA inactivation and novelty, we fit the Poisson GLM separately to novel and familiar sessions, then used bootstrapping to generate distributions of SWR rates for each reward end and condition under the null hypothesis that CNO had no effect (see Methods). We found the actual difference in SWR rates between saline and CNO sessions in experimental animals during epoch 2 was significantly greater than chance (Fig. S3A) in novel sessions for both the Unch. end (one-tailed test, CNO>saline, p=0.0006) and the Incr. end (one-tailed test, saline>CNO, p=0.004), as well as at the Unch. end in familiar sessions (one-tailed test, CNO>saline, p=0.002). There was no significant difference between saline and CNO in control animals at either reward end in either familiar or novel sessions (Fig. S3B; one-tailed tests, all p>0.17).
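The logic of testing an observed saline versus CNO difference against a null distribution can be sketched as follows. This is a simplified label-shuffling (permutation) test on per-session rates, not the GLM-based bootstrap described in the Methods; the function name, the number of resamples, and the toy rates are all hypothetical.

```python
import random
from statistics import mean


def one_tailed_resampling_p(rates_a, rates_b, n_resamples=2000, seed=0):
    """One-tailed p-value for mean(rates_a) > mean(rates_b).

    The null distribution is built by shuffling condition labels across
    sessions, i.e. assuming the drug label carries no information.
    """
    rng = random.Random(seed)
    observed = mean(rates_a) - mean(rates_b)
    pooled = list(rates_a) + list(rates_b)
    n_a = len(rates_a)
    at_least_as_large = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_resamples + 1)


# Toy per-session SWR rates (arbitrary units), clearly separated groups.
cno_rates = [10, 11, 12, 13, 14]
saline_rates = [1, 2, 3, 4, 5]
p_value = one_tailed_resampling_p(cno_rates, saline_rates)
print(p_value)
```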
A potential functional role for the reward-related changes in SWR rate is to strengthen downstream representations of particularly rewarding locations at the expense of less rewarding locations. We tested whether VTA suppression blunted the SWR rate difference between reward ends in Epoch 2. Across both familiar and novel environments, CNO reduced the difference in SWR rate between the Incr. end and Unch. end in experimental rats but not in control rats (Fig. 2D-E, left panels; three-way ANOVA with animal group, drug, and novelty: drug, F[1,153]=5.19, p=0.024, group by drug, F[1,153]=5.16, p=0.025, all other terms n.s.). To control for the possibility that VTA inactivation caused changes in locomotor or other non-consummatory behavior at reward wells that might affect SWR emission, we omitted the first and last 1 s of each stopping period to isolate the reward consumption period. The effect of CNO remained in experimental rats, reducing SWR rate discrimination between Incr. and Unch. ends (Fig. 2D-E, right panels; three-way ANOVA with animal group, drug, and novelty: drug, F[1,153]=3.41, p=0.067; group by drug, F[1,153]=5.58, p=0.019).
We next looked for within-epoch changes in SWR rate to determine whether VTA inactivation altered the dynamics of the response to reward changes. We calculated the difference in SWR rate at the Incr. and Unch. reward ends (each with its Epoch 1 mean subtracted) in a 5-lap sliding window. In all time windows and conditions except novel CNO sessions in experimental rats, the SWR rate at the Incr. end was significantly greater than at the Unch. end (Fig. 2F-G; Incr. – Unch. significantly greater than 0, one-sample t-test, p<0.05, uncorrected for multiple comparisons).
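A minimal sketch of this sliding-window comparison is given below. It assumes one SWR rate per lap at each end and a window that advances one lap at a time; the function and argument names are hypothetical, and the authors' handling of window edges may differ.

```python
from statistics import mean


def sliding_rate_difference(incr_rates, unch_rates,
                            incr_epoch1_mean, unch_epoch1_mean, window=5):
    """Incr. minus Unch. SWR rate in a sliding window of laps.

    Each end's Epoch 1 mean is subtracted first, as described in the text.
    """
    incr = [r - incr_epoch1_mean for r in incr_rates]
    unch = [r - unch_epoch1_mean for r in unch_rates]
    n_laps = min(len(incr), len(unch))
    diffs = []
    for start in range(0, n_laps - window + 1):
        stop = start + window
        diffs.append(mean(incr[start:stop]) - mean(unch[start:stop]))
    return diffs


# Toy Epoch 2 data: six laps, so a 5-lap window yields two values.
diffs = sliding_rate_difference([1.0] * 6, [0.5] * 6,
                                incr_epoch1_mean=0.5, unch_epoch1_mean=0.5)
print(diffs)
```

Each window's values would then be tested against zero across sessions with the one-sample t-test reported in the text.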
In novel sessions, VTA inactivation did not prevent an initially larger increase in SWR rate at the Incr. end than the Unch. end, but caused that difference to diminish over laps (Fig. 2F). By the middle of the epoch, there was no statistically-significant difference in reward modulation between the reward ends (one-sample t-test, p>0.05, for 5-lap windows centered on laps 8-13 and 15), consistent with an initial appropriately-localized reaction to reward change that eventually spread across the track. We found a similar deficit in Epoch 3, with SWR rate decreasing significantly more compared to Epoch 2 at the Incr. end than the Unch. end (Incr. – Unch. significantly below 0, one-sample t-test, p<0.05) for almost every task condition and timepoint except in novel CNO sessions in experimental rats (Fig. S4B-C). This suggests VTA inactivation may disrupt the normal magnitude or time course of the SWR response to negative value changes as well.
Overall, VTA inactivation spared the capacity for increased reward to modulate SWR rate but led to decreased differentiation of low and high value locations, particularly in novel environments where SWR rate increased spatially-indiscriminately.
SWR rate is correlated with RPE even with VTA inactivation
Taken together, the above results demonstrate VTA inactivation caused changes in the normal dynamics of the response of SWR rate to positive and negative changes in reward value (Fig. 2). However, because each session had only two timepoints when reward value changed by fixed amounts, our experiment was not optimized to probe the precise relationship between SWR rate and reward changes. Additionally, the effect of VTA inactivation was particularly prominent with novelty, when both SWR rate and its modulation by reward changes were greater, raising the possibility that large SWR rates and fluctuations, rather than novelty per se, depend on VTA dopamine signaling.
To address these questions, we designed a volatile reward schedule (Experiment 2) with frequent, large reward changes at one end of the linear track, and tested whether VTA inactivation impacted the capacity for SWR rate to track value (Fig. 3A, top). The “stable end” delivered 0.2 ml every lap, while the “volatile end” reward volume was drawn pseudorandomly from 0, 0.1, 0.2, 0.4, and 0.8 ml (mean 0.37 ml; each block of 20 laps comprised 3 laps x 0 ml, 4 laps x 0.1 ml, 3 laps x 0.2 ml, 4 laps x 0.4 ml, and 6 laps x 0.8 ml). The position of the stable and volatile ends randomly varied across sessions.
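The block composition above fixes the mean volatile-end volume at (3 x 0 + 4 x 0.1 + 3 x 0.2 + 4 x 0.4 + 6 x 0.8) / 20 = 0.37 ml. The sketch below generates such a schedule by shuffling each 20-lap block independently. Shuffling within blocks is an assumption about how "pseudorandomly" was implemented, and the names `BLOCK_COUNTS` and `make_volatile_schedule` are hypothetical.

```python
import random

# Laps per volume (ml) within one 20-lap block, as stated in the text.
BLOCK_COUNTS = {0.0: 3, 0.1: 4, 0.2: 3, 0.4: 4, 0.8: 6}


def make_volatile_schedule(n_blocks, seed=None):
    """Return a list of volatile-end reward volumes, one per lap."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = []
        for volume, count in BLOCK_COUNTS.items():
            block.extend([volume] * count)
        rng.shuffle(block)  # assumed: order randomized within each block
        schedule.extend(block)
    return schedule


schedule = make_volatile_schedule(n_blocks=3, seed=1)
mean_volume = sum(schedule) / len(schedule)
print(len(schedule), round(mean_volume, 2))
```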
This reward schedule also allowed us to test whether SWR rate was correlated with value, RPE, or neither. We expected SWR rate at the volatile end would be predominantly determined by the current reward volume there, but potentially also modulated by previous reward volumes (Fig. 3A, bottom). If SWR rate is correlated with value, then for a given current volume, larger reward volumes at the last visit will lead to higher SWR rates compared to when the last visit was a smaller reward volume. Conversely, if SWR rate is correlated with RPE, the opposite modulation by last reward volume will be observed: the larger the previous reward volume, the lower the current SWR rate.
A subset of rats performed sessions of the modified reward schedule (mean 53.4 laps per session; Total sessions per condition: Experimental rats 2 and 4: 6 saline, 7 CNO; Control rats 1-3: 16 saline, 18 CNO). Each rat completed 1-2 saline sessions before any CNO sessions and all sessions were on the same linear track, meaning almost all CNO sessions took place on a familiar track. As expected, SWR rate at the volatile end was predominantly determined by the current volume (Fig. 3B-C). There was little obvious difference between saline and CNO in either experimental (Fig. 3B) or control rats (Fig. 3C). SWR rate at the stable end was largely stable across laps, although there was a trend towards higher SWR rate if the most recent volatile end visit was lower volume, consistent with lap-by-lap changes in the relative value of the stable end (Fig. S5).
We next investigated how SWR rate at the volatile end varied as a function of both current and immediately previous volatile end volume in experimental rats (Fig. 3D-E, top panels). For each current volume, we subtracted the mean SWR rate across all previous volumes, and examined how previous volume affected the mean-subtracted SWR rates across all current volumes (Fig. 3D-E, middle panels).
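The mean-subtraction step can be sketched as follows. The sketch assumes the mean for each current volume is taken over all visits with that current volume, which is one reading of the description above; the helper names and toy visits are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def mean_subtract_by_current(visits):
    """visits: list of (current_volume, previous_volume, swr_rate).

    Returns (previous_volume, residual_rate) pairs, where the residual is
    the SWR rate minus the mean rate for that visit's current volume.
    """
    by_current = defaultdict(list)
    for current, _previous, rate in visits:
        by_current[current].append(rate)
    current_means = {c: mean(r) for c, r in by_current.items()}
    return [(previous, rate - current_means[current])
            for current, previous, rate in visits]


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


# Toy visits in which a larger previous volume lowers the current rate.
visits = [(0.4, 0.0, 1.2), (0.4, 0.8, 0.8), (0.1, 0.0, 0.5), (0.1, 0.8, 0.3)]
residuals = mean_subtract_by_current(visits)
r = pearson_r([p for p, _ in residuals], [res for _, res in residuals])
print(r)
```

A negative correlation, as in this toy example, is the pattern expected under RPE-like modulation rather than value coding.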
The mean-subtracted SWR rate was modestly negatively correlated with the previous volume (Pearson correlation: saline, r=-0.076, p=0.177; CNO, r=-0.109, p=0.049). A GLM found the mean-subtracted SWR rate was significantly affected by previous volume (z=-2.3, p=0.02), but not drug (z=-0.24, p=0.81) or current volume (z=0.196, p=0.844). There was no similar behavioral effect: for each current volume, we subtracted the mean reward end visit duration, and found no correlation between the previous reward volume and mean-subtracted visit duration (Pearson correlation; saline, r=0.011, p=0.844; CNO, r=-0.075, p=0.175).
We next separated visits to the volatile end using a median split based on the recent volumes (mean of previous 3 visits). SWR rates were higher for a given current reward volume when the recent reward history was low (Fig. 3D-E, bottom panels). A Poisson GLM predicting SWR rate as a function of current volume, drug condition, and whether reward history was low or high (relative to the median for the session) revealed significant effects of current volume (z=13.86, p<10⁻¹⁰), as expected, and reward history split (z=-2.23, p=0.026), but not drug condition (z=-1.05, p=0.29).
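The reward-history split can be sketched as below. Two details are assumptions not stated in the text: visits with fewer than three preceding visits are skipped, and a history exactly equal to the session median is labeled "low". The function name is hypothetical.

```python
from statistics import mean, median


def reward_history_split(volumes, n_back=3):
    """Label each volatile-end visit by its recent reward history.

    volumes: volatile-end reward volume per visit, in session order.
    Returns {visit_index: "low" | "high"} relative to the session median.
    """
    history = {i: mean(volumes[i - n_back:i])
               for i in range(n_back, len(volumes))}
    session_median = median(history.values())
    return {i: ("low" if h <= session_median else "high")
            for i, h in history.items()}


volumes = [0.0, 0.0, 0.0, 0.8, 0.8, 0.8, 0.1]
labels = reward_history_split(volumes)
print(labels)
```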
Finally, we separated combinations of current and previous volume into those with negative RPE (current < previous) and positive RPE (current > previous) and found mean-subtracted SWR rate was significantly affected by RPE sign (Fig. 3F; two-way ANOVA with drug and RPE sign, RPE sign: F[1,518]=6.42, p=0.012), but not drug (F[1,518]=0.3, p=0.582; drug X RPE sign, F[1,518]=0.07, p=0.785).
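The grouping by RPE sign follows directly from the definitions above (current versus previous volume). In this sketch, visits where the two volumes are equal are left out, since the text defines only the strict inequalities; the function name and toy values are hypothetical.

```python
from statistics import mean


def group_by_rpe_sign(visits):
    """visits: list of (current_volume, previous_volume, mean_subtracted_rate)."""
    groups = {"positive": [], "negative": []}
    for current, previous, residual in visits:
        if current > previous:
            groups["positive"].append(residual)   # better than last visit
        elif current < previous:
            groups["negative"].append(residual)   # worse than last visit
        # current == previous: no sign assigned, visit not included
    return groups


toy_visits = [(0.8, 0.1, 0.3), (0.1, 0.8, -0.2), (0.2, 0.2, 0.0)]
groups = group_by_rpe_sign(toy_visits)
print(mean(groups["positive"]), mean(groups["negative"]))
```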
Given the lack of an effect of drug in experimental rats, we pooled all sessions (both animal groups, both drug conditions) in Experiment 2 to maximize experimental power and found similar results as in just experimental rats (Fig. S6). On top of a large increase in SWR rate with current volume at the volatile end (Fig. S6A-B), SWR rate was also significantly negatively correlated with the previous volatile volume, both at the volatile end (Fig. S6C) and at the stable end (Fig. S6E). Accordingly, SWR rates were significantly lower for negative RPE than for positive/non-negative RPE (Fig. S6D-F). Finally, recent reward history at the volatile end significantly affected SWR rate (Fig. S6G), with higher SWR rate when recent rewards were lower.
Taken together, SWR rate was modulated by reward volume changes consistent with RPE-like coding. This modulation did not require normal VTA dopamine signaling, at least in familiar environments. The lack of effect of VTA inactivation, even with frequent, large swings in value and SWR rate, corroborates the results from Experiment 1 that novel experiences are particularly susceptible to disruption, indicating VTA dopamine release is critical when learning new reward locations.
Rate of reverse replay is increased with reward in novel environments only with intact VTA signaling
Previous work discovered the incidence rate of reverse replay, but not forward replay, was increased at locations with increased reward (Ambrose et al., 2016). We therefore analyzed single unit data collected in Experiment 1 (excluding experimental rat 2, who had a 6-tetrode recording drive) to determine whether this modulation of replay required VTA dopamine signaling. As previously observed (e.g., Ambrose et al., 2016), place cells in dCA1 had directional fields, such that the location a neuron was active while the rat moved in one direction on the track (e.g., “upward”) was often distinct from its activity when the rat moved in the other direction (e.g., “downward”). This directionality was apparent in both familiar and novel sessions, including in experimental rats with either saline or CNO (Fig. 4A). We found no effect of CNO on within-session field reliability, but significantly less reliability in novel compared to familiar sessions (Fig. S7A). Field similarity across running directions was slightly but significantly increased by both CNO and novelty (Fig. S7B).
Two place fields were defined for each neuron, one for each running direction, permitting Bayesian decoding methods to estimate both position and direction from neural activity (Fig. 4B-C). Sessions with accurate position and direction decoding during run, primarily due to sufficiently high cell yield, were included for replay analysis (Fig. S8; total sessions included: Experimental rats: novel saline, n=8; novel CNO, n=8; familiar saline, n=18; familiar CNO, n=23; Control rats: novel saline, n=12; novel CNO, n=11; familiar saline, n=16; familiar CNO, n=17).
Candidate replay events were periods of high population spiking activity while the rat was not running (z-scored spike count>3, minimum duration of 50 ms, rat velocity≤8 cm/s). We used a memory-less Bayesian decoder, with 40 ms decoding windows advancing by 5 ms steps, to estimate position and direction from neural activity.
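A memory-less Bayesian decoder of this kind is commonly implemented with independent Poisson spiking and a flat prior over bins. The sketch below follows that standard formulation, which is an assumption here because the authors' exact implementation is in the Methods. The rate floor, function names, and toy rate maps are hypothetical; the 40 ms window and 5 ms step come from the text. To decode direction as well as position, the two directional maps for each cell could be concatenated along the bin axis.

```python
import math


def window_starts_ms(event_start_ms, event_stop_ms, width_ms=40, step_ms=5):
    """Start times (ms) of overlapping decoding windows within one event."""
    return list(range(event_start_ms, event_stop_ms - width_ms + 1, step_ms))


def decode_window(spike_counts, rate_maps_hz, width_s=0.040, rate_floor_hz=0.1):
    """Posterior over bins for one window, with a flat prior.

    spike_counts: spikes per cell in the window.
    rate_maps_hz: rate_maps_hz[cell][bin], firing rate in Hz.
    """
    n_bins = len(rate_maps_hz[0])
    log_post = []
    for b in range(n_bins):
        total = 0.0
        for count, rate_map in zip(spike_counts, rate_maps_hz):
            # Floor the rate so that log(0) never occurs (assumed value).
            expected = max(rate_map[b], rate_floor_hz) * width_s
            total += count * math.log(expected) - expected
        log_post.append(total)
    peak = max(log_post)                   # subtract peak for stability
    unnorm = [math.exp(v - peak) for v in log_post]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]


# Two toy cells with fields at opposite ends of a 3-bin track.
rate_maps = [[20.0, 1.0, 1.0], [1.0, 1.0, 20.0]]
posterior = decode_window([3, 0], rate_maps)
starts = window_starts_ms(0, 50)
print(posterior, starts)
```

Because each window is decoded independently, no smoothness constraint links successive position estimates; any sequential structure in the output reflects the spiking itself.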
Replays were defined as candidate events with spatial trajectories meeting a threshold for motion and minimum total posterior in one running direction map (see Methods), with the running direction with greater posterior probability used to classify replay directionality, described below.
Forward replays were spatial trajectories moving across the track in the same direction as the rat when fields were calculated, e.g., moving “downward” with posterior probability in the “downward” place field map (Fig. 4B, left panel). Reverse replays were trajectories that moved in the opposite direction as the rat, e.g., moving “upward” with posterior probability in the “downward” place field map or vice versa (Fig. 4B, middle and right panels). We found forward and reverse replays occurred in all conditions, including in novel sessions with saline (Fig. 4B) or CNO (Fig. 4C). We therefore asked how novelty and drug condition influenced the effect of reward change on rates of reverse and forward replay.
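The forward and reverse definitions above reduce to comparing the direction of the decoded trajectory with the direction of the place field map holding most of the posterior. The sketch below assumes "upward" means increasing position and uses net displacement as the trajectory direction; both are illustrative choices, and the function name is hypothetical.

```python
def classify_replay(decoded_positions, posterior_up, posterior_down):
    """Label a replay as "forward" or "reverse".

    decoded_positions: decoded position per window, in temporal order.
    posterior_up / posterior_down: posterior mass per window in each
    running direction map.
    """
    net_displacement = decoded_positions[-1] - decoded_positions[0]
    trajectory_dir = "upward" if net_displacement > 0 else "downward"
    map_dir = ("upward" if sum(posterior_up) > sum(posterior_down)
               else "downward")
    # Same direction as the map: forward; opposite direction: reverse.
    return "forward" if trajectory_dir == map_dir else "reverse"


label_a = classify_replay([150, 100, 40], [0.1, 0.1, 0.1], [0.6, 0.7, 0.8])
label_b = classify_replay([40, 100, 150], [0.1, 0.1, 0.1], [0.6, 0.7, 0.8])
print(label_a, label_b)
```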
Consistent with previous work (Ambrose et al., 2016), the rate of reverse replay was strongly modulated by the reward volumes on the track. Excluding novel CNO sessions for the moment, in all other conditions in experimental rats, when reward was larger at the Incr. end than the Unch. end (unequal reward), reverse replay was significantly increased at the Incr. end relative to when rewards were equal (novel saline: equal reward, 0.0019±0.0009 replay/s; unequal reward, 0.0121±0.004 replay/s; two-sample t-test, t(420)=3.235, p=0.0013; familiar saline: equal reward, 0.0033±0.0012 replay/s; unequal reward, 0.0128±0.0025 replay/s; two-sample t-test, t(907)=3.822, p=0.0001; familiar CNO: equal reward, 0.0033±0.001 replay/s; unequal reward, 0.0139±0.0022 replay/s; two-sample t-test, t(1153)=5.234, p<10⁻⁶). The rate of reverse replay was not significantly increased at the Unch. end with unequal rewards in any of these conditions (all p>0.05). This led to a bias for reverse replay to preferentially occur at the Incr. end when rewards were unequal (Fig. 4D-E). In control rats, reward changes caused similar changes to the balance of reverse replay (Fig. 4H-I), with a significantly larger swing in reverse replay bias in novel sessions (three-way ANOVA with drug, novelty, and replay directionality: novelty X directionality, F[1,101]=9.04, p=0.0034; all other terms, p>0.05). Reward changes caused no consistent effects in the rates of forward replay in either animal group (Fig. 4F-G, Fig. 4J-K).
Conversely, in novel CNO sessions in experimental rats, reverse replay rate failed to be biased towards the larger reward location (Fig. 4D). With unequal reward, the rate of reverse replay did increase at the Incr. end (equal reward, 0.0053±0.0016 replay/s; unequal reward, 0.0116±0.0029 replay/s; two-sample t-test, t(332)=2.043, p=0.0419), but also increased somewhat at the Unch. end (equal reward, 0.0173±0.004 replay/s; unequal reward, 0.0249±0.0068 replay/s; two-sample t-test, t(319)=1.019, p=0.309), leading to no consistent change in the difference at the two reward ends. This effect of VTA inactivation on the bias of replay between the reward ends when reward contingencies changed was specific to novel sessions and reverse replay (three-way ANOVA with drug, novelty, and replay directionality: drug X novelty X directionality, F[1,106]=4.64, p=0.0335; all other terms, p>0.05). VTA dopamine signaling was therefore required to direct reward-related changes in reverse replay, specifically in novel environments.
Discussion
Here we demonstrated a critical role for VTA dopamine signaling in driving hippocampal SWR and reverse replay selectively to locations with increased reward. Surprisingly, we found this was true only in novel environments, with only modest effects of VTA inactivation on SWR rates and no discernible effect on replay rates in environments that had been explored several times before. We additionally recorded activity in a modified task that allowed us to differentiate SWR rate modulation by value and RPE. While SWR rate was modulated by RPE, VTA inactivation had little effect on this RPE-like modulation, suggesting that at least in familiar environments normal VTA dopamine signaling is dispensable for this reward-related hippocampal activity.
Why is VTA inactivation particularly disruptive during novel experiences? Dopamine neuron firing rates are elevated in novel environments (McNamara et al., 2014; Takeuchi et al., 2016). More specifically, early in experience dopamine neuron activity ramps while mice run towards both larger and smaller rewards and this ramping activity declines over experience until modest ramps persist only towards the larger reward (Guru et al., 2020). Activation of VTA projections to dorsal CA1 improves retention of spatial learning of a novel maze configuration, while also promoting replay-related reactivation (McNamara et al., 2014), while inactivation of VTA causes an increase in spatial working memory-related errors in novel, but not familiar, environments (Martig et al., 2009). The results presented here support the hypothesis that VTA is critically involved in learning in new environments, as its inactivation prevents the selective recruitment of replay-associated planning or memory consolidation mechanisms to high value locations.
VTA is not the sole source of dopamine release in hippocampus, with recent work demonstrating locus coeruleus (LC) axons likely provide the bulk of dopamine to dorsal CA1 and can be necessary for novelty-related spatial learning (Kempadoo et al., 2016; Takeuchi et al., 2016). LC axons in CA1 are active in locations immediately preceding a new reward location in a familiar environment but not in a novel one, despite similar behavior in both cases indicating mice had learned the reward locations (Kaufman et al., 2020). This result, coupled with findings that LC neurons are modulated by reward-predicting stimuli similarly to substantia nigra dopamine neurons (Bouret et al., 2012; Bouret and Richmond, 2015), suggests LC activity can convey reward-related information and thereby compensate for VTA inactivation, but only in familiar environments where it is not signaling more general novelty. Altogether, our work adds to the body of evidence that VTA can directly or indirectly mediate hippocampal plasticity and spatial learning and memory (Gasbarri et al., 1996; McNamara et al., 2014; Rosen et al., 2015; Rossato et al., 2009), and suggests an intriguing distinction between the function of VTA and LC dopamine release in hippocampus (Duszkiewicz et al., 2019).
These results also support the hypothesis that reverse replay is intimately involved in reward learning (Ambrose et al., 2016; Foster and Wilson, 2006; Mattar and Daw, 2018). By activating representations associated with the location of the current reward and then progressing sequentially to earlier positions that preceded reward, reverse replay may provide a neural eligibility trace by which spatial positions can be associated with their proximity to reward (Foster and Wilson, 2006; Sutton and Barto, 1998). Dopamine release at reward detection and consumption would then couple a temporal gradient of dopamine concentration with the temporally-extended, reverse sequential activation of states that led to that reward. Indeed, VTA dopamine neurons are activated when SWR and replay occur during a spatial working memory task, but not during subsequent sleep (Gomperts et al., 2015), indicating close coordination specifically during reward learning. CA1, VTA, and medial prefrontal cortex neurons are jointly coupled via oscillatory mechanisms during spatial working memory (Fujisawa and Buzsáki, 2011), suggesting downstream targets of both replay (Berners-Lee et al., 2021) and VTA dopamine neurons (Lammel et al., 2008) may receive temporally-precise conjunctive input from them. We expect future work aimed at untangling under what conditions VTA and replay influence each other and coordinate to provide downstream areas with sequential activity in the presence of dopamine to be particularly fruitful in understanding how reward drives spatial learning.
Acknowledgements
We thank Stanford Gene Vector and Virus Core and Karl Deisseroth for viral constructs and the Biological Imaging Facility at University of California, Berkeley for assistance with tissue imaging. This work was supported by NIH grant NS113557. Animal use conformed to NIH guidelines and was approved by the UC Berkeley Animal Care and Use Committee.
Declaration of Interests
The authors declare no competing interests.
Methods
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the Lead Contact, David Foster (davidfoster@berkeley.edu).
Materials availability
This study did not generate any unique reagents.
Data and code availability
All data and code are available from the Lead Contact upon request. Custom analysis code is publicly available at the DOI listed in the key resource table. Any additional information required to reanalyze the data reported in this work is available from the Lead Contact upon request.
Experimental model and subject details
All experimental procedures were performed in accordance with the University of California Berkeley Animal Care and Use Committee and US National Institutes of Health guidelines. A total of twelve adult male Sprague Dawley TH-cre knock-in rats (Inotiv, HsdSage:SD-THem1(IRES-Cre)Sage, age 3-10 months, 300-750 g) were used in this experiment, of which seven contributed data to the present report. Of the five excluded rats, two were excluded due to lack of virus expression evident in post-mortem immunohistochemistry, two due to faulty recording hardware, and one due to non-performance of behavioral tasks. Animals were housed on a standard, non-inverted 12-h light cycle. Rats were pair-housed with littermates prior to the start of experiments, after which they were single-housed.
Behavioral pre-training
Adult male Sprague Dawley TH-cre knock-in rats were fed ad lib and handled daily prior to experimental training. They were then food restricted to 85-90% of baseline weight and trained to collect liquid chocolate reward (0.1 ml) from each end of a single linear track (200 cm length) for at least 15 sessions. 3-6 other linear tracks were present in the room during this pre-training, with positions constant for the duration of experiments with each animal.
Surgical procedures
Each rat underwent virus injection and drive implantation in one surgery (control rats 1 and 2) or two surgeries spaced 4-20 days apart (experimental rats 1-4, control rat 3).
Virus injection
All virus was obtained from the Stanford Gene Vector and Virus Core under material transfer agreement with the laboratory of Karl Deisseroth. Experimental rats were injected with AAV-DJ-EF1a-DIO-hM4D(Gi)-mCherry (GVVC-AAV-129) and control rats were injected with AAV-DJ-EF1a-DIO-mCherry (GVVC-AAV-14), with 1 µl of virus delivered stereotactically to VTA in each hemisphere (-5.6 mm posterior, ± 0.7 mm lateral, and -8 mm ventral, all from bregma and skull surface). Data collection began four weeks after virus injection to allow for expression.
Recording drive implantation
Each rat was implanted with a recording microdrive, targeting bilateral (32 tetrodes, ∼40 g, n = 4 rats) or unilateral (20 tetrodes, ∼35 g, n = 2 rats; 6 tetrodes, ∼20 g, n = 1 rat) dorsal CA1. Each tetrode bundle of four platinum iridium wires (Neuralynx) was independently adjustable and electroplated with gold to an impedance of 150-300 kΩ. Tetrodes were advanced over the course of 1-3 weeks to the pyramidal cell layer. Rats were reintroduced to the pre-training linear track after several days of post-surgical recovery.
Tissue processing and immunohistochemistry
Eight weeks after virus injection, rats were deeply anesthetized with isoflurane and transcardially perfused with phosphate-buffered saline (PBS) and then 4% paraformaldehyde (PFA) in PBS. Brains were stored in 4% PFA for >24 hours, then 30% sucrose dissolved in PBS for >7 days for cryoprotection. 20-40 µm sections were made in a cryostat and mounted on slides. For tyrosine hydroxylase (TH) staining, all steps were performed at room temperature in a dark container on a slow orbital shaker. Sections were rinsed three times for 10 minutes each in PBS, then incubated for 2 hours in blocker (3% normal donkey serum and 0.3% Triton-X in PBS). Sections were then kept for 16-20 hours in blocking buffer with primary antibody (1:200, rabbit α-TH, EMD Millipore 657012, or sheep α-TH, Abcam ab113). After three 10-minute washes in PBS, sections were incubated with secondary antibody in blocking buffer for 2 hours (1:200, Alexa Fluor 488-conjugated α-rabbit, Invitrogen ThermoFisher R37118, or Alexa Fluor 488-conjugated α-sheep, Abcam ab150177). Imaging was performed at the Biological Imaging Facility at the University of California Berkeley using a Zeiss AxioImager M2.
Task design
At least 10 minutes before each recording session (except in experimental rat 1, for which the interval averaged 4 minutes), rats were injected intraperitoneally (IP) with saline or 1-4 mg/kg clozapine N-oxide (CNO) solution (2-4 mg/ml in diH2O with 50-100 µl dimethyl sulfoxide). 1-4 sessions were completed each day, with at least 1.5 hours between injections. To avoid carryover effects of CNO, saline sessions never followed CNO sessions on the same day (except on 3 recording days in experimental rat 3, when CNO sessions preceded saline sessions by > 4 hours).
In Experiment 1, animals progressed through three epochs. In Epoch 1, animals collected 0.1 ml rewards from each end for 10-20 laps. Then, unsignaled to the rat, the session entered Epoch 2, where reward at one end (Incr. end) was increased to 0.4 ml while the other (Unch. end) remained at 0.1 ml. The assignment of track ends to be Incr. and Unch. randomly varied session to session. After 10-20 laps in Epoch 2, the reward changed again unsignaled to the rat, with both reward ends again delivering 0.1 ml in Epoch 3. Rats completed up to 20 laps in Epoch 3, before being removed and placed back into a rest box. This same task was repeated on distinct linear tracks that varied based on position in the room, material of construction, color, length, orientation, and reward well size and position. Sessions were classified as either “novel” (the 1st or 2nd experience on a particular linear track) or “familiar” (3rd or later experience on a specific track). The track used for pre-training was used first for both saline and CNO sessions. Then, each novel track was used for 2-6 sessions, with all sessions consisting of only saline or CNO (excluding one track each in experimental rats 2 and 3 that had both saline and CNO sessions). The assignment of saline or CNO to each novel track was varied across rats.
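The novel/familiar labelling described above can be illustrated with a short Python sketch. This is not the authors' code: the function name `label_sessions` and the `prior_exposures` argument (used here to account for pre-training on a track) are hypothetical, and sessions are assumed to be supplied in chronological order.

```python
from collections import defaultdict


def label_sessions(track_ids, prior_exposures=None):
    """Label each session 'novel' (1st or 2nd experience of its track)
    or 'familiar' (3rd or later experience).

    track_ids: track identifier for each session, in chronological order.
    prior_exposures: optional dict of experiences accrued before the first
        listed session (e.g. the pre-training track); an assumption of
        this sketch.
    """
    exposures = defaultdict(int, prior_exposures or {})
    labels = []
    for track in track_ids:
        exposures[track] += 1
        labels.append("novel" if exposures[track] <= 2 else "familiar")
    return labels
```

For example, passing `prior_exposures={"pretrain": 15}` would mark every recorded session on the pre-training track as familiar.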
In Experiment 2, reward at the stable end (0.2 ml) remained fixed throughout the session, while reward at the volatile end varied pseudorandomly every lap between 0 and 0.8 ml (mean 0.37 ml; each block of 20 laps comprised 3 laps × 0 ml, 4 laps × 0.1 ml, 3 laps × 0.2 ml, 4 laps × 0.4 ml, and 6 laps × 0.8 ml). Rats were allowed to continue running until sated. The assignment of track ends as stable or volatile varied randomly from session to session. Each rat performed saline and CNO sessions of this task on the same linear track. The linear track was initially novel (except in experimental rat 3). However, 1-2 saline sessions preceded the first CNO session, rendering the track familiar for almost all CNO sessions and most saline sessions. In each rat, all Experiment 2 sessions were completed after all Experiment 1 sessions.
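The block structure of the volatile-end schedule can be sketched as follows. The volume counts per 20-lap block are taken from the text; how laps were ordered within a block is not specified beyond "pseudorandomly", so the uniform shuffle, the function names, and the seed argument are assumptions of this sketch.

```python
import random

# Laps per volume (ml) within each 20-lap block, as stated in the text.
BLOCK_COUNTS = {0.0: 3, 0.1: 4, 0.2: 3, 0.4: 4, 0.8: 6}


def volatile_block(rng):
    """Return one 20-lap block of volatile-end volumes in shuffled order."""
    block = [vol for vol, n_laps in BLOCK_COUNTS.items() for _ in range(n_laps)]
    rng.shuffle(block)  # assumed: uniform shuffle within the block
    return block


def volatile_schedule(n_laps, seed=0):
    """Concatenate shuffled blocks until n_laps volumes are available."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_laps:
        schedule.extend(volatile_block(rng))
    return schedule[:n_laps]
```

Each complete block averages 7.4 ml / 20 laps = 0.37 ml, matching the stated mean.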
Data acquisition
Rat position was monitored at 30 frames/s using an overhead camera and LEDs on the recording drive, then tracked using automated software (Spike Gadgets). Two-dimensional position and velocity were smoothed using a 7-bin median filter, followed by a 5-bin Gaussian filter. Linearized position was then used for further analysis. Neural data were collected using a 128-channel wireless HH128 headstage (Spike Gadgets). LFP was sampled at 30 kHz and spikes were extracted based on threshold crossings of 40-60 µV (Trodes software, Spike Gadgets). Individual units were differentiated based on manual clustering of spike waveform peak amplitudes using custom software (xclust2, M. A. Wilson, MIT).
Behavioral analysis
Reward end visits were defined as periods when the rat was within 10 cm of the end of the track (approximately 3-5 cm from the reward well, depending on the track). When analyzing visit durations, we excluded a small number of outliers (< 15 total across all sessions) that were longer than 60 s.
LFP analysis
For each session, 2-5 tetrodes with visible sharp wave ripples (SWR) were selected for SWR analysis. LFP from one channel of each tetrode was band-pass filtered between 150 and 250 Hz, then the smoothed (Gaussian kernel, 12 ms s.d.) absolute value of the Hilbert transform was averaged across tetrodes. For detecting SWR, we examined periods when the rat position was within 10 cm of the reward wells with velocity ≤ 8 cm/s. SWR were identified as local peaks at which the average ripple power exceeded 4 s.d. above the mean, with start and end points defined as the times at which ripple power reached the mean before and after the peak, and with a minimum start-to-end duration of 150 ms and a maximum of 1 s. SWR rate for each reward end visit was then calculated as the number of SWR detected divided by the total duration of rat velocity ≤ 8 cm/s during that end visit. During Experiment 1, we considered SWR rate during the first 10 s of each end visit to isolate the reward consumption-related period and exclude occasional longer task-disengaged resting periods. In Experiment 2, we included the first 20 s of each end visit to allow for the longer consumption time required for 0.8 ml.
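The thresholding step of SWR detection can be sketched in Python as below. This is an illustration only, not the authors' implementation: it starts from an already computed ripple-power envelope (filtering, Hilbert transform, and the position and velocity gating are omitted), it merges all supra-threshold peaks within one above-mean excursion into a single event, and the function name and return format are hypothetical.

```python
from statistics import mean, pstdev


def detect_events(envelope, fs, n_sd=4.0, min_dur=0.15, max_dur=1.0):
    """Find events whose peak exceeds mean + n_sd * s.d. of the envelope.

    envelope: ripple-power samples; fs: sampling rate in Hz.
    Start and end are the limits of the surrounding above-mean excursion.
    Returns a list of (start_s, end_s, peak_s) tuples.
    """
    mu = mean(envelope)
    thresh = mu + n_sd * pstdev(envelope)
    events, i, n = [], 0, len(envelope)
    while i < n:
        if envelope[i] <= thresh:
            i += 1
            continue
        # Extend backwards and forwards until power returns to the mean.
        start = i
        while start > 0 and envelope[start - 1] > mu:
            start -= 1
        end = i
        while end < n - 1 and envelope[end + 1] > mu:
            end += 1
        peak = max(range(start, end + 1), key=envelope.__getitem__)
        duration = (end - start + 1) / fs
        if min_dur <= duration <= max_dur:
            events.append((start / fs, end / fs, peak / fs))
        i = end + 1  # continue after this excursion
    return events
```

The SWR rate for a visit would then be the number of detected events divided by the time spent at velocity ≤ 8 cm/s during that visit.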
In Experiment 2, we defined the mean-subtracted SWR rate as:
Mean-subtracted rate(x, y) = rate(x, y) − rate(x)
where x and y are the current and previous volatile volumes, respectively.
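A minimal Python sketch of this quantity follows. It assumes that rate(x, y) is the mean SWR rate over visits with current volume x and previous volume y, and that rate(x) is the mean over all visits with current volume x; the text does not spell out the averaging, so this reading, the function name, and the input format are assumptions.

```python
from collections import defaultdict


def mean_subtracted_rates(visits):
    """visits: iterable of (current_volume, previous_volume, swr_rate).

    Returns {(x, y): mean rate for (x, y) minus mean rate for x}.
    """
    by_xy = defaultdict(list)
    by_x = defaultdict(list)
    for x, y, rate in visits:
        by_xy[(x, y)].append(rate)
        by_x[x].append(rate)

    def avg(values):
        return sum(values) / len(values)

    return {(x, y): avg(rates) - avg(by_x[x]) for (x, y), rates in by_xy.items()}
```

Subtracting rate(x) removes the effect of the current volume, so the remaining variation reflects the dependence on the previous volume.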
Single unit analysis
Place fields were calculated for each neuron based on spiking activity while the rat velocity exceeded 8 cm/s. Position was binned into 2 cm bins and directional place fields were calculated as the histogram of spike counts in each position bin (smoothed with Gaussian kernel, 4 cm s.d.) normalized by the animal’s occupancy in each bin, separately for periods when the rat was moving in each direction on the linear track (e.g., left and right). We calculated several properties of place fields to determine whether novelty or VTA inactivation affected them. The map direction correlation was defined as the Pearson correlation between the place field calculated in each running direction, such that a value of 1 indicates perfectly reliable firing dependent only on position, not running direction. The lap to session correlation was defined for each neuron only for the running direction with higher max firing rate. For that running direction, a place field was calculated independently for each lap, smoothed (Gaussian kernel, 4 cm s.d.), correlated to the directional field calculated from the entire session, and the average correlation coefficient taken across all laps.
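The place field calculation and the map direction correlation can be sketched as below. This is an illustrative reading rather than the authors' code: spike counts are smoothed and then divided by occupancy, the kernel is truncated at 4 s.d. and renormalized at the track ends, and all function names are hypothetical. Inputs are assumed to be restricted already to one running direction and to velocity above 8 cm/s.

```python
import math


def gaussian_kernel(sd_bins):
    """Normalized Gaussian kernel truncated at 4 s.d. (an assumption)."""
    half = int(math.ceil(4 * sd_bins))
    k = [math.exp(-0.5 * (i / sd_bins) ** 2) for i in range(-half, half + 1)]
    total = sum(k)
    return [v / total for v in k]


def smooth(values, kernel):
    """Convolve, renormalizing by the kernel weight that falls on the track."""
    half = len(kernel) // 2
    out = []
    for i in range(len(values)):
        acc = wsum = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(values):
                acc += w * values[idx]
                wsum += w
        out.append(acc / wsum)
    return out


def place_field(spike_pos_cm, occupancy_s, track_len=200.0, bin_cm=2.0, sd_cm=4.0):
    """Smoothed spike-count histogram divided by occupancy (s) per bin, in Hz."""
    n_bins = int(round(track_len / bin_cm))
    counts = [0.0] * n_bins
    for p in spike_pos_cm:
        counts[min(int(p // bin_cm), n_bins - 1)] += 1
    smoothed = smooth(counts, gaussian_kernel(sd_cm / bin_cm))
    return [c / t if t > 0 else 0.0 for c, t in zip(smoothed, occupancy_s)]


def pearson(a, b):
    """Pearson correlation, e.g. between the two directional fields."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

The lap to session correlation would apply `pearson` between each single-lap field and the whole-session field, then average across laps.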
Replay analysis
Candidate replay events were identified from population activity while the rat was not running (velocity ≤ 8 cm/s) and was near the reward wells (at most 10 cm away). Population spike density was binned into 1 ms bins and smoothed (Gaussian kernel, 20 ms s.d.). Candidate events were defined as local peaks at which the population rate exceeded the mean by 3 s.d. and that lasted at least 50 ms, with start and end defined as the nearest times the rate crossed the mean before and after the peak. A memory-less Bayesian decoding algorithm was used to classify both position and running direction during candidate events, as in previous work 13. Replay position in each running direction was estimated in time windows of 40 ms, beginning at the start of the event, and advancing in 5 ms steps. The start and end of putative trajectories within a candidate event were determined by removing bins at either end of the event that contained zero spikes or had a position difference > 50% of the track length from the next or previous window, respectively. Candidate events with a remaining length of at least 5 time bins, an absolute weighted correlation (Wu and Foster, 2014) exceeding 0.5, and at least 55% of the posterior probability in one of the running directions (Ambrose et al., 2016) were classified as replay. Replays were classified as forward or reverse by comparing the direction of replay movement across the track with the running direction map containing the majority of the posterior probability: if they matched (e.g., the replay moved upward and used upward fields), the replay was classified as forward, and otherwise it was classified as reverse.
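The decoding and direction classification steps can be sketched as follows. The decoder below is the standard memory-less Poisson form with a flat prior, P(state | n) ∝ ∏ f_i(state)^{n_i} exp(−τ Σ f_i(state)), where a "state" is a position bin in one of the two directional maps; the text does not give the authors' exact implementation, so the rate floor, function names, and argument layout are assumptions.

```python
import math


def decode_window(spike_counts, fields, tau):
    """Posterior over states for one decoding window.

    spike_counts[i]: spikes from cell i in the window.
    fields[i][k]: firing rate (Hz) of cell i in state k; states may be the
        position bins of both directional maps concatenated.
    tau: window length in seconds (0.04 for 40 ms windows).
    """
    n_states = len(fields[0])
    log_post = []
    for k in range(n_states):
        lp = 0.0
        for n_i, f in zip(spike_counts, fields):
            rate = max(f[k], 1e-6)  # assumed floor to avoid log(0)
            lp += n_i * math.log(rate) - tau * rate
        log_post.append(lp)
    peak = max(log_post)  # subtract the maximum for numerical stability
    post = [math.exp(lp - peak) for lp in log_post]
    z = sum(post)
    return [p / z for p in post]


def classify_direction(mass_up, replay_moves_up, min_mass=0.55):
    """Label a replay 'forward' or 'reverse', or None if neither directional
    map holds at least min_mass of the posterior.

    mass_up: fraction of posterior probability in the upward-running map.
    replay_moves_up: True if decoded position moves upward along the track.
    """
    if max(mass_up, 1.0 - mass_up) < min_mass:
        return None
    map_is_up = mass_up > 0.5
    return "forward" if map_is_up == replay_moves_up else "reverse"
```

The weighted-correlation and minimum-length criteria described above would be applied before this final labelling.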
Only sessions with sufficiently accurate position decoding during running were included. Bayesian decoding using the directional place fields was applied to 250 ms non-overlapping windows covering the entirety of each session. For all time bins with mean animal velocity >20 cm/s, position >20 cm from reward wells, and >5 total spikes across all neurons, actual and decoded position and running direction were compared, yielding a position decoding error (distance in cm) and direction decoding match (same or different). Sessions with mean decoding error >35 cm or direction match <60% were excluded.
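The inclusion criteria above amount to two simple checks, sketched here in Python. The thresholds come from the text; the function names and input formats are hypothetical.

```python
def usable_bin(velocity_cm_s, dist_to_well_cm, n_spikes):
    """True if a 250 ms bin qualifies for the run-decoding comparison."""
    return velocity_cm_s > 20.0 and dist_to_well_cm > 20.0 and n_spikes > 5


def session_passes(errors_cm, direction_matches, max_err=35.0, min_match=0.60):
    """True unless mean error exceeds max_err or direction match is below min_match.

    errors_cm: position decoding error (cm) for each usable bin.
    direction_matches: booleans, True where decoded direction matched.
    """
    mean_err = sum(errors_cm) / len(errors_cm)
    match_frac = sum(direction_matches) / len(direction_matches)
    return mean_err <= max_err and match_frac >= min_match
```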
Statistics
A mixed effects Poisson generalized linear model was used to test which experimental variables affected SWR rate, using the MATLAB fitglme function (MathWorks), similarly to previous work 13. Animal ID was modeled as a random effect, allowing baseline SWR rate to vary across rats. The full model was as follows:
SWR rate = exp[b0 + b1 × (Incr. reward end) + b2 × (Epoch 2) + b3 × (CNO) + b4 × (Incr. reward end) × (Epoch 2) + b5 × (Incr. reward end) × (CNO) + b6 × (Epoch 2) × (CNO) + b7 × (Incr. reward end) × (Epoch 2) × (CNO)]
Where “Incr. reward end” is a dummy variable indicating the rat is at the Incr. reward end, “Epoch 2” is a dummy variable indicating the visit is occurring in Epoch 2, and “CNO” is a dummy variable indicating it is a session with CNO injected. The coefficient for each term corresponds to the log multiplicative change in SWR rate from the reference condition (animal-specific rate at Unch. end, not in Epoch 2, of a saline session). The offset term was log(duration) of each stopping period, so the model fit SWR rate, rather than SWR count. Experimental and control rats were fit separately with this model, as were familiar and novel sessions.
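To make the interpretation of the coefficients concrete, the following Python sketch evaluates the fixed-effects part of the model for given dummy values. It does not fit the model (the fit was done with fitglme); the function names, the `animal_offset` argument standing in for the per-rat random intercept, and the coefficient values in the usage note are hypothetical.

```python
import math


def predicted_rate(b, incr_end, epoch2, cno, animal_offset=0.0):
    """SWR rate (events/s) implied by coefficients b[0..7] for 0/1 dummies."""
    eta = (
        b[0]
        + b[1] * incr_end
        + b[2] * epoch2
        + b[3] * cno
        + b[4] * incr_end * epoch2
        + b[5] * incr_end * cno
        + b[6] * epoch2 * cno
        + b[7] * incr_end * epoch2 * cno
        + animal_offset  # stand-in for the animal-specific random intercept
    )
    return math.exp(eta)


def expected_count(b, incr_end, epoch2, cno, duration_s, animal_offset=0.0):
    """Expected SWR count: the log(duration) offset multiplies rate by duration."""
    return predicted_rate(b, incr_end, epoch2, cno, animal_offset) * duration_s
```

With made-up coefficients b0 = log(0.2) and b4 = log(2) and all others zero, the reference rate is 0.2 SWR/s and the rate at the Incr. end in Epoch 2 of a saline session is 0.4 SWR/s, that is, a twofold multiplicative change.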
Bootstrapping was used to assess the effect of drug on SWR rate in each experimental condition. For each combination of novelty, reward end, and epoch, drug identity was shuffled 5,000 times, generating a distribution of the chance difference between the SWR rate in saline vs. CNO sessions. P-values were determined using one-tailed tests, under the hypothesis that CNO would cause lower SWR rates at the Incr. end and higher SWR rates at the Unch. end when compared to saline.
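The label-shuffling procedure can be sketched in Python as below. The unit being shuffled (sessions or visits), the test statistic (here a difference of means), and the add-one correction to the p-value are assumptions of this sketch, as are the function and argument names.

```python
import random


def shuffle_p_value(saline, cno, alternative="greater", n_shuffles=5000, seed=0):
    """One-tailed p-value for the CNO minus saline difference in mean SWR rate.

    alternative: 'greater' tests CNO > saline (e.g. the Unch. end hypothesis),
        'less' tests CNO < saline (e.g. the Incr. end hypothesis).
    """
    rng = random.Random(seed)
    sign = 1.0 if alternative == "greater" else -1.0

    def diff(c, s):
        return sign * (sum(c) / len(c) - sum(s) / len(s))

    observed = diff(cno, saline)
    pooled = list(saline) + list(cno)
    n_cno = len(cno)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign drug labels at random
        if diff(pooled[:n_cno], pooled[n_cno:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate above zero (assumed convention).
    return (hits + 1) / (n_shuffles + 1)
```

This would be run separately for each combination of novelty, reward end, and epoch.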
In the ANOVAs used to assess the effect of various experimental variables on behavioral and neural measurements, the following variables were consistently defined: “animal group” was a dummy variable indicating experimental rats, “drug” was a dummy variable indicating CNO, “novelty” was a dummy variable indicating novel session, “epoch” was a categorical variable indicating epoch number, “reward end” was a dummy variable indicating Incr. reward end, “RPE sign” was a dummy variable indicating a positive RPE, and “previous volume” was a categorical variable indicating the volatile volume at the previous visit.
References
- Reverse Replay of Hippocampal Place Cells Is Uniquely Modulated by Changing Reward. Neuron 91:1124–1136. https://doi.org/10.1016/j.neuron.2016.07.047
- Evolving the lock to fit the key to create a family of G protein-coupled receptors potently activated by an inert ligand. Proceedings of the National Academy of Sciences 104:5163–5168. https://doi.org/10.1073/pnas.0700293104
- Hippocampal replays appear after a single experience and incorporate greater detail with more experience. Neuron 110:1829–1842. https://doi.org/10.1016/j.neuron.2022.03.010
- Prefrontal Cortical Neurons Are Selective for Non-Local Hippocampal Representations during Replay and Behavior. The Journal of Neuroscience 41:5894–5908. https://doi.org/10.1523/JNEUROSCI.1158-20.2021
- Complementary neural correlates of motivation in dopaminergic and noradrenergic neurons of monkeys. Front Behav Neurosci 6. https://doi.org/10.3389/fnbeh.2012.00040
- Sensitivity of Locus Ceruleus Neurons to Reward Value for Goal-Directed Actions. The Journal of Neuroscience 35:4005–4014. https://doi.org/10.1523/JNEUROSCI.4553-14.2015
- Hippocampal sharp wave-ripple: A cognitive biomarker for episodic memory and planning. Hippocampus 25:1073–1188. https://doi.org/10.1002/hipo.22488
- Forward and reverse hippocampal place-cell sequences during ripples. Nat Neurosci 10:1241–2. https://doi.org/10.1038/nn1961
- Novelty and Dopaminergic Modulation of Memory Persistence: A Tale of Two Systems. Trends Neurosci. https://doi.org/10.1016/j.tins.2018.10.002
- Long-duration hippocampal sharp wave ripples improve memory. Science 364:1082–1086. https://doi.org/10.1126/science.aax0758
- Ventral tegmental area neurons in learned appetitive behavior and positive reinforcement. Annu Rev Neurosci 30:289–316. https://doi.org/10.1146/annurev.neuro.30.051606.094341
- Replay Comes of Age. Annu Rev Neurosci 40:581–602. https://doi.org/10.1146/annurev-neuro-072116-031538
- Reverse replay of behavioural sequences in hippocampal place cells during the awake state. Nature 440:680–3. https://doi.org/10.1038/nature04587
- A 4 Hz oscillation adaptively synchronizes prefrontal, VTA, and hippocampal activities. Neuron 72:153–65. https://doi.org/10.1016/j.neuron.2011.08.018
- Spatial memory impairment induced by lesion of the mesohippocampal dopaminergic system in the rat. Neuroscience 74:1037–1044. https://doi.org/10.1016/S0306-4522(96)00202-3
- VTA neurons coordinate with the hippocampal reactivation of spatial experience. Elife 4:1–22. https://doi.org/10.7554/eLife.05360
- Ramping activity in midbrain dopamine neurons signifies the use of a cognitive map. BioRxiv. https://doi.org/10.1101/2020.05.21.108886
- Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500:575–579. https://doi.org/10.1038/nature12475
- Awake hippocampal sharp-wave ripples support spatial memory. Science 336:1454–8. https://doi.org/10.1126/science.1217230
- A Role for the Locus Coeruleus in Hippocampal CA1 Place Cell Reorganization during Spatial Reward Learning. Neuron 105:1018–1026. https://doi.org/10.1016/j.neuron.2019.12.029
- Dopamine release from the locus coeruleus to the dorsal hippocampus promotes spatial learning and memory. Proc Natl Acad Sci U S A 113. https://doi.org/10.1073/pnas.1616515114
- Increased attention to spatial context increases both place field stability and spatial memory. Neuron 42:283–295. https://doi.org/10.1016/S0896-6273(04)00192-8
- A Unified Framework for Dopamine Signals across Timescales. Cell 183:1600–1616. https://doi.org/10.1016/j.cell.2020.11.013
- Unique Properties of Mesoprefrontal Neurons within a Dual Mesocorticolimbic Dopamine System. Neuron 57:760–773. https://doi.org/10.1016/j.neuron.2008.01.022
- Dopamine-dependent facilitation of LTP induction in hippocampal CA1 by exposure to spatial novelty. Nat Neurosci 6:526–531. https://doi.org/10.1038/nn1049
- The hippocampal-VTA loop: controlling the entry of information into long-term memory. Neuron 46:703–13. https://doi.org/10.1016/j.neuron.2005.05.002
- Context dependent effects of ventral tegmental area inactivation on spatial working memory. Behavioural Brain Research 203:316–320. https://doi.org/10.1016/j.bbr.2009.05.008
- Prioritized memory access explains planning and hippocampal replay. Nat Neurosci 21:1609–1617. https://doi.org/10.1038/s41593-018-0232-z
- Dopaminergic neurons promote hippocampal reactivation and spatial memory persistence. Nat Neurosci 17:1658–1660. https://doi.org/10.1038/nn.3843
- The hippocampus as a spatial map. Preliminary evidence from unit activity in the freely-moving rat. Brain Res 34:171–175. https://doi.org/10.1016/0006-8993(71)90358-1
- The hippocampus as a cognitive map. Clarendon Press.
- Hippocampal place-cell sequences depict future paths to remembered goals. Nature 497:1–8. https://doi.org/10.1038/nature12112
- Midbrain dopamine neurons bidirectionally regulate CA3-CA1 synaptic drive. Nat Neurosci 18:1763–1771. https://doi.org/10.1038/nn.4152
- Dopamine Controls Persistence of Long-Term Memory Storage. Science 325:1017–1020. https://doi.org/10.1126/science.1172545
- A neural substrate of prediction and reward. Science 275:1593–1599. https://doi.org/10.1126/science.275.5306.1593
- Rewarded Outcomes Enhance Reactivation of Experience in the Hippocampus. Neuron 64:910–921. https://doi.org/10.1016/j.neuron.2009.11.016
- Reinforcement learning: an introduction. MIT Press.
- Locus coeruleus and dopaminergic consolidation of everyday memory. Nature :1–18. https://doi.org/10.1038/nature19325
- Flexible rerouting of hippocampal replay sequences around changing barriers in the absence of global place field remapping. Neuron 110:1547–1558. https://doi.org/10.1016/j.neuron.2022.02.002
- Hippocampal replay captures the unique topological structure of a novel environment. Journal of Neuroscience 34:6459–69. https://doi.org/10.1523/JNEUROSCI.3414-13.2014
Copyright
© 2024, Matthew R Kleinman & David J Foster
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
Views, downloads and citations are aggregated across all versions of this paper published by eLife.