Neuroscience

Distinct recruitment of dorsomedial and dorsolateral striatum erodes with extended training

  1. Youna Vandaele (corresponding author)
  2. Nagaraj R Mahajan
  3. David J Ottenheimer
  4. Jocelyn M Richard
  5. Shreesh P Mysore
  6. Patricia H Janak (corresponding author)
  1. Johns Hopkins University, United States
  2. University of Minnesota, United States
Research Article
Cite this article as: eLife 2019;8:e49536 doi: 10.7554/eLife.49536

Abstract

Hypotheses of striatal orchestration of behavior ascribe distinct functions to striatal subregions, with the dorsolateral striatum (DLS) especially implicated in habitual and skilled performance. On this view, neural activity patterns recorded from the DLS, but not the dorsomedial striatum (DMS), should be correlated with habitual and automatized performance. Here, we recorded DMS and DLS neural activity in rats during training in a task promoting habitual lever pressing. Although performance improved across sessions, clear changes in the corresponding neural activity patterns were not evident in DMS or DLS during early training. DMS and DLS activity patterns were distinct during early training, but their activity was similar following extended training. Finally, performance after extended training was not associated with DMS disengagement, as would be predicted from prior work. These results suggest that behavioral sequences may continue to engage both striatal regions long after initial acquisition, when skilled performance is consolidated.

https://doi.org/10.7554/eLife.49536.001

Introduction

The dorsal striatum plays a pivotal role in learning and performing actions, including sequential actions, made to obtain rewarding outcomes, with distinct functions proposed for different striatal subregions (Balleine et al., 2009; Graybiel and Grafton, 2015). The dorsomedial striatum (DMS) receives excitatory inputs from prefrontal cortices and is implicated in goal-directed action control, whereas the dorsolateral striatum (DLS) primarily receives inputs from sensorimotor and premotor cortices and is implicated in habit and skill learning (Balleine et al., 2009; Corbit and Janak, 2016; Corbit et al., 2007; Corbit and Janak, 2010; Graybiel and Grafton, 2015; Yin and Knowlton, 2006; Yin et al., 2004; Yin et al., 2005; Yin et al., 2006).

Previous work has examined neural activity within the DMS and DLS to determine how these regions might contribute to instrumental behavior and skill learning. When subjects are trained to execute a series of lever presses for reward, neuronal excitations emerge over learning in the dorsal striatum at the initiation and termination of the response sequence (Jin and Costa, 2010; Jin et al., 2014). These neural excitations have been termed start/stop responses, and, along with sustained inhibitions and excitations, are proposed to reflect chunking of individual actions into behavioral units (Jin and Costa, 2015; Jin and Costa, 2010; Jin et al., 2014; Martiros et al., 2018). These sequence-related neural responses are reported to be more numerous in the DLS (Martiros et al., 2018). Consistent with this, contrasting patterns of neural ensemble activity emerge in the DMS and DLS as rats learn to choose the correct arm of a T-maze in response to sensory cue presentations (Barnes et al., 2005; Thorn et al., 2010), with task-bracketing activity emerging in the DLS (Barnes et al., 2005; Jog et al., 1999; Smith and Graybiel, 2016; Thorn et al., 2010), but not in the DMS. The DLS task-bracketing activity during navigational sequences is similar to that observed during acquisition of lever-pressing sequences and likewise may reflect chunking (Redish, 2016; Smith and Graybiel, 2014; Smith and Graybiel, 2016). However, task-bracketing neuronal activity around reward-seeking actions is not always detected (Sales-Carbonell et al., 2018). More broadly, it is not clear how behavior-related neural activity in DMS and DLS relates to behavioral improvement over time.

Here we sought to characterize possible sequence-related neural correlates in the striatum within a reward-seeking task specifically chosen for its ability to promote rapid expression of automated and habitual lever pressing behavior (Vandaele et al., 2017). In this task, rats must wait for lever insertion, after which they complete a series of five lever presses to obtain access to reward. Lever retraction occurs after the fifth lever press and signals reward delivery. Lever insertion and retraction thus constitute audio-visual stimuli signaling the opportunity for reward and its delivery, respectively. We previously showed that responding for sucrose reward rapidly becomes habitual when the lever insertion and retraction cues are provided in this discrete-trials fixed-ratio 5 (DT5) task; in contrast, responding remains sensitive to reward devaluation, a test of goal-directed control, in the absence of these cues under a free-running fixed-ratio 5 schedule (Vandaele et al., 2017). Using the DT5 task, we sought here (1) to compare cue and action encoding in DMS and DLS during learning of the task and after extended training, (2) to characterize the dynamics of behavioral sequence-related neural activity in DMS and DLS within and across training stages, and (3) to examine the link between DLS and DMS activity patterns and performance. As subjects showed increasing indicators of skilled performance, we expected to see the development of greater behavior-related activity in DLS versus DMS during training, and stronger correlation of DLS activity with indices of improved performance (Jog et al., 1999; Regier et al., 2015; Thorn et al., 2010). We also purposefully included a group of rats with very extensive training (>2 months) in which we expected to find that DLS sequence-related activity had strengthened over time, perhaps as consolidation of skills progressed (Barnes et al., 2005; Smith and Graybiel, 2013), leading to further dissociation of DLS and DMS activity.

Contrary to our expectations, we found sequence-related activity in both DLS and DMS, during both early and extended training. During early training, the nature of sequence-related activity was markedly different between DLS and DMS; DLS neurons were predominantly excited during the behavioral sequence whereas DMS neurons were predominantly inhibited, with excitation at the sequence boundaries. Further, many sequence-related firing patterns appeared to reflect stimulus attributes rather than motor initiation, in contrast to some prior reports (Jin and Costa, 2010; Jin et al., 2014) and in agreement with other work (Sales-Carbonell et al., 2018). Additionally, no substantial evolution in neural activity patterns was observed across early training sessions despite habitual learning and significant improvement in performance. Interestingly, however, the patterns of activity in the DMS and DLS were similar to each other after extended training, with a balanced distribution of task-related inhibition and excitation in both regions. Finally, the spike activity of a substantial proportion of neurons in both regions was correlated with specific aspects of performance on a trial-by-trial basis, indicating that optimized performance after extended training was not associated with DMS disengagement from behavioral control. Together, these findings suggest that both regions of the striatum are differentially, yet complementarily, involved during early training, but act in concert when series of actions are performed with great regularity after extended training.

Results

Graded performance improvement during early training and optimization of behavior after extended training in the discrete-trials fixed-ratio 5 (DT5) procedure

To characterize neural activity in DMS and DLS during early and extended training, rats were trained in a discrete trials fixed ratio-5 (DT5) procedure. We have previously shown that rats trained in this procedure rapidly develop automated lever pressing behavior that is relatively insensitive to reward devaluation, a finding taken to indicate habitual control (Vandaele et al., 2017). In this task, each trial began with lever insertion, after which rats were required to complete a sequence of 5 lever presses to obtain access to reward, signaled by lever retraction after the fifth lever press (Figure 1A). Rats were given the opportunity to respond on 30 trials separated by 1 min intertrial intervals. To measure neural activity during learning, one group of rats was implanted with fixed recording electrodes in the DMS and DLS before training in the DT5 task, and neural activity in these regions was recorded during acquisition (early training group, N = 9). To measure neural activity in well-trained subjects, rats in the second group were implanted in the DMS and DLS with drivable recording electrodes after more than 8 weeks of training in the DT5 procedure (extended training group, N = 8; Figure 1B). Throughout training, lever pressing remained near the maximum possible (150 responses/session) (Figure 1C), yet performance became more automated across sessions in the early training group, indicative of sequence learning (Figure 1D–G). Specifically, the number of within-sequence reward port entries was rapidly suppressed across sessions (Friedman ANOVA χ2 = 44.82, p<0.0001; Figure 1D), including in the first DT5 session, as subjects learned that the response requirement had increased from one to five. Learning was also accompanied by a progressive increase in response rate (Figure 1E; F(9,72)=16.59, p<0.0001) and a concomitant decrease in trial-by-trial variability in within-sequence response rate (Figure 1F; F(9,72)=3.12, p<0.01). The latency to first lever press also decreased across early training sessions (Figure 1G; F(12,96)=10.87, p<0.0001). Following extended training in the second group of rats, within-sequence port entries did not differ from zero, and response rate, trial-by-trial variability in response rate and the latency to first lever press reached a plateau (Figure 1C–G). For both early and extended training groups, we assessed sensitivity to outcome devaluation at the end of recording, using sensory-specific satiety (Figure 1H–I). Although mean responding decreased following devaluation, this decrease was not significant after either early or extended training when analyzing the number of lever presses (Figure 1H; early training: F(1,8)=4.87, p=0.06; extended training: F(1,7)=3.75, p=0.09) or the number of trials initiated (Figure 1I; early training: F(1,8)=4.25, p=0.07; extended training: F(1,7)=3.5, p>0.1), suggesting that behavior was under habitual control, although there was variability within the groups.
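The two behavioral indices in Figure 1E–F, within-sequence response rate and its trial-by-trial coefficient of variation, can be sketched as follows. This is a minimal sketch under stated assumptions: the rate on each trial is taken as the number of inter-press intervals divided by the time from the first to the fifth press, and the coefficient of variation uses the sample standard deviation across trials. Both definitions and the function names are hypothetical; the study's exact definitions are given in Materials and methods.

```python
import numpy as np

def within_sequence_rate(press_times):
    """Presses per second within one trial.

    Assumed definition: (number of presses - 1) / (last press time - first press time).
    """
    t = np.asarray(press_times, dtype=float)
    if t.size < 2:
        raise ValueError("need at least two presses to define a rate")
    return (t.size - 1) / (t[-1] - t[0])

def coefficient_of_variation(rates):
    """Trial-by-trial variability: sample SD (ddof=1, an assumption) divided by the mean."""
    r = np.asarray(rates, dtype=float)
    return r.std(ddof=1) / r.mean()

# Hypothetical lever-press timestamps (s, relative to the first press) for three DT5 trials
trials = [
    [0.0, 0.5, 1.0, 1.5, 2.0],   # 4 intervals over 2.0 s
    [0.0, 1.0, 2.0, 3.0, 4.0],   # 4 intervals over 4.0 s
    [0.0, 0.4, 0.8, 1.2, 1.6],   # 4 intervals over 1.6 s
]
rates = [within_sequence_rate(t) for t in trials]
cv = coefficient_of_variation(rates)
print(rates, round(cv, 3))
```

A lower coefficient of variation across sessions would correspond to the more regular, automated responding described above.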

Figure 1. Rats rapidly develop habit and automaticity in the DT5 task.

(A) Schematic of the DT5 task and event-related time intervals considered for analyses. (B) Histological reconstruction of electrode placements in DLS and DMS. (C–G) Number of lever presses (C), within-sequence port entries (D), response rate (responses per second) (E), coefficient of variation of within-sequence response rate (F), and first lever press latency (G) across early training and recording sessions (3 DT1 sessions followed by 10 DT5 sessions) and during the last three recording sessions in the extended training group. Inset in D: Number of within-sequence port entries across trials during the first DT5 session. (H–I) Satiety-induced devaluation: number of lever presses (H) and number of initiated trials (I) in the valued (val) and devalued (deval) conditions under extinction after early training (left) and extended training (right). Error bars show SEM.

https://doi.org/10.7554/eLife.49536.002

DMS and DLS neurons are differentially modulated in the DT5 procedure during early training but not after extended training

During acquisition of the task in the early training group, we recorded an average of 37 (±1.3) and 81 (±1.8) units per session in the DLS and DMS, respectively, across the entire group of rats. The number of recorded units remained stable across sessions (Figure 2—figure supplement 1A). In the extended training group, by advancing the electrodes every other day, a total of 387 and 462 neurons were recorded in DLS and DMS, respectively (Figure 2—figure supplement 1B). Electrode placements were similar in the early and extended training groups (Figure 2—figure supplement 1C). To focus on the primary population of striatal neurons, putative medium spiny neuron (MSN) and interneuron populations were distinguished using firing rates and waveform properties (Figure 2—figure supplement 1D–F; Martiros et al., 2018; Schmitzer-Torbert and Redish, 2008; Stalnaker et al., 2016). Putative MSNs represented 95% and 88% of the recorded units in the early and extended training groups, respectively. Fast-spiking interneurons (FSI) were identified based on their firing rate and valley half-width (Figure 2—figure supplement 1D–F), and represented on average 1.1% of units in the early training group (0–3 neurons per session) and 4.1% of units in the extended training group (N = 31). Neurons not classified as putative FSIs but showing features intermediate to MSN and FSI were left unclassified (early training, N = 1–9 per session; extended training, N = 74). As previously reported (Sales-Carbonell et al., 2018), we could not reliably isolate putative tonically active neurons (TANs) based on waveform. Given the low proportion of TANs in dorsal striatum (about 1%; Oorschot, 2013; Schmitzer-Torbert and Redish, 2008), we assumed that any misclassification of TANs as MSNs, though unlikely, would have minimal impact on the population characteristics presented in this study. In this paper we focus only on the putative MSN population.
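The waveform-based sorting of units into putative MSNs, FSIs and unclassified neurons can be sketched as a simple rule on two features, firing rate and valley half-width. This is an illustration only: the threshold constants below are hypothetical placeholders, not the criteria used in this study, which are described in the figure supplement and the cited work.

```python
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only (NOT the values used in the study).
MSN_MAX_RATE_HZ = 5.0          # putative MSNs: low mean firing rate
MSN_MIN_HALF_WIDTH_MS = 0.25   # putative MSNs: broad valley half-width
FSI_MIN_RATE_HZ = 10.0         # putative FSIs: high mean firing rate
FSI_MAX_HALF_WIDTH_MS = 0.15   # putative FSIs: narrow valley half-width

@dataclass
class Unit:
    unit_id: str
    rate_hz: float        # mean firing rate over the session
    half_width_ms: float  # half-width of the waveform valley

def classify_unit(u: Unit) -> str:
    """Label a unit as 'MSN', 'FSI' or 'unclassified' from rate and half-width."""
    if u.rate_hz <= MSN_MAX_RATE_HZ and u.half_width_ms >= MSN_MIN_HALF_WIDTH_MS:
        return "MSN"
    if u.rate_hz >= FSI_MIN_RATE_HZ and u.half_width_ms <= FSI_MAX_HALF_WIDTH_MS:
        return "FSI"
    # Features intermediate between MSN and FSI are left unclassified
    return "unclassified"

# Hypothetical units
units = [Unit("a", 1.2, 0.35), Unit("b", 22.0, 0.10), Unit("c", 7.0, 0.20)]
labels = {u.unit_id: classify_unit(u) for u in units}
msn_fraction = sum(label == "MSN" for label in labels.values()) / len(labels)
print(labels, msn_fraction)
```

Keeping an explicit "unclassified" bin, rather than forcing every unit into MSN or FSI, mirrors the conservative treatment of intermediate units described above.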

On average, 78.2% (±1.4) of putative MSNs from the early training dataset showed a significant change in firing rate to one or more events, and were termed task-responsive neurons (TRN) (Figure 2A). Among them, a majority of DLS neurons increased their spike activity during the behavioral sequence (excitation, range N = 15–24 per session; inhibition, range N = 4–9 per session) whereas DMS neurons predominantly decreased their spike activity (excitation, range N = 13–21 per session; inhibition, range N = 34–61 per session). The relative proportions of excitations and inhibitions significantly differed between DLS and DMS for the ten sessions analyzed during early training (Figure 2A; DLS vs DMS, χ2-values > 7.2, p-values<0.01).

Figure 2.
DMS and DLS neurons are differentially modulated in the DT5 task during early training but not extended training.

(A) Proportion of non-task-responsive neurons (non-TRN), and task-responsive neurons (TRN) excited (EXC) or inhibited (INH) during the sequence, across early training sessions (left) or after extended training (right) in DLS (top) and DMS (bottom). (B–C) Mean z-score (± SEM) of DLS (blue) and DMS (red) TRNs during (B) early training sessions day 1 (top) and 10 (bottom) (N = 9 animals) and (C) during extended training (N = 8 animals). (D–E) Heatmaps of DLS (left) and DMS (right) TRNs, sorted by mean sequence-related activity, during (D) early training session 1 (top) and 10 (bottom) and (E) extended training.

https://doi.org/10.7554/eLife.49536.003

In the extended training dataset, 76.5% of the analyzed neurons were task-responsive. In clear contrast with the early training group, the number of neurons excited and inhibited during the behavioral sequence was similar in DLS versus DMS (Figure 2A; DLS excitation, N = 143; DLS inhibition, N = 109; DMS excitation, N = 158; DMS inhibition, N = 162; DMS vs DLS: χ2 = 3.07, p>0.05).
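The regional comparison of excited versus inhibited TRN counts is a chi-square test on a 2 × 2 contingency table. The dependency-free sketch below computes Pearson's statistic without a continuity correction; that choice is our assumption, but with the extended-training counts reported above it gives χ2 ≈ 3.07, matching the value in the text, which suggests (without establishing) that no correction was applied. For one degree of freedom, the upper-tail p-value equals erfc(√(χ2/2)).

```python
import math

def pearson_chi2(table):
    """Pearson chi-square statistic (no Yates correction) and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def p_value_1dof(chi2):
    """Upper-tail p-value of the chi-square distribution with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Extended-training TRN counts from the text: rows = region, columns = (excited, inhibited)
extended = [[143, 109],   # DLS
            [158, 162]]   # DMS
chi2, dof = pearson_chi2(extended)
p = p_value_1dof(chi2)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
```

Applied to the Non-phasic INH/EXC counts reported later for extended training (DLS 17 and 31, DMS 36 and 29), the same function gives χ2 ≈ 4.42, consistent with the reported χ2 = 4.4.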

To investigate in more detail the presence or absence of regional differences in behavior-related spiking activity of putative MSNs, we examined the normalized firing across the trial, focusing on seven consecutive events of the behavioral sequence (Figure 1A; lever insertion, each of 5 lever presses, and first port entry following reward delivery) in the early and extended training groups, combining all TRNs (Figure 2B–E). During early training, we observed strong differences between the activity of TRNs in DLS and DMS along the behavioral sequence. On average, the DLS population showed a sustained increase in activity throughout the behavioral sequence, whereas the mean activity of the DMS population was increased at the sequence boundaries and decreased during lever presses (Figure 2A–B). Repeated-measures ANOVA revealed differences between regions and across events (Figure 2B and D; main effect of region, F(1,839)=135.1, p<0.0001; region * event interaction, F(11,9229) = 30.85, p<0.0001). When mean neural activity was examined after extended training, the differences in DMS and DLS population activity were not significant, with no effect of region, nor a region by event interaction (Figure 2C and E; region F(1,570)=2.53, p>0.1, region*event F(11,6270) = 1.48, p>0.1). Thus the mean activity of TRNs differed between DMS and DLS during early training, but not after extended training.
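Normalized peri-event activity of the kind plotted in Figure 2B–E is commonly obtained by binning spikes around each event, averaging across trials and z-scoring against a baseline. The sketch below assumes a 50 ms bin, a ±0.5 s window, a baseline taken 2 s before each event, and hypothetical function names; the study's exact windows, bin sizes and baseline are given in Materials and methods.

```python
import numpy as np

def peri_event_rate(spike_times, event_times, window=(-0.5, 0.5), bin_s=0.05):
    """Trial-averaged firing rate (Hz) in fixed-width bins around each event time."""
    n_bins = int(round((window[1] - window[0]) / bin_s))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    spikes = np.asarray(spike_times, dtype=float)
    counts = np.zeros(n_bins)
    for ev in event_times:
        # Align spikes to this event and count them in each bin
        counts += np.histogram(spikes - ev, bins=edges)[0]
    return counts / (len(event_times) * bin_s)

def zscore_to_baseline(rate, baseline_rate):
    """Express a peri-event rate in SD units of a baseline rate trace (assumed normalization)."""
    mu, sd = np.mean(baseline_rate), np.std(baseline_rate)
    if sd == 0:
        return np.zeros_like(rate, dtype=float)
    return (np.asarray(rate, dtype=float) - mu) / sd

rng = np.random.default_rng(0)
events = np.array([10.0, 20.0, 30.0])              # hypothetical lever insertion times (s)
background = rng.uniform(0.0, 40.0, size=160)      # ~4 Hz random background firing
spikes = np.sort(np.concatenate([background, events + 0.01, events + 0.02]))
rate = peri_event_rate(spikes, events)             # 20 bins of 50 ms, -0.5 to +0.5 s
baseline = peri_event_rate(spikes, events - 2.0)   # assumed baseline: 2 s before each event
z = zscore_to_baseline(rate, baseline)
```

Averaging such z-scored traces across TRNs, separately for each region and event, yields population curves comparable in form to Figure 2B–C.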

Surprisingly, there was no significant change in mean normalized activity in DMS and DLS across early training sessions despite significant performance improvements during the task (Figure 1D–G; Figure 2B and D, top versus bottom panels; main effect of session, F(9,839)=0.49, p>0.1; session*event interaction, F(99,9229) = 0.72, p>0.1). We therefore looked earlier in training, at the transition between the DT1 sessions (one press required) and the DT5 sessions (five presses required). Here we observed an increase in DLS neural activity in the 500 ms before the first lever press across DT1 sessions (Figure 2—figure supplement 2; Wilcoxon test, 1st vs 3rd DT1: Z = −2.29, p<0.05; 1st DT1 vs 1st DT5: Z = −2.16, p<0.05). Within the first DT1 session, DLS activity on average increased at lever insertion and then returned to baseline before the first lever press; the activity was sustained between lever insertion and first lever press during subsequent sessions (Figure 2—figure supplement 2A–C). This early change may reflect shortening of the first lever press latency across the first three DT1 sessions (Figure 2—figure supplement 2D–G, Figure 1G), as rats learned about the task. However, we still expected neuronal changes correlated with the improvement in performance once the response requirement was extended to five lever presses. Yet the average activity in dorsal striatum did not change in step with behavior as subjects’ performance improved during the first 10 sessions of DT5 training. Furthermore, when comparing with recordings made after extended training, we observed more similarity between regions, rather than an increasing difference between DLS and DMS activity over time with the optimization of behavior across repeated practice.

We next considered the possibility that averaging the activity of DMS and DLS neurons may have masked subtle regional differences in the activity patterns of individual neurons that may track behavioral improvement. Indeed, as previously reported (Jin et al., 2014), we observed substantial variability in individual patterns of neuronal activity along the behavioral sequence (Figure 2D–E, Figure 2—figure supplement 1G). Investigating the distribution of distinct neural signatures among individual neurons in DLS versus DMS during early and extended training could, therefore, provide a more precise characterization of neural activity from which to examine possible differences in DMS and DLS activity across training stages.

Classification of distinct neural signatures in the dorsal striatum during extended sequence training

To identify distinct neural signatures associated with performance of the DT5 lever pressing task, we examined activity during a time period when behavior was very stable, after extended training. We applied an objective statistical approach of hierarchical clustering and a model selection metric to the extended training dataset containing putative MSNs from both DMS and DLS. First, we observed that many neurons showed transient changes in activity (<0.25 s; termed ‘Phasic’) while many others expressed excitations or inhibitions that were sustained in time (>0.5 s; ‘Non-phasic’). We considered that these phasic and sustained activity neurons might have distinct relations to performance of the behavioral sequence; we were especially interested in neurons that appeared to have sustained activity, given prior reports on these profiles and their proposed relation to action sequences (Jin and Costa, 2015; Jin and Costa, 2010; Jin et al., 2014). We therefore sought to separate Phasic and Non-phasic neurons using a Fourier analysis on the extended training dataset. We reasoned that sustained (Non-phasic) modulation of activity during the behavioral sequence should be associated with higher power in the low frequency range (<1 Hz), whereas transient (Phasic) modulations of sequence-related activity should result in higher power in the intermediate frequency range (1–4 Hz) (Materials and methods; Figure 3—figure supplement 1). Accordingly, the power in the low (<1 Hz) and intermediate (1–4 Hz) frequency bands of each neuron was used as the features for hierarchical clustering. Following hierarchical clustering on the combined DMS and DLS dataset, the optimal number of classes was determined using a model selection metric (the Calinski-Harabasz criterion; CH index). This method resulted in two significant classes of neurons (Figure 3—figure supplement 1; permutation test, p<0.0001). The majority of neurons were characterized by transient peaks of activity at some time in the behavioral sequence, and were classified as Phasic neurons (N = 635; Figure 3A, Figure 3—figure supplement 1). Neurons from the other class mostly expressed sustained activity throughout the lever press sequence and were classified as Non-phasic neurons (N = 113; Figure 3—figure supplement 1). The proportion of Phasic versus Non-phasic neurons did not significantly differ between brain regions (Figure 3—figure supplement 1C; χ2 = 0.39, p>0.1).
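The frequency-domain features used to separate Phasic from Non-phasic neurons can be sketched as follows. This sketch assumes a binned activity trace sampled at a fixed bin width; it removes the mean before the FFT so the zero-frequency term does not dominate the low band, then sums power below 1 Hz and between 1 and 4 Hz. The band edges follow the text; the bin width, the mean removal and the function name are our assumptions.

```python
import numpy as np

def band_power_features(trace, bin_s=0.05, low=(0.0, 1.0), mid=(1.0, 4.0)):
    """Return (low-band power, intermediate-band power) of a binned activity trace."""
    x = np.asarray(trace, dtype=float)
    x = x - x.mean()                        # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2     # one-sided, unnormalized power spectrum
    freqs = np.fft.rfftfreq(x.size, d=bin_s)
    low_mask = (freqs > low[0]) & (freqs < low[1])
    mid_mask = (freqs >= mid[0]) & (freqs <= mid[1])
    return power[low_mask].sum(), power[mid_mask].sum()

t = np.arange(0, 10, 0.05)                  # 10 s trace in 50 ms bins (200 samples)
sustained = np.sin(2 * np.pi * 0.5 * t)     # slow modulation at 0.5 Hz
transient = np.sin(2 * np.pi * 2.0 * t)     # faster modulation at 2 Hz
print(band_power_features(sustained), band_power_features(transient))
```

Each neuron then contributes one (low, intermediate) feature pair to the clustering; slowly varying traces load on the first feature and brief, repeated peaks on the second.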

Figure 3.
Hierarchical clustering on Phasic neurons during extended training.

(A) Heatmap of normalized activity of neurons identified as being Phasic (290/338 neurons in DLS and 345/410 in DMS) sorted by time of peak activity. (B) Hierarchical clustering on Phasic neurons (combined DMS and DLS): dendrogram (left) and proportion of DLS and DMS neurons in the Start, Stop and Middle classes (right). (C–E) Heatmaps (top) and average z-score (± SEM) (bottom) of Phasic Start neurons (C; 46/290 and 64/345), Phasic Stop neurons (D; 72/290 and 91/345) and Phasic Middle neurons (E; 172/290 and 190/345) separated by region; neurons are sorted by averaged normalized activity during the behavioral sequence.

https://doi.org/10.7554/eLife.49536.006

Among Phasic neurons, we observed transient excitation at the start, in the middle, or at the end of the sequence (Figure 3A), and sought to examine if there were systematic regional differences in this activity. Hierarchical clustering applied to the combined set of DMS and DLS Phasic neurons resulted in the separation of 3 significant classes of neurons (Figure 3B): neurons expressing transient excitations after the lever insertion (‘Start’ neurons, DLS N = 46, DMS N = 64; Figure 3C), before the port entry (‘Stop’ neurons, DLS N = 72, DMS N = 91; Figure 3D) or modulations at one or more points during the lever presses (‘Middle’ neurons, DLS N = 172, DMS N = 190; Figure 3E). The proportions of neurons in each Phasic class and their average z-scores did not significantly differ between DLS and DMS (χ2 = 1.3, p > 0.1; region: F-values < 3.5, p-values > 0.05; region*events: F-values < 2.1, p-values > 0.05).
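Hierarchical clustering with the Calinski-Harabasz (CH) criterion for choosing the number of classes can be sketched with SciPy's agglomerative clustering and a hand-written CH index. Ward linkage, the candidate range of k and the synthetic two-dimensional features are assumptions for illustration (the study clustered neuronal activity profiles), and the permutation test used to assess significance is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def calinski_harabasz(X, labels):
    """CH index: [between-cluster dispersion / (k-1)] / [within-cluster dispersion / (n-k)]."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape[0], np.unique(labels).size
    overall_mean = X.mean(axis=0)
    between = within = 0.0
    for lab in np.unique(labels):
        members = X[labels == lab]
        centroid = members.mean(axis=0)
        between += members.shape[0] * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))

def best_hierarchical_partition(X, k_range=range(2, 7)):
    """Ward clustering, then keep the number of clusters that maximizes the CH index."""
    Z = linkage(X, method="ward")
    scores = {}
    for k in k_range:
        labels = fcluster(Z, t=k, criterion="maxclust")
        if np.unique(labels).size == k:   # maxclust can return fewer clusters than requested
            scores[k] = calinski_harabasz(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, fcluster(Z, t=best_k, criterion="maxclust"), scores

rng = np.random.default_rng(0)
# Hypothetical 2-D features for three groups of neurons (stand-ins for Start, Stop, Middle)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in centers])
best_k, labels, scores = best_hierarchical_partition(X)
print(best_k, np.bincount(labels)[1:])
```

The CH index rewards partitions whose clusters are compact and well separated, while its (n − k)/(k − 1) factor penalizes adding clusters that only split an existing group.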

The classification analysis reveals that many Phasic neurons show activity near the start or end of the behavioral sequence, in agreement with prior descriptions of ‘Start-Stop’ related activity. Yet it is unclear whether the ‘Start’ activity identified here (Figure 3C) represents a response to the reward-predictive cue of lever insertion or a ‘Start’ signal for the initiation of the sequence. To address this question, we examined the activity of Phasic Start neurons aligned to both lever insertion and the time of first lever press, and found that, on average, this population showed a peak in activity within 250 ms after lever insertion, without a concomitant peak within the 250 ms before the first lever press (Figure 4A). Statistical analysis of spiking activity revealed that only 15% of Phasic Start neurons (16/110 units) increased their spiking activity relative to baseline before the first lever press (i.e., at the presumed time of sequence initiation) (Figure 4B). These results suggest that, for the most part, the Phasic Start neurons responding to the lever insertion cue are not explicitly signaling motor sequence initiation.
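Whether an individual Phasic Start neuron fires more in the 250 ms before the first lever press than at baseline can be tested trial by trial. The sketch below uses a one-sided sign test, which we substitute for illustration because the specific per-neuron test is not stated here; the spike counts, the tie handling and the α = 0.05 threshold are hypothetical.

```python
from math import comb

def sign_test_greater(pre_counts, baseline_counts):
    """One-sided sign test: are pre-LP1 counts larger than baseline counts across trials?"""
    diffs = [p - b for p, b in zip(pre_counts, baseline_counts) if p != b]  # ties dropped
    n = len(diffs)
    if n == 0:
        return 1.0
    n_pos = sum(d > 0 for d in diffs)
    # P(X >= n_pos) for X ~ Binomial(n, 0.5)
    return sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n

def fraction_pre_lp1_excited(neurons, alpha=0.05):
    """Fraction of neurons whose pre-LP1 counts significantly exceed baseline counts."""
    hits = sum(sign_test_greater(pre, base) < alpha for pre, base in neurons)
    return hits / len(neurons)

# Hypothetical spike counts per 250 ms window over 10 trials: (pre-LP1, baseline)
ramping = ([3, 2, 4, 3, 5, 2, 3, 4, 3, 2], [1, 0, 1, 2, 1, 0, 1, 1, 2, 0])
flat = ([1, 0, 1, 2, 1, 0, 1, 1, 0, 1], [1, 1, 0, 1, 1, 0, 2, 1, 1, 0])
print(fraction_pre_lp1_excited([ramping, flat]))
```

Counting neurons that pass such a test, and dividing by the size of the Phasic Start population, gives a proportion of the kind reported in Figure 4B.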

Figure 4. Phasic Start neurons do not encode the initiation of the sequence.

(A) Mean z-score (± SEM) around lever insertion (left, LI) and first lever press (right, LP1) in DMS and DLS Phasic Start neurons. (B) Proportion of Phasic Start neurons increasing spiking activity before the first lever press (pre-LP1 exc) compared to baseline. (C) Mean z-score (± SEM) aligned to the last lever press (left, LLP) and port entry (right, PE) in DMS and DLS Phasic Stop neurons. (D) Mean z-score (± SEM) aligned to port entry in DLS (left) and DMS (right) Phasic Stop neurons at the termination of rewarded trials (TRIAL) or during inter-trial intervals (ITI).

https://doi.org/10.7554/eLife.49536.009

Similarly, it is not clear whether Phasic Stop neurons signal the termination of the motor sequence, a response to the lever retraction sensory stimulus, expectation of reward, or a combination of these. Phasic Stop neurons increased activity after the last lever press with an even greater peak before the port entry (Figure 4C), that is, before consumption of the reward. This port entry peak was not observed during port entries occurring during inter-trial intervals (Figure 4D), suggesting that this activity may encode termination of the sequence and/or reward expectation during the port approach.

Among Non-phasic neurons, we observed either sustained excitation or sustained inhibition during lever presses (Figure 5A). To quantify this, Non-phasic neurons were further subdivided using hierarchical clustering. This analysis resulted in the separation of 2 significant classes of neurons (Figure 5B–D): neurons expressing sustained inhibition (‘INH’ neurons, DLS N = 17/48, 35%; DMS N = 36/65, 55%; Figure 5C) or sustained excitation during the five lever presses (‘EXC’ neurons, DLS N = 31/48, 64%; DMS N = 29/65, 45%; Figure 5D). The proportion of neurons in the Non-phasic classes differed significantly between DLS and DMS (χ2 = 4.4, p<0.05), with a higher proportion of EXC neurons in DLS than in DMS (Figure 5B). Within each class, the average normalized activity of DMS and DLS neurons did not differ across events in the behavioral sequence (region*event, INH F(11,561) = 0.41, p>0.1, EXC F(11,638) = 1.56, p>0.1).

Figure 5. Hierarchical clustering on Non-phasic neurons during extended training.

(A) Heatmap of normalized activity of neurons identified as being Non-phasic (48/338 neurons in DLS and 65/410 in DMS) sorted by averaged activity during the behavioral sequence. (B) Hierarchical clustering on Non-phasic neurons: dendrogram (left) and proportion of DLS and DMS neurons in the INH and EXC classes (right). (C–D) Heatmaps (top) and average z-score (± SEM) (bottom) of Non-phasic INH neurons (C; 17/48 and 36/65) and Non-phasic EXC neurons (D; 31/48 and 29/65) separated by region; neurons are sorted by averaged normalized activity during the behavioral sequence.

https://doi.org/10.7554/eLife.49536.008

Overall, five firing patterns were identified after extended training, each present in both DMS and DLS, with a modest overrepresentation of sustained excitations (EXC) in DLS. Although these five classes of neurons were separated based on the most salient features of each pattern, examination of the heatmaps illustrates overlap in activity patterns, with some neurons falling under multiple categories. For instance, some neurons combined start and stop activity (boundary neurons, Figure 3C), whereas others combined start inhibition and stop excitation (Figure 3D) or sustained inhibition and stop excitation (Figure 5D). Some neurons with detectable start and stop activity, but relatively weaker sustained activity, tended to be classified as Phasic Start or Phasic Stop, suggesting that our approach for isolating sustained responses was relatively conservative.

Large differences in relative proportions of Non-phasic firing patterns in DMS and DLS during early training

To determine the presence and relative proportions of the Phasic and Non-phasic classes in DMS and DLS during early training, we first separated Phasic and Non-phasic MSNs using the Fourier analysis, as described above (Figure 6—figure supplement 1). The proportion of Phasic and Non-phasic neurons in each session did not differ in DLS and DMS (χ2 <2.9, p-values>0.05; Figure 6—figure supplement 1). We then sought to classify the Phasic and Non-phasic neurons in the early training dataset into different subtypes based on their activity, testing whether the neuronal subtypes identified in the extended training dataset were represented in the early training dataset, and to what extent. To this end, we first trained a random forest classifier, a supervised learner, on the activity patterns of neurons from the extended dataset, along with their class labels obtained from the unsupervised hierarchical clustering. We then applied this classifier to the early training dataset to identify the response types of early-training neurons and their relative proportions in DMS and DLS across early training sessions.
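The train-on-extended, apply-to-early strategy can be illustrated as follows. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the random forest used in the study; the feature vectors, class names and data are hypothetical. What the sketch shows is the workflow: labels come from unsupervised clustering of the extended dataset, a supervised model learns them, and the model then labels early-training neurons so that class proportions can be compared by region and session.

```python
import numpy as np

class NearestCentroid:
    """Assigns each sample to the class whose training centroid is closest (Euclidean).

    Stand-in for the random forest used in the study, chosen only to avoid dependencies.
    """

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.centroids_ = np.vstack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]

def class_proportions(labels, classes):
    """Fraction of neurons assigned to each class."""
    labels = np.asarray(labels)
    return {c: float(np.mean(labels == c)) for c in classes}

rng = np.random.default_rng(1)
# Hypothetical extended-training activity features with cluster-derived labels
X_ext = np.vstack([rng.normal(+1.0, 0.3, size=(50, 12)),   # sustained-excitation-like profiles
                   rng.normal(-1.0, 0.3, size=(50, 12))])  # sustained-inhibition-like profiles
y_ext = np.array(["EXC"] * 50 + ["INH"] * 50)
clf = NearestCentroid().fit(X_ext, y_ext)

# Hypothetical early-training DLS neurons from one session, mostly excitation-like
X_early_dls = np.vstack([rng.normal(+1.0, 0.3, size=(8, 12)),
                         rng.normal(-1.0, 0.3, size=(2, 12))])
print(class_proportions(clf.predict(X_early_dls), clf.classes_))
```

Repeating the prediction for each region and session gives per-session class proportions that can then be compared with chi-square tests, as in Figure 6A and E.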

The proportion of neurons in Phasic classes was similar between DLS and DMS for most sessions (χ2 <4.4, p-values>0.1), except the fourth session, in which there was a greater fraction of Stop neurons and fewer Middle neurons in DLS (Figure 6A). In contrast, neurons in Non-phasic classes were differentially distributed in DLS versus DMS in every session (χ2-values > 5.5, p-values<0.05; Figure 6E). More specifically, the majority (on average, 83%) of DLS Non-phasic neurons were classified as EXC, whereas on average only 21% of DMS Non-phasic neurons were. DMS Non-phasic neurons were much more represented in the INH class (Figure 6E–G). Overall, these results show that DMS and DLS neurons express similar types of behavioral correlates during early training, but with large differences in their relative proportions. This over-representation of sustained excitation in the DLS, and of sustained inhibition in the DMS, likely explains the regional differences observed in the analysis of the average activity during the first and last early training sessions (Figure 2).

Figure 6.
Differential neuronal responses in DMS and DLS during early training.

(A) Proportion of Phasic Start, Stop and Middle neurons across training sessions. (B–D) Heatmaps of ensemble activity of Start (B), Stop (C), and Middle (D) neurons across training sessions in DLS (top) and DMS (middle) and average z-score (± SEM) across the last three sessions in DLS and DMS neurons (bottom). (E) Proportion of INH and EXC neurons across training sessions in DLS (top) and DMS (bottom). (F–G) Heatmaps of ensemble activity of INH (F) and EXC (G) neurons across training sessions in DLS (top) and DMS (middle) and average z-score (± SEM) across the last three sessions in DLS and DMS neurons (bottom). The absence of INH neurons in DLS on session 7 (F) is illustrated with a gray bar throughout the sequence.

https://doi.org/10.7554/eLife.49536.010

Of note, although behavioral performance improved over time, we were surprised to observe no obvious shifts in neural activity patterns from session 1 to session 10 in any individual class of Phasic or Non-phasic neurons. Class proportions did not change significantly across sessions (DLS, χ2 = 39, p>0.1; DMS, χ2 = 41, p>0.1). Analyzing the normalized activity of all units in a given class, we likewise found no evidence of changes across early training sessions (session F-values <1.4, p-values>0.1; session*event F-values <1.2, p-values>0.1; Figure 6). These ANOVAs did, however, detect regional differences in the magnitude of the mean normalized activity of Start, Stop and Middle neurons when comparing DMS and DLS (Start neurons: region F(1,191)=91.3, p<0.0001; region*event: F(11,2101) = 20.08, p<0.0001; Stop neurons: region F(1,112)=13.65, p<0.001; region*event F(11,1232) = 5.62, p<0.01; Middle neurons: region F(1,433)=63.89, p<0.0001; region*event F(11,4763) = 9.71, p<0.0001; Figure 6B–D), reflecting higher average firing in DLS during lever presses, and higher activity in DMS after lever insertion and retraction.
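The "normalized activity" analyzed here is reported as z-scores in the figures. One common way to obtain such values is to build a trial-averaged peri-event time histogram and z-score it against a pre-event baseline. The sketch below does this for one neuron and one event type. The window, bin size, baseline period and the use of baseline bins (rather than trials) for the standard deviation are all assumptions, since the exact normalization is described in Materials and methods, which is not part of this excerpt.

```python
import numpy as np


def zscored_peth(spike_times, event_times, window=(-0.5, 0.5), bin_size=0.05,
                 baseline=(-0.5, 0.0)):
    """Trial-averaged peri-event firing rate, z-scored against pre-event baseline bins.

    The window, bin size and baseline (all in seconds) are illustrative assumptions.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    n_bins = int(round((window[1] - window[0]) / bin_size))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    # Spike counts per bin for each event (trial), aligned to the event time.
    counts = np.array([np.histogram(spike_times - t, bins=edges)[0] for t in event_times])
    rate = counts.mean(axis=0) / bin_size                  # spikes/s
    centers = (edges[:-1] + edges[1:]) / 2.0
    in_base = (centers >= baseline[0]) & (centers < baseline[1])
    mu, sd = rate[in_base].mean(), rate[in_base].std()
    return centers, (rate - mu) / (sd if sd > 0 else 1.0)


# Hypothetical neuron: ~10 Hz background firing plus a brief burst after each event.
rng = np.random.default_rng(0)
events = np.arange(10.0, 110.0, 10.0)                      # 10 event times (s)
background = rng.uniform(0.0, 110.0, 1100)
burst = np.concatenate([t + rng.uniform(0.0, 0.1, 10) for t in events])
centers, z = zscored_peth(np.concatenate([background, burst]), events)
```

Averaging such traces across the neurons of a class, and taking the standard error across neurons, gives population curves of the kind plotted as mean z-score (± SEM).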

This classification of individual neuronal responses confirms the findings based on average ensemble activity: DMS and DLS activity differs during early training and is much more similar after extended training. This increase in similarity appears to result mainly from changes in sustained responses. After extended training, the proportion of INH response patterns is higher in DLS, and the proportion of EXC response patterns is higher in DMS, relative to early training, leading to similar average responses.

Neuronal activity during performance of the DT5 sequence task is sufficient for decoding striatal subregion identity during early training but not after extended training

An alternative approach to testing the similarity of DMS and DLS neural activity in early and extended training is to use that activity to decode the brain region identity of all recorded putative MSNs (Figure 7 and Figure 7—figure supplement 1). Using ten-fold cross-validation, we determined how accurately linear discriminant analysis (LDA) models, trained on neural ensembles of normalized peri-event spike activity across the sequence from lever insertion to port entry (Materials and methods), could classify individual neurons as belonging to the DLS or DMS.
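The decoding analysis can be sketched as a two-class Fisher LDA evaluated with ten-fold cross-validation, with chance estimated by repeating the procedure after shuffling the region labels. The sketch assumes each neuron is one row of concatenated normalized peri-event activity. The small ridge (shrinkage) term, the number of shuffles and the synthetic data are assumptions, and the subsampling into ensembles of different sizes used for extended training is not reproduced.

```python
import numpy as np


def fit_lda(X, y, shrinkage=1e-3):
    """Two-class Fisher LDA; predicts class 1 when X @ w + b > 0."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - m0, X1 - m1])
    # Pooled within-class covariance; the ridge term (an assumption) keeps it
    # invertible when there are many time bins relative to neurons.
    cov = centered.T @ centered / (len(y) - 2) + shrinkage * np.eye(X.shape[1])
    w = np.linalg.solve(cov, m1 - m0)
    prior1 = len(X1) / len(y)
    b = -0.5 * (m0 + m1) @ w + np.log(prior1 / (1.0 - prior1))
    return w, b


def cv_accuracy(X, y, rng, k=10):
    """k-fold cross-validated accuracy of the LDA classifier."""
    folds = np.array_split(rng.permutation(len(y)), k)
    correct = 0
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        w, b = fit_lda(X[train], y[train])
        correct += np.sum((X[test] @ w + b > 0).astype(int) == y[test])
    return correct / len(y)


def decode_region(X, y, n_shuffles=200, seed=0):
    """True accuracy, a shuffled-label null distribution, and a permutation p-value."""
    rng = np.random.default_rng(seed)
    true_acc = cv_accuracy(X, y, rng)
    null = np.array([cv_accuracy(X, rng.permutation(y), rng) for _ in range(n_shuffles)])
    p = (np.sum(null >= true_acc) + 1) / (n_shuffles + 1)
    return true_acc, null, p


# Hypothetical ensemble: 60 neurons per region, 12 features each; label 0 = DLS, 1 = DMS.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (60, 12)), rng.normal(0.8, 1.0, (60, 12))])
y = np.repeat([0, 1], 60)
true_acc, null_acc, p_value = decode_region(X, y)
```

When the two regions are indistinguishable, the true accuracy falls inside the shuffled distribution and the permutation p-value is large, which is the pattern described for extended training.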

Figure 7 with 1 supplement
Accurate decoding of the dorsostriatal brain region identity in early training but not after extended training.

(A) Mean decoding accuracy (± standard deviation) of the dorsostriatal subregion identity, DLS or DMS, of individual neurons based on their sequence-related activity (True - red) is compared with the decoding accuracy expected from chance (Shuffled - dark blue) across early training sessions (left) or during extended training, as a function of ensemble size (right). (B) Decoding based on the first three principal components following principal component analysis. Average accuracy (± standard deviation) is presented as a function of early training sessions (left) or ensemble size during extended training (right).

https://doi.org/10.7554/eLife.49536.012

LDA models accurately decoded brain region identity during early training: mean accuracy for assignment to DMS or DLS was significantly above chance in all early training sessions (p-values<0.0001, permutation test; Figure 7A). In contrast, for extended training, decoding accuracy did not differ from chance, even for the largest ensemble size (p-values>0.05, permutation test; Figure 7A). Thus, DMS and DLS neurons could not be differentiated on the basis of their sequence-related activity after extended training. These findings were confirmed with a lower-dimensional approach in which a principal component analysis (PCA) was conducted on the concatenated z-scored neural activity around each event of the behavioral sequence (Materials and methods), and LDA models were trained on the first three principal components obtained from each early training session and from extended training. This analysis produced similar results, with decoding accuracy significantly above chance in all early training sessions (all p-values<0.0001, permutation test) but not for extended training, even for larger sets of units (all p-values>0.05, permutation test; Figure 7B).
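The lower-dimensional variant reduces each neuron's concatenated z-scored activity to its scores on the first three principal components before the LDA is fitted. A minimal NumPy sketch of that projection, via the singular value decomposition of the centered data, is below; the data layout (neurons × concatenated event windows) and the example dimensions are assumptions.

```python
import numpy as np


def top_principal_components(X, n_components=3):
    """Project rows of X (neurons x concatenated peri-event bins) onto the leading PCs."""
    Xc = X - X.mean(axis=0)                      # center each time bin across neurons
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # PC scores, one row per neuron
    explained = S ** 2 / np.sum(S ** 2)          # fraction of variance per component
    return scores, explained[:n_components]


# Hypothetical z-scored activity of 80 neurons in 8 bins around each of 3 events.
rng = np.random.default_rng(3)
per_event = [rng.normal(size=(80, 8)) for _ in range(3)]
X = np.hstack(per_event)                         # concatenate event windows along time
scores, explained = top_principal_components(X)
```

One design choice the excerpt does not settle is whether the components are computed on all neurons or only on the training folds; computing them inside each fold keeps held-out neurons from shaping the axes.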

These findings are consistent with the differences in proportions of neuron subtypes across the two regions in early training; this difference in proportions permits a reliable inference of which region a neuron comes from when using an unbiased decoding strategy. Altogether, these findings demonstrate through three approaches that DMS and DLS activity was relatively dissimilar during early training but similar after extended training.

Individual differences in sensitivity to outcome devaluation do not substantially correlate with dorsostriatal activity

Although lever pressing was, on average, insensitive to reward devaluation after both early and extended training, we observed inter-individual variability in sensitivity to outcome devaluation (Figure 1H–I). We therefore asked whether DMS and DLS activity during early and extended training differed between subjects that were relatively sensitive or insensitive to devaluation, treating sensitivity as a possible read-out of the degree of habitual versus goal-directed control (Figure 8). Rats with a devaluation ratio above 0.45 were considered insensitive to outcome devaluation (Figure 8A–B). Examining the average normalized activity of DMS and DLS neurons along the behavioral sequence across early training sessions (Figure 8A), we observed significant region by group and region by group by event interactions (region*group: F(1,819)=22.36, p<0.0001; region*group*event: F(11,9009) = 4.3, p<0.01), but no main effect of group (F(1,819)=0.69, p>0.4) or event by group interaction (F(11,9009) = 1.24, p>0.2). When neurons are separated into individual classes (Figure 8C; Figure 8D), the numbers are too small for meaningful analysis of relative proportions or response magnitudes; however, the percentages within each class look similar (Figure 8—figure supplement 1A), while the heatmaps suggest relatively greater activity in the DLS of sensitive subjects. It is therefore unclear from these results whether specific activity patterns emerge across early training sessions in rats whose responding transfers toward habitual control.
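The split into sensitive and insensitive rats rests on a devaluation ratio and a 0.45 cutoff. The ratio is not defined in this excerpt; the sketch assumes the form presses in the devalued condition divided by total presses in both conditions, so that 0.5 means identical responding and lower values mean reduced responding after devaluation. Rats with a ratio of exactly 0.45 are assigned to the sensitive group here, which is an arbitrary choice, and the press counts are hypothetical.

```python
import numpy as np


def devaluation_ratio(presses_valued, presses_devalued):
    """Assumed definition: devalued / (valued + devalued); 0.5 means no devaluation effect.

    Undefined (division by zero) for a rat that never presses in either condition.
    """
    valued = np.asarray(presses_valued, dtype=float)
    devalued = np.asarray(presses_devalued, dtype=float)
    return devalued / (valued + devalued)


def split_by_sensitivity(ratios, threshold=0.45):
    """Ratios above the threshold are labeled insensitive, all others sensitive."""
    return np.where(np.asarray(ratios) > threshold, "insensitive", "sensitive")


# Hypothetical lever press counts for three rats.
ratios = devaluation_ratio([60, 50, 40], [20, 48, 45])
groups = split_by_sensitivity(ratios)
```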

Figure 8 with 2 supplements
Similar dorsostriatal sequence-related activity in groups of rats differing in their sensitivity to satiety-induced devaluation.

(A) Number of lever presses in the valued (val) and devalued (deval) conditions in rats sensitive (devaluation ratio <0.45) (top) or insensitive (devaluation ratio >0.45) (bottom) to satiety-induced devaluation after early training (left), and average z-score (± SEM) in DLS and DMS neurons during the first (middle) and last (right) early training sessions in these 2 subgroups of rats (top: sensitive group; bottom: insensitive group). (B) Number of lever presses in the valued (val) and devalued (deval) conditions in rats sensitive (devaluation ratio <0.45) (top) or insensitive (devaluation ratio >0.45) (bottom) to satiety-induced devaluation after extended training (left), and average z-score (± SEM) in DLS and DMS neurons (right) in these 2 subgroups of rats (top: sensitive group; bottom: insensitive group). (C–D) Heatmaps of ensemble sequence-related activity in DLS (C) and DMS (D) across early training sessions, in each class of neuronal responses (from left to right) and as a function of sensitivity to outcome devaluation (top: sensitive group; bottom: insensitive group). (E–F) Heatmaps of sequence-related activity of DLS (E) and DMS neurons (F) during extended training, in each class of neuronal responses (from left to right) and as a function of sensitivity to outcome devaluation (top: sensitive group; bottom: insensitive group). The absence of neurons in some classes in a given early training session is illustrated with a gray bar throughout the sequence.

https://doi.org/10.7554/eLife.49536.014

When comparing the normalized activity in DMS and DLS of rats more or less sensitive to outcome devaluation during extended training, we found no significant differences (Figure 8B; main effect of group: F(1,568)=2.84, p>0.05; group*region interaction: F(1,568)=1.32, p>0.1; group*event interaction: F(11,6248) = 2.17, p>0.05; group*event*region interaction: F(11,6248) = 1.67, p>0.1). The activity and distribution of DLS and DMS neurons across classes were also similar overall in the two groups of subjects (Figure 8E–F; Figure 8—figure supplement 1B; DLS: χ2 = 8.81, p=0.066; DMS: χ2 = 6.78, p>0.1). Overall, there were no obvious correlations between the degree of sensitivity to outcome devaluation and the neural activity patterns.

We have previously shown in non-tethered rats that sensitivity to reward devaluation by satiety was promoted in the DT5 task when rats were trained with liquid sucrose but not with grain-based pellets (Vandaele et al., 2017). Here, we have included data from 5 of 17 rats that were trained with pellet rewards. However, dividing the groups by reward type did not reveal substantial differences in behavior or neuronal activity (Figure 8—figure supplement 2). We also note that rats trained with either sucrose or pellets reliably show insensitivity to reward devaluation by conditioned taste aversion (Vandaele et al., 2017), supporting the notion that, on average, the DT5 procedure promotes habitual responding.

Pharmacological inactivation of both DLS and DMS interferes with task performance

Although dorsostriatal activity did not clearly correlate with habitual control, we found neural correlates of DT5 task performance in both DLS and DMS during early and extended training. To assess the causal involvement of these regions in DT5 performance, additional groups of rats were bilaterally implanted with cannulae targeting DMS (N = 8) or DLS (N = 11) and trained to asymptotic performance (20 and 10 sessions, respectively). Rats received micro-infusions of saline or of a cocktail of the GABA receptor agonists baclofen (GABA-B) and muscimol (GABA-A) before a DT5 session (Figure 9A–B). Inactivation of DMS resulted in a subtle but significant decrease in the number of lever presses (Figure 9C; F(1,7)=7.36, p<0.05). We also observed a trend toward an increase in latency to the first lever press following DMS inactivation (Figure 9D; F(1,7)=5.16, p=0.057), but no effect on response rate (Figure 9E; F(1,7)=1.43, p>0.1). DLS inactivation slightly decreased the number of lever presses, although the effect was not significant (Figure 9F; Sign test: Z11 = 1.8, p=0.07). DLS inactivation also decreased response rate (Figure 9H; F(1,10)=5.51, p<0.05), but had no effect on the first lever press latency (Figure 9G; Sign test: Z11 = 0.6, p>0.5). The effect on response rate may reflect a rightward shift in the distribution of inter-press intervals following DLS inactivation (Figure 9J; K-S test: Dn = 0.17, p<0.0001), which was not observed following DMS inactivation (Figure 9I; K-S test: Dn = 0.04, p>0.1). These results confirm the concurrent involvement of both DMS and DLS in sequence performance in the DT5 task.
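Two nonparametric tests appear in the inactivation analysis: a two-sample Kolmogorov–Smirnov (K-S) test on inter-press interval distributions, and a sign test on paired saline versus inactivation scores. The sketch below implements the K-S statistic D with its asymptotic p-value, and a sign test using a continuity-corrected normal approximation. Which exact or approximate variants the authors used is an assumption, as are the simulated intervals and difference scores.

```python
import math

import numpy as np


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D and its asymptotic two-sided p-value."""
    a, b = np.sort(np.asarray(a, dtype=float)), np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated at every observed value.
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    d = float(np.max(np.abs(cdf_a - cdf_b)))
    lam = d * math.sqrt(len(a) * len(b) / (len(a) + len(b)))
    if lam < 0.3:  # the Kolmogorov tail probability is essentially 1 here
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


def sign_test(differences):
    """Sign test on paired differences (zeros dropped), continuity-corrected normal approximation."""
    d = np.asarray(differences, dtype=float)
    n_pos, n_neg = int(np.sum(d > 0)), int(np.sum(d < 0))
    n = n_pos + n_neg
    if n == 0:
        return 0.0, 1.0
    z = max(abs(n_pos - n_neg) - 1, 0) / math.sqrt(n)
    return z, math.erfc(z / math.sqrt(2.0))


# Hypothetical inter-press intervals (s): inactivation shifts the distribution to the right.
rng = np.random.default_rng(4)
saline_ipi = rng.exponential(0.3, 5000)
inactivation_ipi = rng.exponential(0.4, 5000)
d_stat, ks_p = ks_two_sample(saline_ipi, inactivation_ipi)

# Hypothetical per-rat difference scores (inactivation - saline) for 11 rats.
z_stat, sign_p = sign_test([-3, -5, -1, -2, -4, -6, -2, -1, 2, 1, 3])
```

For small samples, an exact K-S or binomial sign-test p-value is usually preferred over these large-sample approximations.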

Figure 9
Pharmacological inactivation of DMS and DLS disrupts task performance.

(A–B) Schematic representation of cannula placements in DMS (A) or DLS (B). (C–E) Mean number of lever presses (± SEM) (C), mean first lever press latency in seconds (± SEM) (D), and mean response rate (± SEM) (E) after infusion of saline or muscimol/baclofen in the DMS (top), and difference scores (inactivation – saline) of individual rats (bottom). (F–H) Mean number of lever presses (± SEM) (F), mean first lever press latency in seconds (± SEM) (G), and mean response rate (± SEM) (H) after infusion of saline or muscimol/baclofen in the DLS (top), and difference scores (inactivation – saline) of individual rats (bottom). (I–J) Distribution of inter-press intervals after infusion of saline (DMS, red; DLS, blue) or muscimol/baclofen (DMS, yellow; DLS, green) in the DMS (I) or in the DLS (J).

https://doi.org/10.7554/eLife.49536.017

Both DLS and DMS activity remain correlated with over-trained task performance during extended training

If the DLS is specifically involved in over-trained habitual/skilled performance, when no additional learning or deliberation is required, then one would expect the DMS to disengage from behavioral control after extended training. However, we found multiple behavior-related neural activity patterns in DMS at this time point. To test this hypothesis further, we examined correlations between DMS and DLS neural activity and specific phases of the behavior, asking whether these regions were differentially correlated with over-trained performance. We examined the trial-by-trial relation between (i) neural activity at lever insertion and latency to the first lever press, (ii) neural activity during the lever press sequence and response rate, and (iii) neural activity following lever retraction and latency to port entry (Figure 10A–C, Figure 10—figure supplement 1). We controlled for significant correlations arising spuriously in a small percentage of units by running Spearman correlations on 1000 iterations of shuffled data for each behavioral variable (Figure 10—figure supplements 2–4). The percentage of correlated units in each region significantly deviated from the proportion expected by chance for each behavioral variable, except for correlations between DLS neural activity and port entry latency (Figure 10—figure supplement 4, p=0.1). The spike activity of 42% of recorded MSNs in DLS and DMS was correlated with performance (Figure 10D), with significant regional differences (χ2=9.4, p<0.05; Figure 10E). There was no significant difference between DMS and DLS in the proportion of neurons correlated with latency to the 1st lever press (DLS: 16.6%, N = 56; DMS: 18.8%, N = 77) or with response rate (DLS: 14.8%, N = 50; DMS: 13.7%, N = 56; Figure 10E). However, the proportion of neurons correlated with port entry latency was surprisingly higher in DMS than in DLS (DLS: 6.8%, N = 23; DMS: 11.9%, N = 49; χ2 = 5.6, p<0.05; Figure 10E).
Thus, DMS activity remains correlated with optimized performance and in fact appears more involved than DLS at the termination of the behavioral sequence. Overall, only a small fraction of neurons showed multiple behavioral correlations (Figure 10D). Correlated neurons were found in every class of neuronal response determined above (Figure 10—figure supplement 1), suggesting that each of the sequence-related activity patterns isolated in this study may contribute to ongoing performance in the DT5 task.
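The trial-by-trial correlation analysis and its shuffle control can be sketched as follows: compute a Spearman correlation between each unit's firing rate and a behavioral variable, count the fraction of significant units, and compare that fraction with the fractions obtained after shuffling the trial order of the behavioral variable 1000 times. Several details are assumptions: all units share the same trials, significance is judged with the Fisher z approximation for Spearman's rho (variance 1.06/(n − 3)) rather than an exact test, and the simulated units and latencies are hypothetical.

```python
import math

import numpy as np


def average_ranks(x):
    """1-based ranks, with tied values assigned their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)                 # rank of the last member of each tie group
    starts = ends - counts + 1               # rank of the first member
    return ((starts + ends) / 2.0)[inverse]


def spearman(x, y):
    """Spearman rho and an approximate two-sided p-value (Fisher z, variance 1.06/(n-3))."""
    rx, ry = average_ranks(np.asarray(x, float)), average_ranks(np.asarray(y, float))
    if rx.std() == 0 or ry.std() == 0:
        return 0.0, 1.0
    rho = float(np.corrcoef(rx, ry)[0, 1])
    z = math.atanh(min(max(rho, -0.999999), 0.999999)) * math.sqrt((len(rx) - 3) / 1.06)
    return rho, math.erfc(abs(z) / math.sqrt(2.0))


def fraction_correlated(rates, behavior, alpha=0.05):
    """Fraction of units whose trial-by-trial firing rate correlates with behavior."""
    return float(np.mean([spearman(r, behavior)[1] < alpha for r in rates]))


def shuffle_control(rates, behavior, n_iter=1000, seed=0):
    """Observed fraction vs. fractions after shuffling the trial order of behavior."""
    rng = np.random.default_rng(seed)
    observed = fraction_correlated(rates, behavior)
    null = np.array([fraction_correlated(rates, rng.permutation(behavior))
                     for _ in range(n_iter)])
    p = (np.sum(null >= observed) + 1) / (n_iter + 1)
    return observed, null, p


# Hypothetical session: 40 trials, 5 units coupled to latency and 5 uncoupled units.
rng = np.random.default_rng(5)
latency = rng.gamma(2.0, 0.5, 40)
coupled = [5.0 - 3.0 * latency + rng.normal(0.0, 0.5, 40) for _ in range(5)]
uncoupled = [rng.normal(5.0, 1.0, 40) for _ in range(5)]
observed, null, p = shuffle_control(coupled + uncoupled, latency)
```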

Figure 10 with 4 supplements
DLS and DMS neural activity correlates with over-trained task performance during extended training.

(A) Left: Distribution of Spearman rank correlation coefficients relating firing rate after lever insertion (0–500 ms) to the first lever press latency on a trial-by-trial basis. Examples of neurons with a negative (middle) or positive (right) correlation between firing rate and latency. (B) Left: Distribution of Spearman rank correlation coefficients relating firing rate during the sequence (from first to last lever press) to the response rate on a trial-by-trial basis. Examples of neurons with a negative (middle) or positive (right) correlation between firing rate and response rate. (C) Left: Distribution of Spearman rank correlation coefficients relating firing rate after lever retraction (0–500 ms) to the port entry latency on a trial-by-trial basis. Examples of neurons with a negative (middle) or positive (right) correlation between firing rate and port entry latency. Red lines on the correlation coefficient histograms indicate the median coefficient of significantly correlated units. (D) Venn diagram illustrating the number of units with a significant correlation between firing rate and 1st lever press latency (yellow), response rate (green) and port entry latency (blue). (E) Proportion of neurons in DMS and DLS expressing correlations with the 1st lever press latency (left), the response rate (middle), and the port entry latency (right).

https://doi.org/10.7554/eLife.49536.018

Discussion

Here, we characterized and compared DLS and DMS sequence-related activity during improvement in skilled performance across early training sessions and during extended training, likely after considerable skill consolidation. Neural activity in DMS and DLS was strongly dissimilar during early training and did not show obvious changes across sessions despite significant improvement in performance. However, DMS and DLS activity was more similar after extended training, with a comparable distribution of sequence-related behavioral correlates in the neuronal activity across the two regions. Optimized performance after extended training was not associated with DMS disengagement from behavioral control, since both DMS and DLS neurons were involved in initiation, execution and termination of the lever pressing sequence, as indicated by trial-by-trial correlations between neural firing and behavior. Furthermore, pharmacological inactivation of both DLS and DMS had moderate effects on performance. These results suggest that behavioral sequences triggered by salient stimuli may continue to involve both the DMS and DLS long after initial acquisition, even when performance is highly regular.

We previously showed that responding for sucrose reward on a free-running fixed-ratio 5 schedule remains goal directed even after 43 training sessions, while, in contrast, rats trained on the discrete-trials fixed-ratio 5 (DT5) task used here are relatively insensitive to devaluation as early as the 5th training session (Vandaele et al., 2017). In addition, rats trained under the DT5 procedure show lower trial-by-trial variability in responding than rats trained under free-running fixed-ratio 5, in agreement with greater regularity and automaticity in DT5-trained subjects (Vandaele et al., 2017). We hypothesized that the rapid development of habitual and automatic responding in rats trained under the DT5 schedule is due to the nature of lever presentation, in which insertion acts as a cue to begin lever pressing, and lever retraction signals the end of responding. Thus, sequence initiation may be triggered by lever insertion, and lever retraction removes any need to monitor the number of lever presses. We therefore expected to observe strong sequence-related activity in the DLS, in accordance with prior findings (Barnes et al., 2005; Jin and Costa, 2010; Jin et al., 2014; Martiros et al., 2018; Thorn et al., 2010). In addition, we expected DMS behavior-related activity to decrease over time as behavioral performance presumably depended increasingly on the DLS. Our findings did not fully support our expectations: we observed neural activity correlated with the behavioral sequence but did not find evidence to support a disengagement of the DMS over time.

Using standard single-neuron analyses, we found units that either increased or decreased their activity in association with one or more of the behavioral events we measured. This approach revealed differences in the relative proportions of excitations and inhibitions in DMS and DLS in early training, but did not help us distinguish the characteristic firing patterns we observed. We therefore turned to a classification approach to identify behavioral correlates in the spiking patterns of DMS and DLS neurons. This data-driven approach revealed many of the neural response types previously described during lever pressing sequences (Jin et al., 2014). We found subpopulations of Phasic neurons showing transient excitation at the start or at the end of the sequence, whereas Non-phasic neurons were characterized by sustained excitation or inhibition during the sequence. Both sustained excitation throughout sequentially-performed actions and excitation marking the beginning and termination of sequentially-performed actions have been proposed to represent the chunking of individual actions into a single unit (Barnes et al., 2005; Graybiel and Grafton, 2015; Jin and Costa, 2015; Jin and Costa, 2010; Jin et al., 2014; Smith and Graybiel, 2016), while excitations at the boundaries of sequences have been hypothesized to signal the initiation and termination of the sequence (Jin et al., 2014; Jin and Costa, 2010; Jin and Costa, 2015; but see Sales-Carbonell et al., 2018). As would be predicted by prior findings (e.g., Thorn et al., 2010), the distribution of these neural response patterns was indeed different in DMS and DLS during early training. Specifically, in the DLS, activity was characterized by a large proportion of neurons showing sustained excitation during the entirety of the series of lever presses, with a relatively low proportion of neurons expressing sustained inhibition throughout. 
A somewhat opposite pattern was observed in DMS, with excitation mostly restricted to the boundaries or delimiters of the sequence in Phasic neurons (i.e., at lever insertion and lever retraction), and more neurons expressing sustained inhibition during lever pressing. However, the activity in DMS did not diminish over time, and DLS and DMS neural activity in fact became more similar after months of overtraining.

The presence of task-bracketing activity in DMS was unexpected. Previous work has demonstrated the development of task-bracketing activity in DLS, but not in DMS, as rats learned to navigate in a T-maze (Barnes et al., 2005; Regier et al., 2015; Smith and Graybiel, 2013; Thorn et al., 2010). Indeed, in the study of Thorn and colleagues, DLS activity was characterized by excitation at the boundaries of the navigation sequence whereas DMS activity peaked in the middle, when animals chose between two directions based on instruction cues. A more recent study by Martiros et al. (2018) also reported neuronal excitation at the beginning and end of a 3-response lever press sequence in the DLS, but not in the DMS. In contrast, in the present study, DMS activity peaked at the lever insertion and lever retraction cues and was inhibited during lever pressing, whereas DLS neurons expressed sustained excitation throughout the sequence. One possible explanation for these differences is that both the T-maze task and the 3-response sequence task require the subject to move about the apparatus, and thus to link actions across space. In contrast, in the current DT5 task, rats responded repetitively upon a single operandum in the same location. In agreement with this notion, both sustained excitation and inhibition were observed in the dorsal striatum of mice similarly performing a series of lever presses (Jin and Costa, 2010; Jin et al., 2014). Sustained excitation in DLS neurons is also consistent with studies recording DLS activity in tasks involving locomotor sequences (Rueda-Orozco and Robbe, 2015; Sales-Carbonell et al., 2018).

A second possibility for why task-bracketing activity was strongly observed in the DMS is that these responses may largely represent cue-elicited activations rather than motor sequence initiation-cessation signals. Although we isolated neurons showing activity consistent with a start-stop characterization, a major distinction between the present findings and much of the prior work is that the beginning and end of the behavioral sequence in this study are cued by lever insertion and retraction. These two events proved to be strong stimuli that elicited excitatory neural activity resembling previously described start/stop or boundary neurons (Jin et al., 2014; Smith and Graybiel, 2013). Yet close examination revealed that most so-called ‘Start neurons’ responding to the lever cue at trial initiation did not respond just before the first lever press. These data suggest the alternative explanation that phasic activity associated with behavioral sequence initiation, including that observed here, may reflect sensory cues rather than strictly the initiation of a series of actions. This result is consistent with prior findings showing an absence of start signals at the initiation of a locomotor sequence (Sales-Carbonell et al., 2018). Further, Phasic Stop neurons expressed excitation at the termination of the sequence after the lever retraction cue, but also before the port entry as rats approached the magazine to retrieve the reward. This excitation was not found before isolated port entries during inter-trial intervals and may therefore encode an expectation of the reward, triggered by the lever retraction cue.

Based on our previous study, we expected a transition from goal-directed to habitual control over early training sessions (Vandaele et al., 2017). Yet, neural activity was relatively similar across the first ten sessions of DT5 training, even as markers of behavioral regularity and automaticity improved. Thus, we cannot conclude that the neural correlates we report here mediate the development of habit or the improvement in performance over time. Of note, as a group, subjects in the early training group already showed habitual responding after 10 sessions. However, our analysis suggests that the specific DMS and DLS neural patterns observed during those 10 sessions do not substantially correlate with the presence or absence of habitual control, because when we separated subjects based on habitual or goal-directed control for early (or extended) training, the neural patterns were largely similar. This conclusion, however, rests on our arbitrary separation of rats by sensitivity to outcome devaluation by satiety, which may not be the best metric for assessing habitual control, since evidence with other manipulations suggests strong habit formation in the DT5 task (Vandaele et al., 2017). Overall, it is not clear whether or how sequence-related activity relates to either skill learning/performance or habit learning/performance. Future studies could address this question by comparing dorsostriatal activity in the DT5 task with free-running instrumental behavior that remains goal directed after extended training (Vandaele et al., 2017).

We focused on neural activity at the beginning of DT5 training, but all subjects experienced prior instrumental training under CRF and DT1 during which relevant aspects of the behavior and neural activity may have become organized. Although within-sequence response rate progressively increased across the ten sessions, the number of within-sequence port entries, a marker of behavioral chunking, decreased quickly, with most of the change occurring in session one, suggesting rapid concatenation of individual lever presses into unitary sequences (Figure 1D, inset). In fact, in the DT5 task, rats do not have to track the number of presses emitted to obtain the reward and may simply continue pressing the lever until its retraction, which may favor behavioral chunking and the specific activity patterns reported in this study. Thus, rapid acquisition of the new response requirement, facilitated by the presence of cues predicting reward availability and delivery, could explain why sequence-related activity is observed in both regions very rapidly, compared to previous studies using uncued, self-paced procedures or requiring discriminatory learning (Barnes et al., 2005; Jin and Costa, 2010; Jin et al., 2014; Smith and Graybiel, 2013; Thorn et al., 2010). Including probe trials in which lever retraction is delayed would help determine to what extent behavior and neural activity depend on the lever retraction cue.

Skill learning is generally characterized by an initial phase of rapid performance improvement followed by an extended period of slow performance optimization (skill consolidation). We observed that during extended training, execution of lever press sequences was optimized and stereotyped, with within-sequence response rate, trial-by-trial variability in response rate, and 1st lever press latency reaching asymptotic levels. We hypothesized that DLS sequence-related activity would strengthen with repeated practice, whereas DMS, involved in initial goal-directed learning, would disengage, thereby increasing the disparity between these two regions. Contrary to our predictions, correlations with behavior suggest that DMS did not disengage from behavioral control after extended training (see below), and neuronal activity in DMS and DLS became more similar. Ensemble activity analysis revealed that, on average, both DMS and DLS expressed sustained excitation throughout the lever press sequence with peaks following lever insertion and retraction. While there was a moderate difference in the relative proportions of sustained inhibitory activity in the two regions, DMS and DLS neurons could not be differentiated by LDA models based on their sequence-related activity, a result that stands in sharp contrast with the dissociation between these brain regions during early training. The increase in dorsostriatal similarity after extended training resulted from an increase in sustained inhibition in DLS and a higher proportion of sustained excitation neurons in DMS relative to early training. Furthermore, we observed less activity during lever presses and stronger activity in response to lever insertion and retraction in Phasic DLS neurons after extended training compared to early training (compare Figure 3C–D). One interpretation is that these changes reflect greater cue encoding in DLS and greater action encoding in DMS. 
This suggests that skill consolidation following many weeks of training in this task leads to improved integration of both cues and actions across regions within the dorsal striatum. Few studies have examined neuronal activity in dorsal striatum across two training stages so separated in time, so it is not yet clear whether findings will be similar after extended training (months) in other tasks.

If the neural activity as characterized here does not clearly correlate with habit or automatization, what might these patterns represent? As mentioned above, the relative absence of change across early training may reflect rapid behavioral chunking promoted by the lever insertion and retraction cues. While the proportions of distinct classes of neuronal activity patterns that might represent chunking differed between DLS and DMS, both regions do show these types of patterns, and these patterns were similar after extended training. Yet, the functional contribution of these DMS and DLS signals to behavior is expected to be distinct based on the specific connectivity of these regions as nodes within parallel corticostriatal loops. After extended training, a greater proportion of DMS than DLS neurons showed significant trial-by-trial correlations between neuronal activity after the final lever press and latency to enter the reward port. This suggests that DMS neural activity may mediate the impact of associations of the lever retraction stimulus or the port entry action with the rewarding outcome (Burton et al., 2015; Ito and Doya, 2015; Kimchi and Laubach, 2009b; Kimchi and Laubach, 2009a) even after substantial training. Interestingly, similar proportions of neurons in DMS and DLS expressed correlations with the 1st lever press latency and the response rate, suggesting that both regions are involved in the initiation and execution of the sequence. In fact, pharmacological inactivation of both DLS and DMS modestly reduced responding during the task, although through different mechanisms. These results are in agreement with our correlation analysis and prior findings in showing concurrent engagement of both DLS and DMS in performance vigor (Kim et al., 2014; Panigrahi et al., 2015; Rueda-Orozco and Robbe, 2015).
Taken together, the correlation analyses and the inactivations support a role for sensorimotor striatal activity in moment-by-moment expression of behavior (Robbe, 2018; Rueda-Orozco and Robbe, 2015; Sales-Carbonell et al., 2018), suggesting that the neural representation of higher-order functions, including transitions in cognitive control, may be more clearly reflected in activity in other regions, or in broader neuronal loops that include the striatum.

To conclude, we observed dissociated DMS and DLS sequence-related activity in early training that did not evolve across early training sessions, despite significant changes in performance. The biggest change in dorsostriatal sequence-related activity occurred between early and extended training, two training stages between which behavior changed only modestly under our experimental conditions. Thus, although the activity of 41% of neurons was correlated with performance on a trial-by-trial basis, changes in sequence-related activity within and across training stages appeared to be uncoupled from changes in behavior. Furthermore, although behavioral control was on average habitual in both the early and extended training groups, individual differences in sensitivity to outcome devaluation did not correlate well with dorsostriatal activity. Thus, it is not clear how activity in DMS and DLS relates to skill learning and habit formation. One interpretation is that the absence of change across early training reflects rapid behavioral chunking promoted by the lever insertion and retraction cues. In addition, the alterations in sequence-related activity over extended training may reflect a refinement of the temporal properties of striatal circuit engagement that may not map one-to-one onto the behavioral variables we and others typically measure. It is possible that temporally correlated circuit engagement across parallel corticostriatal loops is associated with better performance and/or skill consolidation, but this remains to be tested. Together, these findings suggest that sequence-related activity in the dorsal striatum during early and extended training may reflect additional processes not captured by our behavioral indicators of sequence learning, performance optimization, or habit.

Materials and methods

Subjects

Male Long Evans rats (N = 36, Harlan, IN) were individually housed and maintained in a light- (12 hr light-dark cycle, lights on at 7 am) and temperature-controlled vivarium (21 °C). Except just before and after surgery, rats were maintained at 90% of their free-feeding weight; food rations were given 1–2 hr after daily behavioral sessions. Water was available ad libitum. This study was carried out in accordance with the recommendations of the Guide for the Care and Use of Laboratory Animals (National Research Council, 1996), and was approved by the institutional animal care and use committee of Johns Hopkins University.

Experimental groups

Following instrumental training, rats in the early training group (N = 9) underwent surgery, and, after recovery, neural activity in dorsal striatum was recorded throughout acquisition of the DT5 task, which included 3 DT1 sessions and 10 DT5 sessions. Rats in the extended training group (N = 8) received extensive training in the DT5 task with more than 45 DT5 sessions prior to surgery; neural activity in these subjects was recorded daily during an additional 9 to 22 DT5 sessions (total training: 58–67 DT5 sessions). Sensitivity to satiety-induced devaluation was assessed at the end of recording for all rats.

Pharmacological inactivation of DMS (N = 8) or DLS (N = 11) was tested in two additional groups of rats, trained in the DT5 task for 20 and 10 sessions, respectively. Effect sizes were not estimated before the experiment, but we aimed to include a minimum of 8 rats per group with correct cannula placements, based on established standards.

Behavioral training

Behavioral apparatus and DT5 task training

Training occurred in conditioning chambers designed for in vivo neural recording, housed within sound-attenuating boxes (Med Associates, St Albans, VT). The house light, located on the ceiling of the chamber, remained illuminated for the full length of every session. DT5 training was conducted as described in Vandaele et al. (2017). Rats first underwent a single 30 min magazine training session, in which reward was delivered on a variable time 60 s schedule. Rats were next trained to press the left lever to earn reward, delivered in the adjacent magazine. Rats were trained to respond for either 20% sucrose (0.1 mL delivered over 3 s) (early training: N = 6, extended training: N = 6) or a single grain-based pellet (Bioserv Biotechnology) (early training: N = 3, extended training: N = 2). Sessions were limited to 1 hr or 30 reward deliveries. Next, subjects advanced to discrete trial (DT) training, in which each session consisted of 30 trials separated by 1 min inter-trial intervals. Every trial was initiated by insertion of the left lever. For the first three sessions, a single lever press simultaneously resulted in retraction of the lever, reward delivery, and initiation of a new inter-trial interval (discrete-trial fixed ratio (FR) 1; DT1). On session 4, the response ratio was increased to 5 (discrete-trial FR5; DT5). Failure to complete the ratio within 1 min was counted as an omission and resulted in lever retraction and initiation of a new inter-trial interval.
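
The trial contingency can be expressed as a short scoring rule. The Python sketch below is illustrative only. It is not the Med Associates program used to run the sessions, and the names `run_discrete_trial` and `TrialOutcome` are hypothetical. It assumes press times are given in seconds from lever insertion.

```python
from dataclasses import dataclass

RATIO = 5             # DT5: presses required per reward (DT1 would use 1)
TRIAL_LIMIT_S = 60.0  # lever stays inserted for at most 1 min


@dataclass
class TrialOutcome:
    rewarded: bool
    presses: int
    lever_retracted_at: float  # seconds after lever insertion


def run_discrete_trial(press_times, ratio=RATIO, limit_s=TRIAL_LIMIT_S):
    """Score one discrete trial from press times (s after lever insertion).

    The lever retracts, and reward is delivered, on the press that completes
    the ratio. Otherwise it retracts at the time limit and the trial is an
    omission (no reward). Presses after retraction are ignored.
    """
    count = 0
    for t in sorted(press_times):
        if t > limit_s:
            break
        count += 1
        if count == ratio:
            return TrialOutcome(True, count, t)
    return TrialOutcome(False, count, limit_s)


# Usage: a completed DT5 trial and an omission.
done = run_discrete_trial([0.8, 1.1, 1.4, 1.7, 2.0])
missed = run_discrete_trial([1.0, 2.0, 70.0])
```

Setting `ratio=1` reproduces the DT1 sessions, in which a single press retracts the lever.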

Outcome devaluation by sensory-specific satiety

To avoid neophobia, rats were pre-exposed to the control reward for 30 min in feeding cages one or two days before the first devaluation test. Each rat received 2 days of testing, separated by one reinforced training session. On the first test day, half of the rats were given free access to their training reward (devalued condition), while the other half received a control reward that never served as a reinforcer (valued condition). Grain-based pellets served as the control reward for rats trained with liquid sucrose, and liquid sucrose served as the control reward for rats trained with pellets. Pre-feeding occurred for 1 hr in feeding cages in the experimental room. Immediately after pre-feeding, rats were tethered and placed in the recording chambers for a 10-trial extinction session, identical to training except that no reward was delivered. On the second test day, rats were pre-fed with the alternative reward before the 10-trial extinction test.

Surgery

Rats underwent surgery under isoflurane anesthesia (0.5–5%) after receiving pre-operative injections of cefazolin (75 mg/kg, antibiotic) and carprofen (5 mg/kg, analgesic). Topical lidocaine was applied for local analgesia. Rats in the early training group were implanted with two unilateral arrays of 8 wires each, aimed at DMS and DLS (0.004″ steel wires arranged in a 2 × 4 configuration; arrays spaced 2 mm apart; Microprobes). Rats in the extended training group were implanted with two unilateral arrays of 8 electrodes each (0.004″ tungsten wires arranged in 2 bundles spaced 2 mm apart and aimed at DMS and DLS) attached to a microdrive that allowed the entire array to be lowered in 160 µm increments. For both groups, target coordinates were +0.25 mm AP, +2.3 mm ML, −4.6 mm DV for DMS and +0.25 mm AP, +4.3 mm ML, −4.6 mm DV for DLS. Rats were allowed at least 5 days of post-operative recovery before neural recording began.

For the inactivation experiments, 26 gauge guide cannulae (Plastic One, Roanoke, Virginia) were implanted bilaterally in 19 rats and targeted to either the DLS (+0.48 mm AP, +4 mm ML, −2 mm DV; final DV: −5 mm) or DMS (+0.36 mm AP, +2.2 mm ML, −3 mm DV; final DV: −4 mm). Inactivation was achieved with a mixture of baclofen (γ-aminobutyric acid-B receptor agonist) and muscimol (γ-aminobutyric acid-A receptor agonist) (B/M: 1.0/0.1 mM; Sigma, St Louis, MO) in a volume of 0.3 µL delivered over 1 min via 33 gauge infusers, 10 min before testing. Saline vehicle was administered as the control condition. All subjects were tested in both conditions, with the order of conditions counterbalanced across subjects.
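
The infusion arithmetic can be worked through directly. Since 1 mM equals 1 nmol/µL, the amount delivered is concentration × volume. This sketch assumes that the 0.3 µL volume refers to each infusion site. The helper name is hypothetical.

```python
def infused_nmol(conc_mM, volume_uL):
    """1 mM = 1 mmol/L = 1 nmol/uL, so amount (nmol) = concentration (mM) x volume (uL)."""
    return conc_mM * volume_uL


VOLUME_UL = 0.3      # per infusion site (assumption)
DURATION_MIN = 1.0   # infusion duration from the text

baclofen_nmol = infused_nmol(1.0, VOLUME_UL)   # 0.3 nmol
muscimol_nmol = infused_nmol(0.1, VOLUME_UL)   # about 0.03 nmol
flow_uL_per_min = VOLUME_UL / DURATION_MIN     # 0.3 uL/min
```

Under that assumption, each site receives about 0.3 nmol baclofen and 0.03 nmol muscimol at a flow rate of 0.3 µL/min.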

Electrophysiological recordings

Recording

Single-unit activity was acquired during behavior as described previously (Ottenheimer et al., 2018; Richard et al., 2016; Richard et al., 2018), by connecting recording cables to the rats’ headsets and, at the other end, to a commutator that allowed free movement throughout the recording session. Amplified signals were further processed and stored, along with time stamps of behavioral events, by a multichannel neural recording system (MAP system; Plexon Inc, TX). In the early training group, fixed electrodes allowed recording from the same location throughout training. In the extended training group, the microdrive carrying the electrode arrays was lowered by 160 µm at the end of every other session in order to record from a new set of neurons, and units recorded at any given electrode location were included in the analysis from only one session.
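
The microdrive schedule in the extended training group can be sketched as a mapping from session to depth. The text does not state which of the two sessions recorded at each depth was kept. For illustration only, the sketch assumes the first one was, and that the drive was first advanced after the second recorded session. The function names are hypothetical.

```python
STEP_UM = 160  # microdrive advance applied after every second session


def depth_offset_um(session):
    """Depth below the starting position for a 0-based session index."""
    return (session // 2) * STEP_UM


def sessions_to_analyze(n_sessions):
    """Keep one session per depth; taking the first one is a hypothetical choice."""
    kept = {}
    for s in range(n_sessions):
        kept.setdefault(depth_offset_um(s), s)
    return sorted(kept.values())
```

For example, six recorded sessions span three depths (0, 160 and 320 µm), so at most three sessions contribute units.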

Analysis of electrophysiological recordings

Spike sorting

Individual units were isolated offline using principal component analysis in Offline Sorter (Plexon) as described previously (Ottenheimer et al., 2018; Richard et al., 2016; Richard et al., 2018). Autocorrelograms, cross-correlograms and the distribution of inter-spike intervals were used to ensure good isolation of single units. Only units with well-defined waveforms and stable characteristics throughout the entire recording session were included in the analyses. Sorted units were exported to Neuro-Explorer (Plexon) for extraction of neuron and event timestamps, and then to MATLAB (MathWorks) for further analysis.
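
A common way to use the inter-spike interval distribution as an isolation check is to measure the fraction of intervals shorter than a refractory period. The text gives neither a threshold nor an acceptance criterion. The 2 ms value below is an assumption, and the function is a hypothetical helper rather than an Offline Sorter feature.

```python
import numpy as np


def isi_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of inter-spike intervals shorter than an assumed refractory period."""
    isis = np.diff(np.sort(np.asarray(spike_times_s, dtype=float)))
    if isis.size == 0:
        return 0.0  # fewer than two spikes: no intervals to evaluate
    return float(np.mean(isis < refractory_s))
```

A well-isolated unit should show almost no intervals below the refractory period. A high fraction suggests contamination by spikes from other neurons.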

Waveform analysis

To separate putative fast-spiking interneurons (FSI) from putative tonically active neurons (TAN) and medium spiny neurons (MSN), neurons with a firing rate >20 Hz and a narrow half-valley width (<0.15 ms) were defined as putative-FSI. Neurons with intermediate characteristics (firing rate between 12.5 and 20 Hz) were left unclassified and excluded from analysis. The remaining neurons were defined as putative-TAN/MSN. Given the low proportion of TAN in the dorsal striatum (about 1%) (Oorschot, 2013; Schmitzer-Torbert and Redish, 2008), we reasoned that including these neurons in our analysis would not substantially affect the pattern of results presented in this study.
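
The waveform-based rule can be written as a small decision function. This is a literal reading of the criteria above, with two assumptions flagged in the comments. First, the 12.5–20 Hz bounds are treated as inclusive. Second, units above 20 Hz with a wide half-valley width fall into the "remaining" putative-TAN/MSN group.

```python
def classify_unit(rate_hz, half_valley_ms):
    """Putative cell class from firing rate and half-valley width (literal reading)."""
    if rate_hz > 20.0 and half_valley_ms < 0.15:
        return "FSI"
    if 12.5 <= rate_hz <= 20.0:  # inclusive bounds are an assumption
        return "unclassified"     # excluded from analysis
    return "TAN/MSN"              # all remaining units, including fast but wide ones
```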

Characterization of task-responsive neurons and z-scores

We assessed neuron responses to each event by conducting t-tests on firing rates during 250 ms periods pre- and post-event, from lever insertion to port entry, in comparison to a 5 s baseline period occurring 20 s prior to each event during the inter-trial interval. Neurons with low baseline firing rate (<0.5 Hz) were not considered. Neurons expressing a significant response (p<0.01) to at least one of the 12 events of the behavioral sequence were considered as task responsive neurons (TRN). Neurons not classified as task-responsive were deemed ‘non task-responsive’ (nonTRN). Importantly, both TRN and nonTRN were included in classification, decoding and correlation analyses.
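
The following is a minimal sketch of the responsiveness test. It assumes a paired t-test across trials between each 250 ms event window and its matched baseline, because the text specifies t-tests but not whether they were paired. It also assumes the 0.5 Hz criterion applies to the mean baseline rate. The input layout (one row per window, one column per trial) and the names are hypothetical.

```python
import numpy as np
from scipy import stats

ALPHA = 0.01           # significance threshold from the text
MIN_BASELINE_HZ = 0.5  # units below this mean baseline rate are not considered


def task_responsive(window_rates, baseline_rates, alpha=ALPHA):
    """Paired t-test of each event window against its matched baseline (assumed pairing).

    window_rates, baseline_rates: (n_windows, n_trials) firing rates in Hz, one row
    per 250 ms pre- or post-event window and its 5 s baseline taken 20 s earlier.
    Returns None for low-rate units, else (is_responsive, p_values).
    """
    window_rates = np.asarray(window_rates, dtype=float)
    baseline_rates = np.asarray(baseline_rates, dtype=float)
    if baseline_rates.mean() < MIN_BASELINE_HZ:
        return None
    p_values = np.array([stats.ttest_rel(w, b).pvalue
                         for w, b in zip(window_rates, baseline_rates)])
    return bool(np.any(p_values < alpha)), p_values


# Synthetic unit: 12 windows x 30 trials, with a clear increase in window 3 only.
rng = np.random.default_rng(0)
baseline = rng.normal(5.0, 1.0, size=(12, 30))
windows = baseline + rng.normal(0.0, 1.0, size=(12, 30))
windows[3] += 5.0
responsive, p_values = task_responsive(windows, baseline)
```

Testing 12 windows per neuron at p < 0.01 means some false positives are expected across a large population. The shuffle controls used for the correlation analyses address a related issue.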

The firing rate of each neuron was normalized as follows: (Fi-Fmean)/Fsd, where Fmean and Fsd are the mean and standard deviation of the firing rate during the 5 s baseline period, and Fi is the firing rate at the ith bin of the PSTH. Heatmaps and average PSTHs presented in this study represent the z-scores from −250 ms to 250 ms around each event of the behavioral sequence.

Among TRNs, neurons were considered as excited during the sequence if the average of z-scores from the lever insertion to the port entry was above zero, and were considered as inhibited during the sequence if this average was below zero.
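
The z-score normalization and the excited/inhibited split translate directly into code. The sketch assumes that Fsd is the sample standard deviation (ddof=1), which the text does not specify. It also assumes that baseline bins use the same bin width as the PSTH. Function names are hypothetical.

```python
import numpy as np


def zscore_psth(psth_hz, baseline_hz):
    """(Fi - Fmean) / Fsd, with Fmean and Fsd taken from the 5 s baseline bins."""
    base = np.asarray(baseline_hz, dtype=float)
    return (np.asarray(psth_hz, dtype=float) - base.mean()) / base.std(ddof=1)


def sequence_direction(z_sequence):
    """Excited if the mean z-score from lever insertion to port entry is > 0, inhibited if < 0."""
    m = float(np.mean(z_sequence))
    if m > 0:
        return "excited"
    if m < 0:
        return "inhibited"
    return "neither"  # a mean of exactly zero is not covered by the text
```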

Classification of distinct neural activity patterns

Phasic and Non-phasic neurons were separated using a Fourier analysis followed by hierarchical clustering and a model selection metric (Figure 3—figure supplement 1). Specifically, we reasoned that Non-phasic neurons, which by definition express sustained modulation of activity along the behavioral sequence, would have higher power at low frequencies (<1 Hz) than Phasic neurons, which express transient peaks in activity. Note that the Fourier analysis is not used here to characterize inherent oscillatory activity, but as a tool to distinguish the time courses of event-evoked spiking patterns. To capture this distinction quantitatively, we built, for each neuron, a vector of normalized activity aligned to each event of the sequence, with successive time windows covering all 7 events, and computed its power in the low (<1 Hz) and intermediate (1–4 Hz) frequency bands (fft function in MATLAB; Figure 3—figure supplement 1). We used these two values as features and applied hierarchical clustering to the dataset (linkage function in MATLAB with the ‘ward’ parameter; Figure 3—figure supplement 1C). We then determined the optimal number of clusters by applying the Calinski-Harabasz (CH) model selection criterion, which assesses the statistical separability of the clusters obtained for solutions with 1 to 10 clusters (evalclusters function in MATLAB with parameters ‘linkage’, ‘CalinskiHarabasz’, ‘Klist’, [1:10]). The optimal number of clusters identified by the CH criterion was 2. The separability of these two clusters was verified using a permutation test (Figure 3—figure supplement 1F).
We note that our choice of 1 Hz and 4 Hz as the low and intermediate frequency limits for extracting clustering features was informed by our initial reasoning and by visual inspection of the data; however, these specific cut-off values were ultimately arbitrary. To explore the sensitivity of our results to these cut-offs, we systematically varied both values (Figure 3—figure supplement 1E) and found no substantial impact on the clustering results: similar outcomes, in terms of the optimal number of clusters and their separability, were obtained in 60% of the cases (Figure 3—figure supplement 1E–F).
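
The Phasic/Non-phasic separation can be approximated in Python with NumPy's FFT and SciPy's Ward linkage. This is a sketch, not the authors' MATLAB code, and it makes several assumptions. It assumes 10 ms bins, as in the PCA-based decoding analysis. It counts the DC term in the <1 Hz band and uses raw (unscaled) band power as the features. It scores only k = 2 to 10, because the Calinski-Harabasz index is undefined for a single cluster. The demo profiles are synthetic.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

BIN_S = 0.010  # assumed bin width: 10 ms, i.e. 7 events x 51 bins per neuron


def band_powers(vec, bin_s=BIN_S, low_hz=1.0, high_hz=4.0):
    """Spectral power below low_hz (DC included) and in [low_hz, high_hz)."""
    spectrum = np.abs(np.fft.rfft(vec)) ** 2
    freqs = np.fft.rfftfreq(len(vec), d=bin_s)
    low = spectrum[freqs < low_hz].sum()
    mid = spectrum[(freqs >= low_hz) & (freqs < high_hz)].sum()
    return low, mid


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    centroid = X.mean(axis=0)
    ks = np.unique(labels)
    between = sum((labels == k).sum() * np.sum((X[labels == k].mean(axis=0) - centroid) ** 2)
                  for k in ks)
    within = sum(np.sum((X[labels == k] - X[labels == k].mean(axis=0)) ** 2) for k in ks)
    n, k = len(X), len(ks)
    return (between / (k - 1)) / (within / (n - k))


def cluster_by_band_power(activity, k_max=10):
    """Ward clustering on (low, mid) band power; pick k in 2..k_max by the CH index."""
    feats = np.array([band_powers(v) for v in activity])
    tree = linkage(feats, method="ward")
    scores = {}
    for k in range(2, k_max + 1):
        labels = fcluster(tree, t=k, criterion="maxclust")
        if 2 <= len(np.unique(labels)) < len(feats):
            scores[k] = calinski_harabasz(feats, labels)
    best_k = max(scores, key=scores.get)
    return fcluster(tree, t=best_k, criterion="maxclust"), best_k


# Synthetic demo: sustained profiles versus rhythmic 2 Hz transients.
rng = np.random.default_rng(0)
t = np.arange(7 * 51) * BIN_S
sustained = [np.full(t.size, 1.0 + 0.05 * rng.standard_normal()) for _ in range(20)]
transient = [(1.0 + 0.05 * rng.standard_normal()) * np.sin(2 * np.pi * 2.0 * t)
             for _ in range(20)]
labels, best_k = cluster_by_band_power(np.array(sustained + transient))
```

Because the features are unscaled, the low-frequency band can dominate the Ward distances. Whether the original analysis rescaled the features is not stated.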

Following separation of Phasic and Non-phasic neurons, we further classified each population using hierarchical clustering. To account for the strong disparity in z-scores across neurons, activity was normalized as follows: (Fi-Fmean)/F|max|, where Fi is the firing rate at the ith bin of the PSTH, Fmean is the mean firing rate during the 5 s baseline period, and F|max| is the absolute maximum neural response. Three features were used in the hierarchical clustering algorithm: the mean normalized activity at the initiation (0–250 ms after lever insertion), execution (250 ms periods around each lever press) and termination (250 ms prior to port entry) of the sequence. The number of clusters retained was determined using the Calinski-Harabasz criterion.
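
The max-normalization and the three sequence features can be sketched as follows. Several readings here are assumptions. F|max| is taken as the largest absolute deviation from baseline. The sequence is assumed to comprise lever insertion, five presses and port entry, each aligned from −250 to 250 ms in 10 ms bins. The execution feature averages the full window around each press. Names are hypothetical.

```python
import numpy as np

T_MS = np.arange(-250, 251, 10)  # 51 bin times around each event (assumed)
# Assumed row order: lever insertion, lever presses 1-5, port entry.


def max_normalize(psth_hz, baseline_mean_hz):
    """(Fi - Fmean) / F|max|, reading F|max| as the largest |Fi - Fmean| (assumption)."""
    dev = np.asarray(psth_hz, dtype=float) - baseline_mean_hz
    peak = np.max(np.abs(dev))
    return dev / peak if peak > 0 else dev


def sequence_features(norm):
    """norm: (7, 51) normalized activity. Returns [initiation, execution, termination]."""
    norm = np.asarray(norm, dtype=float)
    initiation = norm[0, (T_MS >= 0) & (T_MS < 250)].mean()  # 0-250 ms after insertion
    execution = norm[1:6].mean()                              # windows around presses 1-5
    termination = norm[6, T_MS < 0].mean()                    # 250 ms before port entry
    return np.array([initiation, execution, termination])
```

The resulting three-element vectors are what would be passed to Ward clustering, with the number of clusters chosen by the Calinski-Harabasz criterion as described above.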

To investigate the representation of each Phasic and Non-phasic class during early training, we first separated Phasic and Non-phasic neurons using the Fourier analysis described above. We then trained a random forest classifier (Breiman, 2001; Liaw and Wiener, 2002) on the extended training dataset, using the Phasic (Start, Stop, Middle) or Non-phasic (EXC, INH) classes as labels and the mean normalized activity at the start, middle and end of the sequence as features. The classifier was then tested on the early training dataset. The ‘out-of-bag’ error estimates from the random forest classifier were 0.036 and 0 for the Phasic and Non-phasic classes, respectively. These low generalization errors provide additional validation of the classification approach used here.
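
The authors cite the R randomForest package. The sketch below swaps in scikit-learn's `RandomForestClassifier`, whose `oob_score_` attribute gives out-of-bag accuracy, so the OOB error is 1 − `oob_score_`. The 500 trees mirror the R package's default. The training and test features are synthetic placeholders, not recorded neurons, and `fit_and_transfer` is a hypothetical name.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def fit_and_transfer(X_train, y_train, X_test, n_trees=500, seed=0):
    """Fit on one dataset (e.g. extended training); return labels for another and OOB error."""
    forest = RandomForestClassifier(n_estimators=n_trees, oob_score=True, random_state=seed)
    forest.fit(X_train, y_train)
    oob_error = 1.0 - forest.oob_score_
    return forest.predict(X_test), oob_error


# Synthetic placeholders: features are mean normalized activity at start, middle, end.
rng = np.random.default_rng(0)
centers = {"Start": [1.0, 0.0, 0.0], "Middle": [0.0, 1.0, 0.0], "Stop": [0.0, 0.0, 1.0]}


def fake_neurons(n_per_class):
    y = np.repeat(list(centers), n_per_class)
    X = np.array([centers[c] for c in y]) + 0.1 * rng.standard_normal((y.size, 3))
    return X, y


X_ext, y_ext = fake_neurons(30)
X_early, y_early = fake_neurons(20)
predicted, oob_error = fit_and_transfer(X_ext, y_ext, X_early)
```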

Decoding

A linear discriminant analysis (LDA) model (the ‘fitcdiscr’ function in MATLAB) was trained on mean peri-event z-scores (from lever insertion to port entry; vector of 12 elements) for 90% of the full dataset (i.e., putative MSNs). This model was then used to classify each of the remaining 10% of neurons as belonging to DLS or DMS. This was repeated for each 10% of the dataset in a cross-validation approach. To account for the unequal numbers of neurons recorded in DLS and DMS, we performed the analysis on the same number of neurons in each region by randomly sampling DMS neurons to match the number of recorded DLS neurons. We performed this analysis on 50 random selections of neurons pooled from the different rats in a given session, and averaged performance across all 50 repetitions to determine model accuracy. We also conducted the same analysis with region identities shuffled to determine the accuracy expected by chance. We compared the mean decoding accuracy across early training sessions. During extended training, we performed the analysis on 50 random selections of ensembles of different sizes (60, 120, 200, 400, 600 neurons), pooled from different sessions and rats, and compared decoding accuracy across ensemble sizes.
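
The following is a Python analogue of the balanced, 10-fold LDA decoding. It uses scikit-learn's `LinearDiscriminantAnalysis` in place of MATLAB's `fitcdiscr`. On each repetition it subsamples the larger region to the size of the smaller one, and it can shuffle region labels to estimate chance accuracy. The function name and the synthetic inputs are illustrative assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold, cross_val_score


def balanced_decoding_accuracy(X_dms, X_dls, n_repeats=50, shuffle_labels=False, seed=0):
    """Mean 10-fold LDA accuracy over repeated, size-matched draws of DMS and DLS neurons.

    X_dms, X_dls: (n_neurons, 12) mean peri-event z-scores per neuron.
    """
    rng = np.random.default_rng(seed)
    n = min(len(X_dms), len(X_dls))  # match the smaller region
    accuracies = []
    for _ in range(n_repeats):
        dms = X_dms[rng.choice(len(X_dms), size=n, replace=False)]
        dls = X_dls[rng.choice(len(X_dls), size=n, replace=False)]
        X = np.vstack([dms, dls])
        y = np.repeat([0, 1], n)  # 0 = DMS, 1 = DLS
        if shuffle_labels:
            y = rng.permutation(y)  # chance-level control
        folds = KFold(n_splits=10, shuffle=True, random_state=int(rng.integers(1_000_000)))
        accuracies.append(cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=folds).mean())
    return float(np.mean(accuracies))


# Synthetic example with a clear DMS/DLS difference.
rng = np.random.default_rng(1)
X_dms = rng.normal(-1.0, 1.0, size=(60, 12))
X_dls = rng.normal(1.0, 1.0, size=(40, 12))
true_acc = balanced_decoding_accuracy(X_dms, X_dls, n_repeats=10)
chance_acc = balanced_decoding_accuracy(X_dms, X_dls, n_repeats=10, shuffle_labels=True)
```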

To account for subtle modulations of activity along the behavioral sequence not captured by mean peri-event z-scores, we conducted a principal component analysis (PCA) on the concatenated z-scores around each event of the behavioral sequence (−250 to 250 ms in 10 ms bins, giving a vector of 7 events × 51 bins). We then trained LDA models on these principal components for each early training session and during extended training. We first compared decoding accuracy as a function of the number of principal components included in the analysis. This allowed us to determine the minimal number of principal components necessary to reliably decode brain region in both datasets, as a function of session or ensemble size (Figure 7—figure supplement 1). We then repeated this analysis using the neurons’ first three principal components, for each early training session and during extended training.
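
For the PCA variant, a scikit-learn pipeline keeps the PCA fit inside each training fold. The text does not state whether the original PCA was fit on the full dataset or within folds, so this is one reasonable choice rather than a reproduction. The synthetic DLS/DMS offset is for demonstration only, and the function name is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline

N_EVENTS, N_BINS = 7, 51  # -250..250 ms in 10 ms bins around each event


def pca_lda_accuracy(Z, region, n_components=3, seed=0):
    """10-fold accuracy of LDA on the first principal components.

    Z: (n_neurons, N_EVENTS * N_BINS) concatenated z-scores; region: 0 = DMS, 1 = DLS.
    The pipeline refits PCA inside each training fold.
    """
    model = make_pipeline(PCA(n_components=n_components), LinearDiscriminantAnalysis())
    folds = KFold(n_splits=10, shuffle=True, random_state=seed)
    return float(cross_val_score(model, Z, region, cv=folds).mean())


# Accuracy as a function of the number of components (synthetic data).
rng = np.random.default_rng(0)
region = np.repeat([0, 1], 40)
Z = rng.standard_normal((80, N_EVENTS * N_BINS)) + np.where(region[:, None] == 1, 1.0, -1.0)
accuracy_by_k = {k: pca_lda_accuracy(Z, region, n_components=k) for k in (1, 2, 3, 5)}
```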

To compare the accuracy of region decoding across sessions and ensemble sizes, we performed permutation tests on the true and shuffled data in each early training session and for each ensemble size during extended training.
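
The text does not detail the permutation statistic. As an assumption, the sketch compares two sets of decoding accuracies with a two-sided permutation test on the difference in means. For example, these could be the 50 true-label and 50 shuffled-label repetitions. The accuracies in the usage lines are synthetic.

```python
import numpy as np


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in means of two samples."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # reassign values to the two groups at random
        diff = perm[:a.size].mean() - perm[a.size:].mean()
        count += abs(diff) >= abs(observed)
    return (count + 1) / (n_perm + 1)  # add-one correction keeps p > 0


# Synthetic accuracies from true-label and shuffled-label repetitions.
rng = np.random.default_rng(1)
true_accs = 0.9 + 0.02 * rng.standard_normal(50)
shuffled_accs = 0.5 + 0.05 * rng.standard_normal(50)
p = permutation_p_value(true_accs, shuffled_accs, n_perm=2000)
```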

Correlation analysis and shuffled data controls

To assess relationships between the neural activity of individual neurons and behavior, we computed Spearman’s rank correlations. We examined the trial-by-trial correlation between firing rate at the initiation (0–500 ms after lever insertion), execution (from first to last lever press) and termination (0–500 ms after lever retraction) of the sequence and, respectively, the first lever press latency, response rate, and port-entry latency. Neural activity during execution of the lever press sequence was computed by normalizing the number of spikes between the first and last lever press to the sequence duration. As in Richard et al. (2018), to control for significant correlations occurring spuriously in a small percentage of units, we ran Spearman correlations on 1000 iterations of shuffled data for each behavioral variable (Richard et al., 2016; Richard et al., 2018). For each iteration, the trial-by-trial latencies or response rates were randomly shuffled for each neuron, and Spearman correlations relating behavior and firing rate during the relevant event window were assessed for each neuron. We then evaluated the distributions of both the number of units with significant correlations (at p<0.05) and the mean correlation coefficient, to determine to what extent the observed values stood apart from the shuffled data distributions (Figure 10—figure supplements 2–4).
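
The shuffle control can be sketched with SciPy's `spearmanr`. On each iteration, behavior is permuted independently for each neuron. Both the number of significant units (p < 0.05) and the mean correlation coefficient are recorded, giving null distributions for the observed values. Array shapes and function names are hypothetical. The demo uses fewer iterations than the 1000 in the text to keep it fast.

```python
import numpy as np
from scipy.stats import spearmanr


def correlation_summary(rates, behavior, alpha=0.05):
    """Count units with significant Spearman correlations and return the mean rho.

    rates, behavior: (n_units, n_trials) per-trial firing rates and the matching
    behavioral variable (e.g. port-entry latency).
    """
    results = [spearmanr(r, b) for r, b in zip(rates, behavior)]
    n_sig = sum(p < alpha for _, p in results)
    mean_rho = float(np.mean([rho for rho, _ in results]))
    return n_sig, mean_rho


def shuffle_control(rates, behavior, n_iter=1000, seed=0):
    """Observed summary plus null distributions from per-unit trial shuffles."""
    rng = np.random.default_rng(seed)
    observed = correlation_summary(rates, behavior)
    null = np.array([
        correlation_summary(rates, [rng.permutation(b) for b in behavior])
        for _ in range(n_iter)
    ])
    return observed, null[:, 0], null[:, 1]


# Synthetic example: firing decreases with latency in every unit.
rng = np.random.default_rng(1)
latency = rng.exponential(1.0, size=(20, 30))
rates = 10 - 2 * latency + rng.normal(0, 0.5, size=(20, 30))
(obs_n, obs_rho), null_n, null_rho = shuffle_control(rates, latency, n_iter=200)
```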

Statistical analysis

Mean response rate (in resp/s) was computed as the number of lever presses in each trial divided by the time of lever availability, averaged across trials. Data following a normal distribution were subjected to repeated measures analysis of variance (ANOVA). Appropriate non-parametric tests (sign test, Kolmogorov-Smirnov) were used when the normality assumption was violated. For neural population measures, mean z-scores were computed by averaging time bins from 0 to 250 ms pre- and post-event, from lever insertion to the port entry following reward delivery. Mean normalized population activity in DMS and DLS was compared across events using repeated-measures ANOVA (with Greenhouse-Geisser correction for violations of sphericity), with event as the within factor and region as the between factor. During early training, session was included as a between factor. Proportions of neurons were compared using chi-squared tests. Statistical analyses were performed in MATLAB (MathWorks) and Statistica (StatSoft 7.0).
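
The response rate is computed per trial before averaging, so it differs from total presses divided by total lever time. The following is a small sketch with hypothetical trial values, where the third trial is an omission in which the lever stayed in for the full 60 s. The text does not say whether omission trials were included in the average.

```python
def mean_response_rate(presses_per_trial, lever_time_s):
    """Average over trials of presses / lever-availability time (resp/s)."""
    per_trial = [n / t for n, t in zip(presses_per_trial, lever_time_s)]
    return sum(per_trial) / len(per_trial)


# Hypothetical trials: two completed DT5 trials and one omission (lever in for 60 s).
presses = [5, 5, 3]
lever_time = [2.5, 2.0, 60.0]
rate = mean_response_rate(presses, lever_time)   # (2.0 + 2.5 + 0.05) / 3, about 1.52
pooled = sum(presses) / sum(lever_time)          # 13 / 64.5, about 0.20
```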

Histology

To verify electrode and cannula placements, animals were deeply anesthetized with pentobarbital and, when appropriate, electrode sites were labeled by passing a DC current through each electrode. All rats were perfused intracardially with 0.9% saline followed by 4% paraformaldehyde (extended training group, tungsten wires) or 4% paraformaldehyde with 3% potassium ferricyanide (early training group, steel wires). Brains were removed, post-fixed in 4% paraformaldehyde for 4–24 hr, cryoprotected in 20% sucrose for >48 hr, sectioned at 50 µm on a cryostat, and stained with cresyl violet.

References

  1. Liaw A, Wiener M. 2002. Classification and regression by RandomForest. R News 2:18–22.
  2. Smith KS, Graybiel AM. 2016. Habit formation. Dialogues in Clinical Neuroscience 18:33–43.
  3. Vandaele Y, Pribut HJ, Janak PH. 2017. Lever insertion as a salient stimulus promoting insensitivity to outcome devaluation. Frontiers in Integrative Neuroscience 11. doi: 10.3389/fnint.2017.00023
  39. 39

Decision letter

  1. Kate M Wassum
    Senior Editor; University of California, Los Angeles, United States
  2. Naoshige Uchida
    Reviewing Editor; Harvard University, United States
  3. Naoshige Uchida
    Reviewer; Harvard University, United States
  4. David Robbe
    Reviewer; INSERM U1249, Aix-Marseille University, France

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Distinct recruitment of dorsomedial and dorsolateral striatum erodes with extended training" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Naoshige Uchida as the Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Kate Wassum as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: David Robbe (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

Summary:

Vandaele and colleagues monitored the activity of neurons in the dorsomedial and dorsolateral striatum (DMS and DLS, respectively) while rats were trained in a discrete trials fixed ratio-5 (DT5) procedure. A prevailing view in the field is that DMS and DLS are involved in goal-directed and habitual behaviors, respectively, and that training shifts the involvement of DMS to DLS as the behavior becomes more habitual. Furthermore, patterns of activity related to sequence learning, such as activity marking the start and end of a motor sequence and sustained activity between them, are thought to underlie habitual motor behaviors. The present study found that although neuronal activity in DMS and DLS differs early in training, their activity patterns become similar after overtraining. The data also showed that the activity of some neurons in both DMS and DLS was correlated with movement parameters, and that inactivation of either DMS or DLS affected the animal's performance, indicating that these regions are involved in the control of behavior.

These results challenge the prevailing idea regarding the activity patterns related to habitual behavioral control as well as the involvement of these areas in habit learning. The reviewers pointed out potential weaknesses such as that (1) the behavior of the animal becomes insensitive to the devaluation test (i.e. habitual) relatively early and it is unclear whether the authors were able to record in the period during which the behavior is clearly goal-directed, and therefore, that (2) the authors were unable to identify particular features of neuronal activity that correlated with transition from goal-directed to habitual behaviors. Conversely, the authors were unable to identify what features of behaviors are correlated with the changes in neuronal activity during overtraining.

The reviewers' overall evaluations varied somewhat, with some reviewers being more critical of the caveats mentioned above. However, all the reviewers thought that the results, in particular the convergence of activity patterns between DMS and DLS after overtraining, are surprising. They agreed that this study contains important results that will challenge the prevailing views regarding how DMS and DLS regulate goal-directed and habitual behaviors, and that it highlights subtleties of prevailing paradigms and of the interpretation of previous results.

Essential revisions:

1) The authors discuss "engagement" of DMS and DLS in habit formation. For instance, in the Abstract, the authors state that "These results suggest that behavioral sequences may continue to engage both striatal regions long after initial acquisition, when performance is highly regular and habitual". However, later in the manuscript, the authors also point out the difficulty of relating the patterns of activity observed in the data and specific phases of goal-directed to habitual transition. In the last paragraph of Discussion, the authors discuss this issue clearly: "Thus, it is not clear whether or how sequence-related activity relates to either skill learning/performance or habit learning/performance". The reviewers found this apparent discrepancy very confusing. First, are there any behavioral changes that occur during overtraining? The reviewers thought that other behavioral measures might be better for evaluating habit or the transition from the early to over-trained phases. Second, the reviewers found that the last sentence of Discussion is insightful and provides a good summary of the implication of the study. We suggest that the authors emphasize the content discussed in this paragraph, and rewrite the Results and other parts of Discussion sections so that they are more consistent with one another.

2) The reviewers thought that additional experiments could clarify some of the remaining issues. These include showing recording data from probe tests and from trials in which the sequence was not completed, showing recording data from even earlier in training (e.g. DT1), and looking at side-by-side recordings from rats performing free-running FR5 (please see the individual comments for details). These experiments are not required, in light of eLife's policy that requested revisions be feasible within an approximate 2 month time frame. Nevertheless, we urge the authors to consider these experiments, or at least discuss them in the manuscript.

3) The reviewers pointed out various issues in the data analysis (see below the individual comments). We request that the authors address these issues by additional analyses and revising the manuscript.

4) It is unclear at what point in training the DMS and DLS inactivations were done (i.e. after how many weeks/sessions of DT5 training). Please clarify. If it was after very extended DT5 training then this is really an important point: different than previous lesion studies, and important for their argument that DMS and DLS are still engaged in the task after the long extended training.

5) The analysis comparing the activity patterns between goal-directed and habitual animals is very informative. We request that this analysis (Supplementary Figure 12) be moved to a main figure.

Overall, we think that the authors can address the essential points and support the authors' main conclusions without additional experiments. Additional experiments are not required but would be very useful in clarifying specific issues.

Reviewer #1:

Previous studies have indicated that the dorsomedial striatum (DMS) is involved in goal-directed behaviors while the dorsolateral striatum (DLS) is involved in habitual behaviors. A common view is that behavioral overtraining shifts the involvement of DMS to DLS. In accord with this, several studies have indicated that patterns of neural activity that are thought to bracket or chunk a sequence of movements appear in the DLS after overtraining. Some recent studies, however, have provided data indicating that a simple dichotomy may not describe the functions of DMS and DLS accurately. It is therefore important to clarify whether and in what conditions these ideas hold.

In this study, Vandaele and colleagues addressed this question by recording the neuronal activity in the DMS and DLS while rats performed a discrete trials fixed ratio (DT5) task. In this task, rats have to press a lever five times before obtaining a food reward. A lever insertion signaled rats to initiate lever pressing and the lever was retracted immediately after a fifth-lever press. This task design builds on their previous work that showed that the insertion/retraction promotes habit formation compared to a similar task with a free-standing lever. In the present data set, the lever pressing behavior became more rapid and stereotyped over the course of ~10 days of training. After extended training (8 weeks), the lever pressing behavior became insensitive to devaluation, indicative of habit formation.

During initial training (<10 days of training), the authors observed a big difference in neuronal activity between the DMS and DLS. Many DMS neurons exhibited sustained inhibition while DLS neurons showed sustained excitation. Transient excitations at the initiation and termination of lever presses and sequential activations during task performance were observed in both areas. These differences were evident at the gross population level, at the individual neuron level, and based on a decoding analysis. Surprisingly, these differences diminished after extended training. Finally, inactivation of DMS and DLS impaired the stereotyped performance in a similar manner. These results suggested that habit formation and the appearance of task-bracketing activities can be dissociated depending on task conditions. The authors discuss potential reasons why the present results differ from previous studies, such as the role of cue (lever insertion/retraction) in triggering phasic responses in DMS.

Overall, the results are surprising and provide important insights into how DMS and DLS regulate operant behaviors. The inclusion of the data from extended training is illuminating. The data were analyzed at several levels, which together make a compelling case to support the authors' conclusion. The exact reason why the results differ from previous studies remains unclear, but the authors offer an interesting discussion. This study raises various important issues that will open up future investigations.

I have some relatively minor technical concerns which are unlikely to change the authors' main conclusions but are important for understanding the data.

1) The exact reason why the present result differed from the previous work remains unclear. Another possibility is that the animal in this task does not really have to "know" how many lever presses are required to get reward because the lever is retracted immediately after the 5th lever press. It would be illuminating to include probe trials in which lever retraction is delayed. Does the animal leave the lever without retraction or does the leaving depend on retraction? The authors provide evidence supporting habitual nature of behavior (stereotypy, speed, and devaluation) but it would be interesting to discuss this point.

2) The authors use a Fourier analysis to first classify neurons into phasic versus non-phasic types. It remains somewhat unclear what aspects of neuronal activity this analysis actually captures. It would be useful to analyze and report this issue more carefully. (1) The non-phasic type or "sustained" activity may actually consist of multiple phasic activities. Are there neurons that exhibit multiple phasic activities (e.g. at every lever press) in the data set? If so, how are these neurons classified? The analysis was done by averaging across trials, which could smear out phasic responses. Clarifying these issues would facilitate better understanding of the analysis. (2) In the panel showing the firing patterns of "phasic neurons" (Figure 3—figure supplement 1 and Figure 6—figure supplement 1), the neurons at the bottom (and the top) appear to have sustained activity, although these neurons may be exhibiting multiple phasic responses. Please clarify. (3) The authors included non-responsive neurons in this analysis. Some of the "inhibited" or "excited" neurons may actually have been non-responsive. (4) Some neurons appear to show both phasic and sustained responses. I understand that the number of clusters is verified by the Calinski-Harabasz criterion, but this does not necessarily exclude the presence of these neurons.

3) Throughout the manuscript, the authors describe their analyses as "unbiased". It is unclear in what sense they are unbiased, since the authors make a number of choices about particular analyses and criteria. Perhaps use "objective", or clarify in what sense the analyses are unbiased.

4) In the correlation between neuronal activity and three phases of behavior (1st lever press latency, response rate, and port entry latency), the authors define the "execution" window as the window between the first and the last lever press. Because this window varies in duration from trial to trial, activity measured within it (for example, spike counts) can be strongly anticorrelated with response rate even when there is no firing modulation. This analysis has to be done and reported carefully.
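The confound the reviewer describes can be illustrated with a minimal simulation. Everything in it is an assumption for illustration, not the authors' data: a neuron firing at a constant 5 Hz with no task modulation, execution windows drawn uniformly between 1 and 6 s, and response rate defined as four inter-press intervals divided by the window duration. Raw spike counts in the window anticorrelate with response rate, while counts divided by window duration do not.

```python
import numpy as np


def execution_window_confound(n_trials=500, rate_hz=5.0, seed=0):
    """Correlate execution-window activity with response rate for an unmodulated neuron.

    Hypothetical assumptions: constant Poisson firing, execution windows of 1-6 s,
    and response rate = 4 inter-press intervals / window duration.
    """
    rng = np.random.default_rng(seed)
    # Duration of the window from the 1st to the 5th lever press on each trial (s).
    durations = rng.uniform(1.0, 6.0, size=n_trials)
    response_rate = 4.0 / durations
    # Spike counts from a neuron whose rate never changes with the task.
    spike_counts = rng.poisson(rate_hz * durations)
    # Firing rate normalizes each count by its window duration.
    firing_rate = spike_counts / durations
    r_count = np.corrcoef(spike_counts, response_rate)[0, 1]
    r_rate = np.corrcoef(firing_rate, response_rate)[0, 1]
    return r_count, r_rate


r_count, r_rate = execution_window_confound()
print(f"spike count vs response rate: r = {r_count:.2f}")
print(f"firing rate vs response rate: r = {r_rate:.2f}")
```

Dividing by duration removes this particular artifact, but rates estimated from short windows are noisier (for Poisson firing, the variance of the rate estimate scales as 1/duration), so fixed-length windows aligned to the first or last press are another option.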

5) It is unclear from the results whether "monitoring" is a good way to characterize the activity correlated with behaviors. It is possible that this activity controls behavior. Why do the authors use "monitoring"? That said, a similar idea was proposed by Rueda-Orozco and Robbe, 2015. It would be good to cite this paper.

Reviewer #2:

In this manuscript, Vandaele et al. recorded spiking activity simultaneously in the dorsomedial and dorsolateral regions of the striatum (DMS, DLS) during learning and performance of a lever-press sequence task. The authors wanted to examine the validity of a classical theory in the field: the DMS contributes to early phases of task learning, while the DLS contributes to performance once the task is habitual. The firing patterns of the DMS and DLS populations were distinct but did not evolve much during learning, although they were more similar when recordings were performed after extensive training. The authors concluded that, contrary to the classical view, both regions are engaged throughout learning and long after learning.

The general conclusion is interesting and could be novel enough to deserve publication in eLife. But in my opinion there are many contradictory results and odd or unclear analyses. In particular, the authors developed a functional classification of the neurons (phasic, non-phasic; start, stop, middle) which is statistically contestable and is used extensively in the manuscript. More importantly (and again in my opinion), this analysis does not bring much additional information or evidence regarding the conclusion of the paper, as stated in the Abstract. I also find that some important observations that contradict previous classical studies are not sufficiently emphasized. Finally, I have a strong technical concern that I would like the authors to clarify.

1) The authors claim that they use an unbiased classification of their neurons, but in fact many aspects of this classification are arbitrary and have unclear statistical grounds. The authors decided to use the power in low (<1 Hz) and "intermediate" (1-4 Hz) frequency domains of each neuron's spike train as features for hierarchical clustering. To the best of my knowledge, this is the first time such a method has been used (at least the authors do not provide references). First, this choice of frequencies is arbitrary. Second, power in a spike train is quite a tricky measure and is strongly sensitive to the total number of spikes recorded. I doubt this is a good method for the low firing rates observed in the striatum. In support of my doubt, the average power spectrum shown in Figure 3—figure supplement 1 peaks at the lowest frequency value (first bin after 0 Hz). I don't think it is statistically valid to use power spectrum analysis when a majority of neurons do not show clear oscillatory patterns. In other words, the clustering algorithm is applied to noisy data with arbitrary thresholds. Third, the fact that a clustering algorithm returns an optimal number of clusters does not mean that the data can be statistically separated. This requires combining the clustering algorithm with a permutation of the data points (shuffling of the paired values) and bootstrapping. Without such a statistical test, the authors cannot reject the hypothesis that there are no discrete categories but rather a continuum of firing rate profiles. Looking at the data in Figure 3—figure supplement 1, there appears to be substantial overlap between the 2 main categories: the phasic neurons in the upper part of Figure 3—figure supplement 1D are more similar to non-phasic neurons in the upper panel of Figure 3—figure supplement 1E than to phasic neurons in the lower part of Figure 3—figure supplement 1D.
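The pipeline under discussion (band-power features, hierarchical clustering, a Calinski-Harabasz choice of cluster number) and the shuffle test the reviewer asks for can be sketched as below. Assumptions: each row is a concatenated activity vector sampled every 0.01 s; band power is expressed as a fraction of total non-DC power (our choice, to blunt the spike-count sensitivity raised above); Ward linkage stands in for the unspecified hierarchical method; and the null shuffles each feature column independently across neurons, which is one reading of the reviewer's suggested permutation. The bootstrapping step is not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

DT = 0.01  # assumed bin width (s) of each neuron's activity vector


def band_power_features(vectors, dt=DT, low=(0.0, 1.0), mid=(1.0, 4.0)):
    """Fraction of non-DC spectral power in a low (<1 Hz) and a 1-4 Hz band, per row."""
    spectrum = np.abs(np.fft.rfft(vectors, axis=1)) ** 2
    freqs = np.fft.rfftfreq(vectors.shape[1], d=dt)
    total = spectrum[:, freqs > 0].sum(axis=1)  # the DC term only reflects the mean
    in_low = (freqs > low[0]) & (freqs < low[1])
    in_mid = (freqs >= mid[0]) & (freqs <= mid[1])
    return np.column_stack([spectrum[:, in_low].sum(axis=1) / total,
                            spectrum[:, in_mid].sum(axis=1) / total])


def calinski_harabasz(X, labels):
    """Between-cluster over within-cluster dispersion, each scaled by its degrees of freedom."""
    groups = np.unique(labels)
    k, n = groups.size, X.shape[0]
    grand_mean = X.mean(axis=0)
    between = within = 0.0
    for g in groups:
        members = X[labels == g]
        centroid = members.mean(axis=0)
        between += members.shape[0] * np.sum((centroid - grand_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))


def best_k(X, ks=range(2, 6)):
    """Ward clustering; return the cluster count with the highest Calinski-Harabasz score."""
    tree = linkage(X, method="ward")
    scores = {k: calinski_harabasz(X, fcluster(tree, t=k, criterion="maxclust")) for k in ks}
    k_best = max(scores, key=scores.get)
    return k_best, scores[k_best]


def shuffle_null_p(X, n_perm=199, seed=0):
    """Permutation p-value: shuffle each feature column independently across neurons."""
    rng = np.random.default_rng(seed)
    observed = best_k(X)[1]
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = best_k(shuffled)[1]
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


# Usage on hypothetical two-group features (not the authors' data).
rng = np.random.default_rng(1)
features = np.vstack([rng.normal([0.8, 0.2], 0.05, size=(60, 2)),
                      rng.normal([0.2, 0.8], 0.05, size=(60, 2))])
k, score = best_k(features)
print(k, round(score, 1), shuffle_null_p(features, n_perm=49))
```

The column shuffle tests only whether the pairing of the two features carries cluster structure. A correlated but continuous distribution of neurons would also beat this null, so a reference distribution matched to the data's covariance (for example, a single Gaussian) would be a stricter test of discreteness.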

Without demonstrating that there are discrete functional categories of neurons (which seems very unlikely from the data in Figure 2), the authors should rewrite a large section of their manuscript and consider the possibility that the striatum continuously encodes sensorimotor dynamics. This is an important point, as this issue has been raised recently (a paper cited by the authors, Sales-Carbonell et al., 2018, but see also Robbe et al., 2018). Also, instead of using oscillatory analysis to derive features characterizing neuronal activity, why not use the averaged and normalized peri-event histograms (data in Figure 2D, E) and compute PCA on these to obtain two or three values that capture most of the variability of the peri-event histograms (see Sales-Carbonell et al., 2018)?
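The alternative the reviewer proposes, PCA on averaged and normalized peri-event histograms, can be sketched with a plain SVD in NumPy. The per-neuron z-scoring over time and the choice of three components are assumptions; this is not necessarily the normalization used by Sales-Carbonell et al., 2018.

```python
import numpy as np


def peth_pca(peths, n_components=3):
    """PCA of trial-averaged peri-event histograms.

    peths: array of shape (n_neurons, n_bins). Each row is z-scored over time so the
    components reflect response shape rather than firing rate (an assumed choice).
    """
    mean = peths.mean(axis=1, keepdims=True)
    sd = peths.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # leave flat rows at zero instead of dividing by zero
    z = (peths - mean) / sd
    centered = z - z.mean(axis=0, keepdims=True)  # center each time bin across neurons
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)  # fraction of total variance per component
    scores = u[:, :n_components] * s[:n_components]  # per-neuron coordinates
    return scores, vt[:n_components], explained[:n_components]
```

Each neuron is then summarized by a few continuous scores, and whether neurons form discrete groups can be examined from the distribution of those scores rather than assumed.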

2) In the Abstract, the authors state that DMS and DLS are "engaged" during learning. This statement is somewhat contradictory with the fact that firing patterns in both regions do not evolve while animals improve in the task. This is especially contradictory because the authors emphasize the existence of start and stop neurons. Usually, a causal role in behavior is inferred when a change in neuronal activity correlates with a change in task proficiency (e.g., Barnes et al., 2005, Jin and Costa, 2015).

3) The results of the inactivation experiments do not fit very well with the general spin of the paper, as there is no strong effect on the number of lever presses. The strongest effects seem to be an increase in the latency to the 1st press and a decrease in response rate. Again, this is difficult to reconcile with striatal neurons starting, maintaining, or stopping sequences. On the other hand, this result fits with work emphasizing a role of the striatum in vigor/movement speed (Kim et al., 2014, Rueda-Orozco et al., 2015, Panigrahi et al., 2015).

4) One of the most surprising aspects of these data is that the population of DLS neurons fires continuously during the sequence. This plainly contradicts the start/stop hypothesis and the task-bracketing hypothesis (in particular, it contradicts the Martiros et al., 2018 paper, which used a very similar lever press task). On the other hand, this result is very similar to Sales-Carbonell et al., 2018 and Rueda-Orozco and Robbe, 2015. I am really surprised that the authors do not emphasize such an important result.

5) The title of the paper is "Distinct recruitment of dorsomedial and dorsolateral striatum erodes with extended training", and the Abstract states that DMS and DLS activity converged after extensive training. First, the authors have no direct evidence for such convergence, as they did not record across the whole period from naive to extensive training. During the first 10 sessions (at the end of which performance plateaued) there is no sign of convergence. Second, it seems from Figure 2C that the increased similarity in coding arises from diminished modulation of both DLS and DMS population activity (Z-scores closer to zero). I am not sure that the authors can conclude that there is some kind of functional convergence when both regions are becoming less modulated.

6) The authors state that they recorded on average 81 units per session in the DMS with 8 single electrodes implanted. That means they routinely recorded about 10 well-isolated units per electrode. I doubt this is possible. Could the authors provide a view of these well-isolated clusters showing how they are separated from noise/MUA?

Reviewer #3:

The authors show that, although previous literature has shown that the dorsomedial striatum (DMS) mediates goal-directed actions and the dorsolateral striatum (DLS) mediates habitual behaviors, the DMS remains engaged even after habit development. The recordings were done using a relatively novel behavioral task, which the researchers published in 2017. The data are intriguing and represent an important challenge to current theories of habit formation, yet they are purely correlational. However, the lack of causal data might be balanced by a more thorough analysis of the recording data in relation to specific behavioral events, in which case the work could still be quite impactful. We provide some suggestions for improvement below:

• The authors present data from an early training and extended training group of rats, yet there is not a clear justification of the time points used in these groups and no discussion of how the reader should interpret these different timepoints in relation to our broader understanding of habit. If behavior stabilizes by the end of early training, yet neural activity changes by the time the extended training group is recorded, how are we to interpret these results? Is it possible the data recorded in the early training group are correlated with habit acquisition while the data recorded in the extended training group are due to later consolidation? A discussion on what the change in signal represents (presumably not changes in behavior) would benefit the overall interpretation of the observed changes in DMS and DLS signals.

• The lack of data from a time point where the animals are not habitual limits the interpretations that can be made about signal changes that occur as habit forms.

• Figure 1C serves as the single piece of data to show animals are developing habit; however, the authors note that the comparison of valued and devalued responding gives p=.06, which could be considered marginally significant. Looking at the individual data points reveals several animals that display noticeable decreases in lever pressing under the devalued condition, suggesting these animals are not yet habitual. Likewise, in the extended training group there are also multiple animals with noticeable decreases. The authors do make some attempt at analysis of the individual data in Supplementary Figure 12. We would suggest that this be moved into the main figures and analyzed further, as the results in the goal-directed vs. habitual rats are not identical. If the satiation probe does not reliably distinguish individual differences but only group averages (e.g. perhaps because the individual differences are simply due to an order effect of testing that needs to be counterbalanced in the group?), this limits the analyses that can be done and should be noted.

• Some rats are trained with sucrose and some with pellets. The authors previously showed that reward type mattered for habit formation (Vandaele et al., 2017). It is not clear in this manuscript which rats were trained using which reward and whether that affected any of the results.

• Were recordings done during the probe sessions? These would be interesting to analyze and present, especially with respect to the individual differences in performance on the probe trials.

• Although clearly somewhat rare, it would be interesting to see what happens in these recordings on trials in which the rats failed to complete the 5 presses (failure to complete the ratio within 1 minute).

• We wonder whether other outcome measures might be better for evaluating habit. It would be informative to see response rates and the within-sequence coefficient of variation (CV) for the probe trials. It would also be good to show 1st lever press latency in Figure 1 for comparison with Figure 8.
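The behavioral measures mentioned in these comments (first lever press latency, response rate, within-sequence CV, port entry latency) can be computed per trial as sketched below. The definitions are assumptions for illustration: response rate is taken as the number of inter-press intervals divided by the time from the first to the last press, and within-sequence CV as the standard deviation of inter-press intervals over their mean; the manuscript's exact definitions may differ.

```python
import numpy as np


def trial_measures(insertion, presses, port_entry):
    """Per-trial behavioral measures for one DT5 trial (times in seconds)."""
    presses = np.asarray(presses, dtype=float)
    intervals = np.diff(presses)  # inter-press intervals within the sequence
    return {
        "first_press_latency": presses[0] - insertion,
        # Presses per second between the first and last press (assumed definition).
        "response_rate": intervals.size / (presses[-1] - presses[0]),
        "within_sequence_cv": intervals.std() / intervals.mean(),
        "port_entry_latency": port_entry - presses[-1],
    }


def across_trial_cv(values):
    """Coefficient of variation of a measure across trials (sample SD / mean)."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()


# Hypothetical trial: lever inserted at 0 s, five evenly spaced presses, port entry at 3.4 s.
m = trial_measures(insertion=0.0, presses=[1.0, 1.5, 2.0, 2.5, 3.0], port_entry=3.4)
print(m["first_press_latency"], m["response_rate"], m["within_sequence_cv"])
```

Applying `across_trial_cv` to the per-trial response rates gives a trial-to-trial variability measure, which is distinct from the within-sequence CV computed inside each trial.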

• In Figure 8 the authors use 1st lever press latency as a measure of habit, however this measure is not presented as a reliable measure of habit previously in the manuscript. Also in Figure 8C-F – why are the same outcome measures not shown for both DMS and DLS?

• The characterization of neurons used throughout the manuscript lacks nuance in terms of what these neurons may be doing. For instance, in Figure 3 the authors categorize neurons as either phasic start or stop neurons depending on their peak activity during the task, however the stop neurons look like they may also be signaling the port entry – a more careful timing analysis of this data like an analysis of recordings during a port entry that doesn't follow sequenced nose poking would help explain the nature of this late increase. Additionally, several neurons within the heatmaps seem to fall under multiple categories (start and middle; middle and stop; start, middle, and stop) – however there is no mention of neurons that are active across multiple behavioral time points. In Figure 5, it is noticeable that similarly categorized DMS and DLS neurons do look very different from each other in terms of overall activity.

• Interpretation of the results is limited by the fact that direct and indirect pathway neurons cannot be distinguished. Also, unless we are missing something, it does not appear that putative FSIs and unidentified neurons are included in the analyzed data.

Overall, this work has the potential to be very informative for understanding the role of striatal subregions in habitual responding. However, although the authors mention these findings are at odds with previous literature, they stop short of explaining why this may be or how we should view this circuit going forward in light of their findings, which ultimately limits the impact and significance of their work.

https://doi.org/10.7554/eLife.49536.027

Author response

Essential revisions:

1) The authors discuss "engagement" of DMS and DLS in habit formation. For instance, in the Abstract, the authors state that "These results suggest that behavioral sequences may continue to engage both striatal regions long after initial acquisition, when performance is highly regular and habitual". However, later in the manuscript, the authors also point out the difficulty of relating the patterns of activity observed in the data and specific phases of goal-directed to habitual transition. In the last paragraph of Discussion, the authors discuss this issue clearly: "Thus, it is not clear whether or how sequence-related activity relates to either skill learning/performance or habit learning/performance". The reviewers found this apparent discrepancy very confusing. First, are there any behavioral changes that occur during overtraining? The reviewers thought that other behavioral measures might be better for evaluating habit or the transition from the early to over-trained phases.

We did not find any obvious behavioral changes occurring during overtraining. We observed that some measures, such as the response rate, the coefficient of variation of the response rate, and the first lever press latency, reached an asymptotic level after extended training (Figure 1E-G). The first lever press latency across early training sessions and after extended training is now illustrated in Figure 1G, as recommended by reviewer 3. In answer to the broader question, however, it is not clear from this data set how the observed neural activity patterns relate to improved or more automatic performance over time. Because the activity patterns are by definition correlated in time with the occurrence of salient events and actions, including lever extension, lever presses, lever retraction and port entry, such neural activity changes are typically interpreted as critical for behavior production, behavior monitoring, or behavior feedback. While the trial-by-trial correlations between neural firing and behavioral measures support this connection with real-time behavior, we cannot demonstrate, through observation of the activity patterns, a clear connection with the nature of the control of that real-time behavior, i.e., whether it is habitual or not. Although there is no qualitative change in the behavioral measures from the end of initial training to extended training, the changes in neural activity patterns over this period could, as a reviewer suggested, reflect changes in representation as consolidation processes develop over time. We now describe and discuss these important findings more thoroughly throughout the manuscript.

Second, the reviewers found that the last sentence of Discussion is insightful and provides a good summary of the implication of the study. We suggest that the authors emphasize the content discussed in this paragraph, and rewrite the Results and other parts of Discussion sections so that they are more consistent with one another.

We are now emphasizing the content of the last paragraph of the Discussion throughout the manuscript. In line with the first comment of reviewer 3, we emphasized the improvement in performance across early training sessions and suggest that skill consolidation is occurring after extended training. We are putting less emphasis on relations between neural activity and habitual/skill learning and rather suggest correlations with performance. We underline in the Results and Discussion the difficulty in relating activity in dorsal striatum with habitual learning in these subjects.

2) The reviewers thought that additional experiments can clarify some of the remaining issues. In particular, showing recording data from probe tests and trials when the sequence was not completed, showing recording data from even earlier in training (e.g. DT1), looking at side-by-side-recordings from rats doing free-running FR5, etc. (Please see the individual comments for detail). Although these experiments are not required in light of eLife's policy for requested revisions to be feasible within an approximate 2 month time frame, we urge the authors to consider these experiments, or at least discuss in the manuscript.

We are now introducing data recorded during DT1 early training sessions in Figure 2—figure supplement 2, which reveals an early change in DLS activity before the first lever press. We also discuss the need for additional experiments aimed at comparing DT5 recording results with neural activity during training in the free-running FR5 task.

3) The reviewers pointed out various issues in the data analysis (see below the individual comments). We request that the authors address these issues by additional analyses and revising the manuscript.

The different issues in data analysis have been addressed in various parts of the manuscript. See below for specific details.

4) It is unclear at what point in training the DMS and DLS inactivations were done (i.e. after how many weeks/sessions of DT5 training). Please clarify. If it was after very extended DT5 training then this is really an important point: different than previous lesion studies, and important for their argument that DMS and DLS are still engaged in the task after the long extended training.

This information is now directly specified in the Results subsection “Pharmacological inactivation of both DLS and DMS interferes with task performance”. The DMS group received 20 days of DT5 training and the DLS group received 10 DT5 sessions. These groups cannot be considered as having received an extended training. This experiment is now presented in the aforementioned subsection to prevent any confusion.

5) The analysis comparing the activity patterns between goal-directed and habitual animals is very informative. We request that this analysis (Supplementary Figure 12) be moved to a main figure.

As requested, Supplementary Figure 12 in our original manuscript has been moved to the main text as Figure 8 and the results of the statistical analysis are now detailed in the subsection “Individual differences in sensitivity to outcome devaluation do not substantially correlate with dorsostriatal activity”.

Overall, we think that the authors can address the essential points and support the authors' main conclusions without additional experiments. Additional experiments are not required but would be very useful in clarifying specific issues.

We thank the reviewers and editors for the evaluation. We have addressed each comment to the best of our ability, and discuss our preference regarding additional experiments.

Reviewer #1:

Previous studies have indicated that the dorsomedial striatum (DMS) is involved in goal-directed behaviors while the dorsolateral striatum (DLS) is involved in habitual behaviors. A common view is that behavioral overtraining shifts involvement from the DMS to the DLS. In accord with this, several studies have indicated that patterns of neural activity that are thought to bracket or chunk a sequence of movements appear in the DLS after overtraining. Some recent studies, however, have provided data indicating that a simple dichotomy may not describe the functions of DMS and DLS accurately. It is therefore important to clarify whether and in what conditions these ideas hold.

In this study, Vandaele and colleagues addressed this question by recording the neuronal activity in the DMS and DLS while rats performed a discrete-trials fixed ratio (DT5) task. In this task, rats have to press a lever five times before obtaining a food reward. A lever insertion signaled rats to initiate lever pressing, and the lever was retracted immediately after the fifth lever press. This task design builds on their previous work showing that lever insertion/retraction promotes habit formation compared to a similar task with a free-standing lever. In the present data set, the lever pressing behavior became more rapid and stereotyped over the course of ~10 days of training. After extended training (8 weeks), the lever pressing behavior became insensitive to devaluation, indicative of habit formation.

During initial training (<10 days of training), the authors observed a marked difference in neuronal activity between the DMS and DLS. Many DMS neurons exhibited sustained inhibition, while DLS neurons showed sustained excitation. Transient excitations at the initiation and termination of lever pressing, and sequential activations during task performance, were observed in both areas. These differences were evident at the population level, at the single-neuron level, and in a decoding analysis. Surprisingly, these differences diminished after extended training. Finally, inactivation of DMS and DLS impaired the stereotyped performance in a similar manner. These results suggest that habit formation and the appearance of task-bracketing activity can be dissociated depending on task conditions. The authors discuss potential reasons why the present results differ from previous studies, such as the role of the cue (lever insertion/retraction) in triggering phasic responses in DMS.

Overall, the results are surprising and provide important insights into how DMS and DLS regulate operant behaviors. The inclusion of the data from extended training is illuminating. The data analyses were done at different levels and together make a compelling case for the authors' conclusion. The exact reason why the results differ from previous studies remains unclear, but the authors provide an interesting discussion. This study raises several important issues that will open up future investigations.

I have some relatively minor technical concerns which are unlikely to change the authors' main conclusions but are important for understanding the data.

1) The exact reason why the present result differed from the previous work remains unclear. Another possibility is that the animal in this task does not really have to "know" how many lever presses are required to get reward because the lever is retracted immediately after the 5th lever press. It would be illuminating to include probe trials in which lever retraction is delayed. Does the animal leave the lever without retraction or does the leaving depend on retraction? The authors provide evidence supporting habitual nature of behavior (stereotypy, speed, and devaluation) but it would be interesting to discuss this point.

Absolutely. Our interpretation of behavioral findings in the DT5 task is that rats rapidly develop habit and behavioral chunking specifically because they are not required to track how many times to press the lever to get the reward. Rats can merely continue pressing until the retraction of the lever. We predict that during probe trials in which lever retraction is delayed, rats would keep pressing on the lever until its retraction. Indeed, using a task similar to the DT5 procedure, in which reward delivery was signaled by the lever retraction, we found that rats made on average 3.3 ± 0.5 additional lever presses during such probe trials, where lever retraction was delayed by 2 s from the fifth lever press.

We are now discussing this point in the Discussion: “In fact, in the DT5 task, rats do not have to track the number of presses emitted to obtain the reward and may continue pressing on the lever until its retraction, which may favor behavioral chunking and the specific activity pattern reported in this study”.

2) The authors use a Fourier analysis to first classify neurons into phasic versus non-phasic types. It remains somewhat unclear what aspects of neuronal activity this analysis actually captures. It would be useful to analyze and report this issue more carefully.

a) The non-phasic type or "sustained" activity may actually consist of multiple phasic activities. Are there neurons that exhibit multiple phasic activities (e.g. every lever press) in the data set? If so, how are these neurons classified?

Neurons showing significant responses to all 5 lever presses (both before and after lever presses) were found in both phasic and non-phasic classes when the activity pre- and post-lever press was compared to the baseline activity during inter-trial intervals: these neurons represented 41% of non-phasic neurons (N=46) and 1.4% of phasic neurons (N=9). However, it is not possible from this analysis to determine whether a neuron's response to lever press events results from phasic or sustained activity. Analysis of the activity of the 9 phasic neurons responding to each lever press reveals that these few neurons appear to express sustained excitation or sustained inhibition during lever pressing, and that their classification as phasic resulted from a large peak after the lever insertion and/or after the last lever press. The PSTHs in Author response image 1 illustrate the difference between neurons that respond to every lever press and are classified as phasic or non-phasic. As the reviewer can see, non-phasic neurons expressed sustained excitation or sustained inhibition without peaks after lever insertion or after lever retraction, which may explain why they were classified as non-phasic. This is now specified in the last paragraph of the Results subsection “Classification of distinct neural signatures in the dorsal striatum during extended sequence training”.

Author response image 1
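The per-neuron criterion described above (a significant response before and after all 5 presses relative to the inter-trial baseline) might be implemented as follows. The response does not name the statistical test, so the paired Wilcoxon signed-rank test, the 0.05 threshold, and the absence of a multiple-comparison correction are assumptions; window firing rates are assumed to be precomputed per trial.

```python
import numpy as np
from scipy.stats import wilcoxon


def responds_to_every_press(pre, post, baseline, alpha=0.05):
    """True if firing differs from baseline before AND after each of the 5 presses.

    pre, post: (n_trials, 5) firing rates in windows before/after each lever press.
    baseline: (n_trials,) firing rate during the inter-trial interval.
    Uses a paired Wilcoxon signed-rank test per window (an assumed choice) and
    applies no correction across the 10 comparisons.
    """
    for window in (pre, post):
        for press in range(window.shape[1]):
            if np.all(window[:, press] == baseline):
                return False  # identical to baseline: no response, and the test is undefined
            if wilcoxon(window[:, press], baseline).pvalue >= alpha:
                return False
    return True
```

As the response itself notes, passing this criterion does not distinguish a sustained change from repeated phasic responses, because each window is tested against baseline independently.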

The analysis was done by averaging across trials, and there is a possibility that it may smear out phasic responses. Clarifying these issues would facilitate better understanding of the analysis.

We do average across trials, but keep the time relative to the events. To create the vector for each neuron, normalized activity was first aligned to each event of the behavioral sequence, with time windows of -0.25s to 0.25s around events and time bins of 0.01s. The Fourier analysis was then conducted on one vector per neuron that represented the successive time windows that covered all 7 events of the behavioral sequence. Therefore, phasic activity represents neuronal responses to specific events. This is now clarified in the Materials and methods subsection “Classification of distinct neural activity patterns”.
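The vector construction described above can be sketched as follows. The bin width (0.01 s) and window (-0.25 to 0.25 s) come from the response; the identity of the 7 events (assumed here to be lever insertion, the five presses, and port entry) and the z-scoring against inter-trial-interval statistics are assumptions, since the text says only that activity was normalized.

```python
import numpy as np

BIN = 0.01       # bin width (s), as described in the response
HALF_WIN = 0.25  # window of -0.25 to 0.25 s around each event


def concatenated_peth(spike_times, event_times, base_mean, base_sd):
    """Trial-averaged, baseline-normalized activity around each event, concatenated.

    spike_times: 1-D spike times (s) for one neuron.
    event_times: (n_trials, n_events) event times (s), e.g. 7 events per trial.
    base_mean, base_sd: assumed inter-trial-interval rate statistics used to z-score.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    event_times = np.asarray(event_times, dtype=float)
    n_bins = int(round(2 * HALF_WIN / BIN))  # 50 bins per event
    edges = np.linspace(-HALF_WIN, HALF_WIN, n_bins + 1)
    n_trials, n_events = event_times.shape
    segments = []
    for e in range(n_events):
        counts = np.zeros(n_bins)
        for t in event_times[:, e]:
            counts += np.histogram(spike_times - t, bins=edges)[0]
        rate = counts / (n_trials * BIN)  # spikes/s averaged over trials
        segments.append((rate - base_mean) / base_sd)
    return np.concatenate(segments)  # n_events * 50 bins


# Frequency spacing of the FFT of a 7-event vector sampled every 10 ms.
freq_step = np.fft.rfftfreq(7 * 50, d=BIN)[1]
print(f"{freq_step:.3f} Hz")
```

One consequence worth noting: a 350-bin vector sampled at 100 Hz has FFT bins spaced 1/3.5 s (about 0.29 Hz) apart, so the <1 Hz band contains only three non-DC bins (about 0.29, 0.57 and 0.86 Hz), and these frequencies refer to the concatenated 3.5 s vector rather than to continuous real time.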

As suggested by the reviewer, high response rates in the DT5 task could in theory confound our classification of phasic and non-phasic neurons: phasic responses to every lever press could appear as a sustained response. However, if this were true, one would expect the number of neurons classified as non-phasic to be correlated with the response rate. Although we observed a rise in response rate across early training sessions, we did not observe any change in the proportion of non-phasic neurons (Figure 6—figure supplement 1B, χ2=9.0, p=0.437). Furthermore, we did not observe clear changes in the activity of neurons showing sustained excitation or sustained inhibition across early training sessions.
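The reported statistic can be checked against the chi-square distribution. A χ2 of 9.0 with p = 0.437 is consistent with 9 degrees of freedom, which is what a 2 × 10 table (non-phasic vs phasic counts over ten early-training sessions) would give; this table layout is our inference and is not stated in the response. The helper below runs such a test on per-session counts, which would have to come from the data.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency


def nonphasic_by_session(n_nonphasic, n_phasic):
    """Chi-square test of independence between class (non-phasic/phasic) and session."""
    table = np.vstack([n_nonphasic, n_phasic])  # shape (2, n_sessions)
    stat, p, dof, _ = chi2_contingency(table)
    return stat, p, dof


# Upper-tail probability of chi2 = 9.0 with 9 degrees of freedom (2 x 10 table).
print(f"p = {chi2.sf(9.0, df=9):.3f}")
```

A complementary check of the same argument would correlate the per-session fraction of non-phasic neurons directly with the per-session mean response rate.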

To further examine whether sustained activity may result from multiple phasic activations around the lever presses, we reproduced Figure 6 using peri-event time windows of 1.5s (from -0.75 to 0.75s around each event) instead of 0.5s (-0.25 to 0.25s). As can be seen in Author response image 2, phasic excitation at the lever insertion and at the termination of the sequence in phasic start and stop neurons was repeated from one event to the next, but the activity returned to baseline before the next event. In contrast, the activity in non-phasic neurons, while certainly dynamic, does not return to baseline during the extended time window, even when lever presses were on average 1-2 seconds apart during the first three early training sessions (Day 1-3, response rate between 0.5 and 1 response/sec). These results suggest that sustained activity in non-phasic neurons does not result from multiple phasic responses around each lever press.

Author response image 2

b) In the panel showing the firing patterns of "phasic neurons" (Figure 3—figure supplement 1 and Figure 6—figure supplement 1), the neurons at the bottom (and the top) appear to have sustained activity although these neurons may be exhibiting multiple phasic responses. Please clarify.

Multiple phasic responses to every lever press event occurred rarely (see response to previous comment). Although some phasic neurons appear to have sustained activity and fire above baseline during inter-response intervals over the 5 lever presses, this population only represented 1.4% of the phasic neurons, and typically included units that also displayed activity at lever insertion or retraction. Therefore, we think that strong transient responses at the start of the behavioral sequence (after the lever insertion) and at the end of the sequence (after the last lever press or before the port entry) are responsible for the classification of these neurons as phasic. This results in the presence of neurons classified as phasic but showing both sustained and phasic responses, as highlighted by the reviewer in the following comment. This observation is now included in the last paragraph of the subsection “Classification of distinct neural signatures in the dorsal striatum during extended sequence training”.

c) The authors included non-responsive neurons in this analysis. Some of the "inhibited" or "excited" neurons may have actually been not responsive.

The reviewer is correct: we included non-task-responsive neurons (as determined by traditional statistical analysis of activity around events vs. baseline activity) in the analysis to remain as objective as possible in our classification approach. We avoided pre-selecting neurons based on other arbitrary criteria. This represents our attempt to analyze the data using multiple approaches that are not dependent on one another. It is, however, important to point out that non-responsive neurons represented a minority of putative MSNs in our study.

d) Some neurons appear to show both phasic and sustained responses. I understand that the number of clusters are verified by the Calinski Harabasz criterion. But this does not necessarily exclude the presence of these neurons.

We agree – some neurons, for example, showed sustained inhibition during lever pressing with a phasic excitation at the end of the behavioral sequence. In fact, by varying the classification parameters (lower and intermediate frequency limits for the Fourier analysis, Figure 3—figure supplement 1) we sometimes obtained 3 significant classes, as indicated by the Calinski Harabasz criterion. When we obtained 3 classes, neurons in the intermediate class showed this combination of phasic and sustained features. Since we obtained two significant classes in most conditions (60% of tested parameter conditions), we retained these two classes for subsequent analysis. However, we now acknowledge the variety of activity patterns observed in this study in the last paragraph of the subsection “Classification of distinct neural signatures in the dorsal striatum during extended sequence training”.
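For reference, the Calinski-Harabasz criterion scores a partition by the ratio of between-cluster to within-cluster dispersion, each scaled by its degrees of freedom: CH = [B/(k-1)] / [W/(n-k)]. The number of clusters is chosen where CH peaks. The sketch below is a plain NumPy implementation of that standard definition, with a function name of our choosing. It is not the authors' code. In practice it would be evaluated on the labels obtained by cutting the hierarchical-clustering tree at k = 2, 3, and so on. Here it only compares a correct and a random labeling of hypothetical data.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)), where B and W are
    the between- and within-cluster sums of squared distances."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    n, k = X.shape[0], clusters.size
    if not 2 <= k < n:
        raise ValueError("need 2 <= number of clusters < number of samples")
    grand_mean = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += members.shape[0] * np.sum((centroid - grand_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical two-feature data with two well-separated groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(40, 2)),
               rng.normal(4.0, 0.5, size=(40, 2))])
true_labels = np.repeat([0, 1], 40)
random_labels = rng.integers(0, 2, size=80)
print(calinski_harabasz(X, true_labels), calinski_harabasz(X, random_labels))
```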

We are now including further validation of our clustering approach by using a permutation test (presented in Figure 3—figure supplement 1) to demonstrate that the two classes provided by hierarchical clustering are indeed significantly different from each other.
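The permutation test suggested by Reviewer #2 can be sketched as follows: shuffle the paired feature values, re-cluster, and compare the resulting cluster separation with the observed one. This is an illustration under stated assumptions, not the procedure used for the revised supplementary figure. To keep the example short, a deterministic 2-means (Lloyd) clustering stands in for hierarchical clustering. Separation is taken as the Euclidean distance between the two cluster means. The null distribution permutes each feature column independently across neurons, which keeps each feature's distribution but destroys the pairing. All names and data are hypothetical.

```python
import numpy as np


def two_means(X, n_iter=100):
    """Deterministic 2-means (Lloyd's algorithm), a stand-in for hierarchical
    clustering. Initialized with the point farthest from the grand mean and
    the point farthest from that one."""
    X = np.asarray(X, dtype=float)
    first = X[np.argmax(((X - X.mean(axis=0)) ** 2).sum(axis=1))]
    second = X[np.argmax(((X - first) ** 2).sum(axis=1))]
    centroids = np.stack([first, second])
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(2)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


def separation(X):
    """Euclidean distance between the two cluster means."""
    _, centroids = two_means(X)
    return float(np.linalg.norm(centroids[0] - centroids[1]))


def permutation_test(X, n_perm=200, seed=0):
    """Compare observed separation with a null in which each feature column is
    shuffled independently across neurons, destroying the pairing of values."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    observed = separation(X)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in X.T])
        null[i] = separation(shuffled)
    p_value = (1 + np.count_nonzero(null >= observed)) / (1 + n_perm)
    return observed, p_value


# Hypothetical features (e.g., power in two frequency bands) for two groups
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 0.3, size=(60, 2)),
                      rng.normal(5.0, 0.3, size=(60, 2))])
observed, p = permutation_test(features)
print(f"separation = {observed:.2f}, p = {p:.3f}")
```

Shuffling columns independently preserves each feature's marginal distribution. Genuinely paired structure, such as two groups separated along both features at once, therefore yields a larger separation than the shuffled data.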

3) Throughout the manuscript, the authors indicate that their analyses are "unbiased". It is unclear in what sense they are unbiased. The authors are making a number of choices toward particular analyses or criteria. Perhaps use "objective", or clarify in what sense they are unbiased.

We thank the reviewer for this suggestion; we are now using the word “objective” instead of “unbiased”.

4) In the correlation between neuronal activity and three phases of behavior (1st lever press latency, response rate, and port entry latency), the authors define the "execution" window as the window between the first and the last lever press. This window is variable in duration and will therefore be highly anti-correlated with response rate even where there is no firing modulation. This analysis has to be done and reported carefully.

We understand the reviewer’s concern regarding this potential issue. Since the window of sequence execution (from 1st to last lever press) is variable in duration, the number of spikes within this period was normalized by the sequence duration. This is now more clearly explained in the Materials and methods section: “Neural activity during execution of the lever press sequence was computed by normalizing the number of spikes between the first and last lever press to the sequence duration”.
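The first-to-last-press window varies in length, so dividing the spike count by that length converts it to a firing rate. A neuron whose firing does not change then yields the same value whether the sequence is fast or slow. The sketch below is minimal and assumes spike and press timestamps in seconds from a shared clock. It also assumes a half-open window [first press, last press). The boundary convention and the function name are assumptions, not taken from the Materials and methods.

```python
import numpy as np


def execution_rate(spike_times, press_times):
    """Spike count between the first and last lever press divided by the
    sequence duration (spikes/s). Uses the half-open window [first, last)."""
    presses = np.sort(np.asarray(press_times, dtype=float))
    first, last = presses[0], presses[-1]
    duration = last - first
    if duration <= 0:
        raise ValueError("need at least two lever presses at distinct times")
    spikes = np.asarray(spike_times, dtype=float)
    n_spikes = np.count_nonzero((spikes >= first) & (spikes < last))
    return n_spikes / duration


# A hypothetical neuron firing at a constant 10 spikes/s
spikes = np.arange(0.05, 10.0, 0.1)
fast = execution_rate(spikes, [1.0, 1.25, 1.5, 1.75, 2.0])   # 1 s sequence
slow = execution_rate(spikes, [1.0, 1.75, 2.5, 3.25, 4.0])   # 3 s sequence
print(fast, slow)
```

The raw counts here (10 vs 30 spikes) scale with sequence duration and would therefore track response rate. The per-second rate does not.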

5) It is unclear from the results whether "monitoring" is a good way to characterize the activity correlated with behaviors. It is possible that these activities are controlling behaviors. Why do the authors use "monitoring"? That being said, a similar idea was proposed by Rueda-Orozco and Robbe, 2015. It would be good to cite this paper.

We initially used "monitoring" because our data do not clearly show how the neural activity we measured might control behavior. As mentioned above, it is not clear to what extent the activity we see represents the control or monitoring of behavior. Thus, to take into account the comment from the reviewer, we have tried to remain mostly descriptive, stating that the activity in DMS and DLS remains correlated with performance after extended training. What these activity patterns could encode is addressed in the Discussion. We are also now citing the Rueda-Orozco and Robbe paper, which is indeed particularly relevant for the present study.

Reviewer #2:

In this manuscript, Vandaele et al. recorded spiking activity simultaneously in the dorsomedial and dorsolateral regions of the striatum (DMS, DLS) during learning and performance of a lever press sequence task. The authors wanted to examine the validity of a classical theory in the field: the DMS contributes to early phases of task learning while the DLS contributes to performance once the task is habitual. The firing patterns of the DMS and DLS populations were distinct but did not evolve much during learning, although they were more similar when recordings were performed after extensive training. The authors concluded that, contrary to the classical view, both regions are engaged throughout learning and long after learning.

The general conclusion is interesting and could be novel enough to deserve publication in eLife. But in my opinion there are many contradictory results and odd/unclear analyses. In particular, the authors developed a functional classification of the neurons (phasic, non-phasic; start, stop, middle) which is statistically contestable and is used extensively in the manuscript. More important (and again in my opinion), this analysis does not bring much more information or additional evidence in regard to the conclusion of the paper, as mentioned in the Abstract. I also find that some important observations which contradict previous classical studies are not sufficiently emphasized. Finally, I have a strong technical concern I would like the authors to clarify.

We appreciate the reviewer’s point of view. In point of fact, the reviewer is correct that the classification approach we used in the paper provides results that do not appreciably differ from the analysis of the mean activity of all task-responsive neurons and from the decoding analysis. Each of the three approaches indicates that the activity in DMS and DLS is more different during 10 sessions of DT5 training than after 8 weeks of DT5 training. We were indeed surprised not to find relations between any of our approaches and the behavioral improvements over early training. We view this agreement among three different approaches as a strength of our paper. We would prefer to keep the classification approach: were we to remove this section, we anticipate that a common criticism would be that the typical response patterns (stop/start/sustained) proposed to reflect chunking or skill learning might have shown development across sessions 1 through 10, in possible contrast to the conclusions from average activity and from decoding. Our aim is to describe this classification approach as efficiently as possible so that it does not detract from the overall message of congruence found with these three approaches.

1) The authors claim that they use an unbiased classification of their neurons, but in fact many aspects of this classification are arbitrary and have unclear statistical grounds. The authors decided to use the power in low (<1Hz) and "intermediate" (1-4Hz) frequency domains of each neuron's spike train as features for hierarchical clustering. To the best of my knowledge, it is the first time such a method is used (at least the authors do not provide references). First, this choice of frequency is arbitrary. Second, spike-train power is quite a tricky measure and is strongly sensitive to the total number of spikes recorded. I doubt this is a good method for the low firing rates observed in the striatum. In support of my doubt, the average power spectrum shown in Figure 3—figure supplement 1 peaks at the lowest frequency value (first bin after 0 Hz). I don't think it is statistically valid to use power spectrum analysis while a majority of neurons do not show clear oscillatory patterns. In other words, the clustering algorithm is applied to noisy data with arbitrary thresholds. Third, the fact that a clustering algorithm returns an optimal number of clusters does not mean that the data can be statistically separated. This requires combining the clustering algorithm with a permutation of the data points (shuffling of the paired values) and bootstrapping. Without showing such statistical disambiguation, the authors cannot reject the hypothesis that there are no discrete categories but rather a continuum of firing rate profiles. Looking at the data in Figure 3—figure supplement 1, it appears that there is significant overlap between the 2 main categories: the phasic neurons in the upper part of Figure 3—figure supplement 1D are more similar to non-phasic neurons in the upper panel of Figure 3—figure supplement 1E than to phasic neurons in the lower part of Figure 3—figure supplement 1D.

Without demonstrating that there are discrete functional categories of neurons (which is very unlikely by looking at the data in Figure 2), the authors should rewrite a large section of their manuscript and consider the possibility that striatum encodes continuously sensorimotor dynamics. This is an important point as this issue has been raised recently (a paper cited by the authors Sales-Carbonell et al., 2018 but see also Robbe, 2018).

We thank the reviewer for the suggestion of a permutation test. We are now using this test to demonstrate that the two classes, phasic and non-phasic, provided by hierarchical clustering are indeed significantly different from each other (Figure 3—figure supplement 1F). We also agree with the reviewer that the choice of low (<1Hz) and intermediate (1-4Hz) ranges in the frequency domain for the Fourier analysis is arbitrary, which we now acknowledge in the Materials and methods subsection “Classification of distinct neural activity patterns”. We therefore manipulated these parameters to investigate the consistency of the clustering results (Figure 3—figure supplement 1E). We varied the lower and upper frequency limits and assessed (1) the optimal number of clusters detected and (2) the distance between cluster means when 2 clusters were detected. We found that our results generalize well across ranges of frequency limit parameters (lower frequency limit: 0.3 to 1.0Hz; upper frequency limit: 2 to 8 Hz), the optimal number of clusters being 2 (60%) or 3 (33%) in 93% of parameter combinations. Furthermore, when 2 clusters were detected, cluster separability significantly departed from chance (permutation test) (Figure 3—figure supplement 1F). Finally, as can be seen in Author response image 3, which shows 4 examples of hierarchical clustering providing 2 significant classes, the number and identity of non-phasic and phasic neurons were consistent across ranges of frequency limit parameters. In addition, the peaks in the depicted frequency histogram show that roughly the same number of non-phasic neurons were separated from the phasic neurons across the entire parameter space when two clusters were derived. Given that clustering results are not drastically impacted by the arbitrary choice of frequency range, we consider the range selected to be reasonable.
However, since this approach still requires arbitrary choices, we cannot characterize our analysis as “unbiased” and therefore replaced this word with “objective”.

Author response image 3

To the best of our knowledge, this is the first time such a method has been used to separate phasic and non-phasic neurons. Thus, we now better justify our approach in the Materials and methods subsection “Classification of distinct neural activity patterns”. We also highlight that the Fourier analysis was not used to characterize inherent oscillatory activity, but, instead, as a tool to search for distinctions among the time courses of event-evoked spiking patterns. We agree that analysis of the frequency spectrum is sensitive to differences in firing rate, which may impact hierarchical clustering. However, consistent clustering results across ranges of frequency domains suggest that the analysis provides robust results across a range of firing rates.

Also, instead of using oscillatory analysis as features characterizing neuronal activity, why not use the averaged and normalized peri-event histograms (data in Figure 2D, E) and compute a PCA on these to extract two or three values that capture most of the variability of the peri-event histograms (see Sales-Carbonell et al., 2018)?

We thank the reviewer for this suggestion. Hierarchical clustering using neurons’ coefficients in the first three principal components revealed 4 clusters of neurons as indicated by the Calinski Harabasz criterion. The heatmaps of individual neuron z-scores in these 4 clusters are shown in Author response image 4. Although some clusters were characterized by a specific pattern of activity (i.e. excitation at the end of the sequence in the first cluster (far left), excitations at the boundaries of the sequence in the last cluster (far right)) we observe significant overlap in activity patterns across clusters. This overlap is also visible by looking at the 3D scatter plot of neuron coefficients in PC1, PC2 and PC3, shown in Author response image 5. For this reason, we are not confident enough in the clustering results to use this approach in the present study. In contrast, the overlap we observe from using a Fourier analysis as a tool to separate “phasic” and “non-phasic” behavior-related activity patterns is low (please see Figure 3—figure supplement 1 and Figure 6—figure supplement 1).
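The PCA alternative can be sketched with an SVD of the mean-centered matrix of peri-event histograms (neurons by time bins). The scores on the first three components are the per-neuron coefficients that would then be clustered. This is a minimal NumPy sketch with a hypothetical function name and random placeholder data. It does not reproduce the analysis behind Author response images 4 and 5.

```python
import numpy as np


def pca_scores(psths, n_components=3):
    """Project neurons onto the leading principal components of their PSTHs.

    psths: (n_neurons, n_bins) array of averaged, normalized peri-event activity.
    Returns per-neuron scores (n_neurons, n_components) and the fraction of
    variance explained by each returned component.
    """
    X = np.asarray(psths, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each time bin across neurons
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]


rng = np.random.default_rng(1)
psths = rng.normal(size=(60, 350))          # hypothetical: 60 neurons, 7 x 50 bins
scores, explained = pca_scores(psths)
print(scores.shape, explained.round(3))
```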

The Fourier analysis was motivated by our initial observations that some neurons expressed sustained activity whereas others showed more transient modulation of activity along the behavioral sequence. We sought a way to distinguish these two groups that might be preferable to doing so “by eye”. The results of hierarchical clustering met this objective. We therefore feel more confident in the approach currently employed in this study.

Author response image 4
Author response image 5

2) In the Abstract, the authors stated that DMS and DLS are "engaged" during learning. This statement is a bit contradictory with the fact that firing patterns in both regions do not evolve while animals improve in the task. Again, this is especially contradictory as the authors emphasized the existence of start and stop neurons. Usually, a causal role in behavior is inferred when there is a change in neuronal activity that correlates with a change in task proficiency (e.g., Barnes et al., 2005, Jin and Costa, 2015).

We agree with the reviewer that changes in DMS and DLS activity patterns do not match the improvement in performance during early training. In fact, we acknowledge this in the Discussion: “[…]. Thus, we cannot conclude that the neural correlates we report here mediate the development of habit or the improvement in performance over time”. Yet, in both DMS and DLS we observed activity that correlated with performance, and our inactivation experiments reveal the involvement of these regions in DT5 performance. In particular, we agree with the reviewer’s comment below about a possible role of the striatum in vigor and movement speed. To be clearer in our wording, we now emphasize correlations of DMS and DLS activity with performance rather than with skill or habit learning.

3) The results of the inactivation experiments do not fit very well with the general spin of the paper, as there is no strong effect on the number of lever presses. The strongest effect seems to be an increase in the latency to 1st press and a decrease in response rate. Again, this is difficult to reconcile with striatal neurons starting, maintaining, or stopping sequences. On the other hand, this result fits with work emphasizing a role of the striatum in vigor/movement speed (Kim et al., 2014, Rueda-Orozco et al., 2015, Panigrahi et al., 2015).

We agree with the reviewer that the results of the inactivation experiments are in line with prior work demonstrating the role of dorsal striatum in performance attributes, such as movement vigor. We are now citing the references suggested by the reviewer to emphasize this result in the ninth paragraph of the Discussion. In agreement with the reviewer, we do not think the striatal activity patterns we observe in our task are encoding the initiation and termination of the sequence. In fact, one paragraph in the Discussion is devoted to pointing out that “start” and “stop” activity in the dorsal striatum may largely represent cue-elicited activations, rather than motor sequence initiation-cessation signals. The supplementary figure supporting this conclusion is now presented as a main figure (Figure 4) to emphasize this important element of discussion, and we have tried to make this clearer in the revised Results (subsection “Classification of distinct neural signatures in the dorsal striatum during extended sequence training”, third paragraph).

4) One of the most surprising aspects of these data is that the population of DLS neurons fires continuously during the sequence. This is in plain contradiction with the start and stop hypothesis or the task-bracketing hypothesis (in particular, it contradicts the Martiros et al., 2018 paper that used a very similar lever press task). On the other hand, this result is very similar to Sales-Carbonell et al., 2018 and Rueda-Orozco and Robbe, 2015. I am really surprised that the authors do not emphasize such an important result.

We thank the reviewer for suggesting this emphasis, as we have a similar view. We are now emphasizing this result, as follows: “Sustained excitation in DLS neurons is also consistent with studies recording DLS activity during locomotion tasks involving motor control (Rueda-Orozco and Robbe, 2015; Sales-Carbonell et al., 2018).”

5) The title of the paper is "Distinct recruitment of dorsomedial and dorsolateral striatum erodes with extended training". And in the Abstract it is stated that DMS and DLS activity converged after extensive training. First, the authors have no direct evidence for such convergence, as they did not record continuously from the naive stage to extensive training. During the first 10 sessions (at the end of which performance plateaued) there is no sign of convergence.

This statement in the Abstract is now corrected: “Although DMS and DLS were differentially involved during early training, their activity was similar following extended training”.

Second, it seems from Figure 2C that the increased similarity in coding arises from diminished modulation of both DLS and DMS population activity (Z scores closer to zero). I am not sure that the authors can conclude that there is some kind of functional convergence while both regions are becoming less modulated.

The degree of modulation of DLS and DMS population activity is actually not decreased after extended training: the proportions of task-responsive neurons are similar. Z-scores are closer to zero in Figure 2C because the relative proportions of excitations and inhibitions are similar after extended training, but unbalanced across regions during early training (Figure 2A). This can be seen by analyzing normalized activity after separating the population into excited and inhibited TRNs. This analysis reveals that DLS and DMS neurons are similarly modulated across early and extended training (Author response image 6).
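The arithmetic behind this point can be illustrated with toy, hypothetical z-scores that are not data from the study. Suppose excitations and inhibitions have equal magnitude. When they become balanced in number, the population mean approaches zero even though the modulation within each subgroup is unchanged. The function name is hypothetical.

```python
import numpy as np


def modulation_summary(z_scores):
    """Mean z-score of the whole population and of excited / inhibited neurons."""
    z = np.asarray(z_scores, dtype=float)
    return {
        "population": z.mean(),
        "excited": z[z > 0].mean(),
        "inhibited": z[z < 0].mean(),
    }


# Hypothetical peak z-scores for ten task-responsive neurons per condition
unbalanced = [2.0] * 8 + [-2.0] * 2   # mostly excitations
balanced = [2.0] * 5 + [-2.0] * 5     # equal excitations and inhibitions

for name, z in [("unbalanced", unbalanced), ("balanced", balanced)]:
    print(name, modulation_summary(z))
```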

Author response image 6

6) The authors stated that they recorded on average 81 units per session in the DMS with 8 single electrodes implanted. That means they routinely recorded 10 well-isolated units per electrode. I doubt this is possible. Could the authors provide a view of these well-isolated clusters showing how they are separated from noise/MUA?

Ah, there has been a misunderstanding. The number of recorded units is reported across the entire group of rats (N=9 for the early training group). This point is now clearly specified in the first paragraph of the subsection “DMS and DLS neurons are differentially modulated in the DT5 procedure during early training but not after extended training”.

Author response image 7 illustrates the isolation of 2 different units using Offline Sorter.

Author response image 7

Reviewer #3:

The authors show that, although previous literature has shown the dorsomedial striatum (DMS) mediates goal-directed actions and the dorsolateral striatum (DLS) mediates habitual behaviors, the DMS remains engaged even after habit development. The recordings were done using a relatively novel behavioral task, which the researchers published on in 2017. The data are intriguing and represent an important challenge to current theories of habit formation, yet they are purely correlational. However, the lack of causal data might be balanced by a more thorough analysis of the recording data in correlation with specific behavioral events, and in this way the work could still be quite impactful as it challenges current theories of habit formation. We provide some suggestions for improvement below:

• The authors present data from an early training and extended training group of rats, yet there is not a clear justification of the time points used in these groups and no discussion of how the reader should interpret these different timepoints in relation to our broader understanding of habit. If behavior stabilizes by the end of early training, yet neural activity changes by the time the extended training group is recorded, how are we to interpret these results? Is it possible the data recorded in the early training group are correlated with habit acquisition while the data recorded in the extended training group are due to later consolidation? A discussion on what the change in signal represents (presumably not changes in behavior) would benefit the overall interpretation of the observed changes in DMS and DLS signals.

In the Introduction we now seek to provide an improved justification of the time points used in this study with respect to habitual learning and skill consolidation. In the Discussion, we now bring up the notion of skill consolidation to explain alterations in sequence-related activity without concomitant change in behavior, and thank the reviewer for that suggestion. In the last paragraphs of the Discussion we provide some interpretations (although speculative) for the absence of changes during early training and the alteration in sequence-related activity after extended training (Discussion, last paragraph). We agree with the reviewer that interpretation is challenging given that no obvious neural correlates that could “explain” the improvements in performance across the first 10 sessions were identified, and we now seek to discuss more completely what the changes in signal might represent.

• The lack of data from a time point where the animals are not habitual limits the interpretations that can be made about signal changes that occur as habit forms.

We are now acknowledging this limitation in the Discussion, and we propose future experiments comparing dorsostriatal activity in the DT5 task with free-running ratio tasks in which behavior remains under goal-directed control, to address this issue (Discussion, sixth paragraph).

• Figure 1C serves as the single piece of data to show animals are developing habit – however the authors note that the difference between valued and devalued responding is p=.06 which could be considered marginally significant. Looking at the individual data points reveals several animals that display noticeable decreases in lever pressing under the devalued condition, suggesting these animals are not yet habitual. Likewise, in the extended training group there are also multiple animals with noticeable decreases. The authors do make some attempt at analysis of the individual data in Supplementary Figure 12. We would suggest that this be moved into the main figures and analyzed further, as the results in the goal-directed vs. habitual rats are not identical. If the satiation probe is not reliably distinguishing differences, but is only good for distinguishing group averages (e.g. perhaps because the individual differences are simply due to an order effect of testing that needs to be counterbalanced in the group?), this limits the analyses that can be done, and should be noted.

As suggested by the reviewer, we moved Supplementary Figure 12 to a main figure (Figure 8) and discuss in more depth the differences in activity in rats sensitive or insensitive to devaluation (subsection “Individual differences in sensitivity to outcome devaluation do not substantially correlate with dorsostriatal activity”). A major limitation in this analysis is that the numbers of neurons become too small for comparisons of proportions of distinct class types. However, we can compare the activity of task-responsive neurons irrespective of individual classes. Additional analyses are provided in Figure 8—figure supplement 1. Although a testing order effect cannot be definitively excluded, a baseline reinforced training session was systematically conducted between the two tests under extinction to minimize this caveat. In addition, we observed less responding in the second test session compared to the first in only half of the rats in both the early training and extended training groups.

• Some rats are trained with sucrose and some with pellets. The authors previously showed that reward type mattered for habit formation (Vandaele et al., 2017). It is not clear in this manuscript which rats were trained using which reward and whether that affected any of the results.

An analysis comparing behavior and DLS/DMS activity in rats trained with liquid sucrose or grain-based pellets is now included and presented in Figure 8—figure supplement 2 (see also subsection “Individual differences in sensitivity to outcome devaluation do not substantially correlate with dorsostriatal activity”, last paragraph). We observed that similar proportions of rats trained with liquid sucrose or grain-based pellets developed habits, contrary to prior findings. It is possible that neural recording and tethering prevented expression of habit in a subgroup of rats trained with sucrose, thereby explaining the trend reported in sensitivity to outcome devaluation. This would be in line with our observation that tethered behavior, not surprisingly, does not exactly replicate untethered behavior for many rats. However, the low number of animals in each condition, combined with significant inter-individual variability in sensitivity to satiety-induced devaluation, precludes robust conclusions about reward type effects in the present findings.

• Were recordings done during the probe sessions? These would be interesting to analyze and present, especially with respect to the individual differences in performance on the probe trials.

We wish there were more subjects in the two groups that were and were not sensitive to reward devaluation to be able to better address this question. In addition, although recording was done during probe sessions, the limited number of trials precludes reliable statistical analysis of the activity of individual neurons, and many rats responded for fewer than the maximum 10 trials. Analyzing this limited number of trials did not provide compelling and meaningful results, as shown in Author response image 8. While we would like to read into some of the activity representations, the number of neurons is rather small and the SEM in many cases large.

Author response image 8

• Although clearly somewhat rare, it would be interesting to see what happens in these recordings on trials in which the rats failed to complete the 5 presses (failure to complete the ratio within 1 minute).

We agree with the reviewer. Unfortunately, omissions are too rare to allow any relevant analysis of the activity (at most 2 or 3 trials per session in a very limited number of sessions).

• We wonder whether other outcome measures might be better for evaluating habit. It would be informative to see response rates and CV within-sequence for the probe trials. It would also be good to show 1st lever press latency in Figure 1 for comparison with Figure 8.

Change in first lever press latency across early training and extended training sessions is now illustrated in Figure 1H. We agree that showing more measures could be useful for expressing the devaluation results. The additional measures suggested by the reviewer are depicted in Author response image 9. Normalized response rate and coefficient of variation of response rate were not significantly affected by devaluation in the early training group. We observed a significant effect of devaluation on normalized response rate in the extended training group, mainly due to the suppression of responding during the second trials in the devalued condition. Because of the discrete trial schedule, we feel that the usual measure used to assess habitual responding in free-running schedules (normalized response rate) does not generalize very well to this study, since the discrete trials constrain the ability to respond. Perhaps a useful measure is the number of trials upon which at least one lever press was recorded; this is now depicted in the manuscript in Figure 1I, and devalued responding was not significantly lower than valued. We also analyzed the mean first lever press latency during devaluation testing and found no significant differences between the valued and devalued conditions, despite a trend for the extended training group (Early training F1,8=1.65, p>0.2; Extended training F1,7=5.22, p=0.056).

Author response image 9

• In Figure 8 the authors use 1st lever press latency as a measure of habit, however this measure is not presented as a reliable measure of habit previously in the manuscript. Also in Figure 8C-F – why are the same outcome measures not shown for both DMS and DLS?

Figure 8 in the original manuscript (Figure 9 in the revised manuscript) now includes the same variables for both DMS and DLS inactivation. The decrease in first lever press latency across sessions is now presented in the first figure of the manuscript. Importantly, we try to present multiple behavioral measures, but seek not to claim they are a firm measure of habit per se, with the exception of the devaluation group means. We instead note the progressive changes in latencies and response rates to document performance improvements over the first 10 sessions. To address the reviewer’s point, we try to be consistent in this approach, since we cannot say whether these measures are indicative of habitual responding.

• The characterization of neurons used throughout the manuscript lacks nuance in terms of what these neurons may be doing. For instance, in Figure 3 the authors categorize neurons as either phasic start or stop neurons depending on their peak activity during the task, however the stop neurons look like they may also be signaling the port entry – a more careful timing analysis of this data like an analysis of recordings during a port entry that doesn't follow sequenced nose poking would help explain the nature of this late increase.

We thank the reviewer for this suggested analysis; we now describe the activity of start and stop neurons more thoroughly (Figure 4) (subsection “Classification of distinct neural signatures in the dorsal striatum during extended sequence training”). In addition, we have expanded our discussion of what these start and stop neurons might signal (Discussion, fifth paragraph), since their timing suggests that they may not be motor sequence initiation and cessation signals, as we initially hypothesized.

Additionally, several neurons within the heatmaps seem to fall under multiple categories (start and middle; middle and stop; start, middle, and stop) – however there is no mention of neurons that are active across multiple behavioral time points. In Figure 5, it is noticeable that similarly categorized DMS and DLS neurons do look very different from each other in terms of overall activity.

The reviewer is correct that some neurons show more than one profile. The classification approach places a given unit into only one group, and we think that, on average, this approach assigns each neuron to the class that best matches its maximal absolute change in activity. We agree that analysis of these hybrid neurons could be insightful; however, in this already complex data set, we feel that taking each combination of activity into account would yield very low n’s in some categories and would impede both the interpretation of the data and the readability of the manuscript. To address the reviewer’s point, we now discuss the overlap in activity patterns, with some neurons combining several profiles of activity (subsection “Classification of distinct neural signatures in the dorsal striatum during extended sequence training”, last paragraph).
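The one-unit-one-class rule described above can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the helper names `zscore` and `classify_unit`, the epoch windows in `WINDOWS`, and the |z| threshold are all assumptions. It z-scores a unit's binned firing rate against a baseline and assigns the unit to the single epoch holding its largest absolute change, so a hybrid unit with peaks in two epochs lands in only one class.

```python
from statistics import mean, stdev


def zscore(binned_rates, baseline_rates):
    """Express binned firing rates as z-scores relative to a baseline period."""
    mu, sd = mean(baseline_rates), stdev(baseline_rates)
    return [(r - mu) / sd for r in binned_rates]


def classify_unit(zscores, windows, threshold=2.0):
    """Assign a unit to the one epoch containing its largest absolute z-score.

    windows: class name -> (first_bin, stop_bin) slice bounds (hypothetical epochs)
    threshold: minimum |z| for a unit to be classified at all (assumed value)
    """
    best_class, best_abs = "unclassified", 0.0
    for name, (lo, hi) in windows.items():
        # Excitations and inhibitions both count, hence abs()
        peak = max(abs(z) for z in zscores[lo:hi])
        if peak > best_abs:
            best_class, best_abs = name, peak
    return best_class if best_abs >= threshold else "unclassified"


# Hypothetical epochs: 30 bins split evenly into start, middle and stop windows
WINDOWS = {"start": (0, 10), "middle": (10, 20), "stop": (20, 30)}

print(zscore([4.0, 6.0], [1.0, 2.0, 3.0]))  # [2.0, 4.0]

hybrid = [0.0] * 30
hybrid[3], hybrid[15] = 2.5, 3.0  # peaks in both the start and middle epochs
print(classify_unit(hybrid, WINDOWS))  # middle

inhibited = [0.0] * 30
inhibited[25] = -3.5
print(classify_unit(inhibited, WINDOWS))  # stop
```

The hybrid unit is counted only as a middle neuron even though it also responds at the start epoch, which is the trade-off between a single-label scheme and the low n’s that tracking every combination would produce.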

• Interpretation of the results is limited by the fact that direct and indirect pathway neurons cannot be distinguished. Also, unless we are missing something, it does not appear that putative FSIs and unidentified neurons are included in the analyzed data.

The reviewer is right; we cannot distinguish MSNs of the direct and indirect pathways in the present study. This would require a viral approach or D1-Cre and D2-Cre rodent lines to combine neuronal recording with optotagging, or a switch to calcium imaging. Our findings are comparable to a body of past work using electrophysiological recording in the striatum of mice, rats, and monkeys that does not distinguish these pathways.

Putative FSIs and unidentified neurons were not included in the analysis. Because we used arbitrary thresholds to separate FSIs from MSNs, we ensured the exclusion of FSIs by also removing neurons with intermediate features (unidentified neurons). Although we recognize the important role of this neuronal subtype in habitual and automated behavior, we were not confident enough in the identification of putative FSIs to include them in the analysis, and their small number in early training sessions prevents any reliable analysis (at most one neuron in DLS or DMS in a given session). The PSTH represents the average z-score of all putative FSIs across early training and extended training sessions in DLS and DMS. The heatmaps shown in Author response image 10 illustrate the sustained activity (excitation or inhibition) of putative FSIs during early training sessions and after extended training. This sustained activity could clearly be important in driving the sustained responses we report in the manuscript. However, we feel we cannot make strong claims about this small number of units without some type of genetic approach that would allow us to reliably distinguish them from MSNs.

Author response image 10
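The exclusion logic described above, in which a gap between the FSI and MSN criteria sends intermediate units to an "unidentified" class that is then dropped, can be sketched as below. The features (waveform width and baseline firing rate), the `Unit` record, and every threshold value are hypothetical placeholders chosen for illustration; they are not the criteria used in the study.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    name: str
    width_us: float  # peak-to-valley waveform width, microseconds
    rate_hz: float   # mean baseline firing rate, Hz


# Hypothetical thresholds for illustration only; not the values used in the study.
FSI_MAX_WIDTH_US = 300.0
FSI_MIN_RATE_HZ = 10.0
MSN_MIN_WIDTH_US = 400.0
MSN_MAX_RATE_HZ = 5.0


def cell_type(u: Unit) -> str:
    """Label a unit using two non-overlapping sets of thresholds."""
    if u.width_us <= FSI_MAX_WIDTH_US and u.rate_hz >= FSI_MIN_RATE_HZ:
        return "putative FSI"
    if u.width_us >= MSN_MIN_WIDTH_US and u.rate_hz <= MSN_MAX_RATE_HZ:
        return "putative MSN"
    # Units falling between the FSI and MSN criteria are not trusted either way
    return "unidentified"


def msns_for_analysis(units):
    """Keep only putative MSNs; putative FSIs and unidentified units are excluded."""
    return [u for u in units if cell_type(u) == "putative MSN"]


units = [
    Unit("a", 250.0, 20.0),  # narrow, fast-firing
    Unit("b", 500.0, 2.0),   # wide, slow-firing
    Unit("c", 350.0, 3.0),   # width falls in the gap
    Unit("d", 500.0, 8.0),   # wide but fires too fast for the MSN criterion
]
print([cell_type(u) for u in units])
print([u.name for u in msns_for_analysis(units)])  # ['b']
```

Leaving a gap between the two criteria trades sample size for purity: fewer units enter the MSN analysis, but a narrow-waveform interneuron is less likely to be misclassified as an MSN.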

Overall, this work has the potential to be very informative for understanding the role of striatal subregions in habitual responding. However, although the authors mention these findings are at odds with previous literature, they stop short of explaining why this may be or how we should view this circuit going forward in light of their findings, which ultimately limits the impact and significance of their work.

We thank the reviewer for this evaluation. To address this point, we now place more emphasis on differences and similarities with previous literature throughout the Discussion, and in the last paragraph we provide two possible explanations for our unexpected findings.

https://doi.org/10.7554/eLife.49536.028

Article and author information

Author details

  1. Youna Vandaele

    Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    youna.vandaele@jhu.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-8389-8850
  2. Nagaraj R Mahajan

    Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, United States
    Contribution
    Formal analysis, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-4437-2645
  3. David J Ottenheimer

    The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Johns Hopkins University, Baltimore, United States
    Contribution
    Formal analysis, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-4882-1898
  4. Jocelyn M Richard

    Department of Neuroscience, University of Minnesota, Minneapolis, United States
    Contribution
    Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0001-5750-0418
  5. Shreesh P Mysore

    1. Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, United States
    2. The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Johns Hopkins University, Baltimore, United States
    Contribution
    Conceptualization, Resources, Supervision, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7781-8252
  6. Patricia H Janak

    1. Department of Psychological and Brain Sciences, Krieger School of Arts and Sciences, Johns Hopkins University, Baltimore, United States
    2. The Solomon H. Snyder Department of Neuroscience, Johns Hopkins School of Medicine, Johns Hopkins University, Baltimore, United States
    3. Kavli Neuroscience Discovery Institute, Johns Hopkins University, Baltimore, United States
    Contribution
    Conceptualization, Resources, Supervision, Funding acquisition, Investigation, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    patricia.janak@jhu.edu
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-3333-9049

Funding

National Institute on Drug Abuse (R01DA035943)

  • Patricia H Janak

National Institute on Alcohol Abuse and Alcoholism (R01AA026306)

  • Patricia H Janak

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Animal experimentation: This study was carried out in accordance with the recommendations of the Guide for the Care and Use of Laboratory Animals (National Research Council, 1996), and was approved by the institutional animal care and use committee of Johns Hopkins University.

Senior Editor

  1. Kate M Wassum, University of California, Los Angeles, United States

Reviewing Editor

  1. Naoshige Uchida, Harvard University, United States

Reviewers

  1. Naoshige Uchida, Harvard University, United States
  2. David Robbe, INSERM U1249, Aix-Marseille University, France

Publication history

  1. Received: June 20, 2019
  2. Accepted: October 16, 2019
  3. Accepted Manuscript published: October 17, 2019 (version 1)
  4. Version of Record published: October 31, 2019 (version 2)

Copyright

© 2019, Vandaele et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

