Introduction

The neural network connecting the cerebral cortex, basal ganglia, and thalamus forms functionally heterogeneous loops distinguished by their topographic patterns of connectivity1,2, and is implicated in various learning processes through the integration of cognitive, sensorimotor, and reward information3,4. The striatum is a key node in the cortico-basal ganglia loops, and its segregated subregions are considered to mediate procedural learning, such as motor skills and instrumental behavior, through parallel processing via different loops5–7. Previous studies in humans and non-human primates have reported that neural activity within the striatum shifts from the associative subregions (caudate and anterior putamen) to the sensorimotor subregions (posterior putamen) as motor learning progresses8–11. In rodents, instrumental behavior begins as goal-directed action and then transitions to stimulus-response habits12,13, requiring the dorsomedial striatum (DMS; a homologue of the caudate in primates) and the dorsolateral striatum (DLS; a homologue of the putamen in primates), respectively14,15. Based on these results, a model of procedural learning has been proposed in which functional dominance shifts from the associative to the sensorimotor subregions of the striatum across the learning phases16.

In contrast, deviations from this model are evident in other behavioral tasks involving decision-making based on external sensory cues. For example, neural activity in the human caudate is associated with both the learning of a visuomotor association task and the execution of the learned behavior17,18. The tail of the caudate in non-human primates contributes to automatic, habitual behavior in a decision-making task using complex visual stimuli19. In rodents, the DLS is involved in the acquisition and execution of a discrimination task with visual and auditory cues, whereas the DMS mediates the performance of learned behavior20,21. Moreover, other striatal subregions, such as the posterior striatum and ventrolateral striatum in rodents, are also engaged in auditory discrimination22–24. Based on these results, a hypothesis has arisen that the acquisition of sensory cue-based decision-making requires wide-ranging spatiotemporal processing within the striatum. However, the neural mechanism by which the functional circuits linked to striatal subregions change throughout the learning processes is not yet fully understood.

To investigate a large-scale striatal reorganization in accordance with the acquisition of external cue-dependent decisions, we conducted a two-alternative auditory discrimination task25. Using a small-animal neuroimaging technique, we found that the anterior dorsolateral striatum (aDLS) and posterior ventrolateral striatum (pVLS) are activated to the highest level at the middle and late stages during the acquisition phase of auditory discrimination, respectively. Then, chronic pharmacologic manipulations confirmed that the aDLS and pVLS are necessary for discrimination learning, and the transient manipulations also ascertained the function of these subregions in the learning at the corresponding stages. The resultant effects of the manipulations on the behavioral strategy showed that the aDLS promotes the strategy based on the stimulus-response association, while suppressing that based on response-outcome associations. Additionally, the pVLS engages in the formation and maintenance of the stimulus-response association strategy. Electrophysiological recording of striatal neurons indicated that subpopulations of aDLS neurons mainly represent the outcome of specific behaviors at the initial period of discrimination learning, whereas pVLS subpopulations encode the beginning and ending of each behavior in association with the progress of the learning. In addition, other subpopulations of aDLS and pVLS neurons show sustained activation after obtaining reward with distinct patterns of the combination between the stimulus and response. These findings reveal that the aDLS and pVLS regulate the acquisition of auditory discrimination through distinct spatiotemporal and functional manners, challenging the prior model which proposed a transition from the associative to sensorimotor striatal subregions during learning.

Results

Learning processes of auditory discrimination

We employed a two-alternative auditory discrimination task, in which animals are required to make new associations between a tone instruction cue (2 or 10 kHz) and a response (pressing the left or right lever)25, setting the interval between the stimulus presentation and lever insertion to 3 s (Figure 1A). The success rate gradually increased, reaching a plateau around Day 13, which persisted through Day 24 (Figure 1B). When the success rates across Days 2, 6, 10, 13 and 24 were compared, significant differences were observed between Days 2 and 6, Days 6 and 10, as well as Days 10 and 13, showing no significant difference between Days 13 and 24 (Figure 1C). In contrast, the response time and omission rate did not vary among the training days (Figures 1D and 1E), and the variation of response bias decreased along with the progress of auditory discrimination (Figure 1F). Based on changes observed in the success rate, we categorized the learning processes into two phases: the acquisition phase, which lasted until Day 13; and the learned phase, which began after Day 13.

Behavioral performance of two-alternative auditory discrimination learning

(A) Schematic diagram of the auditory discrimination task. Each trial started with the presentation of a tone instruction cue of high (10 kHz) or low (2 kHz) frequency. Three seconds later, the room light was illuminated, and two retractable levers were inserted at the same time. The rats were required to press the right and left levers in response to the high and low tones, respectively. (B) Learning curve of the auditory discrimination in intact rats (n = 14). The inset shows the cumulative number of rats that achieved a success rate of more than 90%. (C) Success rate on Days 2, 6, 10, 13, and 24 (one-way repeated ANOVA, F[1.897,24.659] = 106.706, p = 1.1 × 10⁻¹², post hoc Bonferroni test, Day 2 vs. Day 6, p = 0.001, Day 6 vs. Day 10, p = 3.6 × 10⁻⁴, Day 10 vs. Day 13, p = 0.040, and Day 13 vs. Day 24, p = 0.080). (D) Response time (one-way repeated ANOVA, F[2.203,28.642] = 2.386, p = 0.105). (E) Omission rate (F[4,52] = 0.699, p = 0.596). (F) Response bias (one-way repeated ANOVA, F[1.129,14.674] = 0.250, p = 0.653). Data are indicated as the mean ± s.e.m., and individual data are overlaid. *p < 0.05, **p < 0.01, and ***p < 0.001.

Distinct brain activity patterns in striatal subregions during learning processes

To investigate dynamic changes in regional brain activity across the entire striatum during auditory discrimination in the same animals, we performed small-animal positron emission tomography (PET) with 2-deoxy-2-[18F]fluoro-D-glucose (18F-FDG), which measures cerebral glucose metabolism correlated with brain activity26–28 (Figure 2A). We performed 18F-FDG-PET scanning on Day 4 of the single lever press task without sound-based decision as a control condition and during a series of training sessions covering the acquisition (Days 2, 6, and 10) and learned (Day 24) phases (Figure 2B). The average number of lever presses in the single lever press task exceeded 80 (81.6 ± 1.3, mean ± s.e.m.) on Day 2 and remained at that level until Day 4 (Figure 2C). In the subsequent auditory discrimination task, the success rate gradually increased over the training days (Figure 2D), showing significant differences among the sessions in which we performed 18F-FDG-PET scanning (Figure 2E). The response time and omission rate were consistent among the days, and the variation of response bias became smaller as the task progressed (Figure 2E). These performances were similar to those in the intact rats (compare with Figures 1C–1F), suggesting that the procedure for 18F-FDG-PET scans does not affect the acquisition of discrimination.

Dissociable brain activity patterns during auditory discrimination learning among three striatal subregions

(A) Schematic illustration of an awake rat receiving an intravenous 18F-FDG injection through an indwelling catheter attached to the tail. The rat underwent the behavioral experiment and was then used for microPET imaging. (B) Schedule for the behavioral training and 18F-FDG-PET scan. After 18F-FDG injection, rats (n = 14) were subjected to the behavioral experiment (30 min) and returned to the home cage (15 min). The rats were anesthetized 10 min before the start of the PET scan (30 min). (C and D) Learning curves of the single lever press task (C) and auditory discrimination task (D). Arrowheads indicate the PET scan days. (E) Behavioral performance on the scan days during the auditory discrimination. Success rate (one-way repeated ANOVA, F[3,39] = 125.012, p = 4.8 × 10⁻²⁰, post hoc Bonferroni test; Day 2 vs. Day 6, p = 5.0 × 10⁻⁷; Day 6 vs. Day 10, p = 0.001; Day 10 vs. Day 24, p = 0.003), response time (one-way repeated ANOVA, F[3,39] = 0.156, p = 0.926), omission rate (one-way repeated ANOVA, F[3,39] = 1.559, p = 0.215), and response bias (one-way repeated ANOVA, F[1.779,23.130] = 17.734, p = 3.6 × 10⁻⁵) are shown. (F) Representative images of coronal sections comparing the brain activity in the single lever press task with the activity on Day 2, 6, 10, or 24 in the discrimination task. Left panels show schematic illustrations of striatal subregions. (G) Horizontal (top) and coronal (bottom) images of striatal activation areas shown on Day 10 vs. single lever in (F). (H) Representative images of coronal sections comparing the brain activity on Day 2 with that on Day 6, 10, or 24 in the discrimination task. Color bars indicate the T-values, and a T-value of 3.8 was used as the threshold, corresponding to p < 0.001 (uncorrected). (I) Schematic pictures showing the voxels of interest for the aDLS, pVLS, and DMS (green, left hemisphere; and purple, right hemisphere).
(J–L) Regional 18F-FDG uptakes in the aDLS (J, one-way repeated ANOVA, left aDLS, F[4,52] = 10.322, p = 3.0 × 10⁻⁶, post hoc Bonferroni test; single lever vs. Day 6, p = 0.016; Day 2 vs. Day 6, p = 6.5 × 10⁻⁵; Day 2 vs. Day 10, p = 0.019; Day 2 vs. Day 24, p = 0.017; Day 6 vs. Day 24, p = 0.041; right aDLS, F[4,52] = 7.462, p = 7.9 × 10⁻⁵, post hoc Bonferroni test, Day 2 vs. Day 6, p = 2.9 × 10⁻⁴; Day 2 vs. Day 10, p = 0.036; Day 6 vs. Day 24, p = 0.005), pVLS (K, one-way repeated ANOVA, left pVLS, F[2.368,30.784] = 4.152, p = 0.020, post hoc Bonferroni test; single lever vs. Day 10, p = 1.8 × 10⁻⁵; right pVLS, F[4,52] = 5.995, p = 4.8 × 10⁻⁴, post hoc Bonferroni test, single lever vs. Day 10, p = 0.001; Day 2 vs. Day 10, p = 0.011; Day 6 vs. Day 10, p = 0.032), and DMS (L, one-way repeated ANOVA, left DMS, F[4,52] = 12.836, p = 2.4 × 10⁻⁷, post hoc Bonferroni test; Day 2 vs. Day 24, p = 1.8 × 10⁻⁵; Day 6 vs. Day 24, p = 4.8 × 10⁻⁴; Day 10 vs. Day 24, p = 5.5 × 10⁻⁵; right DMS, F[4,52] = 10.717, p = 2.0 × 10⁻⁶, post hoc Bonferroni test; single lever vs. Day 24, p = 0.036; Day 2 vs. Day 24, p = 0.011; Day 6 vs. Day 24, p = 2.6 × 10⁻⁴; Day 10 vs. Day 24, p = 0.011). Arrowheads indicate the day with the highest activation throughout the learning process. Data are indicated as the mean ± s.e.m., and individual data are overlaid. The anteroposterior coordinates from bregma (mm) are shown (F–I). Scale bar; 2 mm (I). *p < 0.05, **p < 0.01, and ***p < 0.001.

The regional brain activity related to the behavioral task was analyzed with a voxel-based statistical parametric analysis26, in which the 18F-FDG uptake on each day of the discrimination task was compared with that on the single lever task. The task-related brain activity in the whole brain is summarized in Table S1. In the present study, we focused on the task-related activity in striatal subregions. We found significant increases in activity in the unilateral aDLS and the bilateral pVLS on Days 6 and 10 during the acquisition phase (Figures 2F and 2G). However, no significant change was observed in the DMS during the acquisition phase, and a unilateral decrease was detected on Day 24 (Figure 2F). Next, to evaluate the activity related to the progress of learning, 18F-FDG uptake on Days 6, 10, and 24 was compared with that on Day 2. The learning-dependent brain activity was significantly increased in the bilateral aDLS and the unilateral pVLS on Days 6 and 10, respectively (Figure 2H), whereas the activity decreased in the bilateral DMS on Day 24 (Figure 2H). This decrease was also detected when 18F-FDG uptakes were compared between Days 10 and 24 (Figure 2H), suggesting down-regulation of DMS activity in the learned phase.

To quantitatively validate changes in the brain activity in striatal subregions, we analyzed the amount of 18F-FDG uptake in the voxels of interest of these subregions (Figure 2I). The 18F-FDG uptake in the bilateral aDLS reached a peak on Day 6 and decreased during the subsequent days (Figure 2J), whereas the uptake in the bilateral pVLS gradually increased with the progress of learning, showing the maximal value on Day 10 (Figure 2K). The uptake in the DMS did not change during the acquisition phase and decreased on Day 24 (Figure 2L). These results suggest that the aDLS and pVLS contribute to learning processes with different temporal patterns during the acquisition phase, whereas the DMS may be engaged in the execution of the behavior in the learned phase.

Excitotoxic lesion of the aDLS and pVLS, but not the DMS, disrupts the acquisition of auditory discrimination

To study whether the changed activity of striatal subregions is involved in learning processes, rats received bilateral injections of a solution containing ibotenic acid (IBO; 8 mg/mL, 0.3 μL/site) or phosphate-buffered saline (PBS) into the subregions and then underwent behavioral testing. The extent of the IBO lesion in the subregions was analyzed by immunostaining with the neuronal marker NeuN after the behavioral tests (Figures 3A–3C). The number of single lever presses increased similarly with training in the PBS- and IBO-injected groups for the aDLS, pVLS, and DMS (Figures 3D–3F). In the discrimination learning, the increase in the success rate was impaired in both the aDLS and pVLS IBO-injected groups compared with the corresponding PBS-injected groups (Figures 3G and 3H). The response time in the aDLS-lesioned group was not altered during the discrimination training, whereas that in the pVLS-lesioned group was continuously prolonged (Figures S1A and S1B), and the omission rate and response bias were similar between the lesioned and the corresponding control groups (Figures S1D/S1G for aDLS lesion and Figures S1E/S1H for pVLS lesion). In contrast, the success rate in the discrimination task was comparable between the PBS- and IBO-injected groups for the DMS (Figure 3I). The DMS lesion lengthened the response time throughout the learning (Figure S1C), whereas it did not alter the omission rate or response bias (Figures S1F and S1I). These results indicate that the aDLS and pVLS are required for the acquisition of auditory discrimination, whereas the DMS contributes to the execution but not the learning of the discriminative behavior, consistent with previous reports29,30.

Impacts of excitotoxic lesion of striatal subregions on the acquisition of auditory discrimination

Rats were given intracranial injection of PBS or IBO solution into the aDLS (n = 8 for each injection), pVLS (n = 8 for each injection), or DMS (n = 7 for PBS injection, and n = 6 for IBO injection) before the start of the single lever press task. (A–C) Representative images of NeuN immunostaining and individual schematic illustrations showing the lesioned area in the aDLS (A), pVLS (B), or DMS (C). Dotted lines in the images indicate the range of the lesioned area. ac, anterior commissure; cc, corpus callosum; and LV, lateral ventricle. (D–F) Learning curves during the single lever press task in the groups injected into the aDLS (D, two-way repeated ANOVA, group, F[1,14] = 1.275, p = 0.278, day, F[1.512,21.171] = 26.281, p = 7.0 × 10⁻⁶, group × day, F[1.512,21.171] = 0.834, p = 0.418), pVLS (E, two-way repeated ANOVA, group, F[1,14] = 2.542, p = 0.133, day, F[1.547,21.658] = 40.183, p = 2.2 × 10⁻⁷, group × day, F[1.547,21.658] = 0.953, p = 0.380), or DMS (F, two-way repeated ANOVA, group, F[1,11] = 0.025, p = 0.876, day, F[1.205,13.252] = 14.818, p = 0.001, group × day, F[1.205,13.252] = 0.610, p = 0.478). (G–I) Learning curves during the auditory discrimination task in the groups injected into the aDLS (G, two-way repeated ANOVA, group, F[1,14] = 11.578, p = 0.004, day, F[2.805,39.274] = 60.733, p = 1.8 × 10⁻¹⁴, group × day, F[2.805,39.274] = 3.863, p = 0.018), pVLS (H, two-way repeated ANOVA, group, F[1,14] = 7.221, p = 0.018, day, F[3.646,51.041] = 51.794, p = 9.2 × 10⁻¹⁷, group × day, F[3.646,51.041] = 1.612, p = 0.190), or DMS (I, two-way repeated ANOVA, group, F[1,11] = 0.469, p = 0.507, day, F[2.677,29.452] = 48.860, p = 3.9 × 10⁻¹¹, group × day, F[2.677,29.452] = 1.094, p = 0.362).
(J–L) Changes in the proportion of the WSW or LSL strategy through the discrimination learning in the groups treated into the aDLS (J, two-way repeated ANOVA, group, F[1,14] = 13.451, p = 0.003, day, F[3.048,42.673] = 54.777, p = 9.1 × 10⁻¹⁵, group × day, F[3.048,42.673] = 3.288, p = 0.029 for the WSW; group, F[1,14] = 5.039, p = 0.041, day, F[3.866,54.119] = 14.578, p = 5.2 × 10⁻⁸, group × day, F[3.866,54.119] = 2.471, p = 0.057 for the LSL), pVLS (K, two-way repeated ANOVA, group, F[1,14] = 9.251, p = 0.009, day, F[3.206,44.881] = 47.168, p = 2.7 × 10⁻¹⁴, group × day, F[3.206,44.881] = 1.862, p = 0.146 for the WSW; group, F[1,14] = 0.762, p = 0.397, day, F[4.773,66.815] = 24.018, p = 1.9 × 10⁻¹³, group × day, F[4.773,66.815] = 1.338, p = 0.260 for the LSL), or DMS (L, two-way repeated ANOVA, group, F[1,11] = 0.703, p = 0.420, day, F[3.052,33.568] = 48.424, p = 2.3 × 10⁻¹², group × day, F[3.052,33.568] = 0.846, p = 0.480 for the WSW; group, F[1,11] = 0.002, p = 0.965, day, F[3.442,37.866] = 13.787, p = 1.0 × 10⁻⁶, group × day, F[3.442,37.866] = 0.581, p = 0.654 for the LSL). Data are indicated as the mean ± s.e.m., and individual data are overlaid (except for panels J–L). The anteroposterior coordinates from bregma (mm) are shown (A–C). Scale bars; 2 mm (A–C). *p < 0.05 and **p < 0.01.

Previous studies have reported that the dorsal striatum is involved in behavioral strategies based on both the response-outcome association and the stimulus-response association during stimulus-response learning12,13. To assess the impact of striatal lesions on the behavioral strategies, we analyzed the proportion of responses attributed to two representative strategies among all responses in each session. One is the “win-shift-win (WSW)” strategy, in which, after a correct response in the previous trial, rats press the opposite lever in the current trial in response to a shift of the instruction cue, resulting in a correct response. The WSW is considered to reflect the behavioral strategy based on the stimulus-response association31,32. The other is the “lose-shift-lose (LSL)” strategy, in which, after an error response in the previous trial, rats press the opposite lever in the current trial despite a shift of the instruction cue, leading to another error response. The LSL is considered to appear as a consequence of the behavioral strategy based on the response-outcome association31. In the PBS-injected groups for each of the three striatal subregions, the proportions of the WSW and LSL strategies increased and decreased, respectively, during the acquisition phase (Figures 3J–3L). In the aDLS-lesioned group, the increase in the WSW proportion was markedly impaired compared with the corresponding control group, and the decrease in the LSL proportion was delayed relative to the same control group (Figure 3J). In the pVLS-lesioned group, the increase in the WSW proportion was also impaired compared with the corresponding controls, although the change in the LSL proportion was comparable to that of the controls (Figure 3K). In contrast, in the DMS-lesioned group, neither the WSW nor the LSL proportion differed significantly from the corresponding value in the control group (Figure 3L).
These results indicate that the aDLS promotes the WSW strategy through stimulus-response association and suppresses the LSL strategy rooted in response-outcome association, and that the pVLS selectively facilitates the WSW strategy acquisition. Thus, these two subregions not only play roles in the acquisition of auditory discrimination but also control distinct behavioral strategies during the learning processes.
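The WSW and LSL definitions given in the text can be illustrated with a minimal Python sketch. It is not the analysis code used in the study: the `Trial` record, the function names, and the choice of denominator (all responded trials in a session, with omissions assumed to be already excluded) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Task rule from Figure 1A: high tone -> right lever, low tone -> left lever.
CORRECT_LEVER = {"high": "right", "low": "left"}


@dataclass
class Trial:
    cue: str    # "high" or "low" (hypothetical encoding)
    lever: str  # "left" or "right" (hypothetical encoding)


def is_correct(trial: Trial) -> bool:
    return CORRECT_LEVER[trial.cue] == trial.lever


def strategy_proportions(trials: list[Trial]) -> tuple[float, float]:
    """Return (WSW proportion, LSL proportion) over all responses in a session.

    Assumption: `trials` holds only trials with a lever press, in session order.
    """
    wsw = lsl = 0
    for prev, cur in zip(trials, trials[1:]):
        cue_shift = prev.cue != cur.cue
        lever_shift = prev.lever != cur.lever
        if not (cue_shift and lever_shift):
            continue
        if is_correct(prev) and is_correct(cur):
            wsw += 1  # win, shift with the cue, win again
        elif not is_correct(prev) and not is_correct(cur):
            lsl += 1  # lose, shift lever, lose again
    n = len(trials)
    return (wsw / n, lsl / n) if n else (0.0, 0.0)


session = [
    Trial("high", "right"),  # correct
    Trial("low", "left"),    # correct after cue and lever shift: WSW
    Trial("high", "left"),   # error, lever not shifted
    Trial("low", "right"),   # error after cue and lever shift: LSL
]
print(strategy_proportions(session))
```

With two cues and two levers, a lever shift that accompanies a cue shift always preserves the previous outcome, so the two counters cover every trial pair that passes the shift check.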

After the completion of discrimination training, we tested the locomotion and feeding behavior of rats with aDLS and pVLS lesions. Neither locomotor activity in the open field nor the amount of food intake was altered in either lesioned group (Figures S1J–S1M), suggesting that the learning deficits following excitotoxic lesions of the striatal subregions cannot be attributed to abnormalities in these general motor behaviors.

Inhibiting aDLS and pVLS activity impairs discriminative behavior at different stages of the acquisition phase

Our 18F-FDG-PET imaging study indicated that the brain activity in the aDLS and pVLS was increased in a different temporal pattern during the acquisition phase of auditory discrimination. To examine whether neuronal activity in striatal subregions is linked with the processes at different stages, we temporarily inhibited the activity in each subregion by the treatment with a GABAA receptor agonist muscimol (MUS). In a pilot study, we tested the effect of the bilateral injection of several doses of MUS (0.1, 0.2, and 0.5 μg/μL) into the aDLS or pVLS on the single lever press behavior and confirmed that the administration of the lowest dose of MUS did not affect the frequency of single lever presses and the quantity of food intake (Figures S2A–S2D). In addition, we divided the acquisition phase into three stages based on the success rate that included the early (< 60%), middle (60–80%), and late (80–100%) stages. To achieve the stage-specific inactivation of the aDLS or pVLS neurons, rats were given bilateral injections of MUS or saline (SAL) at three different timings during a series of the training. The first injection was conducted on Day 2 to target the early stage. The second and third injections were conducted on the day after the success rate had reached 60% and 80% for the first time through the training, respectively, to target the middle or late stage. After the injections, the rats were used for the behavioral test. Placement locations of the injection cannula into the striatal subregions were assessed by cresyl violet staining after the behavioral tests (Figures 4A and 4B). The success rates on the day before the injection (day N-1) at the early, middle, and late stages were similar between the two groups that received the bilateral injections of MUS or SAL (Figures S2E and S2F). 
Rats were then administered injections into the aDLS on the next day (Day N), and the success rate in the MUS-injected groups significantly decreased at the middle stage compared to the corresponding SAL-injected groups, but not at the early and late stages (Figure 4C, left). In contrast, the success rate in the MUS-injected groups into the pVLS significantly decreased at the late stage, but not at the early and middle stages, compared with the respective SAL-injected groups (Figure 4D, left). To normalize the variation of learning speed in the individual animals, we calculated the value by subtracting the success rate on Day N-1 from that on Day N in the individual animals. There were significant reductions in the subtracted rate at the middle stage in the MUS-injected group into the aDLS (Figure 4C, right) and at the late stage in the MUS-injected group into the pVLS (Figure 4D, right). These results indicate that the aDLS and pVLS mainly function at the middle and late stages, respectively, in the acquisition phase of auditory discrimination, supporting the results obtained from our imaging study.
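The staging rule and the Day N minus Day N-1 normalization described above can be sketched in Python as follows. The function names are hypothetical, and the handling of the boundaries (exactly 60% counted as middle, exactly 80% counted as late) is an assumption, since the text gives the ranges only as < 60%, 60–80%, and 80–100%.

```python
def stage_of(success_rate: float) -> str:
    """Map a success rate (%) to an acquisition stage (boundary handling assumed)."""
    if success_rate < 60:
        return "early"
    if success_rate < 80:
        return "middle"
    return "late"


def injection_day(rates_by_day: list[float], threshold: float):
    """Return Day N (1-indexed): the day after the success rate first reaches
    `threshold`, or None if the threshold is never reached.

    rates_by_day[0] is the success rate on Day 1.
    """
    for index, rate in enumerate(rates_by_day):
        if rate >= threshold:
            return index + 2  # index + 1 is the day itself; inject on the next day
    return None


def subtracted_rate(rates_by_day: list[float], day_n: int) -> float:
    """Success rate on Day N minus that on Day N-1 (days are 1-indexed)."""
    return rates_by_day[day_n - 1] - rates_by_day[day_n - 2]


rates = [50, 55, 62, 58, 70, 81, 85]   # made-up learning curve, Days 1-7
middle_day = injection_day(rates, 60)  # Day 4
late_day = injection_day(rates, 80)    # Day 7
print(middle_day, subtracted_rate(rates, middle_day))
print(late_day, subtracted_rate(rates, late_day))
```

Subtracting the Day N-1 value expresses each animal's performance on the injection day relative to its own baseline, which is how the text normalizes differences in learning speed between animals.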

Influence of transient inhibition of striatal subregions at different timings on the performance of auditory discrimination

Rats received intracranial injection of SAL or MUS solution into the aDLS (n = 8 for each injection) or pVLS (n = 6 for each injection). (A and B) Representative images of cresyl violet staining and schematic illustrations showing placement sites of the tip of the guide cannula in the aDLS (A) or pVLS (B). Dotted lines in the images indicate the position of the cannula placement. ac, anterior commissure; cc, corpus callosum; and LV, lateral ventricle. (C and D) Effects of transient striatal inhibition on the performance. For aDLS inhibition (C), success rate (left, two-way repeated ANOVA, stage, F[2,28] = 28.708, p = 1.7 × 10⁻⁷, group, F[1,14] = 8.840, p = 0.010, stage × group, F[2,28] = 0.757, p = 0.478; unpaired Student’s t-test, early, t[14] = 0.558, p = 0.586; middle, t[14] = 2.764, p = 0.015; Welch’s t-test, late, t[12.010] = 1.617, p = 0.132, p = 0.016 after a Bonferroni correction method) and subtracted success rate on day N-1 from that on day N (right, two-way repeated ANOVA, stage, F[2,28] = 1.231, p = 0.307, group, F[1,14] = 10.797, p = 0.005, stage × group, F[2,28] = 2.360, p = 0.113; unpaired Student’s t-test, early, t[14] = 0.285, p = 0.780, late, t[14] = 1.271, p = 0.225, Welch’s t-test, middle, t[9.603] = 3.331, p = 0.008, p = 0.016 after a Bonferroni correction method). For pVLS inhibition (D), success rate (left, two-way repeated ANOVA, stage, F[2,20] = 17.642, p = 3.8 × 10⁻⁵, group, F[1,10] = 43.942, p = 5.9 × 10⁻⁵, stage × group, F[2,20] = 4.729, p = 0.012; simple main effect, early, p = 0.055, middle, p = 0.109, late, p = 4.2 × 10⁻⁵) and subtracted success rate on day N-1 from that on day N (right, two-way repeated ANOVA, stage, F[2,20] = 12.105, p = 3.6 × 10⁻⁴, group, F[1,10] = 16.310, p = 0.002, stage × group, F[2,20] = 8.500, p = 0.002; simple main effect, early, p = 0.595, middle, p = 0.085, late, p = 3.0 × 10⁻⁶).
(E and F) Subtracted proportion of the WSW or LSL strategy in the groups injected into the aDLS at the middle stage (E, unpaired Student’s t-test, t[14] = 2.038, p = 0.061 for the WSW, and t[14] = 2.714, p = 0.017 for the LSL) and the late stage (F, unpaired Student’s t-test, t[14] = 0.898, p = 0.384 for the WSW, and t[14] = 0.226, p = 0.824 for the LSL). (G and H) Subtracted proportion of the WSW or LSL strategy in the groups injected into the pVLS at the middle stage (G, unpaired Student’s t-test, t[10] = 1.924, p = 0.083 for the WSW, and Welch’s t-test, t[6.095] = 1.364, p = 0.221 for the LSL) and the late stage (H, unpaired Student’s t-test, t[10] = 3.629, p = 0.005 for the WSW, and t[10] = 1.577, p = 0.146 for the LSL). Data are indicated as the mean ± s.e.m., and individual data are overlaid. The anteroposterior coordinates from bregma (mm) are shown (A and B). Scale bars; 2 mm (A and B). *p < 0.05, **p < 0.01, and ***p < 0.001.

Next, we analyzed the influence of transient striatal inactivation on the behavioral strategy at the middle and late stages of the acquisition phase. We calculated the proportions of the WSW and LSL strategies, and subtracted the WSW or LSL proportion on Day N-1 from that on Day N. The proportions of both strategies on Day N-1 were comparable between the MUS- and SAL-injected groups (Figures S2G and S2H). In the aDLS, the subtracted proportion of the LSL strategy at the middle stage significantly increased in the MUS-injected group compared with the SAL-injected group, whereas the subtracted proportion of the WSW strategy only tended to decrease in the MUS-injected group (Figure 4E), and the subtracted proportion of the WSW or LSL strategy at the late stage was similar between the two groups (Figure 4F). In the pVLS, the subtracted proportions of the WSW and LSL strategies at the middle stage were similar between the MUS- and SAL-injected groups (Figure 4G), and the subtracted WSW proportion at the late stage significantly decreased in the MUS-injected group compared with the SAL-injected group, although the subtracted LSL proportion was comparable between the two groups (Figure 4H). Our results show that transient inhibition of aDLS activity at the middle stage affects the LSL strategy and has a moderate effect on the WSW strategy, supporting the view that the aDLS mediates the acquisition process of auditory discrimination at the middle stage at least partly through the suppression of behavior based on the response-outcome association. Although the impact of inhibition on the WSW strategy was modest, excitotoxic lesion of the aDLS influenced the WSW strategy in addition to the LSL strategy (see Figure 3J), suggesting that the aDLS acts to enhance auditory discrimination by promoting the stimulus-response association, together with the suppression of the response-outcome association.
The discrepancy in the extent of changes in the WSW strategy between the two treatments may be generated from the difference in the timing or sustainability of drug inhibition of the striatal subregion. In addition, our results show that pVLS inactivation at the late stage selectively impairs the WSW strategy, supporting contribution of the pVLS to the process via progression of the stimulus-response association.

Neuronal activity in the aDLS and pVLS during the acquisition phase

To examine the firing activity in striatal subregions during the acquisition phase of auditory discrimination, we simultaneously recorded neuronal activity in the aDLS and pVLS at the unilateral (left) side by using a multi-unit recording procedure (Figure 5A). Using immunostaining for tyrosine hydroxylase (TH) after the recordings, we verified the location of electrodes within the striatal subregions (Figure 5B). After the single lever press task, we conducted the auditory discrimination task for the recordings. In correct trials, the delay period (0.5 ± 0.2 s) was inserted between a lever press and a sound indicating the reward to distinguish neuronal activity related to the two events (Figure 5C). The behavioral data in individual rats were divided into three grouped sessions corresponding to the early, middle, and late stages, based on a sigmoid function analysis33 of the success rate in the task. The success rates in the grouped sessions were 52.7 ± 0.8 %, 61.6 ± 0.7 %, and 85.3 ± 1.7 % (mean ± s.e.m.) for the early, middle, and late stages, respectively, and the rate was significantly increased throughout the three stages (Figure S3A). However, the response time, omission rate, and response bias were similar among the stages (Figures S3B-S3D). Thus, these stages corresponded to the timing window defined in the acquisition phase of the transient MUS inhibition experiment of neuronal activity.

Multi-unit recording of neurons in striatal subregions and firing activity related to the behavioral outcome of RS-HR neurons in the aDLS

(A) Schematic illustration of a freely moving rat used for simultaneous multi-unit recordings in the aDLS and pVLS. Six rats were used for the following analysis. (B) Representative images showing positions of electrode tips (arrows) in the aDLS and pVLS (left rows). The recording sites estimated by electrode tracks and electrical marks in individual rats are shown (center and right rows). ac, anterior commissure; LV, lateral ventricle. (C) Sequence of events in correct and error trials. The delay periods were pseudorandomly added between a correct lever press and the reward sound (0.5 ± 0.2 s). The room light was turned off after an extended time (4 s) following a correct response or immediately after an error response. (D) Mean firing rate (top rows) and auROC values (bottom rows) of RS-HR neurons in the aDLS during the period when the RS is presented (green shadows, 500 ± 200 ms after the lever press) at the early (n = 43 neurons), middle (n = 37 neurons), and late (n = 44 neurons) stages. Time bins with significant differences between the mean firing rate in the HR or LR trial and any of the rates in the other three trials (Wilcoxon signed rank test) or between the distribution of auROC values and 0.5 (Wilcoxon rank-sum test) are represented by the circles at the top. (E) Averaged firing rate during the RS period in the HR and LR trials at the early, middle, and late stages (Kruskal-Wallis test: HR trial, χ² = 3.028, p = 0.220; and LR trial, χ² = 1.249, p = 0.536). (F) Cumulative probability of the proportion of RS-HR neuron number at the three stages (Kruskal-Wallis test, χ² = 45.378, p = 1.4 × 10⁻¹⁰; post hoc Tukey-Kramer test, p = 7.1 × 10⁻⁵ for early vs. middle, p = 1.0 × 10⁻⁹ for early vs. late, and p = 0.040 for middle vs. late). Data are indicated as the mean ± s.e.m. (D) or the median and quartiles with the maximal and minimal values (E). The anteroposterior coordinates from bregma (mm) are shown (B). Scale bars; 2 mm (B). *p < 0.05 and ***p < 0.001.

We identified 1,062 and 423 well-isolated neurons from the aDLS and pVLS, respectively, across the three stages (Figure S3E) and focused on the major population of striatal neurons, putative medium spiny neurons (MSNs; Figures S3F and S3G). The total numbers of MSNs at the early, middle, and late stages were 295, 364, and 360 neurons for the aDLS, and 87, 110, and 190 neurons for pVLS, respectively. Based on the combinatorial pattern of the tone instruction cue and lever press in our discrimination task, we categorized the electrophysiological data into four trial types: (1) high-frequency tone/right lever press (HR) and (2) low-frequency tone/left lever press (LL) as correct responses; and (3) low-frequency tone/right lever press (LR) and (4) high-frequency tone/left lever press (HL) as error responses (see Figure 1A). We identified HR or LL type neurons showing significant changes in firing rate related to the instruction cue onset (CO), choice response (CR), reward sound (RS), or first licking (FL) as compared to the baseline firing rate in each of two trials (Mann-Whitney U test, p < 0.01). These neurons were further divided into two groups based on increased or decreased activity (Figures S4A and S4B). In the following analyses, we focused on the HR and LL type neurons with increased event-related activity to explore the firing patterns of neurons underlying behavioral processes of auditory discrimination.
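
The four trial types defined above can be illustrated with a minimal sketch. The function and variable names here (`classify_trial`, `is_correct`, the `"high"`/`"low"` and `"right"`/`"left"` labels) are hypothetical and are not taken from the analysis code of this study; the sketch only encodes the tone/lever combinations stated in the text.

```python
from collections import Counter

# Mapping of (tone, lever) combinations to the four trial types in the text.
# HR and LL are correct responses; LR and HL are error responses.
TRIAL_TYPES = {
    ("high", "right"): "HR",
    ("low", "left"): "LL",
    ("low", "right"): "LR",
    ("high", "left"): "HL",
}


def classify_trial(tone: str, lever: str) -> str:
    """Return the trial-type label for one tone cue and one lever press."""
    return TRIAL_TYPES[(tone, lever)]


def is_correct(trial_type: str) -> bool:
    """Correct responses are the HR and LL trials."""
    return trial_type in ("HR", "LL")


# Hypothetical example session (illustrative values only, not recorded data).
session = [("high", "right"), ("low", "left"), ("low", "right"),
           ("high", "left"), ("high", "right")]
counts = Counter(classify_trial(tone, lever) for tone, lever in session)
```

Spike trains would then be grouped by these labels before event-related activity is compared with the baseline.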

Subpopulations of aDLS neurons, but not of pVLS neurons, show firing activity related to the behavioral outcome

Previous studies have reported that neurons in the dorsal striatum show outcome-related activity in instrumental conditioning tasks24,34. We thus investigated whether aDLS and pVLS neurons exhibit a response related to the behavioral outcome during the acquisition of auditory discrimination. In the aDLS, the firing rate of RS-related HR type (RS-HR) neurons increased more strongly in the HR trial than in the LR, LL, or HL trial during the period when the RS is presented in correct responses, and this difference was observed throughout the three stages (Figure 5D, top rows). The rate of these neurons also moderately increased during the same period in the LR trial compared to the LL or HL trial across the three stages, although the extent of the increase was lower than that in the HR trial (Figure 5D, top rows). The area under the receiver operating characteristic (auROC) values revealed a significant difference of the HR trial from the LR trial during the RS period, whereas the values of HL and LL trials indicated a significant difference in the opposite direction from the LR trial during the same period (Figure 5D, bottom rows). The averaged firing rate during the RS period in the HR or LR trial appeared to modestly increase at the middle stage, but no significant difference existed among the stages (Figure 5E). To directly compare the number of neurons showing event-related activity among different stages, we calculated the number of RS-HR neurons by matching the trial number at each stage (Figure 5F). The proportion decreased progressively from the early to the middle stage, and then to the late stage. In addition, the firing rate of RS-related LL type (RS-LL) neurons exhibited a higher increase in the LL trial and a modest increase in the HL trial, while the averaged firing rate during the RS period was similar among the three stages. Furthermore, the proportion of RS-LL neurons decreased progressively through the stages (Figures S5A–S5C). 
These results indicate that the aDLS contained neurons related to the outcome of choice, showing a greater increase in the firing activity for the correct responses (HR and LL trials) than the error responses (LR and HL trials), and that the proportion of these subpopulations was gradually reduced through the progress of learning. In contrast, during the RS period at all stages, the firing rate and auROC values of RS-HR and RS-LL neurons in the pVLS revealed no significant differences across the four kinds of trial (Figures S5D and S5E), indicating that the firing activity of pVLS neurons did not appear to be associated with the behavioral outcome.
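
As an illustration of the auROC measure used above, the following minimal sketch computes the area under the ROC curve for one time bin from per-trial spike counts in two trial types. It uses the standard equivalence between the auROC and the probability that a value drawn from one group exceeds a value drawn from the other (ties counted as one half). The function name `auroc` and the example counts are hypothetical, and the exact binning and reference condition used in this study are not reproduced here.

```python
def auroc(group_a, group_b):
    """Area under the ROC curve for discriminating group_a from group_b.

    Equals P(a > b) + 0.5 * P(a == b) over all pairs (a, b).
    A value of 0.5 means the two distributions are indistinguishable;
    values above 0.5 mean group_a tends to be larger.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


# Hypothetical spike counts in one time bin (illustrative values only).
hr_counts = [5, 6, 7, 8]
lr_counts = [3, 4, 5, 6]
value = auroc(hr_counts, lr_counts)
```

Applying such a function bin by bin across neurons yields a distribution of auROC values that can be tested against 0.5, as in the figure legends.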

Subpopulations of aDLS and pVLS neurons show sustained firing activity after the reward in different manners

The dorsal striatum is known to demonstrate a response of neuronal activity after the positive outcome in the instrumental behaviors35,36. We thus examined whether aDLS and pVLS neurons show the response after obtaining the reward during the acquisition phase of auditory discrimination. In the aDLS, the firing rate of FL-related HR type (FL-HR) neurons indicated a sustained elevation after the FL in the HR trial as compared with any of the other three trials across the early, middle, and late stages (Figure 6A, top rows). The auROC values revealed a significant difference of the LR, LL, or HL trial from the HR trial across the three stages (Figure 6A, middle rows). In addition, we calculated the averaged firing rate and increased firing period, defined as the total number of time bins above 3 s.d. of the baseline firing rate for 5 s after the FL in the HR trial. The averaged firing rate was comparable between the early and middle stages, but the rate was significantly lower at the late stage compared to either of the other two stages (Figure 6B). The increased firing period was also comparable between the early and middle stages, showing a significant reduction at the late stage (Figure 6C). The proportions of FL-HR neurons were similar between the early and late stages, although they showed a slight decrease at the middle stage (Figure 6D). There were few neurons with significant correlations between the firing rate and licking performance (Figure 6E). In addition, the firing rate of the FL-related LL type (FL-LL) neurons exhibited the largest increase in the LL trial at the three stages, showing a tendency to reduce the average firing rate and increased firing period after the FL across the stages (Figures S6A-S6C). The proportion of FL-LL neurons during the learning processes displayed similar changes as observed in the FL-HR neurons (Figure S6D). 
These results indicate that the aDLS contained neurons showing long-lasting, increased activity after the reward in the specific correct combination of the stimulus and response (HR or LL trial), and that the extent of firing activity and length of firing period in the response appeared to be gradually reduced with no correlation between the proportion of neurons and the progress of the learning stages.
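
The increased firing period defined above can be sketched as a simple threshold count. This sketch assumes that "above 3 s.d. of the baseline firing rate" means above the baseline mean plus three sample standard deviations, and that firing rates are already binned; the bin width, the function name `increased_firing_bins`, and the example values are hypothetical.

```python
from statistics import mean, stdev


def increased_firing_bins(baseline_rates, post_rates, n_sd=3.0):
    """Count post-event time bins whose rate exceeds baseline mean + n_sd * s.d.

    baseline_rates: firing rates in baseline time bins (at least two values).
    post_rates: firing rates in the time bins covering the window after the event.
    """
    threshold = mean(baseline_rates) + n_sd * stdev(baseline_rates)
    return sum(1 for rate in post_rates if rate > threshold)


# Hypothetical binned rates in spikes/s (illustrative values only).
baseline = [4, 5, 6, 5, 4, 6]
after_first_lick = [9, 8, 7, 10, 5]
n_bins = increased_firing_bins(baseline, after_first_lick)
```

A longer sustained response gives a larger count, so the measure tracks the duration of elevated firing rather than its peak.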

Sustained activity after the reward of FL-HR neurons in the aDLS and pVLS

(A) Firing rate (top rows), auROC (middle rows), and licking rate (bottom rows) after the reward of FL-HR neurons in the aDLS at the early (n = 27 neurons), middle (n = 31 neurons), and late (n = 49 neurons) stages. Time bins with significant differences between the mean firing rate in the HR trial and any of the rates in the other three trials (Wilcoxon signed rank test) or between the distribution of auROC values and 0.5 (Wilcoxon rank-sum test) are represented by the circles at the top. Timings of the RS and FL are shown. (B) Averaged firing rate for 5 s after the FL in the HR trial (Kruskal-Wallis test, χ2 = 16.396, p = 2.8 × 10-4, post hoc Tukey-Kramer test, early vs. middle, p = 0.726, early vs. late, p = 7.2 × 10-4, middle vs. late, p = 0.009). (C) Averaged total number of time bins above 3 S.D. of the baseline firing of aDLS neurons for 5 s after the FL in the HR trial (Kruskal-Wallis test, χ2 = 12.803, p = 0.002, post hoc Tukey-Kramer test, early vs. middle, p = 0.173, early vs. late, p = 0.001, middle vs. late, p = 0.219). (D) Cumulative probability of the proportion of FL-HR neuron number (Kruskal-Wallis test, χ2 = 39.280, p = 3.0 × 10-9, post hoc Tukey-Kramer test, early vs. middle, p = 1.3 × 10-7, early vs. late, p = 0.995, middle vs. late, p = 2.3 × 10-7). (E) Distribution of the correlation coefficient between the numbers of spikes of FL-HR neurons and licking. Closed column indicates the number of neurons showing significant correlations between the two parameters (5 out of 107 neurons). (F) Firing rate (top rows), auROC (middle rows), and licking rate (bottom rows) after the reward of FL-HR neurons in the pVLS at the early (n = 17 neurons), middle (n = 11 neurons), and late (n = 64 neurons) stages. Time bins with significant differences between the rate in the HR trial and any of the rates in the other trials (Wilcoxon signed rank test) or between the distribution of auROC values and 0.5 (Wilcoxon rank-sum test) are represented by the circles at the top. 
The timings of the RS and FL are shown. (G) Averaged firing rate of pVLS neurons after the FL in the HR trial (Kruskal-Wallis test, χ2 = 7.652, p = 0.022; post hoc Tukey-Kramer test, early vs. middle, p = 0.975, early vs. late, p = 0.076, and middle vs. late, p = 0.096). (H) Averaged total number of time bins above 3 S.D. of the baseline firing after the FL in the HR trial (Kruskal-Wallis test, χ2 = 0.023, p = 0.989). (I) Cumulative probability of the proportion of FL-HR neuron number (Kruskal-Wallis test, χ2 = 47.922, p = 3.9 × 10-11; post hoc Tukey-Kramer test, early vs. middle, p = 0.011, early vs. late, p = 1.7 × 10-4, and middle vs. late, p = 9.7 × 10-10). (J) Distribution of the correlation coefficient between the number of spikes in the pVLS and the licking. Closed column indicates the number of neurons showing significant correlations between the two parameters (5 out of 92 neurons). Data are indicated as the mean ± s.e.m. (A and F) or the median and quartiles with the maximal and minimal values (B, C, G, and H). **p < 0.01 and ***p < 0.001.

In the pVLS, FL-HR neurons showed a sustained elevation in the firing rate after receiving a reward in the HR trial compared to that in the LR or HL trial throughout the three stages (Figure 6F, top rows). A similar elevation in the firing rate was also observed in the LL trial throughout the stages (Figure 6F, top rows). The auROC values revealed a significant difference of the LR or HL trial, but not the LL trial, from the HR trial throughout the stages (Figure 6F, middle rows). The average firing rate did not show significant changes among the three stages (Figure 6G), and the increased firing period was also similar across the stages (Figure 6H). The proportion of FL-HR neurons increased at the late stage compared to the early or middle stage (Figure 6I). The firing rate showed almost no significant correlation to the licking performance (Figure 6J). In addition, FL-LL neurons exhibited similar firing patterns and changes in the proportion of neurons across the stages (Figures S6F–S6J). Therefore, the pVLS included neurons representing the long-lasting increased activity after the reward in the two kinds of correct responses (HR and LL trials) with the increased number of these neurons at the late stage of learning.

Neurons in the pVLS, but not in the aDLS, exhibit firing patterns related to the beginning and ending of the behavior

The dorsal striatum plays a key role in the acquisition and execution of behaviors based on the stimulus-response association, and the association is accompanied by a transient activation of neuronal firing immediately after the stimulus presentation or action in trials, which is considered to represent the beginning and ending of a learned behavior37-39. We first examined whether the firing activity of CO-related HR type (CO-HR) or CO-related LL type (CO-LL) neurons corresponds to the beginning of the behavior in the auditory discrimination task. In the aDLS, the firing rate of CO-HR or CO-LL neurons showed a significant difference at some time points in each trial compared to those in any of the other three trials, with a significant difference of auROC values between the trials (Figures S7A and S7C). However, the transiently increased activity did not emerge at the time points corresponding to the beginning of the behavior. In contrast, CO-HR neurons in the pVLS showed a transient increase in the firing rate immediately after the CO, and the activity appeared at the middle and late stages (Figure 7A, top). The transient activation was observed not only in the HR trial but also in the other three trials, consistent with the results of a previous report40. The increased firing rate showed a slight but notable difference between the HR and LL trials with a significant difference in the auROC values between the two trials (Figure 7A, bottom), suggesting that different combinations of the stimulus and response may affect the level of firing activity. Differences in the averaged firing rate between the two kinds of trial were observed at the middle and late stages (Figure 7B). The proportion of CO-HR neurons increased at the middle and late stages compared to the early stage (Figure 7C). In addition, the firing rate of CO-LL neurons exhibited patterns similar to those observed in the CO-HR neurons (Figure S7E). 
The data indicate that subpopulations of pVLS neurons show firing activity related to the beginning of the behavior, with a rising number of neurons throughout discrimination learning.

Transient activity related to the beginning and ending of a behavior of CO-HR and CR-HR neurons in the pVLS

(A) Firing rate (top rows) and auROC values (bottom rows) of CO-HR neurons in the pVLS at the early (n = 4 neurons), middle (n = 21 neurons), and late (n = 24 neurons) stages. Time bins with significant differences between the mean firing rate in the HR trial and any of the rates in the other three trials (Wilcoxon signed rank test) or between the distribution of auROC values and 0.5 (Wilcoxon rank-sum test) are represented by the circles at the top. (B) Averaged firing rate of CO-HR neurons in HR and LL trials (0-600 ms after the CO) (Wilcoxon signed rank test; early, p = 0.875, middle, p = 0.006, and late, p = 0.004). (C) Cumulative probability of the proportion of CO-HR neuron number (Kruskal-Wallis test, χ2 = 51.114, p = 8.0 × 10-12; post hoc Tukey-Kramer test, early vs. middle, p = 9.6 × 10-10, early vs. late, p = 0.002, and middle vs. late, p = 6.4 × 10-4). (D) Firing rate (top rows) and auROC values (bottom rows) of CR-HR neurons in the pVLS at the early (n = 11 neurons), middle (n = 22 neurons), and late (n = 41 neurons) stages. Time bins with significant differences between the mean rate in the HR trial and any of the rates in the other trials (Wilcoxon signed rank test) or between the distribution of auROC values and 0.5 (Wilcoxon rank-sum test) are represented by the circles at the top. The timing of the RS is presented. (E) Averaged firing rate of CR-HR neurons in HR and LL trials (0-600 ms after the CR) (Wilcoxon signed rank test; early, p = 0.278, middle, p = 0.758, and late, p = 0.279). (F) Cumulative probability of the proportion of CR-HR neuron number (Kruskal-Wallis test, χ2 = 13.786, p = 0.001; post hoc Tukey-Kramer test, early vs. middle, p = 0.003, early vs. late, p = 0.005, and middle vs. late, p = 0.976). Data are indicated as the mean ± s.e.m. (A and D) or the median and quartiles with the maximal and minimal values (B and E). **p < 0.01 and ***p < 0.001.

Next, we examined the firing activity of CR-related HR type (CR-HR) or CR-related LL type (CR-LL) neurons, corresponding to the ending of a behavior in auditory discrimination. In the aDLS, the firing rate of these neurons did not exhibit the transient increase related to the ending of the behavior; instead, there was a modest increase before the lever press (Figures S7B and S7D). In contrast, CR-HR neurons in the pVLS were transiently activated immediately after the lever press in the four kinds of trial at the middle and late stages, and the rate appeared to gradually decline toward the RS presentation (Figure 7D). The averaged firing rate was similar between the HR and LL trials through the stages (Figure 7E). The proportion of the CR-HR neurons increased at the middle and late stages compared to the early stage (Figure 7F). In addition, the rate of CR-LL neurons displayed similar firing patterns to those observed in the CR-HR neurons (Figure S7F). Thus, subpopulations of pVLS neurons also represent the activity related to the ending of the behavior in auditory discrimination, and their number increases throughout the learning stages in a pattern similar to that of neurons related to the beginning of the behavior in the task.

Discussion

Our findings demonstrate that the aDLS and pVLS are required for the acquisition of auditory discrimination at different stages of learning, and that the DMS is not necessary for the acquisition of discrimination, although it appears to contribute to the execution of actions, in particular to the regulation of response time. These results not only support some recent studies reporting no requirement of the DMS in procedural learning20,41,42, but also reveal important roles of two different subregions in the lateral striatum in external cue-dependent decision-making. We propose that aDLS and pVLS neurons act to integrate the new learning of auditory discrimination in spatiotemporally and functionally distinct manners, showing a remarkable difference from the prior learning model, which proposes that the functional dominance changes from the DMS to DLS subregions during procedural learning12-16.

Our 18F-FDG-PET imaging study demonstrated that the brain activity in the aDLS reaches a peak at the middle stage of the acquisition phase of auditory discrimination (Figure 2I). The results of excitotoxic lesion experiment of striatal subregions suggest that the aDLS functions to enhance the acquisition of discrimination through the promotion of behavior based on the stimulus-response association and the suppression of behavior based on the response-outcome association (Figure 3J). Transient inhibition experiment of neuronal activity confirmed this aDLS function at the middle stage of the learning, although the effect of drug inhibition was moderate for the promotion of the stimulus-response behavior (Figures 4E and 4F). In addition, we found subpopulations of aDLS neurons (RS-related neurons) which represent the behavioral outcome and differentiate the correct (HR or LL) and incorrect (LR or HL) choices (Figures 5D and S5A). The distinct responses of these neurons may explain the mechanism by which the aDLS regulates the stimulus-response and response-outcome associations through distinct neural circuits leading to a differential control of the two associations. Alternatively, the responses mainly promote the behavior based on the stimulus-response association, thereby secondarily inhibiting the response-outcome association. Indeed, some previous studies have suggested that an action based on either the stimulus-response association or response-outcome association is generated as a result of competition between different striatal subregions43. We also found other subpopulations of aDLS neurons (FL-related neurons) showing long-lasting increased activity after obtaining reward in response to a specific combination between the stimulus and response (HR or LL trial) (Figures 6A and S6A). This activity may function as the feedback signal to promote the stimulus and response association, resulting in the enhancement of discrimination learning. 
The proportion of neurons related to the behavioral outcome showed a gradual reduction through the progress of auditory discrimination (Figures 5F and S5C), and the extent of the firing activity and the length of the firing period of neurons showing sustained activation after reward were also decreased at the late stage of learning (Figures 6B and 6C). The learning-dependent changes in the properties of these subpopulations in the aDLS may explain the mechanism underlying the earlier acquisition process of the auditory discrimination.

The 18F-FDG-PET imaging indicated that the pVLS activity was gradually elevated through the acquisition phase of auditory discrimination, showing the maximal value at the late stage (Figure 2J). The lesion experiment suggested that the pVLS drives the formation of discrimination learning via the selective progression of the stimulus-response association (Figure 3K), and the transient inhibition experiment ascertained the pVLS function at the late stage of learning (Figures 4G and 4H). Previous studies have shown that the stimulus-response association is encoded in the DLS or VLS as the transient neuronal activity representing the beginning and ending of a learned behavior37-39,44. A recent study reported that the VLS is causally related to learned performance based on the auditory stimulus, while the DLS shows a weaker relationship to the behavior22. In the present study, we found subpopulations of pVLS neurons that represent the beginning and ending of a behavior (CO- and CR-related neurons, respectively) (Figures 7A, 7D, S7E, and S7F), showing an increased number of neurons along with the progress of discrimination learning (Figures 7C and 7F). These data suggest that pVLS neurons contribute to the formation of the stimulus-response association through the learning processes. In addition, we found another subpopulation of pVLS neurons (FL-related neurons) that are continuously activated after the reward outcome in response to two kinds of correct choices (HR and LL trials) (Figures 6F and S6F), and the proportion of neurons was increased at the late stage (Figures 6I and S6I). This activity may act as a feedback signal to consolidate the stimulus and response association, leading to the maintenance of associative memory at the late stage during the acquisition phase of learning.

In the present study, we identified neurons with four kinds of event-related activity in the aDLS and pVLS, and classified them into the HR and LL type neurons, whose firing activity showed a significant increase in the HR and LL trials, respectively, as compared to the baseline (Figures 5-7 and S5-S7). Parts of these two types of neurons with the respective event-related activity overlapped, and the numbers of overlapping neurons varied between striatal subregions (Figures S8A and S8B). In the pVLS, the overlapping neurons for the same event-related activities were 35.1%, 45.2%, and 67.0% in the CO, CR, and FL, respectively. In contrast, the aDLS had lower overlaps of 12.6% and 22.7% in the RS and FL, respectively. Thus, distinct neurons in the aDLS tend to have properties of a single trial type while the same neurons in the pVLS seem to possess properties of both types. One possible explanation for the diversity in the proportion of overlapping neurons between the two subregions may be the difference of input patterns into striatal neurons or plastic changes in functional connectivity during learning processes. The detailed mechanism by which aDLS and pVLS neurons react to the events in a different manner should be investigated in the future.

Anatomically, striatal subregions are known to form heterogeneous cortico-basal ganglia loops in mice2,45,46. We examined the neural circuits linked to the aDLS and pVLS in rats by using bidirectional tracers. The results showed that the two striatal subregions receive innervations from different subregions in the cerebral cortex, intralaminar thalamus, and substantia nigra pars compacta, and send projections to different subregions in the basal ganglia nuclei, including the globus pallidus, entopeduncular nucleus, and substantia nigra pars reticulata (Figure S9), confirming that the aDLS and pVLS form heterogeneous cortico-basal ganglia loops in the rat brain. These data suggest that the aDLS and pVLS mediate the acquisition of auditory discrimination via the parallel loops, and cooperate to achieve neural processing related to the dual regulation of the stimulus-response and response-outcome associations. Further studies are needed to elucidate the mechanism by which the two kinds of loops are processed separately and integrate the new learning of discrimination.

Materials and Methods

Animals

Animal care and handling procedures were carried out in accordance with the guidelines established by RIKEN Center for Biosystems Dynamics Research, Fukushima Medical University, and Osaka City University. All procedures were approved by their Institutional Animal Care and Use Committees. All efforts were made to minimize the number of animals used and their suffering throughout the course of the experiments. Male Long-Evans rats (8–13 weeks old, Institute for Animal Reproduction (Ibaraki, Japan)) were used for the present study. The rats were maintained at 22 ± 2 °C and 60% humidity in a 12-h light/12-h dark cycle, and food and water were continuously available unless stated otherwise.

Surgery

The rats underwent stereotaxic surgery under isoflurane anesthesia (1–5%). For excitotoxic lesion of striatal neurons, IBO was dissolved in PBS (0.8 mg/mL, Wako Pure Chemical), and IBO solution (0.3 μL per site) was injected bilaterally with the coordinates (mm) from bregma and dura for the aDLS (AP +1.6, ML ±3.3, and DV −3.9), pVLS (AP −1.6, ML ±4.6, and DV −5.6), or DMS (AP −1.6, ML ±1.4, and DV −4.8) according to the atlas of rat brain47. Injection was performed through a glass pipette (diameter: 60 μm) at a constant velocity of 0.1 μL/min with the microinfusion pump (EICOM). For transient inactivation of striatal neurons, two guide cannulae (length: 10 mm, outer diameter: 26 gauge) were bilaterally placed on the skull and fixed using four screws and dental cement. The tip of the cannula was placed at 1 mm above the target coordinates. A solution containing MUS (Sigma-Aldrich) in saline (0.1 μg/μL, 0.25 μL per site) was injected into the target coordinates through the internal cannula (outer diameter: 30 gauge) at a constant velocity of 0.25 μL/min. For anterograde and retrograde tracing, a solution containing cholera toxin subunit B (CTb; 1.0 mg/mL) conjugated to Alexa 488 or Alexa 555 (Invitrogen) as a bidirectional tracer was injected into the aDLS and pVLS (0.3 μL/site), respectively. The coordinates for the injection into striatal subregions were the same as described above. For the electrophysiological experiments, surgery was performed as described previously48. A 64-channel silicon probe (Buzsaki64spl; NeuroNexus) attached to a 3D-printed microdrive with a movable screw was chronically implanted with the coordinates (mm) from the bregma and dura into the left aDLS (AP +1.6, ML −3.3, and DV −2.5) and pVLS (AP −1.6, ML −4.6, and DV −3.6). A Faraday cage composed of copper mesh was secured on the skull with dental cement to reduce electrical noise and avoid damage to the implants. The rats were allowed to recover for 1 week after the surgeries.

Behavioral analysis

Behavioral tasks were conducted in operant conditioning chambers (30.5 × 24.1 × 29.2 cm; ENV-007CT; MED Associates) equipped with two retractable levers on both sides of a reward port in the front panel (ENV-203M-45, ENV-652AM, and ENV-201A; MED Associates). A multi-tone generator (ENV-223; MED Associates) and room light were mounted on the top of the center and corner panels on the rear side, respectively. For the electrophysiological experiment, a 3D-printed reward port was used to detect the licking and to prevent contact between the head attachment and the front panel. The behavioral experiment in intact rats was performed using grain pellets (F0021, Bioserv) as a food reward. To prevent spillover of radioactivity into the temporal cortex as a consequence of excessive 18F-FDG accumulation in the muscles used to eat the pellets49,50, the PET imaging experiments were carried out using 0.02% saccharine solution (B0131, Tokyo Chemical Industry) as a liquid reward. The electrophysiological experiment was also carried out using the saccharine solution. The pharmacological experiment was performed using the pellets.

The rats were subjected to dietary restrictions of 12 g/day (CE-2, CLEA Japan) and water access was limited to 6 to 9 h in accordance with the reward used in the experiments for 3 days before the beginning of the training schedule, and the body weights of the rats were maintained at ∼85% of their normal weights. The training schedule consisted of four continuous steps. In the first step (magazine training), the rats were habituated to the operant conditioning chambers, and the reward was presented with the click every 20 s. Each daily session was started by illumination of the room light, and continued for 30 min. The sessions were conducted for 3 consecutive days. In the second step (shaping), only one lever on the left or right side was inserted. When the rats pressed the lever, they earned a reward accompanied by a click. Each daily session was started by illumination of the light, and lasted 30 min or until the rats reached the criterion of 90 rewards. When the number of lever presses on either side reached the criterion, the lever was inserted on the opposite side on the next day. When the rats achieved the criterion on both sides for 2 or 3 days, they moved to the following step. In the third step (single lever press task), each trial was started by illumination of the light and insertion of one lever on either the left or right side in a pseudorandom manner with equal frequencies on both sides for every set of four trials. When the rats pressed the lever, the reward was presented with the sound, and the lever was immediately retracted. The room light turned off 4 s after the lever press. When they did not press the lever within 10 s after the start, the trial was terminated when the light turned off. The trial was repeated every 20 s. Each daily session continued for 30 min, and the sessions were conducted for 4 consecutive days. The number of lever presses in each session was counted. 
In the final step (two-alternative auditory discrimination task), each trial was started by the presentation of a high (10 kHz) or low (2 kHz) frequency tone as an instruction cue in a pseudorandom manner. After three seconds, the room light was illuminated, and two retractable levers were inserted at the same time. The rats were required to press the left or right lever in response to the low or high tone cue, respectively. When the rats pressed the correct lever, the reward was presented with the sound, and the levers were immediately retracted. A correct response allowed the rats to access the reward: the retractable spout delivering water was inserted for 1 s for the PET imaging experiment; a pellet was delivered by a food dispenser for the pharmacological experiment; and three drops of water were delivered through a solenoid valve for the electrophysiological experiment. The room light turned off 4 s after the correct response. On the other hand, an error response resulted in the retraction of the levers and the turning off of the room light. No response within 10 s after the lever insertion was recorded as an omission response. The success rate in each session was calculated by dividing the number of correct response trials by the number of lever press trials minus the number of omission trials. The trial was repeated every 20 s. Each daily session continued for 60 min, and the sessions were conducted for 14 or 24 days according to the experiments.
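
The success rate defined above can be written as a one-line calculation. The sketch below follows the stated formula literally, taking the denominator as the number of lever press trials minus the number of omission trials; the function name `success_rate` and the example numbers are hypothetical.

```python
def success_rate(n_correct: int, n_lever_press_trials: int, n_omission: int) -> float:
    """Success rate = correct trials / (lever press trials - omission trials)."""
    denominator = n_lever_press_trials - n_omission
    if denominator <= 0:
        raise ValueError("no trials remain after excluding omissions")
    return n_correct / denominator


# Hypothetical session (illustrative values only): 60 trials, 10 omissions,
# 45 correct responses.
rate = success_rate(n_correct=45, n_lever_press_trials=60, n_omission=10)
```

Excluding omissions from the denominator means the measure reflects choice accuracy rather than the willingness to respond.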

To assess the behavioral strategies in the auditory discrimination task, a trial-by-trial analysis was conducted. The WSW strategy was defined as a correct response in which, after a correct response during the previous trial, the rats press the opposite lever during the current trial in response to the shift of the tone instruction cue. The LSL strategy was defined as an error response in which, after an error response during the previous trial, the rats press the opposite lever during the current trial despite the shift of the instruction cue. The proportion of each strategy was calculated by dividing the number of trials showing that strategy by the number of lever press trials minus the number of omission trials.
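
The trial-by-trial definitions above can be sketched as follows. This is a minimal illustration under stated assumptions: each trial is a hypothetical `(tone, lever)` pair with `lever` set to `None` for an omission, only pairs of consecutive trials in which both had a lever press are scored, and the denominator is the number of trials with a lever press. None of these details are specified in the text, and the function name `strategy_proportions` is not from the study.

```python
# Correct lever for each tone cue, as stated in the task description.
CORRECT_LEVER = {"high": "right", "low": "left"}


def strategy_proportions(trials):
    """Return (WSW proportion, LSL proportion) for one session.

    trials: list of (tone, lever) tuples in session order;
    lever is None when the trial was an omission (assumed encoding).
    """
    n_responded = sum(1 for _, lever in trials if lever is not None)
    wsw = 0
    lsl = 0
    for (prev_tone, prev_lever), (tone, lever) in zip(trials, trials[1:]):
        if prev_lever is None or lever is None:
            continue  # assumption: pairs that include an omission are skipped
        cue_shifted = tone != prev_tone
        lever_switched = lever != prev_lever
        if not (cue_shifted and lever_switched):
            continue
        if prev_lever == CORRECT_LEVER[prev_tone]:
            wsw += 1  # previous correct, switch with the cue shift: current correct
        else:
            lsl += 1  # previous error, switch with the cue shift: current error
    if n_responded == 0:
        return 0.0, 0.0
    return wsw / n_responded, lsl / n_responded


# Hypothetical session (illustrative values only).
session = [("high", "right"), ("low", "left"), ("high", "left"),
           ("low", "right"), ("low", None)]
wsw_prop, lsl_prop = strategy_proportions(session)
```

In this example, the second trial counts as WSW and the fourth as LSL, each out of four trials with a lever press.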

For the transient inactivation experiment, MUS injections were performed bilaterally 20 min before the start of the session. For habituation, all rats were injected with saline bilaterally into the corresponding striatal subregions on the last day (Day 4) of the single lever press task. During the auditory discrimination task, the first injection was conducted on Day 2 of the task. The second and third injections were conducted on the day after the success rate had exceeded 60% and 80%, respectively. Each session continued for 15 min.

For the electrophysiological experiments, a delay period (0.5 ± 0.2 s) was added in the auditory discrimination task between the lever press and reward sound to distinguish the motor- and reward-related responses of neurons. When the rats pressed the lever, the lever was immediately retracted. After the delay period, the reward was presented with a sound. The room light turned off 4 s after the lever press. The trial was repeated every 20 s, and each daily session continued for 60 min.

PET imaging analysis

An 18F-FDG-PET scan was performed with a microPET Focus220 (Siemens Medical Solutions, Knoxville) designed for high-resolution imaging of small animals (spatial resolution of 1.4 mm in full width at half maximum at the center of the field of view) as described previously26. Serial 18F-FDG-PET scans were performed on the last day of the single lever press task and on Days 2, 6, 10, and 24 of the auditory discrimination task. Under the awake condition, a solution of 18F-FDG (ca. 74 MBq/0.4 mL) was intravenously injected just before the beginning of each behavioral session through an indwelling catheter attached to the tail. After the 30-min session, the rats were returned to their home cage, and a 30-min static PET scan was then performed under anesthesia with 2.0–2.5% isoflurane 55 min after the 18F-FDG injection. During the PET scan, the body temperature of the anesthetized rats was maintained at 37°C using a small animal warmer (BWT-100A, Bio Research Center). The acquired emission data were sorted into a single sinogram and reconstructed by a standard 2D filtered back projection (FBP) with a ramp filter and a cutoff frequency of 0.5 cycles per pixel, or by a statistical maximum a posteriori probability algorithm (MAP; 12 iterations with point spread function effect).

Voxel-based analysis of PET images was conducted according to the method described previously26. Briefly, individual MAP-reconstructed FDG (MAP-FDG) images were aligned to an FDG template image of Long Evans rats using the PMOD software package (version 3.2, PMOD Technologies), and the aligned images were then transformed into the space of a T2-weighted MRI template for rats (https://www.nitrc.org/projects/tpm_rat/). The transformation parameters obtained from individual MAP-FDG images were used to match the individual FBP-reconstructed FDG (FBP-FDG) images with the MRI template. The voxel size of the template was magnified by a factor of 10 to a size similar to that of a human brain. All FBP-FDG images were resampled with a voxel size of 1.2 × 1.2 × 1.2 mm and spatially smoothed with an isotropic Gaussian kernel (6-mm FWHM). Voxel-based statistical analysis was performed using the Statistical Parametric Mapping software (version 8; https://www.fil.ion.ucl.ac.uk/spm/). The voxel of interest (VOI) was defined as the voxels with T-values exceeding 85% of the maximum T-value within the activated areas obtained by voxel-based statistical comparison of FBP-FDG images. The average regional 18F-FDG uptake was calculated on the basis of the values of the VOI in FBP-FDG images.
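The VOI rule (voxels above 85% of the peak T-value within an activated area) can be illustrated with a small Python sketch. It assumes, purely for illustration, that the T-map and the uptake image are flat lists indexed by voxel and that the activated area is given as a list of voxel indices. The function names are hypothetical and do not correspond to SPM or PMOD routines.

```python
def select_voi(t_values, activated, fraction=0.85):
    """Return the indices of activated voxels whose T-value exceeds
    `fraction` of the maximum T-value within the activated area."""
    t_max = max(t_values[i] for i in activated)
    return [i for i in activated if t_values[i] > fraction * t_max]


def mean_uptake(uptake, voi):
    """Average uptake over the voxels of the VOI."""
    return sum(uptake[i] for i in voi) / len(voi)
```

For example, with T-values `[1.0, 4.0, 5.0, 4.5, 2.0]` and an activated area covering voxels 1 to 3, the threshold is 0.85 × 5.0 = 4.25, so only voxels 2 and 3 enter the VOI.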

Electrophysiological recording

Extracellular multi-unit recording was conducted at a sampling rate of 20 kHz from the striatum of freely moving rats by using a 512-ch acquisition board (Open Ephys) via 64-ch recording headstages (C3325; Intan Technologies). The tips of the probe were lowered by turning the screw (> 140 µm/day) until they reached the target region (the aDLS and pVLS) during the single lever press task. The tips were sometimes lowered by the same procedure (140 to 280 µm/day) after the daily recording session to improve cell yield.

Spike sorting was performed automatically by the Kilosort software (https://github.com/cortex-lab/KiloSort). Clear noise clusters were removed in the first manual sorting using the phy GUI (https://github.com/cortex-lab/phy). The remaining clusters were then re-clustered by using KlustaKwik (https://klusta.readthedocs.io/en/latest/), and the clusters were adjusted in the second manual sorting by using the phy GUI again. The sorted clusters satisfying all the following criteria were used for further analysis: L-ratio < 0.05, isolation distance > 15, and ISI index < 0.2 (refs. 51,52). Well-isolated single neurons (n = 1,485; aDLS, n = 1,062; pVLS, n = 423) were classified as putative medium spiny neurons (MSNs), fast spiking interneurons (FSIs), tonically active neurons (TANs), and unclassified interneurons (UIs). Putative MSNs and TANs were separated from FSIs and UIs by the trough-to-peak duration of the mean spike waveform (> 0.6 ms) according to previous studies, with slight modifications53,54. Putative FSIs were then separated from UIs by the proportion of time associated with long inter-spike intervals. This was obtained by summing the inter-spike intervals longer than 2 s and dividing the resulting sum by the total recording time (PropISIs > 2 s)55. Neurons with values of PropISIs > 2 s less than 0.4 were classified as putative FSIs, while neurons with PropISIs > 2 s greater than 0.4 were classified as putative UIs. The putative MSNs and TANs were separated by measuring the post-spike suppression55. We measured the length of time during which the firing rate of a neuron was suppressed following a spike by counting the number of 1-ms bins in its autocorrelation function until the rate of the neuron was equal to or greater than its averaged rate over the 600-ms to 900-ms autocorrelation bins. Putative TANs were separated by post-spike suppression of less than 40 ms, and the remaining neurons were classified as putative MSNs.
Because the numbers of FSIs, TANs, and UIs were low (79 out of 1,485, 5.3%), they were excluded from further analysis.
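The two measures used for cell-type classification, PropISIs > 2 s and post-spike suppression, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: spike times are taken to be in seconds, the autocorrelation function is assumed to be precomputed as a list of firing rates in consecutive 1-ms lag bins covering at least 900 ms, and all function names are hypothetical. The text does not say how a PropISIs value of exactly 0.4 is handled, so the sketch assigns it to the UI group.

```python
def prop_long_isi(spike_times, total_time, min_isi=2.0):
    """Sum of inter-spike intervals longer than `min_isi` seconds,
    divided by the total recording time (PropISIs > 2 s)."""
    isis = [later - earlier for earlier, later in zip(spike_times, spike_times[1:])]
    return sum(isi for isi in isis if isi > min_isi) / total_time


def classify_narrow_unit(prop, threshold=0.4):
    """Split narrow-waveform units: below the threshold is a putative FSI.
    Assumption: a value exactly at the threshold is treated as UI."""
    return "FSI" if prop < threshold else "UI"


def post_spike_suppression_ms(autocorr_rate):
    """Count 1-ms lag bins from zero lag until the rate first reaches the
    mean rate over the 600-ms to 900-ms lag bins.

    autocorr_rate: firing rate per 1-ms lag bin, index k covering lag k ms
    (assumed precomputed, with at least 900 bins).
    """
    baseline = sum(autocorr_rate[600:900]) / 300
    for lag_ms, rate in enumerate(autocorr_rate):
        if rate >= baseline:
            return lag_ms
    return len(autocorr_rate)
```

As a quick check, spikes at 0, 0.5, 3.5, 4.0, and 9.0 s in a 10-s recording have long intervals of 3.0 s and 5.0 s, giving PropISIs > 2 s = 0.8.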

To divide the three stages during the acquisition phase, the sigmoidal function was fitted using the following four-parameter logistic equation:

Here, α is the inflection point, β is the slope parameter, and γ and λ are parameters related to the lower and upper asymptotes of the learning curve, respectively. The function was used to evaluate the day when the success rate reached 20% (t20%) or 90% (t90%) of the maximum change in success rate, defined as the difference between γ and λ. Three stages were defined according to the following criteria: early stage (up to 3 days before the middle stage); middle stage (3 days comprising the day of t20% and the days immediately before and after it); and late stage (3 days comprising the day of t90% and the 2 preceding days). One rat learned the discrimination faster, with the success rate reaching t90% 2 days after the day of t20%; for this rat, the day before the day of t90% was assigned to the late stage. For the following analysis, spike data in each striatal subregion were separated into the early, middle, and late stages (n = 295, 364 and 360, respectively, for the aDLS; and n = 87, 110 and 190, respectively, for the pVLS).
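The equation itself is not reproduced in this text. The Python sketch below therefore assumes one common four-parameter logistic form, y(t) = γ + (λ − γ) / (1 + exp(−β(t − α))), in which γ and λ are the lower and upper asymptotes; the authors' exact parameterization may differ. Under that assumption, the time at which the curve reaches a fraction p of the total change (λ − γ) has the closed form t = α − ln(1/p − 1)/β, which gives t20% and t90% directly from the fitted parameters. The function names are hypothetical.

```python
import math


def logistic4(t, alpha, beta, gamma, lam):
    """Assumed four-parameter logistic learning curve:
    gamma = lower asymptote, lam = upper asymptote,
    alpha = inflection point, beta = slope parameter."""
    return gamma + (lam - gamma) / (1.0 + math.exp(-beta * (t - alpha)))


def crossing_time(alpha, beta, fraction):
    """Time at which the curve has covered `fraction` of the change from
    gamma to lam. Obtained by solving 1 / (1 + exp(-beta (t - alpha))) = fraction."""
    return alpha - math.log(1.0 / fraction - 1.0) / beta


# Example with made-up parameters (not values from the study).
alpha, beta, gamma, lam = 8.0, 1.0, 0.5, 0.9
t20 = crossing_time(alpha, beta, 0.2)  # alpha - ln(4) / beta
t90 = crossing_time(alpha, beta, 0.9)  # alpha + ln(9) / beta
```

Note that the crossing times depend only on α and β, because the asymptotes cancel when the criterion is expressed as a fraction of the total change. Mapping a fractional crossing time onto a session day (for example, by rounding up) is a further choice that the text does not specify.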

To determine the neurons showing significant changes in firing rate related to the task events, electrophysiological data were grouped into four trial types based on the instruction cue (high or low tone) and the choice (right or left lever): HR, high tone and right lever press; HL, high tone and left lever press; LR, low tone and right lever press; and LL, low tone and left lever press. Neurons with data from three or more trials were analyzed. We identified the HR or LL type neurons by detecting significant increases or decreases in firing rate, applying the Wilcoxon rank-sum test (p < 0.01) to the trial-by-trial spike counts between the baseline period (BL, −1,500 to −500 ms or −1,100 to −500 ms before presentation of the instruction cue) and any of four peri-event periods (CO, 0 to +1,000 ms after the onset of the instruction cue; CR, −700 to +300 ms around the time of lever press; RS, −300 to +300 ms around the onset of the sound signaling reward delivery; and FL, 0 to +1,000 ms after the time of the first lick after the CR). Spikes around the events were averaged in 50-ms bins. The event-related activity was shown as averaged firing rates aligned to the timing of an event for each trial. The auROC values were calculated by using the spike counts between the HR or LR trials and any of the other three trials. Bins showing significant differences in the auROC data were determined by applying the Wilcoxon rank-sum test (p < 0.05) to the distribution of 200-ms bin-by-bin auROC values and 0.5.
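The text does not state how the auROC was computed. A standard way to obtain the area under the ROC curve between two sets of spike counts is the pairwise-comparison form, which equals the Mann-Whitney U statistic divided by the number of pairs. The Python sketch below uses that equivalence; the function name is hypothetical.

```python
def auroc(counts_a, counts_b):
    """Area under the ROC curve for discriminating trial type A from B.

    Equals the probability that a randomly chosen count from A exceeds a
    randomly chosen count from B, with ties counted as one half.
    A value of 0.5 means no discrimination; 1.0 means A is always larger.
    """
    wins = 0.0
    for a in counts_a:
        for b in counts_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(counts_a) * len(counts_b))
```

Values above 0.5 indicate higher spike counts in the first trial type, and values below 0.5 indicate lower counts, which is why the comparison against 0.5 described above serves as the test of selectivity.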

To compare the number of neurons showing event-related activity among different stages, the number of HR or LL type neurons was calculated after matching the trial number at each stage. Twenty trials were randomly selected for each neuron, and neurons with fewer than 20 trials were removed from the analysis. The neurons with event-related activity were identified as described above. The proportion (percentage) of neurons relative to the total number of neurons was calculated at each individual stage. The same analysis was repeated 30 times, and the distribution of the proportion of neurons with event-related activity at each stage was obtained. The cumulative probability against the proportion was calculated using the cumulative distribution function (cdfplot, MATLAB code from MathWorks).
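The trial-matched resampling can be sketched in Python as below. This is an illustration, not the authors' MATLAB code. The detector `is_event_related` is a hypothetical callback standing in for the rank-sum test described above, each neuron is represented simply as a list of per-trial records, and the proportion is taken relative to the neurons that remain after the 20-trial criterion, which is an assumption. `empirical_cdf` returns the points that a function such as `cdfplot` would draw.

```python
import random


def proportion_distribution(neurons, is_event_related,
                            n_trials=20, n_repeats=30, seed=0):
    """Repeat trial-matched detection and return the percentage of
    event-related neurons obtained in each repeat.

    neurons: list of per-neuron trial lists (hypothetical layout).
    is_event_related: callable taking a list of n_trials trials and
    returning True or False (stand-in for the statistical test).
    """
    rng = random.Random(seed)
    eligible = [trials for trials in neurons if len(trials) >= n_trials]
    if not eligible:
        raise ValueError("no neuron has enough trials")

    proportions = []
    for _ in range(n_repeats):
        hits = sum(
            1 for trials in eligible
            if is_event_related(rng.sample(trials, n_trials))
        )
        proportions.append(100.0 * hits / len(eligible))
    return proportions


def empirical_cdf(values):
    """Return (value, cumulative probability) pairs in ascending order."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (rank + 1) / n) for rank, value in enumerate(ordered)]
```

Sampling the same number of trials for every neuron keeps the statistical power of the detection step comparable across stages, so differences in the resulting proportions are not driven by differences in trial counts.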

Licking responses were detected with an infrared photobeam sensor placed in the 3D-printed reward port, and timestamps for the onset of each spout contact were generated from the analog voltage signal. The licking responses corresponding to the four trial types in the FL-related neurons were shown as averaged rates (50-ms bins) aligned to the timing of the CR. To calculate the correlation coefficient between neuronal activity and licking responses, the spikes and licks during the 5 s after the FL were counted in every trial for each FL-related neuron. Trials in which neurons never responded were removed from both the spike and licking data. The correlation coefficient between the two remaining datasets was calculated for each neuron (corr, MATLAB code from MathWorks), and neurons showing a significant correlation were identified using the Pearson correlation coefficient (p < 0.01).
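The per-neuron correlation between spike counts and lick counts can be sketched in Python as follows. The sketch assumes that "trials in which neurons never responded" means trials with zero spikes in the 5-s window, which is an interpretation rather than something stated in the text. It returns only the Pearson coefficient; the p-value that MATLAB's `corr` also reports is left out. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient, or None if it is undefined."""
    n = len(x)
    if n < 2:
        return None
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # no variance in one of the variables
    return sxy / math.sqrt(sxx * syy)


def spike_lick_correlation(spike_counts, lick_counts):
    """Correlate per-trial spike and lick counts after dropping trials
    with no spikes (assumed meaning of 'never responded')."""
    kept = [(s, l) for s, l in zip(spike_counts, lick_counts) if s > 0]
    spikes = [s for s, _ in kept]
    licks = [l for _, l in kept]
    return pearson_r(spikes, licks)
```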

Histology

After the behavioral experiments, the rats were anesthetized with pentobarbital (50 mg/kg body weight) and isoflurane (3%), and then transcardially perfused with 4% paraformaldehyde in phosphate buffer (PB). The brains were post-fixed overnight at 4 °C and cryoprotected with 10, 20, and 30% sucrose in PB. Brain tissues were cut into 30-µm sections by using a cryostat (Leica). For the lesion experiment, cresyl violet staining or immunostaining was performed to define the lesioned areas in the striatum. The sections were stained with a mouse monoclonal anti-NeuN IgG (1:1,000 dilution; Millipore, Cat# MAB377, RRID:AB_2298772), and then with a goat anti-mouse IgG conjugated with Alexa Fluor 488 (1:400 dilution; Thermo Fisher Scientific, Cat# A-11029, RRID:AB_2534088). The sections were counterstained with 0.3 μM 4’,6-diamidino-2-phenylindole (DAPI; Sigma-Aldrich). For the transient inactivation experiment, sections were stained with cresyl violet to visualize the positions of the guide cannulae. For the electrophysiological experiments, small lesions were made by applying an electrolytic current (3 μA, 10 s) through the top and bottom channels of each shank by using a stimulus isolator (A365, World Precision Instruments). Sections through the striatum were stained with a mouse monoclonal anti-TH IgG (1:1,000 dilution; Millipore, Cat# MAB318, RRID:AB_2201528), and then with a goat anti-mouse IgG conjugated with Alexa Fluor 488 (1:400 dilution; Thermo Fisher Scientific, Cat# A-11029, RRID:AB_2534088). The recording sites were verified by staining with DAPI and red fluorescent Nissl stain solution (1:200 dilution; Thermo Fisher Scientific, Cat# N21482, RRID:AB_2620170). Fluorescence and immunostained signals were visualized under a fluorescence microscope (BZ-X800; Keyence) and a confocal microscope (A1R; Nikon).

Statistical analysis

Statistical analysis was performed using MATLAB (MathWorks) and SPSS Statistics (Ver. 27, IBM). Comparisons of data between two groups were performed using the Student’s t-test (paired), the Wilcoxon signed rank test (paired), and the Wilcoxon rank-sum test (unpaired). Comparisons of data among three groups were performed using a one-way repeated measures ANOVA, a two-way repeated measures ANOVA, or the Kruskal-Wallis test (without assuming normal distribution and homogeneity of variances). All statistical tests were two-sided. The Bonferroni post-hoc test and the Mann–Whitney U-test with Bonferroni adjustment were carried out as needed for pairwise comparisons. For boxplots, data points below the lower quartile (25th percentile) − 1.5 × the interquartile range (IQR) or above the upper quartile (75th percentile) + 1.5 × IQR were regarded as outliers and are not included in the boxplots.
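The boxplot outlier rule can be written out as a short Python sketch. Quartile definitions differ between software packages, so the quartiles computed here with `statistics.quantiles(..., method="inclusive")` are an assumption and may not match MATLAB's percentile convention exactly for small samples. The function name is hypothetical.

```python
import statistics


def split_outliers(values):
    """Separate values lying outside Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.

    Returns (kept, outliers). Quartiles use the 'inclusive' method of
    statistics.quantiles, which is an assumption of this sketch.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers
```

For the values 1 to 8 plus a single value of 100, the quartiles are 3 and 7, so the fences are −3 and 13 and only the value 100 is flagged.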

Acknowledgements

We thank Drs. Keiji Ota, Hikaru Yokoyama, and Sho Aoki for instructing the behavioral analysis and Dr. Hideyuki Matsumoto for technical advice and discussion. We thank the Research Support Platform, Osaka Metropolitan University Graduate School of Medicine for microscopic imaging. This work was supported by grants-in-aid for JSPS Fellows (19J01997) and Scientific Research (C) (21K11556) from the Japan Society for the Promotion of Science (S.S.); a grant-in-aid for Scientific Research on Transformative Research Areas (A) Adaptive Circuit Census (21H05244) from the Ministry of Education, Science, Sports, and Culture of Japan (K.K.); the Grant for Casio Science Promotion Foundation, Takeda Science Foundation; and the Nakatomi Foundation (S.S.).

Author contributions

S.S., Y.C., and K.K. designed the study. S.S. performed the small-animal neuroimaging, multi-unit recordings, tracing studies, and pharmacological experiments with technical support from all authors; T.O., D.H., Y.W., H.O., K.H., and Y.C. assisted with the neuroimaging studies. K.N. performed the tracing study, and N.S. assisted with the immunostaining. H.M., T.K., and K.M. assisted with the multi-unit extracellular recordings. S.S. and K.K. wrote the manuscript with assistance from all coauthors.

Declaration of interest

The authors declare no competing interests.

Data availability

All source data underlying the graphs and charts are available in Supplementary Data. All other data are available upon request. The datasets generated and analyzed in this study are deposited in Mendeley Data (DOI: 10.17632/ghzr838bdk.1).