
Coupling between motor cortex and striatum increases during sleep over long-term skill learning

  1. Stefan M Lemke
  2. Dhakshin S Ramanathan
  3. David Darevksy
  4. Daniel Egert
  5. Joshua D Berke
  6. Karunesh Ganguly (corresponding author)
  1. Neuroscience Graduate Program, University of California, San Francisco, United States
  2. Neurology Service, San Francisco Veterans Affairs Medical Center, United States
  3. Department of Neurology, University of California, San Francisco, United States
  4. Department of Psychiatry, University of California, San Diego, United States
  5. Weill Institute for Neurosciences and Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, United States
Research Article
Cite this article as: eLife 2021;10:e64303 doi: 10.7554/eLife.64303

Abstract

The strength of cortical connectivity to the striatum influences the balance between behavioral variability and stability. Learning to consistently produce a skilled action requires plasticity in corticostriatal connectivity associated with repeated training of the action. However, it remains unknown whether such corticostriatal plasticity occurs during training itself or ‘offline’ during time away from training, such as sleep. Here, we monitor the corticostriatal network throughout long-term skill learning in rats and find that non-rapid-eye-movement (NREM) sleep is a relevant period for corticostriatal plasticity. We first show that the offline activation of striatal NMDA receptors is required for skill learning. We then show that corticostriatal functional connectivity increases offline, coupled to emerging consistent skilled movements, and coupled cross-area neural dynamics. We then identify NREM sleep spindles as uniquely poised to mediate corticostriatal plasticity, through interactions with slow oscillations. Our results provide evidence that sleep shapes cross-area coupling required for skill learning.

Introduction

Cortical and basal ganglia circuits regulate behavioral variability, as evidenced in habit development (Gremel et al., 2016; O'Hare et al., 2016; Rueda-Orozco and Robbe, 2015; Malvaez and Wassum, 2018; Lipton et al., 2019; Yin and Knowlton, 2006), skill learning (Santos et al., 2015; Kupferschmidt et al., 2017; Koralek et al., 2012; Yin et al., 2009), as well as the pathophysiology of neuropsychiatric disorders such as obsessive-compulsive disorder and autism spectrum disorder (Vicente et al., 2020; Shepherd, 2013). In the case of skill learning, the ability to consistently produce a skilled action is accompanied by emerging coordinated neural activity across the motor cortex and striatum during action execution (Santos et al., 2015; Lemke et al., 2019; Koralek et al., 2013). Skill learning has also been associated with striatal NMDA receptor activation (Santos et al., 2015; Koralek et al., 2012; Jin and Costa, 2010; Dang et al., 2006), suggesting that the activity-dependent potentiation of cortical inputs to the striatum may be required (Calabresi et al., 1992; Charpier and Deniau, 1997). However, little is known about the specific activity patterns that may drive corticostriatal plasticity, or when they occur, during skill learning.

One intriguing possibility is that neural activity patterns during ‘offline’ periods, or time away from training such as sleep, play a central role in driving corticostriatal plasticity during skill learning. This possibility is motivated by evidence that ‘reactivations’ of training-related neural activity patterns during sleep promote motor skill learning (Yang et al., 2014; Ramanathan et al., 2015; Gulati et al., 2014; Kim et al., 2019). Moreover, it has been proposed that both cortical and subcortical brain areas are engaged during the sleep-dependent consolidation of motor skills (Vahdat et al., 2017; Boutin et al., 2018; Doyon et al., 2018; Doyon and Benali, 2005). However, the specific activity patterns that may impact cross-area connectivity during sleep, and how such sleep-dependent plasticity may impact network activity during subsequent awake behavior, remain unknown.

Here, we monitor the corticostriatal network throughout reach-to-grasp skill learning in rats. We establish that the coupling between motor cortex and striatum increases offline during skill learning, and identify sleep spindles during non-rapid-eye-movement sleep (NREM) as uniquely poised to mediate such plasticity, through interactions with slow oscillations (SOs). We first show that blocking striatal NMDA receptor activation during offline periods following training disrupts the emergence of a consistent skilled action. We then show that corticostriatal functional connectivity increases offline, rather than during training itself, and that such offline plasticity tracks increased movement consistency and emerging coupled cross-area neural dynamics during action execution. We then demonstrate that sleep spindles in NREM uniquely facilitate corticostriatal network transmission and link the modulation of M1 and DLS neurons during sleep spindles following training to the preservation of corticostriatal functional connectivity. Finally, we provide evidence that the temporal proximity between sleep spindles and SOs influences the impact that sleep spindles have on the corticostriatal network. These results provide evidence that NREM rhythms play a role in strengthening cross-area connectivity in the corticostriatal network during skill learning.

Results

We implanted six adult rats with either microwire electrode arrays (n=5) or custom-built high-density silicon probes (Egert et al., 2020) (n=1) in the primary motor cortex (M1) and the dorsolateral striatum (DLS), which receives the majority of M1 projections to the striatum (Figure 1a; Figure 1—figure supplement 1; Aoki et al., 2019). Neural activity in both regions was simultaneously monitored as rats underwent long-term reach-to-grasp skill training (range: 5–14 days). On each day, rats were placed in a custom-built behavioral box (Wong et al., 2015) and neural activity was recorded during a 2–3 hr pre-training period (consisting of both sleep and wake periods), a 100–150 trial training period, and a second 2–3 hr post-training period (Figure 1b; pre-training period length: 154.1±6.1 min, post-training period length: 159.4±5.4 min, mean± SEM, n=56 days). Rats learned a reach-to-grasp task which involved reaching through a small window in the behavioral box to grasp and retrieve a food pellet. During pre- and post-training periods, behavioral states—wake and sleep (NREM and REM) —were classified using standard methods based on cortical local field potential (LFP) power and movement measured from video or electromyography (EMG) activity (Watson et al., 2016).

Figure 1
Blocking offline striatal NMDA receptor activation disrupts skill learning.

(a) Schematic of recording locations in primary motor cortex (M1) and dorsolateral striatum (DLS) and anterograde tracing of M1 projections showing direct input to the DLS. (b) Schematic of each day’s recording periods throughout long-term training. (c) Spatial reaching trajectories (individual trials in gray overlaid with mean trajectory in color) and mean reaching velocity profiles on each day of training in example animal. (d) Correlation between each day’s mean velocity profile and the final day’s mean velocity profile (average of x and y dimensions) for each day of training in example animal (top) and across animals (bottom; individual animals as black lines; last day of training, which served as template, is excluded in each animal). (e) Correlation between each day’s mean velocity profile and the final day’s mean velocity profile for example animal with post-training DLS infusions of AP5 or saline (top) and across animals (bottom; individual animals as black lines; last day of training, which served as template, is excluded in each animal). (f) Comparison of total change in velocity profile correlation across days with either post-training saline infusions, post-training AP5 infusions, or no infusions as in learning cohort animals (individual animals as gray dots and mean± SEM across animals in color; last day of training, which served as template, is excluded from calculations).

Blocking offline striatal NMDA receptor activation disrupts skill learning

With repeated days of training on the reach-to-grasp task, animals developed both a consistent spatial reaching trajectory and temporal velocity profile (Figure 1c). To measure learning, we quantified day-to-day changes in the velocity profile, as this captured the combination of individual movements (i.e., reach toward pellet, time spent interacting with pellet, and retraction with pellet) into a consistent skilled action and was less constrained by the task than the spatial reaching trajectory. Comparing the trial-averaged velocity profile on each day of training to the trial-averaged velocity profile on the last day of training, which served as a learned ‘template,’ revealed that a consistent day-to-day velocity profile emerged within the first 8 days of training (Figure 1d; Figure 1—figure supplement 2). Single-trial peak reaching velocity also generally increased across training days and was correlated with the consistency of the velocity profile (Figure 1—figure supplement 3; r=0.55, P=5×10⁻⁵, Pearson’s r), consistent with previous work suggesting that movement speed is a relevant aspect of skill learning (Lemke et al., 2019; Hikosaka et al., 2013).
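The template comparison described above can be sketched in a few lines. This is a minimal illustration on made-up numbers, not the authors' analysis code: it uses a single velocity dimension (the text averages x and y), and the function names `pearson_r`, `mean_profile`, and `template_correlations` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mean_profile(trials):
    """Average single-trial velocity traces sample by sample."""
    return [sum(samples) / len(samples) for samples in zip(*trials)]

def template_correlations(days):
    """Correlate each day's mean profile with the last day's mean profile.

    `days` holds one list of trials per training day. The last day serves
    as the template and is excluded from the output, as in the text.
    """
    profiles = [mean_profile(trials) for trials in days]
    template = profiles[-1]
    return [pearson_r(p, template) for p in profiles[:-1]]

# Toy data (assumed): 3 days, 2 trials per day, 5 time samples per trial.
days = [
    [[0, 1, 0, 1, 0], [2, 0, 1, 0, 1]],  # day 1: inconsistent reaches
    [[0, 1, 3, 1, 0], [0, 2, 3, 2, 1]],  # day 2: closer to the final shape
    [[0, 2, 4, 2, 0], [0, 2, 4, 2, 0]],  # day 3: template day
]
corrs = template_correlations(days)
```

In this toy case the day 2 value is much closer to 1 than the day 1 value, which is the pattern the text describes as a consistent velocity profile emerging with training.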

We next tested whether disrupting offline striatal activity and plasticity impacted learning. We trained a new cohort of animals (n=6 rats) for 10 days, infusing either 1 µl of NMDA receptor antagonist AP5 (5 µg/µl) or saline into DLS immediately after training on each day (Figure 1e). This revealed that offline striatal NMDA receptor activation was important for skill learning, as day-to-day changes in velocity profile correlation were significantly decreased with AP5 infusions, compared to saline infusions or changes observed in the learning cohort (Figure 1f; Figure 1—figure supplement 4; n=6 rats with AP5 infusions, −0.11±0.06 total correlation value change, mean ± SEM, n=6 rats with saline infusions, 0.16±0.06 total correlation value change, n=6 rats in learning cohort, 0.25±0.08 total correlation value change; AP5 infusions vs. saline infusions: P=8×10⁻³, Wilcoxon rank-sum test, AP5 infusions vs. learning cohort: P=2×10⁻³, Wilcoxon rank-sum test, saline infusions vs. learning cohort: P=0.48, Wilcoxon rank-sum test; for all animals, the last day of training, which served as template, was excluded from total correlation value change calculation). Day-to-day changes in single-trial peak reaching velocity for animals receiving either AP5 or saline infusions followed a similar trend (Figure 1—figure supplement 5).

Corticostriatal functional connectivity increases offline during skill learning

Given the importance of offline striatal NMDA receptor activation for skill learning, we next examined whether changes in corticostriatal functional connectivity occurred during training itself or offline, between daily training sessions. To measure functional connectivity we calculated 4–8 Hz LFP coherence across each M1 and DLS electrode during each pre- and post-training period throughout learning, as LFP signals in the theta frequency band have been previously shown to reflect corticostriatal spiking activity (Lemke et al., 2019; Koralek et al., 2013; Thorn and Graybiel, 2014). We calculated coherence specifically during NREM to establish a consistent measure of functional connectivity across days (Figure 2a). While LFP signals are generally more stable across days than single-unit spiking activity (Flint et al., 2016), there are significant challenges in interpreting LFP signals from non-laminar structures such as the striatum (Tanaka and Nakamura, 2019; Buzsáki et al., 2012), including the influence of non-local signals volume conducted from cortex (Lalla et al., 2017). To address these issues, we first locally referenced signals, in M1 and DLS separately, to decrease common noise and minimize volume conduction within each region (Lemke et al., 2019). This resulted in a phase difference between M1 and DLS 4–8 Hz LFP signals during NREM that was inconsistent with volume conduction (Figure 2—figure supplement 1). We next confirmed that 4–8 Hz LFP coherence between M1 and DLS electrodes was correlated to a separate measure of functional connectivity, calculated independently of DLS LFP: the phase locking of DLS units to M1 LFP. We calculated the entrainment of DLS units to 4–8 Hz M1 LFP signals for each M1 electrode (Figure 2b) and compared it to the mean 4–8 Hz LFP coherence between that M1 electrode and all simultaneously recorded DLS LFP signals (Figure 2c and d). 
We found that M1 electrodes with high LFP coherence with DLS electrodes also entrained DLS units to a greater degree than M1 electrodes with low LFP coherence with DLS electrodes (Figure 2e and f). Finally, we sought to test the relevance of 4–8 Hz LFP coherence specifically, versus other frequency bands. We found a significant relationship between the emergence of a consistent skilled action and mean LFP coherence measured during the pre-training period for frequencies between ~5 and 11 Hz (Figure 2—figure supplement 2), suggesting that offline LFP coherence in and near the theta frequency range reflects network changes relevant to learning.
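As an illustration of the phase-locking measure referred to above, the sketch below computes a circular standard deviation (cSD) from the LFP phases at which spikes occur. It assumes the standard definition cSD = sqrt(−2 ln R), where R is the mean resultant length; whether the authors used exactly this estimator is an assumption, and the phase values are invented.

```python
import cmath
import math

def mean_resultant_length(phases):
    """Length of the average unit vector for a list of phases in radians."""
    vec = sum(cmath.exp(1j * p) for p in phases) / len(phases)
    return abs(vec)

def circular_sd(phases):
    """Circular standard deviation, sqrt(-2 ln R); lower means tighter locking."""
    r = min(mean_resultant_length(phases), 1.0)  # guard against rounding above 1
    if r == 0.0:
        return math.inf
    return math.sqrt(-2.0 * math.log(r))

# Assumed toy data: 4-8 Hz M1 LFP phase (radians) at each DLS spike time.
locked_unit = [1.4, 1.5, 1.6, 1.7, 1.5]    # spikes cluster near one phase
unlocked_unit = [0.0, 1.3, 2.5, 3.9, 5.0]  # spikes spread around the cycle

csd_locked = circular_sd(locked_unit)
csd_unlocked = circular_sd(unlocked_unit)
```

A unit whose spikes cluster at one phase of the M1 rhythm yields a small cSD, matching the caption's note that lower cSD is equivalent to greater phase locking.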

Figure 2
NREM M1-DLS 4–8 Hz LFP coherence reflects M1 LFP-DLS spike phase locking.

(a) Example snippet of M1 LFP during NREM. (b) Example computation of M1 LFP-DLS spike phase locking. Lower circular standard deviation (cSD) is equivalent to greater phase locking. (c) Phase difference between M1 and DLS 4–8 Hz LFP signals for example electrode pair with high coherence. (d) Relationship between mean M1 LFP-DLS spike phase locking and 4–8 Hz M1-DLS LFP coherence for all M1 electrodes in example animal on example day. (e) Scatterplot between mean M1 LFP-DLS spike phase locking and 4–8 Hz M1-DLS LFP coherence for M1 electrodes across all days in example animal. (f) Same as (e) for M1 electrodes across all animals. M1, primary motor cortex; DLS, dorsolateral striatum; LFP, local field potential; NREM, non-rapid-eye-movement sleep.

Having established LFP coherence as a measure of corticostriatal functional connectivity, we next examined whether coherence increased during training periods (online) or offline, between training periods (Figure 3a). Across animals, 35% of M1 and DLS electrode pairs increased in 4–8 Hz LFP coherence from the first to last day of training (36% did not change, 29% decreased; increase or decrease defined as a change in coherence of at least 0.025 from first to last day of training). Across the population of electrode pairs with learning-related increases, 4–8 Hz LFP coherence increased predominantly offline, that is, between each day’s post-training period and the next day’s pre-training period, rather than online during training, that is, between pre- and post-training periods on the same day (Figure 3a–c; Figure 3—figure supplement 1). Across animals, mean changes in 4–8 Hz LFP coherence occurring online were not significantly different from zero, while changes occurring offline were skewed positive (Figure 3d; online LFP coherence changes: t(1413)=−0.34, P=0.73, offline LFP coherence changes: t(1413)=23.4, P=7×10⁻¹⁰³, one-sample t-test). Notably, there was a close relationship between 4–8 Hz LFP coherence measured during the pre-training period on each day and consistency in skilled action execution during the subsequent training period (Figure 3e; r=0.73, P=5×10⁻¹⁰, Pearson’s r), indicating that offline increases in corticostriatal functional connectivity closely tracked skill learning. This relationship remained significant when taking into account single-trial peak reaching velocity (r=0.62; P=1×10⁻⁴, Pearson partial correlation coefficient).
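The online and offline comparisons defined above reduce to simple differences between consecutive recording periods. The sketch below shows this bookkeeping for one electrode pair, using invented coherence values; the function names are hypothetical, and which periods define the "first" and "last" day values for the 0.025 threshold is an assumption here.

```python
def online_offline_changes(pre, post):
    """Split coherence changes into online and offline components.

    pre[d] and post[d] are the NREM coherence values from the pre- and
    post-training periods on day d. Online change is post minus pre on the
    same day; offline change is the next day's pre minus this day's post.
    """
    online = [post[d] - pre[d] for d in range(len(pre))]
    offline = [pre[d + 1] - post[d] for d in range(len(pre) - 1)]
    return online, offline

def classify_pair(first_day, last_day, threshold=0.025):
    """Label a pair by its first-to-last-day coherence change (threshold from the text)."""
    change = last_day - first_day
    if change >= threshold:
        return "increase"
    if change <= -threshold:
        return "decrease"
    return "no change"

# Assumed toy values for one M1-DLS electrode pair across 3 training days.
pre = [0.30, 0.36, 0.41]
post = [0.31, 0.35, 0.42]

online, offline = online_offline_changes(pre, post)
mean_online = sum(online) / len(online)
mean_offline = sum(offline) / len(offline)
label = classify_pair(pre[0], pre[-1])
```

In this toy pair the same-day changes average close to zero while the overnight changes are positive, which is the qualitative pattern reported for pairs with learning-related increases.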

Figure 3
Corticostriatal functional connectivity increases offline during skill learning.

(a) Depiction of M1 and DLS electrode pairs with high 4–8 Hz LFP coherence (>0.6 coherence value measured in NREM) during pre- and post-training periods on one day of training, and the pre-training period on the subsequent day of training, in example animal. (b) LFP coherence spectrums across an example M1 and DLS electrode pair measured during NREM in the pre- and post-training periods on one day of training, and the pre-training period on the subsequent day of training. (c) 4–8 Hz LFP coherence measured during NREM on each pre- and post-training period throughout learning for example M1 and DLS electrode pair, overlaid with reach velocity profile correlation values on each day of training. (d) Comparison of the mean online (left) and offline (right) change in LFP coherence (4–8 Hz, measured in NREM) across training days for all M1 and DLS electrode pairs that increased in coherence from the first to last day of training, mean in each animal (top), and histogram of all electrode pairs across animals (bottom). (e) Scatterplot between each day’s 4–8 Hz LFP coherence measured during the pre-training period and reach velocity profile correlation value for the subsequent training period. Both values are normalized within each animal by z-scoring the values across days. M1, primary motor cortex; DLS, dorsolateral striatum; LFP, local field potential; NREM, non-rapid-eye-movement sleep.

Offline increases in corticostriatal functional connectivity predict emergence of cross-area neural dynamics during subsequent skill execution

We next examined whether offline increases in corticostriatal functional connectivity impacted corticostriatal network activity during subsequent training periods. To measure cross-area neural dynamics, we extracted low-dimensional neural trajectory representations of DLS spiking activity during the reaching action using principal component analysis (PCA) and examined the evolution of how well M1 spiking activity could predict DLS neural trajectories over the course of learning (Figure 4a). To determine this predictive ability, on each day of training we fit a linear regression model to predict DLS neural trajectories (top three PCs, with a separate model fit for each component) from spiking activity in M1 and then measured the correlation between the predicted and real DLS neural trajectories. We found that the ability to predict DLS neural trajectories during execution of the reaching action increased with training, while the ability to predict the trajectory representations of DLS activity during a baseline, non-reaching, period did not significantly change (Figure 4b; action execution: 0.18±0.05 Pearson’s r on first 3 training days (average of three correlation values corresponding to top three PCs) and 0.40±0.04 Pearson’s r on last 3 training days, P=7×10⁻³, Wilcoxon rank-sum test, n=11 early and late days, only days with more than 6 M1 and DLS units recorded were considered; baseline period: 0.03±0.02 Pearson’s r on first 3 training days and 0.02±0.02 Pearson’s r on last 3 training days, P=0.28, Wilcoxon rank-sum test, n=11 early and late days, only days with more than 6 M1 and DLS units recorded were considered).

Figure 4
Offline increases in corticostriatal functional connectivity predict emergence of cross-area neural dynamics during subsequent skill execution.

(a) Trial-averaged neural trajectory (PC1 and PC2) of DLS activity during reaching (1 s before to 1 s after pellet touch) on day 1 (left) and day 8 (right) of training in example animal, overlaid with prediction of DLS neural trajectory from M1 spiking activity. (b) Ability to predict DLS neural trajectory (PC1–3) during reaching and during a baseline, non-reaching, period from M1 spiking activity on each day of training (gray dots represent days for individual animals, mean± SEM across animals in color). (c) Correlation between each day’s mean 4–8 Hz NREM LFP coherence during the pre-training period and ability to predict DLS neural trajectory (PC1–3) during reaching from M1 spiking activity during the subsequent training period. Both values are normalized within each animal by z-scoring the values across days. M1, primary motor cortex; DLS, dorsolateral striatum; LFP, local field potential; NREM, non-rapid-eye-movement sleep; PC, principal component.

It is possible that the increase in ability to predict DLS neural trajectories from M1 activity is influenced by local learning-related changes in DLS spiking activity during action execution or a change in variance explained by the top PCs of DLS activity. However, we found no significant difference in the trial-averaged spiking modulation of DLS units during action execution between early (first 3 days of training) and late (last 3 days of training) training days (Figure 4—figure supplement 1; 1.6±0.05 modulation value on early days and 1.7±0.09 modulation value on late days, mean ± SEM, t(409)=−1.7, P=0.09, two-sample t-test, n=233 DLS units on early days and 178 DLS units on late days), as well as no significant difference in the variance explained by the first three PCs computed from DLS activity during action execution between early and late training days (58.0±2.7% variance explained on early days and 52.2±3.0% on late days, mean ± SEM, t(27)=1.4, P=0.17, two-sample t-test, n=17 early days and 12 late days). A similar number of DLS units were also recorded on early and late training days (13.6±1.5 DLS units per day per animal on early days and 12.4±1.1 DLS units on late days, mean ± SEM, t(27)=0.6, P=0.56, two-sample t-test, n=17 early days and 12 late days). Altogether, these results provide evidence that the increased ability to predict DLS neural trajectories during skill execution from M1 spiking activity reflects cross-area dynamics emerging with learning, rather than local learning-related changes in DLS. Consistent with the idea that offline plasticity in the corticostriatal network is relevant to skill learning, we found that 4–8 Hz LFP coherence measured during the pre-training period on each day was significantly correlated with the ability to predict DLS neural trajectories during the subsequent training period from M1 spiking activity (Figure 4c; reaching period: r=0.58, P=9×10⁻⁴, Pearson’s r, baseline period: r=0.03, P=0.87).

Sleep spindles in NREM facilitate corticostriatal transmission

We next sought to identify neural activity patterns relevant for corticostriatal plasticity during offline periods. To do this, we first examined how corticostriatal transmission strength, that is, the degree to which M1 neural activity drives DLS activity, differed across behavioral states, as increased transmission rate may enable activity-dependent plasticity (Charpier and Deniau, 1997; Figure 5a). To measure transmission strength, we identified coupled pairs of M1 and DLS neurons based on consistent short-latency spike-timing relationships. We utilized a ‘basic spike jitter’ method to identify coupled pairs of M1 and DLS units with significant spike-timing relationships at timescales consistent with the conduction and synaptic delays between M1 and DLS (~6 ms time lag from M1 to DLS activity; Koralek et al., 2013). The spike jitter method allows for the differentiation of such short-latency spike-timing relationships from spike-timing relationships at longer time scales (>50 ms), more likely to reflect common input or slow population spiking fluctuations (Fujisawa et al., 2008; Amarasingham et al., 2012; Hatsopoulos et al., 2003). Across the population of recorded M1 (n=1100 units) and DLS neurons (n=579 units, 71% classified as medium spiny neurons [MSNs] based on spike width; Figure 5—figure supplement 1), we identified ~2.6% of pairs with a significant short-latency spiking relationship (311/12,169 pairs; Figure 5—figure supplement 2). It is important to note that in addition to this small percentage of neuron pairs with significant short-latency spiking relationships, we observed relationships between M1 and DLS neural activity at timescales greater than 50 ms, as can be seen in the jittered cross-correlations of M1 and DLS spiking (Figure 5—figure supplement 2). 
This is consistent with recent work that carefully dissected the relationship between cortical and striatal activity and observed broad cross-correlation histograms with peaks at a short-latency delay (~3 ms) between cortical and striatal spiking activity (Peters et al., 2021). The relatively broad cross-correlation histogram peaks observed between M1 and DLS neurons, compared to those typically seen between cortical neurons (Fujisawa et al., 2008), may be because striatal MSNs receive weak input from many cortical neurons, rather than strong input from individual cortical neurons (Dudman and Gerfen, 2015). Therefore, convergent activation from several cortical neurons is likely required to drive MSN spiking activity, resulting in temporal jitter that decreases the consistency of any specific M1 and DLS neuron spiking relationship.

Figure 5
Sleep spindles in NREM facilitate corticostriatal transmission.

(a) M1 local field potential (LFP) spectrogram and behavioral state detection from example session. (b) Schematic depicting basic spike jitter method for detecting coupled M1 and DLS neurons (top) and normalization by subtracting mean jittered cross-correlation from real cross-correlation (bottom). (c) Comparison of normalized cross-correlations of spiking activity during NREM and wake from example coupled pair of M1 and DLS units (left) and histogram of differences in short-latency cross-correlation magnitude (1–15 ms) between NREM and wake for all pairs of coupled M1 and DLS neurons (right). (d) Snippet of LFP and single-unit spiking activity from M1 and DLS during NREM overlaid with detected NREM rhythms in M1. (e) Mean LFP and spiking activity during slow oscillations, delta waves, and sleep spindles in both M1 and DLS in example animal (top) and percentage of M1 and DLS units across animals significantly phase locked to M1 LFP during each NREM rhythm (bottom; significance threshold of P=0.05, Rayleigh test of uniformity). (f) Comparison of normalized cross-correlations of spiking activity during NREM rhythms from example coupled pair of M1 and DLS units (left) and differences in short-latency cross-correlation magnitude (1–15 ms) between NREM rhythms for all pairs of coupled M1 and DLS neurons (right). M1, primary motor cortex; DLS, dorsolateral striatum; NREM, non-rapid-eye-movement sleep.

Having characterized a population of coupled M1 and DLS neurons with consistent short-latency spike-timing relationships, we next compared corticostriatal transmission strength across sleep and wake states by measuring the magnitude of the short-latency cross-correlation (1–15 ms time lag) within the coupled population. To account for differences in firing rate across wake and sleep states (Figure 5—figure supplement 3), we normalized each cross-correlation by subtracting the mean spike jittered cross-correlations before comparison (Figure 5b). We found that corticostriatal transmission was higher during NREM, compared to wake, in 77% of coupled M1 and DLS neuron pairs (Figure 5c), suggesting activity patterns in NREM may be particularly relevant for offline corticostriatal plasticity given the increased transmission of activity from M1 to DLS.
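The jitter-based normalization described above can be illustrated with a small simulation. The sketch below is not the authors' implementation: the ±25 ms jitter window, the number of jittered surrogates, the choice to jitter the DLS train, the use of the peak as the "magnitude", and the synthetic spike trains are all assumptions made for illustration. Times are in milliseconds.

```python
import random

def cross_correlogram(ref, target, max_lag=50, bin_width=1):
    """Count target-minus-reference spike lags in bins spanning +/- max_lag (ms)."""
    n_bins = int(2 * max_lag / bin_width)
    counts = [0] * n_bins
    for t_ref in ref:
        for t_tar in target:
            lag = t_tar - t_ref
            if -max_lag <= lag < max_lag:
                idx = min(int((lag + max_lag) // bin_width), n_bins - 1)
                counts[idx] += 1
    return counts

def jitter_normalized_ccg(ref, target, jitter=25, n_jitter=20, seed=0):
    """Real cross-correlogram minus the mean of jittered surrogates.

    Each surrogate shifts every target spike independently by a uniform
    amount in [-jitter, +jitter] ms, which removes short-latency structure
    while keeping slower co-fluctuations.
    """
    rng = random.Random(seed)
    real = cross_correlogram(ref, target)
    total = [0.0] * len(real)
    for _ in range(n_jitter):
        jittered = [t + rng.uniform(-jitter, jitter) for t in target]
        for k, c in enumerate(cross_correlogram(ref, jittered)):
            total[k] += c
    return [r - s / n_jitter for r, s in zip(real, total)]

def short_latency_peak(ccg, lo=1, hi=15, max_lag=50, bin_width=1):
    """Peak of the normalized correlogram over lags in [lo, hi) ms."""
    start = int((lo + max_lag) // bin_width)
    stop = int((hi + max_lag) // bin_width)
    return max(ccg[start:stop])

# Synthetic pair (assumed): half of M1 spikes drive a DLS spike ~6 ms later.
rng = random.Random(1)
m1 = sorted(rng.uniform(0, 60000) for _ in range(200))
driven = [t + 6 + rng.uniform(-1, 1) for t in m1 if rng.random() < 0.5]
background = [rng.uniform(0, 60000) for _ in range(200)]
dls = sorted(driven + background)

ccg = jitter_normalized_ccg(m1, dls)
peak = short_latency_peak(ccg)
```

Because the surrogate mean is subtracted, firing-rate differences between states shift both the real and jittered correlograms together, which is why the normalized short-latency value can be compared across wake and NREM.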

Given the heterogeneous nature of NREM activity, we next sought to examine whether corticostriatal transmission strength was boosted during specific patterns of activity in NREM (Figure 5d). We detected NREM rhythms in M1 that have been previously related to activity-dependent plasticity in cortex, including sleep spindles, SOs, and delta waves (Ramanathan et al., 2015; Kim et al., 2019; Huber et al., 2004; Durkin et al., 2017; Figure 5—figure supplement 4) and examined whether activity in DLS was also modulated during these rhythms. We found that both LFP signals and spiking in DLS were significantly modulated during SOs, delta waves, and sleep spindles detected in M1 (Figure 5e; Figure 5—figure supplement 5). We next compared corticostriatal transmission strength between NREM rhythms by measuring the magnitude of the short-latency cross-correlation (as above, 1–15 ms time lag, within the coupled population of M1 and DLS neurons, and normalized by subtracting the mean spike jittered cross-correlation). This revealed that corticostriatal transmission strength was greatest during sleep spindles, compared to SOs or delta waves (Figure 5f), suggesting that sleep spindles during NREM may be particularly relevant periods for activity-dependent plasticity within the corticostriatal network.

Short-latency spike-timing relationships are uniquely preserved within the post-training period for sleep spindle modulated M1 and DLS neuron pairs

We further investigated the role of sleep spindles in offline corticostriatal plasticity by examining whether sleep spindle modulation impacted changes in corticostriatal transmission within pre- and post-training periods. To do this, we divided each pre- and post-training period into halves and measured the difference in short-latency cross-correlation magnitude (as above, 1–15 ms time lag) from the first to the second half of each period (Figure 6a). Short-latency cross-correlation values were calculated specifically within the previously identified population of coupled M1 and DLS neurons, using spiking activity during NREM to control for any differences in time spent in each behavioral state (Figure 6—figure supplement 1). A consistent increase or decrease in short-latency cross-correlation magnitude between halves would indicate that corticostriatal functional connectivity is modified during the pre- or post-training offline periods. To examine the role that sleep spindles may play in such offline plasticity, we also classified each M1 and DLS neuron pair based on whether both neurons were modulated by sleep spindles. This revealed that corticostriatal functional connectivity, measured by short-latency cross-correlation magnitude, was specifically preserved in spindle-modulated neuron pairs during the post-training period, in contrast to non-spindle modulated pairs during the post-training period or all pairs during the pre-training period (Figure 6b and c). This change could not be attributed to differences in sleep depth, as sleep depth measured by low-frequency cortical LFP power did not show a similar trend (Figure 6—figure supplement 2). Altogether, this suggested that sleep spindles following training may be involved in preserving learning-related cross-area connectivity in the corticostriatal network. 
Strikingly, post-training period changes in short-latency cross-correlation magnitude averaged across coupled M1 and DLS neuron pairs were significantly correlated to subsequent overnight changes in 4–8 Hz LFP coherence averaged across M1 and DLS electrodes (Figure 6d), indicating that sleep-spindle related preservation of corticostriatal functional connectivity within the first few hours after training may be related to overnight plasticity in the corticostriatal network.

Figure 6
Short-latency spike-timing relationships are uniquely preserved within the post-training period for sleep spindle modulated M1 and DLS neuron pairs.

(a) Schematic of changes in short-latency spike-timing relationships measured by cross-correlations of spiking activity during NREM from the first and second half of pre- (left) and post-training (right) periods for example M1 and DLS neuron pair. (b) Cross-correlations of spiking activity during NREM for coupled pairs of M1 and DLS neurons that are spindle-modulated (top) or non-spindle modulated (bottom) during the first and second half of pre- (left) and post-training (right) periods (width of line represents mean ± SEM). (c) Comparison of change in short-latency cross-correlation peak (1–15 ms time lag) between spindle-modulated and non-spindle modulated M1 and DLS pairs during the pre- and post-training periods (mean ± SEM). (d) Scatterplot of mean change in short-latency cross-correlation magnitude (post-training change normalized by pre-training change) across coupled M1 and DLS neuron pairs and mean overnight change in 4–8 Hz LFP coherence across M1 and DLS electrodes. M1, primary motor cortex; DLS, dorsolateral striatum; NREM, non-rapid-eye-movement sleep.

The impact of sleep spindles on corticostriatal connectivity is influenced by temporal proximity to preceding slow oscillations

Finally, we sought to understand why corticostriatal functional connectivity was preserved across spindle-modulated pairs during the post-training period, but not spindle-modulated pairs during the pre-training period. To do this, we examined the interaction between sleep spindles and SOs, a relationship known to be relevant for sleep-dependent processing (Kim et al., 2019; Niethard et al., 2018; Silversmith et al., 2020; Rasch and Born, 2013). We found a large shift in the temporal proximity to preceding SOs from the pre- to post-training period, with a larger proportion of sleep spindles in the post-training period ‘nested’ near SOs (Figure 7a; P=6×10^−39, two-sample Kolmogorov–Smirnov test). We then examined whether SO nesting of sleep spindles influenced the role of sleep spindles in preserving corticostriatal functional connectivity. Within coupled M1 and DLS neuron pairs, we calculated the short-latency cross-correlation magnitude (as above, 1–15 ms time lag) within 30 s bins before and after every sleep spindle. We found that corticostriatal transmission strength, measured by short-latency cross-correlation magnitude, was significantly elevated after slow oscillation-nested sleep spindles (SO-nested spindles; sleep spindles within 1 s of a SO) compared to non-SO-nested sleep spindles (sleep spindles occurring at least 5 s after a SO; Figure 7b). There were no clear differences in sleep spindle frequency or corticostriatal firing rates between SO-nested and non-SO-nested spindles that would account for this difference (Figure 7—figure supplement 1). This suggested that increased nesting of SOs and sleep spindles may account for the unique preservation of corticostriatal functional connectivity within spindle-modulated M1 and DLS neuron pairs during the post-training period.

Figure 7
The impact of sleep spindles on corticostriatal connectivity is influenced by temporal proximity to preceding slow oscillations (SOs).

(a) Distributions of the temporal proximity to preceding SOs for all sleep spindles during NREM in pre- and post-training periods, across days and animals. (b) Short-latency cross-correlation magnitude (1–15 ms time lag) across coupled M1 and DLS neuron pairs calculated from spiking occurring in 30 s bins around SO-nested spindles (sleep spindles occurring within 1 s after a SO zero-crossing) and non-SO-nested spindles (sleep spindles occurring 5 s or more after a SO zero-crossing; top), and the difference in short-latency cross-correlation magnitude between SO-nested and non-SO-nested spindles (bottom). M1, primary motor cortex; DLS, dorsolateral striatum; NREM, non-rapid-eye-movement sleep.

Discussion

Plasticity in cortical connectivity to the striatum can influence the balance between behavioral variability and stability (Malvaez and Wassum, 2018; Lipton et al., 2019; Yin and Knowlton, 2006; Vicente et al., 2020; Gremel and Costa, 2013). Here, in the context of skill learning, we provide evidence that sleep is a relevant period for such corticostriatal plasticity. We show that functional connectivity between motor cortex and striatum, measured by both LFP coherence and spike-timing relationships, evolves during offline periods away from training, rather than during training itself, and that blocking the activation of striatal NMDA receptors during these offline periods disrupts skill learning. We then identify NREM sleep spindles as uniquely poised to mediate such plasticity, through their interaction with SOs.

NREM sleep rhythms and plasticity

Our results add to a growing body of work linking NREM rhythms to sleep-dependent plasticity (Ramanathan et al., 2015; Kim et al., 2019; Huber et al., 2004; Durkin et al., 2017; Barakat et al., 2013). We find that during sleep immediately following training (within ~1–3 hr), neuron pairs across M1 and DLS that are modulated during sleep spindles uniquely preserve their short-latency spike-timing relationships, interpreted as a maintenance of functional connectivity between cortex and striatum. We found that this maintenance was influenced by the temporal proximity between sleep spindles and preceding SOs, and was correlated to overnight changes in LFP coherence, suggesting that NREM rhythms during the first few hours of sleep after training may be particularly relevant for sleep-dependent plasticity, consistent with previous work (Miyamoto et al., 2016).

While sleep spindles have been previously linked to plasticity (Durkin et al., 2017; Barakat et al., 2013; Rosanova and Ulrich, 2005; Clawson et al., 2016), how neural activity during sleep spindles leads to long-term plasticity remains unclear. It has been demonstrated in vitro that SO and sleep spindle activity patterns can drive NMDA receptor-dependent potentiation (Rosanova and Ulrich, 2005; Chauvette et al., 2012). Given evidence for NMDA receptor-dependent potentiation of cortical inputs to the striatum during skill learning (Calabresi et al., 1992; Charpier and Deniau, 1997), one intriguing possibility is that sleep spindles, gated by their temporal proximity to preceding SOs, promote the potentiation of cortical inputs to the striatum through NMDA receptor activation. This would be consistent with our finding that blocking striatal NMDA activation during offline periods disrupts skill learning, as well as previous work linking NMDA receptors to sleep-dependent consolidation (Gais et al., 2008). Importantly, however, blocking striatal NMDA receptors also impacts spontaneous striatal activity (Pomata et al., 2008). Further work is required to understand how striatal AP5 infusions may influence striatal activity during NREM.

We also observed a decrement in short-latency spike-timing relationships within M1 and DLS neuron pairs measured in pre-training sleep, or pairs measured in post-training sleep but not modulated during sleep spindles. This change is consistent with a growing body of work supporting the synaptic homeostasis hypothesis (SHY), which proposes that sleep drives the general homeostatic downscaling of synapses which are upregulated during wake (Tononi and Cirelli, 2014). Such general downscaling can support memory consolidation indirectly by increasing the signal-to-noise of memory representations encoded during wake (Rasch and Born, 2013; Miyamoto et al., 2021). An outstanding question is how the general downscaling of synapses during sleep proposed in SHY may interact with the preservation or potentiation of specific synapses relevant to learning (Rasch and Born, 2013; Miyamoto et al., 2021). Recent work suggests a way to reconcile both ideas, demonstrating that activity during sleep may preserve activity patterns generated during learning while also downscaling task-irrelevant activity (Kim et al., 2019; Gulati et al., 2017). As NREM sleep rhythms have been linked to both processes (Kim et al., 2019; Huber et al., 2004; Gulati et al., 2017; Norimoto et al., 2018), it will be important to explore how NREM rhythms differentially impact downscaling versus preserving/strengthening synapses for nearby neurons in the same brain region or connected neurons across different brain regions.

It is important to note that in this work, we report only functional measures of corticostriatal connectivity, including LFP coherence and spike-timing relationships across M1 and DLS. One possibility is that these functional measures of connectivity reflect changes in the synaptic strength of M1 projections to the DLS. This would be consistent with evidence for the strengthening of cortical inputs to the striatum with motor training (O'Hare et al., 2016; Rothwell et al., 2015; Yin et al., 2009). An alternative possibility is that coordinated inputs to both M1 and DLS drive increased functional connectivity. We believe our results are most consistent with a physical change in synaptic strength, as we measured increased functional connectivity both during NREM, reflected as increased LFP coherence, and during awake task performance, reflected in the emergence of coupled cross-area dynamics. However, future work is required to determine whether our observations are consistent with sleep-related structural changes in synaptic strength.

Skill learning in the corticostriatal network

Here, we show that 4–8 Hz LFP coherence across M1 and DLS measured during NREM closely tracks the emergence of a stable skilled reaching behavior, as measured by the emergence of a stable day-to-day reaching velocity profile. This is consistent with previous work showing increased coordination of M1 and DLS neural activity with skill acquisition (Santos et al., 2015; Lemke et al., 2019; Koralek et al., 2013), suggesting that increased communication and connectivity between cortex and striatum may be a central feature of stable skilled behavior. We also found that the emergence of a stable day-to-day reaching velocity profile was correlated to peak single-trial reaching velocity, consistent with the idea that movement velocity is a relevant aspect of skill learning (Lemke et al., 2019; Hikosaka et al., 2013). Intriguingly, here, we find that functional connectivity between M1 and DLS increases offline, rather than during training itself. This is consistent with a range of studies demonstrating that sleep benefits speed and consistency in motor tasks in humans (Fischer et al., 2002; Walker et al., 2002) and rodents (Ramanathan et al., 2015; Nagai et al., 2017), as well as rodent brain-machine interface (BMI) tasks (Gulati et al., 2014; Kim et al., 2019). As the basal ganglia are an important regulator of movement vigor (Dudman and Krakauer, 2016), future work is required to determine how increases in coupling between motor cortex and striatum precisely relate to changes in the consistency and vigor of movement.

While there is growing evidence that neural signals across cortex and striatum grow more coordinated during skill learning (Santos et al., 2015; Lemke et al., 2019; Koralek et al., 2013; Costa et al., 2004), the relative importance of the ‘direction’ of communication between cortex and the striatum remains unclear. On one hand, it is well-established that cortical activity influences striatal activity (Peters et al., 2021), and that, in turn, the basal ganglia are connected to brain stem regions that control movement (McElvain et al., 2021). On the other hand, there is evidence that DLS activity may be important for stabilizing cortical activity patterns (Koralek et al., 2012; Lemke et al., 2019), suggesting a role for basal ganglia ‘feedback’ to cortex through the thalamus (Aoki et al., 2019; Athalye et al., 2020). Recent work demonstrated the importance of thalamic input for reliable cortical neural dynamics (Sauerbrei et al., 2020). One intriguing possibility is that the nature of corticostriatal communication evolves during learning. For example, cortical input to striatum may be essential during the initial acquisition of a skilled movement, while striatal feedback to cortex becomes important for well-learned stable and skilled movements (Lemke, 2020). Future work is required to determine whether sleep may facilitate changes in the direction of communication between cortex and striatum.

In summary, our results suggest a role for sleep in modifying cross-area connectivity across cortex and striatum that, in turn, impacts behavioral stability and network activity during skill learning. One important extension of this work is to explore whether sleep can impact corticostriatal connectivity in the context of maladaptive behavioral stability, such as addiction, that has been linked to the corticostriatal network (Lipton et al., 2019; Gerdeman et al., 2003). Recent work suggests that modulating NREM rhythms can regulate memory consolidation versus forgetting (Kim et al., 2019). It will be informative to determine whether similar manipulations could be used in the context of maladaptive stability to provide a therapeutic benefit.

Materials and methods

Animal care and surgery

This study was performed in strict accordance with guidelines from the USDA Animal Welfare Act and United States Public Health Science Policy. Procedures were in accordance with protocols approved by the Institutional Animal Care and Use Committee at the San Francisco Veterans Affairs Medical Center. Experiments were conducted with 12 male Long-Evans rats (approximately 12–16 weeks old) housed under controlled temperature and a 12 hr light/12 hr dark cycle with lights on at 6:00 a.m. All behavioral experiments were performed during the light period. Surgical procedures were performed using sterile techniques under 2–4% isoflurane. Six animals were implanted with either microwire electrodes (n=5 animals; 32 or 64 channel 33 µm diameter Tungsten microwire arrays with ZIF-clip adapter; Tucker-Davis Technologies) or high-density silicon probes (n=1 animal; 256 channel custom-built silicon probes; Egert et al., 2020) targeted to the forelimb area of M1 (centered at 3.5 mm lateral and 0.5 mm anterior to bregma and implanted in layer V at a depth of 1.5 mm) and the DLS (centered at 4 mm lateral and 0.5 mm anterior to bregma and implanted at a depth of 4 mm). Six additional animals were implanted with infusion cannulas (PlasticsOne; 26 Ga) targeted to the DLS. Surgery involved exposure and cleaning of the skull, preparation of the skull surface (using cyanoacrylate), and implantation of skull screws for overall headstage stability. In the animals implanted with neural probes, a reference screw was implanted posterior to lambda, contralateral to the neural recordings, and a ground screw was implanted posterior to lambda, ipsilateral to the neural recordings. Craniotomy and durectomy were then performed, followed by implantation of neural probes or infusion cannulas and securing of the implant with Kwik-Sil (World Precision Instruments), C and B Metabond (Parkell, Product #S380), and Duralay dental acrylic (Darby, Product #8830630).
Final location of electrodes was confirmed by electrolytic lesion. In two of the animals implanted with neural probes, the forearm was also implanted with a pair of twisted EMG wires (0.007 in. single-stranded, Teflon-coated, stainless steel wire; A-M Systems) with a hardened epoxy ball (J-B Weld Company) at one end preceded by 1–2 mm of uncoated wire under the ball. Wires were inserted into the muscle belly and pulled through until the ball came to rest on the belly. EMG wires were braided, tunneled under the skin to a scalp incision, and soldered into an electrode interface board (ZCA-EIB32, Tucker-Davis Technologies). The postoperative recovery regimen included buprenorphine at 0.02 mg/kg, meloxicam at 0.2 mg/kg, dexamethasone at 0.5 mg/kg, and trimethoprim/sulfadiazine at 15 mg/kg, administered for 5 days. All animals recovered for at least 1 week before the start of behavioral training.

In vivo electrophysiology

Spiking activity, LFP, and EMG activity were recorded using an RZ2 system (Tucker-Davis Technologies). For neural activity recorded with microwire electrode arrays, spiking data was sampled at 24,414 Hz and LFP/EMG data was sampled at 1017 Hz. To detect spikes in microwire-implanted animals, an online threshold was set at 4.5 standard deviations (calculated over a 5 min baseline period). Waveforms and timestamps were stored for any event that crossed below that threshold. Spike sorting was performed using Offline Sorter v.4.3.0 (Plexon) with a PCA-based clustering method followed by manual inspection. Spikes were sorted separately for each day, combining the pre-training, training, and post-training periods. Units were accepted based on waveform shape, clear cluster boundaries in PC space, and 99.5% of detected events with an ISI>2 ms. Neural activity recorded with silicon probes was recorded at 24,414 Hz. Spike times and waveforms were detected from the broadband signal using Offline Sorter v.4.3.0 (Plexon). Spike waveforms were then sorted using Kilosort2 (https://github.com/MouseLand/Kilosort2; Pachitariu, 2020). We accepted units based on manual inspection using Phy (https://github.com/cortex-lab/phy; Buccino et al., 2021) and 99.5% of detected events with an ISI>2 ms.

Viral injection (Figure 1)

To label anterograde projections in M1, we injected 750 nl of AAV8-hsyn-JAWs-KGC-GFP-ER2 virus into two sites (1.5 mm anterior, 2.7 mm lateral to bregma, at a depth of 1.4 mm and 0.5 mm posterior, 3.5 mm lateral to bregma, at a depth of 1.4 mm). Two weeks after injection rats were anesthetized and transcardially perfused with 0.9% sodium chloride, followed by 4% formaldehyde. The harvested brains were post-fixed for 24 hr and immersed in 20% sucrose for 2 days. Coronal cryostat sections (40 μm thickness) were then mounted and imaged with a fluorescent microscope.

Reach-to-grasp task (Figures 1, 3 and 4)

Rats naïve to any motor training were first tested for forelimb preference. This involved presenting approximately 10 food pellets to the animal and observing which forelimb was most often used to reach for the pellet. Rats then underwent surgery for either neural probe or cannula implantation in the hemisphere contralateral to the preferred hand. Following the recovery period, rats were trained on the reach-to-grasp task using an automated reach-box, controlled by custom MATLAB scripts and an Arduino microcontroller. This setup requires minimal user intervention, as described previously (Wong et al., 2015). Each trial consisted of a pellet dispensed on the pellet tray followed by an alerting beep indicating that the trial was beginning, then the door would open. Animals had 15 s to reach, grasp, and retrieve the pellet or the trial would automatically end, and the door would close. A real-time ‘pellet detector’ using an infrared sensor centered over the pellet would determine when the pellet was moved, indicating the trial was over and, after 2 s, the door would close. Trials were separated by a 10-s inter-trial interval. All trials were captured by a camera placed on the side of the behavioral box (n=2 animals monitored with a Microsoft LifeCam at 30 frames/s; n=10 animals monitored with a Basler ace acA640-750uc at 75 frames/s). For animals implanted with neural probes, each animal underwent 5–14 days of training (~100–150 trials per day). For the infusion cannula implanted animals, each animal underwent 10 days of training (100 trials per day). Reach trajectories were captured from video using DeepLabCut (Mathis et al., 2018) to track the center of the rat’s hand as well as the food pellet. We specifically analyzed reach trajectories from 500 ms before to 500 ms after ‘pellet touch,’ which was classified as the frame in which the hand was closest to the pellet, before the pellet was displaced off the pellet holder. 
Only trials in which the pellet was displaced off the pellet holder were considered. We assessed behavioral consistency throughout training in both neural probe and cannula implanted animals by calculating the correlation between the mean velocity profile of reaches on each day of training and the mean velocity profile of reaches on the last day of training which served as the learned ‘template.’ These correlations were computed separately for the x and y dimensions and then averaged. To calculate total velocity profile correlation change, the last day, which served as template, was excluded in both neural probe and cannula implanted animals. We also generated shuffled distributions to test the significance of the effect of AP5 infusions (compared to saline) on velocity profile correlation and single-trial peak reaching velocity. To do this, we first computed the day-to-day changes in either measure (for velocity profile correlation we excluded the last day which served as ‘template’). We then computed the real effect of AP5 infusions (compared to saline) by taking the difference between the mean day-to-day change with either post-training AP5 or saline infusion, across animals. We then randomly reshuffled the AP5/saline labels and recomputed the difference 10,000 times. To generate a P value, we measured the percentile of the difference from the real data within the shuffled distribution of differences.
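
The label-shuffling test described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function name, the flat list layout of the inputs, and the fixed random seed are all assumptions made for the example.

```python
import random
from statistics import mean


def label_shuffle_test(changes, labels, n_shuffles=10_000, seed=0):
    """Shuffle AP5/saline labels and locate the real difference in the null.

    changes: day-to-day change values (hypothetical input layout).
    labels:  "AP5" or "saline" for each entry of `changes`.
    Returns (real difference, fraction of shuffled differences <= real).
    """
    rng = random.Random(seed)

    def group_difference(lbls):
        ap5 = [c for c, lab in zip(changes, lbls) if lab == "AP5"]
        saline = [c for c, lab in zip(changes, lbls) if lab == "saline"]
        return mean(ap5) - mean(saline)

    real = group_difference(labels)
    shuffled = list(labels)
    at_or_below = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # reassign labels at random, keeping group sizes
        if group_difference(shuffled) <= real:
            at_or_below += 1
    return real, at_or_below / n_shuffles
```

The returned fraction is the position of the real difference within the shuffled distribution, which is how the text defines the P value for a reduction under AP5.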

DLS infusions (Figure 1)

To test whether the offline activation of striatal NMDA receptors is required for skill learning, we infused 1 µl of either saline or the NMDA receptor antagonist AP5 (5 µg/µl) at an infusion rate of 200 nl/min into the DLS immediately following training in six animals for 10 consecutive days. During the first 5 days of training, we infused three rats with AP5 and three rats with saline. During the second 5 days, we switched the infusions, that is, animals that received AP5 in the first 5 days received saline for the second 5 days, and vice versa.

Neural data analyses (Figures 2–7)

All neural data analyses were conducted using MATLAB 2019a (MathWorks) and functions from the EEGLAB (http://sccn.ucsd.edu/eeglab/) and Chronux (http://chronux.org/) toolboxes.

Offline behavioral state classification (Figures 2–7)

During each training day, neural signals were monitored during a 2–3 hr pre- and post-training period. A video was also captured from a camera placed above the behavioral box (Microsoft LifeCam at one frame/s). Behavioral states (wake and sleep states) were classified using cortical LFP signals and movement, measured either by video or EMG activity if animals were implanted with an EMG wire. LFP was preprocessed by artifact rejection, including manual rejection of noisy electrodes and z-scoring of each electrode’s signal across the entire recording session. A mean LFP signal was then generated in M1 for sleep classification by averaging across all M1 electrodes. This mean M1 LFP signal was then segmented into non-overlapping 10 s windows. In each window, the power spectral density was computed using the Chronux function mtspecgramc. Delta power (1–4 Hz) and theta ratio (5–10 Hz/2–15 Hz) were computed and used for behavioral state classification. Within each pre- and post-training period, mean values of delta power and theta ratio were then computed and used as thresholds for behavioral state classification: epochs with high delta power (greater than mean delta) and no movement were classified as NREM, epochs with high theta ratio (greater than mean theta) and low delta power (less than mean delta) were classified as REM, and all other epochs were classified as wake. All consecutive NREM or REM epochs that were less than 60 s long (six consecutive epochs) were reclassified as wake.
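
The thresholding and run-length rules above can be illustrated with a short sketch. It assumes that delta power, theta ratio, and a movement flag have already been computed for each 10 s epoch; the function name and argument layout are hypothetical, and the spectral estimation step is not reproduced.

```python
from statistics import mean


def classify_states(delta, theta_ratio, moving, min_run=6):
    """Label each 10 s epoch as NREM, REM, or wake.

    delta, theta_ratio: one value per epoch (assumed precomputed).
    moving: one boolean per epoch, True if movement was detected.
    min_run: sleep runs shorter than this many epochs become wake.
    """
    d_thr, t_thr = mean(delta), mean(theta_ratio)  # period means as thresholds
    states = []
    for d, t, m in zip(delta, theta_ratio, moving):
        if d > d_thr and not m:
            states.append("NREM")
        elif t > t_thr and d < d_thr:
            states.append("REM")
        else:
            states.append("wake")

    # Reclassify NREM or REM runs shorter than min_run epochs (60 s) as wake.
    i = 0
    while i < len(states):
        j = i
        while j < len(states) and states[j] == states[i]:
            j += 1
        if states[i] != "wake" and j - i < min_run:
            states[i:j] = ["wake"] * (j - i)
        i = j
    return states
```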

Measuring corticostriatal functional connectivity using LFP coherence (Figures 2–4 and 6)

To examine changes in corticostriatal functional connectivity across days, we measured LFP coherence during NREM across all M1 and DLS electrode pairs in each pre- and post-training period. On each day, we first applied common-mode referencing on M1 and DLS LFP signals using the median signal in each region, that is, at every time-point, the median signal across all electrodes in a region was calculated and subtracted from every electrode in that region to decrease common noise and minimize volume conduction. LFP coherence was then computed for LFP signals during NREM in non-overlapping 10 s windows using the Chronux function cohgramc. For each pre- and post-training period, we classified ‘high coherence LFP pairs’ as pairs of M1 and DLS electrodes with a mean 4–8 Hz coherence during NREM > 0.6. When comparing LFP coherence changes occurring online (from the pre- to post-training period on the same day) and offline (from the post-training period on one day to the pre-training period on the next day), we computed a single online and offline change value per M1 and DLS electrode pair by averaging online and offline change across training days. To determine the relationship between 4–8 Hz LFP coherence and behavior, we averaged LFP coherence across electrodes in the pre-training session and compared that value to behavior during the subsequent training period on that day. To determine the relationship between LFP coherence and velocity profile correlation values accounting for single-trial peak velocity, we computed a Pearson linear partial correlation coefficient using MATLAB function partialcorr.
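
The online and offline change values defined above reduce to simple differences between adjacent recording periods. The sketch below assumes one mean NREM 4–8 Hz coherence value per period for a single electrode pair; the function name and input layout are illustrative only.

```python
def online_offline_change(pre, post):
    """Average online and offline coherence change for one electrode pair.

    pre[d], post[d]: mean NREM 4-8 Hz coherence in the pre- and
    post-training period of day d (assumed precomputed).
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matching pre/post values for at least two days")
    # Online: pre-training to post-training on the same day.
    online = [post[d] - pre[d] for d in range(len(pre))]
    # Offline: post-training on one day to pre-training on the next day.
    offline = [pre[d + 1] - post[d] for d in range(len(pre) - 1)]
    return sum(online) / len(online), sum(offline) / len(offline)
```

Note that a recording of N days yields N online differences but only N − 1 offline differences.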

Measuring the phase difference between M1 and DLS 4–8 Hz LFP signals (Figure 2)

To calculate the mean phase difference between M1 and DLS 4–8 Hz LFP signals in NREM, we filtered M1 and DLS LFP signals during NREM between 4 and 8 Hz in each pre- and post-training period using the EEGLAB function eegfilt. We then extracted the phase of the filtered LFP signals using the MATLAB function hilbert and computed the difference between M1 and DLS signals.

Measuring M1 LFP–DLS spike phase locking (Figure 2)

To compare M1-DLS 4–8 Hz LFP coherence to a distinct measure of corticostriatal functional connectivity, we calculated the phase locking of DLS units to 4–8 Hz M1 LFP in NREM during each pre- and post-training period. To measure phase locking, we filtered M1 LFP signals during NREM between 4 and 8 Hz using the EEGLAB function eegfilt and extracted the phase of the filtered LFP signals using the MATLAB function hilbert. Then, for each simultaneously recorded DLS unit, we computed a histogram of the M1 phase at each spike time. We then computed the circular standard deviation (cSD) of these histograms using the MATLAB toolbox circstats (https://www.mathworks.com/matlabcentral/fileexchange/10676-circular-statistics-toolbox-directional-statistics). We used this cSD value as our measure of phase locking, with low cSD representing a ‘peakier’ histogram and therefore greater phase locking. We then compared the mean cSD for each M1 electrode (averaged across simultaneously recorded DLS units) to the mean M1-DLS 4–8 Hz LFP coherence for that M1 electrode (averaged across DLS electrodes).
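
As an illustration of the phase-locking measure, the sketch below computes a circular standard deviation directly from the LFP phases at each spike time. It assumes the common definition sqrt(−2 ln R), where R is the mean resultant length; the source does not state which dispersion definition was used, and the function name is hypothetical.

```python
import math


def circular_sd(phases):
    """Circular standard deviation of spike phases (radians).

    Uses sqrt(-2 * ln(R)), with R the mean resultant length (an assumed
    definition). Lower values indicate tighter phase locking.
    """
    n = len(phases)
    c = sum(math.cos(p) for p in phases) / n
    s = sum(math.sin(p) for p in phases) / n
    r = min(math.hypot(c, s), 1.0)  # clamp rounding error above 1
    if r == 0.0:
        return math.inf  # phases perfectly balanced around the circle
    return math.sqrt(max(0.0, -2.0 * math.log(r)))
```

A unit firing at a single preferred phase gives a value near zero, while spikes spread evenly across the cycle give a large value.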

Measuring corticostriatal network dynamics during action execution (Figure 4)

To measure corticostriatal network dynamics during action execution, we extracted low-dimensional representations of DLS activity by performing PCA using MATLAB function pca. For each DLS unit, spiking activity during each trial was binned at 100 ms from 5 s before to 5 s after pellet touch and then concatenated across trials. PCA was computed on a matrix of DLS units by time bins (number of trials * 100 bins per trial). DLS activity from 1 s before to 1 s after pellet touch on each trial was then projected onto the first three PCs to generate low-dimensional neural trajectory representations of population spiking activity in DLS during action execution. We then fit a linear regression model to predict DLS neural trajectories from single-unit spiking activity in M1. A separate model was used to predict activity projected onto each of the first three PCs, using MATLAB function fitlm and fivefold cross-validation. For each time bin of the DLS neural trajectory, the preceding 1.5 s of spiking activity for all M1 units, binned at 100 ms, were used as predictors. The ability to predict DLS activity from M1 activity on each day was measured by averaging the correlation values from correlating the actual DLS neural trajectories and the predicted trajectories. The same method was also used to predict a baseline, non-reaching, period from 5 s to 4 s before pellet touch.

Measuring spiking modulation during action execution (Figure 4)

To measure spiking modulation during action execution, spiking activity during each trial was binned at 25 ms from 5 s before to 5 s after pellet touch. Spiking activity was then averaged across trials and z-scored (separately for each M1 and DLS unit on each training day). Spiking modulation was then calculated by taking the sum of the absolute value of the z-scored activity from 1 s before to 500 ms after pellet touch divided by the sum of the absolute value of the z-scored activity from 3.5 s before pellet touch to 2 s before pellet touch.
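
A minimal sketch of this modulation ratio is given below, assuming each trial is supplied as a list of spike counts in 400 bins of 25 ms spanning −5 s to +5 s around pellet touch. The function and constant names are illustrative, and units with a flat trial-averaged rate are not handled.

```python
from statistics import mean, pstdev

BIN_S = 0.025    # 25 ms bins
T_START = -5.0   # first bin starts 5 s before pellet touch


def window(t0, t1):
    """Bin indices covering t0..t1 seconds relative to pellet touch."""
    return range(round((t0 - T_START) / BIN_S), round((t1 - T_START) / BIN_S))


def spiking_modulation(trial_counts):
    """Ratio of summed |z| during reaching to summed |z| during baseline."""
    n_bins = len(trial_counts[0])
    avg = [mean(trial[b] for trial in trial_counts) for b in range(n_bins)]
    mu, sd = mean(avg), pstdev(avg)
    # The choice of sample vs population s.d. cancels out in the ratio.
    z = [(x - mu) / sd for x in avg]
    reach = sum(abs(z[b]) for b in window(-1.0, 0.5))
    baseline = sum(abs(z[b]) for b in window(-3.5, -2.0))
    return reach / baseline
```

Both windows span 1.5 s (60 bins), so the ratio compares equal-length periods.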

Identifying coupled M1 and DLS neuron pairs (Figures 5–7)

We used a ‘basic spike jitter’ method to identify pairs of M1 and DLS neurons with consistent short-latency spike-timing relationships (Hatsopoulos et al., 2003; Fujisawa et al., 2008; Amarasingham et al., 2012). Briefly, we binned at 1 ms and concatenated together the spiking activity during the first 5 min of NREM of both the pre- and post-training period (10 min total) for each pair of M1 and DLS units on each day of training. We then calculated the mean value of the short-latency cross-correlation for each pair (1–15 ms time lag centered on DLS spiking, such that positive time lags corresponded to DLS spiking after M1 spiking; consistent with the conduction and synaptic delay between M1 and DLS; Koralek et al., 2013). We then generated a ‘jittered’ distribution of short-latency cross-correlation values by jittering each DLS spike within a 50 ms window centered on the spike and recalculating the cross-correlation, repeated 1000 times. To perform the jittering, a 50 ms window is centered on each DLS spike, and the spike is replaced by one randomly chosen within that window. This method destroys any consistent spike-timing relationship at timescales smaller than the jitter window, while preserving spiking relationships on timescales greater than the jitter window. We classified a pair of M1 and DLS units as ‘coupled’ if the real mean short-latency cross-correlation value was greater than the 99th percentile of the jittered distribution.
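
The jitter procedure can be sketched as follows, assuming spike times are given as integer milliseconds. Counting spike pairs at 1–15 ms lags is equivalent to the mean cross-correlation value up to a constant factor of 15. The function names, the simple percentile convention, and the use of 50 equally likely 1 ms offsets (−25 to +24 ms) are assumptions of this example.

```python
import random
from bisect import bisect_left, bisect_right


def short_latency_count(m1, dls, lo=1, hi=15):
    """Count spike pairs with a DLS spike lo..hi ms after an M1 spike.

    m1 must be sorted; times are integer milliseconds.
    """
    total = 0
    for t in dls:
        # M1 spikes falling in [t - hi, t - lo]
        total += bisect_right(m1, t - lo) - bisect_left(m1, t - hi)
    return total


def is_coupled(m1, dls, n_jitter=1000, half_window=25, pct=99, seed=0):
    """Compare the real short-latency count with a jittered null distribution."""
    rng = random.Random(seed)
    m1 = sorted(m1)
    real = short_latency_count(m1, dls)
    null = sorted(
        short_latency_count(
            m1, [t + rng.randrange(-half_window, half_window) for t in dls]
        )
        for _ in range(n_jitter)
    )
    threshold = null[min(n_jitter - 1, int(n_jitter * pct / 100))]
    return real > threshold, real, threshold
```

Because only the DLS spikes are moved, firing rates and slow co-modulation are preserved in the null distribution, while timing structure finer than the jitter window is removed.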

Measuring corticostriatal transmission strength (Figure 5)

To compare corticostriatal transmission strength between NREM and wake, we calculated cross-correlations of spiking activity binned at 1 ms from each behavioral state (NREM and wake, pre- and post-training periods concatenated together) for all coupled M1 and DLS neuron pairs (‘coupling’ was based on the jittering method described above). To account for firing rate differences across behavioral states, we normalized each M1 and DLS pair’s cross-correlation in each behavioral state by subtracting a mean jitter cross-correlation for that behavioral state generated by repeating the jittering processes described above 1000 times and taking the average of the 1000 jittered cross-correlations. The mean short-latency cross-correlation magnitude (1–15 ms time lag centered on DLS spiking, such that positive time lags corresponded to DLS spiking after M1 spiking) was then compared between NREM and wake (rats did not spend enough time in REM sleep to make a robust comparison). To compare corticostriatal transmission strength across NREM rhythms, we calculated cross-correlations of spiking activity binned at 1 ms from each NREM rhythm (sleep spindles, delta waves, and SOs, rhythms during pre- and post-training periods were concatenated together) for all coupled M1 and DLS neuron pairs. For sleep spindles, 1 s of spiking centered on sleep spindle peak (−500 ms to 500 ms) was included from each spindle. For SOs and delta waves, a 1-s window around upstate peak (−500 ms to 500 ms) was used. The same normalization was applied as in comparisons across behavioral states (subtraction of mean jittered cross-correlation from the real cross-correlation). The mean short-latency cross-correlation magnitude (1–15 ms time lag) was then compared between NREM rhythms.

NREM rhythm detection (Figure 5)

The NREM rhythm detection applied here is based on a previously used detection algorithm (Kim et al., 2019; Silversmith et al., 2020). Briefly, a mean LFP signal was generated in M1 by averaging across all electrodes. To detect sleep spindles, this mean signal was filtered in the spindle band (10–16 Hz) using a zero-phase shifted, third-order Butterworth filter. A smoothed envelope was calculated by computing the magnitude of the Hilbert transform of this signal then convolving it with a Gaussian window. Next, we determined two upper thresholds for spindle detection based on the mean and standard deviation (s.d.) of the spindle band envelope during NREM. Epochs in which the spindle envelope exceeded 2.5 s.d. above the mean for at least one sample and the spindle power exceeded 1.5 s.d. above the mean for at least 500 ms were detected as spindles. Then, spindles that were sufficiently close in time (<300 ms) were combined. To detect SOs and delta waves, the mean M1 signal was filtered in a low-frequency band (second order, zero phase shifted, high pass Butterworth filter with a cutoff at 0.1 Hz followed by a fifth order, zero phase shifted, low pass Butterworth filter with a cutoff at 4 Hz). Next, all positive-to-negative zero crossings during NREM were identified, along with the previous peaks, the following troughs, and the surrounding negative-to-positive zero crossings. Each identified epoch was considered a SO if the peak was in the top 15% of peaks, the trough was in the top 40% of troughs and the time between the negative-to-positive zero crossings was greater than 300 ms but did not exceed 1 s. Each identified epoch was considered a delta wave if the peak was in the bottom 85% of peaks, the trough was in the top 40% of troughs and the time between the negative-to-positive zero crossings was greater than 250 ms.
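
The two-threshold spindle criterion can be sketched as below. The example assumes that band-pass filtering, Hilbert envelope extraction, smoothing, and z-scoring against the NREM mean and s.d. have been done upstream, and it treats the 'spindle envelope' and 'spindle power' criteria as applying to the same z-scored envelope. These are simplifying assumptions, and the function name is hypothetical.

```python
def detect_spindles(env_z, fs, lo=1.5, hi=2.5, min_dur=0.5, max_gap=0.3):
    """Return (start, stop) sample indices of detected spindles.

    env_z: z-scored spindle-band envelope (list of floats).
    fs: sampling rate in Hz. stop is exclusive.
    """
    events, start = [], None
    # A trailing sentinel closes a run that reaches the end of the signal.
    for i, v in enumerate(list(env_z) + [float("-inf")]):
        if v > lo and start is None:
            start = i
        elif v <= lo and start is not None:
            long_enough = (i - start) / fs >= min_dur
            if long_enough and max(env_z[start:i]) > hi:
                events.append([start, i])
            start = None

    # Combine spindles separated by less than max_gap seconds.
    merged = []
    for ev in events:
        if merged and (ev[0] - merged[-1][1]) / fs < max_gap:
            merged[-1][1] = ev[1]
        else:
            merged.append(ev)
    return [tuple(ev) for ev in merged]
```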

NREM rhythm modulation (Figure 5)

To measure the sleep spindle modulation of individual M1 and DLS units, spiking during each sleep spindle was time locked to the peak of the filtered LFP and binned at 10 ms. Spiking was averaged across sleep spindles and modulation was calculated by taking the range from the minimum to the maximum firing-rate bin in the second around the sleep spindle peak (−500 ms to 500 ms) divided by the range from the minimum to the maximum firing-rate bin in a 1-s-long baseline period before each spindle (−1500 ms to −500 ms relative to spindle peak). To determine SO and delta wave modulation of individual M1 and DLS units, spiking during each SO or delta wave was time locked to the peak of the upstate and binned at 10 ms. Spiking was averaged across SOs or delta waves and modulation was calculated by taking the range from the minimum to the maximum firing-rate bin in the second around the upstate peak (−500 ms to 500 ms) divided by the range from the minimum to the maximum firing-rate bin in a 1-s-long baseline period before each SO or delta wave (−1500 ms to −500 ms relative to upstate peak).
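
As an illustration of the modulation measure, the following Python sketch builds an event-locked firing-rate histogram in 10 ms bins and divides the firing-rate range in the event window by the range in the baseline window. It assumes that "minimum to maximal firing rate bin" denotes the difference between the maximum and minimum bins; the function names are hypothetical and all times are in seconds.

```python
import numpy as np

def event_locked_rate(spike_times, event_times, t_start=-1.5, t_stop=0.5,
                      bin_size=0.01):
    """Firing rate (Hz) in 10 ms bins around each event, averaged over events."""
    spike_times = np.asarray(spike_times, dtype=float)
    n_bins = int(round((t_stop - t_start) / bin_size))
    edges = t_start + bin_size * np.arange(n_bins + 1)
    counts = np.zeros(n_bins)
    for t in event_times:
        # Spikes outside [t_start, t_stop] are ignored by np.histogram.
        counts += np.histogram(spike_times - t, bins=edges)[0]
    rate = counts / (len(event_times) * bin_size)
    return edges[:-1], rate  # bin start times, mean rate per bin

def modulation_ratio(bin_starts, rate, event_win=(-0.5, 0.5),
                     base_win=(-1.5, -0.5)):
    """Range of the rate in the event window divided by the baseline range.

    ASSUMPTION: 'minimum to maximal' is read as max minus min. A flat
    baseline (zero range) would give a division by zero and is not handled.
    """
    def rate_range(win):
        mask = (bin_starts >= win[0] - 1e-9) & (bin_starts < win[1] - 1e-9)
        return rate[mask].max() - rate[mask].min()
    return rate_range(event_win) / rate_range(base_win)
```

A ratio above 1 indicates a larger firing-rate excursion around the rhythm than during the preceding baseline second.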

Measuring changes in corticostriatal transmission strength within pre- and post-training periods (Figure 6)

To measure changes in corticostriatal transmission strength within pre- and post-training periods, we calculated cross-correlations of spiking activity binned at 1 ms from NREM activity during the first and second half of each pre- and post-training period for all coupled M1 and DLS neuron pairs. We compared changes in the mean short-latency cross-correlation magnitude (1–15 ms) from the first to the second half of each pre- and post-training period between coupled M1 and DLS pairs that were spindle modulated and those that were not. To determine spindle-modulated pairs, we generated peri-event time histograms (PETHs) of spiking activity for each M1 and DLS unit, locked to the sleep spindle peak and binned in 10 ms bins from 2 s before to 2 s after the spindle peak (400 bins), averaged across all spindles. Sleep spindle modulation was then calculated by taking the range from the minimum to the maximum firing-rate bin within the 1 s period centered on the spindle peak (−500 ms to 500 ms). We then generated a distribution of shuffled modulations by shuffling the time bins and recalculating the modulation of this shuffled PETH. This shuffling procedure was repeated 1000 times to generate a distribution. Units with a non-shuffled modulation greater than the 99th percentile of the shuffled distribution were considered significantly sleep spindle modulated. Spindle-modulated pairs were those in which both the M1 unit and the DLS unit were spindle modulated; all other pairs were considered non-spindle modulated.
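
The shuffle test for spindle modulation can be sketched in Python as below. The text does not specify whether time bins were shuffled within each spindle before averaging or within the averaged PETH; this sketch assumes the former (bins are permuted independently for each spindle, then averaged), which is one possible reading and not necessarily the procedure used. The function name and the input layout (a spindles × bins matrix of spike counts) are hypothetical.

```python
import numpy as np

def spindle_modulation_test(counts, bin_size=0.01, n_shuffles=1000,
                            percentile=99, rng=None):
    """Compare observed PETH modulation with a shuffled-bin null distribution.

    counts: array of shape (n_spindles, n_bins), spike counts per 10 ms bin
            from -2 s to +2 s around each spindle peak (400 bins).
    ASSUMPTION: bins are shuffled independently within each spindle before
    averaging across spindles.
    """
    rng = np.random.default_rng(rng)
    counts = np.asarray(counts, dtype=float)
    n_bins = counts.shape[1]
    # Central quarter of the PETH, i.e. -0.5 s to +0.5 s for a 4 s window.
    centre = slice(n_bins // 2 - n_bins // 8, n_bins // 2 + n_bins // 8)

    def modulation(c):
        rate = c.mean(axis=0) / bin_size
        return rate[centre].max() - rate[centre].min()

    observed = modulation(counts)
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        # Generator.permuted shuffles each row independently along axis 1.
        null[i] = modulation(rng.permuted(counts, axis=1))
    significant = bool(observed > np.percentile(null, percentile))
    return significant, observed, null
```

Under this reading, a coupled pair would be labeled spindle modulated only if the test is significant for both its M1 unit and its DLS unit.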

Measuring changes in sleep depth within pre- and post-training periods (Figure 6)

To measure changes in sleep depth within pre- and post-training periods, we first generated a mean LFP signal in each period by averaging across all M1 electrodes. This mean M1 LFP signal was then separated into the first and second half of each pre- and post-training period and segmented into non-overlapping 10 s windows. Power spectral density was then computed in each window using the Chronux function mtspecgramc and then averaged over low frequencies (1–4 Hz) as a proxy for sleep depth. We interpolated the low-frequency power values in each pre- and post-training period to normalize duration across days and animals.
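
A rough Python equivalent of this sleep-depth proxy is sketched below. It is not the method used in the article: a Hann-windowed FFT periodogram stands in for the multitaper estimate from the Chronux function mtspecgramc, and linear interpolation to a fixed number of points stands in for the duration normalization, whose exact form is not specified. Function names and the default of 100 output points are assumptions.

```python
import numpy as np

def delta_power_per_window(lfp, fs, win_s=10.0, band=(1.0, 4.0)):
    """Mean 1-4 Hz power in consecutive non-overlapping windows.

    NOTE: a simple Hann-windowed periodogram replaces the multitaper
    spectrum; absolute values are therefore not comparable to Chronux output.
    """
    n = int(win_s * fs)
    n_win = len(lfp) // n
    segs = np.asarray(lfp[: n_win * n], dtype=float).reshape(n_win, n)
    segs = segs - segs.mean(axis=1, keepdims=True)  # remove DC per window
    psd = np.abs(np.fft.rfft(segs * np.hanning(n), axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    return psd[:, in_band].mean(axis=1)

def resample_to_length(values, n_points=100):
    """Linearly interpolate a power time course onto n_points samples."""
    x_old = np.linspace(0.0, 1.0, len(values))
    x_new = np.linspace(0.0, 1.0, n_points)
    return np.interp(x_new, x_old, values)
```

Resampling every period to the same length allows time courses of different durations to be averaged across days and animals.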

Determining temporal proximity between slow oscillations and sleep spindles (Figure 7)

SO to sleep spindle proximity was determined by measuring the temporal proximity between each sleep spindle peak and the preceding SO zero-crossing (positive to negative LFP). SO-nested spindles were defined as spindles occurring within 1 s of an SO zero-crossing and non-SO-nested spindles were defined as spindles occurring at least 5 s after an SO zero-crossing.
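
The classification of spindles by their delay from the preceding SO zero-crossing can be written compactly, as in the following Python sketch. The function name and label strings are hypothetical; spindles between 1 s and 5 s after an SO, and spindles with no preceding SO in the recording, are left unclassified here, which is an assumption about cases the text does not address.

```python
import numpy as np

def classify_spindles(spindle_peaks, so_crossings, nested_max=1.0,
                      isolated_min=5.0):
    """Label each spindle by its delay (s) from the preceding SO zero-crossing."""
    spindle_peaks = np.asarray(spindle_peaks, dtype=float)
    so = np.sort(np.asarray(so_crossings, dtype=float))
    # Index of the last SO crossing at or before each spindle peak (-1 if none).
    idx = np.searchsorted(so, spindle_peaks, side="right") - 1
    delay = np.full(spindle_peaks.shape, np.nan)
    has_prev = idx >= 0
    delay[has_prev] = spindle_peaks[has_prev] - so[idx[has_prev]]
    labels = np.full(spindle_peaks.shape, "unclassified", dtype=object)
    labels[delay <= nested_max] = "so_nested"
    labels[delay >= isolated_min] = "non_so_nested"
    return delay, labels
```

For example, with SO zero-crossings at 10 s and 20 s, a spindle peaking at 10.5 s would be labeled SO-nested and one peaking at 27 s would be labeled non-SO-nested.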

Measuring corticostriatal transmission strength dynamics around sleep spindles (Figure 7)

To determine corticostriatal transmission strength changes occurring around sleep spindles, we calculated cross-correlations of spiking activity binned at 1 ms in 30 s bins around every sleep spindle. These bins were placed −91 s to −61 s, −61 s to −31 s, −31 s to −1 s, 1 s to 30 s, 31 s to 61 s, and 61 s to 91 s around a spindle so as to avoid including spiking activity during the spindle itself. Corticostriatal transmission strength was calculated in each bin by averaging the short-latency cross-correlation values for each coupled M1 and DLS pair in each bin (1–15 ms time lag centered on DLS spiking, such that positive time lags corresponded to DLS spiking after M1 spiking). Corticostriatal transmission strength values around each spindle (six total bins) were then normalized by subtracting the value of the first bin (−91 s to −61 s). Changes in corticostriatal transmission strength were then compared between SO-nested and non-SO-nested spindles. The same normalization was used to compute changes in M1 and DLS firing rates, as well as sleep spindle probability, around each sleep spindle.
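
The peri-spindle binning and baseline subtraction can be illustrated with the short Python sketch below. The bin edges are copied from the text as written; the function names are hypothetical, and a simple firing-rate measure is used as the example metric in place of the short-latency cross-correlation, since the same normalization is described for both.

```python
import numpy as np

# Bin edges in seconds relative to the spindle, as listed in the text.
BINS_S = [(-91, -61), (-61, -31), (-31, -1), (1, 30), (31, 61), (61, 91)]

def firing_rate(spike_times, t0, t1):
    """Mean firing rate (Hz) of sorted spike_times within [t0, t1)."""
    n = np.searchsorted(spike_times, t1) - np.searchsorted(spike_times, t0)
    return n / (t1 - t0)

def peri_spindle_profile(spindle_time, metric, bins=BINS_S):
    """Evaluate metric(t0, t1) in each bin and subtract the first-bin value."""
    values = np.array([metric(spindle_time + a, spindle_time + b)
                       for a, b in bins], dtype=float)
    return values - values[0]
```

A call such as `peri_spindle_profile(t_spindle, lambda a, b: firing_rate(spikes, a, b))` returns six baseline-subtracted values per spindle, which could then be averaged separately for SO-nested and non-SO-nested spindles.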

Data availability

The data and corresponding code used for analyses are available on Dryad.

The following data sets were generated
    1. Lemke SM
    2. Ramanathan DS
    3. Darevksy D
    4. Egert D
    5. Berke JD
    6. Ganguly K
    (2021) Dryad Digital Repository
    Data from: Coupling between motor cortex and striatum increases during sleep over long-term skill learning.
    https://doi.org/10.7272/Q6KK9927

References

  1. Book
    1. Dudman JT
    2. Gerfen CR
    (2015) The basal ganglia
    In: Paxinos G, editors. Rat Nervous System. Elsevier. pp. 391–440.
    https://doi.org/10.1016/C2009-0-02419-2
  2. Thesis
    1. Lemke SM
    (2020)
    Learning in the Corticostriatal Network
    UC San Francisco Electronic Theses and Dissertations.

Decision letter

  1. Michael J Frank
    Senior Editor; Brown University, United States
  2. Aryn H Gittis
    Reviewing Editor; Carnegie Mellon University, United States
  3. Eric Yttri
    Reviewer; Janelia Farm, United States
  4. David Robbe
    Reviewer; INSERM U1249; Aix-Marseille University, France

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This work is a thought-provoking study of the interaction between sleep and corticostriatal plasticity. The reviewers agreed that there are many strengths of the study and it could spark new directions of research.

Decision letter after peer review:

Thank you for sending your article entitled "Sleep spindles coordinate corticostriatal reactivations during the emergence of automaticity" for peer review at eLife. Your article is being evaluated by 3 peer reviewers, and the evaluation is being overseen by a Reviewing Editor and Michael Frank as the Senior Editor.

Essential revisions:

The reviewers felt that the topic of the paper is very interesting and the study is original and creative. However, a number of issues were raised about clarity of the presentation, data analysis, and interpretation of the results. The concerns fall under the following categories:

1. Trajectory consistency and movement speed

Throughout the manuscript, the authors define automaticity as an increased consistency in reaching trajectory (abstract, line 23-26, line 86) but the main measurement used in figure 1 is the correlation of the velocity profile (which actually shows a strong increase in speed during learning). This is problematic for several reasons:

– An inattentive reader may think that the y-axis label "reach correlation" in panels e, f, and g of figure 1 is referring to the correlations of the trajectories shown in c.

– This choice raises the question of why the authors quantified behavior by looking at the consistency of the speed profile rather than directly through trajectories correlation. Looking at the traces of the trajectories in Figure 1C, the improvement in trajectory consistency is not clear (day 2 seems to be more consistent than day 8). If the improvement in trajectory consistency is less pronounced than the increase in fast speed consistency, the authors should make significant modifications in the way they present their behavioral data and acknowledge that their physiological changes could explain either trajectory consistency, increased (fast) speed consistency, increase speed or a mixture of the three.

– Increase in speed consistency does not necessarily imply an increase in speed. The striking increase in reaching speed with learning (Figure 1D) is just shown for one animal. The authors should show it for their 6 animals. If the 6 animals show an increase in movement speed this is a point that should be discussed throughout the manuscript, especially in light of the many works linking dorsal striatum and movement speed.

– In figure 1e, the authors showed in grey the individual speed profile correlation of 6 rats and the mean+sem. Something is quite wrong with the mean trace. In the first days, the mean should be much higher, close to 0.8. Currently, its first value is between the 5th and 6th values. Second, from a statistical point of view, using mean and SEM is meaningless when n=6. The median would make more sense and there is no need for error bars as the entire dataset is shown. Once the group representation is corrected, taking into account that the y-axis in e) is cut at 0.6, it will be clear that the increase in speed profile consistency is far from impressive.

– To demonstrate that the behavior is automatic, in the new pellet location task, the authors used trajectory correlation to quantify behavior, that is, a different metric than in figure 1. This inconsistency in behavioral metrics across two related figures (Figure 1 and Figure S1) raises the question of whether the lack of behavioral change shown in Figure S1 (which the authors use to claim automaticity) is robust when looking at speed (either absolute speed or consistency). Added to the fact that this experiment was only performed on two animals this part was really not convincing. Anyway, the authors should also examine whether movement speed is affected by the relocation of the pellet.

2. Validate LFP recordings

To study M1-DLS functional coupling, the authors, in some of their analyses, used striatal LFP to compute M1 and DLS coherency. When introducing their result section, the authors stated that "within the corticostriatal network, theta coherence (4-8Hz) has been previously shown to reflect coordinated population spiking activity8,9,34" (Line 128). The authors should mention recent works showing major volume conduction in the striatum in this frequency range (Lalla et al., 2017) and in the γ range (Carmichael et al., 2017), as the authors seem to be aware of this potential confound (in Lemke et al. NN, 2019, relevant works are cited). Sleep rhythms are well known to be controlled by the thalamocortical systems and the LFP sources to be in the cortex (Kandel and Buzsaki 1997). The lack of organization of the input on striatal neurons along with their radial somatodendritic shape makes the striatum a poor candidate to generate fields that can summate and be recorded extracellularly. Because the striatal recording sites in the present study are located just below the cortex it is very likely that most, if not all, of the striatal LFP is volume-conducted from neighboring cortical sources. The authors said they used common-average referencing to limit volume conduction but this will not fix the problem. Indeed, subtracting two oscillatory signals with the same phase and frequency, but different amplitude (as can happen in the striatum due to the passive attenuating effects of the brain tissue on LFPs) will result in an oscillatory signal with a preserved rhythmicity. The authors should look carefully at the work of Carmichael and collaborators (2017). In this study, the authors showed that striatal LFP amplitude slowly decreased as the striatal electrodes were further away from the cortex. Carmichael also showed that striatal spiking modulation by striatal LFPs is not a criterion for local generation of the field (all it shows is that striatal units are modulated by cortical rhythms which is expected as the cortex provides strong excitation to the striatal neurons). If the authors want to make a claim about striatal LFPs being local then they need to show that their striatal LFPs (referenced against their cerebellar screw) are not progressively decreasing away from the cortex. It is an important issue that cannot be ignored by the authors.

– Monosynaptically connected pairs and cross-correlograms (CCGs). The authors claim to identify monosynaptically connected neuron pairs from CCGs. Previous works (see for instance Bartho et al. 2004) have shown extremely sharp peaks in CCG of putatively connected neurons in the cortex and hippocampus but something is clearly different in the CCGs shown here. Indeed, it is striking that the CCG peaks shown are very smoothed (it is more a wave than a peak). In fact, this wave crosses the center of CCG and seems significant in the positive time bins, suggesting that striatal neurons fire before cortical neurons, which makes little sense. It is surprising that the authors did not mention in their result section the potential alternative mechanisms explaining such "peaks" that spread until positive CCG values. Indeed, it is well known that there are alternative explanations for these observations, such as indirect polysynaptic partners as well as common ("third party") input or slower co-modulation (see Brody CD (1999) Correlations without synchrony, and several papers by Asohan Amarasingham and colleagues). Amarasingham et al. (2012), J. Neurophysiology, talks about interval jitter, why it's relevant to separating fast from slow comodulation and explains the history of these problems. In this regard, the shuffling method briefly mentioned by the authors in the method section is unclear and does not seem to address properly the issue of separating fast and slow comodulation (see Amarasingham 2012).

– While M1 RM significance passed the arbitrary α of 0.05, it is a dramatically weaker effect than that seen in DLS. I would ask the authors to please comment on this difference. Additionally, both Δ and SO strongly modulate M1, and **to a greater extent than any of the other effects mentioned in the text!** Please point this out to your readers and offer an interpretation. Do these results argue that in M1 – unlike DLS – the spindles are not the main contributor? Or if nothing else, that the modulation is non-specific? (I think the former. This may actually help the authors in their attempts to dissociate M1 from DLS sleep processes). The spindle story is great, but these results change the interpretation and should not be buried.

3. Neural Trajectories

There were 3 major concerns raised 1)how the trajectories are composed, 2) the validity of the predicted trajectory and 3) the comparison of the trajectories:

1) The pc plots represent trial-averaged activity. Although no reach dynamics are given in this paper, a very similar study by the same authors shows that early on, the variability in duration of the outward, and outward+grasping, are very high compared to late in learning. Whatever 'reach signal' is present, this variability will cause the trial-averaged values to be greatly reduced. It is recommended that the authors address this by: (a) time-warping individual trials to overcome this and (b) doing this for only the outward trajectory, as their previous paper shows that so much of the change in duration due to learning is the result of improved grasping ( e.g the difference between PT and RO in Lemke 2019).

2) It is also unclear if the authors are trying to say in Figure 2 "Offline increases in functional connectivity predict the emergence of low-dimensional cross area neural dynamics during behavior". The authors should disambiguate cross area dynamics from within area dynamics. Previous work has shown that there is little to no goal-related movement activity early on in training. More importantly, the authors themselves show in their 2019 paper that although there is some Day 1 M1 spiking modulation related to the reach, the PSTH for striatum is flat (previous comments on variability of timing still apply). They also show some reach-aligned power increase in 4-8 Hz in M1, but nothing in striatum. Given the small dynamics(?) that might be in the striatum, it is actually impressive that there is as much similarity between DLS and predict as there is. Can the authors show that this is a lack of a cross-area relationship, not just a lack of signal – which would preclude even the possibility of that relationship. Potentially they could scale with the power of the signal – amplitude or % explained variability of the first PC's, or just plot avg dms modulation in b. Otherwise, the cross area dynamics argument is not compelling. This same problematic facet comes into play later when determining reach modulation (which was done across all days).

3) Finally, how do you quantify quality of prediction? "The predictive ability of these models was assessed by calculating the correlation between the actual neural trajectories and the predicted trajectories." This explanation lacks vital details about the comparison and the model itself. Moreover, these are time-varying 2-D vectors, where one offset or delay early on can propagate throughout the trajectory, even though the trajectories would otherwise be identical. Much work has been done on this in recent years, as this is a difficult question to tackle. Euclidean distances of a time-warped distribution or point distribution models might help. Regardless, more detail is needed.

4. Clarify data presentation:

– Figure 2c – The authors show an interesting temporal dynamic here, and there are hints that online DEcorrelation may contribute to learning (for which there is some evidence) as well as offline increases in coherence. This may be just a happenstance of this specific example, but 2c makes it clear that showing the trends across time, rather than in sum as in Figure 2d, would be helpful. Could the authors please add a figure like 2c, but across all 6 animals? It would be nice to see this possibility of online changes at least briefly discussed as well.

– Fig4 sup2, etc – The cited conduction delay between m1 and DS is very specific: 5-7ms (and with a decrease beyond baseline in the 1-3ms range). The authors should redo their analysis using 1-10ms to 5-10ms, or provide a clearly compelling rationale for including 1ms latencies.

Moreover, it is unclear why the x scale is +/- 75ms. This seems quite large and belies the bigger question of why the distributions are so large, especially for data that should be normalized (subtracting out baseline correlations, and thus should be sharper). The largely symmetric and broad increase, extending before 0 lag, is worrisome. If it is a function of sleep vs wake, can connectivity be determined only during wake?

AP5 effects:

– The presentation and description of the results are confusing and incomplete. The experimental design could not be interpreted by reading the result section or looking at the figure 1 (panel f and g). In the method section, the authors wrote that " In the first five days of training, we infused three rats with AP5 and three rats with saline, for the second five days, we switched the infusion, i.e., animals that received AP5 in the first five days, received saline for the second five days, and vice-versa". First, this sentence should be included in the main text. Second, why did the authors show only one animal in figure 1f ? they should show the six rats.

– I could not make sense of the analytical logic in panel g. The authors correlated the trial by trial speed profiles during training with the average speed profile at the end of training, saline, or AP5 sessions. During "control" training (saline and learning cohorts), as the behavior changes, there should be lower correlation at the beginning of training and higher correlation at the end. In addition when the speed profile does not change (early AP5 or late learning cohort) there could be higher correlations. It is not clear to me how this measurement can be useful as it may vary a lot during learning and could also be different depending on the type of behavioral plateau.

– Also it is unclear whether the effect of AP5 on day-to-day correlation should be the same when injected from day 1 or after 5 days of practice under the saline condition. The data shown for one animal seem to show a clear decrease in speed profile correlation during AP5. What do the authors make of such an effect? It seems to indicate that AP5 not only blocks automatization but also impairs performance. Is a similar effect observed when AP5 is given after 5 days of saline? The effect of AP5 on movement speed should be shown.

– Histology is absent. At a minimum, ascertaining how well layer 5 was targeted across animals should be included.

– Reviewers were confused by sleep nomenclature:

Figure 2a – it was not obvious that sleep 3 = next day's sleep 1. can you include something to the effect of "sleep3 = day n+1's sleep 1" just to help make it perfectly clear to the readers?

Another commented that the "pre-sleep" and "post-sleep" nomenclature is very confusing. These aren't used to define time periods before and after a sleeping episode, but instead are periods of "sleep" (although there can be wake states mixed in) before and after training. "Pre-training sleep" and "post-training sleep" would be much clearer.

5. Reach modulation across days

– Reach Modulation appears to be determined across all days? This assumes that each unit is maintained across 8 days- evidence supporting this needs to be provided. More importantly though, the potential for this to change the DLS results is remarkable. As previously stated, the authors showed very little reach related activity on day 1. Why not determine these values for day 1-2 vs day 7-8? or better yet, as somewhat suggested by the authors, can they predict what neurons will become RM over time?

– It seems that the day-to-day correlation measurement allows the authors to have more data points per rat. However, these measurements for a given animal are not independent. Unfortunately, pseudoreplication is a major issue in neuroscience and behavioral studies more broadly (S.E Lazic's "The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?" BMC Neuroscience, 2010, highlights some things to worry about in this regard). The authors should avoid this in this figure and other neurophysiological analyses.

6. NREM sleep

– It would be interesting to know how many times the animals entered NREM sleep between the post-training sleep of day N and the pre-sleep of subsequent day N+1. Are these between-day changes dependent upon the total amount of sleep that the animals got?

– pg6 line 192. In order to ascertain that NREM > others, it would be useful to see this relationship in unconnected pairs. e.g. control for the possibility that NREM is always highest.

– Since all of the data is from a specific subsection of these offline states (i.e. – NREM sleep) it would be good to discuss their findings within the context of what is known/unknown about plasticity processes during offline wake states or other sleep periods.

7. Topics to clarify/discuss

– Page 7 Line 234 – "In contrast, DLS units did not increase in modulation during either δ waves (Supplemental Figure 5) or slow oscillations (Supplemental Figure 6) after training." This statement is only partially correct. I applaud the authors giving, and showing, the same treatment for all 3 types of data. Many would gloss over this. DLS nonRM weak input demonstrated a stronger effect than M1 RM, and SO demonstrated a trend. Some will find this juxtaposition – task related-spindle, unrelated-other to be informative, particularly as information itself can be more strongly represented through increasing the signal or increasing the discernability from not-signal. There is considerable work, including from the authors, showing the importance of other elements of sleep. Throughout the paper, SO and δ aren't negligible contributors. Rather than glaze over this for the purpose of a throughline, please expand upon this for a more complete view.

– The discussion is somewhat lacking, being just a summary of the findings. In particular, I would like to see the authors put these results into context with the broader models of Tononi (SHY) and their own work, e.g. Kim 2019 wherein they focus on SO and δ in a very similar task – but which are largely thrown under the bus here.

– It is unclear why "automaticity" is a beneficial outcome here for this task. Does it result in greater pellet retrieval success (no data presented on this point) or does it free up or allow reorienting of cognitive resources (this is indirectly alluded to in the Discussion)?

– The argument structure in the Intro has a few holes in it.

a) Specifically the paragraph covering lines 42-53. "Currently, our understanding of how sleep impacts distributed brain networks is largely derived from the systems consolidation theory, where it has been shown that coordinated activity patterns across hippocampus and cortex lead to the formation of stable long-term memories in cortex that do not require the hippocampus 23-25. Notably, whether sleep impacts the connectivity across hippocampus and cortex has not been established. Therefore, one possibility is that, in the network, we similarly observe coordinated cross-area activity patterns during sleep but do not find evidence for the modification of corticostriatal connectivity during offline periods. Alternatively, it is possible that we find evidence that cross-area activity patterns during sleep modify the connectivity between cortex and striatum and impact network activity during subsequent behavior."

b) The first two sentences describe the role of sleep in systems consolidation theory and "coordinated activity patterns" and "connectivity" across hippocampus and cortex. The third and fourth sentences then jump to two alternative speculations about "the network" (which is undefined), and the role of sleep in modifying corticostriatal connectivity patterns. It is not clear how these two things are linked. This link should be explicitly stated.

– Offline is used in two different ways here. Offline behavioral states include any sleeping or waking state outside of training. Then there are online (between pre- and post-sleep on the same day) and offline (between post-sleep of day N and pre-sleep of subsequent day N+1) changes in neural activity between these offline behavioral states.

– The authors need to better define what they mean by slow oscillations (Figure 7, Figure 4e). Cortical slow oscillations are classically described as regular up and down states transition, with sleep spindles occurring during the up states (Buzsaki, rhythms of the brain p 197). When the authors look at the spiking modulation by slow oscillations and spindles (Figure 4e) or the temporal proximity between slow oscillations and spindles (Figure 7), I doubt the authors are referring to this classical up/down slow oscillations. Are the slow oscillations the authors refer to equivalent to K complexes? If yes, this could be mentioned.

Reviewer #1:

Lemke and co-authors present a thorough interrogation of the role of sleep in motor memory consolidation and performance, a clear continuation of the lab's focus. Multiple techniques are used to arrive at the conclusions, which are largely robust and arrived at through sound scientific methods. It is a rather beefy and ambitious paper, and for that and the polish, the authors should certainly be commended. Based on the findings and the quality of the work, I am eager to see the work brought to publication but some critical points must be addressed before continuing the discussion.

Figure 2c – The authors show an interesting temporal dynamic here, and there are hints that online DEcorrelation may contribute to learning (for which there is some evidence) as well as offline increases in coherence. This may be just a happenstance of this specific example, but 2c makes it clear that showing the trends across time, rather than in sum as in Figure 2d, would be helpful. Could the authors please add a figure like 2c, but across all 6 animals? It would be nice to see this possibility of online changes at least briefly discussed as well.

Figure 3 – I appreciate the authors' attempts here. Previously they showed an increase in within-area spiking and 4-8hz dynamics as learning progressed, so this work is a natural progression. However, I have three major reservations concerning (1)how the trajectories are composed, (2) the validity of the predicted trajectory and (3) the comparison of the trajectories:

1) The pc plots represent trial-averaged activity. Although no reach dynamics are given in this paper, a very similar study by the same authors shows that early on, the variability in duration of the outward, and outward+grasping, are very high compared to late in learning. Whatever 'reach signal' is present, this variability will cause the trial-averaged values to be greatly reduced. I would recommend a) time-warping individual trials to overcome this and b) doing this for only the outward trajectory, as their previous paper shows that so much of the change in duration due to learning is the result of improved grasping ( e.g the difference between PT and RO in Lemke 2019). If the authors choose to include full trajectories, or better yet, outward and inward, this would be fine as well, but as the limb trajectories are only outward, I would compare apples to apples.

2) It is also unclear if the authors are trying to say in Figure 2 / "Offline increases in functional connectivity predict the emergence of low-dimensional cross area neural dynamics during behavior". In particular, I would disambiguate cross area dynamics from within area dynamics. David Robbe and others have shown data suggesting that there is little to no goal-related movement activity early on in training. More importantly, the authors themselves show in their 2019 paper that although there is some Day 1 M1 spiking modulation related to the reach, the PSTH for striatum is flat (previous comments on variability of timing still apply). They also show some reach-aligned power increase in 4-8 Hz in M1, but nothing in striatum. Given the small dynamics(?) that might be in the striatum, I'm actually impressed that there is as much similarity between DLS and predict as there is. Thus, I ask the authors to show that this is a lack of a cross-area relationship, not just a lack of signal – which would preclude even the possibility of that relationship. Potentially they could scale with the power of the signal – amplitude or % explained variability of the first PC's, or just plot avg dms modulation in b. Otherwise, I don't quite buy the cross area dynamics argument. This same problematic facet comes into play later when determining reach modulation (which was done across all days).

3) Finally, how do you quantify quality of prediction? "The predictive ability of these models was assessed by calculating the correlation between the actual neural trajectories and the predicted trajectories." This explanation lacks vital details about the comparison and the model itself. Moreover, these are time-varying 2-D vectors, where one offset or delay early on can propagate throughout the trajectory, even though the trajectories would otherwise be identical. Much work has been done on this in recent years, as this is a difficult question to tackle. Euclidean distances of a time-warped distribution or point distribution models might help. Regardless, more detail is needed.

Figure 4 sup2, etc – The cited conduction delay between m1 and DS is very specific: 5-7ms (and with a decrease beyond baseline in the 1-3ms range). I would ask the authors to redo their analysis using 1-10ms to 5-10ms, or provide a clearly compelling rationale for including 1ms latencies.

Moreover, it is unclear to me why the x scale is +/- 75ms. This seems quite large and points to the bigger question of why the distributions are so broad, especially for data that should be normalized (subtracting out baseline correlations, and thus should be sharper). The largely symmetric and broad increase, extending before 0 lag, is worrisome. I would expect even the average data to look more like 6c right, or highlighted in the differences between sup3c top/bottom.

If it is a function of sleep vs wake, can connectivity be determined only during wake?

Figure 4 – I would appreciate a differentiation of SPNs / FSIs, even just by baseline firing rate. Although I assume most of the striatal neurons are SPNs, it would be good to know a) if the ratio was above chance and b) if the findings generalized to both populations.

pg6 line 192. In order to ascertain that NREM > others, it would be useful to see this relationship in unconnected pairs. e.g. control for the possibility that NREM is always highest.

Reach Modulation appears to be determined across all days? This assumes that each unit is maintained across 8 days (I won't make a stink about this, but if this assertion is to be made, it should be substantiated). More importantly though, the potential for this to change the DLS results is remarkable. As previously stated, the authors showed very little reach related activity on day 1. Why not determine these values for day 1-2 vs day 7-8? Or better yet, as somewhat suggested by the authors, can they predict what neurons will become RM over time?

While M1 RM significance passed the arbitrary α of 0.05, it is a dramatically weaker effect than that seen in DLS. I would ask the authors to please comment on this difference. Additionally, both δ and SO strongly modulate M1, and to a greater extent than any of the other effects mentioned in the text! Please point this out to your readers and offer an interpretation. Do these results argue that in M1, unlike DLS, the spindles are not the main contributor? Or if nothing else, that the modulation is non-specific? (I think the former. This may actually help the authors in their attempts to dissociate M1 from DLS sleep processes.) The spindle story is great, but these results change the interpretation and should not be buried.

Page 7 Line 234 – "In contrast, DLS units did not increase in modulation during either δ waves (Supplemental Figure 5) or slow oscillations (Supplemental Figure 6) after training." This statement is only partially correct. I applaud the authors giving, and showing, the same treatment for all 3 types of data. Many would gloss over this. DLS nonRM weak input demonstrated a stronger effect than M1 RM, and SO demonstrated a trend. Some will find this juxtaposition – task related-spindle, unrelated-other to be informative, particularly as information itself can be more strongly represented through increasing the signal or increasing the discernability from not-signal. There is considerable work, including from the authors, showing the importance of other elements of sleep. Throughout the paper, SO and δ aren't negligible contributors. Rather than gloss over this for the purpose of a throughline, please expand upon this for a more complete view.

The discussion is somewhat lacking, being just a summary of the findings. In particular, I would like to see the authors put these results into context with the broader models of Tononi (SHY) and their own work, e.g. Kim 2019 wherein they focus on SO and δ in a very similar task – but which are largely thrown under the bus here.

Methods – thank you for indicating the data for which each method applies. It really helps the reader.

Histology is absent, but I do not see this as an absolutely critical element. I will say that ascertaining how well layer 5 was targeted across animals would be very helpful.

Reviewer #2:

This is a well written, novel, technically sound and timely paper. Addressing the following issues could strengthen it further.

It is unclear why "automaticity" is a beneficial outcome here for this task. Does it result in greater pellet retrieval success (no data presented on this point) or does it free up or allow reorienting of cognitive resources (this is indirectly alluded to in the Discussion)?

The argument structure in the Intro has a few holes in it. Specifically the paragraph covering lines 42-53.

"Currently, our understanding of how sleep impacts distributed brain networks is largely derived from the systems consolidation theory, where it has been shown that coordinated activity patterns across hippocampus and cortex lead to the formation of stable long-term memories in cortex that do not require the hippocampus 23-25. Notably, whether sleep impacts the connectivity across hippocampus and cortex has not been established. Therefore, one possibility is that, in the network, we similarly observe coordinated cross-area activity patterns during sleep but do not find evidence for the modification of corticostriatal connectivity during offline periods. Alternatively, it is possible that we find evidence that cross-area activity patterns during sleep modify the connectivity between cortex and striatum and impact network activity during subsequent behavior."

The first two sentences describe the role of sleep in systems consolidation theory and "coordinated activity patterns" and "connectivity" across hippocampus and cortex. The third and fourth sentences then jump to two alternative speculations about "the network" (which is undefined), and the role of sleep in modifying corticostriatal connectivity patterns. It is not clear how these two things are linked. This link should be explicitly stated.

I found the "pre-sleep" and "post-sleep" nomenclature very confusing. These aren't used to define time periods before and after a sleeping episode, but instead are periods of "sleep" (although there can be wake states mixed in) before and after training. "Pre-training sleep" and "post-training sleep" would be much more clear.

Offline is used in two different ways here. Offline behavioral states include any sleeping or waking state outside of training. Then there are online (between pre- and post-sleep on the same day) and offline (between post-sleep of day N and pre-sleep of subsequent day N+1) changes in neural activity between these offline behavioral states.

It would be interesting to know how many times the animals entered NREM sleep between the post-training sleep of day N and the pre-sleep of the subsequent day N+1. Are these between-day changes dependent upon the total amount of sleep that the animals got?

Also, since all of their data is from a specific subsection of these offline states (i.e. – NREM sleep) it would be good to discuss their findings within the context of what is known/unknown about plasticity processes during offline wake states or other sleep periods.

Reviewer #3:

In this manuscript, Lemke and collaborators examined whether offline changes in the functional coupling between the primary motor cortex (M1) and the dorsolateral striatum (DLS) contribute to the automatization of reaching movements in rats. The authors performed perturbation of striatal activity and simultaneous multi-unit/LFP recordings in M1 and DLS during sleep/rest recording sessions before and after reaching sessions. The authors report behavioral impairment following blocking of striatal NMDA receptors during offline periods. They also show the results of analyses congruent with the idea that there is an increased functional coupling between M1 and DLS observed during sleep, which parallels the increase in automaticity. They also point at sleep spindles as a critical period of corticostriatal plasticity.

A majority of studies have examined corticostriatal activity during behavior and the focus of this study on what's going during sleep/rest periods is very interesting, especially taking into account the effect of sleep on consolidation of motor skills. The electrophysiological recordings performed are extremely challenging experiments and consequently, the data generated are extremely rich. Still, large-scale unit activity and sleep-related LFPs rhythms are notoriously tricky to analyze, especially in such different structures as M1 and DLS. In this context, I found that some of the main claims were a bit hasty considering the inherent limitations of the analyses performed (extracellular recordings are blind to many of the underlying mechanisms and multiple confounds that could account for increased coordination between LFPs and spiking patterns across brain regions). In addition, the behavioral analyses performed were statistically problematic and did not address the potential interaction/confound between automaticity and speed/vigor, which is relevant to the corticostriatal function. In conclusion, IMO, this is an impressive experimental work with potentially interesting results on a topic that has not been very well studied, but the manuscript should be significantly improved in several key points.

1. Trajectory consistency and movement speed

Throughout the manuscript, the authors define automaticity as an increased consistency in reaching trajectory (abstract, line 23-26, line 86) but the main measurement used in figure 1 is the correlation of the velocity profile (which actually shows a strong increase in speed during learning). This is problematic for several reasons:

– An inattentive reader may think that the y-axis label "reach correlation" in panels e, f, and g of figure 1 is referring to the correlations of the trajectories shown in c.

– This choice raises the question of why the authors quantified behavior by looking at the consistency of the speed profile rather than directly through trajectory correlation. Looking at the traces of the trajectories in Figure 1C, the improvement in trajectory consistency is not clear (day 2 seems to be more consistent than day 8). If the improvement in trajectory consistency is less pronounced than the increase in fast speed consistency, the authors should make significant modifications in the way they present their behavioral data and acknowledge that their physiological changes could explain either trajectory consistency, increased (fast) speed consistency, increased speed or a mixture of the three.

– Increase in speed consistency does not necessarily imply an increase in speed. The striking increase in reaching speed with learning (Figure 1D) is just shown for one animal. The authors should show it for their 6 animals. If the 6 animals show an increase in movement speed this is a point that should be discussed throughout the manuscript, especially in light of the many works linking dorsal striatum and movement speed.

– In figure 1e, the authors showed in grey the individual speed profile correlation of 6 rats and the mean+sem. Something is quite wrong with the mean trace. In the first days, the mean should be much higher, close to 0.8. Currently, its first value is between the 5th and 6th values. Second, from a statistical point of view, using mean and SEM is meaningless when n=6. The median would make more sense and there is no need for error bars as the entire dataset is shown. Once the group representation is corrected, taking into account that the y-axis in e) is cut at 0.6, it will be clear that the increase in speed profile consistency is far from impressive.

– To demonstrate that the behavior is automatic, in the new pellet location task, the authors used trajectory correlation to quantify behavior, that is, a different metric than in figure 1. This inconsistency in behavioral metrics across two related figures (Figure 1 and Figure S1) raises the question of whether the lack of behavioral change shown in Figure S1 (which the authors use to claim automaticity) is robust when looking at speed (either absolute speed or consistency). Added to the fact that this experiment was only performed on two animals this part was really not convincing. Anyway, the authors should also examine whether movement speed is affected by the relocation of the pellet.

2. AP5 effects.

– The presentation and description of the results are confusing and incomplete. I could not understand the experimental design by reading the Results section or looking at figure 1 (panels f and g). In the Methods section, the authors wrote that "In the first five days of training, we infused three rats with AP5 and three rats with saline, for the second five days, we switched the infusion, i.e., animals that received AP5 in the first five days, received saline for the second five days, and vice-versa". First, this sentence should be included in the main text. Second, why did the authors show only one animal in figure 1f? They should show the six rats.

– I could not make sense of the analytical logic in panel g. The authors correlated the trial by trial speed profiles during training with the average speed profile at the end of training, saline, or AP5 sessions. During "control" training (saline and learning cohorts), as the behavior changes, there should be lower correlation at the beginning of training and higher correlation at the end. In addition when the speed profile does not change (early AP5 or late learning cohort) there could be higher correlations. It is not clear to me how this measurement can be useful as it may vary a lot during learning and could also be different depending on the type of behavioral plateau.

– Also it is unclear whether the effect of AP5 on day-to-day correlation should be the same when injected from day 1 or after 5 days of practice under the saline condition. The data shown for one animal seem to show a clear decrease in speed profile correlation during AP5. What do the authors make of such an effect? It seems to indicate that AP5 not only blocks automatization but also impairs performance. Is a similar effect observed when AP5 is given after 5 days of saline? The effect of AP5 on movement speed should be shown.

– It seems that the day-to-day correlation measurement allows the authors to have more data points per rat. However, these measurements for a given animal are not independent. Unfortunately, pseudoreplication is a major issue in neuroscience and behavioral studies more broadly (S.E Lazic's "The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?" BMC Neuroscience, 2010, highlights some things to worry about in this regard). The authors should avoid this in this figure and other neurophysiological analyses.
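The pseudoreplication concern can be made concrete by collapsing repeated measurements to one value per animal before any group statistic is computed. The sketch below is illustrative only: the numbers are synthetic placeholders, not data from the study, and `per_animal_summary` is a hypothetical helper name.

```python
from statistics import mean, median

def per_animal_summary(values_by_animal):
    """Collapse repeated measurements to one value per animal (the independent unit)."""
    return {animal: mean(vals) for animal, vals in values_by_animal.items()}

# Synthetic day-to-day correlation values, four per rat (placeholders only).
day_to_day = {
    "rat1": [0.50, 0.75, 1.00, 0.75],
    "rat2": [0.25, 0.50, 0.50, 0.75],
    "rat3": [0.50, 0.50, 0.75, 0.25],
}

summary = per_animal_summary(day_to_day)
n_measurements = sum(len(v) for v in day_to_day.values())  # 12 non-independent points
n_independent = len(summary)                               # 3 animals
group_median = median(summary.values())
```

Any test run on `summary` then has a sample size equal to the number of animals rather than the number of day-to-day comparisons.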

3. Striatal LFPs. To study M1-DLS functional coupling, the authors, in some of their analyses, used striatal LFP to compute M1 and DLS coherency. When introducing their Results section, the authors stated that "within the corticostriatal network, theta coherence (4-8Hz) has been previously shown to reflect coordinated population spiking activity 8,9,34" (Line 128). I find it a bit unfair of the authors not to mention recent works showing major volume conduction in the striatum in this frequency range (Lalla et al., 2017) and in the γ range (Carmichael et al., 2017), as the authors seem to be aware of this potential confound (in Lemke et al. NN, 2019, relevant works are cited). Sleep rhythms are well known to be controlled by the thalamocortical systems and the LFP sources to be in the cortex (Kandel and Buzsaki 1997). The lack of organization of the input on striatal neurons along with their radial somatodendritic shape makes the striatum a poor candidate to generate fields that can summate and be recorded extracellularly. Because the striatal recording sites in the present study are located just below the cortex it is very likely that most, if not all, of the striatal LFP is volume-conducted from neighboring cortical sources. The authors said they used common-average referencing to limit volume conduction but this will not fix the problem. Indeed, subtracting two oscillatory signals with the same phase and frequency, but different amplitude (as can happen in the striatum due to the passive attenuating effects of the brain tissue on LFPs) will result in an oscillatory signal with a preserved rhythmicity.

The authors should really look carefully at the work of Carmichael and collaborators (2017). In this study, the authors showed that striatal LFP amplitude slowly decreased as the striatal electrodes were further away from the cortex. Carmichael also showed that striatal spiking modulation by striatal LFPs is not a criterion for local generation of the field (all it shows is that striatal units are modulated by cortical rhythms which is expected as the cortex provides strong excitation to the striatal neurons). If the authors want to make a claim about striatal LFPs being local then they need to show that their striatal LFPs (referenced against their cerebral screw) are not progressively decreasing away from the cortex. It is an important issue that cannot be ignored by the authors.
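The argument that common-average referencing preserves a volume-conducted rhythm can be checked numerically. The sketch below is a toy model under stated assumptions, not a reanalysis: a single 6 Hz source, a 1 kHz sampling rate, and four channels that differ only in an assumed passive attenuation gain. All names and values are hypothetical.

```python
import math

def common_average_reference(channels):
    """Subtract the across-channel mean from every channel, sample by sample."""
    n = len(channels)
    avg = [sum(samples) / n for samples in zip(*channels)]
    return [[x - a for x, a in zip(ch, avg)] for ch in channels]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

fs = 1000                      # assumed sampling rate in Hz
source = [math.sin(2 * math.pi * 6 * t / fs) for t in range(fs)]  # one cortical source
gains = [1.0, 0.8, 0.6, 0.4]   # assumed attenuation with distance from the source
channels = [[g * s for s in source] for g in gains]

rereferenced = common_average_reference(channels)
corrs = [pearson(ch, source) for ch in rereferenced]
```

Each re-referenced channel is the source scaled by (gain minus mean gain), so its correlation with the source stays at +1 or -1: the rhythm survives and only its amplitude and sign change.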

4. Monosynaptically connected pairs and cross-correlograms (CCGs). The authors claim to identify monosynaptically connected neuron pairs from CCGs. Previous works (see for instance Bartho et al. 2004) have shown extremely sharp peaks in CCG of putatively connected neurons in the cortex and hippocampus but something is clearly different in the CCGs shown here. Indeed, it is striking that the CCG peaks shown are very smoothed (it is more a wave than a peak). In fact, this wave crosses the center of CCG and seems significant in the positive time bins, suggesting that striatal neurons fire before cortical neurons, which makes little sense. I am surprised that the authors did not mention in their result section the potential alternative mechanisms explaining such "peaks" that spread until positive CCG values. Indeed, it is well known that there are alternative explanations for these observations, such as indirect polysynaptic partners as well as common ("third party") input or slower co-modulation (see Brody CD (1999) Correlations without synchrony, and several papers by Asohan Amarasingham and colleagues). Amarasingham et al. (2012), J. Neurophysiology, talks about interval jitter, why it's relevant to separating fast from slow comodulation and explains the history of these problems. In this regard, the shuffling method briefly mentioned by the authors in the method section is unclear and does not seem to address properly the issue of separating fast and slow comodulation (see Amarasingham 2012).
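The interval-jitter idea referenced above can be sketched as follows. This is a simplified illustration of the general technique, not the authors' shuffling procedure and not a full implementation of the cited methods. The spike trains are synthetic, the 20 ms jitter window is an assumption, and the 5-7 ms lag range is taken from the conduction delay mentioned earlier in this review.

```python
import math
import random

def interval_jitter(spike_times, width, rng):
    """Redraw each spike uniformly within its own fixed window of length `width`."""
    return sorted(math.floor(t / width) * width + rng.random() * width
                  for t in spike_times)

def count_lagged(pre, post, lo, hi):
    """Count spike pairs whose lag (post - pre) falls in [lo, hi]."""
    return sum(1 for a in pre for b in post if lo <= b - a <= hi)

def jitter_p_value(pre, post, lo, hi, width, n_surrogates, rng):
    """Fraction of jittered surrogates with at least as many short-lag pairs."""
    observed = count_lagged(pre, post, lo, hi)
    exceed = sum(
        count_lagged(interval_jitter(pre, width, rng), post, lo, hi) >= observed
        for _ in range(n_surrogates)
    )
    return (exceed + 1) / (n_surrogates + 1)

# Synthetic pair (times in ms): every "DLS" spike follows an "M1" spike by 6 ms.
rng = random.Random(0)
m1 = sorted(rng.uniform(0, 10_000) for _ in range(100))
dls = [t + 6.0 for t in m1]
p = jitter_p_value(m1, dls, 5.0, 7.0, 20.0, 100, rng)
```

Because jittering keeps each spike inside its window, co-modulation slower than the window is preserved in the surrogates, so a small p-value points to timing structure finer than the window. A broad peak that survives jittering would instead be consistent with slow co-modulation or common input.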

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Coupling between motor cortex and striatum increases during sleep over long-term skill learning" for further consideration by eLife. Your revised article has been evaluated by Michael Frank (Senior Editor) and a Reviewing Editor.

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

1. A graph showing the effects of AP5 on speed, for all 6 rats, should be shown in the main figure in a single plot with a fixed scale. All of Reviewer #3's requests for this figure should be included, including statistics.

2. This figure is requested to address potential confounds of effects on vigor that are currently interpreted as effects on motor learning. Depending on how conclusive the effects of AP5 on speed are, the authors might need to revise their language throughout the paper, if an effect of movement vigor cannot be ruled out.

3. A paragraph should be added to the Discussion explaining the potential confounds of effect on movement vigor on the study's results.

Although these points came from Reviewer 3, in consultation the other Reviewers agreed these points above are important to address.

Reviewer #1:

The reviewed document is much more coherent, with very clear and intuitive figures in addition to a high degree of rigor. I have nothing to add at this time.

Reviewer #2:

Comments have been satisfactorily addressed.

Reviewer #3:

The authors have made a significant effort to reply to most of my comments and those of the other reviewers. However one of the major points that I had asked has not been addressed and another related one is not addressed properly. Those points were critical for the authors claim that "our results provide evidence that sleep shapes cross-area coupling required for skill learning" (last sentence abstract, maybe a "is" is missing?).

Specifically I asked the authors to report the effect of AP5 injection on movement speed. They responded that they provide the effect of AP5 on the speed profile correlation but clearly this is not what I asked. I don't think the authors misunderstood me because in reply to another comment about movement speed (my original comment :"-Increase in speed consistency does not necessarily imply an increase in speed. The striking increase in reaching speed with learning (Figure 1D) is just shown for one animal. The authors should show it for their 6 animals. If the 6 animals show an increase in movement speed this is a point that should be discussed throughout the manuscript, especially in light of the many works linking dorsal striatum and movement speed"), the authors did provide in their supplementary figure 3, plots showing that movement speed increases in the 6 animals during training. Thus it is unclear why the authors did not show the effect of AP5 on movement speed.

My request is really fundamental with regard to the main claim of the authors (requirement for skill learning) because it is possible that AP5 decreases not just the trial-by-trial speed profile correlation (what the authors use as a definition for skill) but also the general speed of movements which is not necessarily related to skill learning but could have a motivational origin (see work of the Galea lab). If this was the case, the authors would need to seriously consider that the changes in corticostriatal connectivity could primarily reflect altered vigor or motor motivation which would be in agreement with several works in the field (Rob Turner lab, Dudman lab, Robbe lab, or even ideas on motor motivation by Josh Berke, one of the authors of this study).

This point is even more important as the authors, in the introduction or discussion, tend to write definitive sentences on a well-demonstrated role of cortico-striatal connection in motor skills while most of the references cited do not disambiguate the vigor or motivation confound (e.g., Kupferschmidt et al., Dang et al., Costa et al., Yin et al. …). This is quite misleading.

Moreover, when looking at the result of the AP5 experiment on the speed profile correlation across the 6 animals (supp Figure 4), the results are far from convincing. There are only 3 animals in each condition (3 in AP5-Saline and 3 in Saline-AP5) and in each of the conditions, the result is not clear in at least one animal. There are also weird day by day drastic changes that make these results not so reliable. Moreover, the authors keep changing the y axis scale in a way that makes small improvements look big. Ethically speaking this is not appropriate, especially in such a journal as eLife. Neither is it appropriate to put in the main figures 1d and 1e the best example out of 6 animals. The authors should integrate into these panels the results of the other animals. In addition to being fairer to the data, this would remove two supplementary figures.

Thus, the main figure 1 should show 1) the effect of AP5 on speed profile correlation on the 6 animals in the same graph with the same scale (Figure 1e); 2) the effect of AP5 on max speed on the 6 animals (new small panel) and the fig1d for all the animals. In addition, the statistics to examine the effect of AP5 should be done using paired comparison of the correlation and speed values, not on their change. The usage of the changes in panel 1f is unfair because it is not clear what should be the effect of AP5 after saline (decrease or plateau?). I am afraid that the authors designed their statistical test in a way that favored their hypothesis.

In conclusion, while I do think this paper should be published in eLife, I am convinced that a fairer presentation and analyses of key experiments in figure 1 are required. Clearly, the effect of AP5 on skill learning is not as clear-cut as stated by the authors (and choosing to only show in the main figure the single best animal is not appropriate). In addition, an effect on movement speed is still highly likely (based on one animal) and should be reported for the 6 animals. Thus the main behavioral function of the interesting neurophysiological changes could be related to vigor/motor motivation, not skill learning. All this critical information should not be hidden but rather openly disclosed and discussed so that readers can make up their own minds.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Coupling between motor cortex and striatum increases during sleep over long-term skill learning" for further consideration by eLife. Your revised article has been evaluated by Michael Frank (Senior Editor) and a Reviewing Editor.

Reviewer #3:

I asked the authors to show the data for the 6 animals in which they compared AP5 and Saline striatal injection on the speed correlation profile. The authors have managed to plot the data in a way in which it is impossible to know which rat is which (all the points are gray, no running lines between the points). They plotted the mean of these data which is meaningless from a statistical viewpoint (n=6).

I also asked the authors to show on the same main figure the effect of AP5 on the max speed (to test the effect on vigor). The authors have put those results in a supplementary figure. The authors claim that AP5 has an effect only on the speed correlation profile, not on max speed. But this is clearly due to their biased statistic in which they only look at the difference between the first and last sessions. Indeed in figure 1e AP5 only reduces the speed correlation on day 5 but there is no effect from day 1 to day 4. Juxtaposing the effect of AP5/Saline on max speed (now Figure 1 S5) and speed correlation (figure 1e bottom) clearly shows that the effects are strikingly similar. In fact, an unbiased treatment of the data (using the entire profile and permutation/bootstrapping) would probably show that the effect of AP5 on max speed is more pronounced than on speed correlation.

The authors seem to conclude that the AP5 effect is different on speed profile correlation and max speed because their statistical comparison with saline is significant in one case and insignificant in the other case.

However, comparing p values is meaningless statistically. This is an important issue that has been subject to publication in highly visible journals (https://www.nature.com/articles/nn.2886).

The authors should have compared the effect themselves which again should be done using permutation/bootstrapping on the entire profile (not last/first).
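Comparing the two effects directly, rather than their p-values, could look like the sketch below. It is one possible approach under stated assumptions, not the analysis the reviewer or authors ran: each rat contributes a whole-profile AP5-minus-saline effect for each metric, both metrics are assumed to be on a comparable (for example z-scored) scale, and the test is an exact sign-flip permutation across animals. All numbers are synthetic placeholders.

```python
from itertools import product

def profile_effect(ap5_days, saline_days):
    """AP5 minus saline, averaged over every session rather than first vs last."""
    return sum(ap5_days) / len(ap5_days) - sum(saline_days) / len(saline_days)

def sign_flip_p(diffs):
    """Exact two-sided sign-flip permutation p-value for mean(diffs) != 0."""
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total

# One synthetic rat: five sessions per condition, in z-scored units (placeholders).
corr_effect = profile_effect([-0.2, -0.1, -0.3, -0.2, -0.4], [0.1, 0.2, 0.3, 0.3, 0.4])
speed_effect = profile_effect([-0.1, -0.3, -0.2, -0.4, -0.5], [0.0, 0.2, 0.2, 0.4, 0.5])

# Synthetic per-rat differences: (effect on correlation) - (effect on max speed).
interaction = [0.4, -0.1, 0.3, 0.2, -0.3, 0.1]
p_interaction = sign_flip_p(interaction)
```

With six animals there are only 64 sign patterns, so the smallest attainable two-sided p-value is 2/64, which is itself a reminder of how little such a sample can resolve.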

Thus I maintain that impartial analyses of the data cannot disentangle whether the DLS is primarily contributing to learning the accurate movement or contribute to the increased vigor which is driven by a motivational aspect. I am glad the authors acknowledge it in their discussion. However, most readers will go quickly through the title and abstract (and maybe introduction) and will probably cite this paper as additional evidence for a critical role of the striatum in motor skill learning which in my opinion is misleading.

https://doi.org/10.7554/eLife.64303.sa1

Author response

Essential revisions:

The reviewers felt that the topic of the paper is very interesting and the study is original and creative. However, a number of issues were raised about clarity of the presentation, data analysis, and interpretation of the results. The concerns fall under the following categories:

1. Trajectory consistency and movement speed

Throughout the manuscript, the authors define automaticity as an increased consistency in reaching trajectory (abstract, line 23-26, line 86) but the main measurement used in figure 1 is the correlation of the velocity profile (which actually shows a strong increase in speed during learning). This is problematic for several reasons:

– An inattentive reader may think that the y-axis label "reach correlation" in panels e, f, and g of figure 1 is referring to the correlations of the trajectories shown in c.

We thank the reviewers for this comment. We have renamed reach correlation to “velocity profile correlation” to decrease this potential confusion. Please also see next point.

– This choice raises the question of why the authors quantified behavior by looking at the consistency of the speed profile rather than directly through trajectory correlation. Looking at the traces of the trajectories in Figure 1C, the improvement in trajectory consistency is not clear (day 2 seems to be more consistent than day 8). If the improvement in trajectory consistency is less pronounced than the increase in fast speed consistency, the authors should make significant modifications in the way they present their behavioral data and acknowledge that their physiological changes could explain either trajectory consistency, increased (fast) speed consistency, increased speed or a mixture of the three.

We thank the reviewers for the opportunity to expand on our methods. We have revised our manuscript to include a more elaborate and explicit rationale behind choosing our learning metric. We summarize our reasoning for quantifying the consistency of the velocity profile of the reach, rather than the spatial trajectory, below:

First, the spatial trajectory is constrained by the task (the rat must reach through a small window in their box to a pellet location held constant), and it does not inherently capture temporal components. Over the course of learning, we observed that the spatial trajectory does not capture the main changes in reach strategy, which tend to be the speed and consistency at which the reaching trajectory is traversed. As our goal was to capture learning- (and sleep-) related changes in behavior, we found measuring changes in the velocity profile to best capture learning.

Second, we noted that animals often vary in the initial location of their reaching action, thus making day-to-day differences in spatial trajectory correlation difficult to interpret. Using the velocity profile and time locking trials to the moment when the rats interact with the pellet allows us to capture the consistency of the reaching velocity while approaching the pellet, while largely ignoring differences in starting location.

Third, one of the main changes that we observe during the learning of our task is the “smooth binding” of sub-movements – i.e., reaching towards the pellet, grasping the pellet, and retracting the pellet to the mouth – that were initially distinct. While all these movements still occur and traverse roughly the same path as in early training, with practice these movements are combined into a single, smooth action. The velocity profile appears to best capture this phenomenon.
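The idea of a velocity profile time-locked to pellet contact can be sketched as follows. This is a minimal illustration under assumptions, not the analysis pipeline used in the manuscript: paw positions are taken as per-frame (x, y) samples, the pellet-touch frame is given, the window lengths are arbitrary, and all function names are hypothetical.

```python
import math

def speed_profile(xy, touch_idx, before, after):
    """Per-frame speed in a window of frames locked to the pellet-touch frame."""
    seg = xy[touch_idx - before : touch_idx + after + 1]
    return [math.dist(p, q) for p, q in zip(seg, seg[1:])]

def mean_profile(profiles):
    """Average speed profile across trials, frame by frame."""
    return [sum(col) / len(col) for col in zip(*profiles)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Two synthetic trials: same accelerating reach, different start location and onset.
trial_a = [(float(t * t), 0.0) for t in range(11)]                    # touch at frame 8
trial_b = [(5.0, 3.0)] * 2 + [(t * t + 5.0, 3.0) for t in range(11)]  # touch at frame 10

profiles = [speed_profile(trial_a, 8, 5, 2), speed_profile(trial_b, 10, 5, 2)]
template = mean_profile(profiles)
correlations = [pearson(p, template) for p in profiles]
```

In this toy case the two trials start at different positions and times, yet their touch-locked speed profiles are identical, so each correlates perfectly with the template. A spatial trajectory correlation would not be invariant to the offset in the same way.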

– Increase in speed consistency does not necessarily imply an increase in speed. The striking increase in reaching speed with learning (Figure 1D) is just shown for one animal. The authors should show it for their 6 animals. If the 6 animals show an increase in movement speed this is a point that should be discussed throughout the manuscript, especially in light of the many works linking dorsal striatum and movement speed.

We have included a supplementary figure (Figure 1 —figure supplement 3) showing mean single-trial peak velocity across learning for all animals.

– In figure 1e, the authors showed in grey the individual speed profile correlation of 6 rats and the mean+sem. Something is quite wrong with the mean trace. In the first days, the mean should be much higher, close to 0.8. Currently, its first value is between the 5th and 6th values. Second, from a statistical point of view, using mean and SEM is meaningless when n=6. The median would make more sense and there is no need for error bars as the entire dataset is shown. Once the group representation is corrected, taking into account that the y-axis in e) is cut at 0.6, it will be clear that the increase in speed profile consistency is far from impressive.

We apologize for this error and sincerely thank the reviewers for identifying it – we significantly edited Figure 1 and included a supplementary figure (Figure 1 —figure supplement 2) that displays all individual animal traces.

– To demonstrate that the behavior is automatic, in the new pellet location task, the authors used trajectory correlation to quantify behavior, that is, a different metric than in figure 1. This inconsistency in behavioral metrics across two related figures (Figure 1 and Figure S1) raises the question of whether the lack of behavioral change shown in Figure S1 (which the authors use to claim automaticity) is robust when looking at speed (either absolute speed or consistency). Added to the fact that this experiment was only performed on two animals this part was really not convincing. Anyway, the authors should also examine whether movement speed is affected by the relocation of the pellet.

We appreciate the reviewer’s concern and perspective on our measure of automaticity. We agree that a compelling demonstration of this definition of automaticity is not achievable within a reasonable scope of revision: we did not carry out the test of automaticity in all animals of our learning cohort and would therefore need to perform a new set of recordings to add further animals. Instead, we have focused on the long-term emergence and stabilization of a skilled action and its link to offline corticostriatal plasticity.

2. Validate LFP recordings

To study M1-DLS functional coupling, the authors, in some of their analyses, used striatal LFP to compute M1 and DLS coherency. When introducing their result section, the authors stated that "within the corticostriatal network, theta coherence (4-8Hz) has been previously shown to reflect coordinated population spiking activity8,9,34" (Line 128). The authors should mention recent works showing major volume conduction in the striatum in this frequency range (Lalla et al., 2017) and in the γ range (Carmichael et al., 2017), as the authors seem to be aware of this potential confound (in Lemke et al. NN, 2019, relevant works are cited). Sleep rhythms are well known to be controlled by the thalamocortical systems and the LFP sources to be in the cortex (Kandel and Buzsaki 1997). The lack of organization of the input on striatal neurons along with their radial somatodendritic shape makes the striatum a poor candidate to generate fields that can summate and be recorded extracellularly. Because the striatal recording sites in the present study are located just below the cortex, it is very likely that most, if not all, of the striatal LFP is volume-conducted from neighboring cortical sources. The authors said they used common-average referencing to limit volume conduction but this will not fix the problem. Indeed, subtracting two oscillatory signals with the same phase and frequency, but different amplitude (as can happen in the striatum due to the passive attenuating effects of the brain tissue on LFPs) will result in an oscillatory signal with a preserved rhythmicity. The authors should look carefully at the work of Carmichael and collaborators (2017). In this study, the authors showed that striatal LFP amplitudes slowly decreased as the striatal electrodes were further away from the cortex.
Carmichael also showed that striatal spiking modulation by striatal LFPs is not a criterion for local generation of the field (all it shows is that striatal units are modulated by cortical rhythms which is expected as the cortex provides strong excitation to the striatal neurons). If the authors want to make a claim about striatal LFPs being local then they need to show that their striatal LFPs (referenced against their cerebral screw) are not progressively decreasing away from the cortex. It is an important issue that cannot be ignored by the authors.

We appreciate the reviewer’s concern regarding volume-conducted LFP signals and have carried out the following revisions to address this important point.

First, we have revised the text and included citations to outline the issue of volume conduction in the striatum. An important point about our current method is that the common average referencing is performed in DLS and M1 separately, not across M1 and DLS signals together, thus pre-empting the issue cited of subtracting different magnitude oscillations. Moreover, in this work, we use increases in M1-DLS LFP coherence to argue that M1 and DLS become more functionally coupled during offline, rather than online, periods. Our revisions therefore seek to provide compelling evidence that M1-DLS LFP coherence reflects functional connectivity, rather than making a direct claim about the local and non-local components of DLS LFP.

Second, we have added a new supplemental figure (Figure 2 —figure supplement 1) showing that volume-conducted LFP signals are not a major contributor to 4-8Hz LFP coherence. This figure displays the non-zero phase difference between 4-8Hz LFP signals in M1 and DLS during NREM sleep. If 4-8Hz LFP coherence is simply reflecting volume conducted signals, then we would have observed 4-8Hz LFP signals in M1 and DLS that have zero-phase lag, consistent with volume conduction. However, a non-zero phase lag in M1 and DLS LFP signals between 4-8Hz during NREM is incompatible with volume conduction. We have previously used this method to show that reach related LFP signals across M1 and DLS have a phase lag consistent with the conduction and synaptic delay between M1 and DLS, and inconsistent with volume conduction (Lemke, et al., 2019).
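The phase-lag argument can be illustrated with a minimal sketch. This is not the analysis code used in the manuscript. The sampling rate, the 6 Hz test frequency, the 10 ms delay, and all function and variable names are assumptions chosen for illustration. The sketch compares a cortical sinusoid with an attenuated, undelayed copy (the volume-conduction case) and with a delayed copy (the synaptic-transmission case), and reads the phase lag from the angle of the cross-spectrum at one frequency.

```python
import cmath
import math

def fourier_coefficient(signal, freq_hz, fs_hz):
    """Complex Fourier coefficient of `signal` at one frequency (single-bin DFT)."""
    return sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz)
               for n, x in enumerate(signal))

def phase_lag_deg(sig_a, sig_b, freq_hz, fs_hz):
    """Angle of the cross-spectrum A * conj(B), in degrees.

    Positive values mean that A leads B at this frequency.
    """
    cross = (fourier_coefficient(sig_a, freq_hz, fs_hz)
             * fourier_coefficient(sig_b, freq_hz, fs_hz).conjugate())
    return math.degrees(cmath.phase(cross))

# Hypothetical parameters, chosen only for illustration.
FS_HZ = 1000.0     # sampling rate
FREQ_HZ = 6.0      # one frequency inside the 4-8 Hz band
DELAY_S = 0.010    # assumed M1 -> DLS delay
N_SAMPLES = 1000   # 1 s of data, a whole number of 6 Hz cycles

m1 = [math.sin(2 * math.pi * FREQ_HZ * n / FS_HZ) for n in range(N_SAMPLES)]
# Volume-conducted copy: same waveform, attenuated, no delay.
dls_volume_conducted = [0.5 * x for x in m1]
# Delayed signal: same rhythm, shifted by DELAY_S.
dls_delayed = [math.sin(2 * math.pi * FREQ_HZ * (n / FS_HZ - DELAY_S))
               for n in range(N_SAMPLES)]

lag_volume = phase_lag_deg(m1, dls_volume_conducted, FREQ_HZ, FS_HZ)
lag_delayed = phase_lag_deg(m1, dls_delayed, FREQ_HZ, FS_HZ)
```

In this toy case the attenuated copy gives a lag of essentially zero, whereas the delayed signal gives about 21.6 degrees (360 × 6 Hz × 0.010 s). Real LFP would first need band-pass filtering and averaging over many epochs.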

Third, we added a new supplemental figure that links changes in M1-DLS LFP coherence to a separate measure of M1-DLS functional connectivity that is independent of DLS LFP: the phase-locking of DLS units to M1 4-8Hz LFP signals. Importantly, this separate metric does not require a local generator of striatal LFP and is therefore not susceptible to volume conduction. We show that M1 electrodes with high LFP coherence with DLS electrodes also entrain DLS units to a greater degree than M1 electrodes with low coherence, providing evidence for M1-DLS LFP coherence as a measure of functional connectivity.
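One common way to quantify the phase-locking of spikes to an oscillation is the mean resultant length of the LFP phases sampled at spike times. The sketch below shows only that statistic. Whether the manuscript uses this exact measure is not stated here, so treat it as an assumption, and the phase values are made up for illustration. Extracting the M1 4-8Hz phase at each DLS spike (for example by filtering and a Hilbert transform) is assumed to have been done elsewhere.

```python
import cmath
import math

def mean_resultant_length(spike_phases_rad):
    """Length of the mean unit vector of spike phases.

    Returns a value between 0 (no phase preference) and 1 (all spikes
    at the same phase).
    """
    if not spike_phases_rad:
        raise ValueError("need at least one spike phase")
    mean_vector = sum(cmath.exp(1j * p) for p in spike_phases_rad) / len(spike_phases_rad)
    return abs(mean_vector)

# Made-up example phases (radians), for illustration only.
perfectly_locked = [0.3] * 50                                # every spike at one phase
evenly_spread = [2 * math.pi * k / 50 for k in range(50)]    # no preferred phase

mrl_locked = mean_resultant_length(perfectly_locked)
mrl_spread = mean_resultant_length(evenly_spread)
```

Because the statistic uses only spike times from DLS and phase from M1, it does not depend on how the DLS LFP itself is generated.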

Lastly, while we agree that the reviewer-proposed method would provide compelling evidence if we observed no change in LFP coherence with increasing distance from cortex, we believe such decreases may also occur due to changes in corticostriatal projection patterns across the dorsal-ventral axis of the striatum. Moreover, as outlined above, our goal is not to argue that DLS has a local field in principle, but that cortical inputs to striatum change with learning/sleep. We believe that examining both phase differences and a complementary spike-based approach provides support for this.

– Monosynaptically connected pairs and cross-correlograms (CCGs). The authors claim to identify monosynaptically connected neuron pairs from CCGs. Previous works (see for instance Bartho et al. 2004) have shown extremely sharp peaks in CCG of putatively connected neurons in the cortex and hippocampus but something is clearly different in the CCGs shown here. Indeed, it is striking that the CCG peaks shown are very smoothed (it is more a wave than a peak). In fact, this wave crosses the center of CCG and seems significant in the positive time bins, suggesting that striatal neurons fire before cortical neurons, which makes little sense. It is surprising that the authors did not mention in their result section the potential alternative mechanisms explaining such "peaks" that spread until positive CCG values. Indeed, it is well known that there are alternative explanations for these observations, such as indirect polysynaptic partners as well as common ("third party") input or slower co-modulation (see Brody CD (1999) Correlations without synchrony, and several papers by Asohan Amarasingham and colleagues). Amarasingham et al. (2012), J. Neurophysiology, talks about interval jitter, why it's relevant to separating fast from slow comodulation and explains the history of these problems. In this regard, the shuffling method briefly mentioned by the authors in the method section is unclear and does not seem to address properly the issue of separating fast and slow comodulation (see Amarasingham 2012).

We thank the reviewers for their critical points regarding the cross-correlation analyses. We have made significant revisions to our manuscript regarding these analyses, using a more conservative “basic jitter” method for detecting significant short-latency spike timing relationships. We have made major revisions to the main and supplemental figures reflecting this change. Additionally, we have included a more explicit discussion of the shape of our CCGs in contrast to what has been reported in cortex.
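The idea behind a jitter test can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure from the manuscript. Each spike of one train is shifted by an independent uniform offset, which preserves slow co-modulation but destroys fine timing, and the observed count of short-latency pairs is compared with the jittered counts. The ±25 ms jitter width, the number of surrogates, the simulated spike trains, and all names are hypothetical. The 1-15ms lag window matches the window mentioned elsewhere in this response.

```python
import bisect
import random

def count_short_latency_pairs(pre_spikes, post_spikes, min_lag_s, max_lag_s):
    """Count (pre, post) spike pairs with min_lag_s <= post - pre <= max_lag_s."""
    pre_sorted = sorted(pre_spikes)
    total = 0
    for t in post_spikes:
        lo = bisect.bisect_left(pre_sorted, t - max_lag_s)
        hi = bisect.bisect_right(pre_sorted, t - min_lag_s)
        total += hi - lo
    return total

def basic_jitter_p_value(pre_spikes, post_spikes, min_lag_s=0.001, max_lag_s=0.015,
                         jitter_s=0.025, n_surrogates=200, seed=0):
    """One-sided p-value for an excess of short-latency pairs.

    Surrogates jitter every post spike independently by a uniform offset
    in [-jitter_s, +jitter_s] (hypothetical width).
    """
    rng = random.Random(seed)
    observed = count_short_latency_pairs(pre_spikes, post_spikes, min_lag_s, max_lag_s)
    n_as_extreme = 0
    for _ in range(n_surrogates):
        jittered = [t + rng.uniform(-jitter_s, jitter_s) for t in post_spikes]
        if count_short_latency_pairs(pre_spikes, jittered, min_lag_s, max_lag_s) >= observed:
            n_as_extreme += 1
    return (1 + n_as_extreme) / (1 + n_surrogates)

# Simulated spike times in seconds, for illustration only.
sim = random.Random(1)
m1_spikes = [sim.uniform(0.0, 100.0) for _ in range(200)]
# "Coupled" DLS train: one spike 5 ms after each M1 spike, plus background spikes.
dls_spikes = [t + 0.005 for t in m1_spikes] + [sim.uniform(0.0, 100.0) for _ in range(200)]

p_coupled = basic_jitter_p_value(m1_spikes, dls_spikes)
```

A broad CCG peak produced by slow common fluctuations survives the jitter and so would not be flagged, which is the sense in which the test is more conservative than a trial shuffle.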

– While M1 RM significance passed the arbitrary α of 0.05, it is a dramatically weaker effect than that seen in DLS. I would ask the authors to please comment on this difference. Additionally, both Δ and SO strongly modulate M1, and **to a greater extent than any of the other effects mentioned in the text!** Please point this out to your readers and offer an interpretation. Do these results argue that in M1, unlike DLS, the spindles are not the main contributor? Or if nothing else, that the modulation is non-specific? (I think the former. This may actually help the authors in their attempts to dissociate M1 from DLS sleep processes.) The spindle story is great, but these results change the interpretation and should not be buried.

We appreciate the reviewers raising this point. As we have significantly changed our method for detecting connected pairs of M1 and DLS units (see above point), we did not end up with enough coupled pairs to make robust claims about differences in NREM rhythm modulation between coupled pairs and non-coupled pairs. Therefore, in an effort to focus on relevant analyses for the story, we have removed these analyses. In addition, we agree about the importance of δ waves and slow oscillations in cortex and, although we cannot make strong claims about corticostriatal processing (besides that DLS units are modulated by these rhythms, as shown in Figure 5), we have included further discussion of these rhythms in the main text and discussion.

3. Neural Trajectories

There were 3 major concerns raised: (1) how the trajectories are composed, (2) the validity of the predicted trajectory, and (3) the comparison of the trajectories.

1) The pc plots represent trial-averaged activity. Although no reach dynamics are given in this paper, a very similar study by the same authors shows that early on, the variability in duration of the outward, and outward+grasping, are very high compared to late in learning. Whatever 'reach signal' is present, this variability will cause the trial-averaged values to be greatly reduced. It is recommended that the authors address this by: (a) time-warping individual trials to overcome this and (b) doing this for only the outward trajectory, as their previous paper shows that so much of the change in duration due to learning is the result of improved grasping ( e.g the difference between PT and RO in Lemke 2019).

We thank the reviewers for the opportunity to clarify this analysis. While we present trial-averaged neural trajectories in Figure 3, the PCA is performed on concatenated single-trial spiking data (not trial averages), and the predictions of DLS neural trajectories are also performed on a single-trial basis. We have clarified this in the text. In addition, we have included a comparison of DLS spiking modulation from early to late training days, as well as a comparison of variance explained by top PCs from early to late training days, to argue that changes in the ability to predict DLS activity from M1 activity are not solely attributable to local learning-related changes in DLS. Furthermore, we have also confirmed that we see similar results when restricting our analysis to the outward reaching action, although with increased variability, as we are predicting half of the data used in the original analysis:

Author response image 1

2) It is also unclear if the authors are trying to say in Figure 2 "Offline increases in functional connectivity predict the emergence of low-dimensional cross area neural dynamics during behavior". The authors should disambiguate cross area dynamics from within area dynamics. Previous work has shown that there is little to no goal-related movement activity early on in training. More importantly, the authors themselves show in their 2019 paper that although there is some Day 1 M1 spiking modulation related to the reach, the PSTH for striatum is flat (previous comments on variability of timing still apply). They also show some reach-aligned power increase in 4-8 Hz in M1, but nothing in striatum. Given the small dynamics(?) that might be in the striatum, it is actually impressive that there is as much similarity between DLS and predict as there is. Can the authors show that this is a lack of a cross-area relationship, not just a lack of signal – which would preclude even the possibility of that relationship? Potentially they could scale with the power of the signal – amplitude or % explained variability of the first PC's, or just plot avg dms modulation in b. Otherwise, the cross area dynamics argument is not compelling. This same problematic facet comes into play later when determining reach modulation (which was done across all days).

We agree with the reviewers that this is an important point. Please see the reply above regarding the two new analyses we have included to address the potential confound of a ‘reach signal’ lacking from DLS on early training days. This is consistent with our previously reported results (Lemke et al., 2019, Supplemental figure 7), where we see no large change in the percentage of task-related units in M1 or DLS with learning.

3) Finally, how do you quantify quality of prediction? "The predictive ability of these models was assessed by calculating the correlation between the actual neural trajectories and the predicted trajectories." This explanation lacks vital details about the comparison and the model itself. Moreover, these are time-varying 2-D vectors, where one offset or delay early on can propagate throughout the trajectory, even though the trajectories would otherwise be identical. Much work has been done on this in recent years, as this is a difficult question to tackle. Euclidean distances of a time-warped distribution or point distribution models might help. Regardless, more detail is needed.

We have expanded our explanation of these analyses in the methods to address these points.

4. Clarify data presentation:

– Figure 2c – The authors show an interesting temporal dynamic here, and there are hints that online DEcorrelation may contribute to learning (for which there is some evidence) as well as offline increases in coherence. This may be just a happenstance of this specific example, but 2c makes it clear that showing the trends across time, rather than in sum as in Figure 2d, would be helpful. Could the authors please add a figure like 2c, but across all 6 animals? It would be nice to see this possibility of online changes at least briefly discussed as well.

We thank the reviewers for bringing up this interesting point. We have included a supplementary figure that shows the mean change (across electrodes) for individual animals.

– Figure 4 sup2, etc – The cited conduction delay between m1 and DS is very specific: 5-7ms (and with a decrease beyond baseline in the 1-3ms range). The authors should redo their analysis using 1-10ms to 5-10ms, or provide a clearly compelling rationale for including 1ms latencies.

Moreover, it is unclear why the x scale is +/- 75ms? this seems quite large and belies the bigger question of why the distributions are so large, especially for data that should be normalized (subtracting out baseline correlations, and thus should be sharper). The largely symmetric and broad increase, extending before 0 lag, is worrisome. If it is a function sleep vs wake, can connectivity be determined only during wake?

To clarify our method: although we cite a specific delay, we use a wider window for analysis (1-15ms), as suggested. We have clarified this in the text. Regarding the broad width of the cross-correlations, please see the point above regarding CCGs.

– AP5 effects:

The presentation and description of the results are confusing and incomplete. The experimental design could not be interpreted by reading the result section or looking at figure 1 (panels f and g). In the method section, the authors wrote that "In the first five days of training, we infused three rats with AP5 and three rats with saline, for the second five days, we switched the infusion, i.e., animals that received AP5 in the first five days, received saline for the second five days, and vice-versa". First, this sentence should be included in the main text. Second, why did the authors show only one animal in figure 1f? They should show the six rats.

We thank the reviewers for pointing out this confusion. We have included a supplementary figure (Figure 1 —figure supplement 4) with all individual animal curves.

– I could not make sense of the analytical logic in panel g. The authors correlated the trial by trial speed profiles during training with the average speed profile at the end of training, saline, or AP5 sessions. During "control" training (saline and learning cohorts), as the behavior changes, there should be lower correlation at the beginning of training and higher correlation at the end. In addition when the speed profile does not change (early AP5 or late learning cohort) there could be higher correlations. It is not clear to me how this measurement can be useful as it may vary a lot during learning and could also be different depending on the type of behavioral plateau.

We have revised Figure 1 panel g to show across animal differences in total correlation change.

– Also it is unclear whether the effect of AP5 on day-to-day correlation should be the same when injected from day 1 or after 5 days of practice under the saline condition. The data shown for one animal seem to show a clear decrease in speed profile correlation during AP5. What do the authors make of such an effect? It seems to indicate that AP5 not only blocks automatization but also impairs performance. Is a similar effect observed when AP5 is given after 5 days of saline? The effect of AP5 on movement speed should be shown.

We have included a supplementary figure with all individual animal curves.

– Histology is absent. At a minimum, ascertaining how well layer 5 was targeted across animals should be included.

In this study we recorded physiological signals using large, combined arrays that targeted M1 and DLS, which made it difficult to remove the arrays without damaging M1. This impeded our attempt to precisely localize the depth of electrode tips with histology. To standardize the depth of electrode insertion, we used precise stereotactic insertion of electrodes to 1.5mm from the brain surface, targeting layer 5 of motor cortex. In lieu of precise localization of electrode tips, we have performed gross histology and confirmed that we are targeting M1 and DLS; we have included this in a supplemental figure.

– Reviewers were confused by sleep nomenclature:

Figure 2a – it was not obvious that sleep 3 = next day's sleep 1. Can you include something to the effect of "sleep3 = day n+1's sleep 1" just to help make it perfectly clear to the readers?

Another commented that the "pre-sleep" and "post-sleep" nomenclature is very confusing. These aren't used to define time periods before and after a sleeping episode, but instead are periods of "sleep" (although there can be wake states mixed in) before and after training. "Pre-training sleep" and "post-training sleep" would be much clearer.

We appreciate the reviewer’s identification of this confusion and have edited this figure to clarify.

5. Reach modulation across days

– Reach Modulation appears to be determined across all days? This assumes that each unit is maintained across 8 days- evidence supporting this needs to be provided. More importantly though, the potential for this to change the DLS results is remarkable. As previously stated, the authors showed very little reach related activity on day 1. Why not determine these values for day 1-2 vs day 7-8? or better yet, as somewhat suggested by the authors, can they predict what neurons will become RM over time?

We appreciate the reviewer’s identification of this confusion and have edited this figure to clarify.

– It seems that the day-to-day correlation measurement allows the authors to have more data points per rat. However, these measurements for a given animal are not independent. Unfortunately, pseudoreplication is a major issue in neuroscience and behavioral studies more broadly (S.E Lazic's "The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?" BMC Neuroscience, 2010, highlights some things to worry about in this regard). The authors should avoid this in this figure and other neurophysiological analyses.

We thank the reviewer for raising this point. We have edited this panel to compare total change in correlation value across days per animal (one value per animal).
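Collapsing repeated measurements to one value per animal can be sketched as below. The numbers are made up, the animal labels are hypothetical, and "total change" is taken here as the last-day value minus the first-day value, which is an assumption about the definition rather than a statement of what the revised panel computes.

```python
from statistics import median

def total_change_per_animal(correlations_by_animal):
    """Reduce each animal's day-by-day values to a single number.

    Here the summary is last day minus first day (an assumed definition),
    so the unit of analysis becomes the animal, not the day.
    """
    return {animal: values[-1] - values[0]
            for animal, values in correlations_by_animal.items()}

# Made-up day-by-day correlation values, for illustration only.
example = {
    "rat_a": [0.70, 0.75, 0.85],
    "rat_b": [0.60, 0.72, 0.80],
    "rat_c": [0.65, 0.66, 0.90],
}

changes = total_change_per_animal(example)
group_median = median(changes.values())
```

Any group statistic is then computed on as many values as there are animals, which avoids treating correlated days from the same animal as independent samples.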

6. NREM sleep

– It would be interesting to know how many times the animals entered NREM sleep between the post-training sleep of day N and the pre-sleep of subsequent day N+1. Are these between-day changes dependent upon the total amount of sleep that the animals got?

We agree wholeheartedly with the reviewers that this is important and interesting information (and we hope to collect such information in the future!), but unfortunately, due to the challenge of such continuous recordings, we do not currently monitor animals both during post-training and overnight.

– pg6 line 192. In order to ascertain that NREM > others, it would be useful to see this relationship in unconnected pairs. e.g. control for the possibility that NREM is always highest.

We thank the reviewers for this point. We have completely redone our analysis of spike timing relationships between M1 and DLS neurons using a more conservative approach. This resulted in ~2.6% of pairs with a significant short-latency relationship, so it may be difficult to interpret this relationship in “non-connected” pairs. We have also included discussion in the text of the clear relationships in activity between M1 and DLS at longer time scales; however, it is unclear how to interpret these timescales, since they likely result from slower common fluctuations.

– Since all of the data is from a specific subsection of these offline states (i.e. – NREM sleep) it would be good to discuss their findings within the context of what is known/unknown about plasticity processes during offline wake states or other sleep periods.

We share the reviewer’s interest in other offline states and agree that an interplay among activity patterns across different behavioral states is likely critical. We now include this point in the discussion.

7. Topics to clarify/discuss

– Page 7 Line 234 – "In contrast, DLS units did not increase in modulation during either δ waves (Supplemental Figure 5) or slow oscillations (Supplemental Figure 6) after training." This statement is only partially correct. I applaud the authors giving, and showing, the same treatment for all 3 types of data. Many would gloss over this. DLS nonRM weak input demonstrated a stronger effect than M1 RM, and SO demonstrated a trend. Some will find this juxtaposition – task related-spindle, unrelated-other to be informative, particularly as information itself can be more strongly represented through increasing the signal or increasing the discernability from not-signal. There is considerable work, including from the authors, showing the importance of other elements of sleep. Throughout the paper, SO and δ aren't negligible contributors. Rather than glaze over this for the purpose of a throughline, please expand upon this for a more complete view.

We appreciate the reviewer’s note and have expanded the text on the potential roles or interactions between sleep spindles, slow oscillations, and δ waves in the discussion.

– The discussion is somewhat lacking, being just a summary of the findings. In particular, I would like to see the authors put these results into context with the broader models of Tononi (SHY) and their own work, e.g. Kim 2019 wherein they focus on SO and δ in a very similar task – but which are largely thrown under the bus here.

We appreciate the reviewer’s note and have expanded the text on the potential roles or interactions between sleep spindles, slow oscillations, and δ waves in the discussion.

– It is unclear why "automaticity" is a beneficial outcome here for this task. Does it result in greater pellet retrieval success (no data presented on this point) or does it free up or allow reorienting of cognitive resources (this is indirectly alluded to in the Discussion)?

In line with our above response, it is difficult to interpret automaticity in this task, and we have therefore focused on the emergence of a stable skilled action. It is certainly possible that automaticity is beneficial in this task because it reduces cognitive load and therefore makes the action more “cognitively efficient”. However, we did not design the experiment to capture this “cognitive” aspect of automaticity.

– The argument structure in the Intro has a few holes in it.

a) Specifically the paragraph covering lines 42-53. "Currently, our understanding of how sleep impacts distributed brain networks is largely derived from the systems consolidation theory, where it has been shown that coordinated activity patterns across hippocampus and cortex lead to the formation of stable long-term memories in cortex that do not require the hippocampus 23-25. Notably, whether sleep impacts the connectivity across hippocampus and cortex has not been established. Therefore, one possibility is that, in the network, we similarly observe coordinated cross-area activity patterns during sleep but do not find evidence for the modification of corticostriatal connectivity during offline periods. Alternatively, it is possible that we find evidence that cross-area activity patterns during sleep modify the connectivity between cortex and striatum and impact network activity during subsequent behavior."

b) The first two sentences describe the role of sleep in systems consolidation theory and "coordinated activity patterns" and "connectivity" across hippocampus and cortex. The third and fourth sentences then jump to two alternative speculations about "the network" (which is undefined), and the role of sleep in modifying corticostriatal connectivity patterns. It is not clear how these two things are linked. This link should be explicitly stated.

We thank the reviewer for pointing out our omission of the word “corticostriatal” before network and have revised the introduction to address these concerns.

– Offline is used in two different ways here. Offline behavioral states include any sleeping or waking state outside of training. Then there are online (between pre- and post-sleep on the same day) and offline (between post-sleep of day N and pre-sleep of subsequent day N+1) changes in neural activity between these offline behavioral states.

We note this potential area of confusion and have clarified our use of offline terminology.

– The authors need to better define what they mean by slow oscillations (Figure 7, Figure 4e). Cortical slow oscillations are classically described as regular up and down state transitions, with sleep spindles occurring during the up states (Buzsaki, rhythms of the brain p 197). When the authors look at the spiking modulation by slow oscillations and spindles (Figure 4e) or the temporal proximity between slow oscillations and spindles (Figure 7), I doubt the authors are referring to these classical up/down slow oscillations. Are the slow oscillations the authors refer to equivalent to K complexes? If yes, this could be mentioned.

We agree that the use of slow oscillations can differ across the sleep literature and have included a supplemental figure that outlines our specific method to detect slow oscillations vs. δ waves vs. sleep spindles.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

The manuscript has been improved but there are some remaining issues that need to be addressed, as outlined below:

1. A graph showing the effects of AP5 on speed, for all 6 mice should be shown in the main figure in a single plot with a fixed scale. All of Reviewer #3's requests for this figure should be included, including statistics.

This is now included as Figure 1e. Please see discussion below regarding additional aspects of Reviewer #3’s comments.

2. This figure is requested to address potential confounds of effects on vigor that are currently interpreted as effects on motor learning. Depending on how conclusive the effects of AP5 on speed are, the authors might need to revise their language throughout the paper, if an effect on movement vigor cannot be ruled out.

We have included this as Figure 1 – Supplement 5. We did not find a significant effect on single trial peak reaching velocity. This suggests that these two aspects of behavior may be distinctly regulated. Please also see discussion below.

3. A paragraph should be added to the Discussion explaining the potential confounds of effect on movement vigor on the study's results.

Please see the paragraph starting on line 375.

Although these points came from Reviewer 3, in consultation the other Reviewers agreed these points above are important to address.

Reviewer #3:

The authors have made a significant effort to reply to most of my comments and those of the other reviewers. However, one of the major points that I had raised has not been addressed and another related one is not addressed properly. Those points were critical for the authors' claim that "our results provide evidence that sleep shapes cross-area coupling required for skill learning" (last sentence abstract, maybe a "is" is missing?).

Specifically I asked the authors to report the effect of AP5 injection on movement speed. They responded that they provide the effect of AP5 on the speed profile correlation but clearly this is not what I asked. I don't think the authors misunderstood me because in reply to another comment about movement speed (my original comment :"-Increase in speed consistency does not necessarily imply an increase in speed. The striking increase in reaching speed with learning (Figure 1D) is just shown for one animal. The authors should show it for their 6 animals. If the 6 animals show an increase in movement speed this is a point that should be discussed throughout the manuscript, especially in light of the many works linking dorsal striatum and movement speed"), the authors did provide in their supplementary figure 3, plots showing that movement speed increases in the 6 animals during training. Thus it is unclear why the authors did not show the effect of AP5 on movement speed.

My request is really fundamental with regard to the main claim of the authors (requirement for skill learning) because it is possible that AP5 decreases not just the trial-by-trial speed profile correlation (what the authors use as a definition for skill) but also the general speed of movements, which is not necessarily related to skill learning but could have a motivational origin (see work of the Galea lab). If this were the case, the authors would need to seriously consider that the changes in corticostriatal connectivity could primarily reflect altered vigor or motor motivation, which would be in agreement with several works in the field (Rob Turner lab, Dudman lab, Robbe lab, or even ideas on motor motivation by Josh Berke, one of the authors of this study).

This point is even more important as the authors, in the introduction or discussion, tend to write definitive sentences on a well-demonstrated role of cortico-striatal connection in motor skills while most of the references cited do not disambiguate the vigor or motivation confound (e.g., Kupferschmidt et al., Dang et al., Costa et al., Yin et al. …). This is quite misleading.

Moreover, when looking at the result of the AP5 experiment on the speed profile correlation across the 6 animals (supp Figure 4), the results are far from convincing. There are only 3 animals in each condition (3 in AP5-Saline and 3 in Saline AP5) and in each of the conditions, the result is not clear in at least one animal. There are also weird day by day drastic changes that make these results not so reliable. Moreover, the authors keep changing the y axis scale in a way that makes small improvements look big. Ethically speaking this is not appropriate, especially in such a journal as eLife. Neither is appropriate to put in the main figures 1d and 1e the best example out of 6 animals. The authors should integrate into these panels the results of the other animals. In addition to being fairer to the data, this would remove two supplementary figures.

Thus, the main figure 1 should show 1) the effect of AP5 on speed profile correlation on the 6 animals in the same graph with the same scale (Figure 1e); 2) the effect of AP5 on max speed on the 6 animals (new small panel) and the fig1d for all the animals. In addition, the statistics to examine the effect of AP5 should be done using paired comparison of the correlation and speed values not on their change. The usage of the changes in panel 1f is unfair because it is not clear what should be the effect of AP5 after saline (decrease or plateau?) I am afraid that the authors designed their statistical test in a way that favored their hypothesis.

In conclusion, while I do think this paper should be published in eLife, I am convinced that a fairer presentation and analyses of key experiments in figure 1 are required. Clearly, the effect of AP5 on skill learning is not as clear-cut as stated by the authors (and choosing to only show in the main figure the single best animal is not appropriate). In addition, an effect on movement speed is still highly likely (based on one animal) and should be reported for the 6 animals. Thus the main behavioral function of the interesting neurophysiological changes could be related to vigor/motor motivation, not skill learning. All this critical information should not be hidden but rather openly disclosed and discussed so that readers can make up their own minds.

Challenges in measuring movement vigor in the reach-to-grasp task:

We wish to emphasize the challenges in determining movement vigor in the current work, specifically with the reach-to-grasp (R2G) task. As we are sure the reviewers are aware, published work on movement vigor often utilizes highly constrained tasks in which mice are head-fixed, which reduces variability in body position, and are gripping a joystick that only can move in a single dimension (these joysticks also typically have a “resetting” force so the only movement the mouse needs to make is outward). Such a preparation is ideal for carefully measuring specifically the vigor of a movement which is constrained to be very similar (in terms of muscle activation patterns) across different animals.

Our task is quite the opposite! As made clear by the seminal work of Ian Whishaw and colleagues, the R2G is a sequence learning task; the animals appear to learn to correctly order and time “sub-movements” (e.g., paw plant, reach, grasp, retract) in order to solve the task. For rodents, this movement is typically so variable in the early stages of learning that they rarely grasp the pellet successfully. Thus, it is not obvious that a simple gain change is sufficient to drive learning.

Consistent with this, there are numerous reports of cortical plasticity associated with R2G learning (e.g. Kleim et al, J Neurophys 1998; Xu et al., Nature 2009). By design, this task is unconstrained, so as to allow the animals to explore the many degrees of freedom associated with skill learning and each animal is free to develop a unique strategy. This is the reason why we use a higher-dimensional readout of behavior, the correlation between velocity reaching profiles, rather than reduced representations such as maximum velocity. In fact, the nature of our task is not well-equipped to measure and make strong claims about movement vigor because, while we do report maximum velocity of the outward reaching movement, this value not only depends on the “vigor” of movement, but also the starting body position of the animal which is not constrained (for example an animal reaching straight-on toward the pellet vs. an animal reaching from an angle of 15 or 30 degrees would require a different set of muscle activations), as well as the distance of the reach.

Moreover, while maximum velocity is one aspect of the reaching action, it is likely one of many features that change with learning. For example, perhaps the most important aspect of “movement vigor” is the grasp of the pellet – a force we do not measure in the current work. All that is to say that we completely agree that a change in movement vigor may play an important part in learning. However, as noted above, a simple “gain” change in movement vigor would not likely explain the changes we see in the consistency of the reaching velocity profile.

We do, of course, seek to give a complete and fair assessment of changes in movement velocity and how they might relate to skill learning. During the first revision we included maximum velocity for all animals on all days of learning; we have now also added the correlation between maximum velocity and consistency of reaching velocity profile in the learning cohort (Figure 1 – Figure 1 Supplement 3) as well as a figure displaying the changes in maximum velocity for AP5/saline animals. From the correlation between maximum velocity and consistency, we see that the consistency and maximum velocity of reaching movements are linked. As mentioned above, we don’t interpret this as evidence that maximum velocity fully explains the change in stability (R² value = 0.13), but rather that it is one of the features that changes as a part of learning. Furthermore, we do not find a significant change in maximum velocity comparing AP5 and saline (Figure 1 – Figure 1 Supplement 5), suggesting these two aspects of behavior may be distinctly regulated. As requested, we have also expanded on the control of movement vigor by the basal ganglia in the discussion.

Inter-animal variability in the reach-to-grasp task:

We seek to emphasize another critical aspect of our task in response to the reviewers' concern about differences in the magnitude of reach-profile correlation values across animals. For relatively “constrained” motor tasks, e.g., joystick press, inter-animal variability is often much lower than for relatively “unconstrained” tasks, e.g., the reach-to-grasp task. Such inter-animal variability has been studied in the reach-to-grasp task: the well-regarded work by Nitz & Kargo, J. Neuroscience, 2003 examined the different “strategies” that animals utilized to learn the reach-to-grasp task (from their paper: “Skill improvement was associated with both motor pattern selection and pattern tuning. One group of animals (3 of 11) appeared to switch between motor patterns underlying the reach portion of the task…. In contrast to the first group of animals, a second group (8 of 11) appeared mainly to tune a single starting motor pattern over time”). Consistent with their finding that different animals tend to undergo different “paths” to learning, we believe that an increase in correlation value from, for example, 0.4 to 0.6 vs. 0.8 to 0.9 may both reflect relevant aspects of motor learning, while the correlation magnitude itself is not easily interpretable. However, so as not to “hide” this variability, we have included all animal data in plots with fixed scales in Figure 1. We have no desire to hide data and appreciate this opportunity to be more transparent.

The reviewers also suggested that we change our comparison to do a “paired comparison of the correlation and speed values not on their change”. As discussed above, however, we believe the change in correlation is much more interpretable than the absolute correlation values, which vary between animals. Another issue with the proposed comparison is that, as the reviewers point out, in half of the animals AP5 infusions occur after five days of learning with saline infusions (and vice versa); thus the animals have already undergone learning and the correlation values are higher. When we planned our experimental method, we organized our experiment in this way to reduce the number of animals needed, with the plan to use a statistical approach that compared changes in correlation values rather than the raw values themselves. This was informed by the clear evidence of changes in correlation during our initial studies in normal learning. We also wish to note that our study design was randomized and blinded in order to add further rigor. The experiment is, of course, balanced such that the same number of animals receive saline infusions after AP5 infusions as AP5 infusions after saline infusions. Thus if AP5 had no effect on learning, we would not expect to see a difference in correlation change, as we do.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Reviewer #3:

I asked the authors to show the data for the 6 animals in which they compared AP5 and Saline striatal injection on the speed correlation profile. The authors have managed to plot the data in a way in which it is impossible to know which rat is which (all the points are gray, no running lines between the points). They plotted the mean of these data which is meaningless from a statistical viewpoint (n=6).

– We replotted the data to connect the points of individual animals (Figure 1d and e).

– All individual animal plots (of behavioral and neural data) are also presented in the supplemental figures (Figure 1 Figure Supplement 2,3,4,5; Figure 3 Figure Supplement 1).

I also asked the authors to show on the same main figure the effect of AP5 on the max speed (to test the effect on vigor). The authors have put those results in a supplementary figure. The authors claim that AP5 has only an effect on the speed correlation profile, not on max speed. But this is clearly due to their biased statistic in which they only look at the difference between the first and last sessions. Indeed, in figure 1e AP5 only reduces the speed correlation on day 5 but there is no effect from day 1 to day 4. Juxtaposing the effect of AP5/Saline on max speed (now Figure 1 S5) and speed correlation (figure 1e bottom) clearly shows that the effects are strikingly similar. In fact, an unbiased treatment of the data (using the entire profile and permutation/bootstrapping) would probably show that the effect of AP5 on max speed is more pronounced than on speed correlation.

We expanded our treatment of the data to show this is not the case.

– We added a test of significance for these metrics using permutations/bootstrapping on day-to-day changes, in addition to the Mann-Whitney rank-sum test on total change in each animal (Figure 1; Figure 1 Figure Supplement 5).

– We expanded the data presentation to include plots of single trial peak velocity across days in individual animals and added histograms of day-to-day changes with AP5 vs. saline infusions for both single trial peak velocity and velocity profile correlation, which include mean day-to-day changes in each animal (Figure 1; Figure 1 Figure Supplement 4 and 5).
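
As a minimal sketch of the kind of permutation test on day-to-day changes referred to above (this is not the analysis code used in the study; the function name `permutation_test`, the difference-in-means statistic, and the example values are all assumptions made for illustration), the group labels can be shuffled repeatedly to build a null distribution for the difference between AP5 and saline:

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the observed difference (mean(group_a) - mean(group_b)) and a
    p-value estimated by randomly reassigning values to the two groups.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled values
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # small tolerance so that ties with the observed value are counted
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # add-one correction keeps the estimated p-value strictly above zero
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical day-to-day changes in velocity profile correlation.
# These numbers are invented for illustration and are NOT data from the study.
saline_changes = [0.08, 0.05, 0.10, 0.07, 0.06, 0.09]
ap5_changes = [0.01, -0.02, 0.00, 0.02, -0.01, 0.01]

observed, p_value = permutation_test(saline_changes, ap5_changes)
print(f"difference in mean change: {observed:.3f}, p = {p_value:.4f}")
```

The same function could be applied to day-to-day changes in single trial peak velocity, so that both metrics are tested with an identical procedure rather than compared through their p values.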

The authors seem to conclude that the AP5 effect is different on speed profile correlation and max speed because their statistical comparison with saline is significant in one case and insignificant in the other case.

However, comparing p values is meaningless statistically. This is an important issue that has been subject to publication in highly visible journals (https://www.nature.com/articles/nn.2886).

The authors should have compared the effect themselves which again should be done using permutation/bootstrapping on the entire profile (not last/first).

We do not seek to claim that AP5 is having an effect on a specific aspect of the reach-to-grasp movement, but rather that post-training AP5 infusions impact skill learning, which we measure by looking at the emerging consistency of the velocity profile.

– We have revised the text to ensure we are not implying that AP5 is having a different or specific effect on our learning measure vs. single trial peak velocity. We certainly think reach velocity is a relevant feature of the learned action; however, it cannot capture all the changes involved in learning the reach-to-grasp skill (which involves an outward reach of the paw, a dexterous interaction with the pellet/grasping, and a retraction of the paw, all combined into a smooth and consistent skilled movement).

Thus I maintain that impartial analyses of the data cannot disentangle whether the DLS is primarily contributing to learning the accurate movement or contributing to the increased vigor, which is driven by a motivational aspect. I am glad the authors acknowledge it in their discussion. However, most readers will go quickly through the title and abstract (and maybe introduction) and will probably cite this paper as additional evidence for a critical role of the striatum in motor skill learning, which in my opinion is misleading.

We have added further evidence that physiological changes across the corticostriatal network covary with the emergence of a consistent velocity profile, beyond what is attributable to simply changes in peak velocity.

– We have included the computation of the partial correlation coefficient between mean corticostriatal LFP coherence during the pre-training period and the subsequent training period’s velocity profile correlation value, while controlling for the subsequent training period’s mean single-trial peak velocity (R value = 0.62, P value = 1 × 10⁻⁴; compared to R value = 0.73, P value = 5 × 10⁻¹⁰ when not controlling for single-trial peak velocity). If, in contrast, we compute the relationship between corticostriatal functional connectivity and single-trial peak velocity, taking into account velocity profile correlation, we do not see a significant relationship (R value = 0.16, P value = 0.25).

– We also reemphasize that the goal of this work is not to disentangle the contributions of the striatum to movement vigor and movement consistency, but rather to provide evidence that offline periods are relevant to skill learning because they help shape the corticostriatal network, which is known to be an important network related to learning. The reach-to-grasp skill is a complex action made up of several movements that require several effectors (to reach vs. grasp), and it is not well-suited to isolating and studying the neural correlates of a specific movement feature such as reach vigor.
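
To illustrate the partial correlation described above, here is a minimal sketch using the standard first-order formula for Pearson correlations, r_xy·z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). It is an assumption that this form matches the computation used in the study; the function names and the example values are hypothetical and are not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical per-session values, invented for illustration only.
coherence = [0.20, 0.25, 0.31, 0.30, 0.42, 0.45]      # pre-training LFP coherence
consistency = [0.35, 0.40, 0.52, 0.50, 0.70, 0.72]    # velocity profile correlation
peak_velocity = [0.80, 1.10, 0.90, 1.30, 1.20, 1.50]  # mean single-trial peak velocity

# Coherence vs. consistency, controlling for peak velocity
print(partial_corr(coherence, consistency, peak_velocity))
# Coherence vs. peak velocity, controlling for consistency
print(partial_corr(coherence, peak_velocity, consistency))
```

Swapping the second and third arguments, as in the last line, gives the complementary comparison in which the behavioral measure being controlled for is exchanged.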

https://doi.org/10.7554/eLife.64303.sa2

Article and author information

Author details

  1. Stefan M Lemke

    1. Neuroscience Graduate Program, University of California, San Francisco, San Francisco, United States
    2. Neurology Service, San Francisco Veterans Affairs Medical Center, San Francisco, United States
    3. Department of Neurology, University of California, San Francisco, San Francisco, United States
    Present address
    Istituto Italiano di Tecnologia, Rovereto, Italy
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Writing - original draft, Project administration, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-1721-5425
  2. Dhakshin S Ramanathan

    Department of Psychiatry, University of California, San Diego, San Diego, United States
    Contribution
    Conceptualization, Data curation, Formal analysis, Writing - review and editing
    Competing interests
    No competing interests declared
  3. David Darevksy

    1. Neurology Service, San Francisco Veterans Affairs Medical Center, San Francisco, United States
    2. Department of Neurology, University of California, San Francisco, San Francisco, United States
    Contribution
    Conceptualization, Data curation, Formal analysis
    Competing interests
    No competing interests declared
  4. Daniel Egert

    Department of Neurology, University of California, San Francisco, San Francisco, United States
    Contribution
    Data curation, Validation, Methodology
    Competing interests
    No competing interests declared
  5. Joshua D Berke

    1. Department of Neurology, University of California, San Francisco, San Francisco, United States
    2. Weill Institute for Neurosciences and Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, United States
    Contribution
    Methodology
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-1436-6823
  6. Karunesh Ganguly

    1. Neurology Service, San Francisco Veterans Affairs Medical Center, San Francisco, United States
    2. Department of Neurology, University of California, San Francisco, San Francisco, United States
    3. Weill Institute for Neurosciences and Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, United States
    Contribution
    Conceptualization, Supervision, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    karunesh.ganguly@ucsf.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-2570-9943

Funding

Veterans Health Administration HSR and D (I01RX001640-06)

  • Karunesh Ganguly

National Institute of Mental Health (R01MH111871-04)

  • Karunesh Ganguly

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Ethics

Animal experimentation: This study was performed in strict accordance with guidelines from the USDA Animal Welfare Act and United States Public Health Service Policy. Procedures were in accordance with protocols approved by the Institutional Animal Care and Use Committee at the San Francisco Veterans Affairs Medical Center (Protocol 19-002).

Senior Editor

  1. Michael J Frank, Brown University, United States

Reviewing Editor

  1. Aryn H Gittis, Carnegie Mellon University, United States

Reviewers

  1. Eric Yttri, Janelia Farm, United States
  2. David Robbe, INSERM U1249; Aix-Marseille University, France

Publication history

  1. Received: October 24, 2020
  2. Accepted: August 9, 2021
  3. Accepted Manuscript published: September 10, 2021 (version 1)
  4. Version of Record published: September 14, 2021 (version 2)

Copyright

© 2021, Lemke et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Metrics

  • 1,105
    Page views
  • 193
    Downloads
  • 0
    Citations

Article citation count generated by polling the highest count across the following sources: Crossref, PubMed Central, Scopus.
