Figures and data

Experimental design and behavioral performance.
A. Skill learning task. Participants engaged in a procedural motor skill learning task, which required them to repeatedly type a keypress sequence, “4 − 1 − 3 − 2 − 4” (1 = little finger, 2 = ring finger, 3 = middle finger, and 4 = index finger) with their non-dominant, left hand. The Day 1 Training session included 36 trials, with each trial consisting of alternating 10 s practice and rest intervals. The rationale for this task design was to minimize reactive inhibition effects during the period of steep performance improvements (early learning) [11, 29] (see Methods). After a 24-hour break, participants were retested on performance of the same sequence (4-1-3-2-4) for 9 trials (Day 2 Retest) to inform on the generalizability of the findings over time and MEG recording sessions, as well as single-trial performance on 9 different control sequences (Day 2 Control; 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4) to inform on specificity of the findings to the learned skill. Magnetoencephalography (MEG) was recorded during both Day 1 and Day 2 sessions with a 275-channel CTF MEG system (CTF Systems, Inc., Canada). B. Skill Learning. As reported previously [1], participants on average reached 95% of peak performance by trial 11 of the Day 1 Training session (see Figure 1 – figure supplement 1A for results over all Day 1 Training and Day 2 Retest trials). At the group level, total early learning was exclusively accounted for by micro-offline gains during inter-practice rest intervals (Figure 1B, inset; F[2,75] = 14.79, p = 3.86 × 10⁻⁶; micro-online vs. micro-offline: p = 7.98 × 10⁻⁶; micro-online vs. total: p = 0.0002; micro-offline vs. total: p = 0.669). These results were not impacted by potential preplanning effects on initial skill performance [30], since alternative measurements of cumulative micro-online and -offline gains remained unchanged after omission of the first 3 keypresses in each trial from the correct sequence speed computation (paired t-tests; micro-online: t(25) = -0.0223, p = 0.982; micro-offline: t(25) = -0.879, p = 0.388). C. Keypress transition time (KTT) variability. Distribution of KTTs normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning (see Figure 1 – figure supplement 1B for results over all Day 1 Training and Day 2 Retest trials). Note the initial variability of the relative KTT composition of the sequence (i.e., 4-1, 1-3, 3-2, 2-4, 4-4), before it stabilizes in the early learning period.
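The decomposition of total learning into micro-online and micro-offline gains can be illustrated with a minimal sketch. It assumes the conventional definitions (micro-online: change in correct sequence speed from the start to the end of a practice trial; micro-offline: change from the end of one trial to the start of the next), which this caption does not spell out. The function name `micro_gains` and the speed values are hypothetical.

```python
def micro_gains(trial_speeds):
    """trial_speeds: one list per trial of correct sequence speeds, in time order.

    Assumed definitions (not taken from the caption):
      micro-online  = last speed of a trial minus first speed of that trial
      micro-offline = first speed of the next trial minus last speed of this trial
    """
    online = [t[-1] - t[0] for t in trial_speeds]
    offline = [nxt[0] - cur[-1]
               for cur, nxt in zip(trial_speeds, trial_speeds[1:])]
    return online, offline


# Hypothetical speeds (keypresses/s) for three practice trials.
speeds = [[1.0, 1.2], [1.8, 1.7], [2.3, 2.4]]
online, offline = micro_gains(speeds)

# Under these definitions the cumulative gains sum to total learning.
total = speeds[-1][-1] - speeds[0][0]
cumulative = sum(online) + sum(offline)
```

With these definitions the cumulative micro-online and micro-offline gains always add up to the total change in speed, which is why the three quantities can be compared directly at the group level.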

Spatial and oscillatory contributions to neural decoding of finger identities
A) Contribution of whole-brain oscillatory frequencies to decoding. Decoding accuracy (i.e., test sample performance) was higher for decoders trained on broadband activity than for those trained on narrow frequency band features, for both whole-brain voxel-space (74.51% ± SD 7.34%, t = 8.08, p < 0.001) and parcel-space (70.11% ± SD 7.11%, t = 13.22, p < 0.001) MEG activity. Thus, decoders trained on whole-brain broadband data consistently outperformed those trained on narrowband activity. Dots depict decoding accuracy for each participant. *p < 0.05, **p < 0.01, ***p < 0.001, n.s.: not significant. B) Whole-brain parcel-space decoding. Color-coded standard (FreeSurfer fsaverage) brain surface plot displaying the relative importance of individual brain regions (parcels) to broadband whole-brain parcel-space decoding performance (far-left light gray box plot in A). C) Whole-brain voxel-space decoding. Color-coded standard (FreeSurfer fsaverage) brain surface plot displaying the relative importance of individual voxels to broadband whole-brain voxel-space decoding performance (far-left dark gray box plot in A). D) Regional voxel-space decoding. Broadband voxel-space decoding performance for top-ranked brain regions across the group is displayed on a standard (FreeSurfer fsaverage) brain surface and color-coded by accuracy. Note that while whole-brain parcel- and voxel-space decoders relied more on information from brain regions contralateral to the engaged hand, regional voxel-space decoders performed similarly for bilateral sensorimotor regions.

Hybrid spatial approach for neural decoding during skill learning
A. Pipeline. Sensor-space MEG data (𝑁 = 272 channels) were source-localized (voxel-space features; 𝑁 = 15684 voxels), and then parcellated (parcel-space features; 𝑁 = 148) by averaging the activity of all voxels located within an individual region defined in a standard template space (Desikan-Killiany Atlas). Individual regional voxel-space decoders were then constructed and ranked. The final hybrid-space keypress state (i.e., 4-class) decoder was constructed using all whole-brain parcel-space features and top-ranked regional voxel-space features (see Methods). B. Decoding performance across parcel, voxel, and hybrid spaces. Note that decoding performance was highest for the hybrid-space approach compared to performance obtained for whole-brain voxel- and parcel-spaces. Addition of linear discriminant analysis (LDA)-based dimensionality reduction further improved decoding performance for both parcel- and hybrid-space approaches. Each dot represents accuracy for a single participant and method. “∗∗∗” indicates 𝑝 < 0.001 and “∗” indicates 𝑝 < 0.05. C. Confusion matrix of individual finger identity decoding for hybrid-space manifold features. True predictions are located on the main diagonal. Off-diagonal elements in each row depict false-negative predictions for each finger, while off-diagonal elements in each column indicate false-positive predictions. Please note that the index finger keypress had the highest false-negative misclassification rate (11.55%).
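The parcellation and hybrid feature construction in panel A can be sketched as follows. This is a simplified illustration and not the authors' pipeline: the function names, the toy time-series, and the parcel labels are hypothetical, and the ranking of regional decoders is replaced by a hand-picked set of top parcels.

```python
from collections import defaultdict


def parcellate(voxel_ts, voxel_parcels):
    """Average the time-series of all voxels assigned to the same parcel."""
    groups = defaultdict(list)
    for ts, parcel in zip(voxel_ts, voxel_parcels):
        groups[parcel].append(ts)
    # zip(*rows) iterates over time points across the voxels of one parcel.
    return {parcel: [sum(col) / len(col) for col in zip(*rows)]
            for parcel, rows in groups.items()}


def hybrid_features(voxel_ts, voxel_parcels, top_parcels):
    """All parcel-space features plus voxel features from top-ranked parcels."""
    parcel_ts = parcellate(voxel_ts, voxel_parcels)
    regional = [ts for ts, parcel in zip(voxel_ts, voxel_parcels)
                if parcel in top_parcels]
    return list(parcel_ts.values()) + regional


# Three hypothetical voxels with two time points each.
voxel_ts = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
voxel_parcels = ["precentral", "precentral", "postcentral"]

parcels = parcellate(voxel_ts, voxel_parcels)
hybrid = hybrid_features(voxel_ts, voxel_parcels, top_parcels={"precentral"})
```

Note that the parcel-space feature for a top-ranked region is kept alongside that region's voxel features, so the two spatial scales overlap in the hybrid feature set.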

Evolution of keypress neural representations with skill learning
A. Keypress neural representations differentiate during early learning. t-SNE distribution of neural representation of each keypress (top scatter plots) is shown for trial 1 (start of training; top-left), 11 (end of early learning; top-center), and 36 (end of training; top-right) for a single representative participant. Individual keypress manifold representation clustering in trial 11 (top-center; end of early learning) depicts sub-clustering for the index finger keypress performed at the two different ordinal positions in the sequence (IndexOP1 and IndexOP5), which remains present by trial 36 (top-right). Spatial distribution of regional contributions to decoding (bottom brain surface maps). The surface color heatmap indicates feature importance scores across the brain. Note that decoding contributions shifted from contralateral right pre-central cortex at trial 1 (bottom-left) to contralateral superior and middle frontal cortex at trials 11 (bottom-center) and 36 (bottom-right). B. Confusion matrix for 5-class decoding of individual sequence items. Decoders were trained to classify contextual representations of the keypresses (i.e., 5-class classification of the sequence elements 4-1-3-2-4). Note that the decoding accuracy increased to 94.15% ± SD 4.84% and the misclassification of keypress 4 was significantly reduced (from 141 to 82). C. Trial-by-trial classification accuracy for 2-class decoder (IndexOP1 vs. IndexOP5). A decoder (200 ms window duration aligned to the KeyDown event) was trained to differentiate between the two index finger keypresses embedded at different positions within the practiced skill sequence (IndexOP1 = index finger keypress at ordinal position 1 of the sequence; IndexOP5 = index finger keypress at ordinal position 5 of the sequence). Decoder accuracy progressively improved over early learning, stabilizing around 96% by trial 11 (end of early learning). Similar results were observed for other decoding window sizes (50, 100, 150, 250 and 300 ms; see Figure 4 – figure supplement 2). Taken together, these findings indicate that the neural feature space evolves over early learning to incorporate sequence location information.

Neural representation distance between index finger keypresses performed at two different ordinal positions within a sequence.
A. Contextualization increases over Early Learning during Day 1 Training. Online (green) and offline (purple) neural representation distances (contextualization) between two index finger keypresses performed at ordinal positions 1 and 5 of the trained sequence (4-1-3-2-4) are shown for each trial during Day 1 Training. Both online and offline contextualization between the two index finger representations increase sharply over Early Learning before stabilizing across later Day 1 Training trials. B. Contextualization develops predominantly during rest periods (offline) on Day 1. The cumulative neural representation differences during early learning were significantly greater over rest (Offline contextualization; right) than during practice (Online contextualization; left) periods (t = 4.84, p < 0.001, df = 25, Cohen’s d = 1.2). C. Contextualization acquired on Day 1 was retained on Day 2 specifically for the trained sequence. The neural representation differences assessed across both rest and practice for the trained sequence (4-1-3-2-4) were retained at Day 2 Retest. This is in stark contrast with the reduction in contextualization for several untrained sequences controlling for: 1) index finger keypresses located at the same ordinal positions 1 and 5 but within a different intervening sequence pattern (Pattern Specificity Control: 4-2-3-1-4, 51.05% lower contextualization); 2) use of a finger different than the index (little or ring finger) in both ordinal positions 1 and 5 (Finger Specificity Control: 2-1-3-4-2, 1-4-2-3-1 and 2-3-1-4-2; 35.80% lower contextualization); and 3) multiple index finger keypresses occurring at ordinal positions other than 1 and 5 (Position Specificity Control: 4-2-4-3-1 and 1-4-3-4-2; 22.06% lower contextualization). Note that offline contextualization cannot be measured for the Day 2 Control sequences as each sequence was only performed over a single trial.

Behavioral performance during skill learning.
A) Total Skill Learning over Day 1 Training (36 trials) and Day 2 Retest (9 trials). As reported previously [1], participants on average reached 95% of peak performance during Day 1 Training by trial 11. Note that after trial 11, performance stabilizes around a plateau through trial 36. Following a 24-hour break, participants displayed an upward shift in performance during the Day 2 Retest, indicative of an overnight skill consolidation effect. B) Keypress transition time (KTT) variability. Distribution of KTTs normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning. Note that the initial variability of the five component transitions in the sequence (i.e., 4-1, 1-3, 3-2, 2-4, 4-4) stabilizes by trial 6 in the early learning period and remains stable throughout the rest of Day 1 Training (through trial 36) and Day 2 Retest.

Oscillatory contributions at individual brain regions.
Decoding performance of regional voxel-space activity patterns within individual brain areas for broadband and each narrowband oscillatory range is displayed in the form of a heatmap for both the left and right hemisphere. Optimal decoding performance for broadband regional voxel-space decoders was obtained from bilateral superior frontal (Left: 68.77% ± SD 7.6%; Right: 67.52% ± SD 6.78%), middle frontal (Left: 63.41% ± SD 7.58%; Right: 62.78% ± SD 76.94%), pre-central (Left: 62.37% ± SD 6.32%; Right: 62.69% ± SD 5.94%), and post-central (Left: 61.71% ± SD 6.62%; Right: 61.09% ± SD 6.2%) brain regions. Superior parietal, central, paracentral, anterior-cingulate, and precuneus regions also showed broadband decoding performance exceeding 60%. With respect to decoders constructed from narrowband oscillatory input features, only Delta-band voxel-space activity from bilateral superior frontal regions achieved at least 60% decoding accuracy of keypresses.

Distribution of correlation coefficients between parcel-space time-series and their constituent voxels.
Data are shown for all subjects. Parcels represented in the regional voxel-space features of the hybrid-space decoder are marked with red vertical boxes (bilateral superior frontal, middle frontal, pre-central and post-central regions). The y-axis indicates the absolute correlation coefficients for each voxel time-series with the time-series of the parcel it is a member of (1 = complete redundancy; 0 = orthogonality). Note that while the signal in some voxels correlates strongly with the parcel-space time-series, the signal in others is fully orthogonal. That is, the degree to which information obtained at the two different spatial scales is complementary (or redundant) varies substantially over the regional voxel-space. This finding is consistent with the documented increase in correlational structure of neural activity across larger spatial scales that does not reflect perfect dependency or orthogonality [28]. The normalized cumulative distributions of parcel-to-voxel-space correlations depicted on the right show that voxels included in the hybrid-space decoder (red) are correlated less overall (two-sample Kolmogorov-Smirnov test: D = 0.2484, p < 1 × 10⁻¹⁰) with their respective parcel-space time-series relative to excluded voxels (grey).
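For readers unfamiliar with the D statistic reported above, the following minimal sketch computes the two-sample Kolmogorov-Smirnov statistic as the largest absolute difference between two empirical cumulative distributions. The correlation values are hypothetical, the function name `ks_statistic` is illustrative, and the p-value computation is omitted.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        # Fraction of each sample that is <= x (empirical CDF at x).
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical absolute voxel-to-parcel correlations.
included_voxels = [0.1, 0.2, 0.3, 0.4]   # voxels used by the hybrid decoder
excluded_voxels = [0.3, 0.4, 0.5, 0.6]   # all remaining voxels
D = ks_statistic(included_voxels, excluded_voxels)
```

A larger D means the two cumulative distributions separate more strongly at some correlation value; identical samples give D = 0.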

Contribution of whole-brain oscillatory frequencies to decoding.
Accuracy for decoders trained on four different input feature spaces (sensor, whole-brain parcel, whole-brain voxel, and hybrid, i.e., a combination of whole-brain parcel plus regional voxel features) was highest for broadband MEG activity, followed by Delta-band activity. The hybrid approach resulted in the highest decoding accuracy, regardless of whether input features were broadband or narrowband-limited. Sensor-, parcel- and voxel-space decoders displayed similar accuracy with respect to one another for broadband MEG activity, and also for all narrowband ranges assessed. Dots depict decoding accuracy for each participant. “***” indicates 𝑝 < 0.001, while “n.s.” denotes no statistical significance (i.e., 𝑝 > 0.05).

Comparison of different dimensionality reduction techniques.
Dimensionality reduction was applied to the input features for each approach (parcel-space: N = 148; voxel-space: N = 15684; hybrid-space: N = 1295) [34]. The results with principal component analysis (PCA, in green), multi-dimensional scaling (MDS, in blue), minimum redundant maximum relevance algorithm (MRMR, in red), and linear discriminant analysis (LDA, in black) are shown in comparison to performance obtained using all input features (in magenta). For parcel-space input features, all these approaches increased the mean decoding accuracy with PCA and LDA (both of which result in extraction of orthogonal features) showing statistically significant improvement (1-way ANOVA: F = 13.05, p < 0.001; post hoc Tukey tests: p = 0.032; PCA: p < 0.001; LDA: p > 0.05). For voxel-space features, there was no statistically significant improvement with any of the approaches (p > 0.05). While MRMR resulted in the largest voxel-space decoding accuracy improvement, it was not statistically significant (post hoc Tukey test: p = 0.14), and application of LDA dimensionality reduction actually reduced performance dramatically. Uniquely for hybrid-space features, all dimensionality reduction techniques improved decoding performance significantly (1-way ANOVA: F = 21.32; post hoc Tukey tests: p < 0.05), with the largest improvement observed following application of LDA. “***” indicates 𝑝 < 0.001, “**” indicates 𝑝 < 0.01, “*” indicates 𝑝 < 0.05 and “n.s.” denotes no statistical significance (i.e., 𝑝 > 0.05).

A) Example of ICA component time-series for components labeled as artefacts from a single subject during MEG data pre-processing. The features of these components are consistent with known motion and physiological artefacts in MEG data. B) 4-class confusion matrix and C) decoding performance of keypress action labels from ICA components labeled as artefacts and removed from the MEG data during pre-processing. These components failed to predict keypress labels above empirically determined chance levels (as shown by decoding performance after random label shuffling). Note that in all cases, decoding performance from movement and physiological artefacts was substantially lower than 4-class MEG hybrid-space decoding for all participants. D) Head position was assessed at the beginning and at the end of each recording and used to measure head movement. The mean measured head movement across the study group was 1.159 mm (± 1.077 SD).
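The idea of an empirically determined chance level can be illustrated with a simplified sketch. It assumes that chance is estimated by randomly permuting the true labels and scoring them against a fixed set of predictions; the analysis described above may instead have retrained the decoder on shuffled labels, which this sketch does not do. All names and data are hypothetical.

```python
import random


def shuffled_label_accuracies(y_true, y_pred, n_perm=200, seed=0):
    """Accuracy of fixed predictions against randomly permuted labels."""
    rng = random.Random(seed)
    labels = list(y_true)
    accuracies = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any label/prediction relationship
        hits = sum(a == b for a, b in zip(labels, y_pred))
        accuracies.append(hits / len(labels))
    return accuracies


# Hypothetical balanced 4-class problem with a perfect decoder.
y_true = [i % 4 for i in range(400)]
y_pred = list(y_true)

observed = sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)
null = shuffled_label_accuracies(y_true, y_pred)
chance = sum(null) / len(null)
```

For four balanced classes the shuffled-label accuracy clusters around 25%, so a decoder is informative only if its observed accuracy clearly exceeds that null distribution.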

Confusion matrices for decoding performance on Day 2 Retest (A) and Day 2 Control (B) data.
Note that the hybrid-space decoding strategy generalized to Day 2 data with 87.11% overall accuracy for keypresses embedded within the trained sequence (Day 2 Retest) and 79.44% overall accuracy for keypresses embedded within untrained control sequences (Day 2 Control).

A) Average decoding accuracies across participants with varying window parameters. The x-axis indicates the onset of the time window (in ms) used to relate MEG activity time-series to individual keypresses (i.e., KeyDown event = 0 ms), while the y-axis indicates the window duration (in ms). The heatmap color denotes the decoding accuracy for all window onset/duration pairings. The best decoding accuracy across subjects was obtained using a window duration of 200 ms with the leading edge aligned to the KeyDown event (i.e., 0 ms; marked by the dashed lines and open circle). B) Decoder window parameters (onset and duration) used for each subject in reported decoder accuracy comparisons (Figures 2-4). Please note that the group-optimal set of parameters (window onset = 0 ms; window duration = 200 ms; LDA dimensionality reduction) was utilized for all contextualization analyses (Figure 5) to allow for comparison across participants.
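The two window parameters can be made concrete with a short sketch that cuts a segment out of a single-channel time-series relative to a KeyDown event. The sampling rate of 100 Hz, the function name `extract_window`, and the toy signal are assumptions for illustration only.

```python
def extract_window(signal, sfreq, keydown_s, onset_ms=0, duration_ms=200):
    """Return the samples in [KeyDown + onset, KeyDown + onset + duration).

    signal     : sequence of samples from one channel or feature
    sfreq      : sampling rate in Hz (assumed value in the example below)
    keydown_s  : KeyDown event time in seconds
    """
    start = round((keydown_s + onset_ms / 1000) * sfreq)
    n_samples = round(duration_ms / 1000 * sfreq)
    return signal[start:start + n_samples]


# Hypothetical 10 s recording sampled at 100 Hz; sample values equal their index.
signal = list(range(1000))
window = extract_window(signal, sfreq=100, keydown_s=2.0)

# A grid of candidate onset/duration pairs, as on the heatmap axes.
grid = [(onset, duration)
        for onset in (-100, 0, 100)
        for duration in (50, 100, 150, 200, 250, 300)]
```

Each onset/duration pair in such a grid would yield one set of input features, and the heatmap reports the decoding accuracy obtained for each pair.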

Comparison of decoding performances with two different hybrid approaches.
HybridOverlap (regional voxel-space features from top-ranked parcels combined with all whole-brain parcel-space features as shown in Figures 3B and Figure 3 – figure supplements 1, 3-5 of the manuscript) and HybridNon-overlap (regional voxel-space features of top-ranked parcels and spatially non-overlapping whole-brain parcel-space features). Filled circle markers represent decoding accuracy for individual subjects. Dashed lines indicate within-subject performance changes between decoding approaches. Note that the HybridOverlap approach (the approach used in our manuscript) significantly outperforms the HybridNon-overlap approach (Wilcoxon signed rank test, z = 3.7410, p = 1.8326e-04), despite the removed features (n = 8) comprising less than 1% of the overall input feature space. These results indicate that the spatially overlapping whole-brain (lower resolution) parcel-space and regional (higher resolution) voxel-space features provide complementary, as opposed to redundant, information to the hybrid-space decoder.

Comparison of different decoder methods.
Performance for all different machine learning decoders assessed is shown for each participant. The results show that the linear discriminant analysis (LDA) classifier outperformed other methods, on average, across the group. Decoding analysis performance comparisons reported in the current study utilized the LDA decoder for all subjects.

Quantification of parcel-space trial-by-trial feature importance score during skill learning.
Parcel-space trial-by-trial changes in feature importance scores are shown for right superior frontal, middle frontal, pre-central, and post-central cortex (i.e., the contralateral regions showing the highest regional voxel-space decoding accuracy). Note that the feature importance is initially higher for the contralateral pre-central cortex in early trials before shifting towards the contralateral middle and superior frontal cortex during later trials, as can be seen with the divergence of line plots beginning around trial 11.

Trial-by-trial classification accuracy for 2-class decoder (IndexOP1 vs. IndexOP5).
Several decoders (with varying window durations aligned to the KeyDown event) were trained to differentiate between the two index finger keypresses embedded at different positions within the practiced skill sequence (IndexOP1 at ordinal position 1 vs. IndexOP5 at ordinal position 5). Decoding accuracy for the 200 ms duration windows (i.e., the optimal window size for 5-class decoding of individual keypresses) progressively improves over early learning, stabilizing around 96% by trial 11 (end of early learning). Similar results were observed for all other decoding window sizes (50, 100, 150, 250 and 300 ms), with overall accuracy slightly lower compared to 200 ms. These findings indicate that the neural representations of the skill action are updated over early learning to incorporate sequence location information.

A) Scatter plot of gaze positions at the KeyDown event and 200 ms after the KeyDown event (i.e., beginning and ending of window used for decoding keypress labels from MEG input features) from a representative participant. Transparent grey dots indicate all sampled gaze positions during practice trials. The overall mean gaze position during practice trials is indicated by the black filled circle marker. Colored right-pointing triangle markers indicate the gaze position at the KeyDown event for each ordinal position keypress (IndexOP1 – magenta; LittleOP2 – yellow; MiddleOP3 – blue; RingOP4 – green; IndexOP5 – brown), while left-pointing triangle markers indicate the gaze position 200 ms after the KeyDown event. The mean gaze position for these two time-points is indicated by the larger-sized triangle markers. On average, gaze position is largely fixed for the OP1 and OP3 keypresses, moves from left-to-right for OP2 and OP4 keypresses, and from right-to-left for OP5 keypresses (which is when the asterisk moves leftward from the last sequence item back to the first). B) Confusion matrix showing that three eye movement features fail to predict asterisk position on the task display above chance levels (Fold 1 test accuracy = 0.21718; Fold 2 test accuracy = 0.22023; Fold 3 test accuracy = 0.21859; Fold 4 test accuracy = 0.22113; Fold 5 test accuracy = 0.21373; Overall cross-validated accuracy = 0.2181). Since the ordinal position of the asterisk on the display is highly correlated with the ordinal position of individual keypresses in the sequence, this analysis provides strong evidence that keypress decoding performance from MEG features is not explained by systematic relationships between finger movement behavior and eye movements (i.e., behavioral artefacts). C) 5-class decoding of ordinal position keypress labels from eye movement recording features approached empirically determined chance levels (as shown by decoding performance after random label shuffling). Note that decoding performance from eye movement data was substantially lower than MEG hybrid-space decoding for all participants.

A) Relationship between offline neural representational changes and micro-offline learning. Offline contextualization, calculated as the Euclidean distance between the neural representations observed for the first IndexOP1 keypress from practice trial n and the last IndexOP5 keypress from practice trial n-1, increased over early learning. A linear regression analysis (shown in the inset) revealed a strong temporal relationship (correlation coefficient [r] = 0.903 and coefficient of determination [R²] = 0.816) between contextualization and cumulative micro-offline gains over early learning. B) Changes in offline contextualization for different decoding window durations as a function of rest breaks. We constructed decoders from different MEG input feature time windows (window durations of 50, 100, 150, 200, 250 and 300 ms; all aligned to the KeyDown event) to assess the robustness of the offline contextualization finding with respect to this parameter selection. Offline contextualization showed similar trends for all options tested. C) Relationship between offline neural representational changes and micro-offline learning across all window durations. The linear regression analysis from (A) was repeated for all contextualization measures from (B) obtained after varying the MEG input feature window size (50 – 300 ms). This strong temporal relationship was observed for all window durations (0.598 ≤ R² ≤ 0.816), except for 300 ms (R² = 0.284) where temporal overlap of individual keypress features was most prominent.
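The link between the two reported regression statistics can be shown with a minimal sketch: for a simple linear regression with one predictor, the coefficient of determination equals the squared Pearson correlation. The helper `pearson_r` and the data points are hypothetical and do not reproduce the study's values.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-trial values over early learning.
offline_contextualization = [1.0, 2.0, 3.0, 4.0]
cumulative_micro_offline = [1.0, 3.0, 2.0, 4.0]

r = pearson_r(offline_contextualization, cumulative_micro_offline)
r_squared = r ** 2  # equals R² for a one-predictor linear regression
```

The same relationship holds for the reported values up to rounding, since 0.903 squared is approximately 0.815.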

Trial-by-trial trends for different measurement approaches of offline and online contextualization changes.
A) Offline contextualization between the last sequence of a preceding trial and the second sequence of the subsequent one (skipping the first sequence of that trial) yielded a result comparable to the measure reported in Figure 5 and Figure 5 – figure supplement 1, which uses the first sequence. This argues against a possible confounding effect of pre-planning [30]. B) Two different measurement approaches were used to characterize online contextualization changes. The sequence-based approach calculated the mean distance between IndexOP1 and IndexOP5 for each correct sequence iteration within a trial (green). A second trial-based approach was also implemented, which controlled for the passage of time between observations used in both online and offline distance measures (10 seconds between IndexOP1 and IndexOP5 observations in both cases). Note that the trial-based approach showed no increase in online contextualization over early learning. Importantly, the overall magnitude of online contextualization by the end of early learning was similar for both measurement approaches, and both showed reduced online relative to offline contextualization.
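The offline distance measure and the sequence-based online measure can be written down as a minimal sketch. It assumes each trial is stored as two time-ordered lists of feature vectors, one per index finger ordinal position, with one entry per correct sequence iteration. The data layout, function names, and two-dimensional toy vectors are hypothetical, and the trial-based online variant is not implemented here.

```python
import math


def offline_contextualization(trials):
    """Distance between the first IndexOP1 of trial n and the last IndexOP5 of trial n-1."""
    return [math.dist(cur["op1"][0], prev["op5"][-1])
            for prev, cur in zip(trials, trials[1:])]


def online_contextualization(trials):
    """Sequence-based measure: mean IndexOP1-IndexOP5 distance within each trial."""
    result = []
    for trial in trials:
        distances = [math.dist(a, b)
                     for a, b in zip(trial["op1"], trial["op5"])]
        result.append(sum(distances) / len(distances))
    return result


# Two hypothetical trials with 2-D neural feature vectors.
trials = [
    {"op1": [[0.0, 0.0], [0.0, 0.0]], "op5": [[3.0, 4.0], [0.0, 1.0]]},
    {"op1": [[3.0, 5.0]], "op5": [[3.0, 5.0]]},
]
offline = offline_contextualization(trials)
online = online_contextualization(trials)
```

Because the offline measure spans a rest interval, it yields one value fewer than the number of trials, while the online measure yields one value per trial.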

Online contextualization versus micro-online learning.
The relationship between online contextualization and online learning is shown for both sequence- (A, left) and trial-based (B, right) distance measurement approaches. There was no significant relationship between online learning and online contextualization regardless of the measurement approach.

Within-subject correlations between online and offline contextualization changes versus learning.
Pirate plots displaying individual subject correlation coefficients for offline (i.e., over rest) and online (i.e., during practice) contextualization changes versus micro-offline and -online performance gains. Within-subject correlations were significantly greater for offline contextualization changes versus micro-offline performance gains than for online contextualization changes versus either micro-offline or -online performance gains. The average correlation between offline contextualization and micro-offline gains within individuals was significantly greater than zero (left; t = 3.87, p = 0.00035, df = 25, Cohen’s d = 0.76) and stronger than correlations between online contextualization and either micro-online (middle; t = 3.28, p = 0.0015, df = 25, Cohen’s d = 1.2) or micro-offline gains (right; t = 3.7021, p = 5.3013e-04, df = 25, Cohen’s d = 0.69).

Online versus offline changes in keypress transition patterns.
A) Trial-by-trial Euclidean distance between the relative share of each keypress transition time to the full sequence duration (i.e., differences in typing rhythm). This distance was calculated for the first and last sequence of each trial (online pattern distance; green) and the last sequence of a trial versus the first sequence of the next (offline pattern distance; purple). B) Cumulative online (green; left) and offline (purple; right) pattern distances recorded over all forty-five trials covering Days 1 and 2. Note that the comparable online and offline typing rhythm changes do not explain differences between online and offline contextualization, which is fully developed by trial 11 (Figure 5).
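The typing rhythm distance in panel A can be sketched as follows, assuming each sequence iteration is represented by its five keypress transition times. The function names and the transition times (in ms) are hypothetical.

```python
import math


def rhythm(ktts):
    """Relative share of each keypress transition time in the full sequence duration."""
    total = sum(ktts)
    return [k / total for k in ktts]


def pattern_distance(ktts_a, ktts_b):
    """Euclidean distance between two typing rhythms."""
    return math.dist(rhythm(ktts_a), rhythm(ktts_b))


def pattern_distances(trials):
    """Online: first vs. last sequence of a trial. Offline: last vs. next trial's first."""
    online = [pattern_distance(t[0], t[-1]) for t in trials]
    offline = [pattern_distance(prev[-1], cur[0])
               for prev, cur in zip(trials, trials[1:])]
    return online, offline


# Two hypothetical trials, each a list of sequence iterations (5 KTTs in ms).
trials = [
    [[400, 100, 100, 100, 100], [100, 100, 100, 100, 100]],
    [[200, 200, 200, 200, 200], [200, 200, 200, 200, 200]],
]
online, offline = pattern_distances(trials)
```

Normalizing by the full sequence duration makes the measure insensitive to overall speed: typing the same rhythm twice as slowly gives a distance of zero, as in the offline value of this example.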

Relationship between adjacent index finger transitions and online contextualization.
Scatter plot showing the sum of adjacent index finger keypress transition times (i.e., the 4-4 transition at the conclusion of one sequence iteration and the 4-1 transition at the beginning of the next sequence iteration) versus online contextualization distances measured during practice trials. Both the keypress transition times and the online contextualization scores were z-score normalized within individual subjects and then concatenated into a single data superset. A simple linear regression between the keypress transition time predictor and the online contextualization response variable showed a very weak linear relationship between the two (R² = 0.00507, F[1,3202] = 16.3). This result shows that contextualization of index finger representations does not reflect the amount of overlap between adjacent keypresses.

Between-subject differences in typing speed versus online contextualization.
A) Between-subject relationship between plateau performance speed and online contextualization. The plateau performance typing speed showed no significant relationship with the degree of online contextualization (R² = 0.028, p = 0.41). Each dot represents the maximum speed attained and the corresponding degree of contextualization of each participant. Thus, the magnitude of online contextualization was not dependent on how fast individuals could perform the task at the end of early learning. B) Trial-by-trial relationship between typing speed and degree of online contextualization. We also performed a trial-by-trial regression analysis that related the degree of online contextualization for each trial with the median typing speed for that trial. The R² values obtained for regression analyses performed on individual trials were also low, and not statistically significant (mean R² = 0.06; p > 0.05).