Introduction

Motor learning is required to perform a wide array of basic daily living activities, intricate athletic endeavors, and professional skills. Whether learning to type more quickly on a keyboard1, to improve one’s tennis game2, or to play a piece of music on the piano3 – all of these skills require the ability to execute sequences of actions with precise temporal coordination. Action sequences thus form the building blocks of fine motor skills4. Acquisition of a new skill results in rapid performance improvements during the initial practice session (i.e. – early learning), which are up to four times larger in magnitude than the offline performance improvements reported following overnight sleep1,5.

The neural representation of a sequential skill binds discrete individual actions (e.g. - single piano keypress) into complex, temporally and spatially precise sequence representations (e.g. - a refrain from a piece of music)6,7. The neural representation or manifold8 of sequential skills has been characterized in humans3,9,10. Previous fMRI reports showed that the representation of individual motor sequence actions may remain relatively stable over days or weeks of practice, after a memory is formed11. However, it is not known if individual sequence action representations differentiate or remain stable during early skill learning, when the most prominent performance improvements occur and the memory is not yet fully formed1. Furthermore, it is not known if the neural representation of the same action executed at different contextual locations within a skill sequence changes as a function of learning progression, an issue crucial to the robustness of BCI applications.

Investigating the contextualization of discrete action representations at a millisecond level is challenging since both the individual actions and the skill sequence they are embedded within are concurrently represented in changing neural activity dynamics during learning10,12. To address this problem, we constructed a series of decoders aimed at predicting keypress actions from magnetoencephalographic (MEG) neural activity, dependent upon the learning state and the local sequence context (ordinal position) in which the keypress action is performed. Implementing this novel approach allowed us to determine that individual sequence actions are indeed contextualized during early skill learning, resulting in differentiation of their neural representations, and that the degree of this differentiation predicts skill gains. This representational contextualization developed predominantly during rest rather than during practice intervals, in parallel with rapid consolidation of skill.

Results

Participants engaged in a well characterized sequential skill learning task1,5,13 that involved repetitive typing of a sequence (4-1-3-2-4) performed with their (non-dominant) left hand over 36 trials with alternating periods of 10s practice and 10s rest (inter-practice rest; Day 1 Training; Figure 1A). Individual keypress times and finger keypress identities were recorded and used to quantify skill as the correct sequence speed (keypresses/s)1.
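For illustration, the skill measure described above (correct sequence speed in keypresses/s) can be sketched as follows. This is a minimal reimplementation assuming that correct keypresses are counted as non-overlapping matches of the target sequence within a 10 s practice window; the study's exact scoring rules may differ.

```python
def correct_sequence_speed(keys, target=(4, 1, 3, 2, 4), window=10.0):
    """Skill metric sketch: correctly typed keypresses per second within
    one practice window. `keys` is the ordered list of keypress identities
    recorded during the window."""
    keys = list(keys)
    n = len(target)
    n_correct = 0
    i = 0
    while i + n <= len(keys):
        if tuple(keys[i:i + n]) == target:
            n_correct += n
            i += n          # consume the matched sequence iteration
        else:
            i += 1          # skip past an error keypress
    return n_correct / window
```

For example, two error-free sequence iterations in a 10 s window yield 10 correct keypresses, i.e. a speed of 1.0 keypresses/s.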

Experimental design and behavioral performance

A. Skill learning task. Participants engaged in a procedural motor skill learning task, which required them to repeatedly type a keypress sequence, “4 − 1 − 3 − 2 − 4” (1 = little finger, 2 = ring finger, 3 = middle finger, and 4 = index finger) with their non-dominant, left hand. The Day 1 Training session included 36 trials, with each trial consisting of alternating 10s practice and rest intervals. After a 24-hour break, participants were retested on performance of the same sequence (4-1-3-2-4) for 9 trials (Day 2 Retest) as well as single-trial performance on 9 different sequences (Day 2 Control; 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4). Magnetoencephalography (MEG) was recorded during both Day 1 and Day 2 sessions with a 275-channel CTF system (CTF Systems, Inc., Canada). B. Skill Learning. As reported previously1, participants on average reached 95% of peak performance by trial 11 of the Day 1 Training session (see Figure 1—figure supplement 1A for results over all Day 1 Training and Day 2 Retest trials). At the group level, total early learning was exclusively accounted for by micro-offline gains during inter-practice rest intervals (Figure 1B inset). C. Keypress transition time (KTT) variability. Distribution of KTTs normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning (see Figure 1—figure supplement 1B for results over all Day 1 Training and Day 2 Retest trials). Note the initial variability of the relative KTT composition of the sequence (i.e. – 4-1, 1-3, 3-2, 2-4, 4-4), before it stabilizes by trial 6 in the early learning period.

Participants reached 95% of maximal skill (i.e. - Early Learning) within the initial 11 practice trials (Figure 1B), with improvements developing over inter-practice rest periods (micro-offline gains) accounting for almost all of total learning across participants (Figure 1B inset)1. In addition to the reduction in sequence duration during early learning, the relative duration of component keypress transitions displayed increased temporal regularity across sequence iterations (Figure 1C). On the following day, participants were retested on performance of the same sequence (4-1-3-2-4) over 9 trials (Day 2 Retest), as well as single-trial performance on 9 different untrained sequences (Day 2 Controls: 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4). As expected, an upward shift in performance of the trained sequence (0.68 ± SD 0.56 keypresses/s; t = 7.21, p < 0.001) was observed during Day 2 Retest, indicative of an overnight skill consolidation effect for the trained sequence only (Figure 1—figure Supplement 1C).
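The decomposition of early learning into within-trial and rest-period components referenced above can be sketched in a few lines. This follows the standard definitions (micro-online gain = speed change within a practice trial; micro-offline gain = speed change across the intervening rest period); the variable names and input format here are illustrative assumptions, not the study's actual analysis code.

```python
import numpy as np

def micro_gains(first_speed, last_speed):
    """Decompose learning into micro-online and micro-offline gains.
    `first_speed[t]` / `last_speed[t]` are tapping speeds (keypresses/s)
    at the start and end of practice trial t, respectively."""
    first = np.asarray(first_speed, dtype=float)
    last = np.asarray(last_speed, dtype=float)
    micro_online = last - first             # gain within each practice trial
    micro_offline = first[1:] - last[:-1]   # gain across each rest period
    return micro_online, micro_offline
```

By construction, the summed micro-online and micro-offline gains equal the total change in speed from the start of the first trial to the end of the last, so the two components partition total learning.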

Keypress actions are represented in multi-scale hybrid-space manifolds

To investigate the possibility of contextualization of action representations, we constructed a set of decoders to predict keypress actions from MEG activity as a function of both the learning state and the ordinal position of the keypress within the sequence. We first characterized the spectral and spatial features of keypress state representations by comparing performance of decoders constructed around broadband (1-100Hz) or narrowband [delta- (1-3Hz), theta- (4-7Hz), alpha- (8-14 Hz), beta- (15-24 Hz), gamma- (25-50Hz) and high gamma-band (51-100Hz)] MEG oscillatory activity. We found that decoders trained on broadband data consistently outperformed those trained on narrowband activity, and that whole-brain parcel- (148 brain regions or parcels) or voxel-space (15684 voxels)14 decoders exhibited greater accuracy (parcel: t = 1.89, p = 0.035; voxel: t = 7.18, p < 0.001) than individual voxel-space intra-parcel decoders (Figure 2). Specifically, while the highest performing intra-parcel decoders predicted keypresses with up to 68.77% (± SD 7.6%) accuracy, whole-brain inter-parcel decoder accuracy reached 74.51% (± SD 7.34%) (Figure 2, also see Figure 2 – figure supplement 1).
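A minimal sketch of the narrowband feature extraction underlying the decoder comparison above is shown below, using FFT-based band power as a stand-in for the study's filtering pipeline (an assumption; the band edges follow the text, but the actual spectral estimation method may differ).

```python
import numpy as np

# Frequency bands (Hz) as defined in the text
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 14),
         "beta": (15, 24), "gamma": (25, 50), "high_gamma": (51, 100),
         "broadband": (1, 100)}

def band_power(x, fs, band):
    """Mean spectral power of signal `x` (..., n_samples) within `band`.
    Illustrative narrowband feature; real pipelines typically use
    bandpass filtering or multitaper spectral estimates instead."""
    freqs = np.fft.rfftfreq(x.shape[-1], 1.0 / fs)
    spec = np.abs(np.fft.rfft(x, axis=-1)) ** 2
    lo, hi = band
    mask = (freqs >= lo) & (freqs <= hi)
    return spec[..., mask].mean(axis=-1)
```

For instance, a 10 Hz oscillation sampled at 200 Hz yields much larger alpha-band power than beta-band power, which is the kind of band-specific feature the narrowband decoders were trained on.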

Spatial and oscillatory contributions to neural decoding of finger identities

A. Contribution of whole brain oscillatory frequencies to decoding. When trained on broadband activity relative to narrow frequency band features, decoding accuracy (i.e. - test sample performance) was highest for whole-brain voxel- (74.51% ± SD 7.34%, t = 8.08, p < 0.001) and parcel-space (70.11% ± SD 7.11%, t = 13.22, p < 0.001) MEG activity. Thus, decoders trained on whole-brain broadband data consistently outperformed those trained on narrowband activity. Dots depict decoding accuracy for each participant. *p < 0.05, **p < 0.01, ***p < 0.001, ns: not significant. B. Contribution of intra-parcel brain oscillatory frequencies to decoding. Performance of the top performing decile of intra-parcel decoders (see Methods, Figure 2—figure supplement 1) over different frequency bands (top panel; decoding accuracy is color coded and listed for each parcel and frequency band assessed). Note that broadband activity resulted in the best accuracy, closely followed by delta band activity. Broadband intra-parcel decoding accuracy mapped to a standard brain surface (FreeSurfer fsaverage brain) and color-coded by accuracy (bottom panel). Note that bilateral superior frontal cortex (yellow, 69%) contributed the most to decoding accuracy, followed by middle frontal, pre- and post-central regions (> 60%).

Next, we asked if the combination of intra- and inter-parcel activity patterns, constituting a multi-scale hybrid space, further improves keypress predictions. We constructed hybrid space decoders (N = 1295 ± 20 features) combining all inter-parcel activity with top-ranked intra-parcel features (covering 8 regions, see Methods, Figure 3A). Decoder accuracy was higher for hybrid (78.15% ± SD 7.03% accuracy and a weighted mean F1 score of 0.78 ± SD 0.07) than for voxel- (74.51% ± SD 7.34%; paired t-test: t = 6.30, p < 0.001) and parcel- (70.11% ± SD 7.48%; paired t-test: t = 12.08, p < 0.001) spaces (Figure 3B, Figure 3 – figure supplement 1). Thus, a multi-scale hybrid-space representation best characterizes keypress action manifolds.
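The hybrid feature construction described above can be sketched as a simple concatenation of whole-brain parcel averages with the raw voxel features of the top-ranked parcels. This is an illustrative reduction (the parcel ranking itself, done via per-parcel decoder accuracy, is assumed to have happened elsewhere), not the study's actual implementation.

```python
import numpy as np

def hybrid_features(voxel_data, parcel_labels, top_parcels):
    """Build hybrid-space features.

    voxel_data    : (n_samples, n_voxels) source-localized activity
    parcel_labels : (n_voxels,) parcel index assigned to each voxel
    top_parcels   : parcel indices whose individual voxels are retained
    """
    parcels = np.unique(parcel_labels)
    # Inter-parcel features: mean voxel activity per parcel (parcel space)
    parcel_feats = np.stack(
        [voxel_data[:, parcel_labels == p].mean(axis=1) for p in parcels],
        axis=1)
    # Intra-parcel features: raw voxels from the top-ranked parcels only
    keep = np.isin(parcel_labels, list(top_parcels))
    return np.hstack([parcel_feats, voxel_data[:, keep]])
```

With 148 parcels plus the voxels of 8 top-ranked regions, this construction yields a feature count on the order of the ~1295 hybrid features reported in the text.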

Hybrid spatial approach for neural decoding during skill learning A. Pipeline

Sensor-space MEG data (N = 272 channels) were source-localized (voxel-space features; N = 15684 voxels), and then parcellated (parcel-space features; N = 148) by averaging the activity of all voxels located within an individual region defined in a standard template space (Desikan-Killiany Atlas). Individual voxel-space regional decoders (intra-parcel decoders) were then constructed and ranked. The final hybrid-space keypress state (i.e. – 4-class) decoder was constructed using all parcel-space and top-ranked intra-parcel voxel input features (see Methods). B. Decoding performance across parcel, voxel, and hybrid spaces. Note that decoding performance was highest for the hybrid space approach compared to performance obtained for whole-brain voxel- and parcel-spaces. Addition of linear discriminant analysis (LDA)-based dimensionality reduction further improved decoding performance for both parcel- and hybrid-space approaches. Each dot represents accuracy for a single participant and method. “∗∗∗” indicates p < 0.001 and “∗” indicates p < 0.05. C. Confusion matrix of individual finger identity decoding for hybrid-space manifold features. True predictions are located on the main diagonal. Off-diagonal elements in each row depict false-negative predictions for each finger, while off-diagonal elements in each column indicate false-positive predictions. Please note that the index finger keypress had the highest misclassifications (n = 141 or 47.5% of all prediction errors).

We implemented several dimensionality reduction or manifold learning strategies including principal component analysis (PCA), multi-dimensional scaling (MDS), minimum redundant maximum relevance (MRMR), and linear discriminant analysis (LDA)15 to map the input feature (parcel, voxel or hybrid) space to a low-dimensional latent space or manifold8. The LDA-based manifold representation performed best, improving keypress decoding accuracy to 90.47% ± SD 4.84% (Figure 3B; weighted mean F1 score = 0.91 ± SD 0.05). LDA dimensionality reduction also improved decoder performance in parcel space (82.95% ± SD 5.48%) but reduced it in voxel space (40.38% ± SD 6.78%; also see Figure 3 – figure supplement 2). Notably, decoding associated with index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest number of misclassifications of all digits (N = 141 or 47.5% of all decoding errors; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed in a different learning state or sequence context location.
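For reference, a minimal Fisher LDA projection of the kind used above can be written with numpy alone. This is a bare-bones sketch (the study presumably used a standard, regularized LDA implementation); the small ridge term added to the within-class scatter is an assumption for numerical stability.

```python
import numpy as np

def lda_project(X, y, n_components=3):
    """Project features X (n_samples, n_features) onto the top LDA
    directions that maximize between-class relative to within-class
    scatter, given class labels y."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    n_feat = X.shape[1]
    Sw = np.zeros((n_feat, n_feat))   # within-class scatter
    Sb = np.zeros((n_feat, n_feat))   # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all)[:, None]
        Sb += len(Xc) * (d @ d.T)
    # Small ridge on Sw avoids singularity (illustrative regularization)
    evals, evecs = np.linalg.eig(
        np.linalg.solve(Sw + 1e-6 * np.eye(n_feat), Sb))
    order = np.argsort(evals.real)[::-1][:n_components]
    return X @ evecs.real[:, order]
```

Because LDA is supervised, the resulting manifold axes are explicitly oriented to separate keypress classes, which is consistent with it outperforming the unsupervised PCA/MDS alternatives reported above.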

We assessed the robustness of this hybrid strategy for decoding actions during skill learning over multiple sessions by applying it to data collected on the following day during the Day 2 Retest (9-trial retest of the same trained sequence) and Day 2 Control (single-trial performance of 9 different untrained sequences) blocks. The decoding accuracy for Day 2 MEG data remained high (87.11% ± SD 8.54% for the trained sequence during Retest, and 79.44% ± SD 5.54% for the untrained Control sequences; see confusion matrices in Figure 3 – figure supplement 3). This indicates that the hybrid decoding strategy reliably decodes keypress actions during skill learning and that the neural representations of the learned sequence remained stable on the following day. Although keypress state decoding performance was lower for untrained Control sequences relative to the trained sequence, greater accuracy of the hybrid approach over purely parcel- or voxel-space input features was still observed. Thus, this hybrid approach allows robust sequential finger movement decoding across multiple days and sequences.

Inclusion of keypress sequence context location optimized decoding performance

Next, we tracked the trial-by-trial evolution of keypress action manifolds as training progressed. Keypress neural representations were observed to progressively differentiate during early learning (Figure 4). A representative example in Figure 4A (top row) depicts progressive representational clustering of four-digit representations from trials 1, 11, and 36. The spatial representation of these clusters changed over the course of training, from predominant involvement of pre-central areas in trial 1 to post-central, superior, and middle frontal cortex contributions in later trials 11 and 36 (Figure 4A, bottom row). Thus, a shift in activity from pre-central areas to post-central, superior, and middle frontal cortex paralleled improvements in decoding performance (also see Figure 4 – figure supplement 1 for trial-by-trial quantitative feature importance score changes during skill learning).

Evolution of Keypress Neural Representations with Skill Learning

A. Keypress neural representations differentiate during early learning. t-SNE distribution of the neural representation of each keypress (top scatter plots) is shown for trial 1 (start of training; top-left), 11 (end of early learning; top-center), and 36 (end of training; top-right) for a single representative participant. Individual keypress manifold representation clustering in trial 11 (top-center; end of early learning) depicts sub-clustering for the index finger keypress performed at the two different ordinal positions in the sequence (IndexOP1 and IndexOP5), which remains present by trial 36 (top-right). Spatial distribution of regional contributions to decoding (bottom brain surface maps). The surface color heatmap indicates feature importance scores across the brain. Note that decoding contributions shifted from right pre-central cortex at trial 1 (bottom-left) to superior and middle frontal cortex at trials 11 (bottom-center) and 36 (bottom-right). B. Confusion matrix for 5-class decoding of individual sequence items. Decoders were trained to classify contextual representations of the keypresses (i.e., 5-class classification of the sequence elements 4-1-3-2-4). Note that the decoding accuracy increased to 94.15% ± SD 4.84% and the misclassification of keypress 4 was significantly reduced (from 141 to 82). C. Trial-by-trial classification accuracy for 2-class decoder (IndexOP1 vs. IndexOP5). A decoder trained to differentiate between the two index finger keypresses embedded at different positions (IndexOP1 at ordinal position 1 vs. IndexOP5 at ordinal position 5) within the trained sequence becomes progressively more accurate over early learning, stabilizing around 96% by trial 12 (end of early learning). Taken together, these findings indicate that the neural feature space evolves over early learning to incorporate sequence location information.

Correct performance of the training sequence required pressing the index finger twice (4-1-3-2-4) at two contextually different ordinal positions (sequence positions 1 and 5). Inclusion of sequence location information (i.e. – sequence context) for each keypress action (five sequence elements, with one keypress represented twice at two different locations) improved decoding accuracy (t = 7.09, p < 0.001, Figure 4B) from 90.47% (± SD 3.44%) to 94.15% (± SD 4.84%; weighted mean F1 score: 0.94), and reduced overall misclassifications by 45.7% (from 219 to 119; Figure 3C and 4B). This is supported by greater differentiation in neural representations of the two different index finger keypresses embedded at different points within the sequence (Figure 4A), which leads to a trial-by-trial increase in 2-class decoding accuracy (Figure 4C). Inclusion of contextual information in the hybrid-space decoder also increased classification accuracy of Day 2 Retest trained sequence data (from 87.11% for 4-class to 90.22% for 5-class). As expected, contextualized 5-class decoding of Day 2 Control MEG data for untrained sequences performed at or below chance levels (≤ 30.22% ± SD 0.44%) due to varied ordinal positions of keypress actions across the different sequences. Thus, inclusion of contextual information in the decoding framework only improves decoding accuracy for skill sequences which have been learned through practice.
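The relabeling that turns the 4-class finger decoding problem into the 5-class contextual problem can be sketched as follows. The (finger, ordinal position) label format is an illustrative assumption; the key point is that the duplicated index-finger press (digit 4 at positions 1 and 5) receives two distinct class labels.

```python
def contextual_labels(keypresses, sequence=(4, 1, 3, 2, 4)):
    """Relabel keypresses with their ordinal position in the trained
    sequence. Returns (finger, ordinal_position) pairs; error keypresses
    that break the sequence get no context (None). Sketch only."""
    labels = []
    pos = 0
    for k in keypresses:
        if k == sequence[pos % len(sequence)]:
            labels.append((k, pos % len(sequence) + 1))
            pos += 1
        else:
            labels.append((k, None))  # out-of-sequence keypress
    return labels
```

Applied to one correct iteration of 4-1-3-2-4, the two index-finger presses come out as the distinct classes (4, 1) and (4, 5), which is exactly the distinction the 5-class decoder exploits.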

Neural representation of keypress sequence location diverged during early skill learning

Finally, we used a Euclidean distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. - an index-finger keypress) executed within different local sequence contexts (i.e. - ordinal position 1 vs. ordinal position 5). Note that the neural representation manifolds (i.e. - the reduced dimension hybrid-space features extracted from broadband MEG data) are the features that resulted in the best decoding performance (see Methods, Figure 4 – figure supplement 2).

The Euclidean distance between keypress sequence location manifolds of the index finger in its two ordinal positions increased progressively during both rest and practice periods of early learning before stabilizing (Figure 5A). Change in this Euclidean distance was more prominent during rest than during practice periods (t = 4.84, p < 0.001; Figure 5B), similar to early learning gains (Figure 1B), and strongly predicted cumulative micro-offline gains (r = 0.90, R2 = 0.82, p < 0.001; Figure 5 - figure supplement 1). Importantly, the differentiation in neural representations was not explained by differences in keypress transition pattern regularity (within-subject; t = −0.03, p = 0.976; Figure 5 – figure supplement 2) or typing speed (between-subject; R2 = 0.028, p = 0.41; Figure 5 – figure supplement 3). These neural representation manifold differences remained consistent on the next day test session for the trained sequence, significantly higher than those for the untrained sequences (Figure 5C). The effect of contextualization also remained stable on Day 2 Retest – at the same time that an upward shift in performance speed and slight decrease in 4-class decoding accuracy was observed – and was substantially greater than contextualization observed for groups of untrained sequences (Figure 5C), indicating specificity for sequence pattern, keypress finger and ordinal position.
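A simple proxy for this contextualization measure is the Euclidean distance between class centroids in the low-dimensional manifold. The sketch below makes that concrete under the assumption that each sample is a reduced hybrid-space feature vector labeled by keypress class (the label strings are hypothetical); the paper's exact computation may differ.

```python
import numpy as np

def contextualization_distance(Z, labels, a="Index_OP1", b="Index_OP5"):
    """Euclidean distance between the centroids of two keypress classes
    in the manifold Z (n_samples, n_dims). Larger values indicate more
    differentiated (contextualized) representations."""
    Z = np.asarray(Z, dtype=float)
    labels = np.asarray(labels)
    centroid_a = Z[labels == a].mean(axis=0)
    centroid_b = Z[labels == b].mean(axis=0)
    return float(np.linalg.norm(centroid_a - centroid_b))
```

Tracking this quantity trial by trial, separately for samples drawn from practice versus rest periods, reproduces the kind of online/offline contextualization curves shown in Figure 5A.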

Neural representation distance between index finger keypresses performed at two different ordinal positions within a sequence

A. Contextualization increases over Early Learning during Day 1 Training. Online (Practice; green line) and offline (Rest; magenta line) neural representation distances between two index finger key presses performed at ordinal positions 1 and 5 of the trained sequence (4-1-3-2-4) are shown for each trial during Day 1 Training. Both online and offline distances between the two index finger representations increase sharply over the Early Learning before stabilizing across later Day 1 Training trials. B. Contextualization primarily occurs offline during short inter-practice rest periods. The neural representation difference was significantly greater when assessed offline (right distribution; purple) versus online (left distribution; green) periods (t = 4.84, p < 0.001). C. Contextualization was retained after 24 hours and was specific to the trained sequence. The neural representation differences assessed across both rest and practice for the trained sequence (4-1-3-2-4) were retained for Day 2 Retest. Further, contextualization was significantly reduced for several untrained sequences controlling for: 1) index finger keypresses located at the same ordinal positions 1 and 5 but with a different intervening sequence pattern (Pattern Specificity Control: 4-2-3-1-4); 2) both ordinal 1 and 5 position keypresses performed with either the little or ring finger instead of the index finger (Finger Specificity Control: 2-1-3-4-2, 1-4-2-3-1 and 2-3-1-4-2); and 3) multiple index finger keypresses occurring at ordinal positions other than 1 and 5 (Position Specificity Control: 4-2-4-3-1 and 1-4-3-4-2). The mean online neural representation distance, or degree of contextualization, was substantially lower for the untrained control sequences (51.05% lower for the Pattern Specificity Control sequence, 35.80% lower for the Finger Specificity Control sequences, and 22.06% lower for the Position Specificity Control sequences) compared with the trained sequence. 
Note that offline contextualization cannot be measured for the Day 2 Control sequences as each sequence was only performed over a single trial.

Discussion

The main findings of this study were that individual sequence action representations differentiate during early skill learning in a manner reflecting the local sequence context in which they are performed, and that the degree of representational differentiation, particularly prominent during rest intervals, predicts skill gains.

Optimizing decoding of sequential finger movements from MEG activity

The initial phase of this study involved optimizing the accuracy of decoding individuated finger keypresses from MEG brain activity. The decoding of individual finger movement execution or intention is a fundamental part of many translational BCI applications16–20. For example, decoding of virtual keypresses can be interfaced with applications for controlling motorized wheelchairs, writing and sending emails, or even controlling robotic hands or exoskeletons21, and is currently being evaluated in patients with spinal cord and brain lesions22,23. Once these neural signals are characterized, they can be translated into meaningful commands or actions21,24. However, achieving robustness in clinical applications necessitates enhancements in state-of-the-art decoding tools, with a particular focus on improving decoding accuracy18,20,25.

In the present study, we developed a novel hybrid-space decoder that captured the evolving neural representational dynamics over multiple spatial resolutions. This approach combined (1) parcel-space estimates from whole-brain activity and (2) voxel-space estimates from brain regions containing most keypress-related brain activity (see Methods). Using this approach, we achieved keypress state decoding accuracy that exceeded 90%, a substantial improvement over accuracies reported before with either non-invasive neuroimaging (i.e. – 43% to 77% for fMRI, MEG, and EEG)13,18,26–28 or invasive recording decoding of imagined movements (i.e. – 77% to 86% for ECoG and micro-array implants)29,30. Given these findings, the improvement in keypress decoding accuracy from Figure 2 (parcel/voxel) to Figure 3 (parcel/voxel/hybrid with dimensionality reduction) is likely explained by the capability of the hybrid-space architecture to capture both lower spatially resolved whole-brain and higher spatially resolved regional activity patterns that contribute unique skill-related information to the sequence-embedded keypress state representations. Importantly, this approach allowed accurate decoding during skill acquisition before a performance plateau was reached. It is possible that the ability to decode finger movements in the midst of substantial trial-by-trial performance changes as shown here could improve robustness of BCI applications in neurorehabilitation17,31. Accurate estimation of finger movement representations as they evolve during skill learning should also improve detection of neural replay events during wakeful rest or sleep, and enhance investigations of the role replay has in supporting skill consolidation13.

Best decoding performance was obtained from activity in the bilateral superior and middle frontal cortex and the pre- and post-central regions, all known contributors to skill learning12,32. Low-frequency oscillations (LFOs) contributed the most to decoding accuracy, consistent with previous reports8,33. LFOs, present during movement onset in the cerebral cortex of animals34,35 and humans36–38, encode information about movement trajectories and velocity34,35. They also contain information related to movement timing39–41, preparation42,43, sensorimotor integration37, kinematics42,43 and may contribute to the precise temporal coordination of movements required for sequencing44. Within clinical contexts, LFOs in the frontoparietal regions that were the main contributors to decoding performance in the present study are related to recovery of motor function after brain lesions like stroke36,39,45,46. Our results, in conjunction with those discussed above, suggest that LFOs contain crucial information relevant to decoding of sequence-embedded skill actions.

Neural representations of individual sequence actions are contextualized during early skill learning

Previous work decoded actions from human neural activity when performed in isolation (e.g. individuated finger movement decoding)8,26,27,29,30,47–50 or embedded within previously learned sequences28,51 near stable upper performance limits. To our knowledge, decoding of individual sequence-embedded actions has not been attempted or reported within the context of the rapid performance improvements that characterize early skill learning. We addressed this gap in knowledge by investigating different decoding strategies and identified the approach that rendered the best decoding performance, which in this case was approximately 94% (5-class hybrid-space decoder). We then applied this strategy to investigate neural representations of sequential individual finger movements during the skill learning task. We found that improved 5-class decoding accuracy was linked to trial-by-trial increases in the differentiation between index finger keypresses embedded within the sequence at two different locations throughout early learning (Figure 4C), which occurred in parallel to skill performance improvements. Feature importance scores were initially dominated by the pre-central cortex during early learning, followed by an intermediate shift towards post-central cortex by trial 11, which is consistent with the known role of the sensorimotor cortex in early skill acquisition6. Concurrently, the superior frontal and then middle frontal cortex continued to increase in importance over the Day 1 Training session and represented the two most important features between trials 15 and 36, after the rapid initial performance gains stabilized. This is consistent with previous findings that activity patterns within these regions represent hierarchical structures of skill sequences10.

Neural representations are modified by experience and practice32. For example, fMRI studies showed that weeks-long practice of a keyboard task results in representational changes of the skill (finger sequence), but not of the individual sequence action components (individual finger movements), in the primary motor cortex10. An MEG study showed that previously learned individual sequence-embedded action (i.e. – finger movement) representations display an ordered queuing gradient when preparing to generate a previously learned sequence28. Thus, representations of previously learned skills following prolonged practice over days and weeks appear to reflect some combination of invariant features of individual action components with higher order representations of the action sequence.

Less is known about changes in neural representations during the rapid performance improvements that characterize early skill learning. MEG, which possesses millisecond temporal resolution, is the tool of choice to investigate this short early learning phase52. In the present study, we leveraged this feature of MEG to investigate if the representation of the same sequence action (index finger movement) differentiates over early learning when embedded at different ordinal locations within the sequence. We found that practice led to a progressive differentiation of the neural representation of index finger movements reflecting the different contextual positions within the sequence. The finding that the magnitude of representational differentiation predicts skill gains supports the view that contextualization of neural representations contributes to early learning. Further support for this view comes from the finding that adding context information to the decoding algorithm increased decoding accuracy to levels higher than previously reported8,26–30,47–51. A possible neural mechanism supporting contextualization could be the emergence of conjunctive “what–where” representations of procedural memories53 with the corresponding modulation of neuronal population dynamics54,55.

Representational contextualization developed predominantly during rest periods of early learning

Representational contextualization developed to a larger extent during rest than during practice intervals. Furthermore, the magnitude of offline contextualization predicted skill gains while online contextualization did not. These findings are in line with previous work showing that most skill gains during early learning develop during rest rather than during practice intervals1,56 and suggest that contextualization of action representations could contribute to rapid consolidation of skill. Importantly, contextualization of the learned sequence was retained the following day indicating the stability of the formed skill memory.

Limitations

One limitation of this study is that contextualization was investigated for only one finger movement (index finger or digit 4) embedded within a relatively short 5-item skill sequence. It would be useful to determine if representational contextualization of multiple finger movements embedded within longer sequences (e.g. – a short piece of piano music) exhibits similar results. While a supervised manifold learning approach (LDA) was used here, unsupervised approaches (PCA and MDS), which are more suitable for real-time BCI applications, also substantially improved decoding accuracy (Figure 3 – figure supplement 2).

Summary

In summary, individual sequence action representations contextualize during the initial practice trials of a new skill and the degree of differentiation parallels skill gains. Importantly, the neural representation of context develops to a larger extent during rest than during practice intervals of early learning in parallel with rapid consolidation of skill. It is possible that the systematic inclusion of contextualized information into sequence skill practice environments could improve learning in areas as diverse as music education, sports training, and rehabilitation of motor skills after brain lesions.

Materials and Methods

Study Participants

The study was approved by the Combined Neuroscience Institutional Review Board of the National Institutes of Health (NIH). A total of thirty-three young and healthy adults (16 female) with a mean age of 26.6 years (± 0.87 SEM) participated in the study after providing written informed consent and undergoing a standard neurological examination. No participants were actively engaged in playing musical instruments in their daily lives, as per guidelines outlined in prior research57,58. All scientific study data were de-identified and permanently unlinked from all personally identifiable information (PII) before analysis. These data are publicly available upon request (https://nih.box.com/v/hcpsSkillLearningData). Two participants were excluded from the analysis due to MEG system malfunction during data acquisition. An additional 5 participants were excluded because they generated no correct sequences for two or more consecutive trials. The sample size was pre-determined through a power analysis designed to characterize skill learning1.

Experimental Setup

Participants practiced a novel procedural motor skill learning task that involved repetitively typing a 5-item numerical sequence (4-1-3-2-4) displayed on a computer screen. They were instructed to perform the task as quickly and accurately as possible using their non-dominant, left hand on a response pad (Cedrus LS-LINE, Cedrus Corp). Each numbered sequence item corresponded to a specific finger keypress: 1 for a little finger keypress, 2 for a ring finger keypress, 3 for a middle finger keypress, and 4 for an index finger keypress. Individual keypress times and identities were recorded and used to assess skill learning and performance.

Participants practiced the skill for 36 trials. Each trial spanned a total of 20 seconds and included a 10s practice round followed by a 10s inter-practice rest period. The five-item sequence was displayed on the computer screen for the duration of each practice round and participants were directed to fix their gaze on the sequence. Small asterisks were displayed above a sequence item after each successive keypress, signaling the participants’ present position within the sequence. Following the completion of a full sequence iteration, the asterisks were removed. The asterisk display did not provide error feedback as it appeared for both correct and incorrect keypresses. At the end of each practice round, the displayed number sequence was replaced by a string of five “X” symbols displayed on the computer screen, which remained for the duration of the inter-practice rest period. Participants were instructed to focus their gaze on the screen during this time.

On the next day, participants were tested (Day 2 Retest) with the same trained sequence (4-1-3-2-4) for 9 trials as well as for 9 different unpracticed control sequences (Day 2 Control; 2-1-3-4-2; 4-2-4-3-1; 3-4-2-3-1; 1-4-3-4-2; 3-2-4-3-1; 1-4-2-3-1; 3-2-4-2-1; 2-3-1-4-2; 4-2-3-1-4), each for one trial. The practice schedule structure for Day 2 was the same as on Day 1, with 10s practice trials interleaved with 10s of rest.

Behavioral data analysis

Skill

Skill, in the context of the present task, is quantified as the correct sequence typing speed (i.e., the number of correctly typed sequence keypresses per second; kp/s). That is, improvements in the speed/accuracy trade-off equate to greater skill. Keypress transition times (KTT) were calculated as the difference in time between the KeyDown events recorded for consecutive keypresses. Since the sequence was typed repeatedly within a single trial, individual keypresses were marked as correct if they were members of a set of 5 consecutive keypresses matching any possible circular shift of the displayed 5-item sequence. The instantaneous correct sequence speed was calculated as the inverse of the average KTT across a single correct sequence iteration and was updated for each correct keypress. Trial-by-trial skill changes were assessed by computing the median correct sequence typing speed for each trial.
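The correctness criterion and speed metric described above can be sketched as follows (a Python illustration; the original analysis was performed in MATLAB, and the function names here are hypothetical):

```python
import numpy as np

SEQ = [4, 1, 3, 2, 4]  # displayed 5-item sequence

def circular_shifts(seq):
    """All rotations of the displayed sequence."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]

def correct_mask(presses, seq=SEQ):
    """Mark a keypress correct if it belongs to any run of 5 consecutive
    keypresses matching a circular shift of the displayed sequence."""
    shifts = circular_shifts(seq)
    mask = [False] * len(presses)
    n = len(seq)
    for i in range(len(presses) - n + 1):
        if presses[i:i + n] in shifts:
            for j in range(i, i + n):
                mask[j] = True
    return mask

def sequence_speed(times, presses, seq=SEQ):
    """Instantaneous correct-sequence speed: inverse of the mean keypress
    transition time (KTT) across one correct sequence iteration (kp/s)."""
    ktt = np.diff(times)  # transition times between consecutive keypresses
    shifts = circular_shifts(seq)
    for i in range(len(presses) - len(seq) + 1):
        if presses[i:i + len(seq)] in shifts:
            return 1.0 / ktt[i:i + len(seq) - 1].mean()
    return np.nan
```

For example, repeatedly typing the trained sequence with a 0.25 s gap between keypresses yields a correct-sequence speed of 4 kp/s.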

Early Learning

The early learning period was defined as the trial range (1 - T trials) over which 95% of the total skill performance was first attained at the group level. We quantified this by fitting the group average trial-by-trial correct sequence speed data with an exponential model of the form:

L(t) = C1 + C2 · (1 − e^(−k·t))
Here, the trial number is denoted by t, and L(t) signifies the group-averaged performance at trial t. Parameters C1 and C2 correspond to the pre-training performance baseline and asymptote, respectively, while k denotes the learning rate. The values for C1, C2, and k were computed using a constrained nonlinear least-squares method (MATLAB’s lsqcurvefit function, trust-region-reflective algorithm) and were determined to be 0.5, 0.15, and 0.2, respectively. The early learning trial cut-off, denoted as T, was identified as the first trial where 95% of the learning had been achieved. In this study, T was determined to be trial 11.
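A minimal sketch of this model fit (Python with SciPy standing in for MATLAB's lsqcurvefit; the exact exponential parameterization is assumed from the parameter descriptions above, and the data here are illustrative and noise-free):

```python
import numpy as np
from scipy.optimize import curve_fit

def learning_curve(t, c1, c2, k):
    # assumed form: L(t) = C1 + C2 * (1 - exp(-k * t))
    # C1 = pre-training baseline, C2 = asymptotic gain, k = learning rate
    return c1 + c2 * (1.0 - np.exp(-k * t))

trials = np.arange(1, 37)                        # 36 practice trials
speed = learning_curve(trials, 0.5, 0.15, 0.2)   # synthetic group-average data

# constrained nonlinear least squares; SciPy's bounded fit uses a
# trust-region-reflective algorithm, as in MATLAB's lsqcurvefit
popt, _ = curve_fit(learning_curve, trials, speed,
                    p0=(0.1, 0.1, 0.1), bounds=(0.0, np.inf))

# early-learning cutoff T: first trial at which 95% of learning is reached
fitted = learning_curve(trials, *popt)
T = int(trials[np.argmax(fitted >= popt[0] + 0.95 * popt[1])])
```

On the actual group data this procedure yielded T = 11, as reported above; the cutoff obtained depends entirely on the fitted parameters.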

Performance improvements during rest (micro-offline gains), for each rest period of a trial, were calculated as the net change in performance (instantaneous correct sequence typing speed) from the end of one practice period to the onset of the next (i.e. – over a single inter-practice rest period). Micro-online gains (performance improvements during practice periods) were computed as the change in performance between the beginning and end of a single practice trial. Total early learning was derived as the sum of all micro-online and micro-offline gains over trials 1-11. Cumulative micro-offline gains, micro-online gains, and total early learning were statistically compared using 1-way ANOVAs and post-hoc Tukey tests.
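The decomposition into micro-online and micro-offline gains can be sketched as follows (Python illustration; the per-trial data structure is hypothetical):

```python
def micro_gains(trial_speeds):
    """trial_speeds: list of (speed_at_trial_start, speed_at_trial_end)
    pairs of instantaneous correct-sequence speed, one per practice trial.

    Micro-online gain:  end-of-trial speed minus start-of-trial speed.
    Micro-offline gain: start of next trial minus end of current trial
    (i.e., the change over a single inter-practice rest period)."""
    online = [end - start for start, end in trial_speeds]
    offline = [trial_speeds[i + 1][0] - trial_speeds[i][1]
               for i in range(len(trial_speeds) - 1)]
    total = sum(online) + sum(offline)   # total learning over these trials
    return online, offline, total
```

Note the telescoping property: summed over all practice and rest periods, total learning equals the final speed minus the initial speed.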

MRI Acquisition

We acquired T1-weighted high-resolution anatomical MRI volumes (1 mm3 isotropic MPRAGE sequence) for each participant on a 3T MRI scanner (GE Excite HDxt or Siemens Skyra) equipped with a 32-channel head coil. These data allowed for spatial co-registration of an individual participant’s brain with the MEG sensors, and individual head models required for surface-based cortical dipole estimation from MEG signals (i.e. – MEG source-space modeling).

MEG Acquisition

We recorded continuous magnetoencephalography (MEG) at a sampling frequency of 600 Hz using a CTF 275 MEG system (CTF Systems, Inc., Canada) while participants were seated in an upright position. The MEG system comprises a whole-head array featuring 275 radial 1st-order gradiometer/SQUID channels housed in a magnetically shielded room (Vacuumschmelze, Germany). Three of the gradiometers (two non-functional and one with high channel noise after visual inspection) were excluded, resulting in a total of 272 MEG channels for analysis. Synthetic 3rd order gradient balancing was applied to eliminate background noise during real-time data collection. Temporal alignment of behavioral and MEG data was achieved using a TTL trigger. Head position in the scanner coordinate space was monitored using head localization coils at the nasion, left, and right pre-auricular locations. These fiducial positions were co-registered in the participants’ T1-MRI coordinate space using a stereotactic neuronavigation system (BrainSight, Rogue Research Inc.). MEG data acquisition started 6 min before the task (resting-state baseline) and continued through the end of the 12 min training session.

MEG Data Analysis

Preprocessing

MEG data were preprocessed using the FieldTrip59 and EEGLAB60 toolboxes on MATLAB 2022a. Continuous raw MEG data (acquired at 600 Hz) were band-pass filtered between 1-100 Hz with a 4th-order noncausal Butterworth filter. 60 Hz line noise was removed with a narrow-band discrete Fourier transform (DFT) notch filter. Independent component analysis (ICA) was used to remove typical MEG signal artifacts related to eye blinks, eye movements, or cardiac pulsation. All recordings were visually inspected, and segments containing other large-amplitude movement artifacts were marked for removal.
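The filtering stage can be sketched as follows (Python/SciPy illustration; the original pipeline used FieldTrip/EEGLAB in MATLAB, and an IIR notch here stands in for the DFT notch filter):

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch

FS = 600.0  # MEG sampling rate (Hz)

def preprocess(raw):
    """Band-pass 1-100 Hz (4th-order Butterworth, applied forward and
    backward with filtfilt, hence zero-phase/non-causal) and notch out
    60 Hz line noise. raw: (n_channels, n_samples)."""
    b, a = butter(4, [1.0, 100.0], btype='bandpass', fs=FS)
    x = filtfilt(b, a, raw, axis=-1)
    bn, an = iirnotch(60.0, Q=30.0, fs=FS)   # narrow-band 60 Hz notch
    return filtfilt(bn, an, x, axis=-1)
```

A quick check: a 10 Hz component should pass through nearly unchanged, while a 60 Hz component is strongly attenuated.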

Source Reconstruction and Parcellation

For each participant, individual volume conduction models were established to depict the propagation of brain-generated currents through tissue to externally measurable magnetic fields. This was accomplished through a single-shell head corrected-sphere approach based on the brain volume segmentation of their high-resolution T1 MRI. Source models and surface labels from the Desikan-Killiany Atlas14 were created for each participant using inner-skull and pial layer surfaces obtained through FreeSurfer segmentation and connectome-workbench. Aligning sensor positions in the MEG helmet to individual head space involved warping MEG head coil positions (mean of pre and post recording) to the fiducials of the MRI, applying the same transformation matrix to all MEG sensors.

The individual source, volume conduction model, and sensor positions were then utilized to generate the forward solution at each source dipole location, describing the propagation of source activity from each cortical location on the grid to each MEG sensor. The Linearly Constrained Minimum-Variance (LCMV) beamformer was employed for computing the inverse solution. Each trial of MEG activity contributed to calculating the inverse solution data covariance matrix. The individual sample noise covariance matrix was derived from 6 minutes of pre-training rest MEG data recorded in the same subject during the same session. A total of 15,684 surface-based cortical dipoles (source-space voxels) were estimated. Cortical parcellation involved averaging the time series of all source dipoles within a distinct anatomical boundary defined in the Desikan-Killiany Atlas. To prevent within-parcel source cancellation during averaging, a technique called mean-flipping was applied. This process entailed flipping the sign for all sources with a sign differing from that of the average source.
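The sign-flipping step can be illustrated as follows (a simplified Python sketch of the mean-flip idea; FieldTrip's implementation may differ in detail):

```python
import numpy as np

def parcel_average(voxel_ts):
    """Average source time series within one parcel after flipping the
    sign of sources whose orientation disagrees with the parcel mean,
    preventing within-parcel cancellation during averaging.
    voxel_ts: (n_sources, n_samples) array."""
    mean_ts = voxel_ts.mean(axis=0)
    # agreement of each source with the parcel mean (dot product sign)
    signs = np.sign(voxel_ts @ mean_ts)
    signs[signs == 0] = 1.0
    return (signs[:, None] * voxel_ts).mean(axis=0)
```

With two aligned sources and one sign-inverted source, the naive average largely cancels, whereas the mean-flipped average recovers the common waveform.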

Feature Selection for Decoding

Features were extracted from the MEG activity at several spatial, oscillatory, and temporal scales.

Oscillatory Analysis

MEG signals were band limited to broadband (1-100 Hz) and to classic neural oscillatory frequencies defined as delta (1 – 3 Hz), theta (4 – 7 Hz), alpha (8 – 15 Hz), beta (16 – 24 Hz), gamma (25 – 50 Hz), and high-gamma (51 – 100 Hz) with a 4th order non-causal Butterworth filter. The decoding analysis was conducted independently for each band of MEG activity.

Spatial Analysis

Decoding was performed at sensor space as well as at source space. For sensor space decoding the feature dimension was 272 (corresponding to the 272 gradiometer channels). At the source space, decoding was executed at both the high-resolution voxel and low-resolution parcel space. The feature dimension for voxel-space decoding was 15,684, representing the total number of independently estimated cortical dipoles. For parcel-space decoding, the feature dimension was 148, derived from spatially averaging the voxel activities within each parcel defined by the Desikan-Killiany Atlas. In both cases, decoding was carried out across all oscillatory frequency bands (i.e. - broadband, delta, theta, alpha, beta, gamma, and high-gamma) for comprehensive comparison.

Temporal Analysis

Temporal MEG activity corresponding to each keypress was defined by taking a time window of [t, t + Δt], where t ∈ [0 : 10 ms : 100 ms] and Δt ∈ [25 ms : 25 ms : 350 ms]. In other words, a window of variable size (from 25 ms to 350 ms in 25 ms increments) was slid in 10 ms increments from keypress onset until the start of the window reached 100 ms post-onset. This approach generated 140 different temporal windows for each keypress for each participant. Average MEG activity from each of these time windows was analyzed independently for decoding. For each subject, the time window yielding the best cross-validation performance was selected. This sliding, variable-size window analysis was performed at each of the spatial and oscillatory scales.

Hybrid Spatial Approach

First, we evaluated the decoding performance of each brain region in decoding finger identity. We trained decoders on the voxel activities of each region (intra-parcel decoders) to predict finger identity. Brain regions were then ranked from 1 to 148 based on their decoding accuracy at the group level. In a stepwise manner, we incrementally incorporated voxel activities of brain regions, starting from the top-ranked region. In other words, first, we concatenated the voxel-level features of the highest-ranked region with whole-brain parcel-level features for decoding analysis. Subsequently, we added the voxel-level features of the second-ranked brain region, along with those of the top-ranked region to the parcel-level features for decoding and continued this process. This iterative addition of brain regions was performed until decoding accuracy reached saturation. The optimal feature space comprised all 148 parcel-space features in conjunction with voxel-space features from the top 8 brain regions.
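The stepwise construction of the hybrid feature space can be sketched as follows (Python/scikit-learn illustration on synthetic data; the decoder, fold count, and saturation tolerance are assumptions):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def stepwise_hybrid(parcel_X, voxel_X, region_rank, y, tol=1e-3):
    """Start from whole-brain parcel-level features and concatenate
    voxel-level features of regions in rank order until cross-validated
    accuracy saturates. voxel_X: dict region -> (n_samples, n_voxels)."""
    feats = parcel_X
    best = cross_val_score(LinearDiscriminantAnalysis(), feats, y, cv=4).mean()
    kept = []
    for region in region_rank:
        cand = np.hstack([feats, voxel_X[region]])
        acc = cross_val_score(LinearDiscriminantAnalysis(), cand, y, cv=4).mean()
        if acc < best + tol:
            break                 # accuracy saturated; stop adding regions
        feats, best, kept = cand, acc, kept + [region]
    return feats, kept, best
```

In the synthetic example below, only region 'A' carries class information, so the procedure keeps 'A' and stops before adding the uninformative region 'B'.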

Dimension Reduction

We used several supervised and unsupervised dimension reduction techniques, including linear discriminant analysis (LDA), minimum redundant maximum relevance (MRMR), principal component analysis (PCA), Autoencoder, Diffusion maps, factor analysis, large margin nearest neighbor (LMNN), multi-dimensional scaling (MDS), neighbor component analysis (NCA), and spatial predictor envelope (SPE)15, on each of the spatial feature spaces (sensor, parcel, voxel, and hybrid). Among these techniques, PCA, MDS, MRMR, and LDA emerged as particularly effective in significantly improving decoding performance.

PCA, a method for unsupervised dimensionality reduction, transformed the high-dimensional dataset into a new coordinate system of uncorrelated principal components. These components, capturing the maximum variance in the data, were iteratively added to reconstruct the feature space for decoding. MDS finds a configuration of points in a lower-dimensional space such that the distances between these points reflect the dissimilarities between the corresponding objects in the original high-dimensional space. MRMR, an approach combining relevance and redundancy metrics, ranked features based on their significance to the target variable and their non-redundancy with other features. The decoding process started with the highest-ranked feature and iteratively incorporated subsequent features until decoding accuracy reached saturation. LDA finds the linear combinations of features (dimensions) that best separate different classes in a dataset. It projects the original features onto a lower-dimensional space (number of classes − 1) while preserving the class-discriminatory information, maximizing the ratio of between-class variance to within-class variance. In our study, LDA transformed the features into a 3-dimensional space that was then used for decoding. Dimension reduction was fitted on the training data only; the tuned parameters of the dimension reduction model were then applied to transform the test data before evaluating decoder metrics. Decoding accuracies were systematically compared between the original and reduced-dimension feature spaces, providing insights into the effectiveness of each dimension reduction technique. By rigorously assessing the impact of dimension reduction on decoding accuracy, we aimed to identify techniques that not only reduced the computational burden associated with high-dimensional data but also enhanced the discriminative power of the selected features.
This comprehensive approach aimed to optimize the neural representation of each finger keypress for decoding across various spatial contexts.
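For instance, the LDA-based reduction fitted on training data only might look like this (Python/scikit-learn sketch on synthetic features; the downstream classifier and data are illustrative):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
y = np.repeat(np.arange(4), 60)       # 4 keypress classes
X = rng.normal(size=(240, 148))       # e.g., parcel-space features
X[np.arange(240), y] += 4.0           # inject class-informative dimensions

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.1, stratify=y,
                                      random_state=0)
# fit the reducer on training data only, then transform the test data
lda = LinearDiscriminantAnalysis(n_components=3).fit(Xtr, ytr)   # 4 - 1 dims
Ztr, Zte = lda.transform(Xtr), lda.transform(Xte)

clf = KNeighborsClassifier(n_neighbors=4).fit(Ztr, ytr)
acc = clf.score(Zte, yte)
```

Fitting the reducer on the full dataset before splitting would leak test information into the projection, inflating accuracy estimates.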

Decoding Analysis

Decoding analysis was conducted for each participant individually, employing a randomized split of the data into training (90%) and test (10%) samples for 8 iterations. For each iteration, to optimize decoder configuration, an 8-fold cross-validation was applied to the training samples, allowing for the fine-tuning of hyperparameters and selection of the most effective model. Across participants, on average, the total number of individual keypresses for the whole duration of training was 219 ± SD: 66 (keypress 1: little), 205 ± SD: 66 (keypress 2: ring), 209 ± SD: 66 (keypress 3: middle), and 426 ± SD: 131 (keypress 4: index). Only keypresses belonging to correctly typed sequence iterations (94.64% ± 4.04% of all keypresses) were considered. The total number of keypresses for keypress 4 was approximately twice that of keypresses 1, 2, and 3, as it was the only action that occurred more than once in the trained sequence (4-1-3-2-4), albeit in two different sequence contexts or ordinal positions. Given the two-fold higher number of samples for this one class, we independently oversampled keypresses 1, 2, and 3 to avoid overfitting to the over-represented class. Importantly, oversampling was applied independently for each keypress class, ensuring that validation folds were never oversampled and that training folds did not share common oversampled patterns. The decoder configuration demonstrating the best validation performance was selected for each iteration, and subsequently, test accuracy was evaluated on the independent/unseen test samples. This process was repeated for the 8 different iterations of train-test splitting, and the average test accuracy was reported. This rigorous methodology aimed to generalize decoding performance, ensuring robust and reliable results across participants. Further, decoding evaluation was performed on the Day 2 data, for both the trained (Day 2 Retest; 9 trials) and untrained sequences (Day 2 Control; 9 different single-trial tests).
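The class-balancing step can be sketched as follows (Python illustration; per the text, this is applied to training folds only):

```python
import numpy as np

def oversample(X, y, rng):
    """Duplicate randomly chosen minority-class samples until every class
    matches the majority-class count. Applied only to training folds;
    validation and test folds are never oversampled, to avoid leaking
    duplicated patterns across folds."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            idx = rng.choice(np.flatnonzero(y == c), target - n, replace=True)
            Xs.append(X[idx])
            ys.append(y[idx])
    return np.vstack(Xs), np.concatenate(ys)
```

With keypress 4 occurring roughly twice as often as the others, oversampling brings keypresses 1-3 up to the same count.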

Machine Learning Classifiers

We employed a diverse set of machine learning decoders, including Naïve Bayes (NB), decision trees (DT), ensembles (EN), k-nearest neighbor (KNN), linear discriminant analysis (LDA), support vector machines (SVM), and artificial neural network (ANN), to train features generated with all possible combinations of spatial, temporal, and oscillatory scales for comprehensive comparative analysis. The hyperparameters of these decoders underwent fine-tuning using Bayesian optimization search.

The Naive Bayes classifier was configured with a normal distribution predictor and Gaussian Kernel, while the KNN classifier had a K value of 4 (for keypress decoding) and utilized the Euclidean distance metric. For decision trees, the maximum number of splits was set to 4 (for keypress decoding), with leaves being merged based on the sum of risk values greater or equal to the risk associated with the parent node. The optimal sequence of pruned trees was estimated, and the predictor selection method was ‘Standard CART,’ selecting the split predictor that maximizes the split-criterion gain over all possible splits of all predictors. The split criterion used was ‘gdi’ (Gini’s diversity index). Ensembles employed the bagging method with random predictor selections at each split, forming a random forest. The maximum number of learning cycles was set to 100 with a weak learner based on discriminant analysis. For SVM, the RBF kernel was selected through cross-validation (CV), and the ‘C’ parameter and kernel scale were optimized using Bayesian optimization search. In the case of LDA, the linear coefficient threshold and the amount of regularization were computed based on Bayesian optimization search. The ANN decoder consisted of one hidden layer with 128 nodes, followed by a sigmoid and a softmax layer, each with 4 nodes (for keypress decoding). Training utilized a scaled conjugate gradient optimizer with backpropagation, employing a learning rate of 0.01 (coarse to fine tuning) for a maximum of 100 epochs, with early stopping validation patience set to 6 epochs.

Additionally, we assessed long short-term memory recurrent neural networks (LSTM-RNN) and bidirectional LSTM-RNN to train the time series of each keypress for decoding. The configuration of these RNNs included two hidden (Bi)LSTM layers of 100 units, followed by three fully connected layers (50, 10, and 4), and a softmax layer with 4 nodes to decode the four keypress classes. The networks were trained using an Adam optimizer with an initial learning rate of 0.001, a learning rate drop factor of 0.1 after 20 epochs, and a maximum epoch of 400. For regularization, 20% dropouts after the (Bi)LSTM layers, 50% dropout after the fully connected layers, and L2 Norm with a gradient threshold value of 0.1 were employed. These hyperparameter values were chosen based on grid search optimization during cross-validation.

Decoding Performance Metrics

Decoding performance was assessed using several metrics, including accuracy (%), the proportion of correct classifications among all test samples. The confusion matrix provided a detailed comparison of the number of correctly predicted samples for each class against the ground truth. The F1 score for each keypress, which combines precision and recall, summarized the class-wise predictive performance of the decoders. Additionally, the weighted mean F1 score, a support-weighted average of the per-class F1 scores, was computed to convey the overall predictive skill of the models across all classes. Test accuracies were used for statistical comparisons.
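The per-class and weighted mean F1 computation from a confusion matrix can be written as (Python sketch):

```python
import numpy as np

def f1_scores(conf):
    """Per-class F1 and support-weighted mean F1 from a confusion matrix
    (rows = true class, columns = predicted class)."""
    conf = np.asarray(conf, float)
    tp = np.diag(conf)
    precision = tp / np.maximum(conf.sum(axis=0), 1e-12)  # per predicted class
    recall = tp / np.maximum(conf.sum(axis=1), 1e-12)     # per true class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    support = conf.sum(axis=1)                            # true-class counts
    return f1, float((f1 * support).sum() / support.sum())
```

For a 2-class matrix [[9, 1], [2, 8]], the per-class F1 scores are 6/7 and 16/19, and the weighted mean is their support-weighted average.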

Decoding During Skill Learning Progression

We systematically assessed decoding performance of the 2-class decoder (IndexOP1 vs IndexOP5) at each trial during the skill learning process to identify the relationship between the evolution of decoding proficiency and the acquired skill. Our approach involved evaluating decoder performance individually for each Day 1 Training trial. To mitigate the influence of varying sample sizes in later trials, we ensured an equal number of samples (first k keypresses) in each trial.

We used t-distributed stochastic neighbor embedding (t-SNE) to visualize the evolution of neural representations corresponding to each keypress at each trial of the learning period. Further, within the t-SNE distribution, we separately labeled keypress 4 based on its context (i.e., keypress 4 at ordinal position 1: IndexOP1, and keypress 4 at ordinal position 5: IndexOP5). Brain regions important to decoding were determined for each trial based on MRMR-based feature ranking and highlighted in topography plots.

Decoding Sequence Elements

Finally, we performed decoding of each contextual action of the sequence (i.e., IndexOP1, Little, Middle, Ring, and IndexOP5). We used the same decoding strategy (90%-10% split for train and test, 8-fold cross-validation of training samples to select the best decoder configuration, hybrid spatial features, and LDA-based dimension reduction) to decode these 5 classes. Note that oversampling was no longer needed after sub-grouping the index finger keypresses into two separate classes based on their context. Decoding of sequence elements was evaluated for Day 1 Training, Day 2 Retest, and Day 2 Control data.

Neural Representation analysis

We evaluated the online (within-trial) and offline (between-trial) changes in the neural representation of the contextual actions (IndexOP1 and IndexOP5) for each trial during training. For offline differentiation, we evaluated the Euclidean distance between the hybrid spatial features of the last index finger keypress of a trial (IndexOP5) and the first index finger keypress (IndexOP1) of the subsequent trial, mirroring the approach used to calculate micro-offline gains in skill. This offline distance provided insight into the net change in contextual representation of the index finger keypress during each interspersed rest interval. For online differentiation, we calculated the mean Euclidean distance between IndexOP1 and IndexOP5 of all the correctly typed sequences within a practice trial. Online differentiation informed on the net change in the contextual representation of the index finger keypress within each practice trial. Cumulative offline and online representation distances across participants were compared using 2-sample t-tests. Additionally, we computed trial-by-trial differences in offline and online representations during early learning, exploring their relationships with cumulative micro-offline and micro-online gains in skill, respectively, through regression analysis and Pearson correlation analysis. Linear regression models were trained utilizing the fitlm function in MATLAB. The model employed M-estimation, formulating estimating equations and solving them through the Iteratively Reweighted Least Squares (IRLS) method61.
Key metrics were computed and compared across models: the root mean squared error (RMSE), which estimates the standard deviation of the prediction error distribution; the coefficient of explained variance (R2); the F-statistic for the F-test of whether the regression model significantly outperforms a degenerate model consisting only of a constant term; and the corresponding p-value. This multifaceted approach aimed to uncover the nuanced dynamics of neural representation changes in response to skill acquisition.
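The online and offline representation distances can be sketched as follows (Python illustration; the per-trial data structure is hypothetical):

```python
import numpy as np

def contextualization_distances(trials):
    """trials: list of dicts with 'op1' and 'op5' arrays, each
    (n_sequences, n_features): hybrid-space features of the index-finger
    keypress at ordinal positions 1 and 5 for every correct sequence.

    Online: mean OP1-OP5 distance across sequences within a trial.
    Offline: distance from the last OP5 of one trial to the first OP1
    of the next trial (mirroring micro-offline skill gains)."""
    online = [np.mean(np.linalg.norm(t['op1'] - t['op5'], axis=1))
              for t in trials]
    offline = [np.linalg.norm(trials[i + 1]['op1'][0] - trials[i]['op5'][-1])
               for i in range(len(trials) - 1)]
    return online, offline
```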

As a control analysis, we also computed the difference in neural representation between IndexOP1 and IndexOP5 on Day 2 Retest data for the same sequence (4-1-3-2-4) as well as for different Day 2 Control untrained sequences where the same action was performed at ordinal positions 1 and 5 (2-1-3-4-2; 1-4-2-3-1; 2-3-1-4-2; 4-2-3-1-4). Finally, we assessed for specificity of contextualization to the trained sequence, by evaluating differentiation between index finger keypress representations performed at two different positions within untrained sequences (4-2-4-3-1 and 1-4-3-4-2). The cumulative differences across participants were compared with 2-sample t-tests.

Data Availability

All data, de-identified and permanently unlinked from all personally identifiable information (PII), are publicly available (Box Repository).

Acknowledgements

We thank Ms. Tasneem Malik, Ms. Michele Richman, and NIMH MEG Core Facility staff for their support. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). This research was supported by the Intramural Research Program of the NINDS, NIH.

Supplementary Materials

Behavioral performance during skill learning

A. Total Skill Learning over Day 1 Training (36 trials) and Day 2 Retest (9 trials). As reported previously1, participants on average reached 95% of peak performance during Day 1 Training by trial 11. Note that after trial 11 performance stabilizes around a plateau through trial 36. Following a 24-hour break, participants displayed an upward shift in performance during the Day 2 Retest, indicative of an overnight skill consolidation effect. B. Keypress transition time (KTT) variability. Distribution of KTTs normalized to the median correct sequence time for each participant and centered on the mid-point for each full sequence iteration during early learning. Note that the initial variability of the five component transitions in the sequence (i.e., 4-1, 1-3, 3-2, 2-4, 4-4) stabilizes by trial 6 of the early learning period and remains stable throughout the rest of Day 1 Training (through trial 36) and Day 2 Retest.

Oscillatory contributions at individual brain regions

Decoding performance of each individual brain region (intra-parcel decoder performance) at each oscillatory level is shown for both left and right hemispheres as a heatmap. Optimal decoding performances were obtained from bilateral superior frontal (Left: 68.77% ± SD 7.6%; Right: 67.52% ± SD 6.78%), middle frontal (Left: 63.41% ± SD 7.58%; Right: 62.78% ± SD 76.94%), pre-central (Left: 62.37% ± SD 6.32%; Right: 62.69% ± SD 5.94%), and post-central (Left: 61.71% ± SD 6.62%; Right: 61.09% ± SD 6.2%) brain regions. Superior parietal, central, paracentral, anterior-cingulate, and precuneus regions also showed greater (> 60%) decoding performance. For the delta band, only superior frontal regions showed > 60% decoding performance.

Contribution of whole-brain oscillatory frequencies to decoding

Accuracy was highest when decoders were trained on broadband activity, closely followed by delta band activity, across whole-brain parcel, voxel, and hybrid spaces. Sensor-space decoder accuracy did not differ statistically when trained on broadband versus delta-band activity. The hybrid approach resulted in the best decoding accuracy for each frequency range. Interestingly, accuracy with sensor-space decoders was comparable to that with parcel- and voxel-space decoders. Dots depict decoding accuracy for each participant. “***” indicates p < 0.001, “**” indicates p < 0.01, and “n.s.” denotes no statistical significance (i.e., p > 0.05).

Comparison of different dimensionality reduction techniques

Dimensionality reduction was applied to the input features for each approach [parcel (N=148), voxel (N=15,684), and hybrid (N=1,295)]16. The results with principal component analysis (PCA, in green), multi-dimensional scaling (MDS, in blue), the minimum redundant maximum relevance algorithm (MRMR, in red), and linear discriminant analysis (LDA, in black) are shown in comparison to performance obtained using all input features (in magenta). For parcel space, all of these approaches increased the mean decoding accuracy, with PCA and LDA showing statistically significant improvement [1-way ANOVA: F = 13.05, p < 0.001; post hoc Tukey tests: p = 0.032 (PCA), p < 0.001 (LDA), p > 0.05 (MDS, MRMR)]. At the voxel space, no approach yielded a statistically significant improvement (p > 0.05); MRMR showed the largest improvement, which did not reach significance in post hoc Tukey tests (p = 0.14), and performance dropped significantly with LDA. For hybrid space, all of the dimensionality reduction techniques significantly improved decoding performance [1-way ANOVA: F = 21.32, post hoc Tukey tests: p < 0.05], with the best improvement seen with LDA.

Confusion matrices for decoding performance on Day 2 Retest (A) and Day 2 Control (B) data

Note that the hybrid decoding strategy generalized to Day 2 data with 87.11% keypress decoding accuracy for the trained sequence (Day 2 Retest) and 79.44% accuracy for decoding keypresses embedded within untrained control sequences (Day 2 Control).

Quantification of regional trial-by-trial feature importance score during skill learning

The quantification of regional importance in decoding for each trial is shown for the regions with the highest decoding accuracy, i.e., superior frontal, middle frontal, pre-central, and post-central cortex. Note that the feature importance score was initially higher for pre-central cortex and shifted to middle frontal cortex during later trials, as can be seen in the divergence of the line plots around trial 11.

Average decoding accuracies across participants with varying temporal scales. The X-axis represents the onset of the window for decoding analysis with respect to keypress onset (0); the Y-axis represents the window size. The heatmap color denotes the decoding accuracy for all window size/location pairings. Note that the best decoding accuracy across subjects is obtained with a window starting at 0 (i.e., keypress onset) and a window size of 200 ms.

Relationship between offline contextualization of neural representations and micro-offline learning

Cumulative micro-offline gains (i.e., the net gain in skill during inter-practice rest periods) increased over time during early learning. The offline representation difference (i.e., the net change over the inter-practice rest intervals in the neural representation of keypress 4 performed at different ordinal positions in the sequence, its context) also increased over time during early learning. A linear regression analysis showed a strong temporal relationship (correlation coefficient r = 0.9034; coefficient of explained variance R2 = 0.82) between the amount of contextualization during rest and cumulative micro-offline gains during rest, signifying the role of contextualization in sequential skill learning.

Online versus offline changes in keypress transition patterns. A. Trial-by-trial Euclidean distance between keypress transition patterns (i.e., the relative share of each keypress transition across the full sequence duration) for the first and last sequence iteration within a single trial (online; green) and the last sequence iteration of the current trial versus the first sequence iteration of the subsequent trial (offline; magenta). B. Cumulative online (green; left) and offline (magenta; right) pattern distances recorded over all forty-five trials covering Days 1 and 2. Note that cumulative online and offline distances were not significantly different (t = -0.03, p = 0.976).

Relationship between contextualization and absolute speed

A. Relationship between maximum speed and corresponding contextualization. A linear regression analysis relating maximum typing speed to the degree of contextualization showed no significant relationship (R2 = 0.028, p = 0.41). Each dot represents the maximum speed attained and the corresponding degree of contextualization for one participant. Thus, participants with higher typing speeds did not show a higher degree of contextualization. B. Relationship between typing speed and degree of contextualization at each trial. We performed a regression analysis for each trial relating the degree of contextualization at that trial to the typing speed at that trial. The violin plot represents the distribution of R2 values obtained with the regression analysis, with each dot representing a trial. Note that there was no significant relationship at any trial (mean R2 = 0.06; p > 0.05).