Abstract
Activities of daily living rely on our ability to acquire new motor skills composed of precise action sequences. Early learning of a new sequential skill is characterized by steep performance improvements that develop predominantly during rest intervals interspersed with practice, a form of rapid consolidation. Here, we ask if the millisecond level neural representation of an action performed at different locations within a skill sequence contextually differentiates or remains stable as learning evolves. Optimization of machine learning decoders to classify sequence-embedded finger movements from MEG activity reached approximately 94% accuracy. The representation manifolds of the same action performed in different sequence contexts progressively differentiated during rest periods of early learning, predicting skill gains. We conclude that sequence action representations contextually differentiate during early skill learning, an issue relevant to brain-computer interface applications in neurorehabilitation.
Introduction
Motor learning is required to perform a wide array of basic daily living activities, intricate athletic endeavors, and professional skills. Whether it’s learning to type more quickly on a keyboard1, improve one’s tennis game2, or play a piece of music on the piano3 – all of these skills require the ability to execute sequences of actions with precise temporal coordination. Action sequences thus form the building blocks of fine motor skills4. Initial exposure to acquisition of a new skill results in rapid performance improvements during the initial practice session (i.e. – early learning), which is up to four times larger in magnitude compared to offline performance improvements reported following overnight sleep1,5.
The neural representation of a sequential skill binds discrete individual actions (e.g. - single piano keypress) into complex, temporally and spatially precise sequence representations (e.g. - a refrain from a piece of music)6,7. The neural representation or manifold8 of sequential skills has been characterized in humans3,9,10. Previous fMRI reports showed that the representation of individual motor sequence actions may remain relatively stable over days or weeks of practice, after a memory is formed11. However, it is not known if individual sequence action representations differentiate or remain stable during early skill learning, when most prominent performance improvements occur, and the memory is not yet fully formed1. Furthermore, it is not known if performance of the same action executed at different contextual locations within a skill sequence can change as a function of learning progression, an issue crucial to the robustness of BCI applications.
Investigating the contextualization of discrete action representations at a millisecond level is challenging since both the individual actions and the skill sequence they are embedded within are concurrently represented in changing neural activity dynamics during learning10,12. To address this problem, we constructed a series of decoders aimed at predicting keypress actions from magnetoencephalographic (MEG) neural activity, dependent upon the learning state and the local sequence context (ordinal position) the keypress action is performed within. Implementing this novel approach allowed us to determine that individual sequence actions are indeed contextualized during early skill learning resulting in differentiation of their neural representations, and that the degree of this differentiation predicts skill gains. This representational contextualization was also observed to predominantly develop during rest rather than during practice intervals in parallel with rapid consolidation of skill.
Results
Participants engaged in a well characterized sequential skill learning task1,5,13 that involved repetitive typing of a sequence (4-1-3-2-4) performed with their (non-dominant) left hand over 36 trials with alternating periods of 10s practice and 10s rest (inter-practice rest; Day 1 Training; Figure 1A). Individual keypress times and finger keypress identities were recorded and used to quantify skill as the correct sequence speed (keypresses/s)1.
Participants reached 95% of maximal skill (i.e. - Early Learning) within the initial 11 practice trials (Figure 1B), with improvements developing over inter-practice rest periods (micro-offline gains) accounting for almost all of total learning across participants (Figure 1B inset)1. In addition to the reduction in sequence duration during early learning, the relative duration of component keypress transitions displayed increased temporal regularity across sequence iterations (Figure 1C). On the following day, participants were retested on performance of the same sequence (4-1-3-2-4) over 9 trials (Day 2 Retest), as well as single-trial performance on 9 different untrained sequences (Day 2 Controls: 2-1-3-4-2, 4-2-4-3-1, 3-4-2-3-1, 1-4-3-4-2, 3-2-4-3-1, 1-4-2-3-1, 3-2-4-2-1, 3-2-1-4-2, and 4-2-3-1-4). As expected, an upward shift in performance of the trained sequence (0.68 ± SD 0.56 keypresses/s; t = 7.21, p < 0.001) was observed during Day 2 Retest, indicative of an overnight skill consolidation effect for the trained sequence only (Figure 1—figure Supplement 1C).
Keypress actions are represented in multi-scale hybrid-space manifolds
To investigate the possibility of contextualization of action representations, we constructed a set of decoders to predict keypress actions from MEG activity as a function of both the learning state and the ordinal position of the keypress within the sequence. We first characterized the spectral and spatial features of keypress state representations by comparing performance of decoders constructed around broadband (1-100Hz) or narrowband [delta- (1-3Hz), theta- (4-7Hz), alpha- (8-14 Hz), beta- (15-24 Hz), gamma- (25-50Hz) and high gamma-band (51-100Hz)] MEG oscillatory activity. We found that decoders trained on broadband data consistently outperformed those trained on narrowband activity, and that whole-brain parcel- (148 brain regions or parcels) or voxel-space (15684 voxels)14 decoders exhibited greater accuracy (parcel: t = 1.89, p = 0.035; voxel: t = 7.18, p < 0.001) than individual voxel-space intra-parcel decoders (Figure 2). Specifically, while the highest performing intra-parcel decoders predicted keypresses with up to 68.77% (± SD 7.6%) accuracy, whole-brain inter-parcel decoder accuracy reached 74.51% (± SD 7.34%) (Figure 2, also see Figure 2 – figure supplement 1).
Next, we asked if the combination of intra- and inter-parcel activity patterns, constituting a multi-scale hybrid space, improve further keypress predictions. We constructed hybrid space decoders (N = 1295 ± 20) combining all inter-parcel activity with top-ranked intra-parcel features (covering 8 regions, see Methods, Figure 3A). Decoder accuracy was higher for hybrid (78.15% ± SD 7.03% accuracy and a weighted mean F1 score of 0.78 ± SD 0.07) than for voxel- (74.51% ± SD 7.34%; paired t-test: t = 6.30, p < 0.001) and parcel- (70.11% ± SD 7.48%; paired t-test = 12.08, p < 0.001) spaces (Figure 3B, Figure 3 – figure supplement 1). Thus, a multi-scale hybrid-space representation best characterizes keypress action manifolds.
We implemented several dimensionality reduction or manifold learning strategies including principal component analysis (PCA), multi-dimensional scaling (MDS), minimum redundant maximum relevance (MRMR), and linear discriminant analysis (LDA)15 to map the input feature (parcel, voxel or hybrid) space to a low-dimensional latent space or manifold8. LDA-based manifold representation performed the best improving the keypress decoding accuracy to 90.47% ± SD 4.84% (Figure 3B; weighted mean F1 score = 0.91 ± SD 0.05). Consistently, LDA dimensionality reduction improved decoder performance at parcel (82.95% ± SD 5.48%) but reduced it at voxel space (40.38 % ± SD 6.78%; also see Figure 3 – figure supplement 2). Notably, decoding associated with index finger keypresses (executed at two different ordinal positions in the sequence) exhibited the highest number of misclassifications of all digits (N = 141 or 47.5% of all decoding errors; Figure 3C), raising the hypothesis that the same action could be differentially represented when executed at different learning state or sequence context locations.
We assessed the robustness of this hybrid strategy for decoding actions during skill learning over multiple sessions by applying it to data collected on the following day during the Day 2 Retest (9-trial retest of the same trained sequence) and Day 2 Control (single-trial performance of 9 different untrained sequences) blocks. The decoding accuracy for Day 2 MEG data remained high (87.11% ± SD 8.54% for the trained sequence during Retest, and 79.44% ± SD 5.54% for the untrained Control sequences; see confusion matrices in Figure 3 – figure supplement 3). This indicates that the hybrid decoding strategy is particularly reliable for decoding keypress actions during skill learning and the neural representations of the learned sequence remain stable the following day. Although the keypress state decoding performance was lower for untrained Control sequences relative to the trained sequence, greater accuracy of the hybrid approach over purely parcel- or voxel-space input features was still observed. Thus, this hybrid approach allows robust sequential finger movement decoding across multiple days and sequences.
Inclusion of keypress sequence context location optimized decoding performance
Next, we tracked the trial-by-trial evolution of keypress action manifolds as training progressed. Keypress neural representations were observed to progressively differentiate during early learning (Figure 4). A representative example in Figure 4A (top row) depicts progressive representational clustering of four-digit representations from trials 1, 11, and 36. The spatial representation of these clusters changed over the course of training, from predominant involvement of pre-central areas in trial 1 to post-central, superior, and middle frontal cortex contributions in later trials 11 and 36 (Figure 4A, bottom row). Thus, a shift in activity from pre-central areas to post-central, superior, and middle frontal cortex paralleled improvements in decoding performance (also see Fig. Figure 4 – figure supplement 1 for trial-by-trial quantitative feature importance score changes during skill learning).
Correct performance of the training sequence required pressing the index finger twice (4-1-3-2-4) at two contextually different ordinal positions (sequence positions 1 and 5). Inclusion of sequence location information (i.e. – sequence context) for each keypress action (five sequence elements with the one keypress represented twice at two different locations) improved decoding accuracy (t = 7.09, p < 0.001, Figure 4B) from 90.47% (± SD 3.44%) to 94.15% (± SD 4.84%; weighted mean F1 score: 0.94), and reduced overall misclassifications by 54.3% (from 219 to 119; Figure 3C and 4B). This is supported by greater differentiation in neural representations of the two different index finger keypresses embedded at different points within the sequence (Figure 4A), which leads to a trial-by-trial increase in 2-class decoding accuracy (Figure 4C). Inclusion of contextual information to the hybrid-space decoder also increased classification accuracy of Day 2 Retest trained sequence data (from 87.11% for 4-class to 90.22% for 5-class). As expected, contextualized 5-class decoding of Day 2 Control MEG data for untrained sequences performed at or below chance levels (≤ 30.22% ± SD 0.44%) due to varied ordinal positions of keypress actions across the different sequences. Thus, inclusion of contextual information in the decoding framework only improves decoding accuracy for skill sequences which have been learned through practice.
Neural representation of keypress sequence location diverged during early skill learning
Finally, we used a Euclidian distance measure to evaluate the differentiation of the neural representation manifold of the same action (i.e. - an index-finger keypress) executed within different local sequence contexts (i.e. - ordinal position 1 vs. ordinal position 5). Note that, the neural representation manifolds (i.e. - the reduced dimension hybrid-space features extracted from broadband MEG data) are the features that resulted in the best decoding performance (See Methods, Figure 4 – figure supplement 2).
The Euclidian distance between keypress sequence location manifolds of the index finger in its two ordinal positions increased progressively during both rest and practice periods of early learning before stabilizing (Figure 5A). Change in this Euclidian distance was more prominent during rest than during practice periods (t = 4.84, p < 0.001; Figure 5B), similar to early learning gains (Figure 1B) and strongly predicted cumulative micro-offline gains (r = 0.90, R2 = 0.82, p < 0.001; Figure 5 - figure supplement 1). Importantly, the differentiation in neural representations was not explained by differences in keypress transition pattern regularity (within-subject; t = −0.03, p = 0.976; Figure 5 – figure supplement 2) or typing speed (between-subject; R2 = 0.028, p = 0.41; Figure 5 – figure supplement 3). These neural representation manifold differences remained consistent on the next day test session for the trained sequence, significantly higher than those for the untrained sequences (Figure 5C). The effect of contextualization also remained stable on Day 2 Retest – at the same time that an upward shift in performance speed and slight decrease in 4-class decoding accuracy was observed – and was substantially greater than contextualization observed for groups of untrained sequences (Figure 5C) indicating specificity for sequence pattern, keypress finger and ordinal position.
Discussion
The main findings of this study were that individual sequence action representations differentiate during early skill learning in a manner reflecting the local sequence context in which they are performed, and that the degree of representational differentiation, particularly prominent during rest intervals, predicts skill gains.
Optimizing decoding of sequential finger movements from MEG activity
The initial phase of this study involved optimizing the accuracy of decoding individuated finger keypresses from MEG brain activity. The decoding of individual finger movement execution or intention is a fundamental part of many translational BCI applications16–20. For example, decoding of virtual keypresses can be interfaced with applications for controlling motorized wheelchairs, writing and sending emails, or even controlling robotic hands or exoskeletons21, and are currently being evaluated in patients with spinal cord and brain lesions22,23. Once these neural signals are characterized, they can be translated into meaningful commands or actions21,24. However, achieving robustness in clinical applications necessitates enhancements in state-of-the-art decoding tools, with a particular focus on improving decoding accuracy18,20,25.
In the present study, we developed a novel hybrid-space decoder that captured the evolving neural representational dynamics over multiple spatial resolutions. This approach combined (1) parcel-space estimates from whole-brain activity and (2) voxel-space estimates from brain regions containing most keypress-related brain activity (see Methods). Using this approach, we achieved keypress state decoding accuracy that exceeded 90%, a substantial improvement over accuracies reported before with either non-invasive neuroimaging (i.e. – 43% to 77% for fMRI, MEG, and EEG)13,18,26–28 or invasive neural recording decoding of imagined movements (i.e. – 77% to 86% for ECOG and micro-array implant)29,30 techniques. Given these findings, the improvement in the keypress decoding accuracy from Figure 2 (parcel/voxel) to Figure 3 (parcel/voxel/hybrid with dimension reduction) is likely explained by the capability of the hybrid-space architecture to capture both lower spatially resolved whole-brain and higher spatially resolved regional activity patterns that contribute unique skill-related information to the sequence-embedded keypress state representations. Importantly, this approach allowed accurate decoding during skill acquisition before a performance plateau was reached. It is possible that the ability to decode finger movements in the midst of substantial trial-by-trial performance changes as shown here could improve robustness of BCI applications in neurorehabilitation17,31. Accurate estimation of finger movement representations as they evolve during skill learning should also improve detection of neural replay events during wakeful rest or sleep, and enhance investigations of the role replay has in supporting skill consolidation13.
Best decoding performance was obtained from activity in the bilateral superior and middle frontal cortex and the pre- and post-central regions, all known contributors to skill learning12,32. Low-frequency oscillations (LFOs) contributed the most to decoding accuracy, consistent with previous reports8,33. LFOs, present during movement onset in the cerebral cortex of animals34,35 and humans36–38, encode information about movement trajectories and velocity34,35. They also contain information related to movement timing39–41, preparation42,43, sensorimotor integration37, kinematics42,43 and may contribute to the precise temporal coordination of movements required for sequencing44. Within clinical contexts, LFOs in the frontoparietal regions that were the main contributors to decoding performance in the present study are related to recovery of motor function after brain lesions like stroke36,39,45,46. Our results, in conjunction with those discussed above, suggest that LFOs contain crucial information relevant to decoding of sequence-embedded skill actions.
Neural representations of individual sequence actions are contextualized during early skill learning
Previous work decoded actions from human neural activity when performed in isolation (e.g. individuated finger movement decoding)8,26,27,29,30,47–50 or embedded within previously learned sequences28,51 near stable upper performance limits. To our knowledge, decoding of individual sequence-embedded actions has not been either attempted or reported within the context of rapid performance improvements that characterize early skill learning. We addressed this gap in knowledge by investigating different decoding strategies and identified the approach that rendered the best decoding performance, which in this case was approximately 94% (5-class hybrid-space decoder). We then applied this strategy to investigate neural representations of sequential individual finger movements during the skill learning task. We found that improved 5-class decoding accuracy was linked to trial-by-trial increases in the differentiation between index finger keypresses embedded within the sequence at two different locations throughout early learning (Figure 4C), which occurred in parallel to skill performance improvements. Feature importance scores are initially led by the pre-central cortex during early learning followed by an intermediate shift towards post-central cortex by trial 11, which is consistent with the known role of the sensorimotor cortex in early skill acquisition6. Concurrently, the superior frontal and then middle frontal cortex continue to increase in importance over the Day 1 Training session and represent the two most important features between trials 15 and 36 after the rapid initial performance gains stabilize. This is consistent with previous findings that activity patterns within these regions represent hierarchical structures of skill sequences10.
Neural representations are modified by experience and practice32. For example, FMRI studies showed that weeks-long practice of a keyboard task results in representational changes of the skill (finger sequence), but not of the individual sequence actions components (individual finger movements) in the primary motor cortex10. An MEG study showed that previously learned individual sequence-embedded action (i.e. – finger movement) representations display an ordered queuing gradient when preparing to generate a previously learned sequence28. Thus, representations of previously learned skills following prolonged practice over days and weeks appear to reflect some combination of invariant features of individual action components with higher order representations of the action sequence.
Less is known about changes in neural representations during rapid performance improvements that characterize early skill learning. MEG, which possesses millisecond temporal resolution, is the tool of choice to investigate this short early learning phase52. In the present study, we leveraged this feature of MEG to investigate if the representation of the same sequence action (index finger movement) differentiates over early learning when embedded at different ordinal locations within the sequence. We found that practice led to a progressive differentiation of the neural representation of index finger movements reflecting the different contextual positions within the sequence. The finding that the magnitude of representational differentiation predicts skill gains supports the view that contextualization of neural representations contributes to early learning. Further support for this view comes from the finding that adding context information to the decoding algorithm increased decoding accuracy to levels higher than previously reported8,26–30,47–51. A possible neural mechanism supporting contextualization could be the emergence of conjunctive “what–where” representations of procedural memories53 with the corresponding modulation of neuronal population dynamics54,55.
Representational contextualization developed predominantly during rest periods of early learning
Representational contextualization developed to a larger extent during rest than during practice intervals. Furthermore, the magnitude of offline contextualization predicted skill gains while online contextualization did not. These findings are in line with previous work showing that most skill gains during early learning develop during rest rather than during practice intervals1,56 and suggest that contextualization of action representations could contribute to rapid consolidation of skill. Importantly, contextualization of the learned sequence was retained the following day indicating the stability of the formed skill memory.
Limitations
One limitation of this study is that contextualization was investigated for only one finger movement (index finger or digit 4) embedded within a relatively short 5-item skill sequence. It would be useful to determine if representational contextualization of multiple finger movements embedded within longer sequences (e.g. – a short piece of piano music) also exhibit similar results. While a supervised manifold learning approach was used here (LDA), unsupervised approaches (PCA and MDS) more suitable for BCI applications, also substantially improved decoding accuracy that can be used for real-time BCI applications (Figure 3 – figure supplement 2)
Summary
In summary, individual sequence action representations contextualize during the initial practice trials of a new skill and the degree of differentiation parallels skill gains. Importantly, the neural representation of context develops to a larger extent during rest than during practice intervals of early learning in parallel with rapid consolidation of skill. It is possible that the systematic inclusion of contextualized information into sequence skill practice environments could improve learning in areas as diverse as music education, sports training, and rehabilitation of motor skills after brain lesions.
Materials and Methods
Study Participants
The study was approved by the Combined Neuroscience Institutional Review Board of the National Institutes of Health (NIH). A total of thirty-three young and healthy adults (comprising 16 females) with a mean age of 26.6 years (± 0.87 SEM) participated in the study after providing written informed consent and undergoing a standard neurological examination. No participants were actively engaged in playing musical instruments in their daily lives, as per guidelines outlined in prior research57,58. All study scientific data were de-identified, and permanently unlinked from all personal identifiable information (PII) before the analysis. These data are publicly available upon request (https://nih.box.com/v/hcpsSkillLearningData). Two participants were excluded from the analysis due to MEG system malfunction during data acquisition. An additional 5 subjects were excluded where no comparable correct sequences were generated for two or more consecutive trials. The sample size was pre-determined through a power analysis designed to characterize skill learning1.
Experimental Setup
Participants practiced a novel procedural motor skill learning task that involved repetitively typing a 5-item numerical sequence (4-1-3-2-4) displayed on a computer screen. They were instructed to perform the task as quickly and accurately as possible using their non-dominant, left hand on a response pad (Cedrus LS-LINE, Cedrus Corp). Each numbered sequence item corresponded to a specific finger keypress: 1 for a little finger keypress, 2 for a ring finger keypress, 3 for a middle finger keypress, and 4 for an index finger keypress. Individual keypress times and identities were recorded and used to assess skill learning and performance.
Participants practiced the skill for 36 trials. Each trial spanned a total of 20 seconds and included a 10s practice round followed by a 10s inter-practice rest period. The five-item sequence was displayed on the computer screen for the duration of each practice round and participants were directed to fix their gaze on the sequence. Small asterisks were displayed above a sequence item after each successive keypress, signaling the participants’ present position within the sequence. Following the completion of a full sequence iteration, the asterisks were removed. The asterisk display did not provide error feedback as it appeared for both correct and incorrect keypresses. At the end of each practice round, the displayed number sequence was replaced by a string of five “X” symbols displayed on the computer screen, which remained for the duration of the inter-practice rest period. Participants were instructed to focus their gaze on the screen during this time.
On the next day, participants were tested (Day 2 Retest) with the same trained sequence (4-1-3-2-4) for 9 trials as well as for 9 different unpracticed control sequences (Day 2 Control; 2-1-3-4-2; 4-2-4-3-1; 3-4-2-3-1; 1-4-3-4-2; 3-2-4-3-1; 1-4-2-3-1; 3-2-4-2-1; 2-3-1-4-2; 4-2-3-1-4) each for one trial. The practice schedule structure for Day 2 was same as Day 1, with 10s practice trials interleaved with 10s of rest.
Behavioral data analysis
Skill
Skill, in the context of the present task, is quantified as the correct sequence typing speed, (ie. - the number of correctly typed sequence keypresses per second; kp/s). That is, improvements in the speed/accuracy trade-off equate to greater skill. Keypress transition times (KTT) were calculated as the difference in time between the KeyDown events recorded for consecutive keypresses. Since the sequence was repeatedly typed within a single trial, individual keypresses were marked as correct if they were members of a 5 consecutive keypress set that matched any possible circular shift of the displayed 5-item sequence. The instantaneous correct sequence speed was calculated as the inverse of the average KTT across a single correct sequence iteration and was updated for each correct keypress. Trial-by-trial skill changes were assessed by computing the median correct sequence typing speed for each trial.
Early Learning
The early learning period was defined as the trial range (1 - T trials) over which 95% of the total skill performance was first attained at the group level. We quantified this by fitting the group average trial-by-trial correct sequence speed data with an exponential model of the form:
Here, the trial number is denoted by t, and L(t) signifies the group-averaged performance at trial t. Parameters C1 and C2 correspond to the pre-training performance baseline and asymptote, respectively, while k denotes the learning rate. The values for C1, C2, and k were computed using a constrained nonlinear least-squares method (MATLAB’s lsqcurvefit function, trust-region-reflective algorithm) and were determined to be 0.5, 0.15, and 0.2, respectively. The early learning trial cut-off, denoted as T, was identified as the first trial where 95% of the learning had been achieved. In this study, T was determined to be trial 11.
Performance improvements during rest (micro-offline gains), for each rest period of a trial, were calculated as the net change in performance (instantaneous correct sequence typing speed) from the end of one practice period to the onset of the next (i.e. – over a single inter-practice rest period). Micro-online gains (performance improvements during practice periods) were computed as the change in performance between the beginning and end of a single practice trial. Total early learning was derived as the sum of all micro-online and micro-offline gains over trials 1-11. Cumulative micro-offline gains, micro-online gains, and total early learning were statistically compared using 1-way ANOVAs and post-hoc Tukey tests.
MRI Acquisition
We acquired T1-weighted high-resolution anatomical MRI volumes images (1 mm3 isotropic MPRAGE sequence) for each participant on a 3T MRI scanner (GE Excite HDxt or Siemens Skyra) equipped with a 32-channel head coil. These data allowed for spatial co-registration of an individual participant’s brain with the MEG sensors, and individual head models required for surface-based cortical dipole estimation from MEG signals (i.e. – MEG source-space modeling).
MEG Acquisition
We recorded continuous magnetoencephalography (MEG) at a sampling frequency of 600 Hz using a CTF 275 MEG system (CTF Systems, Inc., Canada) while participants were seated in an upright position. The MEG system comprises a whole-head array featuring 275 radial 1st-order gradiometer/SQUID channels housed in a magnetically shielded room (Vacuumschmelze, Germany). Three of the gradiometers (two non-functional and one with high channel noise after visual inspection) were excluded from the analysis resulting in a total of 272 MEG channels for analysis. Synthetic 3rd order gradient balancing was applied to eliminate background noise in real-time data collection. Temporal alignment of behavioral and MEG data was achieved using a TTL trigger. Head position in the scanner coordinate space was monitored using head localization coils at the nasion, left, and right pre-auricular locations. These fiducial positions were co-registered in the participants’ T1-MRI coordinate space using a stereotactic neuronavigation system (BrainSight, Rogue Research Inc.). MEG data was acquired starting 6 min before the task (resting-state baseline) and continued through the end of the 12 min training session.
MEG Data Analysis
Preprocessing
MEG data were preprocessed using the FieldTrip59 and EEGLAB60 toolboxes on MATLAB 2022a. Continuous raw MEG data (acquired at 600 Hz) were band pass filtered between 1-100 Hz with a 4th order noncausal Butterworth filter. 60 Hz line noise was removed with a narrow-band discrete Fourier transform (DFT) notch filter. Independent component analysis (ICA) was used to remove typical MEG signal artifacts related to eye blinks or movement or cardiac pulsation. All recordings were visually inspected and marked to denoise segments containing other large amplitude artifacts due to movements.
Source Reconstruction and Parcellation
For each participant, individual volume conduction models were established to depict the propagation of brain-generated currents through tissue to externally measurable magnetic fields. This was accomplished through a single-shell head corrected-sphere approach based on the brain volume segmentation of their high-resolution T1 MRI. Source models and surface labels from the Desikan-Killiany Atlas14 were created for each participant using inner-skull and pial layer surfaces obtained through FreeSurfer segmentation and connectome-workbench. Aligning sensor positions in the MEG helmet to individual head space involved warping MEG head coil positions (mean of pre and post recording) to the fiducials of the MRI, applying the same transformation matrix to all MEG sensors.
The individual source, volume conduction model, and sensor positions were then utilized to generate the forward solution at each source dipole location, describing the propagation of source activity from each cortical location on the grid to each MEG sensor. The Linearly Constrained Minimum-Variance (LCMV) beamformer was employed for computing the inverse solution. Each trial of MEG activity contributed to calculating the inverse solution data covariance matrix. The individual sample noise covariance matrix was derived from 6 minutes of pre-training rest MEG data recorded in the same subject during the same session. A total of 15,684 surface-based cortical dipoles (source-space voxels) were estimated. Cortical parcellation involved averaging the time series of all source dipoles within a distinct anatomical boundary defined in the Desikan-Killiany Atlas. To prevent within-parcel source cancellation during averaging, a technique called mean-flipping was applied. This process entailed flipping the sign for all sources with a sign differing from that of the average source.
Feature Selection for Decoding
Features were extracted from the MEG activity at several spatial, oscillatory, and temporal scales.
Oscillatory Analysis
MEG signals were band limited to broadband (1-100 Hz) and to classic neural oscillatory frequencies defined as delta (1 – 3 Hz), theta (4 – 7 Hz), alpha (8 – 15 Hz), beta (16 – 24 Hz), gamma (25 – 50 Hz), and high-gamma (51 – 100 Hz) with a 4th order non-causal Butterworth filter. The decoding analysis was conducted independently for each band of MEG activity.
Spatial Analysis
Decoding was performed at sensor space as well as at source space. For sensor space decoding the feature dimension was 272 (corresponding to the 272 gradiometer channels). At the source space, decoding was executed at both the high-resolution voxel and low-resolution parcel space. The feature dimension for voxel-space decoding was 15,684, representing the total number of independently estimated cortical dipoles. For parcel-space decoding, the feature dimension was 148, derived from spatially averaging the voxel activities within each parcel defined by the Desikan-Killiany Atlas. In both cases, decoding was carried out across all oscillatory frequency bands (i.e. - broadband, delta, theta, alpha, beta, gamma, and high-gamma) for comprehensive comparison.
Temporal Analysis
Temporal MEG activity corresponding to each keypress was defined by taking a time window of [ t + Δt], where t ∈ [0 : 10 ms : 100 ms] and Δt ∈ [25 ms : 25 ms : 350 ms]. In other words, a sliding window of variable size (from 25 ms to 350 ms with 25 ms increments) sliding from onset until the start of the window reached 100 ms post onset (with increments of 10s) was used. This approach generated 140 different temporal windows for each keypress for each participant. Average MEG activity from each of these time widows were analyzed independently for decoding. The best time window was selected for each subject that resulted in best cross-validation performance. This temporally sliding and varying size window analysis was performed at each of the spatial and oscillatory scales.
Hybrid Spatial Approach
First, we evaluated the decoding performance of each brain region in decoding finger identity. We trained decoders on the voxel activities of each region (intra-parcel decoders) to predict finger identity. Brain regions were then ranked from 1 to 148 based on their decoding accuracy at the group level. In a stepwise manner, we incrementally incorporated voxel activities of brain regions, starting from the top-ranked region. In other words, first, we concatenated the voxel-level features of the highest-ranked region with whole-brain parcel-level features for decoding analysis. Subsequently, we added the voxel-level features of the second-ranked brain region, along with those of the top-ranked region to the parcel-level features for decoding and continued this process. This iterative addition of brain regions was performed until decoding accuracy reached saturation. The optimal feature space comprised all 148 parcel-space features in conjunction with voxel-space features from the top 8 brain regions.
Dimension Reduction
We used several supervised and unsupervised dimension reduction techniques including linear discriminant analysis (LDA), minimum redundant maximum relevance (MRMR), principal component analysis (PCA), Autoencoder, Diffusion maps, factor analysis, large margin nearest neighbor (LMNN), multi-dimensional scaling (MDS), neighbor component analysis (NCA), spatial predictor envelope (SPE)15 on each of the spatial feature space (sensor, parcel, voxel, and hybrid). Among these techniques, PCA, MDS, MRMR, and LDA emerged as particularly effective in significantly improving decoding performance.
PCA, a method for unsupervised dimensionality reduction, transformed the high-dimensional dataset into a new coordinate system of uncorrelated principal components. These components, capturing the maximum variance in the data, were iteratively added to reconstruct the feature space and execution of decoding. MDS finds a configuration of points in a lower-dimensional space such that the distances between these points reflect the dissimilarities or similarities between the corresponding objects in the original high-dimensional space. MRMR, an approach combining relevance and redundancy metrics, ranked features based on their significance to the target variable and their non-redundancy with other features. The decoding process involved starting with the highest-ranked feature and iteratively incorporating subsequent features until decoding accuracy reached saturation. LDA finds the linear combinations of features (dimensions) that best separate different classes in a dataset. It projects the original features onto a lower-dimensional space (number of classes −1) while preserving the class-discriminatory information. This transformation maximizes the ratio of the between-class variance to the within-class variance. In our study, LDA transformed the features to a 3-dimensional hyperdimensional space that were used for decoding. Dimension reduction was applied to train data and then with the tuned parameters of the dimension reduction model test data was transformed for decoder metrics evaluation. Decoding accuracies were systematically compared between the original and reduced dimension feature spaces, providing insights into the effectiveness of each dimension reduction technique. By rigorously assessing the impact of dimension reduction on decoding accuracy, the study aimed to identify techniques that not only reduced the computational burden associated with high-dimensional data but also enhanced the discriminative power of the selected features. This comprehensive approach aimed at optimizing the neural representation of each finger keypress for decoding performance across various spatial contexts.
Decoding Analysis
Decoding analysis was conducted for each participant individually, employing a randomized split of the data into training (90%) and test (10%) samples for 8 iterations. For each iteration, to optimize decoder configuration, an 8-fold cross-validation was applied to the training samples, allowing for the fine-tuning of hyperparameters and selection of the most effective model. Across participants, on average, the total number of individual keypresses for the whole duration of training was 219 ± SD: 66 (keypress 1: little), 205 ± SD: 66 (keypress 2: ring), 209 ± 66 (keypress 3: middle), and 426 ± SD: 131 (keypress 4: index). Only keypresses belonging to correctly typed sequence iterations (94.64% ± 4.04% of all keypresses) were considered. The total number of keypresses for keypress 4 was approximately twice that of keypresses 1, 2, and 3, as it was the only action that occurred more than once in the trained sequence (4-1-3-2-4), albeit in two different sequence contexts or ordinal positions. Considering the higher (2x) number of samples for one-class, we independently oversampled the keypresses 1, 2 and 3 to avoid overfitting to the over-represented class. Importantly, oversampling was applied independently for each keypress class, ensuring that validation folds were never oversampled, and training folds did not share common oversampled patterns. The decoder configuration demonstrating the best validation performance was selected for each iteration, and subsequently, test accuracy was evaluated on the independent/unseen test samples. This process was repeated for the 8 different iterations of train-test splitting and the average test accuracy was reported. This rigorous methodology aimed at generalizing decoding performance to ensure robust and reliable results across participants. Further, decoding evaluation was performed on the Day 2 data, for both the trained (Day 2 Retest; 9 trials) and untrained sequences (Day 2 Control; 9 different single-trial tests).
Machine Learning Classifiers
We employed a diverse set of machine learning decoders, including Naïve Bayes (NB), decision trees (DT), ensembles (EN), k-nearest neighbor (KNN), linear discriminant analysis (LDA), support vector machines (SVM), and artificial neural network (ANN), to train features generated with all possible combinations of spatial, temporal, and oscillatory scales for comprehensive comparative analysis. The hyperparameters of these decoders underwent fine-tuning using Bayesian optimization search.
The Naive Bayes classifier was configured with a normal distribution predictor and Gaussian Kernel, while the KNN classifier had a K value of 4 (for keypress decoding) and utilized the Euclidean distance metric. For decision trees, the maximum number of splits was set to 4 (for keypress decoding), with leaves being merged based on the sum of risk values greater or equal to the risk associated with the parent node. The optimal sequence of pruned trees was estimated, and the predictor selection method was ‘Standard CART,’ selecting the split predictor that maximizes the split-criterion gain over all possible splits of all predictors. The split criterion used was ‘gdi’ (Gini’s diversity index). Ensembles employed the bagging method with random predictor selections at each split, forming a random forest. The maximum number of learning cycles was set to 100 with a weak learner based on discriminant analysis. For SVM, the RBF kernel was selected through cross-validation (CV), and the ‘C’ parameter and kernel scale were optimized using Bayesian optimization search. In the case of LDA, the linear coefficient threshold and the amount of regularization were computed based on Bayesian optimization search. The ANN decoder consisted of one hidden layer with 128 nodes, followed by a sigmoid and a softmax layer, each with 4 nodes (for keypress decoding). Training utilized a scaled conjugate gradient optimizer with backpropagation, employing a learning rate of 0.01 (coarse to fine tuning) for a maximum of 100 epochs, with early stopping validation patience set to 6 epochs.
Additionally, we assessed long short-term memory recurrent neural networks (LSTM-RNN) and bidirectional LSTM-RNN to train the time series of each keypress for decoding. The configuration of these RNNs included two hidden (Bi)LSTM layers of 100 units, followed by three fully connected layers (50, 10, and 4), and a softmax layer with 4 nodes to decode the four keypress classes. The networks were trained using an Adam optimizer with an initial learning rate of 0.001, a learning rate drop factor of 0.1 after 20 epochs, and a maximum epoch of 400. For regularization, 20% dropouts after the (Bi)LSTM layers, 50% dropout after the fully connected layers, and L2 Norm with a gradient threshold value of 0.1 were employed. These hyperparameter values were chosen based on grid search optimization during cross-validation.
Decoding Performance Metrics
Decoding performance was assessed using several metrics, including accuracy (%), which indicates the proportion of correct classifications among all test samples. The confusion matrix provided a detailed comparison of the number of correctly predicted samples for each class against the ground truth. The F1 score for each keypress was utilized as a comprehensive metric, combining precision and recall scores. This score represents the predictive skill of the decoders by offering insights into their class-wise performance. Additionally, the weighted mean F1 score was computed to convey the overall predictive skill of the models across all classes. This metric is derived from a weighted average of the F1 scores for each class, providing a comprehensive evaluation of the models’ performance. Test accuracies were used for statistical comparisons.
Decoding During Skill Learning Progression
We systematically assessed decoding performance of the 2-class decoder (IndexOP1 vs IndexOP5) at each trial during the skill learning process to identify the relationship between the evolution of decoding proficiency and the acquired skill. Our approach involved evaluating decoder performance individually for each Day 1 Training trial. To mitigate the influence of varying sample sizes in later trials, we ensured an equal number of samples (first k keypresses) in each trial.
We used t-distributed stochastic neighborhood estimation (t-SNE) to visualize the evolution of neural representations corresponding to each keypress at each trial of the learning period. Further, within t-SNE distribution, we separately labeled the keypress 4 based on their context (i.e., keypress 4 at ordinal position 1: IndexOP1 and keypress 4 at ordinal position 5: IndexOP5). Brain regions important to decoding were determined for each trial based on MRMR based feature ranking and highlighted in topography plots.
Decoding Sequence Elements
Finally, we performed decoding of each contextual action of the sequence (i.e., IndexOP1, Little, Middle, Ring, and IndexOP5). We used the same decoding strategy (90%-10% split for train and test, 8-fold cross validation of training samples to select best decoder configuration, hybrid spatial features, and LDA based dimension reduction) to decode these 5-classes. Note, oversampling was not needed after sub-grouping the index finger keypresses into two separate classes based on their context. Decoding sequence elements were evaluated for both Day 1 Training, Day 2 Retest and Day 2 Control data.
Neural Representation analysis
We evaluated the online (within-trial) and offline (between-trial) changes in the neural representation of the contextual actions (IndexOP1 and IndexOP5) for each trial during training. For offline differentiation, we evaluated the Euclidian distance between the hybrid spatial features of the last index finger keypress of a trial (IndexOP5) to the first index finger keypress (IndexOP1) of the subsequent trial, mirroring the approach used to calculate micro-offline gains in skill. This offline distance provided insights on the net change in contextual representation of the index finger keypress during each interspersed rest interval. For online differentiation, we calculated the mean Euclidian distance between IndexOP1 and IndexOP5 of all the correctly typed sequences within a practice trial. Online differentiation informed on the net change in the contextual representation of the index finger keypress within each practice trial. Cumulative offline and online representation distances across participants were compared using 2-sample t-tests. Additionally, we computed trial-by-trial differences in offline and online representations during early learning, exploring their relationships with cumulative micro-offline and micro-online gains in skill, respectively, through regression analysis and Pearson correlation analysis. Linear regression models were trained utilizing the fitlm function in MATLAB. The model employed M-estimation, formulating estimating equations and solving them through the Iteratively Reweighted Least Squares (IRLS) method61. Key metrics such as the square root of the mean squared error (RMSE), which estimates the standard deviation of the prediction error distribution, the coefficient of explained variance (R2), the F-statistic as a test statistic for the F-test on the regression model, examining whether the model significantly outperforms a degenerate model consisting only of a constant term, and the p-value for the F-test on the model were computed and compared across different models. This multifaceted approach aimed to uncover the nuanced dynamics of neural representation changes in response to skill acquisition.
As a control analysis, we also computed the difference in neural representation between IndexOP1 and IndexOP5 on Day 2 Retest data for the same sequence (4-1-3-2-4) as well as for different Day 2 Control untrained sequences where the same action was performed at ordinal positions 1 and 5 (2-1-3-4-2; 1-4-2-3-1; 2-3-1-4-2; 4-2-3-1-4). Finally, we assessed for specificity of contextualization to the trained sequence, by evaluating differentiation between index finger keypress representations performed at two different positions within untrained sequences (4-2-4-3-1 and 1-4-3-4-2). The cumulative differences across participants were compared with 2-sample t-tests.
Data Availability
All de-identified and permanently unlinked from all personal identifiable information (PII) data are publicly available. (Box Repository).
Acknowledgements
We thank Ms. Tasneem Malik, Ms. Michele Richman, and NIMH MEG Core Facility staff for their support. This work utilized the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov). This research was supported by the Intramural Research Program of the NINDS, NIH.
Supplementary Materials
References
- 1.A rapid form of offline consolidation in skill learningCurrent Biology 29:1346–1351
- 2.Motor Control and Learning: A Behavioral EmphasisHuman kinetics
- 3.Reorganization and plasticity in the adult brain during learning of motor skillsCurr Opin Neurobiol 15:161–167
- 4.The neural representation of sequences: from transition probabilities to algebraic patterns and linguistic treesNeuron 88:2–19
- 5.Mechanisms of offline motor learning at a microscale of seconds in large-scale crowdsourced dataNPJ Sci Learn 5:1–10
- 6.Functional MRI evidence for adult motor cortex plasticity during motor skill learningNature 377:155–158
- 7.Practice and sleep form different aspects of skillNat Commun 5
- 8.Compartmentalized dynamics within a common multi-area mesoscale manifold represent a repertoire of human hand movementsNeuron 110:154–174
- 9.Learning of a sequential motor skill comprises explicit and implicit components that consolidate differentlyJ Neurophysiol 101:2218–2229
- 10.Neural organization of hierarchical motor sequence representations in the human neocortexNeuron 103:1178–1190
- 11.Binding during sequence learning does not alter cortical representations of individual actionsJournal of Neuroscience 39:6968–6977
- 12.Parallel neural networks for learning sequential proceduresTrends Neurosci 22:464–471
- 13.Consolidation of human skill linked to waking hippocampo-neocortical replayCell Rep 35
- 14.Automatic parcellation of human cortical gyri and sulci using standard anatomical nomenclatureNeuroimage 53:1–15
- 15.Dimensionality reduction: A comparative reviewJournal of Machine Learning Research 10
- 16.The role of distinct ECoG frequency features in decoding finger movementJ Neural Eng 20
- 17.Decoding the EEG patterns induced by sequential finger movement for brain-computer interfacesFront Neurosci 17
- 18.Individual finger movement decoding using a novel ultra-high-density electroencephalography-based brain-computer interface systemFront Neurosci 16
- 19.Decoding finger movement patterns from microscopic neural drive information based on deep learningMed Eng Phys 104
- 20.Fast and accurate decoding of finger movements from ECoG through Riemannian features and modern machine learning techniquesJ Neural Eng 19
- 21.Think to move: a neuromagnetic brain-computer interface (BCI) system for chronic strokeStroke 39:910–917
- 22.Continuous control of the dlr light-weight robot iii by a human with tetraplegia using the braingate2 neural interface systemExperimental Robotics: The 12th International Symposium on Experimental Robotics Springer :125–136
- 23.BrainGate: An Intracortical Brain-Computer Interface for the Restoration of Communication and Functional Independence for People with Paralysis2023 11th International Winter Conference on Brain-Computer Interface (BCI) IEEE :1–3
- 24.Brain–computer interfaces: communication and restoration of movement in paralysisJ Physiol 579:621–636
- 25.Brain–computer interface in paralysisCurr Opin Neurol 21:634–638
- 26.Decoding individual finger movements from one hand using human EEG signalsPLoS One 9
- 27.Single trial discrimination of individual finger movements on one hand: a combined MEG and EEG studyNeuroimage 59:3316–3324
- 28.Neural competitive queuing of ordinal structure underlies skilled sequential actionNeuron 101:1166–1180
- 29.Finger movement classification for an electrocorticographic BCI2007 3rd International IEEE/EMBS Conference on Neural Engineering 192–195 IEEE
- 30.A hybrid SVM/HMM based system for the state detection of individual finger movements from multichannel ECoG signals2011 5th International IEEE/EMBS Conference on Neural Engineering IEEE :457–460
- 31.Decoding intra-limb and inter-limb kinematics during treadmill walking from scalp electroencephalographic (EEG) signalsIEEE Transactions on neural systems and rehabilitation engineering 20:212–219
- 32.Neuroplasticity subserving motor skill learningNeuron 72:443–454
- 33.Theta-phase dependent neuronal coding during sequence learning in human single neuronsNat Commun 12
- 34.Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor corticesJ Neurophysiol 105:1603–1619
- 35.Spatiotemporal variation of multiple neurophysiological signals in the primary motor cortex during dexterous reach-to-grasp movementsJournal of Neuroscience 31:15531–15543
- 36.Low-frequency brain oscillations track motor recovery in human strokeAnn Neurol 86:853–865
- 37.Theta oscillations reflect a putative neural mechanism for human sensorimotor integrationJ Neurophysiol 107:65–77
- 38.Theta oscillations locked to intended actions rhythmically modulate perceptionElife 6
- 39.Low-frequency cortical activity is a neuromodulatory target that tracks recovery after strokeNat Med 24:1257–1267
- 40.Real-time estimation and biofeedback of single-neuron firing rates using local field potentialsNat Commun 5
- 41.Phase entrainment of human delta oscillations can mediate the effects of expectation on reaction speedJournal of Neuroscience 30:13578–13585
- 42.Local field potentials allow accurate decoding of muscle activityJ Neurophysiol 108:18–24
- 43.Generalizability of EMG decoding using local field potentials2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society IEEE :1630–1633
- 44.Neural population dynamics during reachingNature 487:51–56
- 45.Consciousness among delta waves: a paradox?Brain 144:2257–2277
- 46.Recovery of consolidation after sleep following stroke—interaction of slow waves, spindles, and GABACell Rep 38
- 47.Decoding the individual finger movements from single-trial functional magnetic resonance imaging recordings of human brain activityEuropean Journal of Neuroscience 39:2071–2082
- 48.EEG-based BCI system for decoding finger movements within the same handNeurosci Lett 698:113–120
- 49.Evaluation of EEG features in decoding individual finger movements from one handComput Math Methods Med 2013
- 50.EEG pattern decoding of rhythmic individual finger imaginary movements of one handHum Physiol 42:32–42
- 51.Decoding sequential finger movements from preparatory activity in higher-order motor regions: a functional magnetic resonance imaging multi-voxel pattern analysisEuropean Journal of Neuroscience 42:2851–2859
- 52.Magnetoencephalography for brain electrophysiology and imagingNat Neurosci 20:327–339
- 53.Robust conjunctive item–place coding by hippocampal neurons parallels learning what happens whereJournal of Neuroscience 29:9918–9929
- 54.Population activity in the control of movementInt Rev Neurobiol 103
- 55.On the relations between the direction of two-dimensional arm movements and cell discharge in primate motor cortexJ Neurosci 2:1527–1537
- 56.Generalization of procedural motor sequence learning after a single practice trialNPJ Sci Learn 8
- 57.Detecting wrong notes in advance: neuronal correlates of error monitoring in pianistsCerebral cortex 19:2625–2639
- 58.Nobody is perfect: ERP effects prior to performance errors in musicians indicate fast monitoring processesPLoS One 4
- 59.FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological dataComput Intell Neurosci 2011:1–9
- 60.EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysisJ Neurosci Methods 134:9–21
- 61.Robust regression using iteratively reweighted least-squaresCommunications in Statistics-theory and Methods 6:813–827
Article and author information
Author information
Version history
- Preprint posted:
- Sent for peer review:
- Reviewed Preprint version 1:
Copyright
© 2024, Dash et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Metrics
- views
- 154
- download
- 1
- citations
- 0
Views, downloads and citations are aggregated across all versions of this paper published by eLife.