Figures and data

Example sequence of stimuli and button presses in the experiment.
Participants viewed images of nature. In manual trials, they watched the image until they pressed a button to advance to the next image. In automatic trials, they viewed the image until it automatically advanced to the next image. Before each image, a word prompt shown for one second indicated whether the trial was manual or automatic, followed by a 0.25 s buffer. A color-coded fixation cross was also present afterwards, overlaid on the image. In manual trials, participants were instructed to wait at least roughly 3 seconds before pressing, but not to count time in their heads.

Traditional electrophysiological and behavioral features of self-initiated actions.
(A) Grand average (n=15) movement-related cortical potential at C3 for active (blue) and passive (red) trials aligned to slide transition (t=0; dotted gray line). The three gray dashed vertical lines indicate the time of onset of the MRCP signal for active trials according to three onset-identification methods from the RP literature. The shaded area represents the standard error of the mean. A topography (bottom left) shows the grand average activity in the 0.5 s right before slide transition (blue is more negative). (B) Grand average (n=15) lateralized readiness potential taken as the difference wave between electrodes C3 and C4 for active (blue) and passive (red) trials aligned to slide transition (t=0; dotted gray line). Note that all movements were right index finger button presses. The shaded area represents the standard error of the mean. (C) Grand average (n=15) power spectrum at C3 for active trials aligned to slide transition (t=0; dotted gray line) and normalized using data from -3 to -2.5 s; this window was chosen to avoid the inclusion of edge artifacts in the baseline. Color represents power, with blue indicating a decrease in power relative to baseline and yellow indicating an increase. No cluster survived significance testing with cluster correction (cluster-level p<0.05; pixel-level p<0.01; see Materials and Methods, Event-related desynchronization). (D) Waiting time distributions in the active trials (i.e., how long people waited before advancing to the next slide). Each shade of gray represents a different participant.

Model performance for the task- and time-based methods.
(A) Grand average (n=15) time course of the validation AUC (10-fold) using the task-based (blue) and time-based (green) methods. The task-based method applied to the MEG data from the parametrization set (purple) shows near-ceiling performance after slide transition. The x-axis marks the leading edge (in seconds) of the sliding window aligned to slide transition (t=0; dotted gray line). Thin gray dashed vertical lines indicate onset times, labeled with the corresponding onset method, for the task-based approach; these onsets are determined with respect to a baseline AUC taken from -2.5 s to -2 s. This different baseline was selected as the earliest available 0.5 s AUC period. Shaded areas represent the standard error of the mean. (B) Boxplot of earliest decoding times (EDT) across participants (n=15) for both the time-based (green) and task-based (blue) methods. The y-axis shows time aligned to slide transition (t=0; dashed gray line). Participants’ mean single-trial-derived EDT (earliest correct classification of each trial) is displayed on the left, and the AUC-derived EDT (earliest time the lower AUC error bound remains above 0.5) on the right (for a description of the two methods see Materials and Methods, Model evaluation). Red crosses indicate outliers. Red dotted horizontal lines indicate the MRCP onsets for each method (see Figure 2A). Note that in panel A, the AUC for the task-based method appears to rise much earlier (0.44 s before slide transition with the onset-by-eye method) than the median earliest decoding time (panel B, -0.085 s for the single-trial method and +0.02 s for the AUC method). This is in part because panel A shows an average of time courses, with the early rise in the average being driven by outliers, and in part because of the different methodologies used to derive each metric. The same data without outliers are shown in the appendix (see appendix figure S7).
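The AUC-derived EDT in panel B ("earliest time the lower AUC error bound remains above 0.5") can be illustrated with a short sketch. The function name, the use of mean minus one error term as the lower bound, and the choice to scan backwards from the last supplied time point are assumptions made for illustration only; the actual procedure is the one described in Materials and Methods, Model evaluation.

```python
def auc_derived_edt(times, auc_mean, auc_err, chance=0.5):
    """Earliest time from which (mean - err) stays above chance until the
    last time point. Scans backwards, so isolated early peaks are ignored.
    Returns None if the lower bound is not above chance at the last point.
    NOTE: hypothetical helper, not the authors' implementation."""
    edt = None
    for t, m, e in zip(reversed(times), reversed(auc_mean), reversed(auc_err)):
        if m - e > chance:
            edt = t      # still inside the sustained above-chance run
        else:
            break        # run is interrupted; stop scanning
    return edt


# Made-up example values (not data from the study)
times = [-0.3, -0.2, -0.1, 0.0, 0.1]
auc = [0.56, 0.50, 0.58, 0.62, 0.80]
err = [0.02, 0.02, 0.02, 0.02, 0.02]
edt = auc_derived_edt(times, auc, err)
```

In this toy example the lower bound exceeds 0.5 at -0.3 s, drops at -0.2 s, and is sustained from -0.1 s onward, so the backward scan reports -0.1 s rather than the isolated early point.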

Comparison of the task- and time-based methods using simulated datasets.
(A) Time course of the validation AUC (10-fold) using the task-based method (blue), time-based with fixed baseline (green), and time-based with pre-trial start baseline (blue-green) for leaky stochastic accumulator simulated data. The x-axis represents the time of the leading edge of the sliding window aligned to the threshold crossing (t=0; dotted gray line). The shaded area represents the standard error of the mean of the AUC. (B) Time course of the validation AUC (10-fold) using the task-based method (blue) and time-based method (green) for pink noise simulated data. The shaded area represents the standard error of the mean of the AUC.

Benchmark analysis of Haar-AdaBoost against standard models.
Validation AUC for the task-based approach applied to the EEG of the fifteen participants of the OC main dataset using Haar-AdaBoost (blue), EEGNet (purple), and basic slope LDA (pink). Time 0 is the time of slide transition (dotted gray line). The AUC is aligned to the leading edge of the sliding window. Shaded regions indicate the standard error of the mean of AUC across participants.

Validation AUC of PF dataset.
Validation AUC of Haar-AdaBoost applied to the MEG and EEG of the three participants of the PF dataset using the task-based approach. Time 0 is the time of slide transition (dotted gray line). The AUC is aligned to the leading edge of the sliding window. Shaded areas are the standard error of the AUC. While EEG provides great temporal resolution, it typically has poor spatial resolution. MEG, on the other hand, has a much higher spatial resolution for an identical temporal resolution. To ensure that our analysis could be performed interchangeably on EEG and MEG, we acquired EEG and MEG data concurrently in the same participants at PF. We found no improvement from using MEG over EEG before the slide transition. We do, however, see that MEG performs better after movement, indicating that some spatial information is missing from the EEG after movement.

Effect-matched spatial filtering of MEG data from PF.
Event-locked averages of the MEG data from PF projected onto a spatial filter. The top two rows represent the magnetometers and the gradiometers averaged over active trials. The bottom two rows represent the magnetometers and the gradiometers averaged over passive trials. Time 0 is the time of slide transition. Magnetometers measure the total strength of the magnetic field under the sensor, while gradiometers measure a difference in magnetic field strength between two sensors; as such, they reveal different aspects of the same signal (see Hämäläinen et al., 1993, for a more in-depth treatment). Thin lines represent individual sessions. Topographic plots represent the distribution of the spatial filters over the scalp. Each column is one of the three participants. To accurately represent the time course of the motor-related cortical field in the MEG, we used a method called effect-matched spatial filtering (EMSF) to reduce the dimensionality of the data (see Schurger et al., 2013). For gradiometers and magnetometers separately, a spatial filter maximizing the difference between the activity 2.9 s before slide transition and that at slide transition was applied to the data to reduce all channels to a single time series. This encapsulates any trend in the signal that occurs over that 2.9 s period. We can see that for active trials, a negative deflection in the MEG is present.

Individual ERP.
Individual movement-related cortical potentials at C3 for the 15 participants at OC. In blue are the averages of the passive trials and in red the averages of the active trials. Time 0 is the time of slide transition (dotted gray line). Shaded areas are the standard errors of the mean.

AUC and ERP of Libet-task participants.
Machine learning and ERP results for three participants that we ran with a spontaneous movement initiation task based on Libet et al. (1983). (A) Individual validation AUC for the 3 participants. In light blue are the AUCs using the task-based approach and in green are the AUCs using the time-based approach. Time 0 is the time of slide transition (dotted gray line). The AUC is aligned to the leading edge of the sliding window. Shaded areas are the standard errors of the AUC. (B) Individual movement-related cortical potentials at C3 for the 3 participants. In blue are the averages of the passive trials and in red the averages of the active trials. Time 0 is the time of slide transition (dotted gray line). Shaded areas are the standard errors of the mean. Our slideshow paradigm, while providing a more ecologically valid task, is agnostic as to the spontaneity of the participants’ movement decisions. To ensure that our task would generalize to more standard paradigms in the field of self-initiated action, we performed our analyses on data from a spontaneous voluntary movement paradigm based on Libet et al. (1983). Here we collected data from three further participants (1 female, 2 males; mean age 29.3; 1 left-handed, 2 right-handed). They performed 350 trials of a task similar to the one performed by participants at OC, except that there were no pictures, simply a fixation cross. Furthermore, in the manual trials participants were instructed to wait for a minimum of 3 seconds and then start monitoring inwardly for an urge to move. Whenever they detected such an urge, they were instructed to press as abruptly and spontaneously as possible, ending the trial. In the automatic trials, participants were instructed to do the same (monitor introspectively for an urge), except that they should not act on the urge if and when they felt it, but rather wait passively for the next urge, and repeat this process until the trial ended automatically.
This paradigm is much closer to the seminal studies on self-initiated actions (Libet et al., 1983; Kornhuber and Deecke, 1965). Here movements are performed spontaneously, and the matched condition (automatic) is identical in most regards apart from it not containing or terminating in a movement. All preprocessing and data analysis applied were the same. We found no qualitative difference with our main result (Figures 2A and 3A), suggesting that our task would generalize to other types of self-initiated actions.

Individual validation AUC.
Individual validation AUC for the 15 participants at OC. In light blue are the AUCs using the task-based approach and in green are the AUCs using the time-based approach. Time 0 is the time of slide transition (dotted gray line). The AUC is aligned to the leading edge of the sliding window. Shaded areas are the standard errors of the AUC.

Feature importance.
Feature importance for the task-based method. (A) Time-frequency plot of the grand average (n=15) feature importance as a normalized measure of the final weights selected by the model associated with features from each frequency at each time point. (B) Heatmap of the grand average (n=15) channel importance as a normalized measure of the final weights selected by the model associated with each channel over time. Values are normalized such that a value of 0.07 means that activity at this channel accounted for 7% of the model’s decision at this time point. When performing our sliding window analysis using Haar-AdaBoost (Figure 3), we extracted the importance of the different features used over time by the algorithm. For each window of the sliding window analysis, each selected weak learner (a decision stump classifier) was associated with a single feature. One clear advantage of this approach is that each feature corresponds to a channel or a pair of channels, a unique frequency, and a unique time interval. By normalizing the weights attributed to these weak classifiers in the final model, one can extract the importance of each channel and each frequency over time for the classification. Once the model was trained, we thus extracted, for each sliding window, the proportional weight that each channel or frequency contributed to the classification. Notably, the frequency feature importance (panel A) focused on the mu rhythm and beta desynchronization (Figure 2C) right after movement onset. Channels over left motor cortices also exhibited higher relevance right before, during, and after movement (panel B).
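The normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the function name, and in particular the equal split of a weak learner's weight between the two channels of a channel-pair feature are hypothetical choices, not details given in the text.

```python
from collections import defaultdict


def channel_importance(weak_learners):
    """Proportion of the total weak-learner weight attributed to each channel
    within one sliding window. Weights are assumed to be non-negative.
    ASSUMPTION: a feature built from a pair of channels contributes half of
    its weight to each channel of the pair."""
    totals = defaultdict(float)
    for learner in weak_learners:
        share = learner["weight"] / len(learner["channels"])
        for channel in learner["channels"]:
            totals[channel] += share
    grand_total = sum(totals.values())
    return {channel: w / grand_total for channel, w in totals.items()}


# Made-up weak learners for a single window (not values from the study)
learners = [
    {"channels": ("C3",), "weight": 0.5},
    {"channels": ("C3", "C4"), "weight": 0.3},
    {"channels": ("Cz",), "weight": 0.2},
]
importance = channel_importance(learners)
```

With these made-up weights, C3 receives 0.65 of the total, meaning it would account for 65% of the model's decision in that window; the same grouping could be applied to frequency instead of channel to obtain a time-frequency importance map.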

Validation AUC with outliers excluded.
Model performance for the task- and time-based methods without earliest decoding time outliers. Grand average time course of the validation AUC (10-fold) using the task-based method (blue; N=12) and time-based method (green; N=13) from OC EEG data. The three gray dashed lines indicate the time of onset of the signal according to their respectively labeled indicators for the control method on the OC EEG data. The shaded area represents the standard error of the mean. For the task-based method, 3 outliers were removed (participants 3, 6 and 8). For the time-based method, 2 outliers were removed (participants 8 and 11). Outliers were defined using the Tukey method as participants whose earliest decoding times were either larger than the third quartile plus 1.5 times the interquartile range (the difference between the third and first quartiles), or smaller than the first quartile minus 1.5 times the interquartile range. The AUC onset according to the t-test method is here at 20 ms before slide transition. Matching MRCP onsets with outliers removed are displayed in red. The thick dotted gray line represents the time of slide transition.
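As a minimal sketch of the Tukey rule used above, the fences can be computed from the quartiles of the per-participant earliest decoding times. The function names and the linear-interpolation quartile convention are assumptions for illustration; quartile conventions differ between software packages and can shift the fences slightly.

```python
def quartiles(values):
    """Return (Q1, Q3) using linear interpolation between sorted values.
    ASSUMPTION: this is one of several common quartile conventions."""
    xs = sorted(values)

    def percentile(p):
        pos = p * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return percentile(0.25), percentile(0.75)


def tukey_outliers(values, k=1.5):
    """Indices of values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Made-up earliest decoding times in seconds (not data from the study)
edts = [-0.10, -0.08, -0.09, -0.07, -0.11, -1.50, -0.06]
flagged = tukey_outliers(edts)
```

In this toy example only the participant at index 5, whose decoding time is far earlier than the rest, falls outside the fences.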

Benchmark comparison of standard classifiers.
Benchmark comparison of standard algorithms on PF data. (A-E) Validation AUC for the task-based approach applied to the EEG of the three participants of the PF validation dataset using Haar-AdaBoost (blue), EEGNet (purple), CSP-LDA (green), SVM (yellow), and basic slope LDA (pink). The color scheme matches the one used in Figure 5.

Number of trials required for classification.
Performance of Haar-AdaBoost on EEG data from PF (averaged over the 3 participants) using the task-based approach as a function of the number of trials. (A) The time course of the validation AUC of 10 runs of Haar-AdaBoost with a different number of trials. Time 0 is the time of slide transition (dotted gray line). (B) The time course of the standard error of the validation AUC of 10 runs of Haar-AdaBoost with a different number of trials. Time 0 is the time of slide transition (dotted gray line). We iteratively ran our pipeline on random subsamples of our data from PF. We found that after 150 trials the AUC time course plateaued and did not improve much with more samples (panel A). Similarly, after about 300 trials the standard error converged and did not improve much further (panel B).

Decoding performance for frequency-based differences.
Performance, in the frequency domain, of Haar-AdaBoost using the task-based approach on EEG data from PF. (A) Time courses of the validation AUC of Haar-AdaBoost for three participants using raw data (purple) or data decomposed into different frequency bands (light green). Time 0 is the time of slide transition (dotted gray line). AUC is aligned to the leading edge of the sliding window. Shaded areas indicate the standard error of the AUC. (B) AUC of Haar-AdaBoost when classifying raw data from passive trials against data from passive trials that have been bandstop-filtered at a specific frequency (x-axis), aligned to the slide transition. Error bars represent the standard error of the AUC. To establish whether our Haar wavelet features gave Haar-AdaBoost the ability to pick up on frequency differences in specific bands, we performed two separate analyses. First, we decomposed the data from PF into 30 frequencies ranging from 2 to 80 Hz using a causal Butterworth filter and Hilbert transform. We then compared Haar-AdaBoost on the decomposed data with Haar-AdaBoost on the raw data alone and found that they performed on a par (panel A). Next, we took the passive trials from the PF data and made 30 copies, to each of which we applied a bandstop filter using one of 30 distinct bands from 2 to 80 Hz. We then ran the algorithm 30 times, each time classifying the unfiltered trials against those that had been bandstop-filtered. We found that Haar-AdaBoost could classify all of them with high AUCs even though the two classes differed each time only in a missing frequency range (panel B), indicating that Haar-AdaBoost can effectively pick up on subtle frequency differences.

Comparison of growing and sliding windows.
Average validation AUC of Haar-AdaBoost for the three participants of the PF EEG dataset using a sliding window (light blue) or growing window (pink) with the task-based approach. Time 0 is the time of slide transition (dotted gray line). The AUC is aligned to the leading edge of the window. Shaded area is the standard error of the mean. A growing window analysis is similar to the sliding window analysis except that at each iteration, instead of sliding the window, the window width is increased by 0.02 s. This means that for an epoch beginning 3 s before movement, the window of analysis for the leading edge aligned to the slide transition would be 3 s wide, from -3 s to 0 s (while for the sliding window method it would be 0.5 s wide, from -0.5 s to 0 s). While it enables the capture of more patterns, it is a computationally heavy analysis. To ensure that we were not missing substantial longer-term patterns with the sliding window, we ran a growing window analysis on our PF EEG data. We found no improvement from using a growing window over a sliding window.
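The difference between the two windowing schemes can be made concrete with a small sketch. The function and parameter names are illustrative assumptions; the epoch start (-3 s) and sliding window width (0.5 s) are the values quoted in the example above.

```python
def window_bounds(leading_edge, mode, epoch_start=-3.0, width=0.5):
    """Return (start, end) in seconds of the analysis window whose leading
    edge sits at `leading_edge`.
    - "sliding": fixed-width window that moves with the leading edge.
    - "growing": window always starts at the epoch start and widens."""
    if mode == "sliding":
        return (leading_edge - width, leading_edge)
    if mode == "growing":
        return (epoch_start, leading_edge)
    raise ValueError("mode must be 'sliding' or 'growing'")


# Windows for the leading edge aligned to slide transition (t = 0 s)
sliding = window_bounds(0.0, "sliding")
growing = window_bounds(0.0, "growing")
```

For a leading edge at 0 s this gives a 0.5 s sliding window from -0.5 s to 0 s and a 3 s growing window from -3 s to 0 s, which is why the growing window is the more computationally demanding of the two.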

Decoding performance at trial onset.
Average model performance using the task-based approach on the EEG data from 15 participants at OC in terms of validation AUC. The analysis is aligned to the start of the trial. The shaded red region represents the one-second period during which the instructions (“manual” or “automatic”) were displayed to the participants. The slide appeared at 1.25 s with respect to trial start (vertical gray line). To ensure that our algorithm would be able to detect cognitive differences between the two tasks prior to movement onset, we checked whether it could classify above chance with the epochs time-locked to trial start, while the instructions (manual or automatic) were displayed on screen. We aligned our epochs to the start of the trial and then ran Haar-AdaBoost to classify manual versus automatic. At the start of each trial, the instructions were shown on screen for 1 second, followed by a 0.25 s blank screen before the appearance of the slide. Here, for convenience, trial rejection was not performed manually; instead, any channel within a trial that had a range of more than 120 μV (max-min) was interpolated trial-by-trial. If more than 5 channels needed to be interpolated in a given trial, the trial was excluded from the analysis. While our decoding did not reach AUC values as high as it did right after slide transition, Haar-AdaBoost did successfully classify while the instructions were on screen. Participants’ individual classification AUC peaked at about 0.65 during the 1 s instruction window and at 0.63 over a period of 1.75 s following the onset of the slide (peaks at the individual level occurred on average 0.61 s after the onset of the slide), indicating the presence of a second, jittered decoding peak after the appearance of the slide (which is flattened in the average across participants; this post-slide-onset peak only reaches 0.55 in the group average).
It is worth noting that this early classification time-locked to the trial start is not reflected in the classification time-locked to the slide transition (Figure 3), for multiple reasons. Firstly, it is a relatively low AUC which fades out a little over a second after trial start. When combined with the jitter induced by the trial-to-trial variability of response times (Figure 2D), in those trials that overlap, the jitter might be enough to smear the AUC. Secondly, AUC onsets, whether using the MRCP onset methods (Figure 3A) or the AUC and single-trial methods (Figure 3B), are computed backwards from slide transition. They could therefore miss early, isolated temporal windows of above-chance classification. We chose this approach to prevent false positives from contaminating our earliest decoding time values, but also because we were specifically interested in the onset of a signal similar to the early signals found in the MRCP, that is, exhibiting an onset that is then sustained until the slide transition. Additionally, early decoding peaks followed by silent decoding periods have been reported in the literature. For instance, when predicting an upcoming decision outcome from fMRI data, the pre-SMA first reflects this outcome significantly 6 seconds before movement, then the significance goes away and picks back up 2 seconds after the movement (see Figure 2 in Soon et al., 2008). In memory research, activity-silent neurons have also been identified (Stokes, 2015; Trübutschek et al., 2019). All in all, it is very plausible that the activity encoding the task (active vs. passive) was simply not accessible to our classifier (i.e., not in the EEG), but this is entirely compatible with our claim that early MRCPs (which are accessible to the classifier) are not predictive of an early commitment to a decision.
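The automatic rejection rule used for this trial-onset analysis (interpolate any channel whose max-min range exceeds 120 μV; exclude the trial if more than 5 channels are affected) can be sketched as below. The data layout and function names are hypothetical, and the interpolation step itself is not shown; only the flagging and exclusion logic is illustrated.

```python
def channels_to_interpolate(trial, threshold_uv=120.0):
    """Names of channels whose peak-to-peak range (max - min) in this trial
    exceeds the threshold. `trial` maps channel name -> samples in microvolts.
    NOTE: hypothetical data layout, chosen for illustration."""
    return [ch for ch, samples in trial.items()
            if max(samples) - min(samples) > threshold_uv]


def screen_trial(trial, threshold_uv=120.0, max_interpolated=5):
    """Return (keep, bad_channels). The trial is kept only if at most
    `max_interpolated` channels need interpolation."""
    bad = channels_to_interpolate(trial, threshold_uv)
    return len(bad) <= max_interpolated, bad


# Made-up trial: two clean channels and one with a 150 µV excursion
trial = {
    "C3": [0.0, 10.0, -5.0],
    "C4": [2.0, -3.0, 4.0],
    "Fp1": [0.0, 150.0, 20.0],
}
keep, bad = screen_trial(trial)
```

Here only Fp1 is flagged, so the trial would be kept and that single channel interpolated for this trial alone.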

Comparison of total number of trials per session.
Summary of the results with participants split according to the number of trials they had per session. (A) Grand average movement-related cortical potential at C3 for non-outlier (see appendix figure S6) participants whose sessions contained 350 trials (blue; N=7) and those whose sessions contained 700 trials (red; N=5) aligned to slide transition (t=0; dotted gray line). The shaded area represents the standard error of the mean. The mean MRCP in the 0.5 s preceding slide transitions between the two types of participants was not significantly different (M=0.58 SD=1.56; t(10)=0.63, p=0.54). (B) Grand average time course of the validation AUC (10-fold) for non-outlier participants whose sessions contained 350 trials (light blue; N=7) and those whose sessions contained 700 trials (green; N=5). The x-axis represents the time of the leading edge of the sliding window aligned to slide transition (t=0; dotted gray line). The shaded area represents the standard error of the mean.
