Neuroscience

Neural precursors of decisions that matter—an ERP study of deliberate and arbitrary choice

  1. Uri Maoz (corresponding author)
  2. Gideon Yaffe
  3. Christof Koch
  4. Liad Mudrik
  1. Chapman University, United States
  2. University of California, Los Angeles, United States
  3. California Institute of Technology, United States
  4. Yale University, United States
  5. Allen Institute for Brain Science, United States
  6. Tel Aviv University, Israel
Research Communication
Cite this article as: eLife 2019;8:e39787 doi: 10.7554/eLife.39787

Abstract

The readiness potential (RP)—a key ERP correlate of upcoming action—is known to precede subjects' reports of their decision to move. Some view this as evidence against a causal role for consciousness in human decision-making and thus against free will. But previous work focused on arbitrary decisions—purposeless, unreasoned, and without consequences. It remains unknown to what degree the RP generalizes to deliberate, more ecological decisions. We directly compared deliberate and arbitrary decision-making during a $1000-donation task to non-profit organizations. While we found the expected RPs for arbitrary decisions, they were strikingly absent for deliberate ones. Our results and drift-diffusion model are congruent with the RP representing accumulation of noisy, random fluctuations that drive arbitrary—but not deliberate—decisions. They further point to different neural mechanisms underlying deliberate and arbitrary decisions, challenging the generalizability to real-life decisions of studies that argue for no causal role for consciousness in decision-making.

Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed (see decision letter).

https://doi.org/10.7554/eLife.39787.001

Introduction

Humans typically experience freely selecting between alternative courses of action, say, when ordering a particular item off a restaurant menu. Yet a series of human studies using electroencephalography (EEG) (Haggard and Eimer, 1999; Libet et al., 1983; Salvaris and Haggard, 2014), fMRI (Bode and Haynes, 2009; Bode et al., 2011; Soon et al., 2008; Soon et al., 2013), intracranial (Perez et al., 2015), and single-cell recordings (Fried et al., 2011) challenged the veridicality of this common experience. These studies found neural correlates of decision processes hundreds of milliseconds and even seconds prior to the moment that subjects reported having consciously decided.

The seminal research that launched this series of studies was conducted by Benjamin Libet and colleagues (Libet et al., 1983). There, the authors showed that the readiness potential (RP)—a ramp-up in EEG negativity before movement onset, thought to originate from the presupplementary motor area (pre-SMA)—began before subjects reported a conscious decision to act. Libet and colleagues took the RP to be a marker for an unconscious decision to act (Libet et al., 1983; Soon et al., 2008) that, once it begins, ballistically leads to action (Shibasaki and Hallett, 2006). Under that interpretation, the fact that RP onset precedes the report of the onset of the conscious decision to act was taken as evidence that decisions about actions are made unconsciously. And thus the subjective human experience of freely and consciously deciding to act is but an illusion (Harris, 2012; Libet et al., 1983; Wegner, 2002). This finding has been at the center of the free-will debate in neuroscience for almost four decades, captivating scholars from many disciplines in and outside of academia (Frith et al., 2000; Frith and Haggard, 2018; Haggard, 2008; Jeannerod, 2006; Lau et al., 2004; Mele, 2006; Wegner, 2002).

Critically, in the above studies, subjects were told to arbitrarily move their right hand or flex their right wrist; or they were instructed to arbitrarily move either the right or left hand (Haggard, 2008; Hallett, 2016; Roskies, 2010). Thus, their decisions when and which hand to move were always unreasoned, purposeless, and bereft of any real consequence. This stands in sharp contrast to many real-life decisions that are deliberate—that is, reasoned, purposeful, and bearing consequences (Ullmann-Margalit and Morgenbesser, 1977): which clothes to wear, what route to take to work, as well as more formative decisions about life partners, career choices, and so on.

Deliberate decisions have been widely studied in the field of neuroeconomics (Kable and Glimcher, 2009; Sanfey et al., 2006) and in perceptual tasks (Gold and Shadlen, 2007). Yet, interestingly, little has been done in that field to assess the relation between decision-related activity, subjects’ conscious experience of deciding, and the neural activity instantaneously contributing to this experience. Though some studies compared, for example, internally driven and externally cued decisions (Thut et al., 2000; Wisniewski et al., 2016), or stimulus-based and intention-based actions (Waszak et al., 2005), these were typically arbitrary decisions and actions with no real implications. Therefore, the results of these studies provide no direct evidence about potential differences between arbitrary and deliberate decisions.

Such direct comparisons are critical for the free will debate, because it is deliberate, rather than arbitrary, decisions that are at the center of philosophical arguments about free will and moral responsibility (Breitmeyer, 1985; Maoz and Yaffe, 2016; Roskies, 2010). Deliberate decisions typically involve more conscious and lengthy deliberation and might thus be more tightly bound to conscious processes than arbitrary ones. Consequently, if the RP is a marker for unconscious decisions, while deliberate decisions are driven more by conscious than by unconscious processes, then the RP might be substantially diminished, or even absent, for deliberate decisions.

Another reason that the RP might be completely absent during deliberate decisions has to do with a recent computational model (Schurger et al., 2012). This model claims that the RP—which has been deemed a preparatory signal with a causal link to the upcoming movement—actually reflects an artifact that results from a combination of (i) biased sampling stemming from the methodology of calculating this component and (ii) autocorrelation (or smoothness) in the EEG signal. The RP is calculated by aligning EEG activity (typically in electrode Cz) to movement onset, then segmenting a certain time duration around each movement onset (i.e., epoching), and finally averaging across all movements. Hence, we only look for an RP before movement onset, which results in biased sampling (as ‘movement absent’ is not probed). Put differently, we search for and generally find a ramp up in EEG negativity in Cz before movement onset. But we do not search for movement onset every time there is a ramp up in EEG negativity on Cz. What is more, as EEG is autocorrelated, ramps up or down are to be expected (unlike, say, for white-noise activity).
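The epoching-and-averaging procedure described above can be sketched in a few lines (a minimal illustration with assumed variable names, not the authors' analysis code):

```python
import numpy as np

def compute_rp(cz, onset_samples, sfreq, pre_s=2.0):
    """Average single-channel (e.g., Cz) EEG over epochs locked to movement onset.

    cz: 1-D array with the continuous recording from one electrode.
    onset_samples: sample indices of the detected movement onsets.
    pre_s: epoch length in seconds before each onset.
    """
    pre = int(pre_s * sfreq)
    # Epochs are taken only backward from movement onset -- the biased
    # sampling discussed above: 'movement absent' periods are never probed.
    epochs = [cz[t - pre:t] for t in onset_samples if t >= pre]
    return np.mean(epochs, axis=0)  # across-movement average = the RP estimate
```

Note that the averaging conditions on movement having occurred; nothing in the procedure asks how often similar ramps arise without a subsequent movement.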

Schurger and colleagues demonstrated that RPs can be modeled using a mechanistic, stochastic, autocorrelated, drift-diffusion process that integrates noise to a bound (or threshold; see Model section in Materials and methods for details). In the model, it is only the threshold crossing that reflects decision completion and directly leads to action. And thus the beginning of (what is in hindsight and on average) the ramp up toward the threshold is certainly not the completion of the decision that ballistically leads to the threshold crossing and hence to movement onset (Schurger et al., 2012). This interpretation of the RP thus takes the sting out of the Libet argument against free will, as the latter was based on interpreting the RP as reflecting an unconscious decision to act. Importantly for our purposes, within the framework of the model, this artificial accumulation of stochastic fluctuations toward a threshold is expected to occur for arbitrary decisions, but not for deliberate ones. Unlike arbitrary decisions, deliberate decisions are generally not driven by random fluctuations. Rather, it is the values of the decision alternatives that mainly drive the decision and ultimately lead to action. Therefore, if the RP indeed reflects the artificial accumulation of stochastic fluctuations, as the model suggests, a key prediction of the model is that no RP will be found for deliberate decisions (see more below).

Thus, demonstrating the absence of an RP in deliberate decisions challenges the interpretation of the RP as a general index of internal, unconscious decision-making; if this interpretation were correct, such a marker should have been found for all decision types. What is more, and importantly, it questions the generalizability of any studies focused on arbitrary decisions to everyday, ecological, deliberate decisions. In particular, it challenges RP-based claims relating to moral responsibility (Haggard, 2008; Libet, 1985; Roskies, 2010), as moral responsibility can be ascribed only to deliberate decisions.

Here, we tested this hypothesis and directly compared the neural precursors of deliberate and arbitrary decisions—and in particular the RP—on the same subjects, in an EEG experiment. Our experiment utilized a donation-preference paradigm, in which a pair of non-profit organizations (NPOs) were presented in each trial. In deliberate-decision trials, subjects chose to which NPO they would like to donate $1000. In arbitrary-decision trials, both NPOs received an equal donation of $500, irrespective of subjects’ key presses (Figure 1). In both conditions, subjects were instructed to report their decisions as soon as they made them, and their hands were placed on the response keys, to make sure they could do so as quickly as possible. Notably, while the visual inputs and motor outputs were identical between deliberate and arbitrary decisions, the decisions’ meaning for the subjects was radically different: in deliberate blocks, the decisions were meaningful and consequential—reminiscent of important, real-life decisions—while in arbitrary blocks, the decisions were meaningless and bereft of consequences—mimicking previous studies of volition.

Experimental paradigm.

The experiment included deliberate (red, left panel) and arbitrary (blue, right panel) blocks, each containing nine trials. In each trial, two causes—reflecting NPO names—were presented, and subjects were asked either to choose to which NPO they would like to donate (deliberate), or to simply press either right or left, as both NPOs would receive an equal donation (arbitrary). They were specifically instructed to respond as soon as they reached a decision, in both conditions. Within each block, some of the trials were easy decisions (lighter colors), where the subject’s preferences for the two NPOs substantially differed (based on a previous rating session), and some were hard decisions (darker colors), where the preferences were more similar; easy and hard trials were randomly intermixed within each block. To make sure subjects were paying attention to the NPO names, even in arbitrary trials, and to better equate the cognitive load between deliberate and arbitrary trials, memory tests (in light gray) were randomly introduced, in which subjects were asked to determine which of four NPO names had appeared in the immediately preceding trial. For a full list of NPOs and causes see Supplementary file 1.

https://doi.org/10.7554/eLife.39787.002

Results

Behavioral results

Subjects’ reaction times (RTs) were analyzed using a 2-way ANOVA along decision type (arbitrary/deliberate) and difficulty (easy/hard). This was carried out on log-transformed data (raw RTs violated the normality assumption; W = 0.94, p=0.001). As expected, subjects were substantially slower for deliberate (M = 2.33, SD = 0.51) than for arbitrary (M = 0.99, SD = 0.32) decisions (Figure 2, left; F(1,17)=114.87, p<0.0001 for the main effect of decision type). A main effect of decision difficulty was also found (F(1,17)=21.54, p<0.0005), with difficult decisions (M = 1.77, SD = 0.40) being slower than easy ones (M = 1.56, SD = 0.28). Importantly, subjects were significantly slower for hard (M = 2.52, SD = 0.62) vs. easy (M = 2.13, SD = 0.44) decisions in the deliberate case (t(17)=4.78, p=0.0002), yet not in the arbitrary case (M = 1.00, SD = 0.34; M = 0.98, SD = 0.32, for hard and easy arbitrary decisions, respectively; t(17)=1.01, p=0.33; F(1,17)=20.85, p<0.0005 for the interaction between decision type and decision difficulty). This validates our experimental manipulation and further demonstrates that, in deliberate decisions, subjects were making meaningful decisions, affected by the difference in the values of the two NPOs, while for arbitrary decisions they were not. What is more, the roughly equal RTs for easy and hard arbitrary decisions provide evidence against the concern that subjects were deliberating during arbitrary decisions.

Behavioral results.

Reaction Times (RTs; left) and Consistency Grades (CG; right) in arbitrary (blue) and deliberate (red) decisions. Each dot represents the average RT/CG for easy and hard decisions for an individual subject (hard decisions: x-coordinate; easy decisions: y-coordinate). Group means and SEs are represented by dark red and dark blue crosses. The red and blue histograms at the bottom-left corner of each plot sum the number of red and blue dots with respect to the solid diagonal line. The dashed diagonal line represents equal RT/CG for easy and hard decisions; data points below that diagonal indicate longer RTs or higher CGs for hard decisions. In both measures, arbitrary decisions are more centered around the diagonal than deliberate decisions, showing no or substantially reduced differences between easy and hard decisions.

https://doi.org/10.7554/eLife.39787.003

The consistency between subjects’ choices throughout the main experiment and the NPO ratings they gave prior to the main experimental session was also analyzed using a 2-way ANOVA (see Materials and methods). As expected, subjects were highly consistent with their own, previous ratings when making deliberate decisions (M = 0.91, SD = 0.04), but not when making arbitrary ones (M = 0.52, SD = 0.04; Figure 2, right; F(1,17)=946.55, p<0.0001, BF = 2.32 × 10^29 for the main effect of decision type). A main effect of decision difficulty was also found (F(1,17)=57.39, p<0.0001, though BF = 1.57), with hard decisions evoking less consistent scores (M = 0.66, SD = 0.05) than easy ones (M = 0.76, SD = 0.03). Again, decision type and decision difficulty interacted (F(1,17)=25.96, p<0.0001, BF = 477.47): subjects were much more consistent with their choices in easy (M = 0.99, SD = 0.02) vs. hard (M = 0.83, SD = 0.64) deliberate decisions (t(17)=11.15, p<0.0001, BF = 3.68 × 10^6), than they were in easy (M = 0.54, SD = 0.07) vs. hard (M = 0.49, SD = 0.05) arbitrary decisions (t(17)=2.50, p=0.023, BF = 2.69). Nevertheless, though subjects were around chance (i.e., 0.5) in their consistency in arbitrary decisions (ranging between 0.39 and 0.64), it seems that some subjects were slightly influenced by their preferences in easy-arbitrary trials, resulting in the significant difference between hard-arbitrary and easy-arbitrary decisions above, though the Bayes factor was inconclusive. Finally, no differences were found in subjects’ tendency to press the right vs. left key across the different conditions (both main effects and interaction: F < 1).

EEG results: Readiness Potential (RP)

The RP is generally held to index unconscious readiness for upcoming movement (Haggard, 2008; Kornhuber and Deecke, 1990; Libet et al., 1983; Shibasaki and Hallett, 2006); although more recently, alternative interpretations of the RP have been suggested (Miller et al., 2011; Schmidt et al., 2016; Schurger et al., 2012; Trevena and Miller, 2010; Verleger et al., 2016). It has nevertheless been the standard component studied in EEG versions of the Libet paradigm (Haggard, 2008; Haggard and Eimer, 1999; Hallett, 2007; Libet, 1985; Libet et al., 1983; Libet et al., 1982; Miller et al., 2011; Schurger et al., 2012; Shibasaki and Hallett, 2006; Trevena and Miller, 2010). As is common, we measured the RP over electrode Cz in the different conditions by averaging the activity across trials in the 2 s prior to subjects’ movement.

Focusing on the last 500 ms before movement onset for our statistical tests, we found a clear RP in arbitrary decisions, yet RP amplitude was not significantly different from 0 in deliberate decisions (Figure 3A; F(1,17)=11.86, p=0.003, BF = 309.21 for the main effect of decision type; in t-tests against 0 for this averaged activity in the different conditions, corrected for multiple comparisons, an effect was only found for arbitrary decisions (hard: t(17)=5.09, p=0.0001, BF = 307.38; easy: t(17)=5.75, p<0.0001, BF = 1015.84) and not for deliberate ones (hard: t(17)=1.24, p>0.5, BF = 0.47; easy: t(17)=1.84, p=0.34, BF = 0.97)). For deliberate decisions, the Bayes factors—while trending in the right direction—indicated inconclusive evidence. Our original baseline was stimulus locked (see Materials and methods), and we hypothesized that the inconclusive Bayes factors for deliberate trials reflected a constant, slow, negative drift that our model predicted for deliberate trials (see below) rather than a typical RP. As the RTs for deliberate trials were longer than for arbitrary ones, this drift might have become more pronounced in those trials. To test this, we switched the baseline period to −1000 ms to −500 ms relative to movement onset (i.e., a baseline that immediately preceded our time window of interest). Under this analysis, we found moderate evidence that deliberate decisions (pooled across decision difficulty) are not different from 0 (BF = 0.332), supporting the claim that the RP during the last 500 ms before response onset was completely absent (the BF for similarly pooled arbitrary decisions was 5.07 × 10^4).

The readiness potentials (RPs) for deliberate and arbitrary decisions.

(A) Mean and SE of the readiness potential (RP; across subjects) in deliberate (red shades) and arbitrary (blue shades) easy and hard decisions in electrode Cz, as well as scalp distributions. Zero refers to time of right/left movement, or response, made by the subject. Notably, the RP significantly differs from zero and displays a typical scalp distribution for arbitrary decisions only. Similarly, temporal clusters where activity was significantly different from 0 were found for arbitrary decisions only (horizontal blue lines above the x axis). Scalp distributions depict the average activity between −0.5 and 0 s, across subjects. The inset bar plots show the mean amplitude of the RP, with 95% confidence intervals, over the same time window. Response-locked potentials with an expanded timecourse, and stimulus-locked potentials are given in Figure 6B and A, respectively. The same (response-locked) potentials as here, but with a movement-locked baseline of −1 to −0.5 s (same as in our Bayesian analysis), are given in Figure 6C. (B) Individual subjects’ Cz activity in the four conditions (n = 18). The linear-regression line for voltage against time over the last 1000 ms before response onset is designated by a dashed, black line. The lines have slopes significantly different from 0 for arbitrary decisions only. Note that the waveforms converge to an RP only in arbitrary decisions.

https://doi.org/10.7554/eLife.39787.004

In an effort to further test for continuous time regions where the RP is different from 0 for deliberate and arbitrary trials, we ran a cluster-based nonparametric permutation analysis (Maris and Oostenveld, 2007) for all four conditions against 0. Using the default parameters (see Materials and methods), we found a prolonged cluster (~1.2 s) of activation that reliably differed from 0 in both arbitrary conditions (designated by horizontal blue-shaded lines above the x axis in Figure 3A). The same analysis revealed no clusters of activity differing from zero in either of the deliberate conditions.

In a similar manner, regressing voltage against time for the last 1000 ms before response onset, the downward trend was significant for arbitrary decisions (Figure 3B; p<0.0001, BF > 10^25 for both easy and hard conditions) but not for deliberate decisions, with the Bayes factor indicating conclusive evidence for no effect (hard: p>0.5, BF = 0.09; easy: p=0.35, BF = 0.31; all Bonferroni corrected for multiple comparisons). Notably, this pattern of results also held at the single-subject level (Figure 4): 14 of the 18 subjects had significant downward slopes for arbitrary decisions—that is, p<0.05, Bonferroni corrected for multiple comparisons—when regressing voltage against time for every trial over the last 1000 ms before response onset, but only 5 of the 18 subjects had significant downward slopes for the same regression analysis for deliberate decisions (see Materials and methods). In addition, the average slopes for deliberate and arbitrary decisions were −0.28 ± 0.25 and −1.9 ± 0.32 (mean ± SE), respectively, a significant difference (t(17)=4.55, p<0.001, BF = 380.02). The regression analysis complements the averaged-amplitude analysis above and further demonstrates that the choice of baseline cannot explain our results, because the slopes of linear regressions are, by construction, independent of baseline.
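The baseline independence of this measure is easy to verify: an ordinary least-squares slope is unchanged when a constant is added to every sample. A minimal sketch (assumed variable names):

```python
import numpy as np

def rp_slope(voltages, sfreq):
    """Least-squares slope (µV/s) of voltage against time for one epoch.

    Adding any constant to `voltages` (i.e., changing the baseline
    correction) shifts only the intercept, never the slope.
    """
    t = np.arange(len(voltages)) / sfreq  # time axis in seconds
    slope, _intercept = np.polyfit(t, voltages, 1)
    return slope
```

Applying this per trial (or per subject) and testing the slopes against 0 reproduces the logic of the analysis above without reference to any baseline window.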

Individual-subjects RPs.

Six examples of individual subjects’ RPs for deliberate decisions (in red) and arbitrary ones (in blue), pooled across decision difficulty.

https://doi.org/10.7554/eLife.39787.005

Control analyses

We further tested whether differences in reaction time between the conditions, eye movements, filtering, and subjects’ consistency scores might explain our effect. We also tested whether the RPs might reflect some stimulus-locked potentials or be due to baseline considerations.

Differences in reaction times (RT) between conditions, including stimulus-locked potentials and baselines, do not drive the effect

RTs in deliberate decisions were typically more than twice as long as RTs in arbitrary decisions. We therefore wanted to rule out the possibility that the absence of an RP in deliberate decisions stemmed from the difference in RTs between the conditions. We carried out six analyses for this purpose. First, we ran a median-split analysis—dividing the subjects into two groups based on their RTs: lower (faster) and higher (slower) than the median, for deliberate and arbitrary trials, respectively. We then ran the same analysis using only the faster subjects in the deliberate condition (M = 1.91 s, SD = 0.25) and the slower subjects in the arbitrary condition (M = 1.25 s, SD = 0.23). If RT length affects RP amplitudes, we would expect the RP amplitudes to be more similar between these two groups. However, though there were only half the data points, a similar pattern of results to those over the whole dataset was observed (Figure 5A; compare to Figure 3A). Deliberate and arbitrary decisions were still reliably different (F(1,17)=5.22, p=0.03), with significant RPs found in arbitrary (easy: t(8)=4.57, p=0.0018; hard: t(8)=4.09, p=0.0035), but not deliberate (easy: t(8)=1.92, p=0.09; hard: t(8)=0.63, p=0.54) decisions. In addition, the RPs for arbitrary decisions were not significantly different between the subjects with above-median RTs and the entire population for the easy or hard conditions (easy: t(25)=0.14, p>0.5; hard: t(25)=0.56, p>0.5). Similarly, the RPs for deliberate decisions were not significantly different between the subjects with below-median RTs and the entire population for the easy or hard conditions (easy: t(25)=-0.34, p>0.5; hard: t(25)=0.17, p>0.5). This suggests that RTs do not reliably affect Cz activation for deliberate or arbitrary decisions in our results.

Relations between RTs and RPs between subjects (A and B) and within subjects (C and D).

(A) The subjects with above-median RTs for arbitrary decisions (in blue) and below-median RTs for deliberate decisions (in red), show the same activity pattern that was found in the main analysis (compare Figure 3A). (B) A regression of the difference between the RPs versus the difference between the RTs for deliberate and arbitrary decisions for each subject. The equation of the regression line (solid red) is y = 0.54 [CI −0.8, 1.89] x - 0.95 [CI −2.75, 0.85] (confidence intervals: dashed red lines). The R2 is 0.05. One subject, #7, had an RT difference between deliberate and arbitrary decisions that was more than six interquartile ranges (IQRs) away from the median difference across all subjects. That same subject’s RT difference was also more than 5 IQRs higher than the 75th percentile across all subjects. That subject was therefore designated an outlier and removed only from this regression analysis. (C) For each subject separately, we computed the RP using only the faster (below-median RT) deliberate trials and slower (above-median RT) arbitrary trials. The pattern is again the same as the one found for the main analysis. (D) We computed the same regression between the RP differences and the RT differences as in B, but this time the median split was within subjects. The equation of the regression line is y = 1.27 [CI −0.2, 2.73] x - 0.95 [CI 0.14, 1.76]. The R2 is 0.18.

https://doi.org/10.7554/eLife.39787.006

Second, we regressed the difference between RPs in deliberate and arbitrary decisions (averaged over the last 500 ms before response onset) against the difference between the RTs in these two conditions for each subject (Figure 5B). Again, if RT length affects RP amplitudes, we would expect differences between RTs in deliberate and arbitrary conditions to correlate with differences between RPs in the two conditions. But no significant correlation was found between the two measures (r = 0.28, t(16)=0.86, p=0.4). Regressing the RP differences on the RT differences likewise produced no reliable relation (regression line: y = 0.54 [CI −0.8, 1.89] x - 0.95 [CI −2.75, 0.85]; the R2 was very low, at 0.05 (as expected from the r value above), and, as the confidence intervals suggest, the slope was not significantly different from 0; F(1,16)=0.74, p=0.4).

While the results of the above analyses suggested that our effects do not stem from differences between the RTs in deliberate and arbitrary decisions, the average RTs for fast deliberate subjects were still 660 ms slower than for slow arbitrary subjects. In addition, we had only half of the subjects in each condition due to the median split, raising the concern that some of our null results might have been underpowered. We also wanted to look at the effect of cross-trial variations within subjects and not just cross-subject ones. We therefore ran a third, within-subjects analysis. We combined the two decision difficulties (easy and hard) for each decision type (arbitrary and deliberate) for greater statistical power, and then took the faster (below-median RT) deliberate trials and slower (above-median RT) arbitrary trials for each subject separately. So, this time we had 17 subjects (again, one was removed) and better-powered results. Here, fast deliberate trials (M = 1.63 s, SD = 0.25) were just 230 ms slower than slow arbitrary trials (M = 1.40 s, SD = 0.45), on average, cutting the difference between fast deliberate and slow arbitrary by about two thirds relative to the between-subjects analysis. We then computed the RPs for just these fast deliberate and slow arbitrary trials within each subject (Figure 5C). Visually, the pattern there is the same as in the main analysis (Figure 3A). What is more, deliberate and arbitrary decisions remained reliably different (t(16)=3.36, p=0.004). Arbitrary trials were again different from 0 (t(16)=-4.40, p=0.0005), while deliberate trials were not (t(16)=-1.54, p=0.14).
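The within-subject trial selection described here can be sketched as a small helper (illustrative, with assumed array shapes):

```python
import numpy as np

def median_split_rp(epochs, rts, keep="fast"):
    """Average response-locked epochs over the fast (below-median RT)
    or slow (above-median RT) half of a subject's trials.

    epochs: (n_trials, n_timepoints) array of Cz epochs.
    rts: (n_trials,) reaction times, one per epoch.
    """
    med = np.median(rts)
    mask = rts < med if keep == "fast" else rts > med
    return epochs[mask].mean(axis=0)
```

Computing `median_split_rp(deliberate_epochs, deliberate_rts, "fast")` and `median_split_rp(arbitrary_epochs, arbitrary_rts, "slow")` per subject yields the RT-matched RPs compared in Figure 5C (variable names here are assumptions for illustration).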

We further regressed the within-subject differences between RPs in fast deliberate and slow arbitrary decisions (defined as above) against the differences between the corresponding RTs for each subject to ascertain that such a correlation would not exist for trials that are closer together. We again found no reliable relation between the two differences (Figure 5D; regression line: y = 1.27 [CI −0.2, 2.73] x - 0.95 [CI 0.14, 1.76]; R2 = 0.18).

Yet another concern that could relate to the RT differences among the conditions is that the RP in arbitrary blocks might actually be some potential evoked by the stimuli (i.e., the presentations of the two causes), specifically in arbitrary blocks, where the RTs are shorter (and thus stimulus-evoked effects could still affect the decision). In particular, a stimulus-evoked potential might just happen to bear some similarity to the RP when plotted locked to response onset. To test this explanation, we ran a fifth analysis, plotting the potentials in all conditions locked to the onset of the stimulus (Figure 6A). We also plotted the response-locked potentials across an expanded timecourse for comparison (Figure 6B). If the RP-like shape we see in Figures 3A and 6B were due to a stimulus-locked potential, we would expect to see consistent potentials in the stimulus-locked plot (Figure 6A) preceding the four mean response onset times (indicated by vertical lines at 0.98, 1.00, 2.13, and 2.52 s for arbitrary easy, arbitrary hard, deliberate easy, and deliberate hard, respectively), of a similar shape and magnitude to the RPs found in the decision-locked analysis in the arbitrary condition (though potentially more smeared for stimulus locking). We thus calculated a stimulus-locked version of our ERPs, using the same baseline (Figure 6A). As the comparison between Figure 6A and B clearly shows, no such consistent potentials were found before the four response times, nor were these potentials similar to the RP in either shape or magnitude (their magnitudes are at most around 1 µV, while the RP magnitudes we found are around 2.5 µV; Figures 3A and 6B). This analysis thus suggests that it is unlikely that a stimulus-locked potential drives the RP we found.

Stimulus- and response-locked Cz-electrode ERPs with different baselines and overlaid events.

(A) Stimulus-locked waveforms including the trial onset range, baseline period, and mean reaction times for all four experimental conditions. (B) Response-locked waveforms with mean stimulus onsets for all four conditions as well as the offset of the highlighting of the selected cause and the start of the next trial. (C) Same potentials and timeline as Figure 3A, but with a response-locked baseline of −1 to −0.5 s—the same baseline used for our Bayesian analysis.

https://doi.org/10.7554/eLife.39787.007

Notably, the stimulus-locked alignment did show that the arbitrary easy condition evoked stronger activity in roughly the last 0.5 s before stimulus onset. However, this prestimulus activity cannot explain the response-locked RP, as it was found only in arbitrary easy trials and not in arbitrary hard trials, while the response-locked RP did not differ between these conditions. What is more, easy and hard trials were randomly interspersed within deliberate and arbitrary blocks, and subjects discovered the trial difficulty only at stimulus onset. Thus, there could not have been differential preparatory activity that varies with decision difficulty. This divergence in one condition only is accordingly not likely to reflect any preparatory RP activity.

One more concern is that the differences in RTs may affect the results in the following manner: because the main baseline period we used thus far was 1 to 0.5 s before stimulus onset, the duration from the baseline to the decision varied widely between the conditions. To make sure this difference in temporal distance between the baseline period and the response to which the ERPs were locked did not drive our results, we recalculated the potentials for all conditions with a response-locked baseline of −1 to −0.5 s (Figure 6C; the same baseline we used for the Bayesian analysis above). The rationale behind this choice of baseline was to have the time that elapsed from baseline to response onset be the same across all conditions. As is evident in Figure 6C, the results for this new baseline were very similar to those for the stimulus-locked baseline we used before. Focusing again on the −0.5 to 0 s range before response onset for our statistical tests, we found a clear RP in arbitrary decisions, yet RP amplitude was not significantly different from 0 in deliberate decisions (Figure 6C; ANOVA F(1,17)=12.09, p=0.003 for the main effect of decision type; in t-tests against 0, corrected for multiple comparisons, an effect was only found for arbitrary decisions (hard: t(17)=4.13, p=0.0007; easy: t(17)=4.72, p=0.0002) and not for deliberate ones (hard: t(17)=0.38, p>0.5; easy: t(17)=1.13, p=0.27)). This supports the notion that the choice of baseline does not strongly affect our main results. Taken together, the results of the six analyses above provide strong evidence against the claim that the differences in RPs stem from or are affected by the differences in RTs between the conditions.
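Switching from a stimulus-locked to a response-locked baseline amounts to subtracting the mean of a fixed pre-response window from each epoch; a minimal sketch (assumed variable names):

```python
import numpy as np

def rebaseline(epoch, sfreq, win=(-1.0, -0.5)):
    """Re-baseline a response-locked epoch to the mean of a pre-response window.

    epoch: 1-D trace whose last sample falls at response onset (time 0).
    win: baseline window in seconds relative to response onset
         (negative values = before the response).
    """
    n = len(epoch)
    i0 = n + int(win[0] * sfreq)  # e.g., 1.0 s before the response
    i1 = n + int(win[1] * sfreq)  # e.g., 0.5 s before the response
    return epoch - epoch[i0:i1].mean()
```

Because the window is defined relative to response onset, the time elapsed from baseline to response is identical across conditions regardless of RT, which is the rationale given above.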

Eye movements do not affect the results

Though ICA was used to remove blink artifacts and saccades (see Materials and methods), we wanted to make sure our results did not stem from differential eye-movement patterns between the conditions. We therefore computed a saccade-count metric (SC; see Materials and methods) for each trial for all subjects. Focusing again on the last 500 ms before response onset, we computed mean (± s.e.m.) SC values of 1.65 ± 0.07 and 1.67 ± 0.06 saccades for easy and hard deliberate decisions, respectively, and 1.69 ± 0.07 and 1.73 ± 0.07 saccades for easy and hard arbitrary decisions, respectively. We found no reliable difference in the number of saccades between deliberate and arbitrary trials (F(1,17)=2.56, p=0.13 for the main effect of decision type).

We further investigated potential effects of saccades by running a median-split analysis—dividing the trials for each subject into two groups based on their SC score: lower and higher than the median, for deliberate and arbitrary trials, respectively. We then ran the same analysis using only the trials with more saccades in the deliberate condition (SC was 2.02 ± 0.07 and 2.04 ± 0.07 for easy and hard, respectively) and those with fewer saccades in the arbitrary condition (SC was 1.33 ± 0.07 and 1.31 ± 0.08 for easy and hard, respectively). If the number of saccades affected RP amplitudes, we would expect the differences in RPs between arbitrary and deliberate trials to diminish, or even reverse (as we now had more saccades in the deliberate condition). However, though there were only half the data points for each subject in each condition, a pattern of results similar to that over the whole dataset was observed: deliberate and arbitrary decisions were still reliably different within the median-split RPs (F(1,17)=16.70, p<0.001), with significant RPs found in arbitrary (easy: t(17)=4.79, p=0.002; hard: t(17)=5.77, p<0.001), but not deliberate (easy: t(17)=0.90, p=0.38; hard: t(17)=0.30, p>0.5) decisions. In addition, we compared the RP data across all the trials with the median-split RP data above. No significant differences were found for arbitrary decisions (easy: t(17)=1.02, p=0.32; hard: t(17)=0.75, p=0.46) or for deliberate decisions (easy: t(17)=1.63, p=0.12; hard: t(17)=1.47, p=0.16). Taken together, the analyses above provide strong evidence against the involvement of eye movements in our results.
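The trial-selection logic of this median split can be sketched as follows; the per-trial saccade counts here are synthetic, and the variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-trial saccade counts (SC) for one subject; the real
# SC metric is defined in the Materials and methods
sc_deliberate = rng.poisson(1.7, size=100)
sc_arbitrary = rng.poisson(1.7, size=100)

# Median split: keep high-SC deliberate trials and low-SC arbitrary ones,
# so any saccade-driven RP difference should shrink or reverse
# (trials exactly at the median are dropped for simplicity)
high_delib = np.flatnonzero(sc_deliberate > np.median(sc_deliberate))
low_arb = np.flatnonzero(sc_arbitrary < np.median(sc_arbitrary))
```

The RP analysis is then rerun on the epochs indexed by `high_delib` and `low_arb` only.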

Testing alternative explanations

We took a closer look at subjects’ behavior in the easy arbitrary condition, where some subjects had a consistency score that was further above 0.5 (chance) than others. It seems that those subjects had greater difficulty ignoring their preferences, despite the instructions to do so. We therefore wanted to test to what extent the RP of those subjects was similar to the RPs of the other subjects. Focusing on the eight subjects who had a consistency score above 0.55 (M = 0.59, SD = 0.03) and comparing their RPs to those of the 10 other subjects (consistency M = 0.50, SD = 0.06) in easy arbitrary trials, we found no reliable differences (t(16)=0.94, p=0.36). This is not surprising, as the mean consistency score of these subjects—though higher than chance—was still far below their consistency score for easy deliberate decisions (M = 0.99, SD = 0.02).

High-pass filter cutoff frequency does not affect the results

Finally, another alternative explanation for the absence of an RP in deliberate decisions might rely on our selection of the high-pass filter cutoff frequency, which was 0.1 Hz. Though this frequency was used in some studies of the RP (e.g., Lew et al., 2012; MacKinnon et al., 2013), others opted for lower cutoff frequencies (e.g., Haggard and Eimer, 1999). Arguably, a higher cutoff frequency for the high-pass filter might reduce the chances of finding the RP, which is a low-frequency component. And this might have affected the RP for deliberate decisions more than the RP for arbitrary ones, given the slower RTs in the former. To examine this possible confound, we reanalyzed the data using a 0.01 Hz high-pass filter. This reduced the number of usable trials for each subject, as it allowed lower-frequency trends to remain in the data. Given that our focus was on arbitrary vs. deliberate decisions (with decision difficulty serving mostly to validate the manipulation), we collapsed the trials across decision difficulty and only tested RP amplitudes in arbitrary vs. deliberate decisions against each other and against zero. In line with our original results, a difference was found between RP amplitudes in the two conditions (t(13)=2.29, p=0.039), with the RP in the arbitrary condition differing from zero (t(13)=-5.71, p<0.0001), as opposed to the deliberate condition, where it did not (t(13)=-0.76, p=0.462). This provides evidence against the claim that our results are due to our choice of high-pass filter.
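The effect of the cutoff choice can be illustrated with a generic zero-phase Butterworth high-pass filter. This is a sketch only: the study's actual filter implementation and sampling rate are not specified here, so both are assumptions:

```python
import numpy as np
from scipy import signal

fs = 512  # assumed sampling rate, for illustration only

def highpass(data, cutoff_hz, fs):
    """Zero-phase second-order Butterworth high-pass filter."""
    sos = signal.butter(2, cutoff_hz, btype="highpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, data)

# A slow 0.05 Hz component (like a slow RP buildup) survives a 0.01 Hz
# cutoff far better than a 0.1 Hz cutoff
t = np.arange(0, 60, 1 / fs)
slow_wave = np.sin(2 * np.pi * 0.05 * t)
out_0p1 = highpass(slow_wave, 0.1, fs)
out_0p01 = highpass(slow_wave, 0.01, fs)
```

The lower the cutoff relative to the frequency content of the RP, the less the component is attenuated, which is why the 0.01 Hz reanalysis is a meaningful control.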

EEG results: Lateralized Readiness Potential (LRP)

The LRP, which reflects activation processes within the motor cortex for action preparation after action selection (Eimer, 1998; Masaki et al., 2004), was measured by subtracting the difference potentials (C3-C4) in right-hand response trials from this difference in left-hand response trials and averaging the activity over the same time window (Eimer, 1998; Haggard and Eimer, 1999). In this purely motor component, no difference was found between the two decision types, and Bayesian analysis further provided conclusive evidence against an effect of decision type (Figure 7; all Fs < 0.35; BF = 0.299). Our analysis of the EOG channels suggests that some of that LRP might be driven by eye movements (we repeated the LRP computation on the EOG channels instead of C3 and C4). However, the shape of the eye-movement-induced LRP is very different from the LRP we calculated from C3 and C4. Also, the differences that we found between conditions in the EOG LRP are not reflected in the C3/C4 LRP. So, while our LRP might be boosted by eye movements, it is not strictly driven by them.
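The LRP computation described here reduces to a double subtraction over trial-averaged epochs. A sketch with synthetic data and hypothetical array names:

```python
import numpy as np

rng = np.random.default_rng(2)
n_trials, n_samples = 50, 300

# Hypothetical response-locked epochs from electrodes C3 and C4,
# split by the hand used to respond
c3_left, c4_left = rng.normal(size=(2, n_trials, n_samples))
c3_right, c4_right = rng.normal(size=(2, n_trials, n_samples))

# LRP: the (C3 - C4) difference in left-hand trials minus the same
# difference in right-hand trials, averaged over trials
lrp = ((c3_left - c4_left).mean(axis=0)
       - (c3_right - c4_right).mean(axis=0))
```

The same computation run on the two EOG channels in place of C3 and C4 gives the eye-movement control analysis mentioned above.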

Lateralized readiness potential (LRP).

The LRP for deliberate and arbitrary, easy and hard decisions. No difference was found between the conditions (ANOVA, all Fs < 0.35). Temporal clusters where the activity for each condition was independently found to be significantly different from 0 are designated by thick horizontal lines at the bottom of the figure (with their colors matching the legend).

https://doi.org/10.7554/eLife.39787.008

Modeling

The main finding of this study—the absent (or at least strongly diminished) RP in deliberate decisions, suggesting different neural underpinnings of arbitrary and deliberate decisions—is in line with a recent study using a drift-diffusion model (DDM) to investigate the RP (Schurger et al., 2012). There, the RP was modeled as an accumulation of white noise (which yields an autocorrelated signal) up to a hard threshold. When activity crosses that threshold, it designates decision completion leading to movement. The model focuses on the activity leading up to the threshold crossing, when that activity is time-locked to the onset of the threshold crossing (corresponding to movement-locked epochs in EEG). Averaged across many threshold crossings, this autocorrelated activity resembles an RP (Schurger et al., 2012). Hence, according to this model, the exact time of the threshold crossing leading to response onset is largely determined by spontaneous, subthreshold, stochastic fluctuations of the neural activity. This interpretation of the RP challenges its traditional understanding as stemming from specific, unconscious preparation for, or ballistic-like initiation of, movement (Shibasaki and Hallett, 2006). Instead, Schurger and colleagues claimed, the RP is not a cognitive component of motor preparation; it is an artifact of accumulating autocorrelated noise to a hard threshold and then looking at signals only around threshold crossings.
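The core of that argument can be reproduced in a few lines: accumulate autocorrelated noise to a threshold, epoch the signal backwards from each threshold crossing, and average. The parameter values below are illustrative, not the fitted ones from Schurger et al. (2012):

```python
import numpy as np

rng = np.random.default_rng(3)

# Leaky stochastic accumulator: dx = (I - k*x)*dt + c*sqrt(dt)*noise
I, k, c, thresh, dt = 0.1, 0.5, 0.1, 0.3, 0.001
n_trials, max_steps, epoch_len = 200, 8000, 1000

epochs = []
for _ in range(n_trials):
    noise = c * np.sqrt(dt) * rng.normal(size=max_steps)
    x = np.zeros(max_steps)
    for t in range(1, max_steps):
        x[t] = x[t - 1] + (I - k * x[t - 1]) * dt + noise[t]
        if x[t] >= thresh:
            break
    if x[t] >= thresh and t >= epoch_len:
        # keep only the samples leading up to the threshold crossing
        epochs.append(x[t - epoch_len:t + 1])

# Averaging crossing-locked epochs yields an RP-like ramp, even though
# no single trial contains a deterministic "preparation" signal
rp_like = np.asarray(epochs).mean(axis=0)
```

The ramp appears purely because averaging is conditioned on the crossing: every epoch ends at threshold but starts at some subthreshold value, so the mean necessarily rises toward the crossing.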

We wanted to investigate whether our results could be accommodated within the general framework of the Schurger model, though with the deliberate and arbitrary decisions mediated by two different mechanisms. The first mechanism is involved in value assessment and drives deliberate decisions. It may be subserved by brain regions like the Ventromedial Prefrontal Cortex; VMPFC, (Ramnani and Owen, 2004; Wallis, 2007). But, for the sake of the model, we will remain agnostic about the exact location associated with deliberate decisions and refer to this region as Region X. A second mechanism, possibly at the (pre-)SMA, was held to generate arbitrary decisions driven by random, noise fluctuations.

Accordingly, we expanded the model developed by Schurger et al. (2012) in two ways. First, we defined two DDM processes—one devoted to value-assessment (in Region X) and the other to noise-generation (in SMA; see Figure 8A and Materials and methods). Both of them were run during both decision types, yet the former determined the result of deliberate trials, and the latter determined the results of arbitrary trials. Second, Schurger and colleagues modeled only when subjects would move and not what (which hand) subjects would move. We wanted to account for the fact that, in our experiment, subjects not only decided when to move, but also what to move (either to indicate which NPO they prefer in the deliberate condition, or to generate a meaningless right/left movement in the arbitrary condition). We modeled this by defining two types of movement. One was moving the hand corresponding to the location of the NPO that was rated higher in the first, rating part of the experiment (the congruent option; see Materials and methods). The other was moving the hand corresponding to the location of the lower-rated NPO (the incongruent option). We used the race-to-threshold framework to model the decision process between this pair of leaky, stochastic accumulators, or DDMs. One DDM simulated the process that leads to selecting the congruent option, and the other simulated the process that leads to selecting the incongruent option (see again Figure 8A). (We preferred the race-to-threshold model over a classic DDM with two opposing thresholds because we think it is biologically more plausible [de Lafuente et al., 2015] and because it is easier to see how a ramp-up-like RP might be generated from such a model without requiring a vertical flip of the activity accumulating toward one of the thresholds in each model run.)
Hence, in each model run, the two DDMs in Region X and the two in the SMA ran in parallel; the first one to cross the threshold (only in Region X for deliberate decisions and only in the SMA for arbitrary ones) determined decision completion and outcome. Thus, if the DDM corresponding to the congruent (incongruent) option reached the threshold first, the trial ended with selecting the congruent (incongruent) option. For deliberate decisions, the congruent cause had a higher value than the incongruent cause and, accordingly, the DDM associated with the congruent option had a higher drift rate than that of the DDM associated with the incongruent option. For arbitrary decisions, the values of the decision alternatives mattered little and this was reflected in the small differences among the drift rates and in other model parameters (Table 1).
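A minimal sketch of such a race, using the easy-condition parameter values from Table 1; this is a reimplementation of the idea rather than the authors' fitted code, and the time step and trial counts are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)

def race_trial(i_con, i_inc, k, c, thresh=0.3, dt=0.001, max_steps=20000):
    """One race between congruent and incongruent leaky accumulators."""
    x = np.zeros(2)
    drift = np.array([i_con, i_inc])
    for step in range(1, max_steps + 1):
        x += (drift - k * x) * dt + c * np.sqrt(dt) * rng.normal(size=2)
        if (x >= thresh).any():
            choice = ("congruent", "incongruent")[int(np.argmax(x))]
            return choice, step * dt  # choice and RT in seconds
    return None, None

# Table 1, easy decisions: deliberate (Region X) vs. arbitrary (SMA)
delib = [race_trial(0.23, 0.06, 0.52, 0.08) for _ in range(200)]
arb = [race_trial(0.24, 0.21, 0.53, 0.22) for _ in range(200)]

p_delib = np.mean([ch == "congruent" for ch, _ in delib if ch])
p_arb = np.mean([ch == "congruent" for ch, _ in arb if ch])
```

With these parameters, the deliberate race is almost always won by the congruent accumulator (mirroring the near-ceiling empirical consistency), while the arbitrary race ends near chance, because its two drift rates are nearly equal and its larger noise term dominates.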

Model description and model runs in the SMA and in Region X.

(A) A block diagram of the model, with its noise (SMA) and value (Region X) components, each instantiated as a race to threshold between a pair of DDMs (or causes—one congruent with the ratings in the first part of the experiment, the other incongruent). (B) A few runs of the model in the deliberate condition, in Region X (green colors), depicting the DDM for the congruent option. As is apparent, the DDM stops when the value-based component reaches threshold. Red arrows point from the Region X DDM trace at threshold to the corresponding time in the trace of the SMA (black and gray colors). The SMA traces integrate without a threshold (as the decision outcome is solely determined by the value component in Region X). The thick green and black lines depict average Region X and SMA activity, respectively, over 10,000 model runs locked to stimulus onset. Hence, we do not expect to find an RP in either brain region. (For decision-locked activity see Figure 9B).

https://doi.org/10.7554/eLife.39787.009
Table 1
Values of the model’s parameters across decision types and decision difficulties.

Values of the drift-rate parameter, I, for the congruent and incongruent options; for the leak rate, k; and for the noise scaling factor, c. We fixed the threshold at the value of 0.3. The values in the table are for the component of the model where the decisions were made. Hence, they are for Region X in deliberate decisions and for the SMA in arbitrary ones. Note that, for deliberate decisions, drift-rate values in the SMA were 1.45 times smaller than the values in this table for each entry.
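For reference, in a leaky stochastic accumulator of the kind used by Schurger et al. (2012), on which this model builds, these parameters enter the per-step update roughly as follows (a plausible reading of the model; the exact discretization is given in the Materials and methods):

\[ x_{t+\Delta t} = x_t + (I - k\,x_t)\,\Delta t + c\,\xi_t\sqrt{\Delta t}, \qquad \xi_t \sim \mathcal{N}(0,1), \]

with a decision registered when the accumulated activity first reaches the threshold (fixed here at 0.3).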

https://doi.org/10.7554/eLife.39787.010
Decision type                      Decision difficulty   I_congruent   I_incongruent   k      c
Deliberate decisions (Region X)    Easy                  0.23          0.06            0.52   0.08
                                   Hard                  0.18          0.09            0.53   0.11
Arbitrary decisions (SMA)          Easy                  0.24          0.21            0.53   0.22
                                   Hard                  0.22          0.20            0.54   0.23

Therefore, within this framework, Cz-electrode activity (above the SMA) should mainly reflect the SMA component. (Note that we suggest that noise generation might be a key function of the SMA and other brain regions underneath the Cz electrode specifically during this task. When subjects make arbitrary decisions, these might be based on some symmetry-breaking mechanism, which is driven by random fluctuations that are here simulated as noise. Thus, we neither claim nor think that noise generation is the main purpose or function of these brain regions in general.) And so, finding that the model-predicted EEG activity resembles the actual EEG pattern we found would imply that our findings are compatible with an account by which the RP represents an artifactual accumulation of stochastic, autocorrelated noise, rather than a genuine marker of an unconscious decision ballistically leading to action.

For ease of explanation, and because decision difficulty had no consistent effect on the EEG data, we focus the discussion below on easy decisions (though the same holds for hard decisions). For arbitrary decisions, the SMA (or Noise) Component of the model is the one determining the decisions, and it is also the one that we pick up at electrode Cz. Hence, the resulting activity would be much like the original Schurger et al. (2012) model, and we would expect to see RP-like activity, which we do see (Figure 9B). But the critical prediction of our model for our purposes relates to what happens during deliberate decisions in the SMA (Cz electrode). According to our model, the race-to-threshold pair of DDMs that determines deliberate decisions and triggers the ensuing action is the value-assessment pair in Region X. Hence, when the first DDM of the Region X pair reaches the threshold, the decision is completed and movement ensues. At the same time and in contrast, the SMA pair does not integrate toward a decision (Figure 8B). We modeled this by not including any decision threshold in the SMA in deliberate decisions (i.e., the threshold was set to infinity, letting the DDM accumulate indefinitely). (The corresponding magnitudes of the drift rate and other parameters are detailed in the Materials and methods and Table 1.) So, when Region X activity reaches the threshold, the SMA (supposedly recorded using electrode Cz) will have happened to accumulate to some random level (Figure 8B). This entails that, when we align such SMA activity to decision (or movement) onset, we will find just a simple, weak linear trend in the SMA. Importantly, the RP is measured at electrode Cz above the SMA. Hence, we search for it in the SMA (or Noise) Component of our model (and not in Region X).
The expected trend in the SMA is the one depicted in red in Figure 9B for the deliberate easy and hard conditions (here model activity was flipped vertically—from increasing above the x axis to decreasing below it—as in Schurger et al., 2012). In arbitrary decisions, on the other hand, the SMA pair, from which we record, is also the one that determines the outcome. Hence, motion ensues whenever one of the DDMs crosses the threshold. Thus, when its activity is inspected with respect to movement onset, it forms the RP-like shape of Figure 9B (in blue), in line with the model by Schurger et al. (2012). Note that the downward trend for deliberate hard trials is slightly smaller than for deliberate easy ones (Figure 9B). While the noise in the empirical EEG signals precludes a reliable statistical comparison, the trend in the empirical data is, interestingly, in the same direction (see the last 500 ms before movement onset in Figure 3A).

Empirical and model RTs and model prediction for Cz activity.

(A) The model (solid) and empirical (dashed) distributions of subjects’ RTs. We present both the data as fitted with gamma functions to the cumulative distributions (see Materials and methods) across the four decision types in the main figure, and the original, non-fitted data in the inset. (B) The model’s prediction for the ERP activity in its Noise Component (Figure 8A) in the SMA (electrode Cz), locked to decision completion (at t = 0 s), across all four decision types.

https://doi.org/10.7554/eLife.39787.011

Akin to the Schurger model, we simultaneously fit our DDMs to the complete distribution of our empirical reaction times (RTs; Figure 9A) and to the empirical consistency scores (the proportions of congruent decisions; see Materials and methods). The model's fit to the empirical RTs and consistencies was good (RT and consistency errors were 0.054 and 0.004 for deliberate easy, 0.166 and 0.013 for deliberate hard, 0.053 and 0.002 for arbitrary easy, and 0.055 and 0.003 for arbitrary hard; Figure 9A). The averages of these empirical RT distributions were 2.13, 2.52, 0.98, and 1.00 s, and the empirical consistency scores were 0.99, 0.83, 0.54, and 0.49 for deliberate easy, deliberate hard, arbitrary easy, and arbitrary hard, respectively.
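The gamma-based RT fitting can be sketched as follows. The RTs below are synthetic (drawn so that their mean lies near the 2.13 s deliberate-easy average), and the maximum-likelihood routine is a generic stand-in for the authors' cumulative-distribution fit:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Synthetic RTs (s) with a mean near the deliberate-easy average of 2.13 s
rts = rng.gamma(shape=6.0, scale=0.355, size=400)

# Fit a gamma distribution (location fixed at 0) and compare CDFs
shape, loc, scale = stats.gamma.fit(rts, floc=0)
grid = np.linspace(0.1, 6.0, 50)
empirical_cdf = np.searchsorted(np.sort(rts), grid) / len(rts)
fitted_cdf = stats.gamma.cdf(grid, shape, loc=loc, scale=scale)
max_cdf_error = np.abs(empirical_cdf - fitted_cdf).max()
```

A small maximum CDF discrepancy, like the per-condition RT errors reported above, indicates that the fitted gamma tracks the full shape of the RT distribution, not just its mean.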

Once we found the models with the best fit (see Materials and methods for details), we used those to predict the resulting ERP patterns in the SMA—that is, those we would expect to record in Cz. The ERP that the model predicted was the mean between the congruent and incongruent activities, as both would be picked up by Cz. The result was an RP-like activity for arbitrary decisions, but only a very slight slope for deliberate decisions (Figure 9B; both activities were flipped vertically, as in Schurger’s model). This was in line with our empirical results (compare Figure 3A).

Note that the Schurger model aims to account for neural activity leading up to the decision to move, but no further (Schurger et al., 2012). Similarly, we expect our DDM to fit Cz neural data only up to around −0.1 to −0.2 s (100 to 200 ms before empirical response onset). What is more, we model Region X activity here using a DDM for simplicity. But we would get a similar result—SMA RP-like activity for arbitrary decisions and only a trend for deliberate ones—for other models of decision making, as long as the completion of deliberate decisions remained statistically independent from threshold crossing in the DDMs of the SMA. Further, we make no claims that ours is the only, or even optimal, model that explains our results. Rather, by specifically extending the Schurger model, our goal was to show how that interpretation of the RP could also be applied to our more complex paradigm. (We refer the reader to work by Schurger and colleagues [Schurger, 2018; Schurger et al., 2012] for more detailed discussions about the model, its comparison to other models, and the relation to conscious-decision completion.)

Discussion

Since the publication of Libet’s seminal work—which claimed that neural precursors of action, in the form of the RP, precede subjects’ reports of having consciously decided to act (Libet et al., 1983)—a vigorous discussion has been raging among neuroscientists, philosophers, and other scholars about the meaning of the RP for the debate on free will (recent collections include Mele, 2015; Pockett et al., 2009; Sinnott-Armstrong and Nadel, 2011). Some claim that the RP is simply a marker for an unconscious decision to act and thus its onset at least reflects an intention to move and ballistically leads to movement (Libet et al., 1983; Soon et al., 2008). Under this interpretation, the onset of the RP before the reported completion of the conscious decision to move effectively removes conscious will from the causal chain leading to action (Haggard, 2005; Haggard, 2008; Libet, 1985; Wegner, 2002). Others do not agree (Breitmeyer, 1985; Mele, 2009; Nahmias et al., 2014; Roskies, 2010). But, regardless, the RP lies at the heart of much of this debate (Kornhuber and Deecke, 1990; Libet et al., 1983).

Notably, the RP and similar findings showing neural activations preceding the conscious decision to act have typically been based on arbitrary decisions (Haggard and Eimer, 1999; Lau et al., 2004; Libet, 1985; Libet et al., 1983; Sirigu et al., 2004; Soon et al., 2008; Soon et al., 2013). This, among other reasons, rested on the notion that for an action to be completely free, it should not be determined in any way by external factors (Libet, 1985)—which is the case for arbitrary, but not deliberate, decisions (for the latter, each decision alternative is associated with a value, and the values of the alternatives typically guide one’s decision). But this notion of freedom faces several obstacles. First, most discussions of free will focus on deliberate decisions, asking when and whether these are free (Frankfurt, 1971; Hobbes, 1994; Wolf, 1990). This might be because everyday decisions to which we associate freedom of will—like choosing a more expensive but more environmentally friendly car, helping a friend instead of studying more for a test, donating to charity, and so on—are generally deliberate, in the sense of being reasoned, purposeful, and bearing consequences (although see Deutschländer et al., 2017). In particular, the free will debate is often considered in the context of moral responsibility (e.g., was the decision to harm another person free or not) (Fischer, 1999; Haggard, 2008; Maoz and Yaffe, 2016; Roskies, 2012; Sinnott-Armstrong, 2014; Strawson, 1994), and free will is even sometimes defined as the capacity that allows one to be morally responsible (Mele, 2006; Mele, 2009). In contrast, it seems meaningless to assign blame or praise to arbitrary decisions. Thus, though the scientific operationalization of free will has typically focused on arbitrary decisions, the common interpretations of these studies—in neuroscience and across the free will debate—have often alluded to deliberate decisions.
This is based on the implicit assumption that the RP studies capture the same, or a sufficiently similar, process as that which occurs in deliberate decisions. And so, inferences from RP results on arbitrary decisions can be made to deliberate decisions.

However, here we show that this assumption may not be justified, as the neural precursors of arbitrary decisions, at least in the form of the RP, do not generalize to meaningful, deliberate decisions (Breitmeyer, 1985; Roskies, 2010). For arbitrary decisions, we replicated the waveform found in previous studies, where the subjects moved endogenously, at a time of their choice with no external cues (Shibasaki and Hallett, 2006). The RP was also similarly recorded in the Cz electrode and with a typical scalp topography. However, the RP was altogether absent—or at least substantially diminished—for deliberate decisions; it showed neither the expected slope over time nor the expected scalp topography.

Null-hypothesis significance testing (NHST) suggested that the null hypothesis—that is, that there is no RP—can be rejected for arbitrary decisions but cannot be rejected for deliberate ones. A cluster-based nonparametric permutation analysis—to locate temporal windows where EEG activity is reliably different from 0—found prolonged activity of this type during the 1.2 s before movement onset for easy and hard arbitrary decisions, but no such activity was found for either type of deliberate decisions. A Bayesian analysis found clear evidence for an RP in arbitrary decisions and an inconclusive trend toward no RP in deliberate decisions. Changing the baseline to make it equally distant from arbitrary and deliberate decisions did provide evidence for the absence of an RP in deliberate decisions, while still finding clear evidence for an RP in arbitrary decisions. Further, baseline-invariant trend analysis showed that there is no trend during the RP time window for deliberate decisions (here Bayesian analysis suggested moderate to strong evidence against a trend) while there existed a reliable trend for arbitrary decisions (and extremely strong evidence for an effect in the Bayesian framework). Thus, taken together, there is overwhelming evidence for an RP in arbitrary decisions (in all six different analyses that we conducted—NHST and Bayesian). But, in contrast, we found no evidence for the existence of an RP in deliberate decisions (in all six analyses) and, at the same time, there was evidence against RP existence in such decisions (in five of the six analyses, with the single, remaining analysis providing only inconclusive evidence for an absence of an RP). Therefore, when the above analyses are taken together, we think that the most plausible interpretation of our results is that the RP is absent in deliberate decisions.

Nevertheless, even if one takes our results to imply that the RP is only strongly diminished in deliberate compared to arbitrary decisions, this provides evidence against drawing strong conclusions regarding the free-will debate from the Libet and follow-up results. The assumption in the Libet-type studies is that the RP simply reflects motor preparation (Haggard, 2019; Haggard and Eimer, 1999; Libet, 1985; Libet et al., 1983; Shibasaki and Hallett, 2006) and in that it lives up to its name. However, in our paradigm, both the sensory inputs and the motor outputs were the same between arbitrary and deliberate trials. Thus, motor preparation is expected in both conditions, and the RP should have been found in both. Accordingly, any consistent difference in the RP between the decision types suggests—at the very least—that it is a more complex signal than Libet and colleagues had assumed. For one, it shows that the RP is influenced by cognitive state and cannot be regarded as a genuine index of a voluntary decision, be it arbitrary or deliberate. Further, our model predicted an RP in arbitrary decisions but only a slow trend in the movement-locked ERP during deliberate decisions—a trend in the same direction as the RP, but not an RP. Hence, a signal that resembles a strongly diminished RP but is in fact just a slow trend in the same direction as the RP is congruent with our model.

Interestingly, while the RP was present in arbitrary decisions but absent in deliberate ones, the LRP—a long-standing, more motor-related ERP component, which began much later than the RP—was indistinguishable between the different decision types. This provides evidence that, at the motor level, the neural representation of the deliberate and arbitrary decisions that our subjects made may have been indistinguishable, as was our intention when designing the task.

Our findings and the model thus suggest that two different neural mechanisms may be involved in arbitrary and deliberate decisions. Earlier literature demonstrated that deliberate, reasoned decision-making—which was mostly studied in the field of neuroeconomics (Kable and Glimcher, 2009) or using perceptual decisions (Gold and Shadlen, 2007)—elicited activity in the prefrontal cortex (PFC; mainly its dorsolateral part (DLPFC) (Sanfey et al., 2003; Wallis and Miller, 2003) and its ventromedial part/orbitofrontal cortex (VMPFC/OFC) (Ramnani and Owen, 2004; Wallis, 2007)) and in the anterior cingulate cortex (ACC) (Bush et al., 2000; Carter et al., 1998). Arbitrary, meaningless decisions, in contrast, were mainly probed using variants of the Libet paradigm, showing activations in the Supplementary Motor Area (SMA), alongside other frontal areas like the medial frontal cortex (Brass and Haggard, 2008; Krieghoff et al., 2011) or the frontopolar cortex, as well as the posterior cingulate cortex (Fried et al., 2011; Soon et al., 2008) (though see Hughes et al., 2011, which suggests that a common mechanism may underlie both decision types). Possibly then, arbitrary and deliberate decisions may differ not only with respect to the RP, but may also be subserved by different underlying neural circuits, which makes generalization from one class of decisions to the other more difficult. Deliberate decisions are associated with more lateralized and central neural activity, while arbitrary ones are associated with more medial and frontal activity. This appears to align with the different brain regions associated with the two decision types above, as also evidenced by the differences we found between the scalp distributions of arbitrary and deliberate decisions (Figure 3A). Further studies are needed to explore this potential divergence in the neural regions underlying the two decision types.

Therefore, at the very least, our results support the claim that the previous findings regarding the RP should be confined to arbitrary decisions and do not generalize to deliberate ones. What is more, if the ubiquitous RP does not generalize, it cannot simply be assumed that other markers will. Hence, such differences clearly challenge the generalizability of previous studies focusing on arbitrary decisions to deliberate ones, regardless of whether they were based on the RP or not. In other words, our results put the onus on attempts to generalize markers of upcoming action from arbitrary to deliberate decisions; it is on them now to demonstrate that those markers do indeed generalize. And, given the extent of the claims made and conclusions derived based on the RP in the neuroscience of free will (see again Mele, 2015; Pockett et al., 2009; Sinnott-Armstrong and Nadel, 2011), our findings call for a re-examination of some of the basic tenets of the field.

It should be noted that our study does not provide positive evidence that consciousness is more involved in deliberate decisions than in arbitrary ones; such a strong claim requires further evidence, perhaps from future research. But our results highlight the need for such research. Under some (strong) assumptions, the onset of the RP before the onset of reported intentions to move may point to there being no role for consciousness in arbitrary decisions. But, even if such conclusions can be reached, they cannot be safely extended to deliberate decisions.

To be clear, and following the above, we do not claim that the RP captures all unconscious processes that precede conscious awareness. However, some have suggested that the RP represents unconscious motor-preparatory activity before any kind of decision (e.g., Libet, 1985). But our results provide evidence against that claim, as we do not find an RP before deliberate decisions, which also entail motor preparation. Furthermore, in deliberate decisions in particular, it is likely that there are neural precursors of upcoming actions—possibly involving the above neural circuits as well as circuits that represent values—which are unrelated to the RP (the lack of such precursors is not merely implausible; it implies dualism: Mudrik and Maoz, 2015; Wood, 1985).

Note also that we did not attempt to clock subjects’ conscious decision to move. Rather, we instructed them to hold their hands above the relevant keyboard keys and press their selected key as soon as they made up their mind. This was to keep the decisions in this task more ecological and because we think that the key method of measuring decision completion (using some type of clock to measure Libet’s W-time) is highly problematic (see Materials and methods). But, even more importantly, clock monitoring was demonstrated to have an effect on RP size (Miller et al., 2011), so it could potentially confound our results (Maoz et al., 2015).

Some might also claim that unconscious decision-making could explain our results: in arbitrary decisions, subjects may engage in unconscious deliberation, or in actively inhibiting their urge to follow their preference, in addition to making a free choice, whereas deliberate decisions require only deliberation. But this interpretation is unlikely, because the longer RTs for deliberate decisions suggest, if anything, that more complex mental processes (conscious or unconscious) took place before deliberate rather than arbitrary decisions. In addition, such extra processing should have reduced our chances of finding an RP in arbitrary trials (as the design diverges from the original Libet task); yet the RP was clearly present there, rendering these interpretations less plausible.

Aside from highlighting the neural differences between arbitrary and deliberate decisions, this study also challenges a common interpretation of the function of the RP. If the RP is not present before deliberate action, it does not seem to be a necessary link in the general causal chain leading to action. Schurger et al. (2012) suggested that the RP reflects the accumulation of autocorrelated, stochastic fluctuations in neural activity that lead to action, following a threshold crossing, when humans arbitrarily decide to move. According to that model, the shape of the RP results from the manner in which it is computed from autocorrelated EEG: averaged over trials that are locked to response onset (that directly follows the threshold crossing). Our results and our model are in line with that interpretation and expand it to decisions that include both when and which hand to move. They suggest that the RP represents the accumulation of noisy, random fluctuations that drive arbitrary decisions, whereas deliberate decisions are mainly driven by the values associated with the decision alternatives (Maoz et al., 2013).

Our drift-diffusion model was based on the assumption that every decision can be driven by a component reflecting the values of the decision alternatives (i.e., subjects’ support for the two NPOs we presented) or by another component representing noise—random fluctuations in neural activity. The value component plays little to no role in arbitrary decisions, so action selection and timing depend on when the accumulation of noise crosses the decision threshold for the congruent and incongruent decision alternatives. In deliberate decisions, in contrast, the value component drives the decisions, while the noise component plays little to no role. Thus, in arbitrary decisions, action onset closely tracks threshold crossings of the noise (or SMA) component. But, in deliberate decisions, the noise component has only reached some random level when the value component crosses the threshold and the accumulation stops. Hence, because we record from the SMA (the noise component), locking the EEG to response onset and averaging over trials yields only a slight slope for deliberate decisions but the expected RP shape for arbitrary decisions. This provides evidence that the RP does not reflect subconscious movement preparation. Rather, it is induced by threshold crossings of stochastic fluctuations, which drive arbitrary but not deliberate decisions; accordingly, the RP is not found before deliberate decisions. Our model therefore challenges RP-based claims against free will in both arbitrary and deliberate decisions. Further studies of the causal role of consciousness in deliberate versus arbitrary decisions are required to test this claim.

Nevertheless, two possible, alternative explanations of our results can be raised. First, one could claim that—in the deliberate condition only—the NPO names act as a cue, creating a stimulus–response mapping and thereby turning what we term internal, deliberate decisions into no more than simple responses to external stimuli. Under this account, if the preferred NPO is on the right, it is immediately interpreted as ‘press right’. It would therefore follow that subjects are actually not making decisions in deliberate trials, which in turn is reflected by the absence of the RP in those trials. However, the reaction time and consistency results that we obtained for deliberate trials provide evidence against this interpretation. We found longer reaction times for hard-deliberate decisions than for easy-deliberate ones (2.52 versus 2.13 s, on average, respectively; Figure 2 left) and higher consistencies with the initial ratings for easy-deliberate decisions than for hard-deliberate decisions (0.99 versus 0.83, on average, respectively; Figure 2 right). If the NPO names acted as mere cues, we would have expected no differences between reaction times or consistencies for easy- and hard-deliberate decisions. In addition, there were 50 different causes in the first part of the experiment. So, it is highly unlikely that subjects could memorize all 1225 (= 50 × 49/2) pairwise preferences among these causes and simply transform any decision between a pair of causes into a stimulus instructing them to press left or right.

Another alternative interpretation of our results is that subjects engage in (unconscious) deliberation also during arbitrary decisions (Tusche et al., 2010), as they are trying to find a way to break the symmetry between the two possible actions. If so, the RP in the arbitrary decisions might actually reflect the extra effort in those types of decisions, which is not found in deliberate decisions. However, this interpretation entails a longer reaction time for arbitrary than for deliberate decisions, because of the heavier cognitive load, which is the opposite of what we found (Figure 2A).

In conclusion, our study suggests that RPs do not precede deliberate decisions (or at the very least are strongly diminished before such decisions). In addition, it suggests that RPs represent an artificial accumulation of random fluctuations rather than serving as a genuine marker of an unconscious decision to initiate voluntary movement. Hence, our results challenge RP-based claims of Libet and the follow-up literature against free will in arbitrary decisions, and much more so the generalization of these claims to deliberate decisions. The neural differences we found between arbitrary and deliberate decisions, as well as our model, further put the onus on any study trying to draw conclusions about the free-will debate from arbitrary decisions to demonstrate that these conclusions generalize to deliberate ones. This motivates future investigations into other precursors of action besides the RP, using EEG, fMRI, or other techniques. It would be of particular interest to find the neural activity that precedes deliberate decisions, as well as non-motor neural activity that is common to both deliberate and arbitrary decisions.

Materials and methods

Subjects

Twenty healthy subjects participated in the study. They were California Institute of Technology (Caltech) students as well as members of the Pasadena community. All subjects reported normal or corrected-to-normal sight and no psychiatric or neurological history. They volunteered to participate in the study for payment ($20 per hour). Subjects were prescreened to include only participants who were socially involved and active in the community (based on the strength of their support of social causes, past volunteer work, past donations to social causes, and tendency to vote). The data from 18 subjects were analyzed; two subjects were excluded from our analysis (see Sample size and exclusion criteria below). The experiment was approved by Caltech’s Institutional Review Board (14–0432; Neural markers of deliberate and random decisions), and informed consent was obtained from all participants after the experimental procedures were explained to them.

Sample size and exclusion criteria

We ran a power analysis based on the findings of Haggard and Eimer (1999). Their RP in a free left/right-choice task had a mean of 5.293 µV and standard deviation of 2.267 µV. Data from a pilot study we ran before this experiment suggested that we might obtain smaller RP values in our task (they referenced to the tip of the nose and we to the average of all channels, which typically results in a smaller RP). Therefore, we conservatively estimated the magnitude of our RP as half of that of Haggard and Eimer, 2.647 µV, while keeping the standard deviation the same at 2.267 µV. Our power analysis therefore suggested that we would need at least 16 subjects to reliably find a difference between an RP and a null RP (0 µV) at a p-value of 0.05 and power of 0.99. This number agreed with our pilot study, where we found that a sample size of at least 16 subjects resulted in a clear, averaged RP. Following the above reasoning, we decided beforehand to collect 20 subjects for this study, taking into account that some could be excluded as they would not meet the following predefined inclusion criteria: at least 30 trials per experimental condition remaining after artifact rejection; and averaged RTs (across conditions) that deviated by less than three standard deviations from the group mean.
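
As an illustration, this power computation can be sketched in Python (the study itself used other software; the function names here are ours): it finds the smallest sample size at which a one-sample t-test against zero reaches the target power for the assumed effect size.

```python
import numpy as np
from scipy import stats

def one_sample_t_power(n, effect_size, alpha=0.05):
    """Two-sided power of a one-sample t-test against zero."""
    df = n - 1
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    ncp = effect_size * np.sqrt(n)  # noncentrality parameter
    # P(|T| > t_crit) under the noncentral t distribution
    return (1 - stats.nct.cdf(t_crit, df, ncp)
            + stats.nct.cdf(-t_crit, df, ncp))

def min_n_for_power(effect_size, alpha=0.05, target=0.99):
    """Smallest n whose power reaches the target."""
    n = 3
    while one_sample_t_power(n, effect_size, alpha) < target:
        n += 1
    return n

# Assumed RP of 2.647 µV with SD of 2.267 µV gives Cohen's d of about 1.17
d = 2.647 / 2.267
n_required = min_n_for_power(d)
```
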

Subjects were informed about the overall number of subjects that would participate in the experiment when the NPO lottery was explained to them (see below). So, we had to finalize the overall number of subjects who would participate in the study—but not necessarily the overall number of subjects whose data would be part of the analysis—before the experiment began. After completing data collection, we ran only the EEG preprocessing and behavioral-data analysis to test each subject against the exclusion criteria. This was done before we looked at the data with respect to our hypothesis or research question. Two subjects did not meet the inclusion criteria: the data of one subject (#18) suffered from poor signal quality, resulting in less than 30 trials remaining after artifact rejection; another subject (#12) had RTs longer than three standard deviations from the mean. All analyses were thus run on the 18 remaining subjects.

Stimuli and apparatus

Subjects sat in a dimly lit room. The stimuli were presented on a 21″ Viewsonic G225f (20″ viewable) CRT monitor with a 60 Hz refresh rate and a 1024 × 768 resolution using Psychtoolbox version 3 and Mathworks Matlab 2014b (Brainard, 1997; Pelli, 1997). They appeared on a gray background (RGB values: [128, 128, 128]). The screen was located 60 cm away from subjects' eyes. Stimuli included the names of 50 real non-profit organizations (NPOs). Twenty organizations were consensual (e.g., the Cancer Research Institute or the Hunger Project), and thirty were more controversial: we chose 15 causes that were widely debated (e.g., pro/anti guns, pro/anti abortions) and selected one NPO that supported each of the two sides of the debate. This was done to achieve variability in subjects’ willingness to donate to the different NPOs. In the main part of the experiment, succinct descriptions of the causes (e.g., pro-marijuana legalization, pro-child protection; for a full list of NPOs and causes see Supplementary file 1) were presented in black Comic Sans MS.

Study design

The objective of this study was to compare ERPs elicited by arbitrary and deliberate decision-making, and in particular the RP. We further manipulated decision difficulty to validate our manipulation of decision type: we introduced hard and easy decisions, which corresponded to small and large differences between subjects’ preferences for the pairs of presented NPOs, respectively. We reasoned that if the manipulation of decision type (arbitrary vs. deliberate) was effective, there would be behavioral differences between easy and hard decisions for deliberate choices but not for arbitrary choices (because differences in preferences should not influence subjects’ arbitrary decisions). Our 2 × 2 design was therefore decision type (arbitrary vs. deliberate) by decision difficulty (easy vs. hard). Each condition included 90 trials, separated into 10 blocks of 9 trials each, resulting in a total of 360 trials and 40 blocks. Blocks of different decision types were randomly intermixed. Decision difficulty was randomly counterbalanced across trials within each block.

Experimental procedure

In the first part of the experiment, subjects were presented with each of the 50 NPOs and the causes with which the NPOs were associated separately (see Supplementary file 1). They were instructed to rate how much they would like to support that NPO with a $1000 donation on a scale of 1 (‘I would not like to support this NPO at all’) to 7 (‘I would very much like to support this NPO’). No time pressure was put on the subjects, and they were given access to the website of each NPO to give them the opportunity to learn more about the NPO and the cause it supports.

After the subjects finished rating all NPOs, the main experiment began. In each block of the experiment, subjects made either deliberate or arbitrary decisions. Two succinct cause descriptions, representing two actual NPOs, were presented in each trial (Figure 1). In deliberate blocks, subjects were instructed to choose the NPO to which they would like to donate $1000 by pressing the <Q> or <P> key on the keyboard, using their left and right index finger, for the NPO on the left or right, respectively, as soon as they decided. Subjects were informed that at the end of each block one of the NPOs they chose would be randomly selected to advance to a lottery. Then, at the end of the experiment, the lottery would take place and the winning NPO would receive a $20 donation. In addition, that NPO would advance to the final, inter-subject lottery, where one subject’s NPO would be picked randomly for a $1000 donation. It was stressed that the donations were real and that no deception was used in the experiment. To persuade the subjects that the donations were real, we presented a signed commitment to donate the money, and promised to send them the donation receipts after the experiment. Thus, subjects knew that in deliberate trials, every choice they made was not hypothetical, and could potentially lead to an actual $1020 donation to their chosen NPO.

Arbitrary trials were identical to deliberate trials except for the following crucial differences. Subjects were told that, at the end of each block, the pair of NPOs in one randomly selected trial would advance to the lottery together. And, if that pair wins the lottery, both NPOs would receive $10 (each). Further, the NPO pair that would win the inter-subject lottery would receive a $500 donation each. Hence it was stressed to the subjects that there was no reason for them to prefer one NPO over the other in arbitrary blocks, as both NPOs would receive the same donation regardless of their button press. Subjects were told to therefore simply press either <Q> or <P> as soon as they decided to do so.

Thus, while subjects’ decisions in the deliberate blocks were meaningful and consequential, their decisions in the arbitrary blocks had no impact on the final donations that were made. In these trials, subjects were further urged not to let their preferred NPO dictate their response. Importantly, despite the difference in decision type between deliberate and arbitrary blocks, the instructions for carrying out the decisions were identical: Subjects were instructed to report their decisions (with a key press) as soon as they made them in both conditions. They were further asked to place their left and right index fingers on the response keys, so they could respond as quickly as possible. Note that we did not ask subjects to report their ‘W-time’ (time of consciously reaching a decision), because this measure was shown to rely on neural processes occurring after response onset (Lau et al., 2007) and to potentially be backward inferred from movement time (Banks and Isham, 2009). Even more importantly, clock monitoring was demonstrated to have an effect on RP size (Miller et al., 2011), so it could potentially confound our results (Maoz et al., 2015).

Decision difficulty (Easy/Hard) was manipulated throughout the experiment, randomly intermixed within each block. Decision difficulty was determined based on the rating difference between the two presented NPOs. NPO pairs with a rating difference of one point were designated hard, and pairs with a difference of at least four points were designated easy. Based on each subject’s ratings, we created a list of NPO pairs, half of which were easy choices and the other half hard choices.

Each block started with an instruction written either in dark orange (Deliberate: ‘In this block choose the cause to which you want to donate $1000’) or in blue (Arbitrary: ‘In this block both causes may each get a $500 donation regardless of the choice’) on a gray background that was used throughout the experiment. Short-hand instructions appeared at the top of the screen throughout the block in the same colors as that block’s initial instructions; Deliberate: ‘Choose for $1000’ or Arbitrary: ‘Press for $500 each’ (Figure 1).

Each trial started with the gray screen that was blank except for a centered, black fixation cross. The fixation screen was on for a duration drawn from a uniform distribution between 1 and 1.5 s. Then, the two causes appeared on the left and right side of the fixation cross (left/right assignments were randomly counterbalanced) and remained on the screen until the subjects reported their decisions with a key press—<Q> or <P> on the keyboard for the cause on the left or right, respectively. The cause corresponding to the pressed button then turned white for 1 s, and a new trial started immediately. If subjects did not respond within 20 s, they received an error message and were informed that, if this trial were selected for the lottery, no NPO would receive a donation. However, this did not happen for any subject on any trial.

To assess the consistency of subjects’ decisions during the main experiment with their ratings in the first part of the experiment, subjects’ choices were coded in the following way: each binary choice in the main experiment was given a consistency grade of 1, if subjects chose the NPO that was rated higher in the rating session, and 0 if not. Then an averaged consistency grade for each subject was calculated as the mean consistency grade over all the choices. Thus, a consistency grade of 1 indicates perfect consistency with one’s ratings across all trials, 0 is perfect inconsistency, and 0.5 is chance performance.
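
This coding scheme amounts to the following; the snippet below is a minimal Python sketch (the function name is ours), assuming the per-trial ratings and choices are available as arrays:

```python
import numpy as np

def consistency_grade(rating_left, rating_right, chose_left):
    """Per-trial grade: 1 if the chosen NPO had the higher initial rating,
    0 otherwise; returns the mean over all trials.
    1 = perfect consistency, 0 = perfect inconsistency, 0.5 = chance."""
    rating_left = np.asarray(rating_left, float)
    rating_right = np.asarray(rating_right, float)
    chose_left = np.asarray(chose_left, bool)
    chosen = np.where(chose_left, rating_left, rating_right)
    unchosen = np.where(chose_left, rating_right, rating_left)
    # Ties never occur in this design (rating differences were 1 or 4-6)
    return float(np.mean(chosen > unchosen))
```
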

We wanted to make sure subjects were carefully reading and remembering the causes also during the arbitrary trials. This was part of an effort to better equate—as much as possible, given the inherent difference between the conditions—memory load, attention, and other cognitive aspects between deliberate and arbitrary decisions—except those aspects directly associated with the decision type, which was the focus of our investigation. We therefore randomly interspersed 36 memory catch-trials throughout the experiment (thus more than one catch trial could occur per block). On such trials, four succinct descriptions of causes were presented, and subjects had to select the one that appeared in the previous trial. A correct or incorrect response added or subtracted 50 cents from their total, respectively. (Subjects were informed that, if they reached a negative balance, no money would be deducted from their payment for participation in the experiment.) Thus, subjects could earn up to $18 more for the experiment, if they answered all memory test questions correctly. Subjects typically did well on these memory questions, erring on average in 2.5 out of 36 memory catch trials (7% error) and gaining an additional $16.75 (SD = 3.19). Subjects’ error rates in the memory task did not differ significantly between the experimental conditions (2-way ANOVA; decision type: F(1,17)=2.51, p=0.13; decision difficulty: F(1,17)=2.62, p=0.12; interaction: F(1,17)=0.84, p=0.37).

ERP recording methods

The EEG was recorded using an ActiveTwo system (BioSemi, the Netherlands) from 64 electrodes distributed based on the extended 10–20 system and connected to a cap, and seven external electrodes. Four of the external electrodes recorded the EOG: two located at the outer canthi of the right and left eyes and two above and below the center of the right eye. Two external electrodes were located on the mastoids, and one electrode was placed on the tip of the nose. All electrodes were referenced during recording to a common-mode signal (CMS) electrode between POz and PO3. The EEG was continuously sampled at 512 Hz and stored for offline analysis.

ERP analysis

ERP analysis was conducted using the ‘Brain Vision Analyzer’ software (Brain Products, Germany) and in-house Mathworks Matlab scripts. Data from all channels were referenced offline to the average of all channels (excluding external electrodes), which is known to result in a reduced-amplitude RP (because the RP is such a spatially diffuse signal). The data were then digitally high-pass filtered at 0.1 Hz using a Finite Impulse Response (FIR) filter to remove slow drifts. A notch filter at 59–61 Hz was applied to the data to remove 60 Hz electrical noise. The signal was then cleaned of blink and saccade artifacts using Independent Component Analysis (ICA) (Junghöfer et al., 2000). Signal artifacts were detected as amplitudes exceeding ± 100 µV, differences beyond 100 µV within a 200 ms interval, or activity below 0.5 µV for over 100 ms (the last condition was never found). Sections of EEG data that included such artifacts in any channel were removed (150 ms before and after the artifact). We further excluded single trials in which subjects pressed a button that was not one of the two designated response buttons (<Q> or <P>) as well as trials where subjects’ RTs were less than 200 ms, more than 10 s, or more than three standard deviations away from that subject’s mean in that condition (mean number of excluded trials = 7.17, SD = 2.46, amounting to 1.99% of the trials). Overall, the average number of included trials in each experimental cell was 70.38 trials with a range of 36–86 out of 90 trials per condition. Channels that consistently had artifacts were replaced using interpolation (0–6 channels, 1.95 channels per subject, on average). No significant differences were found in the number of excluded trials across conditions (2-way ANOVA; decision type: F(1,17)=3.31, p=0.09; decision difficulty: F(1,17)=1.83, p=0.19; interaction: F(1,17)=0.42, p=0.53).
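
The three artifact criteria can be sketched as follows. This is an illustrative Python approximation (not the Brain Vision Analyzer implementation): it interprets ‘activity below 0.5 µV’ as a peak-to-peak range below 0.5 µV within the window, and it omits the extra 150 ms removed around each artifact.

```python
import numpy as np

FS = 512  # sampling rate (Hz)

def artifact_mask(x, fs=FS, amp_uv=100.0, step_uv=100.0, step_ms=200,
                  flat_uv=0.5, flat_ms=100):
    """Boolean mask marking samples of one EEG channel (in µV) that
    violate any of the three rejection criteria."""
    x = np.asarray(x, float)
    bad = np.abs(x) > amp_uv                       # amplitude beyond ±100 µV
    step_n = int(fs * step_ms / 1000)              # 200 ms window
    flat_n = int(fs * flat_ms / 1000)              # 100 ms window
    for i in range(len(x)):
        w = x[i:i + step_n]
        if w.max() - w.min() > step_uv:            # >100 µV change in 200 ms
            bad[i:i + step_n] = True
        wf = x[i:i + flat_n]
        if len(wf) == flat_n and wf.max() - wf.min() < flat_uv:
            bad[i:i + flat_n] = True               # flat (<0.5 µV) for 100 ms
    return bad
```
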

The EEG was segmented by locking the waveforms to subjects’ movement onset, starting 2 s prior to the movement and ending 0.2 s afterwards, with the segments averaged separately for each decision type (Deliberate/Arbitrary x Easy/Hard) and decision content (right/left hand). The baseline period was defined as the time window between −1000 ms and −500 ms prior to stimulus onset, that is, the onset of the causes screen, rather than prior to movement onset. In addition to the main baseline, we tested another baseline—from −1000 ms to −500 ms relative to movement onset—to investigate whether the baseline period influenced our main results (see Results). Furthermore, we segmented the EEG based on stimulus onset, using the same baseline, for stimulus-locked analysis (again, see Results).

To assess potential effects of eye movements during the experiment, we defined the radial eye signal as the average over all 4 EOG channels, when band-pass filtered to between 30 and 100 Hz. We then defined a saccade as any signal that was more than 2.5 standardized IQRs away from the median of the radial signal for more than 2 ms. Two consecutive saccades had to be at least 50 ms apart. The saccade count (SC) was the number of saccades during the last 500 ms before response onset (Keren et al., 2010; see also Croft and Barry, 2000; Elbert et al., 1985; Shan et al., 1995).
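
A sketch of this saccade detector follows, assuming the radial signal has already been band-pass filtered (30–100 Hz) and taking the ‘standardized IQR’ to be IQR/1.349 (the IQR of a standard normal distribution); both are our interpretive assumptions rather than details taken from the original implementation.

```python
import numpy as np

FS = 512  # sampling rate (Hz)

def count_saccades(radial, fs=FS, n_iqr=2.5, min_ms=2, refract_ms=50,
                   window_ms=500):
    """Count saccade events in the last `window_ms` of a radial EOG trace
    (mean of the four EOG channels, assumed already band-passed)."""
    x = np.asarray(radial, float)
    siqr = (np.percentile(x, 75) - np.percentile(x, 25)) / 1.349
    dev = np.abs(x - np.median(x)) > n_iqr * siqr      # outlier samples
    min_n = max(1, int(np.ceil(fs * min_ms / 1000)))   # >2 ms duration
    refract_n = int(fs * refract_ms / 1000)            # 50 ms apart
    onsets, i = [], 0
    while i < len(x):
        if dev[i]:
            j = i
            while j < len(x) and dev[j]:               # extent of the run
                j += 1
            if j - i >= min_n and (not onsets or i - onsets[-1] >= refract_n):
                onsets.append(i)
            i = j
        else:
            i += 1
    start = len(x) - int(fs * window_ms / 1000)        # last 500 ms
    return sum(o >= start for o in onsets)
```
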

Statistical analysis

Request a detailed protocol

EEG differences greater than expected by chance were assessed using two-way ANOVAs with decision type (deliberate, arbitrary) and decision difficulty (easy, hard), using IBM SPSS statistics, version 24. For both the RP and LRP signals, the mean amplitude from 500 ms before button-press onset until the button press was used for the ANOVAs. Greenhouse–Geisser correction was never required as sphericity was never violated (Picton et al., 2000).

Trend analysis on all subjects’ data was carried out by regressing the voltage for every subject against time for the last 1000 ms before response onset using first-order polynomial linear regression (see Results). We first added a 25 Hz low-pass filter for anti-aliasing and then used every 10th time sample for the regression (i.e., the 1st, 11th, 21st, 31st samples, and so on) to conform with the individual-subject analysis (see below). For the individual-subject analysis, the voltage on all trials was regressed against time in the same manner (i.e., for the last 1000 ms before response onset and using first-order polynomial linear regression). As individual-trial data is much noisier than the mean over all trials in each subject, we opted for standard robust regression using iteratively reweighted least squares (implemented using the robustfit() function in Mathworks Matlab). The iterative robust-regression procedure is time-consuming. So, we used every 10th time sample instead of every sample to make the procedure’s run time manageable. Also, as EEG signals have a 1/f power spectrum, taking every 10th sample further makes the data conform better with the assumption of i.i.d. noise in linear regression.
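
The individual-trial procedure can be sketched as below. This Python analogue of Matlab’s robustfit() (the function name and structure here are ours) uses iteratively reweighted least squares with Tukey’s bisquare weights—robustfit’s default scheme—on every 10th sample:

```python
import numpy as np

def robust_slope(t, v, n_iter=50, tune=4.685):
    """IRLS with Tukey's bisquare weights (tuning constant 4.685, as in
    Matlab's robustfit); returns (intercept, slope)."""
    t = np.asarray(t, float)[::10]        # every 10th sample, as in the text
    v = np.asarray(v, float)[::10]
    X = np.column_stack([np.ones_like(t), t])
    beta = np.linalg.lstsq(X, v, rcond=None)[0]        # OLS starting point
    for _ in range(n_iter):
        r = v - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745  # MAD scale estimate
        s = max(s, 1e-9)
        u = np.clip(r / (tune * s), -1.0, 1.0)
        w = (1.0 - u ** 2) ** 2                        # bisquare weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], v * sw, rcond=None)[0]
    return beta
```
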

Furthermore, we conducted Bayesian analyses of our main results. This allowed us to assess the strength of the evidence for or against the existence of an effect, and specifically test whether null results stem from a genuine absence of an effect or from insufficient or underpowered data. Specifically, the Bayes factor allowed us to compare the probability of observing the data given H0 (i.e., no RP in deliberate decisions) against the probability of observing the data given H1 (i.e., RP exists in deliberate decisions). We followed the convention that BF < 0.33 implies substantial evidence for the lack of an effect (that is, the data are at least three times more likely to be observed given H0 than given H1), 0.33 < BF < 3 suggests insensitivity of the data, and BF > 3 denotes substantial evidence for the presence of an effect (H1) (Jeffreys, 1998). Bayesian analysis was carried out using JASP (ver. 0.8; default settings).

In addition to the above, we used the cluster-based nonparametric method developed by Maris and Oostenveld to find continuous temporal windows where EEG activity was reliably different from 0 (Maris and Oostenveld, 2007). We used an in-house implementation of the method in Mathworks Matlab with a threshold of 2 on the t statistic and with a significance level of p=0.05.

Model and simulations

All simulations were performed using Mathworks Matlab 2018b. The model was based on the one proposed by Schurger et al. (2012). Like them, we built a drift-diffusion model (DDM) (Ratcliff, 1978; Usher and McClelland, 2001), which included a leaky stochastic accumulator (with a threshold on its output) and a time-locking/epoching procedure. The original model amounted to iterative numerical integration of the differential equation

(1) δxᵢ = (I − kxᵢ)Δt + cξᵢ√Δt

where I is the drift rate, k is the leak (exponential decay in x), ξ is Gaussian noise, and c is a noise-scaling factor. Δt is the discrete time step used in the simulation (we used Δt = 0.001). The model integrates xᵢ over its iterations until it crosses a threshold, which represents a decision having been made.

In such drift-diffusion models, for a given k and c, the values of I and the threshold together determine how quickly a decision will be reached, on average. If we further fix the threshold, a higher drift rate, I, represents a faster decision, on average. The drift rate alone can thus be viewed as a constant ‘urgency to respond’ (using the original Schurger term) that is inherent in the demand characteristics of the task, evidenced by the fact that no subject took more than 20 s to make a decision on any trial. The leak term, k, ensures that the model would not be too linear; that is, it prevents the drift rate from setting up a linear trajectory for the accumulator toward the threshold. Also, k has a negative sign and is multiplied by xᵢ, so the −kxᵢ term acts against the drift induced by I and gets stronger as xᵢ grows. Hence, due to the leak term, doubling the height of the threshold could make the accumulator rarely reach the threshold instead of reaching it in roughly twice the amount of time (up to the noise term).
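
Equation 1 can be integrated numerically as sketched below. This is a Python rendering of the Matlab simulation (names are ours); the noise term is scaled by √Δt, as is standard when discretizing such stochastic differential equations.

```python
import numpy as np

def simulate_ddm(I, k, c, threshold=0.3, dt=0.001, max_t=20.0, rng=None):
    """Integrate Equation 1, δx = (I − k·x)Δt + c·ξ·√Δt, until the
    accumulator crosses the threshold; returns (decision time, trajectory)."""
    if rng is None:
        rng = np.random.default_rng()
    n_max = int(max_t / dt)          # cf. the 20 s response limit per trial
    x = 0.0
    traj = np.empty(n_max)
    sqdt = np.sqrt(dt)
    for i in range(n_max):
        x += (I - k * x) * dt + c * rng.standard_normal() * sqdt
        traj[i] = x
        if x >= threshold:           # threshold crossing = decision made
            return (i + 1) * dt, traj[:i + 1]
    return None, traj                # no crossing within max_t
```

With c = 0 the dynamics are deterministic and the accumulator relaxes toward I/k, so the crossing time can be checked against the closed-form solution of the leaky integrator.
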

When comparing the model’s activity for the SMA and Region X, we needed to know how to set the drift rates for the DDM in the two regions for deliberate decisions. We made the assumption that the ratio between the drift rate in Region X and in the SMA during deliberate decisions would be the same as the ratio between the average empirical activity in the SMA and in the rest of the brain during arbitrary decisions. Our EEG data suggested that this ratio (calculated as activity in Cz divided by the mean activity in the rest of the electrodes) is 1.45. Hence, we set the drift rates in the SMA to be 1.45 times smaller than those of Region X for deliberate decisions (see Table 1 for all parameter values).

Our model differed from Schurger’s in two main ways. First, it accounted for both arbitrary and deliberate decisions and was thus built to fit our paradigm and empirical results. We devised a model that was composed of two distinct components (Figure 8A), each a race to threshold between 2 DDMs based on Equation 1 (see below), but with different parameter values for each DDM (Table 1). The first component accumulated activity that drove arbitrary decisions (i.e., stochastic fluctuations [Schurger et al., 2012]). Such model activation reflects neural activity that might be recorded over the SMA by the Cz electrode. We term this component of the model the Noise or SMA component. The second component of the model reflects brain activity that drives deliberate decisions, based on the values that subjects associated with the decision alternatives. We term this second component the Value or Region X component. Our model relied on its noise component to reflect arbitrary decisions and on its value component to reflect deliberate decisions.

A second difference between our model and Schurger and colleagues’ is that theirs modeled only the decision when to move (during arbitrary decisions), as those were the only decisions that their subjects faced. But our subjects decided both when and which hand to move. So, we had to extend the Schurger model in that respect as well. We did this using a race-to-threshold mechanism between the decision alternatives. In our empirical paradigm, the difference in rating of the two causes was either 1 (for hard decisions) or 4–6 (for easy decisions; see ‘Experimental Procedure’ in Materials and methods), so there was always an alternative that was ranked higher than the other. Choosing the higher- or lower-ranked alternative was termed a congruent or incongruent choice with respect to the initial ratings, respectively. Hence, we modeled each decision the subjects made as a race to threshold between the DDMs for the congruent and incongruent alternatives in the noise component (for arbitrary decisions) or value component (for deliberate ones).
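
The race-to-threshold mechanism can be sketched as two accumulators per Equation 1 running in parallel. This Python illustration (names are ours) returns both when the decision occurred and whether the congruent alternative won, from which a model consistency score can be estimated:

```python
import numpy as np

def race_trial(I_cong, I_incong, k, c, threshold=0.3, dt=0.001,
               max_t=20.0, rng=None):
    """Race between two leaky accumulators (congruent vs. incongruent
    alternative). Returns (RT in s, True if the congruent DDM won)."""
    if rng is None:
        rng = np.random.default_rng()
    drifts = np.array([I_cong, I_incong], float)
    x = np.zeros(2)
    sqdt = np.sqrt(dt)
    for i in range(int(max_t / dt)):
        x += (drifts - k * x) * dt + c * rng.standard_normal(2) * sqdt
        if (x >= threshold).any():            # first crossing ends the race
            return (i + 1) * dt, bool(x[0] >= threshold)
    return None, None                         # no decision within max_t

def model_consistency(n_trials=1000, **params):
    """Fraction of trials in which the congruent alternative wins."""
    wins = [race_trial(**params)[1] for _ in range(n_trials)]
    wins = [w for w in wins if w is not None]
    return float(np.mean(wins))
```
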

We found the reaction time (RT) distribution and consistency score for each subject (as detailed above). The model’s RT was defined as the overall time that it took from the onset of the simulation until the first threshold crossing in the race-to-threshold pair (between the congruent and incongruent DDMs; for Δt = 0.001 s). To determine the RT error, we ran a procedure similar to that of Schurger et al. (2012). We computed the empirical, cumulative RT distribution for each subject and fit a gamma function to it. We then averaged those gamma-fitted distributions across all subjects and designated that average as the empirical distribution. Then, we computed the RT for the model for each parameter set from 1000 model runs and fitted a gamma function to that too. Finally, we computed the ratio of the intersection of the two cumulative distributions and their union; the RT error was defined as one minus that ratio. The consistency error was computed as the absolute difference between the empirical and model consistencies. Finally, the overall error was defined as the mean of these two errors.
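
The error computation can be sketched as follows. Note that this Python illustration fits the gamma distributions by maximum likelihood to the RT samples directly—a simplification of the fitting procedure described above—and evaluates the intersection-over-union of the two CDFs on a uniform grid:

```python
import numpy as np
from scipy import stats

def rt_error(rts_a, rts_b, t_max=20.0, n_grid=2000):
    """1 − intersection/union of two gamma-fitted RT CDFs, evaluated on a
    uniform grid up to t_max (trials lasted at most 20 s)."""
    grid = np.linspace(0.0, t_max, n_grid)
    cdf_a = stats.gamma(*stats.gamma.fit(rts_a, floc=0)).cdf(grid)
    cdf_b = stats.gamma(*stats.gamma.fit(rts_b, floc=0)).cdf(grid)
    inter = np.minimum(cdf_a, cdf_b).sum()   # uniform grid: sums ∝ areas
    union = np.maximum(cdf_a, cdf_b).sum()
    return 1.0 - inter / union

def overall_error(rts_emp, rts_model, cons_emp, cons_model):
    """Mean of the RT-distribution error and the absolute consistency error."""
    return 0.5 * (rt_error(rts_emp, rts_model) + abs(cons_emp - cons_model))
```
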

To find the parameter values that minimize this overall error, we ran an exhaustive grid search over all the parameters. We fixed the threshold at 0.3 and searched for the values of the congruent and incongruent drift rates as well as the leak and noise-scaling factors that simultaneously best fit the empirical RT distributions and the consistency scores (with equal weights) for (easy, hard) × (deliberate, arbitrary) decisions (Table 1), in the manner described below. We thus had four parameters to fit: the congruent drift rate (Icongruent), the incongruent drift rate (Iincongruent), the leak (k), and the noise-scaling factor (c). We first explored the 4D space of these parameters to test how smooth it was and to find the range over which to run the exhaustive grid search. We found that the error was minimized for Icongruent between 0.05 and 0.4, and for Iincongruent somewhere between the value of Icongruent and one fifth of that value. Similarly, k was found to lie between 0.2 and 0.55, and c between 0.01 and 0.3. Based on these tests, and specifically on the smoothness of the search space, we divided each parameter range into five equal parts. We then tested the model at each of the 5 × 5 × 5 × 5 (= 625) entries of the grid, running the model 1000 times per entry and computing its error by comparing its RT distribution and consistency rate to the empirical ones. Once we found the minimum point in that 4D grid, we zoomed in on the range between the grid value to the left of the entry where the minimum error was found and the one to its right. (We chose the initial ranges such that the minimum error was never achieved at the smallest or largest point of any parameter.) We continued this process of finding the minimum and zooming in until each parameter range was smaller than 0.025 or the error fell below 0.025.
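
The iterative zoom-in grid search can be sketched generically as follows (function and variable names are ours; `error_fn` stands in for the 1000-run model evaluation described above):

```python
import itertools
import numpy as np

def zoom_grid_search(error_fn, ranges, n_points=5, tol=0.025):
    """Iteratively refined grid search: evaluate error_fn on an
    n_points^d grid, then zoom into the cell surrounding the best
    point until each parameter range (or the error) drops below tol."""
    ranges = [tuple(r) for r in ranges]
    while True:
        axes = [np.linspace(lo, hi, n_points) for lo, hi in ranges]
        best_err, best_idx = np.inf, None
        for idx in itertools.product(range(n_points), repeat=len(ranges)):
            err = error_fn([axes[d][i] for d, i in enumerate(idx)])
            if err < best_err:
                best_err, best_idx = err, idx
        best = [axes[d][i] for d, i in enumerate(best_idx)]
        # zoom: the new range spans the grid points adjacent to the minimum
        ranges = [(axes[d][max(i - 1, 0)], axes[d][min(i + 1, n_points - 1)])
                  for d, i in enumerate(best_idx)]
        if best_err < tol or all(hi - lo < tol for lo, hi in ranges):
            return best, best_err
```

Each iteration halves the search range per dimension, so the procedure converges quickly when the error surface is smooth, as reported above.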

Once we found the parameters that minimized the mismatch between the empirical and model RTs and consistencies for each decision type, we ran the model with those parameters another 1000 times. For each run of the model, we identified the first threshold crossing (by either the congruent or the incongruent DDM), which designated that DDM the ‘winning’ one. We then extracted the activity of both the winning and the losing DDM over the last 2 s (2000 steps) before the crossing, and we averaged the activity of the two DDMs (because an electrode over the SMA or Region X would be expected to pick up signals from both). If the first crossing occurred earlier than sample no. 2000 by n > 0 samples, we padded the beginning of the epoch with n null values (NaN, or ‘not-a-number’, in Matlab). Finally, we averaged across all 1000 model runs to calculate the model’s RP. Note that NaN values did not contribute to the average across simulated trials, so the simulated average RP became noisier at earlier time points in the epoch. Hence, our model was limited in the same way as the Schurger model in its inability to account for activity earlier than the beginning of the trial (see Results).
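
The NaN-padded epoch averaging can be illustrated with a short sketch (Python rather than Matlab; `traces` stands for the per-run activity of the two DDMs up to the first crossing, and the function name is ours):

```python
import numpy as np

def model_rp(traces, epoch_len=2000):
    """Average accumulator activity over the last `epoch_len` samples
    before each simulated threshold crossing. Runs shorter than the
    epoch are NaN-padded at the start, so early time points simply
    contribute fewer trials to the mean."""
    epochs = np.full((len(traces), epoch_len), np.nan)
    for i, trace in enumerate(traces):
        # mean of winning and losing DDMs, as an electrode over the SMA
        # would be expected to pick up signals from both
        activity = trace.mean(axis=1) if trace.ndim > 1 else trace
        n = min(len(activity), epoch_len)
        epochs[i, epoch_len - n:] = activity[-n:]
    return np.nanmean(epochs, axis=0)  # NaNs do not contribute to the mean
```

Because `nanmean` ignores the padding, the simulated RP is an average over fewer runs, and hence noisier, at the earliest time points of the epoch.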

Our model describes the mechanism we propose for subjects’ decisions. We therefore now examine how the values in Table 1 relate to the experimental design and the actual results. As discussed above, we view Icongruent and Iincongruent as reflecting the values of the decision alternatives that were rated higher and lower, respectively, in the first part of the experiment. We thus expect Icongruent and Iincongruent to be more different for easy deliberate decisions and more similar for hard ones. And we indeed find an almost 4-fold difference between Icongruent and Iincongruent for deliberate easy decisions versus only a 2-fold difference for deliberate hard ones. For arbitrary decisions, be they easy or hard, we do not expect the values of the decision alternatives to matter much; this is reflected in the similar values of Icongruent and Iincongruent there.

The leak values (k) were similar across all conditions. In contrast, the values of the noise-scaling factor (c) for deliberate decisions were only 35% to 45% of those for arbitrary decisions. As decision types were blocked, this appears to indicate that subjects adopted different decision parameters across blocks. One reason for the larger noise levels in arbitrary trials might be the difference in consistency scores. The noise-scaling factor provides the randomness in our model. Hence, the lower the noise factor, the more deterministic the outcome of the model’s simulation, and thus the higher the chance that the congruent DDM (i.e., the one with Icongruent, which is larger than Iincongruent) would win the race against the incongruent DDM. The consistency scores for deliberate easy, deliberate hard, arbitrary easy, and arbitrary hard decisions were 0.99, 0.83, 0.54, and 0.49, respectively. These are nicely anti-correlated with the respective noise levels of 0.08, 0.11, 0.22, and 0.23 (r = -0.99, p = 0.007).
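
The reported anti-correlation can be verified directly from the quoted values:

```python
import numpy as np

# consistency scores and fitted noise-scaling factors, in the order
# deliberate easy, deliberate hard, arbitrary easy, arbitrary hard
consistency = np.array([0.99, 0.83, 0.54, 0.49])
noise_c = np.array([0.08, 0.11, 0.22, 0.23])

r = np.corrcoef(consistency, noise_c)[0, 1]  # Pearson r, approximately -0.99
```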

It should be noted that, when minimizing the error in the deliberate-hard condition, we found two disjoint regions within the parameter space (Icongruent, Iincongruent, k, c) where the error reached relatively similar local minima. The parameter values corresponding to the smaller among the two errors are listed in Table 1 (deliberate hard row). The parameter values at the other local minimum are (Icongruent, Iincongruent, k, c) = (0.15, 0.07, 0.21, 0.09).

References

  17. Frith C, Blakemore S, Wolpert D (2000) Abnormalities in the awareness and control of action. Philosophical Transactions of the Royal Society B: Biological Sciences 355:1771–1788.
  26. Harris S (2012) Free Will. New York: Simon & Schuster, Inc.
  27. Hobbes T (1994) Leviathan: With Selected Variants From the Latin Edition of 1668. Hackett Publishing Company.
  29. Jeannerod M (2006) Motor Cognition: What Actions Tell the Self. Oxford University Press.
  30. Jeffreys H (1998) The Theory of Probability. Oxford: Oxford University Press.
  34. Kornhuber H, Deecke L (1990) Readiness for movement—the Bereitschaftspotential story. Current Contents: Life Sciences 33:14.
  44. Maoz U, Mudrik L, Rivlin R, Ross I, Mamelak A, Yaffe G (2015) On reporting the onset of the intention to move. In: Mele AR, editor. Surrounding Free Will: Philosophy, Psychology, Neuroscience. Oxford University Press. pp. 184–202. https://doi.org/10.1093/acprof:oso/9780199333950.001.0001
  79. Ullmann-Margalit E, Morgenbesser S (1977) Picking and choosing. Social Research 44:757–785.
  85. Wegner D (2002) The Illusion of Conscious Will. MIT Press.
  87. Wolf S (1990) Freedom Within Reason. Oxford University Press.

Decision letter

  1. Redmond G O'Connell
    Reviewing Editor; Trinity College Dublin, Ireland
  2. Joshua I Gold
    Senior Editor; University of Pennsylvania, United States
  3. Redmond G O'Connell
    Reviewer; Trinity College Dublin, Ireland
  4. Jiaxiang Zhang
    Reviewer; Cardiff University, United Kingdom
  5. Boris Burle
    Reviewer; Aix-Marseille University, France

In the interests of transparency, eLife includes the editorial decision letter, peer reviews, and accompanying author responses.

[Editorial note: This article has been through an editorial process in which the authors decide how to respond to the issues raised during peer review. The Reviewing Editor's assessment is that all the issues have been addressed.]

Thank you for submitting your article "Neural precursors of deliberate and arbitrary decisions in the study of voluntary action" for consideration by eLife. Your article has been reviewed by three peer reviewers, including Redmond G O'Connell as a guest Reviewing Editor and Reviewer #1, and the evaluation has been overseen by Joshua Gold as the Senior Editor. The following individuals involved in review of your submission have also agreed to reveal their identity: Jiaxiang Zhang (Reviewer #2) and Boris Burle (Reviewer #3).

The Reviewing Editor has highlighted the concerns that require revision and/or responses, and we have included the separate reviews below for your consideration. If you have any questions, please do not hesitate to contact us.

Summary:

This paper sets out to examine whether the readiness potential (RP) relates differently to decision reports when those decisions are deliberative as opposed to the arbitrary decision scenarios typically studied in the literature spawned by Libet et al. EEG data were acquired while participants performed a value-based decision making task involving apportioning donations to one of two causes. In the arbitrary condition the same amount would be apportioned to both causes irrespective of the participant's choice while in the deliberative condition participants determined which of the two causes would receive the money. The authors report a significant RP build-up prior to response in the arbitrary condition but argue that no significant RP activity is present in the deliberative condition. They conclude from this that the RP may act as a form of random number generator (or noise accumulator) specifically for the purposes of arbitrary decisions and, therefore, that the findings from Libet-type experiments may not generalize to the more deliberative decisions that pervade our daily lives.

Major concerns:

The three reviewers agree that this manuscript addresses an important topic and that the findings have potentially important implications for our understanding of the neural mechanisms governing willed action. However, each of the three reviewers has raised major concerns regarding aspects of the data analysis and interpretation which potentially call into question the central claims that the authors are making. The key points that we would recommend the authors address are:

1) The authors are making the strong claim that the readiness potential is absent in the deliberative decision condition, but the reviewers (particularly reviewers 1 and 3) have significant concerns regarding the degree to which the data truly support this claim. In particular, the initial Bayes Factor analysis indicates inconclusive evidence, the reviewers expressed concerns regarding re-baselining to a response-aligned interval in which the RP appears to be already active in the arbitrary condition, and reviewer 3 has also highlighted an important concern regarding the high-pass filter cutoff.

2) All reviewers agree that the model fitting procedures are inadequately described and, as currently presented, it is not clear what value the model is really adding here.

We hope that these comments will prove helpful to the authors.

Separate reviews (please respond to each point):

Reviewer #1:

I thought this was an interesting paper whose results, if correct, could have a transformative effect on how we think about the RP-voluntary action literature. The manuscript is nicely written and the experiment seems to have been conducted with sound methodology on the whole. I do however have very substantial concerns about several aspects of the data analysis and, by extension, the authors' interpretation of the data which I feel would need to be addressed prior to publication.

Major Comments:

1) I think in the first instance it is important that the authors establish specific hypotheses regarding their data. These should be provided at the end of the Introduction. At present the authors state that:

"Demonstrating differences in RP between arbitrary and deliberate decisions would first challenge the generalizability of the RP (from arbitrary to deliberate decisions) as an index for internal decision-making. Second, it would more generally suggest that different neural mechanisms might be at play between deliberate and arbitrary decisions. This, in turn, would question the generalizability of studies focused on arbitrary decisions to everyday, ecological, deliberate decisions—regardless of whether these studies relied on the RP or not."

I find these statements problematic from the outset. First, given how different the arbitrary and deliberate conditions are, it is wholly expected that some trivial differences would be observed in the RP. In the arbitrary condition the participant knows in advance of stimulus onset that they can randomly select a response and so can prepare to act in advance. In contrast the participant cannot prepare a specific action until after stimulus-onset in the deliberative condition. This is borne out in the stimulus-locked traces where clear preparatory RP activity is observed pre-stimulus in the arbitrary condition. So simply stating that a 'difference' would undermine the generalisability of RP findings is not correct. The authors need to be much more specific about what difference that might be. In fact, their analyses are very much geared toward showing that the RP is wholly absent during deliberative decisions. If so, this should be clearly stated from the beginning.

2) Given the emphasis that is placed on demonstrating an absent RP in the deliberative condition, and my aforementioned concerns that trivial differences in RP are to be expected when comparing across these conditions, it is important to consider this aspect of the analysis very carefully. The authors do well to conduct Bayes Factor analyses to complement null-hypothesis significance testing. In the first instance however their analysis highlights inconclusive evidence for the RP's absence on deliberative trials. The authors then re-baseline their waveforms to -1000 to -500 ms to exclude possible negativities that may not be RP related. The BF then reduces to 0.332 i.e. hovering just above the conventional cutoff of <0.33. I have a couple of concerns around this. First, the study is designed to ensure 80% probability of detecting an RP ≥2.6 µV. Thus it is potentially possible that the RP is indeed present but too small to detect with a relatively small sample size (it is worth noting here that visual inspection of the waveforms would tend to favour there being an RP in both conditions albeit with substantial differences in amplitude). I wonder if it is a safer bet for the authors to focus on the fact that the RP is substantially diminished rather than absent in the deliberative condition. If so, what new insights might this offer into the nature of our decisions? For example, it would be very interesting if it turned out that the RP is particularly invoked for arbitrary actions as a sort of random-number generator (but see below comments).

3) Following on from the above, given the quite massive differences in RT across conditions, I am concerned that the response-locked baseline is not appropriate. It is quite plausible that the RP will build more slowly in the deliberative condition reflecting the much slower decision process and applying this baseline will simply subtract away those differences and shift the peak amplitude of the RP closer to 0. The authors do conduct some analyses to test for a relationship between RT and RP amplitude but these are based on a cross-subject median split and, consequently, an N of just 9 in each group. Moreover, even after splitting the groups this way there is still a circa 700 ms difference in RT between conditions. The authors observe p-values >0.05 in these follow-ups but no Bayes Factor analyses are provided. Similarly, the regression of RT difference against RP amplitude is likely to be underpowered and no BFs are reported. I suspect that if the authors were to conduct a within-subject median-split based on RT, an RT effect would be much more likely to emerge in a within-subjects statistical analysis. In my experience, across-subject variations in signal amplitude are a much bigger problem than cross-trial within-subject variations. In any case, wouldn't a non-significant relationship between the RP and RT be confusing given the previous literature? At minimum, shouldn't we be observing a relationship within the arbitrary condition?

4) The leaky DDM accounts for the shape of the ERP at CZ. I cannot find any description of precisely how the plot in Figure 8C was generated. Does it represent the cumulative noise component?

Minor Comments:

To what degree might the deliberative RP amplitude be impacted by overlapping decision signals. Recent work by my own group has highlighted a P3-like centro-parietal potential that traces perceptual decision formation and peaks at response (e.g. O'Connell et al., 2012, Nat Neurosci; Kelly and O'Connell, 2013, J Neurosci; Twomey et al., 2015, Eur J Neurosci). In Kelly and O'Connell, 2013, we actually observed an RP-like signal evolving around the same time as the CPP and we found that the two signals were impacting one another, something that we were able to resolve using current-source density (CSD) analysis. I wonder what the authors' thoughts are on A) the fact that we seemed to observe an RP during deliberative decisions in this case and B) whether an overlapping centro-parietal positivity could partly account for the small size of the RP in their analyses – something that could be verified through application of CSD transforms.

The description of the Schurger study in subsection “Drift Diffusion Model (DDM)” is a little unclear, probably just down to the particular wording used. The meaning of 'non-linear threshold crossing' is not clear to me. 'The crossing of that threshold reflects the onset of the decision in the model, typically leading to action.' 'Onset of the decision' is an ambiguous phrase to use here as it is often used with reference to the time at which evidence accumulation commences. According to these models the threshold instead reflects the completion/termination of the decision. Also the authors state 'Schurger and colleagues claimed, time-locking to response onset ensures that these spontaneous fluctuations appear, when averaged over many trials, as a gradual increase in neural activity.' It is not clear to me whether the authors are aware that according to Schurger this arises because the RP reflects an accumulation of those spontaneous fluctuations over time as opposed to a moment-by-moment reflection of instantaneous fluctuations.

LRP analyses. Again these hinge on embracing the null hypothesis. Given the low statistical power and the signs of a larger peak amplitude (and possibly slower build-up) for deliberative decisions, at minimum Bayes Factor analyses should be provided.

Reviewer #2:

This is an interesting study. The authors challenged the role of the RP in internally-generated decisions, an EEG signature often linked to volition or action awareness since Libet's work. They contrast deliberate decisions (choices that did matter) with arbitrary decisions (choices that did not matter) in a "two-alternative NPO choice" task. Both types of decisions had a similar LRP component, but only arbitrary decisions showed a reliable RP (at Cz). The authors further simulated an altered version of the DDM, showing the absence of the RP when assuming a value component underlying the deliberate decision.

The experimental design and analysis are rigorous. The authors should be applauded for their efforts in assessing several alternative explanations for the absence of the RP in the deliberate decision.

I have a few comments on the current version of the manuscript.

1) The modelling procedure is unclear. First, it is unclear why drift rate was the only parameter allowed to change between conditions. Since different task conditions were blocked (see the other comments below), one may argue that it is possible to have the threshold (or other parameters) vary between blocks, as an adaptive decision strategy. Would model comparison be feasible to identify the optimal parameters?

Second, the DDM (like many other accumulator models) assumes one-off, rapid decisions without deliberate thinking. In other words, the very first boundary crossing would render a decision, as in the current model. Although the subjects were instructed to respond asap, the current design cannot rule out the possibility of rethinking/change-of-mind in a single trial, which violates the model/simulation assumption. The prolonged RTs in deliberate decisions support such a possibility: that subjects might not rush to a decision, even when a decision threshold was reached.

2) The arbitrary decision explicitly urged subjects "not to let their preferred NPO dictate their response". This is a strong requirement (asking NOT to follow their preference) compared with previous studies using the typical free-choice paradigm. I wonder if the authors could comment on whether this may inflate the difference between the two types of decisions in the current study. Second, is there any regularity in the arbitrary decisions, such as alternating responses in consecutive trials?

3) Different decisions were blocked in the current study. Could the difference in RP be due to a contextual effect? For example, was the early visual ERP comparable between conditions?

4) Subsection “Differences in reaction times (RT) between conditions, including stimulus-locked potentials and baselines, do not drive the effect”. For the median split analysis to work, my understanding is that no subject had above-median deliberate RT as well as below-median arbitrary RT (i.e., the two groups did not overlap). Could the authors confirm that this was the case? Because inferences on the null hypothesis are important to rule out the effect of RT and other confounding variables on RP, it would be useful to report Bayes factor as well.

Minor Comments:

Figures 8A and 8B could be superimposed over each other, allowing easy assessment of the goodness of fit of RT distributions.

Reviewer #3:

The present manuscript presents a single experiment aiming at evaluating whether the typically observed early BP onset in spontaneous (arbitrary, purposeless, unreasoned and without consequences…) movements generalizes to deliberate decision making. The authors hence contrasted two decision-making situations: in the first condition, participants had to decide, on every trial, to which NPO they wanted to give money; their choice was effective. In the second condition, while facing the same situation, they were told that their response was consequenceless, since the same amount of money would be given to each NPO. While a standard BP was observed in the second condition, the authors argue that there is no such BP in the first condition. They hence argue that the results classically observed for arbitrary movements do not generalize to deliberate decisions. The early onset of the BP has long been considered as reflecting an activation of a response before the actual decision to move was taken, casting doubt on the very notion of free will. The absence of such a pre-conscious marker in deliberate decisions is taken by the authors as an argument that, even if there is no real free will in arbitrary movements, this argument cannot be used for deliberate decisions.

General comments:

Although I may generally agree with the authors' conclusions, I'm afraid they are not strongly supported by the facts as they stand. The limitations are methodological and conceptual.

While the data show a clear difference in the BP between the two conditions, the core of the authors' reasoning is based on the presence of a BP in the arbitrary condition and the total absence of a BP in the deliberate one. Indeed, if there were a BP, even of smaller amplitude, in this condition, the argument would immediately fall apart. There are, however, several points that may challenge this view.

1) Data were high-pass filtered at 0.1 Hz (which corresponds to a time-constant of 1.6 s). This value is way too high for a proper estimate of the BP. As a matter of comparison, the very first measures of the BP were even performed under DC conditions, that is without any filtering… At the very minimum, a filter of .01 Hz, or ideally even lower, like .001 Hz, should be used. Indeed, the BP is a very slow component, and such a high filter value has very likely largely decreased the amplitude of the BP. One may argue that filtering has impacted the two conditions in the same way, and hence that any potential distortion cannot explain the results. This is certainly true concerning the difference in amplitude (see however below). But, as indicated above, the rationale put forward by the authors only holds if there is no BP in the deliberate condition, not if the BP is simply reduced. If, with a more adapted filter, there would have been a BP, the whole argument is invalidated. Second, the response times are much longer in the deliberate than in the arbitrary case. So the time constant of the slow potential might be much longer in this case, and hence much more affected by the inappropriate filter (the proposed control only partially addresses this issue since i) the comparison is between participants, and ii) at least for the easy deliberate condition, the results are ambiguous).

2) Figure 6 presents the stimulus-locked data, supposed to invalidate a stimulus-locked contamination. While I agree that the response-locked BP is not purely a stimulus-induced effect, several features of the stimulus-locked averages deserve comment. First, as can be seen in Figure 6A, a negative shift (CNV-like) is already present before stimulus presentation in the arbitrary condition, and it seems even modulated by (precued) task difficulty. So, even before participants enter any decision-making process related to the relevant choice, one can see differences in the slow negative potentials between the conditions. At minimum, this indicates that the difference observed response-locked might not be specific to the decision period. Furthermore, following the large visual evoked potentials after the stimulus, one can see around 500 ms a first positive component, followed by a negative-going one (starting around 650 ms). This negative-going component peaks close to the average RT in the arbitrary condition, and hence likely contributes to the BP. The same activity is present in the deliberate condition (and is actually very similar), but the response being given much later, this early negative-going activity does not contribute to the BP. By itself, this activity, which is not strictly speaking a BP, does create a difference between the two conditions. More generally, the general shape of the late evoked potentials (after 600 ms) is remarkably similar across conditions, which dramatically contrasts with the large difference observed response-locked. This may suggest that the difference observed is, at least in part, simply due to the different averaging events in otherwise very similar signals.

3) The authors focused on Cz to extract the BP. However, there is a large literature indicating that the BP is actually made of several sources, some medial in the (pre)SMA, some more lateral in the (primary) motor cortices. Note that, for the authors' rationale to be valid, none of the sources should present an (early) BP. The topographies presented in Figure 3B indicate a clear negative activity above Cz and its neighbors in the arbitrary condition, and no activity in the same region in the deliberate condition. But the topography suggests a clear negative activity located more laterally over the left hemisphere. Does this activity correspond to a BP? Since only Cz is shown, one cannot evaluate this possibility.

4) In real spontaneous responses, the BP starts much earlier than the LRP. But in the present data set, as can be inferred from Figure 3A, the conditions diverge around -800 ms (roughly estimated). This is basically the same latency as LRP onset. Note that the latency of the LRP (even with all the cautions that may come with this measure…) seems exactly the same for all 4 conditions. So, at the time of BP divergence, the LRP onset indicates no difference in response selection / decision making. Why is the (Cz) BP more important for the conclusions the authors want to defend? The LRP seems to say something very different…

5) A further issue concerns the modeling part. First, it is very unclear how the fit was performed. In subsection “Drift Diffusion Model (DDM)”, it is indicated that "We fit our DDM to our average empirical reaction-times, which were 2.13, 2.52, 0.98 and 1.00s for the different conditions". This sentence suggests that only the grand means were used, not the whole distribution of RTs. Besides the fact that this deviates completely from the overall logic of fitting DDMs, which aims at capturing RT distribution shape and errors at the same time, a fit performed on averages only is largely under-constrained. As a matter of fact, although there is no information about the quality of fit, comparison of Figures 8A and B indicates that the empirical and predicted distributions largely differ, especially in terms of spread (i.e. variance). The obtained parameters are also weird and likely invalid: Table 1 indicates that, in the deliberate condition, the estimated drift is actually higher for the hard condition than for the easy one. This does not make any sense, and is in disagreement with the empirical RTs that are, as expected, longer for the hard than for the easy condition. Furthermore, one reads "[…] The model was further fit to the empirical consistency ratio […]" (emphases are mine…). "Further" suggests that the fits to the mean RT and the consistency ratio were not performed simultaneously? Is that the case? Please clarify.

Besides these fitting problems, I have conceptual issues. Threshold crossing in the arbitrary condition is supposed to be triggered by noise only. In standard accumulation-to-a-bound models, noise is classically assumed to be gaussian noise with mean 0 and a standard deviation s. For "noise" to hit the threshold one needs to assume a very high s, or to assume that the mean is not 0… But in that case, this is not noise; this is a drift. Actually, Table 1 indicates that the "drift" in the arbitrary condition is much higher than in the deliberate condition. But what is the meaning of this "drift" if there is no decision, and the response is triggered by random fluctuations of noise? In contrast, what does it mean that a deliberate decision was reached with an information accumulation equal to 0? How can a threshold be reached if there is no information accumulation? If it is simply noise, why is it different from the arbitrary condition? Furthermore, if random noise can hit the bound early in the accumulation process and the noise is 0-centered, the probability of hitting the threshold decreases with time. So late responses are unlikely to be triggered by noise. This whole fitting part is thus largely based on inconsistent assumptions.

My general feeling is that the authors provide further arguments against the idea that BP can be used as a marker of volition, but this is not new. In contrast, they provide few (if any) new arguments in the discussion of whether there is free-will or not.

Minor comments:

– Bayesian analysis. The core idea of Bayesian ANOVA is that the values are indicative, but no hard threshold should be used. Hence, considering that a BF of 0.332 (that is, a probability ratio > 3) is an argument for no effect is wrong. Actually, in this uncertainty zone, one cannot conclude anything, and this value is NOT an argument for H0.

– EEG results. The authors report a main effect of decision type. But is there an interaction with difficulty? If not, the following t-tests are not completely valid (although this does not change the conclusions…)

– Figure 5B: the data plotted in this figure puzzles me. The y-axis plots the difference (deliberate – arbitrary). 12 points out of 17 (?) are actually negative, indicating that deliberate was more negative than arbitrary. How can this be?

– Subsection “EEG Results: Lateralized Readiness Potential (LRP)”: the argument that the LRP amplitude difference is due to the reference used is incorrect, as the LRP measure is reference-free. Indeed, the formula is (C3 − C4)r + (C4 − C3)l. But actually all electrode values should be written as (C3 − ref) and (C4 − ref). It then becomes obvious that the reference cancels out and does not enter the computation.
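This cancellation is easy to verify numerically (toy data; the function is a single-value-per-trial caricature of the standard double-subtraction):

```python
import numpy as np

rng = np.random.default_rng(1)

def lrp(c3, c4, left_hand):
    """LRP double-subtraction: mean(C3 - C4) over right-hand responses plus
    mean(C4 - C3) over left-hand responses (one value per trial here)."""
    right_hand = ~left_hand
    return (c3[right_hand] - c4[right_hand]).mean() + \
           (c4[left_hand] - c3[left_hand]).mean()

c3 = rng.normal(size=100)        # toy 'voltages' at C3
c4 = rng.normal(size=100)        # toy 'voltages' at C4
left = rng.random(100) < 0.5     # response hand per trial
ref = rng.normal(size=100)       # an arbitrary reference signal

# re-referencing subtracts the same signal from every channel,
# so it cancels inside each C3 - C4 difference
assert np.isclose(lrp(c3, c4, left), lrp(c3 - ref, c4 - ref, left))
```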

– Modeling. The authors sometimes refer to a DDM, sometimes to an LCA, sometimes to a race model. These are very different architectures (although they can be related, see Bogacz et al., 2006). Please clarify which is really used.

– Experimental procedure. It is never indicated how the responses were given! From the description (response keys) and the EEG analysis (computation of the LRP), one can infer that the responses were given with the left and right hands, right? But with which fingers?

– Subsection “Experimental Procedure” and others: besides providing the t or F values, please provide the behavioral data. For example, please provide error rates (even if they do not statistically differ). Furthermore, treating a difference with a p-value equal to .09 as not significant is a bit strong.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Neural precursors of deliberate and arbitrary decisions in the study of voluntary action" for further consideration at eLife. Your revised article has been favorably evaluated by Joshua Gold (Senior Editor) and three reviewers, one of whom served as the guest Reviewing Editor.

All three reviewers agree that you have made substantial efforts to revise the manuscript in light of the initial round of comments and this clearly took time and effort on your part. However, I'm afraid we are all in agreement that some further revisions would be strongly recommended prior to publication. The remaining issues boil down to two substantive points:

First, we are still not convinced that the present results call for any alteration in our conception of free will. The key contribution of the paper seems to be to show that the RP may be specialised for arbitrary action selection and this seems worth reporting. However, beyond suggesting that the RP might not be a universal marker of the 'urge to act', the authors have not yet made a sufficiently clear case that their findings call for any change in how we think about the Libet findings or free will in general. Our recommendation at this point would be that the authors reframe the paper to focus more specifically on the RP and how the current findings bear on functional accounts of this signal.

Second, the reviewers also have significant outstanding concerns regarding the modelling. At minimum we would suggest clearly flagging the limitations of the adopted approach in the Discussion.

Reviewer #1:

I think the authors have, by and large, done a good job of addressing the methodological concerns raised by the reviewers. Importantly the authors assert that they are not claiming that the RP is actually absent but rather significantly diminished during deliberative decisions. I have a couple of outstanding concerns.

I am still confused about the overarching premise or rationale for this study. It seems to me that the findings of this study boil down to showing that the RP is smaller for deliberative vs arbitrary decisions but I am not yet convinced that this has any bearing on our understanding of free-will or even on the significance of Libet's original reports. Libet showed that a neural signature of action preparation preceded conscious awareness of the decision to arbitrarily act. Perhaps that same signal (RP) is absent during deliberative decisions (and this is worth reporting) but that tells us nothing regarding the role or timing of conscious processes in this context. Of course it goes without saying that arbitrary and deliberative decisions involve distinct cognitive elements and, consequently, will activate some distinct brain areas. We know that several other signals that reflect action preparation (e.g. LRP, β-band desynchronisation) and evidence accumulation (e.g. P300, LIP spiking activity) precede explicit decision reports by substantial amounts of time.

The authors seem to suggest that Schurger's model presents a challenge to Libet's interpretation: "A further reason to expect such differences stems from a recent computational model, which challenged the claim that the RP represents a genuine marker of unconscious decisions. Rather, the model suggested that the RP might reflect the artificial accumulation, up to a threshold, of stochastic fluctuations in neural activity"

I am still struggling to understand what difference it makes, in terms of our understanding of free will, whether the RP reflects 'action preparation' or 'noisy evidence accumulation' – wouldn't both processes reflect the emergence of an 'urge to act'? Moreover, the authors seem to imply that Schurger's process must be conscious, but I see no reason to make such an assumption. I think that particular sentence really captures my discomfort with the current framing of the findings.

Another way of putting it is that I am not clear on how exactly people were making generalisations to deliberative decisions based on the RP specifically. I get the impression the authors are trying to make the case that the RP has been thought of as the sole signature of pre-conscious decision making/action preparation in the brain, but is that really a widely held view, again given the literature on decision making? I think that perhaps the authors should reframe the paper to focus more narrowly on the apparent domain specificity of the RP. On the same point, I think the title of the paper may be too broad given the almost exclusive focus on the RP. I am very much open to being convinced/corrected on all of the above points, but I feel that currently the paper is rather confused and confusing as regards the relevance of the findings to free will.

My only other comment is that I do not agree with the authors' insistence on only allowing one DDM parameter to vary across conditions on the grounds of parsimony. Although this is indeed a common approach in behavioural modelling, there are formal model comparison procedures available that would objectively identify the model that provides the best balance between parsimony and goodness of fit. Personally, I would probably be ok with the authors acknowledging this in their Discussion, as I do not view the modelling as a critical part of the story.

Reviewer #2:

In this revision, the authors addressed some of the concerns in my last review. However, although the MS now has more detailed description of the model and simulation results (which was missing in the previous version), there are some additional issues.

1) The model used 11 parameters — 8 drift rates, a scaling parameter (1.45), threshold and decay — to fit 8 data points (4 mean RTs and 4 choice probabilities). The result is a largely unconstrained model that does not describe behaviour accurately. In the response letter, the authors showed model simulations overlaid with empirical data, and there is a large discrepancy in the fit of the arbitrary condition. If the authors are determined to present the current modelling results as a proof-of-concept that the model can produce qualitative RP patterns, the limitation (that it does not produce a satisfactory quantitative fit to RT distributions, as in other applications of the DDM) needs to be acknowledged in the Discussion.

2) Subsection “Model and Simulations”, "Using a parameter sweep…". Please provide details on the fitting procedure. Was any optimization algorithm used here? If so, what was the cost function? Was the model fitting performed on averaged data or on individual data?

3) In the response letter to my point 1, "we think that the longer RT in deliberate trials stemmed from longer deliberation time, reflected in a higher threshold." This is confusing as the same threshold (0.15) across conditions was used in the paper?

4) Figure 8A. There are two independent accumulators in the "noise component". From my understanding of the model description, in arbitrary decisions, SMA activity was reflected in the traces of the winning accumulator only. If that is the case, why is the activity of the losing accumulator in the noise component not taken into account, as scalp EEG activity would not be sensitive or selective to one accumulator vs. the other? The same issue holds for the deliberate decisions: if both accumulators in the noise component do not dictate the decision (as the threshold was set to infinity), which accumulator (or both?) was representative of SMA activity? Please clarify.

Reviewer #3:

In the first round of review, several questions were raised. The authors did a very good job in answering most of them. It is now clear that a RP is observed in the arbitrary condition, but not in the deliberate one. There are still, however, two points that remain unclear to me.

1) What are the theoretical consequences of these results?

As said above, the authors convincingly show that the RP is absent in the deliberate condition. Although this is not entirely new (some previous studies have reported an absence of the BP before voluntary movement, depending on the context), this report certainly adds to our understanding of the RP, its origins and its functional interpretation. However, as can be read in the Abstract, the goal of the authors is much more ambitious than that. But I am not sure about the real theoretical impact of the results. Let me try to explain my trouble.

Libet's original report was that the RP starts before the conscious decision to move. It was thus argued that our intention to move is rather a consequence, not a cause, of the preparatory brain activity. This was taken as an argument that "free-will" is an illusion.

Here, the authors report that there is no RP in the deliberate condition. Hence, the nature of the decision differs between the two conditions. So, what exactly do the authors conclude from that?

– Libet's argument vanishes for the deliberate condition, hence there is no evidence against free will in this context. But do they think it still holds for arbitrary decisions?

– Since the RP might simply reflect accumulated random noise, it is not an indication of a voluntary movement decision, and hence Libet's argument is wrong even for arbitrary movements?

– If deliberate decisions are made in another region X, it might still be that activity in region X starts before conscious detection, but this remains to be explicitly studied.

– something else?

I must confess that I cannot really identify the conclusion the authors want to defend, and they should try to be more explicit about what these results do and do not imply for this free-will debate.

2) Modeling.

Although many points have been clarified in this new version, some still remain a bit unclear.

To account for the choice situation, the authors modified Schurger et al.'s original model (which contained a single accumulator) and implemented two accumulators racing for the response. First, one may wonder why they did not choose the standard competitive version of the leaky accumulator (Usher and McClelland, 2001). Second, it is not completely clear how the averaged data in Figure 9C were actually computed. I guess that, for each "trial", the winning accumulator was chosen (left or right) and all the traces of the winning accumulators were averaged. However, if two accumulators are racing within "SMA", the real simulated activity of SMA should be the sum of the two accumulators, not only the winning one. I am not sure how this would modify the results, but for coherence, this is the way "SMA" activity should be evaluated in the model. Third, I still do not understand how the fit was performed. It is said "[…] we fit our DDMs to our average empirical reaction-times […]" (emphases are mine). If the fit was indeed performed only on the averages, this is non-standard and highly under-constrained. Such models are normally fitted to the whole response-time distributions. Furthermore, there is no quantitative assessment of the fit quality. Comparison of Figure 9 panels A and B suggests that the fit was not very good, especially in the arbitrary condition. Fourth, Figure 9 only shows the activity of "SMA". However, there is another actor, which is never shown: region "X"… In the deliberate condition, the decision is made based on the activity of this "region", but its dynamics are likely very different from those of "SMA". It would be of interest to see the accumulated activity of this region "X" in both the deliberate and arbitrary conditions. A last question concerns what "SMA" is doing. In the arbitrary condition, it is accumulating random, spontaneous noise. But why is it not doing the same in the deliberate condition (in addition to the accumulation in region "X")? Do the authors assume a form of inhibition of region "X" on "SMA" that prevents it from accumulating? This part is a bit too "magic", and an explicit, mechanistic explanation would be useful, instead of just claiming that accumulation is done differently as a function of the context/choice (which is vague). Somewhat related to this last point, there seems to be a bit of (simulated) accumulated activity in the deliberate condition in "SMA". Where does it come from?
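To make the winner-vs-sum point concrete, here is a minimal, hypothetical two-accumulator race (all parameter values are invented for the demonstration). Averaging only the winning traces discards the loser's typically positive, sub-threshold activity:

```python
import numpy as np

rng = np.random.default_rng(2)

def race_trial(I=(0.12, 0.08), k=0.5, c=0.1, thresh=0.15, dt=0.001, t_max=5.0):
    """Two leaky stochastic accumulators racing to a shared bound.
    Returns (traces, winner): traces is a (2, n_steps) array up to the
    crossing, winner the index of the first accumulator to cross (or None)."""
    n = int(t_max / dt)
    x = np.zeros((2, n))
    for t in range(1, n):
        drift = (np.asarray(I) - k * x[:, t - 1]) * dt
        x[:, t] = x[:, t - 1] + drift + c * np.sqrt(dt) * rng.standard_normal(2)
        if (x[:, t] >= thresh).any():
            return x[:, : t + 1], int(np.argmax(x[:, t]))
    return x, None

winner_only, summed = [], []
for _ in range(100):
    traces, winner = race_trial()
    if winner is not None:
        winner_only.append(traces[winner, -1])  # winner's value at crossing
        summed.append(traces.sum(axis=0)[-1])   # both accumulators together,
                                                # as a scalp electrode would
                                                # plausibly see them
```

In this sketch np.mean(summed) exceeds np.mean(winner_only) whenever the losing accumulator sits, on average, above zero at crossing time, so the simulated "SMA" trace depends directly on the winner-only vs sum choice questioned above.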

Besides, I have some more specific points:

Introduction section: "[…] Thus, one could speculate that different findings might be obtained when inspecting the RP in arbitrary compared to deliberate decisions. […]" is still very unspecific.

Introduction section: "[…] Demonstrating no, or considerably diminished, RP in deliberate decisions would challenge the interpretation of the RP as a general index of internal decision-making. […]" Ok, but the fact that it is not a "general index" does not, de facto, solve Libet's problem: even if reduced, if the RP starts before the conscious decision, the argument is still valid.

[…] More critically, it would question the generalizability of studies focused on arbitrary decisions to everyday, ecological, deliberate decisions […] This is indeed critical for the functional interpretation of the RP, but this sounds partly orthogonal to the free-will debate.

Subsection “EEG Results: Readiness Potential (RP)” paragraph two: why are the Student tests corrected for multiple comparisons, since only 4 are performed? Does it mean that the authors performed t-tests for all time-points? In that case, a multiple-comparison correction is, indeed, necessary. But only one t-test is reported! Please clarify.

In the same section: BF = .332 is not serious evidence for no effect. The authors should not take 3 (or .33, depending on how we compute it) as a threshold beyond which an effect would become "significant"…

Subsection “EEG Results: Readiness Potential (RP)” paragraph four: while a BF of .09 is indeed an argument for no effect, a BF of .31 is not really.

Subsection “Differences in reaction times (RT) between conditions, including stimulus-locked potentials and baselines, do not drive the effect”: the authors discuss at length the potential impact (or absence of impact) of the chosen baseline. Besides all the rather indirect arguments based on different baselines (none of which is immune to criticism), one analysis suffices to invalidate this argument: the slopes of the linear regressions are, by construction, independent of the baseline (only the intercept is affected). So the fact that the slopes are more negative for the arbitrary than for the deliberate condition is a strong and indisputable fact, much stronger than all the baseline changes.

Figure 6: I'm personally convinced that the RP observed in the arbitrary condition is not (only) a contamination by stimulus-locked activities. However, the arguments based on Figure 6 are pretty weak. Indeed, in the time window from about 600 ms to 1200 ms, a negative ramp is observed for all 4 conditions. The response is given in this time window for the arbitrary condition, but not for the deliberate one. So, this stimulus-locked negative ramp likely contributes to the RP.

Minor Comments:

Subsection “Experimental Procedure”, fourth paragraph: "right and left index finger" -> "left and right index finger" to be more consistent with the rest of the text.

Subsection “Experimental Procedure”, final paragraph: "[…] We wanted to make sure subjects were carefully reading and remembering the causes also during the arbitrary trials to better equate memory load, attention, and other cognitive aspects between deliberate and arbitrary decisions […]" Although adding a task is a good idea, it may sound a bit naive to say that the tasks were equated. For example, in the arbitrary decision, there is a short-term memory component that is not present in the deliberate one.

Subsection “ERP analysis”: "offline to the average of all channels,": including mastoids and nose? I guess not… At least, this should not be the case! Please clarify.

Subsection “ERP analysis”: " which subjects pressed the wrong button " What is a wrong button? Inconsistent response? and in arbitrary condition?

Subsection “ERP analysis”: "Channels that consistently had artifacts were replaced using interpolation (4.2 channels per subject, on average ": Although this is the range acceptable by some standards, I personally find this value high. Furthermore, could we have the range of channels interpolated?

Subsection “Statistical Analysis”: The authors took one point out of every 10 to re-sample the signal. However, re-sampling requires appropriate anti-aliasing filtering to avoid signal distortion. Data were acquired at 512 Hz; the Biosemi anti-aliasing filter, if I'm not mistaken, should be around 100 Hz. Since no other low-pass filtering was applied, the data contain signal up to 100 Hz. Hence, (re)sampling at ~50 Hz a signal whose maximal frequency is around 100 Hz is extremely problematic. At minimum, a 25 Hz low-pass filter should have been applied… It is very hard to anticipate the impact of such aliasing (especially since the activity of interest is low frequency), but this should be corrected to avoid having incorrect practices published.
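The problem is easy to demonstrate with a toy signal (pure-numpy sketch; a real pipeline would use a proper decimation routine rather than this brick-wall FFT filter): a 90 Hz component, which a ~100 Hz acquisition filter lets through, survives naive 10-fold downsampling as a spurious low-frequency oscillation, but vanishes if the signal is low-passed below the new Nyquist frequency first.

```python
import numpy as np

fs = 512.0                           # acquisition rate (Hz)
t = np.arange(0, 2.0, 1 / fs)
sig = np.sin(2 * np.pi * 90 * t)     # 90 Hz component, below the ~100 Hz
                                     # hardware anti-alias cutoff

def fft_lowpass(x, fs, cutoff):
    """Crude brick-wall low-pass: zero all FFT bins above `cutoff` (demo only)."""
    X = np.fft.rfft(x)
    X[np.fft.rfftfreq(len(x), 1 / fs) > cutoff] = 0.0
    return np.fft.irfft(X, n=len(x))

naive = sig[::10]                          # keep 1 sample in 10: ~51 Hz rate,
                                           # so 90 Hz folds down to ~12.4 Hz
proper = fft_lowpass(sig, fs, 25.0)[::10]  # anti-alias below new Nyquist first

print(np.std(naive), np.std(proper))
```

The naive downsample retains nearly the full amplitude of the 90 Hz sine, now masquerading as a slow oscillation in exactly the frequency range of the RP; the pre-filtered version is essentially zero.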

Subsection “Model and Simulations”: "used Δt = 0.001, similar to our EEG sampling rate": if sampling rate is 512Hz, dt should be 0.002

[Editors' note: another round of revisions were suggested prior to acceptance.]

Thank you for resubmitting your work entitled "Neural precursors of deliberate and arbitrary decisions in the study of voluntary action" for further consideration at eLife. Your revised article has been favorably evaluated by Joshua Gold (Senior Editor) and two reviewers, one of whom served as a guest Reviewing Editor.

The manuscript has been substantially improved. As part of this peer review trial, as reviewing editor I am required to indicate whether all of the reviewer comments have been fully addressed or if minor or major issues remain. There are some minor comments arising from this latest round of reviews that I thought I would give you the opportunity to address prior to finalising the 'Accept' decision so that I can then potentially indicate that all reviewer concerns were addressed. I have outlined these below. If you prefer to expedite the publication and not address these comments you can let me know.

1) Table 1. As expected, the difference in drift rates between congruent and incongruent options was larger in the deliberate than in the arbitrary conditions. Could the authors comment on the large difference in the noise scaling factor c, which was 10-fold between the two types of decisions? The second result I found difficult to conceptualize was the decay rate k, which was double in the easy-deliberate compared to the hard-deliberate condition. Given that task difficulty was randomized across trials, doesn't this imply that the model (and the participants) adjusted the decay rate according to task difficulty prior to trial onset?

2) Figure 9A. It is more meaningful to plot the empirical and simulated RT distributions, rather than their fitted γ functions.

3) In several instances the authors use the term 'decision onset' when referring (I think) to the completion of the decision. This is potentially confusing because for many readers 'decision onset' may refer to the beginning of deliberation/evidence accumulation which means something entirely different. So I would suggest the authors check their terminology and use 'decision completion' or 'commitment' in such instances.

Minor comments:

1) Subsection “Behavioral Results”. DDN

2) Figure 9A. Why was the y-axis labelled as voltage, for RT distributions?

3) Subsection “Model and Simulations” third paragraph and Table 1. I am confused about the scaling parameter 1.45. Does Table 1 show the drift rates only in Region X, and are the drift rates in SMA 1.45 times less than those values? The text and table indicated that the scaling applied only to the deliberate condition, if so, what were the drift rates in SMA in the arbitrary condition? Or do the drift rates in arbitrary decisions in Table 1 refer to the values in SMA?

https://doi.org/10.7554/eLife.39787.017

Author response

Major concerns:

The three reviewers agree that this manuscript addresses an important topic and that the findings have potentially important implications for our understanding of the neural mechanisms governing willed action. However, each of the three reviewers has raised major concerns regarding aspects of the data analysis and interpretation which potentially call into question the central claims that the authors are making. The key points that we would recommend the authors address are:

1) The authors are making the strong claim that the readiness potential is absent in the deliberative decision condition, but the reviewers (particularly reviewers 1 and 3) have significant concerns regarding the degree to which the data truly support this claim. In particular, the initial Bayes Factor analysis indicates inconclusive evidence, the reviewers expressed concerns regarding re-baselining to a response-aligned interval in which the RP appears to be already active in the arbitrary condition, and reviewer 3 has also highlighted an important concern regarding the high-pass filter cutoff.

We thank the reviewers for pointing out these potential issues. For the readiness potential (RP), we now ran an additional analysis using the gold-standard Maris and Oostenveld cluster-based nonparametric method as well as a Bayesian analysis of the trends, both of which further support our claim that the RP is absent in deliberate decisions. Overall, we have now conducted 6 different kinds of analyses, using both NHST and Bayesian methods. None of the analyses we conducted supported the claim that there exists an RP in deliberate decisions. All but one of our 6 analyses supported the claim that the RP is absent in deliberate decisions. The remaining analysis suggests inconclusive evidence for the absence of an RP. Therefore, we think that—taken together—our results provide clear evidence for an absence of the RP in deliberate decisions.

Nevertheless, even a reader who remains unconvinced that the RP is absent in deliberate decisions would agree that the RP is strongly diminished during deliberate decisions in comparison to arbitrary ones. And that is enough to support our main claims in this manuscript:

1) Deliberate and arbitrary decisions may involve different neural mechanisms, and that

2) Generalizing from arbitrary to deliberate decisions is problematic.

We now explain these claims better at the end of the Introduction and in the Discussion. More details about this are provided in the responses to the reviewer comments below. We also reran our analyses with the lower high-pass filter that Reviewer 3 suggested; the results remain qualitatively the same. This is again discussed in more detail in the response to Reviewer 3.

2) All reviewers agree that the model fitting procedures are inadequately described and, as currently presented, it is not clear what value the model is really adding here.

We thank the reviewers for pointing out that our explanation of the model was not clear enough. In the current version of the manuscript, we rewrote the section describing the model and added another figure to describe and explain it better. We think that in the current version the model and its contribution to the results are much clearer.

We hope that these comments will prove helpful to the authors.

Separate reviews (please respond to each point):

Reviewer #1:

I thought this was an interesting paper whose results, if correct, could have a transformative effect on how we think about the RP-voluntary action literature. The manuscript is nicely written and the experiment seems to have been conducted with sound methodology on the whole. I do however have very substantial concerns about several aspects of the data analysis and, by extension, the authors' interpretation of the data which I feel would need to be addressed prior to publication.

Major Comments:

1) I think in the first instance it is important that the authors establish specific hypotheses regarding their data. These should be provided at the end of the Introduction. At present the authors state that:

"Demonstrating differences in RP between arbitrary and deliberate decisions would first challenge the generalizability of the RP (from arbitrary to deliberate decisions) as an index for internal decision-making. Second, it would more generally suggest that different neural mechanisms might be at play between deliberate and arbitrary decisions. This, in turn, would question the generalizability of studies focused on arbitrary decisions to everyday, ecological, deliberate decisions-regardless of whether these studies relied on the RP or not."

I find these statements problematic from the outset. First, given how different the arbitrary and deliberate conditions are, it is wholly expected that some trivial differences would be observed in the RP. In the arbitrary condition the participant knows in advance of stimulus onset that they can randomly select a response and so can prepare to act in advance. In contrast the participant cannot prepare a specific action until after stimulus-onset in the deliberative condition. This is borne out in the stimulus-locked traces where clear preparatory RP activity is observed pre-stimulus in the arbitrary condition. So simply stating that a 'difference' would undermine the generalisability of RP findings is not correct. The authors need to be much more specific about what difference that might be. In fact, their analyses are very much geared toward showing that the RP is wholly absent during deliberative decisions. If so, this should be clearly stated from the beginning.

We thank the reviewer for this comment. Following it, we went into more detail about our hypotheses in the Introduction and now clarify and justify them better (Introduction section). Briefly, we hypothesize that the RP is present in arbitrary decisions and absent in deliberate ones. However, we note that it is enough for the RP to be strongly diminished during deliberate decisions in comparison to arbitrary ones to support our main claims in this manuscript: (1) deliberate and arbitrary decisions may involve different neural mechanisms, and (2) generalizing from arbitrary to deliberate decisions is problematic.

Regarding differences between arbitrary and deliberate decisions, we tried to equate the two as much as possible in terms of the sensory input, motor output, and even memory and cognitive load (the catch trials). Additional differences are inherent to the difference in decision types, and accordingly part of the research question (e.g., in deliberate decisions subjects are devoting more thought to the decision and cannot prepare the decision in advance). In reference to the specific concern made by the reviewer regarding the differences in preparatory activity between arbitrary and deliberate decisions and stimulus-locked activity (Figure 6A), we do not think the data supports the interpretation of a preparatory activation in arbitrary and not deliberate decisions. Note that the arbitrary hard activity is indistinguishable from the two deliberate conditions, and it is only the arbitrary easy trials that diverge from the other conditions. However, easy and hard trials were randomly interspersed in deliberate and arbitrary blocks, and the subject discovered the trial difficulty only at stimulus onset. Thus, there couldn’t have been differential preparatory activity that differs with decision difficulty. This divergence in one condition only is accordingly likely due to some fluke in the data rather than reflecting any preparatory RP activity. We now specifically address that in the manuscript.

2) Given the emphasis that is placed on demonstrating an absent RP in the deliberative condition, and my aforementioned concerns that trivial differences in RP are to be expected when comparing across these conditions, it is important to consider this aspect of the analysis very carefully. The authors do well to conduct Bayes Factor analyses to complement null-hypothesis significance testing. In the first instance, however, their analysis highlights inconclusive evidence for the RP's absence on deliberative trials. The authors then re-baseline their waveforms to -1000 to -500ms to exclude possible negativities that may not be RP-related. The BF then reduces to 0.332, i.e. hovering just above the conventional cutoff of <0.33. I have a couple of concerns around this. First, the study is designed to ensure 80% probability of detecting an RP >= 2.6uV. Thus it is possible that the RP is indeed present but too small to detect with a relatively small sample size (it is worth noting here that visual inspection of the waveforms would tend to favour there being an RP in both conditions, albeit with substantial differences in amplitude). I wonder if it is a safer bet for the authors to focus on the fact that the RP is substantially diminished rather than absent in the deliberative condition. If so, what new insights might this offer into the nature of our decisions? For example, it would be very interesting if it turned out that the RP is particularly invoked for arbitrary actions as a sort of random-number generator (but see below comments).

We agree with the reviewer that our results clearly show that the RP in deliberate decisions is at least strongly diminished with respect to arbitrary decisions, and we now specifically address this interpretation in various locations in the manuscript. And we agree that a reader who remains unconvinced by our 6 different analyses would still likely agree that the RP is strongly diminished in deliberate decisions. Note, however, that our power analysis suggested that we would have a 99% (and not 80%) probability of detecting an RP >= 2.6 uV with 16 usable subjects (Materials and methods, “Sample size and exclusion criteria”). We had 18 subjects, so our power was even higher.

Regarding the visual inspection that the reviewer notes might suggest a diminished RP, this is actually in line with our results according to our model. The model predicts that we would find a roughly linear decrease in activation in the deliberate condition (Figure 9C). This is now discussed in more detail in the modeling section of the Results.

3) Following on from the above, given the quite massive differences in RT across conditions, I am concerned that the response-locked baseline is not appropriate. It is quite plausible that the RP will build more slowly in the deliberative condition reflecting the much slower decision process and applying this baseline will simply subtract away those differences and shift the peak amplitude of the RP closer to 0. The authors do conduct some analyses to test for a relationship between RT and RP amplitude but these are based on a cross-subject median split and, consequently, an N of just 9 in each group. Moreover, even after splitting the groups this way there is still a circa 700ms difference in RT between conditions. The authors observe p-values >0.05 in these follow-ups but no Bayes Factor analyses are provided. Similarly, the regression of RT difference against RP amplitude is likely to be underpowered and no BFs are reported. I suspect that if the authors were to conduct a within-subject median-splits based on RT that an RT effect would be much more likely to emerge in a within-subjects statistical analysis. In my experience, across subject variations in signal amplitude are a much bigger problem than cross trial within-subject variations. In any case, wouldn't a non-significant relationship between the RP and RT be confusing given the previous literature? At minimum, shouldn't we be observing a relationship within the arbitrary condition?

This comment includes several criticisms. We address them one by one.

1) We thank the reviewer for this comment but note that the response-locked baseline analysis is just 1 of 6 analyses we carry out on our data to investigate the RP (and, as we explain above, though it is indeed inconclusive, it is in the expected direction). One of these six analyses, which was added in this revision, was a Bayesian analysis of the downward trend in the time window of the RP (the last 1 s before movement onset). This new analysis provided extremely strong evidence for a downward slope in the arbitrary case and moderate to strong evidence against the existence of a slope in the deliberate case. Thus, taken together, our results provide clear evidence for an absence of RP in deliberate decisions. We detail this in the manuscript (Results and Discussion).

2) Following the reviewer’s suggestion, we conducted a within-subjects analysis, taking the faster deliberate trials and the slower arbitrary trials for each subject. As is now reported in the Results section, the mean RT difference between fast deliberate and slow arbitrary trials in the within-subject analysis was only about a third of what it was across subjects (230 ms). Nevertheless, when comparing arbitrary and deliberate activity, we again find an RP in arbitrary but not in deliberate decisions, and we now plot this in a new panel of Figure 5 (Figure 5C). We also find no relation between the differences in RT and those in RP for fast deliberate and slow arbitrary trials within subjects. This analysis therefore provides further evidence that there is no relation between RT and RP in our data.

3) To the best of our understanding, the key question here is whether the RT differences between arbitrary and deliberate decisions could explain the absence of an RP in deliberate decisions. We know of no literature suggesting the positive relationship between RT and RP that the reviewer appears to imply. There is, for example, literature suggesting that Parkinson’s patients tend to have longer RTs yet normal or smaller RPs (e.g., Simpson and Khuraibet, J Neurol Neurosurg Psychiatry, 1987; Dick et al., Brain, 1989).

4) The leaky DDM accounts for the shape of the ERP at CZ. I cannot find any description of precisely how the plot in Figure 8C was generated. Does it represent the cumulative noise component?

Thank you for pointing out that the explanation of the model was not clear enough. We now go into much more detail about the DDM and how Figure 9 (previously Figure 8) was generated, in the Results and in the Materials and methods. We also added a new figure, Figure 8, that explains in more depth how the model works. To answer the reviewer’s question, that figure panel represents the activation of the noise and value components. Again, this is explained in much more detail in the current version of the manuscript.

Minor Comments:

To what degree might the deliberative RP amplitude be impacted by overlapping decision signals? Recent work by my own group has highlighted a P3-like centro-parietal potential that traces perceptual decision formation and peaks at response (e.g., O'Connell et al., 2012, Nat Neurosci; Kelly and O'Connell, 2013, J Neurosci; Twomey et al., 2015, Eur J Neurosci). In Kelly and O'Connell, 2013, we actually observed an RP-like signal evolving around the same time as the CPP, and we found that the two signals were impacting one another, something that we were able to resolve using current-source density (CSD) analysis. I wonder what the authors' thoughts are on A) the fact that we seemed to observe an RP during deliberative decisions in this case and B) whether an overlapping centro-parietal positivity could partly account for the small size of the RP in their analyses – something that could be verified through application of CSD transforms.

This is an interesting suggestion, which we tested. We conducted an analysis on the CSD-transformed data to see if the lack of RP in the deliberate condition might be explained by the co-occurrence of a CPP component. To the best of our judgement, the data do not align with such an account. In Author response image 1 we present the CPP effect obtained in Kelly and O’Connell, 2013, and the topographies of the effects we found in the deliberate condition, both with and without a CSD transformation. These do not reveal any form of an RP, nor a component resembling the CPP. Thus, though this might indeed have been a possible explanation of our results, it was not borne out by the data.

Author response image 1

The description of the Schurger study in subsection “Drift Diffusion Model (DDM)” is a little unclear, probably just down to the particular wording used. The meaning of 'non-linear threshold crossing' is not clear to me. 'The crossing of that threshold reflects the onset of the decision in the model, typically leading to action.' 'Onset of the decision' is an ambiguous phrase to use here as it is often used with reference to the time at which evidence accumulation commences. According to these models the threshold instead reflects the completion/termination of the decision. Also the authors state 'Schurger and colleagues claimed, time-locking to response onset ensures that these spontaneous fluctuations appear, when averaged over many trials, as a gradual increase in neural activity.' It is not clear to me whether the authors are aware that according to Schurger this arises because the RP reflects an ACCUMULATION of those spontaneous fluctuations over time as opposed to a moment-by-moment reflection of instantaneous fluctuations.

We thank the reviewer for pointing out this lack of clarity. We are, of course, aware that the Schurger model accumulates spontaneous fluctuations, and we rephrased the explanation of the Schurger model to be clearer. As noted elsewhere, we completely rewrote the section of the manuscript describing the model.

LRP analyses. Again these hinge on embracing the null hypothesis. Given the low statistical power and the signs of a larger peak amplitude (and possibly slower build-up) for deliberative decisions, at minimum bayes factor analyses should be provided.

We followed the reviewer’s suggestion and ran a Bayes factor analysis for the LRP. We found BF=0.299, which—according to the standard interpretation—provides moderate evidence against an effect of decision type on the LRP.
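For readers curious how a Bayes factor of this kind can be derived from a t statistic, the BIC-based approximation of Wagenmakers (2007) offers a rough sketch. This illustrates the logic only and is not the exact calculation we ran; the t and n values below are illustrative. Note that a BF10 of roughly 0.3 corresponds to a BF01 of roughly 3.3, i.e., moderate evidence for the null.

```python
import math

def bic_bayes_factor01(t, n):
    """BIC-based approximation to the Bayes factor BF01 (evidence for the
    null over the alternative) for a one-sample or paired t test
    (Wagenmakers, 2007). Values above 1 favour the null hypothesis."""
    nu = n - 1  # degrees of freedom
    return math.sqrt(n) * (1.0 + t * t / nu) ** (-n / 2.0)

# Illustrative values (not our data): a small t statistic with 18 subjects
bf01 = bic_bayes_factor01(t=0.5, n=18)
bf10 = 1.0 / bf01
print(f"BF01 = {bf01:.3f}, BF10 = {bf10:.3f}")
```

With t = 0 the approximation reduces to sqrt(n), and it shrinks toward zero as |t| grows, mirroring the intuition that small t values constitute positive evidence for the null rather than a mere failure to reject it.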

Reviewer #2:

This is an interesting study. The authors challenged the role of the RP in internally generated decisions, an EEG signature often linked to volition or action awareness since Libet's work. They contrast deliberate decisions (choices that did matter) with arbitrary decisions (choices that did not matter) in a "two-alternative NPO choice" task. Both types of decisions had similar LRP components, but only arbitrary decisions showed a reliable RP (at Cz). The authors further simulated an altered version of the DDM, showing the absence of the RP when assuming a value component underlying the deliberate decision.

The experimental design and analysis are rigorous. The authors should be applauded for their efforts in assessing several alternative explanations for the absence of RP in the deliberate decision.

I have a few comments on the current version of the manuscript.

1) The modelling procedure is unclear. First, it is unclear why drift rate was the only parameter allowed to change between conditions. Since different task conditions were blocked (see the other comments below), one may argue that it is possible to have the threshold (or other parameters) to vary between blocks, as an adaptive decision strategy. Would model comparison be feasible to identify the optimal parameters?

Second, DDMs (and many other accumulator models) assume one-off, rapid decisions without deliberate thinking. In other words, the very first boundary crossing renders a decision, as in the current model. Although the subjects were instructed to respond asap, the current design cannot rule out the possibility of rethinking/change-of-mind within a single trial, which violates the model/simulation assumption. The prolonged RT in deliberate decisions supports such a possibility: subjects might not rush to a decision, even when a decision threshold was reached.

We thank the reviewer for pointing out that our explanation of the model was not clear enough. In the current version of the manuscript, we rewrote the section describing the model and added another figure to describe and explain it better. We think that in the current version the model and its contribution to the results are much clearer.

As for the reviewer’s specific questions:

1) We think it is most parsimonious to change as few parameters as possible in the model between the different conditions. Having just one parameter change makes it easier to understand what exactly changes between easy/hard arbitrary/deliberate decisions. In contrast, having multiple parameters change would add unneeded degrees of freedom to the model that must then be accounted for. Note that we do not claim that this is the optimal or even the only model that could explain our data. However, our model is relatively simple, and it makes some interesting, testable predictions that are borne out by the empirical data.

2) DDMs, race-to-threshold, and similar models are increasingly used to model value-based decision-making in neuroeconomics and elsewhere (e.g., Krajbich et al., Am Econ Rev, 2014). In our model, the value of each decision alternative is reflected in the drift of each component of the race to threshold. In this setting, one could—for example—consider a case where one component comes close to the threshold and then falls away, while the other component ends up reaching the threshold first. This might be termed a change of mind in this model. However, just as a person cannot change their mind after pressing the button in our experimental setup, the decision in the model is finalized once a component reaches the threshold. Importantly, changes of mind were not part of our task, and we did not design our model to deal with them; hence, the above is speculative. Last, we think that the longer RTs in deliberate trials stemmed from longer deliberation time, reflected in a higher threshold. We see no reason to think that subjects consistently went against the instructions and continuously changed their minds, especially as no subject reported having had a problem with changing their mind in the post-experimental debriefing.

2) The arbitrary decision explicitly urged subjects "not to let their preferred NPO dictate their response". This is a strong requirement (asking subjects NOT to follow their preference), compared with previous studies using the typical free-choice paradigm. I wonder if the authors could comment on whether this may inflate the difference between the two types of decisions in the current study. Second, is there any regularity in the arbitrary decision, such as alternating responses in consecutive trials?

The reviewer is correct that the instructions for the arbitrary decisions might not be trivial.

However, no subjects reported difficulty carrying out the instructions in the post-experiment debriefing. Further, the behavioral results suggest that subjects were generally able to follow those instructions well. In addition, the EEG results replicate those of Libet and other studies, providing more evidence that our subjects were able to generate arbitrary-like behavior. What is more, under the reviewer’s account it should have been more difficult for us to find differences between arbitrary and deliberate decisions (the same goes for the regularity that the reviewer suggests). This is because, had the subjects exercised their preferences, their decisions would have been more deliberate (e.g., activating networks related to values). Also, this would have been less like the Libet studies, so we would have been less likely to find an RP, for example. Thus, the fact that we find such clear differences speaks against this account.

Regarding the regularities the reviewer mentions, we found no glaring regularities in subjects’ decisions during arbitrary blocks. For example, no subject always chose left or always right, constantly alternated between left and right, and so on. However, it is known that humans cannot generate truly random series (e.g., Nickerson, Psychol Rev, 2002; Rapoport and Budescu, J Exp Psychol: General, 1992; Budescu and Rapoport, J Behav Decision Making, 1994). So, we do not expect our subjects to pass any strict randomness tests. Such regularity would probably also be found in the Libet experiment and cannot explain our results. Again, if anything this would have made arbitrary decisions more deliberate, making us less likely to find differences between decision types.

3) Different decisions were blocked in the current study. Could the difference in RP be due to a contextual effect? For example, was the early visual ERP comparable between conditions?

Indeed, we decided to group the trials by decision type because, in pilot experiments, we realized that subjects had a hard time switching between decision types on a trial-by-trial basis. However, we do not think that this could explain the effects. If anything, we think that this should have increased the chances of obtaining an effect in deliberate decisions as well, since subjects did not have to pay attention to selecting the appropriate decision strategy (that is, deliberate/arbitrary) and could focus simply on making the decision.

As for the question about the early visual ERP, this information can be found in Figure 6A. There, the easy arbitrary waveform does seem to diverge from the other three conditions. However, as we explain in the Results section, this cannot explain the results we find. Nor does it appear to reflect contextual effects, because easy and hard decisions were randomly interleaved within each block. Subjects therefore did not know whether an upcoming decision would be easy or hard before stimulus onset, so a pre-stimulus effect found only for easy arbitrary trials cannot reflect precuing and cannot account for the RP.

4) Subsection “Differences in reaction times (RT) between conditions, including stimulus-locked potentials and baselines, do not drive the effect”. For the median split analysis to work, my understanding is that no subject had above-median deliberate RT as well as below-median arbitrary RT (i.e., the two groups did not overlap). Could the authors confirm that this was the case? Because inferences on the null hypothesis are important to rule out the effect of RT and other confounding variables on RP, it would be useful to report Bayes factor as well.

We thank the reviewer for this comment. First, we now carry out a within-subject median split as well as the between-subject median split. The within-subject analysis provides the same results. Second, following the reviewer’s comment, we checked for possible overlap and found it for only 3 of the 18 subjects. Hence, we removed those subjects and repeated the analysis. As is evident from Author response image 2, the results stay the same.

Author response image 2
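The within-subject selection described above can be sketched as follows for a single subject. The RT distributions here are synthetic and loosely match the reported condition means; the actual analysis used our empirical data and was repeated for every subject.

```python
import numpy as np

rng = np.random.default_rng(0)

def within_subject_split(deliberate_rts, arbitrary_rts):
    """For one subject, keep the faster half of deliberate trials and the
    slower half of arbitrary trials (within-subject median split)."""
    fast_delib = deliberate_rts[deliberate_rts < np.median(deliberate_rts)]
    slow_arb = arbitrary_rts[arbitrary_rts > np.median(arbitrary_rts)]
    return fast_delib, slow_arb

# Synthetic RTs in seconds (illustrative, not our data)
delib = rng.normal(2.3, 0.5, size=200)
arb = rng.normal(1.0, 0.3, size=200)
fast_delib, slow_arb = within_subject_split(delib, arb)
gap = fast_delib.mean() - slow_arb.mean()
print(f"RT gap after split: {gap:.2f} s "
      f"(vs {delib.mean() - arb.mean():.2f} s before)")
```

Selecting the fast tail of one condition and the slow tail of the other necessarily shrinks the RT gap between them, which is the point of the control: the RP difference should shrink too if it were RT-driven.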

Minor Comments:

Figures 8A and 8B could be superimposed over each other, allowing easy assessment of the goodness of fit of RT distributions.

Author response image 3 shows what superimposing the figures looks like. We think that in this format it is harder to understand what is going on, and we therefore opted for separate panels.

Author response image 3

Reviewer #3:

The present manuscript presents a single experiment aiming at evaluating whether the typically observed early BP onset in spontaneous (arbitrary, purposeless, unreasoned and without consequences…) movements generalizes to deliberate decision making. The authors hence contrasted two decision-making situations: in a first condition, participants had to decide, on every trial, to which NPO they wanted to give money; their choice was effective. In a second condition, while facing the same situation, they were told that their response was consequenceless, since the same amount of money would be given to each NPO. While a standard BP was observed in the second condition, the authors argue that there is no such BP in the first condition. They hence argue that the results classically observed in arbitrary movements do not generalize to deliberate decisions. The early onset of the BP has long been considered as reflecting an activation of a response before the actual decision to move was taken, casting doubt on the very notion of free will. The absence of such a pre-conscious marker in deliberate decisions is taken by the authors as an argument that, even if there is no real free will in arbitrary movements, this argument cannot be used for deliberate decisions.

General comments:

Although I may generally agree with the authors' conclusions, I'm afraid they are not strongly supported by the facts as they stand. The limitations are methodological and conceptual.

While the data show a clear difference in the BP between the two conditions, the core of the authors' reasoning is based on the presence of a BP in the arbitrary condition and the total absence of a BP in the deliberate one. Indeed, if there were a BP, even of smaller amplitude, in this condition, the argument would immediately fall. There are, however, several points that may challenge this view.

Before addressing the specific points made by the reviewer, we wish to respectfully disagree with the above premise about the interpretation of the data. We do not think that our argument rises and falls based solely on the dichotomous existence/non-existence of the RP. Rather, we think that the mere fact that the RP is heavily reduced in deliberate decisions challenges previous attempts to generalize from arbitrary decisions to deliberate ones. Thus, while an absence of an RP would perhaps strengthen our claims, a substantial decrease in amplitude is enough for us to draw clear conclusions. At the very least, a considerably reduced RP demonstrates that there are clear differences between the neural processes that drive arbitrary and deliberate decisions. In addition, this means that the RP is more pronounced in arbitrary decisions, which are arguably more driven by random fluctuations, than in deliberate decisions, which are probably based on different neural mechanisms. And so, we think that even without a conclusive null result, the findings of this study would be of importance to the study of voluntary action.

What is more, and as we now discuss at length in the Discussion, we conducted 6 different kinds of analyses (using both NHST and Bayesian methods). None of these analyses supported the claim that there exists an RP in deliberate decisions. And all but one of our 6 analyses supported the claim that there is no RP in deliberate decisions, with the sole remaining analysis still suggesting evidence—albeit inconclusive—for the lack of an RP. Therefore, we think that—taken together—our results provide clear evidence for an absence of an RP in deliberate decisions.

1) Data were high-pass filtered at 0.1 Hz (which corresponds to a time constant of 1.6 s). This value is way too high for a proper estimate of the BP. As a matter of comparison, the very first measures of the BP were even performed under DC conditions, that is, without any filtering… At the very minimum, a filter of 0.01 Hz, or ideally even lower, like 0.001 Hz, should be used. Indeed, the BP is a very slow component, and such a high filter value has very likely largely decreased the amplitude of the BP. One may argue that filtering has impacted the two conditions in the same way, and hence that any potential distortion cannot explain the results. This is certainly true concerning the difference in amplitude (see however below). But, as indicated above, the rationale put forward by the authors only holds if there is NO BP in the deliberate condition, not if the BP is simply reduced. If, with a more adapted filter, there would have been a BP, the whole argument is invalidated. Second, the response times are much longer in the deliberate than in the arbitrary case. So the time constant of the slow potential might be much longer in this case, and hence much more affected by the inappropriate filter (the proposed control only partially addresses this issue since i) the comparison is between participants, and ii) at least for the easy deliberate condition, the results are ambiguous).

We thank the reviewer for noticing this point. We first reiterate that we do not think the rationale holds only if no RP is found in deliberate decisions (see our reply above). More specifically to the question of filtering, our choice of high-pass filter corresponds to some (e.g., MacKinnon et al., 2013; Lew et al., 2012) but indeed not all (e.g., Haggard and Eimer, 1999) studies in the literature. And so, the RP has repeatedly been found and reported with 0.1 Hz filtering. Yet, following this comment, and to make sure our results are indeed not dependent on the filter we used, we reanalyzed the data with a 0.01 Hz high-pass filter, which is lower than that used in some seminal papers in the field (e.g., Haggard and Eimer, 1999, used a high-pass filter at 0.016 Hz). With that filter, we obtained fewer trials than with the 0.1 Hz filter we originally used, but these were enough for analysis. Given the lower number of trials, and as the main question here pertains to arbitrary vs. deliberate decisions (with decision difficulty serving mostly to validate the manipulation), we collapsed the trials across decision difficulty and only tested RP amplitudes in arbitrary vs. deliberate decisions against each other and against zero. In line with our original results, a difference was found in RP amplitude between the conditions (t(13)=2.29, p=0.0394), with the RP in the arbitrary condition differing from zero (t(13)=-5.71, p<0.0001), as opposed to the deliberate condition, where it did not (t(13)=-0.76, p=0.462). We added this information to the manuscript.
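For reference, re-filtering with a lower cutoff can be done with a zero-phase Butterworth high-pass along these lines. This is a minimal sketch: the sampling rate, filter order, and signal amplitudes below are illustrative assumptions, not our recording parameters or preprocessing pipeline.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass(data, fs, cutoff=0.01, order=2):
    """Zero-phase Butterworth high-pass filter; a low cutoff (e.g., 0.01 Hz)
    preserves slow components such as the readiness potential."""
    sos = butter(order, cutoff, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, data, axis=-1)

fs = 512                                        # assumed sampling rate (Hz)
t = np.arange(0, 600, 1 / fs)                   # ten minutes of synthetic data
drift = 5e-6 * np.sin(2 * np.pi * 0.002 * t)    # very slow drift, below cutoff
rp_like = 2e-6 * np.sin(2 * np.pi * 0.5 * t)    # faster, RP-scale activity
filtered = highpass(drift + rp_like, fs)        # drift attenuated, rp_like kept
```

The zero-phase (forward-backward) application avoids shifting the latency of slow components, which matters when the question is precisely when a potential begins relative to movement onset.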

2) Figure 6 presents the stimulus-locked data, supposed to invalidate a stimulus-locked contamination. While I agree that the response-locked BP is not purely a stimulus-induced effect, several features of the stimulus-locked averages deserve comment. First, as can be seen in Figure 6A, a negative shift (CNV-like) is already present BEFORE stimulus presentation in the arbitrary condition, and it seems even modulated by (precued) task difficulty. So, even before participants enter any decision-making process related to the relevant choice, one can see differences in the slow negative potentials between the conditions. At minimum, this indicates that the difference observed response-locked might not be specific to the decision period. Furthermore, following the large visual evoked potentials after the stimulus, one can see around 500 ms a first positive component, followed by a negative-going one (starting around 650 ms). This negative-going component peaks close to the average RT in the arbitrary condition, and hence likely contributes to the BP. The same activity is present in the deliberate condition (and is actually very similar), but the response being given much later, this early negative-going activity does not contribute to the BP. By itself, this activity, which is not strictly speaking a BP, does create a difference between the two conditions. More generally, the general shape of the late evoked potentials (after 600 ms) is remarkably similar across conditions, which dramatically contrasts with the large difference observed response-locked. This may suggest that the observed difference is, at least in part, simply due to the different averaging events in otherwise very similar signals.

The reviewer’s comment is composed of several points. We respond to them in order.

1) We thank the reviewer for this comment and note that they agree with us that the results cannot be explained solely by stimulus-locked effects. The reviewer’s comment focuses on an apparent difference in the stimulus-locked waveforms: the waveform for arbitrary easy decisions seems to diverge from all the others from about 500 ms before stimulus onset until stimulus onset. However, and importantly, task difficulty was randomly assigned to each trial within each block, as we explained above, and it was the stimulus that informed subjects of the decision difficulty on that trial. Therefore, our subjects could not know the decision difficulty of an upcoming trial before stimulus onset. And we see the effect only for arbitrary easy and not for arbitrary hard trials. So, the effect could not be one of precuing in a blocked design. What is more, the response-locked waveforms that are the focus of this paper show no such pattern. There, no difference was found between difficulty levels within each decision type, and the main difference is between the two arbitrary conditions and the two deliberate ones, irrespective of difficulty. We now explain this in the manuscript.

2) A few points are worthy of mention here. First, our model predicts similar activity between arbitrary and deliberate decisions when those are stimulus locked (Figure 8B). Second, the negative peak to which the reviewer refers (Figure 6A) is around 200-300 ms after the mean RT for arbitrary decisions in the stimulus-locked condition. So, if this peak is what we are seeing in the decision locked condition, we would expect the RP to also peak around 200-300 ms after movement onset. However, the RP peaks 100-200 ms before movement onset (Figure 6B). So, the reviewer’s interpretation of our results would need to account for this 300-500 ms discrepancy in the peak of the RP. What is more, we do not see any of the other components that the reviewer mentions (e.g., the first positive component around 500 ms) in the RP (Figure 6B), providing more evidence against the role of these stimulus-locked components in driving the decision-locked RP. We therefore think that our interpretation of the results is more plausible.

3) The authors focused on Cz to extract the BP. However, there is a large literature indicating that the BP is actually made of several sources, some medial in the (pre)SMA, some more lateral in the (primary) motor cortices. Note that, for the authors' rationale to be valid, none of the sources should present an (early) BP. The topographies presented in Figure 3B indicate a clear negative activity above Cz and neighbors in the arbitrary condition, and no activity in the same region in the deliberate condition. But the topography suggests a clear negative activity located more laterally over the left hemisphere. Does this activity correspond to a BP? Since only Cz is shown, one cannot evaluate this possibility.

Again, while we thank the reviewer for this comment, we think their interpretation of our claims is overstated. We do not argue that deliberate decisions are not preceded by neural activity—such a claim would be dualistic in essence (to assume that for an action to be free it should not be preceded by neural activity entails that free actions do not originate in the brain; if so, where should they originate? This, in fact, was the basis for criticism of Libet’s view; see Wood (BBS, 1985) and Mudrik and Maoz, 2014). Rather, we are simply claiming that the classical RP – which is recorded over Cz – does not generalize to deliberate decisions. And the topography in deliberate decisions does not reveal any sign of such an RP, as it is completely lateralized. We are not aware of any lateralized RP, but if there are such findings that correspond to ours, we would be very thankful if the reviewer could direct us to them, so that we could refer to them in the manuscript.

4) In real spontaneous responses, the BP starts much earlier than the LRP. But in the present data set, as can be inferred from Figure 3A, the conditions diverge around -800 ms (roughly estimated). This is basically the same latency as the LRP onset. Note that the latency of the LRP (even with all the caution that may come with this measure…) seems exactly the same for all 4 conditions. So, at the time of BP divergence, the LRP onset indicates no difference in response selection/decision making. Why is the (Cz-)BP more important for the conclusions the authors want to defend? The LRP seems to say something very different.

We ran a Maris and Oostenveld cluster-based nonparametric analysis to rigorously test when the RP and LRP diverge from 0, beyond visual inspection alone. The results of this analysis are now reported in Figures 3A and 7. As is apparent, arbitrary decisions diverge from 0 earlier than 1s before movement onset for both decision difficulties. Deliberate decisions never diverge from 0. And all LRP waveforms diverge from 0 only around 500 ms before movement onset. So, according to this standard method, the RP starts much earlier than the LRP.
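For concreteness, the logic of such a cluster-based permutation test can be sketched as a minimal one-sample version. This is an illustration only, not the exact implementation or thresholds we used; `t_thresh` and `n_perm` below are illustrative choices.

```python
import numpy as np

def cluster_perm_test(data, t_thresh=2.0, n_perm=1000, seed=0):
    """One-sample cluster-based permutation test against zero, in the
    spirit of Maris and Oostenveld (2007), for data of shape
    (n_subjects, n_timepoints). Returns (start, end, p_value) per cluster."""
    rng = np.random.default_rng(seed)
    n_sub = data.shape[0]

    def t_stat(x):
        # one-sample t statistic at every timepoint
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n_sub))

    def clusters(tvals):
        # contiguous runs where |t| exceeds the threshold, with summed mass
        above = np.abs(tvals) > t_thresh
        runs, start = [], None
        for i, a in enumerate(above):
            if a and start is None:
                start = i
            elif not a and start is not None:
                runs.append((start, i, np.abs(tvals[start:i]).sum()))
                start = None
        if start is not None:
            runs.append((start, len(tvals), np.abs(tvals[start:]).sum()))
        return runs

    observed = clusters(t_stat(data))
    # null distribution of the maximum cluster mass under random sign flips
    null_max = np.empty(n_perm)
    for p in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=(n_sub, 1))
        null_max[p] = max((m for _, _, m in clusters(t_stat(data * flips))),
                          default=0.0)
    return [(s, e, float((null_max >= m).mean())) for s, e, m in observed]
```

Applied to an array of subjects × timepoints, each returned tuple gives a cluster's start index, end index, and permutation p-value; comparing whole cluster masses against a sign-flip null controls for multiple comparisons across timepoints.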

Regarding the reviewer’s question about why we focused on the RP (from Cz) rather than the LRP, there are several reasons. First, the RP (from Cz) is the component that is most commonly associated with the Libet paradigm. Second, the LRP, being more lateral, is more directly over the left and right motor cortices, while Cz is above the SMA. As such, the LRP is typically taken to reflect more motor-related brain activity and the RP more general preparation. Our task was specifically constructed such that the motor output of arbitrary and deliberate decisions would be the same. Hence, seeing the same LRP in all conditions is encouraging, as it seems to provide evidence that the neural activity associated with the movement is very similar across all decision types.

5) A further issue concerns the modeling part. First, it is very unclear how the fit was performed. In subsection “Drift Diffusion Model (DDM)”, it is indicated that "We fit our DDM to our average empirical reaction-times, which were 2.13, 2.52, 0.98 and 1.00s for the different conditions". This sentence suggests that only the grand means were used, not the whole distribution of RTs. Besides the fact that this is in complete deviation from the overall logic of fitting DDMs, which aims at capturing RT distribution shape and errors at the same time, a fit performed on averages only is largely under-constrained. As a matter of fact, although there is no information about the quality of fit, comparison of Figures 8A and B indicates that the empirical and predicted distributions largely differ, especially in terms of spread (i.e., variance). The obtained parameters are also weird and likely invalid: Table 1 indicates that, in the deliberate condition, the estimated drift is actually higher for the hard condition than for the easy one. This does not make any sense, and is in disagreement with the empirical RTs that are, as expected, longer for the hard than for the easy condition. Furthermore, one reads "[…] The model was further fit to the empirical consistency ratio […]" (emphasis mine…). "Further" suggests that the fit to the mean RT and the consistency ratio were not performed simultaneously? Is that the case? Please clarify.

Besides these fitting problems, I have conceptual issues. Threshold crossing in the arbitrary condition is supposed to be triggered by noise only. In standard accumulation-to-a-bound models, noise is classically assumed to be Gaussian noise with mean 0 and a standard deviation s. For "noise" to hit the threshold one needs to assume a very high s. Or to assume that the mean is not 0… But in that case, this is not noise, this is a drift. Actually, Table 1 indicates that the "drift" in the arbitrary condition is much higher than in the deliberate condition. But what is the meaning of this "drift" if there is no decision, and the response is triggered by random fluctuations of noise? In contrast, what does it mean that a deliberate decision was reached with an information accumulation equal to 0? How can a threshold be reached if there is no information accumulation? If it's simply noise, why is it different from the arbitrary condition? Furthermore, if noise is 0-centered and random noise can hit the bound early in the accumulation process, the probability of hitting the threshold decreases with time. So late responses are unlikely to be triggered by noise. So, this whole fitting part is largely based on inconsistent assumptions.

First, as we indicated elsewhere, we have now rewritten the section devoted to the description of the model, and we think it is much easier to read and understand. Nevertheless, we broke down the reviewer’s comment into sections and respond to them one by one below.

1) We fit our model using the same method that Schurger et al., 2012, used to fit their model, as we base our model on theirs. However, we added constraints based on the consistency scores (Figure 2) to drive the differences in parameters between the congruent and incongruent DDMs.

2) We thank the reviewer for their careful reading of our manuscript and for spotting errors in Table 1. We have corrected these errors in this version of our manuscript. The corrected magnitudes in Table 1 have lower drifts for deliberate hard than for deliberate easy decisions.

3) All that was meant in this sentence is that we fit the model to both the RTs and the consistencies. We also explain that we fit the RTs and consistencies together in the Materials and methods and in the Results. We thank the reviewer for alerting us that this sentence might be confusing, and we changed “further” to “simultaneously” to clarify this.

4) As the reviewer indicates, the noise in our model has a mean of 0, as is standard, and we employ a drift. In this we follow the Schurger et al., 2012, model. It is certainly possible to model our data by changing both the drift and the threshold. But changing two model parameters instead of one is less parsimonious and adds unneeded degrees of freedom that must then be accounted for. We therefore opted to change only the drift parameter across conditions. The values of the decision alternatives in our model are designated by the drift—separately for each decision type. Hence, for deliberate decisions, the congruent cause had a higher value and thus also a higher drift rate than the incongruent cause and its associated DDM. For arbitrary decisions, the values of the decision alternatives mattered little, and this was reflected in the small differences, if any, among the drift rates.

5) A drift rate of 0 indicates that the threshold would be reached only if the noise carried out a random walk to the threshold (against the leak pushing it back toward baseline). So the threshold can be, and is, reached. Note, however, that the 0 drift in Table 1 exists only for the incongruent decision alternative in easy deliberate decisions. For those decisions, the inconsistent choice should almost never win the race to threshold against the consistent choice. And, indeed, this alternative is selected only about 1% of the time, empirically as well as in the model. We therefore do not agree with the reviewer that the fitting part is largely based on inconsistent assumptions.
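To illustrate this point numerically, here is a minimal, self-contained sketch (not our actual simulation code; all parameter values are illustrative only) of a leaky stochastic accumulator with zero drift. The leak makes it an Ornstein-Uhlenbeck process centered on 0, and noise alone reaches a threshold of the order of the process's stationary SD on a substantial fraction of trials:

```python
import numpy as np

def time_to_threshold(drift, leak, noise_sd, threshold,
                      dt=0.001, t_max=20.0, rng=None):
    """First-passage time of a leaky stochastic accumulator,
    dx = (drift - leak * x) * dt + noise_sd * sqrt(dt) * N(0, 1);
    returns None if the threshold is not reached within t_max."""
    rng = rng or np.random.default_rng()
    x, t = 0.0, 0.0
    while t < t_max:
        x += (drift - leak * x) * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if x >= threshold:
            return t
    return None

# With zero drift, the stationary SD is noise_sd / sqrt(2 * leak) = 0.1 here,
# so a threshold of 0.15 (1.5 SD) is reached by noise alone on many trials:
rng = np.random.default_rng(0)
times = [time_to_threshold(drift=0.0, leak=0.5, noise_sd=0.1,
                           threshold=0.15, rng=rng) for _ in range(50)]
crossings = [t for t in times if t is not None]
```

This is only a conceptual demonstration that zero-drift threshold crossings are unproblematic in a leaky accumulator, not a reproduction of the fitted model.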

My general feeling is that the authors provide further arguments against the idea that BP can be used as a marker of volition, but this is not new. In contrast, they provide few (if any) new arguments in the discussion of whether there is free-will or not.

As discussed in the Introduction and Discussion, the Libet results have long been claimed to provide evidence against the existence of free will. However, for that claim to hold water, these results must be generalizable from the purely arbitrary setting of raising a hand for no reason and purpose to the deliberate decisions that are the typical focus of the free will debate. Whether the Libet results generalize from arbitrary to deliberate decisions is therefore key to their applicability to the free will debate, especially when claims are made regarding moral responsibility. To the best of our knowledge, our results are the first to directly compare arbitrary and deliberate decisions. As such, our results are an important neuroscientific contribution to the debate on free will.

Minor comments:

– Bayesian analysis. The core idea of Bayesian ANOVA is that the values are indicative, but no hard threshold should be used. Hence, considering that a BF of.332 (that is a probability ratio > 3) is an argument for no effect is wrong. Actually, in this uncertainty zone, one cannot conclude anything, and this value is NOT an argument for H0.

We thank the reviewer for making this point, though we again respectfully disagree. What one could claim based on this BF is that H0 is 3 times more likely than H1. Is this not an argument for H0? In fact, this is commonly taken in the literature as moderate evidence for the null result, which is exactly how it is described in the manuscript. Importantly, we emphasize there that the evidence is inconclusive to moderate, given the two analyses we conducted.

– EEG results. The authors report a main effect of decision type. But is there an interaction with difficulty? If not, the following t-tests are not completely valid (although this does not change the conclusions…)

Unless we are mistaken, one is not allowed to conduct t-tests following a non-significant interaction when those t-tests are aimed at exploring the source of the interaction. Yet the t-tests we conducted are not aimed at that (we make no claim about differences between decision types at the different levels of the decision-difficulty variable, or vice versa). Rather, these are t-tests against zero, and so they are not tested in the ANOVA (or the interaction) to begin with.

– Figure 5B: the data plotted in this figure puzzles me. The y-axis plots the difference (deliberate – arbitrary). 12 points out of 17 (?) are actually negative, indicating that deliberate was more negative than arbitrary. How can this be?

We thank the reviewer for spotting this error. It has now been fixed. The correct plot is included in Figure 5B.

– Subsection “EEG Results: Lateralized Readiness Potential (LRP)”: the argument that the LRP amplitude difference is due to the reference used is incorrect: the LRP measure is reference-free. Indeed, the formula is (C3 − C4)right + (C4 − C3)left. But actually all electrode values should be written as (C3 − ref) and (C4 − ref). It then becomes obvious that the reference cancels out and does not intervene in the computation.

We thank the reviewer for pointing this out. We removed this sentence that refers to a minor point in our manuscript.

– Modeling. The authors sometimes refer to a DDM, sometimes to a LCA, sometimes to a race. Those are very different architectures (although they can be related, see Bogacz et al., 2006). Please clarify what is really used.

We now clarify this in our revised description of the model.

– Experimental procedure. It is never indicated how the responses were given! From the description (response keys) and the EEG analysis (computation of the LRP), one can infer that the responses were given with the left and right hands, right? But with which finger?

We thank the reviewer for the thorough read of our manuscript. This information was mistakenly omitted from the original manuscript. Though we explained that subjects were asked to place their fingers on the Q and P keys, we did not explicitly say that these were the left and right index fingers. We added the missing description to the manuscript.

– Subsection “Experimental Procedure” and others: besides providing the t or F values, please provide the behavioral data. For example, please provide error rates (even if they do not statistically differ). Furthermore, considering a difference with a p value equal to .09 as not significant is a bit strong.

We have now added the behavioral data, as the reviewer requested. However, we do not think there is any problem with considering a p value of 0.09 as non-significant. If anything, more claims have been made in recent years against considering p values lower than 0.05 but higher than 0.01, or even 0.005, as significant (e.g., Benjamin et al., Nature Human Behaviour, 2018), as a means to reduce false discoveries. And so, we stand behind our treatment of the 0.09 value as non-significant.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

All three reviewers agree that you have made substantial efforts to revise the manuscript in light of the initial round of comments and this clearly took time and effort on your part. However, I'm afraid we are all in agreement that some further revisions would be strongly recommended prior to publication. The remaining issues boil down to two substantive points:

First, we are still not convinced that the present results call for any alteration in our conception of free will. The key contribution of the paper seems to be to show that the RP may be specialised for arbitrary action selection and this seems worth reporting. However, beyond suggesting that the RP might not be a universal marker of the 'urge to act', the authors have not yet made a sufficiently clear case that their findings call for any change in how we think about the Libet findings or free will in general. Our recommendation at this point would be that the authors reframe the paper to focus more specifically on the RP and how the current findings bear on functional accounts of this signal.

Thank you for raising this issue. The discovery that RP onset precedes the reported onset of the intention to move was understood by Libet and colleagues—and has been widely understood since—to show that human decisions are not free. The discovery was thought to show either (a) that our decisions to act are unconscious, (b) that our decisions are conscious but are not the causes of our bodily movements, or (c) that our decisions are fully determined by brain activity that precedes them. (The references are too many to list as the original Libet study, (Libet, Gleason et al., 1983), has been cited thousands of times. A small sample of references to the above might include (Libet, 1985, Haynes, 2011, Hallett, 2016, Haggard, 2019)). For various philosophical reasons, deriving from accounts of the necessary and sufficient conditions that an act meets when it is free, (a), (b) and (c) have all, at various times, and by various thinkers, been thought to threaten the freedom of our actions (Again, Mele, 2009, Roskies, 2010, Sinnott-Armstrong and Nadel, 2011, Maoz and Yaffe, 2015 are but a small sample).

Our results here, however, show that any claims about this matter that can be made on the basis of the RP might be unwarranted for deliberate decisions. Given that deliberate decisions are the only ones for which we hold people responsible, and given that one of the reasons freedom has always been thought of interest is because it is necessary for responsibility, our study here has direct and immediate implications for the study of freedom.

Second, the reviewers also have significant outstanding concerns regarding the modelling. At minimum we would suggest clearly flagging the limitations of the adopted approach in the Discussion.

Thank you for this comment. We reconstructed the model and reran all simulations following the comments of the reviewers below (see Modeling subsection of the Results; see also Model and Simulations subsection of the Materials and methods). We also devote more space in the Discussion to the modeling. Importantly, the results of the new model are essentially the same as the previous model. We therefore think that the model, its results, and the conclusions we draw from the modeling are now clearer in the manuscript. Please see specific responses to reviewer comments regarding the model below.

Reviewer #1:

I think the authors have, by and large, done a good job of addressing the methodological concerns raised by the reviewers. Importantly the authors assert that they are not claiming that the RP is actually absent but rather significantly diminished during deliberative decisions. I have a couple of outstanding concerns.

We thank the reviewer for acknowledging that we did a good job addressing the methodological concerns raised by the reviewers. But we would like to clarify that we claim that, at the very least, the RP is much diminished, and we think that the most plausible conclusion from our analyses is that the RP is absent. We now discuss this, and what a strongly diminished RP would entail, in more detail in the Discussion.

I am still confused about the overarching premise or rationale for this study. It seems to me that the findings of this study boil down to showing that the RP is smaller for deliberative vs arbitrary decisions but I am not yet convinced that this has any bearing on our understanding of free-will or even on the significance of Libet's original reports. Libet showed that a neural signature of action preparation preceded conscious awareness of the decision to arbitrarily act. Perhaps that same signal (RP) is absent during deliberative decisions (and this is worth reporting) but that tells us nothing regarding the role or timing of conscious processes in this context. Of course it goes without saying that arbitrary and deliberative decisions involve distinct cognitive elements and, consequently, will activate some distinct brain areas. We know that several other signals that reflect action preparation (e.g. LRP, β-band desynchronisation) and evidence accumulation (e.g. P300, LIP spiking activity) precede explicit decision reports by substantial amounts of time.

As we make clear in the manuscript, our aim was not to question Libet’s original report. In fact, we replicate the central finding that arbitrary decisions are preceded by the RP. Our concern is with the further question of whether conclusions that one might reach about the lack of freedom of decisions preceded by the RP can be reached, also, about deliberate decisions. Our results suggest that they cannot, and we explain why this has clear bearing on the question of free will. We believe that this point is now very clear in the manuscript (much of the Discussion is devoted to this).

The authors seem to suggest that Schurger's model presents a challenge to Libet's interpretation: "A further reason to expect such differences stems from a recent computational model, which challenged the claim that the RP represents a genuine marker of unconscious decisions. Rather, the model suggested that the RP might reflect the artificial accumulation, up to a threshold, of stochastic fluctuations in neural activity"

I am still struggling to understand what difference it makes, in terms of our understanding of free will, if the RP reflects 'action preparation' or 'noisy evidence accumulation' – wouldn't both processes reflect the emergence of an 'urge to act'? Moreover, the authors seem to imply that Schurger's process must be conscious, but I see no reason to make such an assumption. I think that particular sentence really captures my discomfort with the current framing of the findings.

We now describe the Schurger model and its relation to the Libet results in much more detail. We think that our explanation will help the readers understand why the Schurger model poses a challenge to the Libet interpretation. Briefly, the Libet experiment suggests that the RP starts before the onset of the conscious intention to move. It assumes that the RP is a ballistic process and thus its beginning marks the onset of a decision (at least an unconscious one). This was then taken to mean that the decisions are unconscious or that the conscious decisions are inefficacious. However, if the RP is not a mark of an unconscious decision but rather an artifact, its start preceding the conscious decision is of little importance (one place, among many, where this is discussed is Haggard, 2019). Also, we do not imply or think that the formation of the RP (as formulated by Schurger's model) needs to be a conscious process. We thank the reviewer for raising this point and removed the sentence that might have evoked this false impression.

Another way of putting it is that I am not clear on how exactly people were making generalisations to deliberative decisions based on the RP specifically. I get the impression the authors are trying to make the case that the RP has been thought of as the sole signature of pre-conscious decision making/action preparation in the brain but is that really a widely held view, again given the literature on decision making?

The RP is not the sole signature of unconscious decision-making. But it is certainly held to be a very prominent signature of unconscious decision making (Libet, Gleason et al., 1983, Libet, 1985, Haggard, 2008, Roskies, 2010, Hallett, 2016, Haggard, 2019 are just some of the very many possible references). And so, while it might be accompanied by other mechanisms, the underlying assumption has been that the RP also characterizes deliberate decisions.

Among other things, this can be learned from the fact that findings related to the RP have been used as basis for arguments about moral responsibility, which clearly pertain to deliberate decisions alone. This is now also clarified in the manuscript. In addition, as the RP has been viewed as such a prominent signature of pre-conscious decision-making, problems with generalizing it from arbitrary to deliberate decisions put the onus on those who wish to use other features to demonstrate that those generalize. We now discuss this too in the manuscript.

I think that perhaps the authors should reframe the paper to focus more narrowly on the apparent domain specificity of the RP. On the same point I think the title of the paper may be too broad given the almost exclusive focus on the RP. I am very much open to being convinced/correct on all of the above points but I feel that currently the paper is rather confused and confusing as regards the relevance of the findings to free will.

We thank the reviewer for alerting us that the manuscript might need to further explain its relevance to the free-will debate, and many of the corrections and additions to the paper were made to clarify this. However, we respectfully disagree with the reviewer regarding the need to reframe the paper to focus more specifically on the RP. The paper indeed probes the RP (though notably also the lateralized readiness potential, LRP), but it does so as part of a large body of literature which used the RP to study volition and the relations between subjects’ conscious decisions and the underlying neural activity. Virtually every scientist who has replicated Libet’s result has made the further claim that the findings shed light on the freedom of human decisions. Critically, in many cases this claim related to all types of decisions, and not just arbitrary ones, as we now explain in the manuscript. We therefore do not think that we are overstepping in pointing out the fallacy of this inference.

My only other comment is that I do not agree with the authors' insistence on allowing only one DDM parameter to vary across conditions on the grounds of parsimony. Although this is indeed a common approach in behavioural modelling, there are formal model-comparison procedures available that would objectively identify the model that provides the best balance between parsimony and goodness of fit. Personally, I would probably be OK with the authors acknowledging this in their Discussion, as I do not view the modelling as a critical part of the story.

We thank the reviewer for this comment. Following comments from the reviewers, including this one, we reconstructed and re-simulated the model. In particular, we now fit the entire RT distribution of the model (1,200 samples) to the empirical RT distribution (1,200 samples) and, at the same time, we fit the modeling and empirical consistency rates. So, we now fit 4 parameters per condition (16 overall) to these 1,201 points and not just the drift (see Table 1).

Reviewer #2:

In this revision, the authors addressed some of the concerns in my last review. However, although the MS now has more detailed description of the model and simulation results (which was missing in the previous version), there are some additional issues.

1) The model used 11 parameters—8 drift rates, a scaling parameter (1.45), threshold and decay—to fit 8 data points (4 mean RTs and 4 choice probabilities). The result is a largely unconstrained model that does not describe behaviour accurately. In the response letter, the authors showed model simulations overlaid with empirical data, and there is a large discrepancy in the fit of the arbitrary condition. If the authors are determined to present the current modelling results as a proof-of-concept that the model can produce qualitative RP patterns, the limitation (that it does not produce a satisfactory quantitative fit to RT distributions, as in other applications of the DDM) needs to be acknowledged in the Discussion.

As we explain in our last response to Reviewer 1, following the important comments made by the reviewers, including this one, we reconstructed and reran the model. We now fit the entire RT distribution and empirical consistency rates. Hence, we fit 4 parameters per condition (16 overall) to the 1,201 points of the reaction time and consistency. (The scaling parameter, 1.45, does not take its value from an optimization procedure. Instead, it is just calculated from the empirical data. But, even if it did, it would be 17 parameters for 1,201 points.) Our modeling procedure is therefore no longer under-constrained. Also, as is apparent in the new Figure 9A—now overlaid—the fit of the model’s RT and consistency to the empirical RT and consistency is rather good, with an average error of just 0.036, or 3.6%, across all the conditions. So, the model describes the behavior rather well.

2) Subsection “Model and Simulations”. "Using a parameter sweep.…". Please provide details on the fitting procedure. Was any optimization algorithm used here? If so, what was the cost function? Was the model fitting performed on averaged data or individual data?

We are glad to give more details. We ran an exhaustive grid search on the averaged data across participants. The optimization algorithm, the cost function, and the procedure in general are now explained in much more detail.
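For concreteness, the logic of an exhaustive grid search of this kind can be sketched as follows. This is an illustrative toy, not our actual fitting code: the particular cost function (RT-quantile error plus consistency-rate error, equally weighted), the quantile grid, and the stand-in simulator are assumptions made for the example.

```python
import numpy as np
from itertools import product

def fit_cost(model_rts, model_consistency, emp_rts, emp_consistency):
    """Toy cost: mean absolute difference between model and empirical RT
    quantiles, plus the error in the consistency rate (equal weighting)."""
    qs = np.linspace(0.05, 0.95, 19)
    rt_err = np.mean(np.abs(np.quantile(model_rts, qs) - np.quantile(emp_rts, qs)))
    return rt_err + abs(model_consistency - emp_consistency)

def grid_search(simulate, emp_rts, emp_consistency, grid):
    """Exhaustive sweep over all parameter combinations in `grid`
    (a dict mapping parameter names to lists of candidate values)."""
    best_params, best_cost = None, np.inf
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        rts, consistency = simulate(**params)
        cost = fit_cost(rts, consistency, emp_rts, emp_consistency)
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost

# Toy check with a stand-in simulator whose single parameter shifts the RT mean:
emp_rts = np.random.default_rng(0).normal(2.13, 0.5, 1000)

def simulate(mu):
    return np.random.default_rng(1).normal(mu, 0.5, 1000), 0.99

best, cost = grid_search(simulate, emp_rts, 0.99, {"mu": [1.0, 2.0, 3.0]})
```

The sweep correctly recovers the candidate mean closest to the empirical one; in the real procedure the simulator would be the race-to-threshold model and the grid would span its drift, threshold, and leak parameters.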

3) In the response letter to my point 1, "we think that the longer RT in deliberate trials stemmed from longer deliberation time, reflected in a higher threshold." This is confusing as the same threshold (0.15) across conditions was used in the paper?

We thank the reviewer for pointing out the mistake in our answer. We had meant to write that the longer RTs in deliberate trials stemmed from lower drift rates. Regardless, we now fit 4 parameters and not just the drift rate (Table 1). So, we think that this discussion is no longer relevant to our modeling.

4) Figure 8A. There are two independent accumulators in the "noise component". From my understanding of the model description, in arbitrary decisions, SMA activity was reflected in the traces of the winning accumulator only. If that is the case, why is the activity of the losing accumulator in the noise component not taken into account, as scalp EEG activity would not be sensitive or selective to one accumulator vs. the other? The same issue holds for the deliberate decisions: if both accumulators in the noise component do not dictate the decision (as the threshold was set to infinity), which accumulator (or both?) was representative of SMA activity? Please clarify.

We thank the reviewer for this comment and agree with it. Hence, in the new modeling algorithm we averaged the activity of the winning and losing accumulators as the reviewer suggested. This did not much change our results.

Reviewer #3:

In the first round of review, several questions were raised. The authors did a very good job in answering most of them. It is now clear that a RP is observed in the arbitrary condition, but not in the deliberate one. There are still, however, two points that remain unclear to me.

We thank Reviewer 3 for acknowledging the improvement of the manuscript, and for agreeing with us that the data clearly show that the RP is observed in the arbitrary condition but not in the deliberate condition (cf. our reply to Reviewer 1). We address the remaining points below.

1) What are the theoretical consequences of these results?

As said above, the authors convincingly show that the RP is absent in the deliberate condition. Although this is not entirely new (some previous studies have reported absence of the BP before voluntary movement depending on the context), this report certainly adds to our understanding of the RP, its origins and functional interpretation. However, as can be read in the Abstract, the goal of the authors is much more than that. But I am not sure about the real theoretical impact of the results. Let me try to explain my trouble.

Libet's original report was that the RP starts before the conscious decision to move. It was thus argued that our intention to move is rather a consequence, not a cause, of the preparatory brain activity. This was taken as an argument that "free-will" is an illusion.

Here, the authors report that there is no RP in the deliberate condition. Hence, the nature of the decision differs between the two conditions. So, what exactly do the authors conclude from that?

– Libet's argument vanishes for the deliberate condition, hence there is no evidence against free will in this context. But do they think it still holds for arbitrary decisions?

– since the RP might simply be accumulated random noise, it is not an indication of a voluntary movement decision, and hence Libet's argument is wrong even for arbitrary movements?

– if deliberate decisions are made in another region X, it might still be that activity in region X starts before conscious detection, but this remains to be explicitly studied.

– something else?

I must confess that I cannot really get the real conclusion the authors want to defend, and they should try to be more explicit on what these results do really imply, and what they do not, for this free-will debate.

Our primary conclusion is that strong claims about the lack of freedom of decisions preceded by the RP are unsound when asserted about deliberate decisions. This follows immediately from our results and has direct implications for the free will debate. This is now further clarified in various places in the Discussion of our paper. However, we also hold, and now make clear in the paper, that it may also be unsafe to conclude that, where the RP precedes action, the action is unfree. The reason is that our model is compatible with and supportive of Schurger’s findings, which provide strong reason to doubt the validity of this inference.

2) Modeling.

Although many points have been clarified in this new version, some still remain a bit unclear.

To account for the choice situation, the authors modified Schurger et al.'s original model (which contained a single accumulator), and implemented two accumulators racing for the response. First, one may wonder why they did not choose the standard competitive version of the leaky accumulator (Usher and McClelland, 2001).

There were two reasons that we opted for the race-to-threshold model over the DDM with dual (upper and lower) bounds. The first was that we think such a race-to-threshold model is more biologically plausible, at least at the neuronal level. Work by Shadlen and others has demonstrated decisions (in random-dot motion, for example) that are reached when neuronal firing rates reach a certain threshold (de Lafuente, Jazayeri et al., 2015). But for a dual-bound model, postsynaptic activity would need to differentially depend on the neuron either reaching a high-enough firing rate or a low-enough one. This seems more difficult to realize biologically than a race to threshold between two neurons or regions based on their firing rates.

Second, if we were to use a model with dual bounds, we would end up with activity that either ramps up toward the upper bound or ramps down toward the lower one. Both of those situations would have had to create the RP, as in the Schurger case. So, we would have had to assume some neural mechanism that takes those increasing and decreasing signals as inputs and outputs a single RP, using, for example, some kind of absolute-value function. This again seems to us less likely to be realized biologically than an implementation of the race-to-threshold model. We now discuss this point in the manuscript.
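A minimal sketch of the race-to-threshold architecture may help here (illustrative only; the function name, parameter values, and the averaging of the two accumulators as an EEG proxy are simplifications of the actual simulation):

```python
import numpy as np

def race_trial(drift_a, drift_b, leak, noise_sd, threshold,
               dt=0.001, t_max=10.0, rng=None):
    """One trial of a race between two independent leaky stochastic
    accumulators. Returns (winner, reaction_time, mean_trace), where
    mean_trace averages the winning and losing accumulators—the proxy
    for simulated scalp-level activity."""
    rng = rng or np.random.default_rng()
    n = int(t_max / dt)
    xa = np.zeros(n)
    xb = np.zeros(n)
    for i in range(1, n):
        na, nb = noise_sd * np.sqrt(dt) * rng.standard_normal(2)
        xa[i] = xa[i - 1] + (drift_a - leak * xa[i - 1]) * dt + na
        xb[i] = xb[i - 1] + (drift_b - leak * xb[i - 1]) * dt + nb
        if xa[i] >= threshold or xb[i] >= threshold:
            winner = "a" if xa[i] >= xb[i] else "b"
            return winner, i * dt, (xa[: i + 1] + xb[: i + 1]) / 2
    return None, t_max, (xa + xb) / 2

# With clearly unequal drifts (standing in for a congruent vs. incongruent
# alternative), the high-drift accumulator should win almost every race:
rng = np.random.default_rng(0)
winners = [race_trial(drift_a=1.0, drift_b=0.1, leak=0.5,
                      noise_sd=0.05, threshold=0.15, rng=rng)[0]
           for _ in range(20)]
```

Because the race is decided by whichever firing-rate-like trace reaches the single threshold first, no absolute-value read-out stage is needed, which is the biological-plausibility point made above.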

Second, it is not completely clear how the averaged data in Figure 9C were actually computed. I guess that, for each "trial", the winning accumulator was chosen (left or right) and all the traces of the winning accumulators were averaged. However, if two accumulators are racing within the "SMA", the real simulated activity of the SMA should be the sum of the two accumulators, not only the winning one. I'm not sure how this would modify the results, but, for coherence, this is the way "SMA" activity should be evaluated in the model.

We thank the reviewer for this comment and agree with it. A similar one was made by Reviewer 2 (Comment 4). As we note in that response, in the new modeling algorithm we averaged (or took half the sum of) the activity of the winning and losing accumulators as the reviewer suggested. This did not much change our results.

Third, I still don't understand how the fit was performed. It is said "[…] we fit our DDMs to our average empirical reaction-times […]" (emphasis mine). If the fit was indeed performed only on the averages, this is non-standard and highly under-constrained. Such models are normally fitted to the whole response-time distributions. Furthermore, there is no quantitative assessment of the fit quality. Comparison of Figure 9 panels A and B suggests that the fit was not very good, especially in the arbitrary condition. Fourth, Figure 9 only shows the activity of the "SMA". However, there is another actor, which is never shown: region "X"… In the deliberate condition, the decision is made based on the activity of this "region", but its dynamics are likely very different from the "SMA" ones. It would be of interest to see the accumulated activity of this region "X" in both the deliberate and arbitrary conditions. A last question concerns what the "SMA" is doing. In the arbitrary condition, it is accumulating random, spontaneous noise. But why is it not doing the same in the deliberate condition (in addition to the accumulation in region "X")? Do the authors assume a form of inhibition of region "X" on the "SMA" to prevent it from accumulating? This part is a bit too "magic", and an explicit, mechanistic explanation would be useful, instead of just claiming that accumulation is done differently as a function of the context/choice (which is vague). Somewhat related to this last point, there seems to be a bit of (simulated) accumulated activity in the deliberate conditions in the "SMA". Where does it come from?

We thank the reviewer for making this point. As we explain in our previous replies to Reviewers 1 and 2, who made similar points, we reconstructed the model following the good comments we received (including this one). In particular, we now fit the entire RT distribution of the model (1,200 samples) to the empirical RT distribution (1,200 samples) and, at the same time, we fit the modeling and empirical consistency rates. So, we fit 4 parameters per condition (16 overall) to these 1,201 points. Hence, our model is no longer under-constrained. This is explained in detail in the Modeling sections of the Materials and methods and Results and discussed in the Discussion. Note that the results of the model stay essentially the same, though the fit to the RT distributions is better.

Detailed responses to the comments by the reviewer now appear in the relevant parts of the manuscript. Briefly, one reason that we do not focus on Region X is that we assume that the activity there is similar to that of the SMA, except that it is an unknown region to which we do not have access (at least using EEG). Hence, the RPs we show in Figure 9B are what our model predicts we would pick up at electrode Cz over the SMA. The activity of Region X is shown in Figure 8B (in green), where it is demonstrated how, in deliberate decisions, it imposes an early stop at different heights on the SMA component. However, another, perhaps more important, reason that we do not dwell too much on Region X is that the DDM model we chose for it was based on convenience and simplicity. The central result for us, the trend instead of an RP in the SMA for deliberate decisions, only requires that the onset of deliberate decisions remain statistically independent of threshold crossings in the DDMs of the SMA.

Besides, I have some more specific points:

Introduction section: "[…] Thus, one could speculate that different findings might be obtained when inspecting the RP in arbitrary compared to deliberate decisions. […]" is still very unspecific.

We thank the reviewer for bringing this to our attention. We have accordingly modified that sentence and that paragraph. We also more generally added further explanations about our motivation and hypothesis.

Introduction section: […] Demonstrating no, or considerably diminished, RP in deliberate decisions would challenge the interpretation of the RP as a general index of internal decision-making.[…] Ok, but the fact that it is not a "general index" does not, de facto, solve Libet's problem: even if reduced, if the RP starts before the conscious decision, the argument is still valid.

We thank the reviewer for this comment. We now discuss this in some detail in the Discussion. Briefly, we think (and the reviewer agrees) that the most plausible interpretation of our results is that the RP is absent in deliberate decisions. However, even if one takes the RP to be merely diminished in deliberate compared to arbitrary decisions, that still goes against its interpretation as reflecting simple motor preparation, because the motor output is the same in both decision types. What is more, our model predicts a slow trend in deliberate decisions that might resemble a heavily diminished RP.

[…] More critically, it would question the generalizability of studies focused on arbitrary decisions to everyday, ecological, deliberate decisions […] This is indeed critical for the functional interpretation of the RP, but this sounds partly orthogonal to the free-will debate.

We disagree with the reviewer on this point. Much of the free will debate focuses on deliberate decisions, especially when the debate pertains to moral responsibility. We now explain this further in the manuscript. In fact, some philosophers define free will as the capacity that allows one to be morally responsible (e.g., Mele, 2006, 2009). And ascribing moral responsibility makes sense only for deliberate decisions.

Subsection “EEG Results: Readiness Potential (RP)” paragraph two: why are the Student's t-tests corrected for multiple comparisons, since only 4 were performed? Does it mean that the authors performed t-tests for all time-points? In that case, a multiple-comparisons correction is, indeed, necessary. But only one t-test is reported! Please clarify.

We did not carry out the t-tests on all time points. However, to our understanding, one should correct for multiple comparisons to make sure that the probability of a type-1 error is not inflated. When performing one t-test, there is a 5% chance of obtaining a false positive. When performing four t-tests, this jumps to ~20% (e.g., Miller, RG. Simultaneous Statistical Inference, 2nd Ed. Springer Verlag, New York, 1981). Thus, we corrected for the four t-tests, which were conducted on the averaged activity. We now clarify this point in the manuscript. Naturally, by correcting for multiple comparisons, we made it harder on ourselves to achieve a statistically significant result. So, our corrected results certainly hold if we remove the correction. As for the non-significant results in the deliberate conditions, these too remain non-significant without the correction. So, correcting did not affect the results we found.
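
The inflation, assuming four independent tests at α = 0.05, can be verified directly:

```python
alpha = 0.05   # per-test significance level
m = 4          # number of t-tests performed

# Probability of at least one false positive across m independent tests
fwer = 1 - (1 - alpha) ** m   # ~0.185, i.e., the ~20% figure cited above

# Bonferroni-corrected per-test threshold keeping the family-wise rate <= alpha
bonferroni_alpha = alpha / m  # 0.0125
```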

In the same section: BF = .332 is not serious evidence for no effect. The authors should not take 3 (or .33, depending on how we compute it) as a threshold beyond which an effect would become "significant".

As the reviewer notes by putting significant in quotation marks, Bayesian statistics is not about significance. It is rather about the likelihood of one model (H1) over another (H0). It is indeed a mistake to consider 0.33 as a ‘significance threshold’, akin to the 0.05 convention in NHST. We consequently did not use the term significant when referring to this BF. Instead, we wrote that there was evidence for the RP being absent in deliberate decisions (that is, evidence for H0). This is in agreement with the accepted convention in Bayesian statistics. According to this convention, BF < 0.1 implies strong evidence for the lack of an effect (i.e., the data are at least 10 times more likely to be observed given H0 than given H1). 0.1 < BF < 0.33 provides moderate evidence for the lack of an effect. 0.33 < BF < 3 suggests insensitivity of the data (anecdotal evidence for the lack or presence of an effect, for 0.33 < BF < 1 or 1 < BF < 3, respectively). 3 < BF < 10 denotes moderate evidence for the presence of an effect (i.e., H1). 10 < BF < 100 implies strong evidence. And BF > 100 suggests extreme evidence for the presence of an effect (Lee and Wagenmakers, 2013. Bayesian Cognitive Modeling: A Practical Course. New York: Cambridge University Press.). Nevertheless, following this comment, and to make our description more accurate, we added the word ‘moderate’ to the sentence.
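
This convention amounts to a simple look-up; the sketch below merely restates the Lee and Wagenmakers (2013) categories listed above, using 1/3 as the conventional boundary:

```python
def bf_evidence(bf):
    """Map a Bayes factor (BF10) onto the Lee & Wagenmakers (2013)
    evidence categories quoted above."""
    if bf < 0.1:
        return "strong evidence for H0"
    if bf < 1 / 3:
        return "moderate evidence for H0"
    if bf < 3:
        return "anecdotal evidence (data insensitive)"
    if bf < 10:
        return "moderate evidence for H1"
    if bf < 100:
        return "strong evidence for H1"
    return "extreme evidence for H1"
```

On this scheme, a BF of 0.332 falls (just) below 1/3 and therefore counts as moderate evidence for H0, which is why we added the word ‘moderate’.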

Subsection “EEG Results: Readiness Potential (RP)” paragraph four: while a BF of .09 is indeed an argument for no effect, a BF of .31 is not really one.

We refer the reviewer to Reply 7 above. A BF of 0.31 means that H0 is more than 3 times more likely than H1, which is considered moderate yet conclusive evidence in favor of H0 (as opposed to inconclusive evidence).

Subsection “Differences in reaction times (RT) between conditions, including stimulus-locked potentials and baselines, do not drive the effect”: the authors discuss at length the potential impact of (or absence of) the chosen baseline. Besides all the rather indirect arguments based on different baselines (none of which is immune to criticism), one analysis suffices to invalidate this argument: the slopes of the linear regressions are, by construction, independent of the baseline (only the intercept is). So the fact that the slopes are more negative for the arbitrary than for the deliberate condition is a strong and indisputable fact, much stronger than all the baseline changes.

We sincerely thank the reviewer for highlighting this point and agree with his view. We have now added this point to the manuscript.

Figure 6: I'm personally convinced that the RP observed in the arbitrary condition is not (only) a contamination by stimulus-locked activities. However, the arguments based on Figure 6 are pretty weak. Indeed, in the time window from about 600 ms to 1200 ms, a negative ramp is observed for all 4 conditions. The response is given in this time window for the arbitrary condition, but not for the deliberate one. So, this stimulus-locked negative ramp likely contributes to the RP.

We are glad to learn that the reviewer has been convinced by our results and analyses that the RP observed in the arbitrary condition is not (only) a contamination by stimulus-locked activities. Nevertheless, to the reviewer's point, at the descriptive level the observed negative ramp seems to start ~750 ms before movement onset in the hard, arbitrary condition and ~850 ms before it in the easy, arbitrary condition. As response onset was at about 1 s, this leaves a difference of only 150-250 ms between stimulus onset and the onset of this proclaimed ramp. The RP, on the other hand, started ~1.2 s before movement onset, so it is less likely that the two represent the same component. What is more, the amplitude of the RP was 2 µV, while that of this ramp was 0.8-1.2 µV at the most. We therefore think that this pattern—which differs in both amplitude and timing—cannot explain the RP.

Minor Comments:

Subsection “Experimental Procedure”, fourth paragraph: "right and left index finger" -> "left and right index finger" to be more consistent with the rest of the text.

Thank you. This has been corrected.

Subsection “Experimental Procedure”, final paragraph: "[…] We wanted to make sure subjects were carefully reading and remembering the causes also during the arbitrary trials to better equate memory load, attention, and other cognitive aspects between deliberate and arbitrary decisions […]" Although adding a task is a good idea, it may sound a bit naive to say that the tasks were equated. For example, in the arbitrary decision, there is a short-term memory component that is not present in the deliberate one.

Thank you. We now clarify further that we do not think that the tasks can be completely equated.

Subsection “ERP analysis”: "offline to the average of all channels,": including mastoids and nose? I guess not… At least, this should not be the case! Please clarify.

We now clarify that the averaging does not include external electrodes. We thank the reviewer for raising this point.

Subsection “ERP analysis”: "which subjects pressed the wrong button": What is a wrong button? An inconsistent response? And in the arbitrary condition?

This does not refer to inconsistent responses. Rather, it refers to pressing a button that is not one of the designated response buttons (that is, not the <Q> or <P> buttons). We explain this better in the current version of the text. Thank you.

Subsection “ERP analysis”: "Channels that consistently had artifacts were replaced using interpolation (4.2 channels per subject, on average": Although this is within the range acceptable by some standards, I personally find this value high. Furthermore, could we have the range of the number of channels interpolated?

We thank the reviewer for catching this mistake. That number did not take into account subjects for whom no interpolation was made (i.e., with 0 channels interpolated). We corrected that average—to 1.95—and added the range (0-6) as requested.

Subsection “Statistical Analysis”: The authors took 1 point in 10 to re-sample the signal. However, re-sampling requires appropriate anti-aliasing filtering to avoid signal distortion. Data were acquired at 512 Hz; Biosemi's anti-aliasing filter, if I'm not mistaken, should be around 100 Hz. Since no other low-pass filtering was applied, the data contain signal up to 100 Hz. Hence, (re)sampling at 50 Hz a signal whose maximum frequency is around 100 Hz is extremely problematic. At minimum, a 25 Hz low-pass filter should have been applied… It is very hard to anticipate what the impact of such aliasing would be (especially since the activity of interest is low frequency), but this should be corrected to avoid having incorrect practices published.

We thank the reviewer for making this good point and agree with it. We reran this analysis with a 25 Hz low-pass filter and the results remained the same. This is now reported in the manuscript.
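
To make the reviewer's concern concrete, here is a minimal illustration (with made-up frequencies): a 40 Hz component that survives a ~100 Hz acquisition filter becomes, after naive resampling at 50 Hz, numerically indistinguishable from a 10 Hz component of opposite sign, i.e., it aliases into exactly the low-frequency band of interest.

```python
import math

fs = 50.0        # naive resampling rate (Hz)
f_signal = 40.0  # a component present in the unfiltered data (< 100 Hz)
f_alias = 10.0   # |40 - 50| Hz: where that component lands after aliasing

samples_40hz = [math.sin(2 * math.pi * f_signal * i / fs) for i in range(100)]
samples_10hz = [-math.sin(2 * math.pi * f_alias * i / fs) for i in range(100)]

# The two sample sequences are identical up to floating-point error
max_diff = max(abs(a - b) for a, b in zip(samples_40hz, samples_10hz))
```

A low-pass filter below the Nyquist frequency of the target rate (here, 25 Hz) removes the 40 Hz component before it can fold down, which is precisely what the added filter accomplishes.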

Subsection “Model and Simulations”: "used Δt = 0.001, similar to our EEG sampling rate": if the sampling rate is 512 Hz, dt should be 0.002.

We removed the “similar to our EEG sampling rate” from the sentence and thank the reviewer for catching this mistake. To be clear, rerunning the modeling with Δt = 0.002 does not change any of the essential modeling results.

[Editors' note: a further round of revisions was suggested prior to acceptance.]

The manuscript has been substantially improved. As part of this peer review trial, as reviewing editor I am required to indicate whether all of the reviewer comments have been fully addressed or if minor or major issues remain. There are some minor comments arising from this latest round of reviews that I thought I would give you the opportunity to address prior to finalising the 'Accept' decision so that I can then potentially indicate that all reviewer concerns were addressed. I have outlined these below. If you prefer to expedite the publication and not address these comments you can let me know.

1) Table 1. As expected, the difference of drift rates between congruent and incongruent options was larger in the deliberate than the arbitrary conditions. Could the authors comment on the large difference in the noise scaling factor c, which was 10-fold between the two types of decisions? The second result I found difficult to conceptualize was the decay rate k, which doubled in the easy-deliberate than in the hard-deliberate condition. Given that task difficulty was randomized across trials, doesn't this imply that the model (and the participants) adjusted the decay rate according to task difficulty prior to trial onset?

We thank the reviewers for this comment. Note that the noise scaling factors in deliberate decisions were 35-45% of those in arbitrary decisions (and not 10-fold smaller). We think this makes sense, and—as per the reviewer’s suggestion—we now discuss this, the leak (or decay rate), and the reasons for these values in the manuscript. Briefly, following the reviewer’s comment, we reran the model simulations for the deliberate-hard condition and found that there are two regions in parameter space where the error reaches a local minimum. The first is the one we reported before, (Icongruent, Iincongruent, k, c) = (0.15, 0.07, 0.21, 0.09). The second, with a somewhat smaller error value, is (Icongruent, Iincongruent, k, c) = (0.18, 0.09, 0.53, 0.11). We therefore now report the values corresponding to the smaller error in the table and elsewhere, but we mention the values associated with the other local minimum too. We agree with the reviewer that the current parameter values make more sense and thank them again for pointing this out to us. We further reran the simulations depicted in Figure 9 based on the new parameter values for deliberate-hard decisions. Interestingly, the slight difference in trends between deliberate easy and hard in Figure 9B matches that in the empirical data (Figure 3A, B). We now note this briefly in the text.

2) Figure 9A. It is more meaningful to plot the empirical and simulated RT distributions, rather than their fitted γ functions.

As we note in our manuscript, our purpose was to extend the Schurger model to our experimental conditions (Schurger et al., 2012). We therefore followed the same procedure they did, including plotting the γ-function fits instead of the empirical distributions. And we think these should be shown in the paper. However, the reviewer’s point is well taken. So, we decided to plot both. Hence, we added a large inset to Figure 9, where we plot the original data, without fitting it to a γ function.

3) In several instances the authors use the term 'decision onset' when referring (I think) to the completion of the decision. This is potentially confusing because for many readers 'decision onset' may refer to the beginning of deliberation/evidence accumulation which means something entirely different. So I would suggest the authors check their terminology and use 'decision completion' or 'commitment' in such instances.

We see how this could be confusing and are happy to change “decision onset” to “decision completion”. This was now changed in all places in the manuscript.

Minor comments:

1) Subsection “Behavioral Results”. DDN.

Fixed. Thank you.

2) Figure 9A. Why was the y-axis labelled as voltage, for RT distributions?

This was a mistake. We thank the reviewer for noticing this and have removed the erroneous y-axis label from Figure 9A.

3) Subsection “Model and Simulations” third paragraph and Table 1. I am confused about the scaling parameter 1.45. Does Table 1 show the drift rates only in Region X, and are the drift rates in SMA 1.45 times less than those values? The text and table indicated that the scaling applied only to the deliberate condition, if so, what were the drift rates in SMA in the arbitrary condition? Or do the drift rates in arbitrary decisions in Table 1 refer to the values in SMA?

We thank the reviewer for this comment and now clarify this point further in the manuscript. In particular, as we now hopefully better explain in the caption to Table 1, the values in the table are for Region X for deliberate decisions and for the SMA in arbitrary decisions. The values for the drift-rate parameter in the SMA during deliberate decisions are indeed 1.45 times smaller than those in the table, as the reviewer notes.

https://doi.org/10.7554/eLife.39787.018

Article and author information

Author details

  1. Uri Maoz

    1. Department of Psychology at Crean College of Health and Behavioral Sciences, Chapman University, Orange, United States
    2. Institute for Interdisciplinary Brain and Behavioral Sciences, Chapman University, Orange, United States
    3. Anderson School of Management, University of California, Los Angeles, Los Angeles, United States
    4. Department of Psychology, University of California, Los Angeles, Los Angeles, United States
    5. Division of Humanities and Social Sciences, California Institute of Technology, Pasadena, United States
    6. Division of Biology and Bioengineering, California Institute of Technology, Pasadena, United States
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    For correspondence
    urimaoz@ucla.edu
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0002-7899-1241
  2. Gideon Yaffe

    Yale Law School, Yale University, New Haven, United States
    Contribution
    Conceptualization, Funding acquisition, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
  3. Christof Koch

    Allen Institute for Brain Science, Seattle, United States
    Contribution
    Conceptualization, Supervision, Methodology, Writing—review and editing
    Competing interests
    No competing interests declared
  4. Liad Mudrik

    1. Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
    2. School of Psychological Sciences, Tel Aviv University, Tel Aviv, Israel
    Contribution
    Conceptualization, Data curation, Software, Formal analysis, Funding acquisition, Investigation, Visualization, Methodology, Writing—original draft, Writing—review and editing
    Competing interests
    No competing interests declared
    ORCID iD: 0000-0003-3564-6445

Funding

John Templeton Foundation (BQFW initiative to FSU)

  • Uri Maoz
  • Gideon Yaffe
  • Christof Koch

Ralph Schlaeger Charitable Foundation

  • Uri Maoz
  • Christof Koch

Bial Foundation (388/14)

  • Uri Maoz
  • Liad Mudrik

German-Israeli Foundation for Scientific Research and Development (I-2426-421.13/2016)

  • Liad Mudrik

John Templeton Foundation (Consciousness and Free Will: A Joint Neuroscientific-Philosophical Investigation)

  • Uri Maoz
  • Gideon Yaffe
  • Liad Mudrik

Fetzer Institute (Consciousness and Free Will: A Joint Neuroscientific-Philosophical Investigation)

  • Uri Maoz
  • Gideon Yaffe
  • Liad Mudrik

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Acknowledgements

The experiments reported in this paper were carried out at the Caltech Brain Imaging Center. We thank Ralph Adolphs for his invaluable guidance and support in designing and running the experiment as well as for very useful discussions of the results. We thank Ram Rivlin for various conceptual discussions about deliberate versus arbitrary decision-making and about the initial experimental paradigm design. We thank Caitlin Duncan for her help in patiently and meticulously gathering the EEG data. We thank Daw-An Wu for discussions about EEG data collection and preprocessing and for his help with actual data collection. We thank Daniel Grossman for his help in carefully preprocessing the data and suggesting potential interpretations of it. We thank Aaron Schurger and Ueli Rutishauser for various discussions about the model and its simulations. We thank Shlomit Yuval-Greenberg and Leon Deouell for important discussions about EEG processing and analysis. Last, we thank the anonymous reviewers for their invaluable comments, which greatly improved this manuscript. 

Ethics

Human subjects: The experiment was approved by Caltech's Institutional Review Board (14-0432; Neural markers of deliberate and random decisions), and informed consent was obtained from all participants after the experimental procedures were explained to them.

Senior Editor

  1. Joshua I Gold, University of Pennsylvania, United States

Reviewing Editor

  1. Redmond G O'Connell, Trinity College Dublin, Ireland

Reviewers

  1. Redmond G O'Connell, Trinity College Dublin, Ireland
  2. Jiaxiang Zhang, Cardiff University, United Kingdom
  3. Boris Burle, Aix-Marseille University, France

Publication history

  1. Received: July 3, 2018
  2. Accepted: October 3, 2019
  3. Version of Record published: October 23, 2019 (version 1)

Copyright

© 2019, Maoz et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
