Contrast comparison task and behavioural results.

(A) Stimulus and evidence time courses: The stimulus consisted of two interleaved gratings flickering on and off in alternation, whose relative contrast had to be judged. 0.6 sec after the participant achieved fixation and thereby triggered stimulus onset, the fixation point turned yellow to cue participants to prepare for the onset of contrast-difference evidence another 0.6 sec later. In all trials, the stimulus remained on for a full 1.6 sec following evidence onset, after which it disappeared and the fixation point turned green to prompt the participant to make a left/right-hand button press, within a 0.5-sec deadline, indicating whether the left- or right-tilted grating had the higher contrast. In 80% of trials, a weak (±10%) contrast increment/decrement was applied in opposite directions to the two gratings for 0.2, 0.4, 0.8, or 1.6 sec, returning to equal baseline contrast thereafter until stimulus offset. In the remaining 20% of trials, an easy contrast increment/decrement of ±40% was applied for the entire 1.6-sec period. (B) Accuracy increased significantly with stimulus duration for low-contrast trials (blue dots) and was at ceiling for the high-contrast condition (red dot). Individual participants’ accuracies are shown as grey lines. (C) The mean reaction time (RT; relative to evidence onset) decreased slightly in trials with longer durations and higher contrast levels. Correct responses (solid, with individuals also shown in grey) were also slightly faster than error responses (dashed) in this delayed-response task. For B and C, data are mean ± s.e.m. after between-participant variance was factored out.
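The within-subject error bars referred to above can be computed by removing each participant's overall offset before taking the s.e.m. A minimal sketch of one common normalisation (Cousineau-style; the exact procedure used in the original analysis may differ):

```python
import numpy as np

def within_subject_sem(data):
    """s.e.m. per condition after factoring out between-participant variance.

    data: (n_participants, n_conditions) array of per-participant means.
    Sketch of a Cousineau-style normalisation, not necessarily the authors'
    exact procedure: subtract each participant's own mean and add back the
    grand mean, so only within-participant variability remains.
    """
    data = np.asarray(data, dtype=float)
    n_participants = data.shape[0]
    normalised = data - data.mean(axis=1, keepdims=True) + data.mean()
    return normalised.std(axis=0, ddof=1) / np.sqrt(n_participants)
```

If participants differ only by a constant offset, the normalised s.e.m. is zero, as intended.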

Alternative models and their fits to behavior only.

(A) Model schematics. In the unbounded integration model (‘UIntg’; top left), the sign of the accumulated evidence at stimulus offset determines the choice, while in bounded integration (‘BIntg’; top right), the decision is either terminated early by reaching the correct or error bound or otherwise determined in the same way as in the unbounded model. In the unbounded Extremum-tracking model (‘UExtE’; middle left), the decision is based on the sign of the evidence sample that attains the maximum absolute value, which, by making use of the full stimulus-assessment period, can produce accuracy improvements with increasing evidence duration. In the bounded Extrema-detection models (middle right), a choice is determined as soon as a single sample exceeds a bound or, if neither bound is reached, the choice is determined by: i) a random guess (‘BExtG’); ii) the sign of the last sample (‘BExtL’); or iii) the sign of the sample that attains the most extreme value (‘BExtE’). Finally, the Snapshot model, in which the choice is based on a single sample chosen at random (uniform distribution over time), is shown in the bottom row. In the unbounded version (‘USpst’), the decision is made based on the sign of the sample. In the bounded version (‘BSpst’), the decision is based on whether the sample surpasses the bound; otherwise, the decision is a guess. (B) Observed (black dots) and model-predicted (bars) accuracy, showing that the five best-fitting models capture the observed accuracy improvement with duration well. (C) AIC values across the five most competitive models show that models without bounds are favored due to parsimony, and that selection among integration and non-integration mechanisms is largely inconclusive. Given that we computed G2 values through Monte Carlo simulation, we conducted the fits with 10 different instantiations of noise, each shown here as a separate point.
These clustered points depict the variability of the G2 values due to the stochastic nature of these simulations, underlining how the small margins among the best models in Table 1 are inconclusive (see Methods).
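The decision rules distinguishing these models can be summarised compactly. The sketch below assumes the evidence arrives as a sequence of signed noisy samples (positive favouring the correct choice); the names follow the figure labels, but the tie-breaking and timing conventions are illustrative assumptions, not the fitted implementations:

```python
import numpy as np

def simulate_choice(evidence, model, bound=np.inf, rng=None):
    """Return +1/-1 choice from a 1-D array of signed evidence samples.

    Model names follow the figure; ties at exactly zero are broken toward
    +1 here, an assumption for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    if model == "UIntg":                       # unbounded integration: sign at offset
        return np.sign(evidence.cumsum()[-1]) or 1.0
    if model == "BIntg":                       # bounded integration: first bound hit
        dv = evidence.cumsum()
        hits = np.flatnonzero(np.abs(dv) >= bound)
        return np.sign(dv[hits[0]] if hits.size else dv[-1]) or 1.0
    if model == "UExtE":                       # unbounded extremum-tracking
        return np.sign(evidence[np.argmax(np.abs(evidence))]) or 1.0
    if model in ("BExtG", "BExtL", "BExtE"):   # bounded extrema detection
        hits = np.flatnonzero(np.abs(evidence) >= bound)
        if hits.size:
            return np.sign(evidence[hits[0]])
        if model == "BExtG":                   # sub-bound: random guess
            return rng.choice([-1.0, 1.0])
        if model == "BExtL":                   # sub-bound: sign of last sample
            return np.sign(evidence[-1]) or 1.0
        return np.sign(evidence[np.argmax(np.abs(evidence))]) or 1.0
    if model == "USpst":                       # snapshot, unbounded
        return np.sign(evidence[rng.integers(evidence.size)]) or 1.0
    if model == "BSpst":                       # snapshot, bounded; guess if sub-bound
        s = evidence[rng.integers(evidence.size)]
        return np.sign(s) if abs(s) >= bound else rng.choice([-1.0, 1.0])
    raise ValueError(model)
```

Note how the bounded extrema-detection variants share a first stage and differ only in how sub-bound trials are resolved, which is why their fits can be so close.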

Goodness of fit metrics and parameter estimates of models fit to behaviour only.

Models whose names begin with U were unbounded, and those beginning with B were bounded, with the bound parameter denoted by B. Drift rates for the low and high contrast differences were denoted by D1 and D2, respectively; single-drift-rate variants (denoted ‘_1d’) of the unbounded models provided a worst-case benchmark for each mechanism. G2 quantifies how closely the model fits the data, while Akaike’s Information Criterion (AIC) adds a penalty for complexity. For both metrics, lower values indicate better fits. Colour shading highlights the competitive models depicted in Figure 2C.
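With G2 standing in for the deviance, the complexity penalty described above reduces to adding twice the number of free parameters. A sketch, assuming G2 plays the role of −2 log L up to a model-independent constant (see Methods of the original work for the exact formulation):

```python
def aic(g2, n_params):
    """Akaike's Information Criterion from a G^2 deviance statistic.

    AIC = G^2 + 2k, where k counts the free parameters (e.g. B, D1, D2).
    Lower is better; the +2k term is what penalises the bounded models
    relative to their unbounded counterparts at similar G^2.
    """
    return g2 + 2 * n_params
```

For example, a bounded model with one extra parameter must improve G2 by more than 2 to achieve a lower AIC than its unbounded counterpart.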

Neural signals.

(A) The SSVEP robustly encoded the sensory evidence. Note that the signal ramps rather than suddenly steps simply due to the width of the window for spectral estimation (230 ms), and lags the physical contrast changes due to transmission delays to visual cortex. Topographies show the average SSVEP amplitude from 300 ms to 1500 ms (shaded period) for the high-contrast condition (top) and the longest low-contrast condition (bottom), with a primary focus at standard site Oz. (B) Event-related potential (ERP) at centroparietal sites averaged over trials (regardless of outcome) within each condition, showing that the centroparietal positivity (CPP) undergoes a gradual buildup and sustained elevation in the lower-contrast conditions, and a peak at around 530 ms, at more than twice the amplitude, in the high-contrast trials. Topographies show ERP amplitude for the high-contrast condition (top) and the average over low-contrast conditions (bottom), for the shaded time ranges 210-530 ms (left) and 1280-1600 ms (right). (C) Motor preparation was measured by subtracting ipsilateral from contralateral mu/beta-band activity (MB; 8-30 Hz) with respect to the correct side. The topographies show the difference between trials favouring a left versus a right response, again shown for both the high-contrast (top) and averaged low-contrast (bottom) conditions in the shaded windows 570-890 ms (left) and 1280-1600 ms (right).

Neurally constrained models and their predictions.

Among the models with various additional parameters, starting-point variability and collapsing bounds stood out as helping to achieve good CPP waveform fits without substantially sacrificing behavioural fits (Figure 4 - Figure Supplement 1), and given that both of these features were exhibited empirically in motor preparation signal effects (Figure 5), we show these models here (but see Figure 4 - Figure Supplements 4-6 for basic versions of the models without the extra parameters). (A) Single-trial sensory evidence traces were simulated for each of the five conditions using boxcar functions (stepping up during positive evidence) with added Gaussian noise, as in the behaviour-only models (one example for each condition shown). (B-D) Single-trial examples of the decision variable in each of the three competing models, again simulating one trial per condition. (B) In the Integration model, the evidence was integrated over time until it reached a bound, after which it fell back to zero. Where the decision variable did not hit the bound, the decision was made based on the sign of the accumulated evidence at 1.6 sec. (C) In the Extremum-tracking model, the DV kept track of the most extreme value so far, until it reached the bound, after which it fell to zero. The sign of the extremum determined the choice, and the most extreme sub-bound value was used in the event that the bound was not crossed. (D) In the Extremum-flagging model, the DV is an all-or-nothing half-sine-profile signal marking choice termination, onsetting at the bound-crossing time (see main text). As in the other models, one example DV trace per condition is shown; for three of these (red and two dark blue traces) the bound was reached, while in the others it was not, so no signal was generated (flat light blue lines), and the choice was made as a random guess. In all three models in this figure, starting-point variability and collapsing-bound parameters are included, and the models have an equal number of free parameters.
(E) Akaike’s Information Criterion (AIC) derived from only the behavioural component of the neurally-constrained fit, plotted as a function of the weighting of the CPP-waveform constraint. With increasing emphasis on neural data in the fit, the fit to behaviour was compromised to varying extents, and the Extremum-flagging model was best at retaining good behavioural fits at higher neural weights. (F) Overall neurally-constrained objective function (behavioural G2 value plus a weighted neural penalty quantifying divergence between observed and simulated CPP) as a function of neural constraint weighting. The inset compares the objective function values with starting-point variability and collapsing-bound parameters (labelled ‘Extra’ in G) to the ‘basic’ models without either feature, at w = 10. (G) Neural fit (R2) plotted against behavioural fit (G2), illustrating a trade-off between how well the models can capture behaviour and the CPP up to w = 100. The closer the joint values come to the origin, the better the model is at simultaneously fitting both. (H-J) Predicted behavioural accuracy for a range of neural constraint weightings (‘w’). (K-M) Simulated average CPP (where the DV is rectified on each single trial in keeping with the CPP’s always-positive nature; see Methods) for the best fits of each model at a neural weighting of w = 10 (see Figure 4 - Figure Supplement 2 for simulations from all weightings, and Figure 4 - Figure Supplement 3 for predicted bound-crossing frequencies). The simulated average CPP (solid lines) is shown for all five conditions. The empirical data are also shown (dashed lines), with the waveforms for the four low-contrast conditions averaged together, as they were for the neurally-constrained fitting. (N-P) Simulated differential motor preparation. Same as (K-M) except showing the trial-averaged differential DV (without taking the absolute value) and having the signal sustain at the bound level after it is crossed.
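The single-trial DV dynamics described for the Integration model (panel B) can be sketched as a bounded random walk with a linearly collapsing bound and uniform starting-point variability. All parameter names and values here are illustrative assumptions, not the fitted estimates:

```python
import numpy as np

def simulate_dv_integration(drift, duration, dt=0.01, bound=1.0,
                            collapse_k=0.2, sz=0.0, noise_sd=1.0, rng=None):
    """Single-trial DV for the Integration model (sketch, not the fitted model).

    Returns (dv_trace, choice). The DV falls back to zero after a bound hit,
    as in the figure; sub-bound trials are decided by the sign at offset
    (ties broken toward +1, an assumption for illustration).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = int(round(duration / dt))
    t = np.arange(n) * dt
    b = np.maximum(bound - collapse_k * t, 0.0)      # linearly collapsing bound
    dv = np.empty(n)
    dv[0] = rng.uniform(-sz, sz) * bound             # starting-point variability
    for i in range(1, n):
        dv[i] = dv[i - 1] + drift * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
        if abs(dv[i]) >= b[i]:                       # early termination at the bound
            choice = np.sign(dv[i])
            dv[i + 1:] = 0.0                         # DV falls back to zero
            return dv, choice
    return dv, (np.sign(dv[-1]) or 1.0)              # no hit: sign at offset
```

The Extremum-tracking variant would replace the cumulative sum with a running `max` by absolute value; the bound-hit and fall-to-zero logic is shared.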

Fit metrics and parameter estimates for neurally-constrained models with a neural-constraint weighting of w = 10.

B refers to the bound; D1 and D2 refer to the drift rates in the low- and high-contrast conditions, respectively. FlagWidth refers to the width of the half-sine, Sz to starting-point variability, and K to the temporal slope of the collapsing bound (see Table S3 for fits using a nonlinear collapse function, with similar findings). Models including starting-point variability and collapsing bounds are distinguished from basic models without those features by the model-name ending ‘_szC’.
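The bound dynamics governed by K can be written explicitly. The linear form follows the parameterisation described above (slope K on an initial height B); the nonlinear variant shown here is a hyperbolic stand-in, an assumption for illustration, since the exact collapse function is specified in Table S3 rather than here:

```python
import numpy as np

def collapsing_bound(t, b0, k, linear=True):
    """Bound height B(t) from initial height b0 and collapse parameter k.

    Linear: B(t) = max(b0 - k*t, 0), matching the temporal-slope
    parameterisation K in the table. Nonlinear: a hyperbolic collapse,
    b0 / (1 + k*t) -- an assumed stand-in for the 'NL' variant.
    """
    t = np.asarray(t, dtype=float)
    if linear:
        return np.maximum(b0 - k * t, 0.0)
    return b0 / (1.0 + k * t)
```

Normalising either curve by `b0` gives the unit-height traces plotted in panel B of the bound-dynamics figure.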

Observed and simulated motor preparation and bound dynamics.

(A) Mu/beta (MB) decreases reflecting building motor preparation, contralateral and ipsilateral to the correct response, in correct (top) and error (bottom) trials, shown aligned to evidence onset (left) and response (right). Contralateral traces on correct trials, and ipsilateral traces on error trials, reach a fixed threshold prior to response (marked by the horizontal grey dashed line to aid comparison). Interestingly, on easy, high-contrast trials, this threshold is reached at approximately 800 ms and MB sustains at just below that level until the response. Note that on high-contrast trials, contralateral MB appears to reach a less decreased level prior to response compared to low-contrast trials. The topography inset shows the difference in MB between high-contrast and low-contrast conditions, revealing that this difference is unlikely to be motor-related and more likely reflects greater attentional engagement in the harder low-contrast trials, known to be reflected in posterior alpha activity (Foxe & Snyder, 2011; Kelly & O’Connell, 2013). (B) Collapsing bounds estimated from the empirical data and the model fits. Linear (solid) and nonlinear (‘NL’; dashed) collapsing bounds were calculated for the Integration and Extremum-flagging models. All traces were normalized to the initial bound height. (C) Observed mu/beta lateralization for correct and error trials, demonstrating baseline choice-predictive effects that arise from starting-point variability. This can be compared with the simulated average relative motor preparation at a neural-constraint weighting of w = 10 for the Integration (D) and Extremum-flagging (E) models. Note again that the models do not include non-decision delays, so evidence-dependent dynamics begin at time zero in the simulated traces, whereas they are delayed as usual in the empirically observed data. All simulations are from models containing both starting-point variability and collapsing bounds.