Response inhibition, generally defined as the ability to suppress a planned or already-initiated response (Logan, 1985), is an essential part of everyday motor control, and underpinned by a series of cortical and subcortical pathways. Defining the neural mechanisms underlying response inhibition in the neurotypical population has important consequences in the clinical neurosciences, where impairment in these pathways has been associated with a number of neurological and psychiatric diseases including Parkinson’s Disease, addiction and schizophrenia (Chowdhury et al., 2018; Claassen et al., 2015; Congdon et al., 2014; Noël et al., 2016; Rømer Thomsen et al., 2018; Seeley et al., 2009).

Response inhibition has been behaviourally examined using the stop signal task (SST) for more than four decades. In the SST, participants make a motor response as quickly as possible in response to a go signal. In a minority of trials (usually around 25% of all trials), a stop signal appears shortly after the onset of the go signal, indicating that the participant should not respond to the go signal in that trial. The stop signal’s onset is normally adjusted after each stop signal trial based on stopping success, such that each participant will be able to stop successfully on approximately 50% of trials (Verbruggen et al., 2019). Behavioural dynamics during the SST are interpreted under the framework of the horse race model (Logan & Cowan, 1984). This model proposes that on each stop trial, the presentation of the go stimulus triggers the go process, which races towards a threshold that results in a response. Upon the presentation of the stop signal, a stop process is similarly triggered, which races towards an independent threshold. Depending on whether the go or stop process finishes first, the response is respectively performed or inhibited. Performance on go trials and failed stop (FS) trials (where the participant makes an inappropriate response) is quantified by reaction time (RT). Performance on successful stop (SS) trials (where the stop signal appears and the participant gives no response) is quantified by the stop signal reaction time (SSRT), which estimates the speed of the latent stopping process (Verbruggen et al., 2019).

Contemporary models of response inhibition propose that inhibition is realised via three cortico-basal-ganglia pathways; the direct, indirect, and hyperdirect pathways. While all three are involved in response inhibition and movement, the hyperdirect pathway has been theorized to be the pathway through which action is ultimately cancelled (Aron and Poldrack, 2006). The signalling cascade originates from the prefrontal cortex and is thought to implement stopping upon detection of a stop signal by inhibition of the thalamus (Tha) via the subthalamic nucleus (STN), substantia nigra (SN) and globus pallidus interna (GPi; Coudé et al., 2018; Diesburg & Wessel, 2021). This pathway was originally identified in rodents and non-human primates, but its anatomical plausibility in humans was demonstrated by Chen and colleagues, who measured firing in the frontal cortex 1-2 ms after stimulation of the STN (Chen et al., 2020). The connectivity of these cortico-basal-ganglia tracts have also been demonstrated to be correlated with stopping behaviour (Forstmann et al., 2012; Singh et al., 2021; M. Xu et al., 2016; F. Zhang & Iwaki, 2020). Clinical studies have also demonstrated the importance of subcortical regions, particularly the STN, in relation to stopping. Evidence from Parkinson’s disease patients undergoing deep brain stimulation has associated the STN with (successful) stopping behaviour (Alegre et al., 2013; Ray et al., 2009, 2012) and demonstrated that bilateral stimulation of this region can improve performance in the SST (Mancini et al., 2019).

Functional imaging research has been used extensively to elucidate which regions are associated with action cancellation. These images are frequently acquired at 3 Tesla (T) and the BOLD response during go (GO), FS, and SS trials is assessed. Contrasts of interest are often FS > GO, SS > GO, and FS > SS. Cortically, three regions have been consistently implicated: the right inferior frontal gyrus (rIFG), pre-supplementary motor area (preSMA) and anterior insula (Aron et al., 2014; de Hollander et al., 2017; Isherwood et al., 2023; Miletić et al., 2020; Swick et al., 2011). In the subcortex, functional evidence is inconsistent. Contrasts of FS > GO or SS > GO have intermittently implicated the STN in the SST (Aron & Poldrack, 2006; Coxon et al., 2016; Yoon et al., 2019), but few studies have detected differences in the BOLD response between FS and SS trials. Notably, many studies have not detected differences in subcortical BOLD activity for these contrasts (see e.g., Bloemendaal et al., 2016; Boehler et al., 2010; Chang et al., 2020; Gaillard et al., 2020; B. Xu et al., 2015). At higher field strengths (7T), a different pattern emerges. FS > GO and FS > SS contrasts have consistently shown a widespread subcortical BOLD response, with clusters of activation evident in the STN, SN, Tha, GPi, and globus pallidus externa (GPe; de Hollander et al., 2017; Isherwood et al., 2023; Miletić et al., 2020).

These disparities may arise due to the difficulty of imaging the human subcortex in vivo with MRI. Due to its distance from the MR head coils, proximity of subregions and varying biophysical properties, the subcortex can have limited inter-regional contrast and a low signal-to-noise ratio (Bazin et al., 2020; de Hollander et al., 2017; Isaacs et al., 2018; Isherwood, Bazin, et al., 2021; Keuken et al., 2018; Miletić et al., 2022). The subcortical regions are also relatively small structures. For example, the STN has a volume of approximately 82 mm3; 3mm isotropic resolutions therefore provide only 3 – 4 voxels for analysis of a relatively complex structure (Alkemade et al., 2020). Attaining sufficient signal in the deep brain for accurate statistical analysis is therefore particularly difficult at lower field strengths (Murphy et al., 2007).

Here, we reprocess and reanalyse five functional SST datasets to shed light on the discrepancies in subcortical BOLD responses. Canonical methods of meta-analysis have the tendency to lose information when compiling multiple sources of data, due to reliance on summary statistics and a lack of raw data accessibility. Taking advantage of the recent surge in open access data, we aimed to improve upon these methods by using the raw data now available instead of relying on simple summary measures (e.g., MNI coordinates). Though computationally expensive, the gain in power from reanalysing multiple functional datasets without this loss of information is of huge benefit. In addition, using raw data as a starting point for datasets acquired separately allows one to minimize differences in preprocessing and analyses pipelines. We chose datasets that used similar go stimuli (left or right pointing arrows) to maintain as much consistency across the datasets as possible. Stop signals during the SST are generally either of the auditory or visual type; we opted to use both types in this study with the assumption that they rely on the same underlying inhibition network (Ramautar et al., 2006).



This study combined data from five datasets, two acquired at 3T and three at 7T: Aron_3T (Aron & Poldrack, 2006), Poldrack_3T (Poldrack et al., 2016), deHollander_7T (de Hollander et al., 2017), Isherwood_7T (Isherwood et al., 2023), and Miletic_7T (Miletić et al., 2020). The number of participants and their relevant demographics for each dataset are as follows: Aron_3T – 14 participants (4 female; mean age 28.1 ± 4.1), Poldrack_3T – 130 participants (62 female; mean age 31 ± 8.7; age range 21 – 50), deHollander_7T – 20 participants (10 female; mean age 26 ± 2.6; age range 22 – 32), Isherwood_7T – 37 participants (20 female; mean age 26.3 ± 5.6; age range 19 – 39), and Miletic_7T – 17 participants (9 female; mean age 23.7 ± 3.2).

Scanning protocols

This section describes the MR acquisition procedure for each dataset. The main acquisition parameters of the functionals scans can be found in Table 1, with a detailed account of each dataset’s structural and functional scans in the following paragraphs.

The principal MR acquisition parameters of the functional scans for each dataset.

For the Aron_3T dataset, each participant was scanned on a Siemens Allegra 3T scanner. The session consisted of three functional runs of the SST and an anatomical T1w image. The functional data was collected using a single echo 2D-echo planar imaging (EPI) BOLD sequence (TR = 2000 ms; TE = 30 ms; voxel size = 3.125 x 3.125 x 4 mm; flip angle = 90°; Field of View (FOV) = 200 x 200 x 132 mm; matrix size = 64 x 64; slices = 33; phase encoding direction = A >> P). A 1 mm isotropic T1w image was acquired during each session using the MPRAGE sequence (TR = 2300 ms; TE = 2.1 ms; matrix size = 192 x 192).

For the Poldrack_3T dataset, each participant was scanned on a Siemens Trio 3T scanner. The session consisted of one functional run of the SST and an anatomical T1w image. The functional data was collected using a single echo 2D-EPI BOLD sequence (TR = 2000 ms; TE = 30 ms; voxel size = 3 x 3 x 4 mm; flip angle = 90°; FOV = 192 x 192 x 136 mm; matrix size = 64 x 64; slices = 34; phase encoding direction = A >> P). A 1 mm isotropic T1w image was acquired during each session using the MPRAGE sequence (TR = 1900 ms; TE = 2.26 ms; matrix size = 256 x 256).

For the deHollander_7T dataset, each participant was scanned on a Siemens MAGNETOM 7 Tesla (7T) scanner with a 32-channel head coil. The session consisted of three functional runs of the SST, B0 field map acquisition (TR = 1500 ms, TE1 = 6 ms, TE2 = 7.02 ms), and an anatomical T1w image. The functional data was collected using a single echo 2D-EPI BOLD sequence (TR = 2000 ms; TE = 14 ms; GRAPPA = 3; voxel size = 1.5 mm isotropic; partial Fourier = 6/8; flip angle = 60°; FOV = 192 x 192 x 97 mm; matrix size = 128 x 128; BW = 1446Hz/Px; slices = 60; phase encoding direction = A >> P; echo spacing = 0.8ms). Each run had an acquisition time of 13:27 min, totalling 40:21 min of functional scanning. A 0.7 mm isotropic T1w image was acquired during each session using the MP2RAGE sequence (TR = 5000 ms; TE = 2.45 ms; inversions TI1 = 900 ms, TI2 = 2750 ms; flip angle 1 = 5°; flip angle 2 = 3°; Marques et al., 2010).

For the Isherwood_7T dataset, each participant was scanned on a Siemens MAGNETOM TERRA 7T scanner with a 32-channel head coil. The session consisted of two functional runs of the SST, top-up acquisition, and an anatomical T1w image. The functional data was collected using a single echo 2D-EPI BOLD sequence (TR = 1380 ms; TE = 14 ms; MB = 2; GRAPPA = 3; voxel size = 1.5 mm isotropic; partial Fourier = 6/8; flip angle = 60°; FOV = 192 x 192 x 128 mm; matrix size = 128 x 128; BW = 1446 Hz/Px; slices = 82; phase encoding direction = A >> P; echo spacing = 0.8 ms). Each run had an acquisition time of 13:27 min, totalling 26:54 min of functional scanning. Subsequently to each run, five volumes of the same protocol with opposite phase encoding direction (P >> A) were collected (top-up) for distortion correction. A 1 mm isotropic T1w image was acquired during each session using the MP2RAGE sequence (TR = 4300 ms; TE = 1.84 ms; inversions TI1 = 840 ms, TI2 = 2370 ms; flip angle 1 = 5°; flip angle 2 = 6°; Marques et al., 2010).

For the Miletic_7T dataset, each participant was scanned on a Siemens MAGNETOM 7T scanner with a 32-channel head coil. The session consisted of three functional runs of the SST, B0 field map acquisition (TR = 1500 ms, TE1= 6 ms, TE2 = 7.02 ms), and an anatomical T1w image. The functional data was collected using a single echo 2D-EPI BOLD sequence (TR = 3000 ms; TE = 14 ms; GRAPPA = 3; voxel size = 1.6 mm isotropic; partial Fourier = 6/8; flip angle = 70°; FOV = 192 x 192 x 112 mm; matrix size = 120 x 120; BW = 1436 Hz/Px; slices = 70; phase encoding direction = A >> P; echo spacing = 0.8 ms). A 0.7 mm isotropic T1w image was acquired during each session using the MP2RAGE sequence (TR = 5000 ms; TE = 2.45 ms; inversions TI1 = 900 ms, TI2 = 2750 ms; flip angle 1 = 5°; flip angle 2 = 3°; Marques et al., 2010).

Procedure and exclusions

Participants that were not accompanied by a T1w anatomical image were automatically excluded from the study as the image is required for registration during preprocessing. In addition, the behavioural data of each participant from each database were quality controlled on the basis of a specific set of exclusion criteria. These criteria include (1) more than 10% go-omissions across all functional runs; (2) a stopping accuracy of less than 35% or more than 65%; (3) a go-accuracy of less than 95%; (4) mean signal respond RTs that were longer on average than go RTs (inconsistent with the standard race model). Based on these criteria, no subjects were excluded from the Aron_3T dataset, 24 from the Poldrack_3T dataset, three from the deHollander_7T dataset, five from the Isherwood_7T dataset, and two from the Miletic_7T dataset. A further nine participants were excluded from the Poldrack_3T dataset due to a lack of T1w image or a lack of SST data. As the specific genders and ages of each participant in each dataset are not all available due to General Data Protection Regulations (GDPR), we were unable to recalculate participant demographics after exclusions. The final number of participants in each dataset after screening is as follows: Aron_3T, 14 participants; Poldrack_3T, 97 participants; deHollander_7T, 17 participants; Isherwood_7T, 31 participants; Miletic_7T, 15 participants. Therefore, the analyses in this paper are based on stop signal data from 5 datasets, 174 participants and 293 runs.

Stop signal task (SST)

All datasets used a simple, two alternative choice stop signal paradigm. This paradigm consists of two trial types, go trials, and stop trials. On each trial, an arrow is presented on the screen in either the left or right direction (the go stimulus). The participant presses the button corresponding to the direction of the arrow. On a subset of trials (25%), a stop signal appears shortly after go signal onset, indicating the participant should try to inhibit their movement and not respond in that trial. In the auditory SST, this stop signal is presented as a ‘beep’ sound. In the visual SST, this stop signal is presented as a change in visual stimulus; for example, in the Isherwood_7T dataset, the circle surrounding the arrow would change from white to red. The time between the presentation of the go stimulus and the stop signal is defined by the stop signal delay (SSD). The SSD is adapted iteratively during the task. Generally, if the participant responds during a stop trial, the SSD is reduced by 50 ms on the next stop trial, meaning the stop signal will appear earlier in the next trial and it will be easier for the participant to inhibit their response. Conversely, if the participant stops successfully, the SSD will increase by 50 ms and the stop signal will appear later in the next trial This method of SSD adaptation is known as a staircase procedure and ensures that each participant is able to inhibit their actions approximately 50% of the time. Task performance in this paradigm is characterized by the race model (Logan and Cowan, 1984). The model assumes a go process and a stop process race independently and whichever finishes first defines whether a participant responds or inhibits their actions. The go process is characterized by the observable go RT, whereas the stop process is characterized by the latent SSRT, which is estimated based on the effects of the SSD throughout the task.

Although the SST employed in each dataset is similar, there are some differences which are detailed in Table 2. We note here the most important differences in design aspects of the SSTs, these include (1) Response modality, describing the manual response and whether left (L), right (R) or both (L/R) hands were used; (2) Type, describing whether the stop signal was auditory or visual; (3) Stop signal duration, how long the auditory or visual stop signal was presented for; (4) Number of staircases, describing the number of staircases used to track the SSD of each participant during the task; (5) SSD range, describing the minimum and maximum values that the SSD could be during the task; (6) total trial number, the number of trials each participant performed over all runs; (7) Stop trials, the percentage of overall trials that were stop trials (as opposed to go trials).

Task details for the SST in each dataset.

Behavioural analyses

For all runs within each dataset, median RTs on go and stop trials, the mean SSD and proportion of successful stops were calculated. For each participant, the SSRT was calculated using the mean method, estimated by subtracting the mean SSD from median go RT (Aron & Poldrack, 2006; Logan & Cowan, 1984). Both frequentist and Bayesian analyses methods were used to calculate the correlation between mean SSRTs and median go RTs, as well as to test the statistical difference between median failed stop RTs and median go RTs.

fMRIprep preprocessing pipeline

fMRIPrep was used to preprocess all acquired anatomical and functional data (Esteban et al., 2018, 2020). The following two sections describe, in detail, the preprocessing steps that fMRIPrep performed on each dataset.

Anatomical data preprocessing

A total of 1 T1-weighted (T1w) images was found within the input for each subject of each BIDS dataset. The T1-weighted (T1w) image was corrected for intensity non-uniformity (INU) with N4BiasFieldCorrection (Tustison et al., 2010), distributed with ANTs 2.3.3 (Avants et al., 2008), RRID:SCR_004757), and used as T1w-reference throughout the workflow. The T1w-reference was then skull-stripped with a Nipype implementation of the workflow (from ANTs), using OASIS30ANTs as target template. Brain tissue segmentation of cerebrospinal fluid (CSF), white matter (WM) and grey matter (GM) was performed on the brain-extracted T1w using fast (FSL 5.0.9, RRID:SCR_002823, Zhang et al., 2001). Brain surfaces were reconstructed using recon-all (FreeSurfer 6.0.1, RRID:SCR_001847, Dale et al., 1999), and the brain mask previously estimated was refined with a custom variation of the method to reconcile ANTs-derived and FreeSurfer-derived segmentations of the cortical grey matter of Mindboggle (RRID:SCR_002438, Klein et al., 2017). Volume-based spatial normalization to one standard space (MNI152NLin2009cAsym) was performed through nonlinear registration with antsRegistration (ANTs 2.3.3) using brain-extracted versions of both T1w reference and the T1w template. The following template was selected for spatial normalization: ICBM 152 Nonlinear Asymmetrical template version 2009c (Fonov et al., 2009, RRID:SCR_008796; TemplateFlow ID: MNI152NLin2009cAsym).

Functional data preprocessing

For each of the BOLD runs per subject (across all datasets), the following preprocessing was performed. First, a reference volume and its skull-stripped version were generated using a custom methodology of fMRIPrep. For datasets where a distortion correction image was not acquired (Aron_3T and Poldrack_3T), a deformation field to correct for susceptibility distortions was estimated based on fMRIPrep’s fieldmap-less approach. The deformation field is that resulting from co-registering the BOLD reference to the same-subject T1w-reference with its intensity inverted (Wang et al., 2017). Registration is performed with antsRegistration (ANTs 2.3.3), and the process regularized by constraining deformation to be nonzero along the phase-encoding direction, and modulated with an average fieldmap template (Treiber et al., 2016). For the deHollander_7T and Miletic_7T datasets, a B0-nonuniformity map (or fieldmap) was estimated based on a phase-difference map calculated with a dual-echo gradient-recall echo sequence, processed with a custom workflow of SDCFlows inspired by the epidewarp.fsl script with further improvements in HCP Pipelines (Uǧurbil et al., 2013). The fieldmap was then co-registered to the target EPI reference run and converted to a displacements field map (amenable to registration tools such as ANTs) with FSL’s fugue and other SDCflows tools. For the Isherwood_7T dataset, a B0-nonuniformity map (or fieldmap) was estimated based on two EPI references with opposing phase-encoding directions, with 3dQwarp (Cox and Hyde, 1997; AFNI 20160207).

Based on the estimated susceptibility distortion, a corrected EPI reference was calculated for a more accurate co-registration with the anatomical reference. The BOLD reference was then co-registered to the T1w reference using bbregister (FreeSurfer) which implements boundary-based registration (Greve & Fischl, 2009). Co-registration was configured with six degrees of freedom. Head-motion parameters with respect to the BOLD reference (transformation matrices, and six corresponding rotation and translation parameters) were estimated before any spatiotemporal filtering using mcflirt (FSL 5.0.9, Jenkinson et al., 2002). BOLD runs were slice-time corrected using 3dTshift from AFNI 20160207 (Cox and Hyde, 1997; RRID:SCR_005927). The BOLD time-series (including slice-timing correction when applied) were resampled onto their original, native space by applying a single, composite transform to correct for head-motion and susceptibility distortions. These resampled BOLD time-series will be referred to as preprocessed BOLD in original space, or just preprocessed BOLD. Several confounding time-series were calculated based on the preprocessed BOLD: framewise displacement (FD), DVARS and three region-wise global signals. FD was computed using two formulations following Power (absolute sum of relative motions, Power et al., 2014) and Jenkinson (relative root mean square displacement between affines, Jenkinson et al., 2002). FD and DVARS are calculated for each functional run, both using their implementations in Nipype (following the definitions by Power et al., 2014). The three global signals are extracted within the CSF, the WM, and the whole-brain masks. Additionally, a set of physiological regressors were extracted to allow for component-based noise correction (CompCor, Behzadi et al., 2007). Principal components are estimated after high-pass filtering the preprocessed BOLD time-series (using a discrete cosine filter with 128 s cut-off) for the two CompCor variants: temporal (tCompCor) and anatomical (aCompCor). tCompCor components are then calculated from the top 2% variable voxels within the brain mask. For aCompCor, three probabilistic masks (CSF, WM and combined CSF+WM) are generated in anatomical space. The implementation differs from that of Behzadi et al. in that instead of eroding the masks by 2 pixels on BOLD space, the aCompCor masks are subtracted a mask of pixels that likely contain a volume fraction of GM. This mask is obtained by dilating a GM mask extracted from the FreeSurfer’s aseg segmentation, and it ensures components are not extracted from voxels containing a minimal fraction of GM. Finally, these masks are resampled into BOLD space and binarized by thresholding at .99 (as in the original implementation). Components are also calculated separately within the WM and CSF masks. For each CompCor decomposition, the k components with the largest singular values are retained, such that the retained components’ time series are sufficient to explain 50 percent of variance across the nuisance mask (CSF, WM, combined, or temporal). The remaining components are dropped from consideration. The head-motion estimates calculated in the correction step were also placed within the corresponding confounds file. The confound time series derived from head motion estimates and global signals were expanded with the inclusion of temporal derivatives and quadratic terms for each (Satterthwaite et al., 2013). Frames that exceeded a threshold of .5 mm FD or 1.5 standardised DVARS were annotated as motion outliers. All resamplings can be performed with a single interpolation step by composing all the pertinent transformations (i.e. head-motion transform matrices, susceptibility distortion correction when available, and co-registrations to anatomical and output spaces). Gridded (volumetric) resamplings were performed using antsApplyTransforms (ANTs), configured with Lanczos interpolation to minimize the smoothing effects of other kernels (Lanczos, 1964). Non-gridded (surface) resamplings were performed using mri_vol2surf (FreeSurfer).

Temporal signal to noise ratios (tSNRs)

Sequence sensitivity in BOLD fMRI can be approximated by the calculation of the temporal signal to noise ratio (tSNR). While it is not possible discriminate the exact source of noise causing temporal fluctuations in measured signal, they are thought to arise from either thermal or physiological interference. To get a feel for the image quality in different regions of the brain between datasets, we here compared region of interest (ROI)-wise tSNRs. Using probabilistic atlases, we took the mean of the ROI signal and divided by its standard deviation across time. Each voxels contribution to the mean signal of the region was weighted by its probability of belonging to the region While simple to calculate, tSNR comparison between data of differing acquisition methods is less trivial. Here, we only correct for the differences in voxel size between datasets. As spatial resolution is directly proportional to MR signal, we divided these tSNR values by the volume of a single voxel (Edelstein et al., 1986). tSNR was calculated using the exact same data used in the ROI-wise GLMs. That is, unsmoothed but preprocessed data from fMRIprep.

fMRI analysis – general linear models (GLMs)

GLM analyses were computed at both a whole-brain voxel-wise and region-specific level. A canonical double gamma hemodynamic response function (HRF) with temporal derivative was used as the basis set for both methods of analysis (Glover, 1999). The design matrix consisted of the three task-specific regressors for each of the three experimental conditions; failed stop (FS) trials, successful stop (SS) trials and go (GO) trials, six motion parameters (three translational and three rotational) as well as DVARS and framewise displacement estimated during preprocessing. The first 20 aCompCor components from fMRIPrep were used to account for physiological noise (Behzadi et al., 2007). Following data preprocessing through fMRIPrep, all data were high-pass filtered (cut-off 1/128 Hz) to remove slow drift. Three SST contrasts were computed for both the whole-brain and region of interest (ROI) GLMs: FS > GO, FS > SS and SS > GO.


Whole-brain analyses were computed using the FILM method from FSL FEAT (version; Jenkinson et al., 2012; Woolrich et al., 2001) as implemented in the Python package wrapper Nipype (version 1.7.0; Gorgolewski et al., 2011). Run-level GLMs accounting for autocorrelated residuals were computed, the results warped to MNI152NLin2009cAsym space, and subsequently combined per subject using fixed effects analyses. Data for the whole-brain GLMs were spatially smoothed using the SUSAN method with a full width half maximum (FWHM) equal to the voxel size of the functional image (Smith & Brady, 1997). Therefore, a 3.125 mm kernel was applied to the Aron_3T dataset, a 3 mm kernel to the Poldrack_3T dataset, a 1.5 mm kernel to the deHollander_7T and Isherwood_7T datasets, and a 1.6 mm kernel to the Miletic_7T dataset. These base-level kernels were applied to the data used for the main statistical analyses. Group-level models were subsequently estimated using FMRIB Local Analysis of Mixed Effects (FLAME) 1 and FLAME 2 from FSL (Woolrich et al., 2001), taking advantage of the fact that FLAME allows the estimation of different variances for each dataset. Dummy variables were used as regressors to allow the categorization of data into different datasets so that they could be estimated separately and then combined. Statistical parametric maps (SPMs) were generated to visualize the resulting group-level models. The maps were corrected for the false discovery rate (FDR) using critical value of q < .05 (Yekutieli & Benjamini, 1999).


ROI analyses were then performed. Timeseries were extracted from each subcortical region of interest using probabilistic masks provided by MASSP (Bazin et al., 2020), except in the case of the putamen and caudate nucleus, which were provided by the Harvard-Oxford subcortical atlas (Rizk-Jackson et al., 2011). Each voxels contribution to the mean signal of the region was therefore weighted by its probability of belonging to the region. Cortical regions parcellations were provided by the Harvard-Oxford cortical atlas (Rizk-Jackson et al., 2011). These timeseries were extracted from unsmoothed data so to ensure regional specificity. ROI analyses were computed using the FILM method of FSL FEAT. To do this, we inputted each run for each participant in MNI152NLin2009cAsym space, where the signal of each region was replaced with its mean extracted timeseries. Hence, the signal within each region was homogenous on each given volume. ROIs were therefore defined before implementing the ROI analyses. The regions include the inferior frontal gyrus (IFG), primary motor cortex (M1), pre-supplementary motor area (preSMA), caudate nucleus (caudate), GPe, GPi, putamen, SN, STN, Tha, and VTA. Due to the restricted FOV of the deHollander_7T dataset, this dataset was not used in the ROI-wise analysis of the M1 and preSMA regions. M1 and preSMA ROI-wise results are therefore based only on the Aron_3T, Poldrack_3T, Isherwood_7T, and Miletic_7T datasets. After the run-level GLMs were computed using FILM, the same fixed effects analyses and subsequent mixed-effects analyses used in the voxel-wise GLMs were performed.

Smoothing comparison

To further understand the impact of preprocessing on fMRI analyses, we computed voxel-wise GLM results based on a more lenient smoothing kernel. To observe the effect of smoothing on these analyses, we compared the results of our main statistical analyses, using base-level kernel sizes, to the same data when all datasets were smoothed using a 5 mm FWHM kernel. We chose to compared base-level smoothing kernels to 5 mm as this was the kernel sized used in the Aron and Poldrack (2006) study. To do this, the same voxel-wise GLM method was used as described above.


Behavioural analyses

Table 3 summarizes the descriptive statistics of the behavioural data from each dataset. The median failed stop RT is significantly faster within all datasets than the median go RT (Aron_3T: p = 4.44e-5, BF = 590.75; Poldrack_3T: p < 2.2e-16, BF = 3.06e+23; deHollander_7T: p = 1.21e-11, BF = 7.57e+8; Isherwood_7T: p = 2.75e-5, BF = 897.41; Miletic_7T: p = .0019, BF = 22.39), consistent with assumptions of the standard horse-race model (Logan & Cowan, 1984). Mean SSRT was calculated using the mean method and are all within normal range across the datasets. The mean stopping accuracy (near 50%) across all datasets indicates that the staircasing procedure operated accordingly and successfully kept SSDs tailored to the SSRT of participants during the task. In addition, median go RTs did not correlate with mean SSRTs within datasets (Aron_3T: r = .411, p = .10, BF = 1.41; Poldrack_3T: r = .011, p = .91, BF = .23; deHollander_7T: r = -.30, p = .09, BF = 1.30; Isherwood_7T: r = .13, p = .65, BF = .57; Miletic_7T: r = .37, p = .19, BF = 1.02), indicating independence between the stop and go processes, an important assumption of the horse-race model (Logan & Cowan, 1984).

Descriptive statistics of behaviour in the SST across each dataset. Standard errors are given.


To observe quantitative differences in signal quality between the datasets, we first calculated ROI-wise tSNR maps of the unsmoothed data. In Fig. 1 we show both the corrected and uncorrected tSNR values for five ROIs. As the tSNR values across each hemisphere were similar, we opted to take the mean across both. The corrected tSNR values display the clear benefit of 7T acquisition compared to 3T in terms of data quality. In the cortical ROIs, the 7T datasets appear to perform equally well, though when zooming in on subcortical ROIs, the deHollander_7T and Miletic_7T datasets display superiority. The uncorrected tSNR values paint a different picture. These tSNRs are even across all the datasets, with the exception of the Isherwood_7T dataset which appears to suffer, most likely due to its increased multiband factor (L. Chen et al., 2015). It should be noted that interpretation of the uncorrected tSNR values is difficult, due to the inherent proportionality of tSNR and voxel volume (Edelstein et al., 1986). That is, the 3T datasets acquire data with a voxel volume approximately ten times smaller than that of the 7T datasets and therefore have an advantage when not correcting for this difference.

Corrected and uncorrected tSNR values for five ROIs over all datasets. The values are derived from the mean tSNR values of both hemispheres. Error bars are standard errors. Corrected tSNRs are equal to the uncorrected tSNRs divided by the volume of a single voxel. IFG, inferior frontal gyrus; SN, substantia nigra; STN, subthalamic nucleus; Tha, thalamus; VTA, ventral tegmental area.

Voxel-wise GLMs

We calculated whole-brain voxel-wise GLMs using the canonical HRF with a temporal derivative to statistically observe the brain areas underlying behaviour in the SST. The three trial types result in three possible contrasts: FS > GO, FS > SS, and SS > GO. Due to the restricted FOV of the images acquired in the deHollander_7T dataset, group-level SPMs display a limited activation pattern at the most superior part of the cortex, as no data were acquired there for one dataset. We first show the group-level SPMs of the overall contrasts of the SST across all datasets (see Fig. 2), the SPMs for each contrast of each individual dataset can be found in the appendix (supplementary Figs. 1, 2 & 3). Significant BOLD responses for the FS > GO contrast were found in the bilateral IFG, preSMA, SN, STN and VTA. It can be clearly seen that this contrast elicits the largest subcortical response out of the three. The FS > SS contrast shows significant bilateral activation in the IFG, STN, Tha and VTA. The SS > GO contrast shows significant activation in the bilateral IFG and Tha.

Group-level SPMs of the three main contrasts of the SST. Activation colours indicate FDR thresholded (q < .05) z-values. Two sagittal, one axial, and one zoomed in coronal view are shown. Coloured contour lines indicate regions of interest (IFG in white, M1 in grey, preSMA in orange, Caudate in dark blue, Putamen in light blue, GPe in dark green, GPi in light green, SN in pink, STN in red, thalamus in yellow, and VTA in black). The background template and coordinates are in MNI2009c (1mm). FS, failed stop; SS, successful stop.

ROI-wise GLMs

To further statistically compare the functional results between datasets, we then fit a set of GLMs using the canonical HRF with a temporal derivative to the timeseries extracted from each ROI. Below we show the results of the group-level ROI analyses over all datasets (Fig. 3). Overall, these results are in line with the group-level voxel-wise GLMs. To account for multiple comparisons, threshold values were set using the FDR method, with a minimum z-score of 3.1 to ensure conservatism. For the FS > GO contrast, we found significant positive z-scores in the bilateral IFG, preSMA, SN, Tha, VTA, left caudate, and right STN as well as significant negative z-scores in the right M1. For the FS > SS contrast we found significant positive z-scores in the bilateral preSMA, caudate, GPe, Tha, VTA, left M1, putamen, SN, STN, and right GPi. None of our ROIs were significantly more recruited during SS trials than FS trials. For the SS > GO contrast we found a significant positive z-scores in the bilateral IFG and significant negative z-scores in bilateral M1, GPe, GPi, putamen, and left Tha. Left/right differences in M1 may be explained by the differences in response modality between datasets (the Aron_3T and Poldrack_3T datasets only used the right hand for responses). Upon further inspection of the group-level data, left/right differences in M1 appears to be driven by larger variance estimates of the left M1.

Group-level z-scores from the ROI-wise GLM analysis of included datasets. Thresholds are set using FDR correction (q < .05), with a minimum threshold of 3.1. Regions that do not reach significance are coloured white. Left and right hemispheres are shown separately, denoted by ‘-l’ or ‘-r’, respectively. IFG, inferior frontal gyrus; M1, primary motor cortex; preSMA, pre-supplementary motor area; GPe, globus pallidus externa; GPi, globus pallidus interna; SN, substantia nigra; STN, subthalamic nucleus; Tha, thalamus; VTA, ventral tegmental area.

Smoothing comparison

To visualize the effect of spatial smoothing on voxel-wise GLMs, we computed SST contrasts using base-level kernels and a kernel of 5 mm. The difference in group-level SPMs for SS > GO contrast is prominent (see Fig 4). Comparisons for the contrasts of FS > GO and FS > SS contrasts can be found in Supplementary Figure 4. If we were to make inferences based on the group-level SPMs calculated using the 5 mm kernel, this study could potentially incorrectly conclude that both the SN and VTA are significantly activated in SS trials compared to GO trials. Much larger regions of significant activation can be seen in the 5 mm smoothed SPMs, both cortically and subcortically. This comparison demonstrates the prominent consequences that preprocessing pipelines can have on the overall analysis of functional data.

Comparison of group-level SPMs for the SS > GO contrast using different smoothing kernels. SPMs resulting from GLMs computed on base-level spatially smoothed data can be seen on the top row, with SPMs resulting from GLMs computed on data spatially smoothed with a FHWM of 5 mm. Activation colours indicate FDR thresholded (q < .05) z-values. Two sagittal, one axial, and one zoomed in coronal view are shown. Coloured contour lines indicate regions of interest (IFG in white, M1 in grey, preSMA in orange, Caudate in dark blue, Putamen in light blue, GPe in dark green, GPi in light green, SN in pink, STN in red, thalamus in yellow, and VTA in black). The background template and coordinates are in MNI2009c (1mm). FS, failed stop; SS, successful stop.


The functional network underlying response inhibition in the human brain has been a pinnacle research question in the cognitive neurosciences for decades. The basal ganglia specifically have been implicated in broad movement control since the early twentieth century (Wilson, 1912). However, the role of these subcortical structures in successful response inhibition is unclear. Evidence for the role of the basal ganglia in response inhibition comes from a multitude of studies citing significant activation of either the SN, STN or GPe during successful inhibition trials (Aron, 2007; Aron & Poldrack, 2006; Mallet et al., 2016; Nambu et al., 2002; Zhang & Iwaki, 2019). Here, we provide evidence that the canonical inhibition pathways, both indirect and hyperdirect, may not be recruited during successful response inhibition during the SST. We expected to find increased activation in the nodes of the indirect pathway (e.g., the STN, SN, GPe) during successful stop compared to go or failed stop trials, but we could not find evidence for this. Instead, we find a large recruitment of subcortical nodes, including the SN, STN, Tha, and VTA during failed stop trials (when compared to go trials). Due to the larger degree of positive BOLD responses in subcortical structures (SN, STN, Tha, and VTA) in the FS > GO and FS > SS contrasts, there is an indication that successful inhibition does not rely on the recruitment of these nodes. Moreover, it appears that failing to inhibit one’s action drives the utilisation of these nodes, pointing to the need to ascribe a separate function to these networks.

These results are in contention to previous fMRI studies of the stop signal task as well as an array of studies using different measurement techniques such as local field potential recordings (Alegre et al., 2013; Benis et al., 2014; Wessel et al., 2016). Furthermore, FS trials have been relatively understudied, as they do not cleanly reflect the inhibition network that is rightly the main area of attention. It is likely that the inhibition network is at least partially active even when inhibition is unsuccessful. From both the voxel-wise and ROI-wise analyses it is clear that the VTA plays a role in the mechanisms underlying FS trials. Investigation into the specific role of the VTA and dopaminergic system in response inhibition has led to conflicting results (Aron et al., 2003; Boonstra et al., 2005). Inhibition of the VTA has been seen to increase the number of premature responses in a five-choice serial reaction time task in rats (Flores-Dourojeanni et al., 2021), the VTA appears to degenerate in Parkinson’s Disease (Alberico et al., 2015), and is associated with reward uncertainty in response inhibition tasks (Tennyson et al., 2018). Feedback in the SST is inherent in its design; a failure to stop simultaneously triggers the realization that an error was made, without the need for explicit task feedback. Speculatively, a failure to stop could trigger nodes of the mesolimbic pathway in response to this action error, as the VTA is known to respond to reward prediction errors (Bayer et al., 2005; Eshel et al., 2015; Schultz et al., 1997).

This is not the first aggregatory study that has found limited activation of basal ganglia regions when specifically looking at successful response inhibition (Hung et al., 2018; Isherwood, Keuken, et al., 2021; R. Zhang et al., 2017). Although, from our literature search, this is the first study to take advantage of an array of unprocessed open-access data. Now that it is becoming increasingly common for researchers to make such detailed data available, it is beneficial for the research field to move on from meta-analytical methods that use only summary measures of activation profiles. Although, it should be noted, that canonical methods of meta-analysis still provide advantages over more processing-intensive methods such as applied here. Firstly, on the scale of five datasets, this methodology was applied and completed relatively quickly, but standard meta-analyses can include tens if not hundreds of studies, a feat that would be difficult to manage for the method described here. Secondly, the sample reported here is somewhat biased, it includes only data that were openly accessible in full, it is likely there are many more studies that would be useful to answer the research question posited here. Simpler methods of meta-analysis, where only coordinates or summary measures are needed to aggregate data, benefit from having access to a much wider range of potential sources of data. What we have been able to do here, even with a limited number of datasets, is process all the data with the same set of criteria. We therefore benefitted from sets of extremely well-vetted behavioural and functional data, that can have all aspects of the datasets compared to one another. This allowed us to tightly control aspects of the preprocessing pipeline that can affect later analyses steps, such as distortion correction and smoother kernel sizes.

The consequences of spatial smoothing on statistical analyses are well known and can have huge effects on group or subject-level inferences (Chen & Calhoun, 2018; Mikl et al., 2008). Here, we have shown again the substantial effect smoothing can have on the conclusions drawn from task-specific GLMs. Based on the results of the smoothing comparison and the differences in optimal kernel sizes for each dataset, ROI-analyses may offer superior statistical testing to that of voxel-wise methods as you do not introduce a loss of specificity. ROI-wise methods also have the added benefit of not needing to warp images from individual space to common or group templates, which may also introduce a loss of specificity when looking at smaller structures, such as those in the subcortex. However, ROI-wise methods are only as good as the predefined atlases used in the analysis. MRI may benefit from high spatial resolution in comparison to other neuroimaging methods, but there are still subpopulations of nuclei, such as those within the STN, that may have different roles in response inhibition and are not easily distinguishable (Mosher et al., 2021). MRI is of course also disadvantaged by its poor temporal resolution, a dimension that hinders the ability to dissociate different mechanisms occurring during the course of a trial (e.g., attention, detection of salient events; Benis et al., 2016). Methodologies with enhanced temporal resolution, such as electroencephalography (EEG), will also benefit from the wave of open-access data and can focus on research questions that MRI currently cannot, including disentangling the mixed cognitive processes underlying response inhibition.

The results presented here add to a growing literature exposing inconsistencies in our understanding of the networks underlying successful response inhibition. Based on the five datasets analysed, neither the indirect nor hyperdirect pathways show a greater degree of activation on SS trials compared to GO trials, or indeed compared to FS trials. It is evident that how response inhibition is implemented in the human brain requires more attention. Adaptations of the classical SST are already being deployed to aid in disentangling the role attention and signal detection in overall response inhibition (Boecker et al., 2013; Bryden et al., 2015). This paper serves as a proof of concept for methods of meta-analysis that allow the unification of largely unprocessed or unreduced datasets and exemplifies the huge opportunities that open-access data sharing can bring to the research field. As more and more datasets are made publicly available, researchers will be able to perform meta-analyses not only on summary data, but datasets with a rich body of parameters and data points.