Multi-study fMRI outlooks on subcortical BOLD responses in the stop-signal paradigm

  1. Integrative Model-Based Cognitive Neuroscience Research Unit, University of Amsterdam, Amsterdam, the Netherlands
  2. Sensorimotor Neuroscience and Ageing Research Lab, School of Psychological Sciences, University of Tasmania, Hobart, Australia
  3. Department of Psychology, Faculty of Social Sciences, Leiden University, the Netherlands
  4. Full brain picture Analytics, Leiden, the Netherlands

Peer review process

Revised: This Reviewed Preprint has been revised by the authors in response to the previous round of peer review; the eLife assessment and the public reviews have been updated where necessary by the editors and peer reviewers.

Read more about eLife’s peer review process.

Editors

  • Reviewing Editor
    David Badre
    Brown University, Providence, United States of America
  • Senior Editor
    Floris de Lange
    Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Netherlands

Reviewer #1 (Public Review):

This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration by De Hollander, Forstmann et al. (HBM 2017) that 3T fMRI (as well as many 7T imaging sequences) does not afford sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

Comments on revised version:

This is my second review of this article, now entitled "Multi-study fMRI outlooks on subcortical BOLD responses in the stop-signal paradigm" by Isherwood and colleagues.

The authors have been very responsive to the initial round of reviews.

I still think it would be helpful to see a combined investigation of the available 7T data, just to really drive the point home that even with the best parameters and a multi-study sample size, fMRI cannot detect any increases in BOLD activity on successful stop compared to go trials. However, I agree with the authors that these "sub samples still lack the temporal resolution seemingly required for looking at the processes in the SST."

As such, I don't have any more feedback.

Reviewer #2 (Public Review):

This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, including bilateral preSMA, GPE, thalamus, and VTA. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

Since the initial submission, the authors have improved their theoretical synthesis and changed their SSRT calculation method to the more appropriate integration method with replacement for go omissions. They have also done a better job of explaining how these fMRI results situate within the broader response inhibition literature including work using other neuroscience methods.

They have also included a new Bayes Factor analysis. In the process of evaluating this new analysis, I identified the following concerns that I believe justify additional analyses and discussion:

First, if I understand the author's pipeline, for the ROI analyses it is not appropriate to run FSL's FILM method on the data that were generated by repeating the same time series across all voxels of an ROI. FSL's FILM uses neighboring voxels in parts of the estimation to stabilize temporal correlation and variance estimates and was intended and evaluated for use on voxelwise data. Instead, I believe it would be more appropriate to average the level 1 contrast estimates over the voxels of each ROI to serve as the dependent variables in the ROI analysis.

Second, for the group-level ROI analyses there seem to be inconsistencies when comparing the z-statistics (Figure 3) to the Bayes Factors (Figure 4), in that very similar z-statistics have very different Bayes Factors within the same contrast across different brain areas, which seemed surprising (e.g., a z of 6.64 has a BF of .858 while another with a z of 6.76 has a BF of 3.18). The authors do briefly discuss some instances in which the frequentist and Bayesian results differ, but they never explain why similar z-statistics yield very different Bayes Factors for a given contrast across different brain areas. I believe a discussion of this would be useful.

Third, since the Bayes Factor analysis appears to be based on repeated measures ANOVA and the z-statistics are from Flame1+2, the BayesFactor analysis model does not pair with the frequentist analysis model very cleanly. To facilitate comparison, I would recommend that the same repeated measures ANOVA model should be used in both cases. My reading of the literature is that there is no need to be concerned about any benefits of using Flame being lost, since heteroscedasticity does not impact type I errors and will only potentially impact power (Mumford & Nichols, 2009 NeuroImage).

Fourth, though frequentist statistics suggest that many basal ganglia structures are significantly more active in the FS > SS contrast (see 2nd row of Figure 3), the Bayesian analyses are much more equivocal, with no basal ganglia areas showing Log10BF > 1 (which would be indicative of strong evidence). The authors suggest that "the frequentist and Bayesian analyses are mostly in line with one another", but in my view, this frequentist vs. Bayesian analysis for the FS > SS contrast seems to suggest substantially different conclusions. More specifically, the frequentist analyses suggest greater activity in FS than SS in most basal ganglia ROIs (all but 2), but the Bayesian analysis did not find *any* basal ganglia ROIs with strong evidence for the alternative hypothesis (or a difference), and several with more evidence for the null than the alternative hypothesis. This difference between the frequentist and Bayesian analyses seems to warrant discussion, but unless I overlooked it, the Bayesian analyses are not mentioned in the Discussion at all. In my view, the frequentist analyses are treated as the results, and the Bayesian analyses were largely ignored.

Overall, I think this paper makes a useful and mostly solid contribution to the literature. I have made some suggestions for adjustments and clarification of the neuroimaging pipeline and Bayesian analyses that I believe would strengthen the work further.

Author response:

The following is the authors’ response to the original reviews.

Reviewer #1:

This is my first review of the article entitled "The canonical stopping network: Revisiting the role of the subcortex in response inhibition" by Isherwood and colleagues. This study is one in a series of excellent papers by the Forstmann group focusing on the ability of fMRI to reliably detect activity in small subcortical nuclei - in this case, specifically those purportedly involved in the hyper- and indirect inhibitory basal ganglia pathways. I have been very fond of this work for a long time, beginning with the demonstration by De Hollander, Forstmann et al. (HBM 2017) that 3T fMRI (as well as many 7T imaging sequences) does not afford sufficient signal-to-noise ratio to reliably image these small subcortical nuclei. This work has done a lot to reshape my view of seminal past studies of subcortical activity during inhibitory control, including some that have several thousand citations.

In the current study, the authors compiled five datasets that aimed to investigate neural activity associated with stopping an already initiated action, as operationalized in the classic stop-signal paradigm. Three of these datasets are taken from their own 7T investigations, and two are datasets from the Poldrack group, which used 3T fMRI.

The authors make six chief points:

(1) There does not seem to be a measurable BOLD response in the purportedly critical subcortical areas in contrasts of successful stopping (SS) vs. going (GO), neither across datasets nor within each individual dataset. This includes the STN but also any other areas of the indirect and hyperdirect pathways.

(2) The failed-stop (FS) vs. GO contrast is the only contrast showing substantial differences in those nodes.

(3) The positive findings of STN (and other subcortical) activation during the SS vs. GO contrast could be due to the usage of inappropriate smoothing kernels.

(4) The study demonstrates the utility of aggregating publicly available fMRI data from similar cognitive tasks.

(5) From the abstract: "The findings challenge previous functional magnetic resonance (fMRI) of the stop-signal task"

(6) and further: "suggest the need to ascribe a separate function to these networks."

I strongly and emphatically agree with points 1-5. However, I vehemently disagree with point 6, which appears to be the main thrust of the current paper, based on the discussion, abstract, and - not least - the title.

To me, this paper essentially shows that fMRI is ill-suited to study the subcortex in the specific context of the stop-signal task. That is not just because of the issues of subcortical small-volume SNR (the main topic of this and related works by this outstanding group), but also because of its limited temporal resolution (which is unacknowledged, but especially impactful in the context of the stop-signal task). I'll expand on what I mean in the following.

First, the authors are underrepresenting the non-fMRI evidence in favor of the involvement of the subthalamic nucleus (STN) and the basal ganglia more generally in stopping actions.

- There are many more intracranial local field potential recording studies that show increased STN LFP (or even single-unit) activity in the SS vs. FS and SS vs. GO contrast than listed, which come from at least seven different labs. Here's a (likely non-exhaustive) list of studies that come to mind:

Ray et al., NeuroImage 2012
Alegre et al., Experimental Brain Research 2013
Benis et al., NeuroImage 2014
Wessel et al., Movement Disorders 2016
Benis et al., Cortex 2016
Fischer et al., eLife 2017
Ghahremani et al., Brain and Language 2018
Chen et al., Neuron 2020
Mosher et al., Neuron 2021
Diesburg et al., eLife 2021

- Similarly, there is much more evidence than cited that causally influencing STN via deep-brain stimulation also influences action-stopping. Again, the following list is probably incomplete:

Van den Wildenberg et al., JoCN 2006
Ray et al., Neuropsychologia 2009
Hershey et al., Brain 2010
Swann et al., JNeuro 2011
Mirabella et al., Cerebral Cortex 2012
Obeso et al., Exp. Brain Res. 2013
Georgiev et al., Exp Br Res 2016
Lofredi et al., Brain 2021
van den Wildenberg et al, Behav Brain Res 2021
Wessel et al., Current Biology 2022

- Moreover, evidence from non-human animals similarly suggests critical STN involvement in action stopping, e.g.:

Eagle et al., Cerebral Cortex 2008
Schmidt et al., Nature Neuroscience 2013
Fife et al., eLife 2017
Anderson et al., Brain Res 2020

Together, studies like these provide either causal evidence for STN involvement via direct electrical stimulation of the nucleus or provide direct recordings of its local field potential activity during stopping. This is not to mention the extensive evidence for the involvement of the STN - and the indirect and hyperdirect pathways in general - in motor inhibition more broadly, perhaps best illustrated by their damage leading to (hemi)ballism.

Hence, I cannot agree with the idea that the current set of findings "suggest the need to ascribe a separate function to these networks", as suggested in the abstract and further explicated in the discussion of the current paper. For this to be the case, we would need to disregard more than a decade's worth of direct recording studies of the STN in favor of a remote measurement of the BOLD response using (provably) sub-ideal imaging parameters. There are myriad explanations for why fMRI may not be able to reveal a potential ground-truth difference in STN activity between the SS and FS/GO conditions, beginning with the simple proposition that it may not afford sufficient SNR, or that perhaps subcortical BOLD is not tightly related to the type of neurophysiological activity that distinguishes these conditions (in the purported case of the stop-signal task, specifically the beta band). But essentially, this paper shows that a specific lens into subcortical activity is likely broken, yet then suggests dismissing existing evidence from superior lenses in favor of the findings from the 'broken' lens. That doesn't make much sense to me.

Second, there is actually another substantial reason why fMRI may indeed be unsuitable to study STN activity, specifically in the stop-signal paradigm: its limited time resolution. The sequence of subcortical processes on each specific trial type in the stop-signal task is purportedly as follows: at baseline, the basal ganglia exert inhibition on the motor system. During motor initiation, this inhibition is lifted via direct pathway innervation. This is when the three trial types start diverging. When actions then have to be rapidly cancelled (SS and FS), cortical regions signal to STN via the hyperdirect pathway that inhibition has to be rapidly reinstated (see Chen, Starr et al., Neuron 2020 for direct evidence for such a monosynaptic hyperdirect pathway, the speed of which directly predicts SSRT). Hence, inhibition is reinstated (too late in the case of FS trials, but early enough in SS trials, see recordings from the BG in Schmidt, Berke et al., Nature Neuroscience 2013; and Diesburg, Wessel et al., eLife 2021).

Hence, according to this prevailing model, all three trial types involve a sequence of STN activation (initial inhibition), STN deactivation (disinhibition during GO), and STN reactivation (reinstantiation of inhibition during the response via the hyperdirect pathway on SS/FS trials, reinstantiation of inhibition via the indirect pathway after the response on GO trials). What distinguishes the trial types during this period is chiefly the relative timing of the inhibitory process (earliest on SS trials, slightly later on FS trials, latest on GO trials). However, these temporal differences play out on a level of hundreds of milliseconds, and in all three cases, processing concludes well under a second overall. To fMRI, given its limited time resolution, these activations are bound to look quite similar.

Lastly, further building on this logic, it's not surprising that FS trials yield increased activity compared to SS and GO trials. That's because FS trials are errors, which are known to activate the STN (Cavanagh et al., JoCN 2014; Siegert et al. Cortex 2014) and afford additional inhibition of the motor system after their occurrence (Guan et al., JNeuro 2022). Again, fMRI will likely conflate this activity with the abovementioned sequence, resulting in a summation of activity and the highest level of BOLD for FS trials.

In sum, I believe this study has a lot of merit in demonstrating that fMRI is ill-suited to study the subcortex during the SST, but I cannot agree that it warrants any reappraisal of the subcortex's role in stopping, since prevailing accounts of that role are not chiefly based on fMRI evidence.

We would like to thank reviewer 1 for their insightful and helpful comments. We have responded point-by-point below and will give an overview of how we reframed the paper here.

We agree that there is good evidence from other sources for the presence of the canonical stopping network (indirect and hyperdirect) during action cancellation, and that this should be reflected more in the paper. However, we do not believe that a lack of evidence for this network during the SST makes fMRI ill-suited for studying this task, or other tasks that have neural processes occurring in quick succession. What we believe the activation patterns of fMRI reflect during this task is the large amount of activation caused by failed stops. That is, the role of the STN in error processing may be more pronounced than its role in action cancellation. Due to the replicability of fMRI results, especially at higher field strengths, we believe the activation profile of failed stop trials reflects a paramount role for the STN in error processing. Therefore, while we agree we do not provide evidence against the role of the STN in action cancellation, we do provide evidence that our outlook on subcortical activation during different trial types of this task should be revisited. We have reframed the article to reflect this, and discuss points such as fMRI reliability, validity and the complex overlapping of cognitive processes in the SST in the discussion. Please see all changes to the article indicated by red text.

A few other points:

- As I said before, this team's previous work has done a lot to convince me that 3T fMRI is unsuitable to study the STN. As such, it would have been nice to see a combination of the subsamples of the study that DID use imaging protocols and field strengths suitable to actually study this node. This is especially true since the second 3T sample (and arguably, the Isherwood_7T sample) does not afford a lot of trials per subject, to begin with.

Unfortunately, this study already comprises the only 7T open-access datasets available for the SST. Therefore, unless we combined only the deHollander_7T and Miletic_7T subsamples, there is no additional analysis we can do for this right now. While looking at just the sub samples that were 7T and had >300 trials would be interesting, based on the new framing of the paper we do not believe it adds to the study, as the sub samples still lack the temporal resolution seemingly required for looking at the processes in the SST.

- What was the GLM analysis time-locked to on SS and FS trials? The stop-signal or the GO-signal?

SS and FS trials were time-locked to the GO signal as this is standard practice. The main reason for this is that we use contrasts to interpret differences in activation patterns between conditions. By time-locking the FS and SS trials to the stop signal, we are contrasting events at different time points, and therefore different stages of processing, which introduces its own sources of error. We agree with the reviewer, however, that a separate analysis with time-locking on the stop-signal has its own merit, and now include results in the supplementary material where the FS and SS trials are time-locked to the stop signal as well.

- Why was SSRT calculated using the outdated mean method?

We originally calculated SSRT using the mean method as this was how it was reported in the oldest of the aggregated studies. We have now re-calculated the SSRTs using the integration method with go omission replacement and thank the reviewer for pointing this out. Please see response to comment 3.

- The authors chose 3.1 as a z-score to "ensure conservatism", but since they are essentially trying to prove the null hypothesis that there is no increased STN activity on SS trials, I would suggest erring on the side of a more lenient threshold to avoid type-2 error.

We have used minimum FDR-corrected thresholds for each contrast now, instead of using a blanket conservative threshold of 3.1 over all contrasts. The new thresholds for each contrast are shown in text. Please see below (page 12):

“The thresholds for each contrast are as follows: 3.01 for FS > GO, 2.26 for FS > SS and 3.1 for SS > GO.”
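Because each contrast has its own p-value distribution, an FDR-derived critical value (and hence the equivalent z threshold) differs per contrast. As a generic illustration only (not the exact FDR implementation used in the analysis), the Benjamini-Hochberg procedure selects the largest sorted p-value p_(k) satisfying p_(k) <= (k/m)q:

```python
def bh_threshold(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return the largest p_(k)
    with p_(k) <= (k/m) * q. All p-values at or below the returned
    critical value are declared significant, controlling FDR at q."""
    p_sorted = sorted(pvals)
    m = len(p_sorted)
    crit = 0.0
    for k, pk in enumerate(p_sorted, start=1):
        if pk <= q * k / m:
            crit = pk  # keep the largest qualifying p-value
    return crit
```

Since the critical p-value depends on the observed distribution, converting it back to a z-statistic naturally yields a different threshold for each contrast, which is why the three reported thresholds are not identical.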

- The authors state that "The results presented here add to a growing literature exposing inconsistencies in our understanding of the networks underlying successful response inhibition". It would be helpful if the authors cited these studies and what those inconsistencies are.

We thank reviewer 1 for their detailed and thorough evaluation of our paper. Overall, we agree that there is substantial direct and indirect evidence for the involvement of the cortico-basal-ganglia pathways in response inhibition. We have taken this extensive constructive criticism on board and agree with the reviewer that the paper should be reframed. We thank the reviewer for the thoroughness of their helpful comments in aiding the revision of the paper.

(1) I would suggest reframing the study, abstract, discussion, and title to reflect the fact that the study shows that fMRI is unsuitable to study subcortical activity in the SST, rather than the fact that we need to question the subcortical model of inhibition, given the reasons in my public review.

We agree with the reviewer that the article should be reframed and not taken as direct evidence against the large sum of literature pointing towards the involvement of the cortico-basal-ganglia pathway in response inhibition. We have significantly rewritten the article in light of this.

(2) I suggest combining the datasets that provide the best imaging parameters and then analyzing the subcortical ROIs with a more lenient threshold and with regressors time-locked to the stop-signals (if that's not already the case). This would make the claim of a null finding much more impactful. Some sort of power analysis and/or Bayes factor analysis of evidence for the null would also be appreciated.

Instead of using a blanket conservative threshold of 3.1, we instead used only FDR-corrected thresholds. The threshold level is therefore different for each contrast and noted in the figures. We have also added supplementary figures including the group-level SPMs and ROI analyses when the FS and SS trials were time-locked to the stop signal instead of the GO signal (Supplementary Figs 4 & 5). But as mentioned above, due to the difference in time points when contrasting, we believe that time-locking to the GO signal for all trial types makes more sense for the main analysis.

We have now also computed BFs on the first-level ROI beta estimates for all contrasts using the BayesFactor package as implemented in R. We added the following section to the methods and updated the results section accordingly (page 8):

“In addition to the frequentist analysis we also opted to compute Bayes Factors (BFs) for each contrast per ROI per hemisphere. To do this, we extracted the beta weights for each individual trial type from our first level model. We then compared the beta weights from each trial type to one another using the ‘BayesFactor’ package as implemented in R (Morey & Rouder, 2015). We compared the full model, comprising trial type, dataset and subject as predictors, to the null model, comprising only dataset and subject as predictors. The datasets and subjects were modeled as random factors. We divided the BF of the full model by that of the null model to quantify the evidence for or against a difference in beta weights between trial types. To interpret the BFs, we used a modified version of Jeffreys’ scale (Jeffreys, 1939; Lee & Wagenmakers, 2014).”
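To make the structure of this full-versus-null comparison concrete, it can be sketched outside R. The toy code below is emphatically not what the ‘BayesFactor’ package computes (that package uses JZS priors and treats dataset and subject as random factors); instead it uses the rough BIC approximation to the Bayes factor (Wagenmakers, 2007) with dataset and subject entered as fixed dummy-coded nuisance terms. Function names and conventions are illustrative only, and the numbers will not match the package's output.

```python
import numpy as np

def dummy_code(labels):
    """Drop-first dummy coding for a categorical factor."""
    levels = sorted(set(labels))
    return np.array([[1.0 if lab == lv else 0.0 for lv in levels[1:]]
                     for lab in labels])

def ols_bic(y, X):
    """BIC of an ordinary least-squares fit (intercept prepended).
    Assumes a nonzero residual sum of squares."""
    X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    n, k = X.shape
    return n * np.log(rss / n) + k * np.log(n)

def bf10_trial_type(beta, trial_type, dataset, subject):
    """BF10 ~ exp((BIC_null - BIC_full) / 2): approximate evidence for
    a trial-type effect over a dataset-and-subject-only null model."""
    nuisance = np.column_stack([dummy_code(dataset), dummy_code(subject)])
    full = np.column_stack([dummy_code(trial_type), nuisance])
    return float(np.exp((ols_bic(beta, nuisance) - ols_bic(beta, full)) / 2.0))
```

The point is only the comparison logic: both models share the nuisance terms, and the Bayes factor asks whether adding trial type is worth its extra parameters.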

(3) I suggest calculating SSRT using the integration method with the replacement of Go omissions, as per the most recent recommendation (Verbruggen et al., eLife 2019).

We agree we should have used a more optimal method for SSRT estimation. We have replaced our original estimations with that of the integration method with go omissions replacement, as suggested and adapted the results in table 3.

We have also replaced text in the methods sections to reflect this (page 5):

“For each participant, the SSRT was calculated using the mean method, estimated by subtracting the mean SSD from median go RT (Aron & Poldrack, 2006; Logan & Cowan, 1984).”

Now reads:

“For each participant, the SSRT was calculated using the integration method with replacement of go omissions (Verbruggen et al., 2019), estimated by integrating the RT distribution and calculating the point at which the integral equals p(respond|signal). The completion time of the stop process aligns with the nth RT, where n equals the number of RTs in the RT distribution of go trials multiplied by the probability of responding to a signal.”
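The procedure in the quoted passage can be sketched in a few lines. The implementation below is illustrative and follows the consensus recipe (replace go omissions with the slowest observed go RT, take the nth RT of the integrated go RT distribution, subtract the mean SSD); the function name, argument conventions, and the rounding of n up to the next integer are our own choices:

```python
import math

def ssrt_integration(go_rts, p_respond_signal, mean_ssd, n_go_omissions=0):
    """Integration method with replacement of go omissions
    (after the consensus recommendations of Verbruggen et al., 2019).

    Omitted go responses are replaced by the slowest observed go RT,
    the go RT distribution is integrated up to p(respond|signal),
    and the mean SSD is subtracted from the resulting nth RT."""
    rts = sorted(go_rts)
    rts += [rts[-1]] * n_go_omissions          # replace omissions with max go RT
    n = math.ceil(len(rts) * p_respond_signal)  # index of the nth RT (rounded up)
    nth_rt = sorted(rts)[n - 1]
    return nth_rt - mean_ssd
```

For example, with go RTs of 300-700 ms, p(respond|signal) = 0.4 and a mean SSD of 200 ms, the nth RT is the 2nd of 5 sorted RTs, so the estimated SSRT is that RT minus 200 ms.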

Reviewer #2:

This work aggregates data across 5 openly available stopping studies (3 at 7 tesla and 2 at 3 tesla) to evaluate activity patterns across the common contrasts of Failed Stop (FS) > Go, FS > stop success (SS), and SS > Go. Previous work has implicated a set of regions that tend to be positively active in one or more of these contrasts, including the bilateral inferior frontal gyrus, preSMA, and multiple basal ganglia structures. However, the authors argue that upon closer examination, many previous papers have not found subcortical structures to be more active on SS than FS trials, bringing into question whether they play an essential role in (successful) inhibition. In order to evaluate this with more data and power, the authors aggregate across five datasets and find many areas that are *more* active for FS than SS, specifically bilateral preSMA, caudate, GPE, thalamus, and VTA, and unilateral M1, GPi, putamen, SN, and STN. They argue that this brings into question the role of these areas in inhibition, based upon the assumption that areas involved in inhibition should be more active on successful stop than failed stop trials, not the opposite as they observed.

As an empirical result, I believe that the results are robust, but this work does not attempt a new theoretical synthesis of the neuro-cognitive mechanisms of stopping. Specifically, if these many areas are more active on failed stop than successful stop trials, and (at least some of) these areas are situated in pathways that are traditionally assumed to instantiate response inhibition like the hyperdirect pathway, then what function are these areas/pathways involved in? I believe that this work would make a larger impact if the author endeavored to synthesize these results into some kind of theoretical framework for how stopping is instantiated in the brain, even if that framework may be preliminary.

I also have one main concern about the analysis. The authors use the mean method for computing SSRT, but this has been shown to be more susceptible to distortion from RT slowing (Verbruggen, Chambers & Logan, 2013 Psych Sci), and goes against the consensus recommendation of using the integration with replacement method (Verbruggen et al., 2019). Therefore, I would strongly recommend replacing all mean SSRT estimates with estimates using the integration with replacement method.

I found the paper clearly written and empirically strong. As I mentioned in the public review, I believe that the main shortcoming is the lack of theoretical synthesis. I would encourage the authors to attempt to synthesize these results into some form of theoretical explanation. I would also encourage replacing the mean method with the integration with replacement method for computing SSRT. I also have the following specific comments and suggestions (in the approximate order in which they appear in the manuscript) that I hope can improve the manuscript:

We would like to thank reviewer 2 for their insightful and interesting comments. We have adapted our paper to reflect these comments. Please see direct responses to your comments below. We agree with the reviewer that some type of theoretical synthesis would help with the interpretability of the article. We have substantially reworked the discussion and included theoretical considerations behind the newer narrative. Please see all changes to the article indicated by red text.

(1) The authors say "performance on successful stop trials is quantified by the stop signal reaction time". I don't think this is technically accurate. SSRT is a measure of the average latency of the stop process for all trials, not just for the trials in which subjects successfully stop.

Thank you for pointing out this technically incorrect statement. We have replaced the above sentence with the following (page 1):

“Inhibition performance in the SST as a whole is quantified by the stop signal reaction time (SSRT), which estimates the speed of the latent stopping process (Verbruggen et al., 2019).”

(2) The authors say "few studies have detected differences in the BOLD response between FS and SS trials", but then do not cite any papers that detected differences until several sentences later (de Hollander et al., 2017; Isherwood et al., 2023; Miletic et al., 2020). If these are the only ones, and they only show greater FS than SS, then I think this point could be made more clearly and directly.

We have moved the citations to the correct place in the text to be clearer. We have also rephrased this part of the introduction to make the points more direct (page 2).

“In the subcortex, functional evidence is relatively inconsistent. Some studies have found an increase in BOLD response in the STN in SS > GO contrasts (Aron & Poldrack, 2006; Coxon et al., 2016; Gaillard et al., 2020; Yoon et al., 2019), but others have failed to replicate this (Bloemendaal et al., 2016; Boehler et al., 2010; Chang et al., 2020; B. Xu et al., 2015). Moreover, some studies have actually found higher STN, SN and thalamic activation in failed stop trials, not successful ones (de Hollander et al., 2017; Isherwood et al., 2023; Miletić et al., 2020).”

(3) Unless I overlooked it, I don't believe that the authors specified the criteria upon which any given subject was excluded. Given some studies have significant exclusions (e.g., Poldrack_3T), I think being clear about how many subjects violated each criterion would be useful.

This is indeed interesting and important information to include. We have added the number of participants who were excluded for each criterion. Please see added text below (page 4):

“Based on these criteria, no subjects were excluded from the Aron_3T dataset. 24 subjects were excluded from the Poldrack_3T dataset (3 based on criterion 1, 9 on criterion 2, 11 on criterion 3, and 8 on criterion 4). Three subjects were excluded from the deHollander_7T dataset (2 based on criterion 1 and 1 on criterion 2). Five subjects were excluded from the Isherwood_7T dataset (2 based on criterion 1, 1 on criterion 2, and 2 on criterion 4). Two subjects were excluded from the Miletic_7T dataset (1 based on criterion 2 and 1 on criterion 4). Note that some participants in the Poldrack_3T study failed to meet multiple inclusion criteria.”

(4) The Method section included very exhaustive descriptions of the neuroimaging processing pipeline, which was appreciated. However, it seems that much of what is presented is not actually used in any of the analyses. For example, it seems that "functional data preprocessing" section may be fMRIPrep boilerplate, which again is fine, but I think it would help to clarify that much of the preprocessing was not used in any part of the analysis pipeline for any results. For example, at first blush, I thought the authors were using global signal regression, but after a more careful examination, I believe that they are only computing global signals but never using them. Similarly with tCompCor seemingly being computed but not used. If possible, I would recommend that the authors share code that instantiates their behavioral and neuroimaging analysis pipeline so that any confusion about what was actually done could be programmatically verified. At a minimum, I would recommend more clearly distinguishing the pipeline steps that actually went into any presented analyses.

We thank the reviewer for finding this inconsistency. The methods section indeed uses the fMRIPrep boilerplate text, which we included so as to describe the preprocessing steps as accurately as possible. While we believe retaining the exact boilerplate text that fMRIPrep produces is the most accurate way to document our preprocessing, we have adapted some of the text to clarify which computations were not used in the subsequent analysis. As a side note, for future reference, we would like to add that the fMRIPrep authors expressly recommend that users report the boilerplate completely and unaltered, and as such, we believe this may become a recurring issue (page 7).

“While many regressors were computed during preprocessing of the fMRI data, not all were used in the subsequent analysis; the exact regressors used can be found above. For example, tCompCor components and global signals were calculated in our generic preprocessing pipeline but were not used in the analysis. The code used for preprocessing and analysis can be found in the data and code availability statement.”
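As an illustration of this distinction (a sketch only, not the actual pipeline code linked in the availability statement; the particular column prefixes treated as "used" here are assumptions for the example), selecting only the regressors of interest from an fMRIPrep-style confounds table might look like:

```python
import pandas as pd

def select_confounds(confounds: pd.DataFrame) -> pd.DataFrame:
    """Keep only the confound columns entering the GLM (motion, aCompCor,
    cosine drift terms); computed-but-unused columns such as
    'global_signal' and tCompCor ('t_comp_cor_*') are dropped.
    Column prefixes follow fMRIPrep's standard naming scheme."""
    used = [c for c in confounds.columns
            if c.startswith(("trans_", "rot_", "a_comp_cor_", "cosine"))]
    # NaNs (e.g. in first-row derivative columns) are replaced with 0
    # so the columns can be used directly in a design matrix.
    return confounds[used].fillna(0)
```

This makes explicit in code which of the many computed regressors actually reach the analysis, mirroring the clarification added to the manuscript.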

(5) What does it mean for the Poldrack_3T to have N/A for SSD range? Please clarify.

Thank you for pointing out this omission. We had not yet determined the possible SSD range for this study; we have now replaced this value with the correct range (0 – 1000 ms).

(6) The SSD range of 0-2000ms for deHollander_7T and Miletic_7T seems very high. Was this limit ever reached or even approached? SSD distributions could be a useful addition to the supplement.

Thank you for also bringing this mistake to light. We had accidentally placed the maximum trial duration in these fields instead of the maximum allowable SSD. We have replaced it with the correct value (0 – 900 ms).

(7) The author says "In addition, median go RTs did not correlate with mean SSRTs within datasets (Aron_3T: r = .411, p = .10, BF = 1.41; Poldrack_3T: r = .011, p = .91, BF = .23; deHollander_7T: r = -.30, p = .09, BF = 1.30; Isherwood_7T: r = .13, p = .65, BF = .57; Miletic_7T: r = .37, p = .19, BF = 1.02), indicating independence between the stop and go processes, an important assumption of the horse-race model (Logan & Cowan, 1984)." However, the independent race model assumes context independence (the finishing time of the go process is not affected by the presence of the stop process) and stochastic independence (the duration of the go and stop processes are independent on a given trial). This analysis does not seem to evaluate either of these forms of independence, as it correlates RT and SSRT across subjects, so it was unclear how this analysis evaluated either of the types of independence that are assumed by the independent race model. Please clarify or remove.

Thank you for this comment. We realize that this analysis indeed does not evaluate either context or stochastic independence and therefore we have removed this from the manuscript.

(8) The RTs in Isherwood_7T are considerably slower than the other studies, even though the go stimulus+response is the same (very simple) stimulus-response mapping from arrows to button presses. Is there any difference in procedure or stimuli that might explain this difference? It is the only study with a visual stop signal, but to my knowledge, there is no work suggesting visual stop signals encourage more proactive slowing. If possible, I think a brief discussion of the unusually slow RTs in Isherwood_7T would be useful.

We have included the following text in the manuscript to reflect this observed difference in RT between the Isherwood_7T dataset and the other datasets (page 9).

“Longer RTs were found in the Isherwood_7T dataset compared to the four other datasets. The only procedural difference in the Isherwood_7T dataset is the use of a visual rather than an auditory stop signal. This RT difference is consistent with previous research, where auditory stop signals paired with visual go stimuli have been associated with faster RTs compared to unimodal visual presentation (Carrillo-de-la-Peña et al., 2019; Weber et al., 2024). The mean SSRTs and probability of stopping are within the normal range, indicating that participants understood the task and responded in the expected manner.”

(9) When the authors included both 3T and 7T data, I thought they were preparing to evaluate the effect of magnet strength on stop networks, but they didn't do this analysis. Is this because the authors believe there is insufficient power? It seems that this could be an interesting exploratory analysis that could improve the paper.

We thank the reviewer for this interesting comment. As our sample contains only two 3T and three 7T datasets, we indeed believe there is insufficient power to warrant such an analysis. In addition, we wanted the focus of this paper to be on how fMRI captures the SST in general, not on differences between acquisition methods. With a greater number of datasets spanning different imaging parameters (especially TE or resolution) in addition to field strength, we agree such an analysis would be interesting, although it is beyond the scope of this article.

(10) The authors evaluate smoothing and it seems that the conclusion that they want to come to is that with a larger smoothing kernel, the results in the stop networks bleed into surrounding areas, producing false positive activity. However, in the absence of a ground truth of the true contributions of these areas, it seems that an alternative interpretation of the results is that the denser maps when using a larger smoothing kernel could be closer to "true" activation, with the maps using a smaller smoothing kernel missing some true activity. It seems worth entertaining these two possible interpretations for the smoothing results unless there is clear reason to conclude that the smoothed results are producing false positive activity.

We agree with the view of the reviewer on the interpretation of the smoothing results. We indeed cannot rule this out as a possible interpretation of the results, due to a lack of ground truth. We have added text to the article to reflect this view and discuss the types of errors we can expect for both smaller and larger smoothing kernels (page 15).

“In the absence of a ground truth, we cannot fully justify the use of either larger or smaller kernels for analysing such data. On the one hand, overly large smoothing kernels could produce false positives in activation profiles, as observed activation bleeds into surrounding tissue. On the other hand, too little smoothing could produce false negatives, missing true activity in surrounding regions. While we cannot concretely validate either choice, it should be noted that spatial uncertainty is lower in the subcortex than in the cortex, due to lower anatomical variability; false positives from smoothing spatially mismatched signals are therefore more likely than false negatives. It may be more prudent for studies to use a range of smoothing kernels to assess the robustness of their fMRI activation profiles.”
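The multi-kernel robustness check suggested above could be sketched as follows (illustrative only; `scipy` stands in for the actual analysis stack, and the voxel size and kernel range are assumptions, not the study's parameters):

```python
import numpy as np
from scipy import ndimage

def smooth_at_kernels(volume, fwhms_mm, voxel_size_mm=1.5):
    """Smooth one volume with several FWHM kernels (in mm) so that
    downstream activation maps can be compared across kernel sizes.
    The default voxel size is an illustrative assumption."""
    results = {}
    for fwhm in fwhms_mm:
        # Convert FWHM (mm) to the Gaussian sigma (in voxels) that
        # scipy expects: FWHM = sigma * sqrt(8 * ln 2) ~= 2.3548 * sigma.
        sigma = fwhm / (2.354820045 * voxel_size_mm)
        results[fwhm] = ndimage.gaussian_filter(volume, sigma=sigma)
    return results
```

Fitting the same first-level model to each smoothed series and comparing the resulting contrast maps would then show which activations survive across kernel choices.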
